Why I Built This

I've been working as a data engineer for the last 8 months with BigQuery as our main tech stack. It didn't take long to realize that the BigQuery web UI is... challenging to work with efficiently.

Browsing scheduled queries, navigating between pages - it all takes forever. Since I spend most of my time in Cursor anyway, I started exploring BigQuery extensions and settled on minodisk's bigquery-runner.

I used it for a while but still had to open the BigQuery web UI for one key thing: checking scheduled queries, which I need to monitor regularly for improvements and bug fixes.

Unfortunately, I couldn't find any extension that handled scheduled queries well. So instead of reinventing the wheel, I forked minodisk's extension and added the missing piece.

What I Built

Prerequisites

Before diving into the code, make sure your service account or principal has the right permissions: at minimum, bigquery.transfers.get to list scheduled queries and their runs, and bigquery.jobs.get to read job details. A broad role like roles/bigquery.admin covers both, though a narrower custom role works too.

Scheduled Query Tracking in VS Code

The core feature: see your scheduled queries and their execution history without leaving your editor.

Step 1: I needed to list all scheduled queries for a project:

import { DataTransferServiceClient } from '@google-cloud/bigquery-data-transfer';

// Simple: Get all scheduled queries
async function listScheduledQueries(projectId: string, region: string) {
  const client = new DataTransferServiceClient();
  const parent = `projects/${projectId}/locations/${region}`;
  
  const [configs] = await client.listTransferConfigs({
    parent,
    dataSourceIds: ['scheduled_query']  // Only get scheduled queries
  });
  
  return configs;
}
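
To sanity-check the results, you can print the fields the extension displays. This is just a quick sketch: displayName, schedule, and state are fields on the returned TransferConfig, while the project ID and region below are placeholders:

// Quick sanity check (run inside an async context; placeholder project/region)
const configs = await listScheduledQueries('my-project', 'us');
configs.forEach((config, i) => {
  console.log(`[${i + 1}] ${config.displayName}`);
  console.log(`    Schedule: ${config.schedule}`);
  console.log(`    State: ${config.state}`);
});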

Step 2: Then get the execution history for each scheduled query:

// Get recent runs for a scheduled query
async function getRunHistory(configName: string, maxResults = 5) {
  const client = new DataTransferServiceClient();
  
  const [runs] = await client.listTransferRuns({
    parent: configName,  // full resource name of the transfer config
    states: ['SUCCEEDED', 'FAILED', 'CANCELLED'],
    pageSize: maxResults  // per-page size; auto-pagination may still return more
  });
  
  return runs;
}

Step 3: The tricky part - correlating transfer runs with actual BigQuery jobs:

import { BigQuery } from '@google-cloud/bigquery';

// Complex: Find the BigQuery job for each scheduled query run
async function getBigQueryJobDetails(projectId: string, runId: string) {
  const bigquery = new BigQuery({ projectId });
  const jobId = `scheduled_query_${runId}`;  // BigQuery's naming pattern
  
  try {
    // Jobs are location-scoped: outside the US multi-region you may need
    // bigquery.job(jobId, { location }) to find it
    const [job] = await bigquery.job(jobId).get();
    const [metadata] = await job.getMetadata();
    
    return {
      jobId: job.id,
      status: metadata.status,
      statistics: metadata.statistics,
      configuration: metadata.configuration
    };
  } catch (error) {
    // Job not found - the run may have failed before a job was created,
    // or the job may have aged out of BigQuery's job history
    return null;
  }
}

Step 4: Finally, extract useful information from the job details:

// Extract query and destination info
if (jobDetails?.configuration?.query) {
  const query = jobDetails.configuration.query.query;
  const stats = jobDetails.statistics;
  
  console.log(`Query: ${query.substring(0, 100)}...`);
  console.log(`Bytes processed: ${stats.totalBytesProcessed}`);
  
  // Calculate duration (BigQuery timestamps are Unix timestamps in milliseconds as strings)
  if (stats.endTime && stats.startTime) {
    const duration = Number(stats.endTime) - Number(stats.startTime);
    console.log(`Duration: ${(duration / 1000).toFixed(2)}s`);
  }
  
  // Get destination table if available
  if (jobDetails.configuration.query.destinationTable) {
    const table = jobDetails.configuration.query.destinationTable;
    console.log(`Destination: ${table.projectId}.${table.datasetId}.${table.tableId}`);
  }
}

💡 Duration Calculation Note: BigQuery returns timestamps as Unix timestamps in milliseconds formatted as strings (e.g., "1749989041841"). That's why we use Number() to convert them before calculating the difference. This was one of those "fun" API quirks I discovered while building this!

Now I can quickly check if my data pipelines are running smoothly without context switching to the BigQuery console! 🎉
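
Here's roughly how the pieces wire together - a minimal sketch using the helper functions above, with a placeholder project ID and region (the real extension also formats timestamps and durations as in Step 4):

// Hypothetical driver tying Steps 1-3 together (placeholder project/region)
async function checkScheduledQueries() {
  const projectId = 'my-project';
  const configs = await listScheduledQueries(projectId, 'us');
  console.log(`📋 Found ${configs.length} scheduled queries`);
  
  for (const config of configs) {
    const runs = await getRunHistory(config.name!);
    console.log(`📊 Found ${runs.length} recent runs`);
    
    for (const run of runs) {
      // The job ID reuses the last segment of the run's resource name
      const runId = run.name!.split('/').pop()!;
      const jobDetails = await getBigQueryJobDetails(projectId, runId);
      console.log(`   - State: ${run.state}`);
      if (jobDetails) {
        console.log(`   - BigQuery Job ID: ${jobDetails.jobId}`);
      }
    }
  }
}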

Sample Output

Here's what the code produces when you run it:

📋 Found 2 scheduled queries

   [1] shakespeare_test_top_1
       Schedule: every day 12:04
       State: SUCCEEDED

📊 Found 36 recent runs

๐Ÿ” Correlating with BigQuery jobs...

   Run 1:
   - State: SUCCEEDED
   - Start Time: 2024-01-15T12:04:00.000Z
   - BigQuery Job ID: scheduled_query_68663782-0000-2154-9748-34c7e91daa93
   - Query: select * from `testDataset.shakespeare_test_1` order by 1 desc limit 1...
   - Bytes processed: 335
   - Duration: 1.59s
   - Destination: my-project.testDataset.shakespeare_top_1

What's Next

I'm planning to add the ability to create scheduled queries directly from the extension.
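
The Data Transfer API already supports this via createTransferConfig, so I expect the implementation to look roughly like the sketch below. The display name, dataset, schedule, and query are all illustrative; params is a protobuf Struct, which is why the Node client wants the fields/stringValue nesting:

// Rough sketch: creating a scheduled query via the Data Transfer API
// (all field values below are illustrative)
async function createScheduledQuery(projectId: string, region: string) {
  const client = new DataTransferServiceClient();
  
  const [config] = await client.createTransferConfig({
    parent: `projects/${projectId}/locations/${region}`,
    transferConfig: {
      displayName: 'my_scheduled_query',
      dataSourceId: 'scheduled_query',
      destinationDatasetId: 'testDataset',
      schedule: 'every day 12:04',
      params: {
        // params is a protobuf Struct, hence the fields/stringValue shape
        fields: {
          query: { stringValue: 'SELECT 1' },
          destination_table_name_template: { stringValue: 'my_table_{run_date}' },
          write_disposition: { stringValue: 'WRITE_TRUNCATE' }
        }
      }
    }
  });
  
  return config;
}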

In Cursor, there is currently no way for extensions to communicate with the assistant, but I would like to find a way to build a system that automatically adds table schemas to Cursor's knowledge base.


Forked from minodisk/bigquery-runner because it already works great. Check it out!

Get the extension on the VS Code Marketplace

Browse the code on GitHub