Why I Built This
I've been working as a data engineer for the last 8 months with BigQuery as our main tech stack. It didn't take long to realize that the BigQuery web UI is... challenging to work with efficiently.
Browsing scheduled queries, navigating between pages - it all takes forever. Since I spend most of my time in Cursor anyway, I started exploring BigQuery extensions:
- The first one I tried could only run queries - it lacked resource exploration and dry run functionality (which is a must!)
- Then I found minodisk's extension, which was much better - it had resource searching and dry run, making BigQuery work way easier
I used it for a while but still had to open the BigQuery web UI for one key thing: checking scheduled queries. I need to monitor scheduled queries regularly for improvements and bug fixes.
Unfortunately, I couldn't find any extension that handled scheduled queries well. So instead of reinventing the wheel, I forked minodisk's extension and added the missing piece.
What I Built
Prerequisites
Before diving into the code, make sure your service account or principal has the right permissions (a minimal auth sketch follows the list):
- `bigquery.transfers.get` and `bigquery.transfers.list` - to access scheduled query configs and runs
- `bigquery.jobs.get` - to fetch BigQuery job details
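If you want to run these snippets outside the extension, the clients also need credentials for that principal. A minimal sketch, assuming authentication with a service account key file (the path and project ID are placeholders):

```typescript
import { DataTransferServiceClient } from '@google-cloud/bigquery-data-transfer';
import { BigQuery } from '@google-cloud/bigquery';

// Placeholder values - point these at your own key file and project
const keyFilename = '/path/to/service-account.json';
const projectId = 'my-project';

// Both clients accept keyFilename/projectId options; omit them to fall back
// to Application Default Credentials
const transferClient = new DataTransferServiceClient({ keyFilename, projectId });
const bigquery = new BigQuery({ keyFilename, projectId });
```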
Scheduled Query Tracking in VS Code
The core feature: see your scheduled queries and their execution history without leaving your editor.
Step 1: First, I needed to list all scheduled queries for a project:
import { DataTransferServiceClient } from '@google-cloud/bigquery-data-transfer';

// Simple: Get all scheduled queries
async function listScheduledQueries(projectId: string, region: string) {
  const client = new DataTransferServiceClient();
  const parent = `projects/${projectId}/locations/${region}`;
  const [configs] = await client.listTransferConfigs({
    parent,
    dataSourceIds: ['scheduled_query'] // Only get scheduled queries
  });
  return configs;
}
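A quick way to sanity-check it (the project and region below are just placeholders):

```typescript
// Hypothetical usage: list configs and print a one-line summary for each
(async () => {
  const configs = await listScheduledQueries('my-project', 'us');
  for (const config of configs) {
    console.log(`${config.displayName} - ${config.schedule} (state: ${config.state})`);
  }
})();
```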
Step 2: Then get the execution history for each scheduled query:
// Get recent runs for a scheduled query
async function getRunHistory(configName: string, maxResults = 5) {
  // Uses the DataTransferServiceClient imported in Step 1
  const client = new DataTransferServiceClient();
  const [runs] = await client.listTransferRuns({
    parent: configName, // full transfer config resource name
    states: ['SUCCEEDED', 'FAILED', 'CANCELLED'],
    pageSize: maxResults
  });
  return runs;
}
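Each run's resource name ends with the run ID, which is exactly what Step 3 needs. Something small like this (an illustrative helper, not shown in the snippets above) does the trick:

```typescript
// A transfer run's resource name looks like
// projects/<project>/locations/<region>/transferConfigs/<config_id>/runs/<run_id>,
// so the run ID is just the last path segment
function extractRunId(runName: string): string {
  return runName.split('/').pop() ?? '';
}
```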
Step 3: The tricky part - correlating transfer runs with actual BigQuery jobs:
import { BigQuery } from '@google-cloud/bigquery';

// Complex: Find the BigQuery job for each scheduled query run
async function getBigQueryJobDetails(projectId: string, runId: string) {
  const bigquery = new BigQuery({ projectId });
  const jobId = `scheduled_query_${runId}`; // BigQuery's naming pattern
  try {
    const [job] = await bigquery.job(jobId).get();
    const [metadata] = await job.getMetadata();
    return {
      jobId: job.id,
      status: metadata.status,
      statistics: metadata.statistics,
      configuration: metadata.configuration
    };
  } catch (error) {
    // Job not found - this happens sometimes
    return null;
  }
}
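Putting the pieces together, a rough end-to-end check could look like this (project and region are placeholders; `extractRunId` is the little helper from the Step 2 note):

```typescript
// Hypothetical sketch: list configs, fetch recent runs, correlate each run with its job
async function checkScheduledQueries() {
  const configs = await listScheduledQueries('my-project', 'us');
  for (const config of configs) {
    console.log(`${config.displayName} - ${config.schedule}`);
    const runs = await getRunHistory(config.name!, 5);
    for (const run of runs) {
      const runId = extractRunId(run.name!);
      const jobDetails = await getBigQueryJobDetails('my-project', runId);
      console.log(`  ${run.state}: job ${jobDetails?.jobId ?? 'not found'}`);
    }
  }
}
```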
Step 4: Finally, extract useful information from the job details:
// Extract query and destination info
if (jobDetails?.configuration?.query) {
  const query = jobDetails.configuration.query.query;
  const stats = jobDetails.statistics;
  console.log(`Query: ${query.substring(0, 100)}...`);
  console.log(`Bytes processed: ${stats.totalBytesProcessed}`);

  // Calculate duration (BigQuery timestamps are Unix timestamps in milliseconds as strings)
  if (stats.endTime && stats.startTime) {
    const duration = Number(stats.endTime) - Number(stats.startTime);
    console.log(`Duration: ${(duration / 1000).toFixed(2)}s`);
  }

  // Get destination table if available
  if (jobDetails.configuration.query.destinationTable) {
    const table = jobDetails.configuration.query.destinationTable;
    console.log(`Destination: ${table.projectId}.${table.datasetId}.${table.tableId}`);
  }
}
💡 Duration Calculation Note: BigQuery returns timestamps as Unix timestamps in milliseconds formatted as strings (e.g., "1749989041841"). That's why we use `Number()` to convert them before calculating the difference. This was one of those "fun" API quirks I discovered while building this!
Now I can quickly check if my data pipelines are running smoothly without context switching to the BigQuery console!
Sample Output
Here's what the code produces when you run it:
Found 2 scheduled queries
[1] shakespeare_test_top_1
Schedule: every day 12:04
State: SUCCEEDED
Found 36 recent runs
Correlating with BigQuery jobs...
Run 1:
- State: SUCCEEDED
- Start Time: 2024-01-15T12:04:00.000Z
- BigQuery Job ID: scheduled_query_68663782-0000-2154-9748-34c7e91daa93
- Query: select * from `testDataset.shakespeare_test_1` order by 1 desc limit 1...
- Bytes processed: 335
- Duration: 1.59s
- Destination: my-project.testDataset.shakespeare_top_1
What's Next
Planning to add the ability to create scheduled queries directly from the extension.
In Cursor, there is currently no way for an extension to communicate with the assistant, but I would like to find a way to build a system that automatically adds table schemas to Cursor's knowledge base.
Forked from minodisk/bigquery-runner because it already works great. Check it out!
Check out the extension on the VS Code Marketplace
Check out the code on GitHub