⚡️ BigQuery Query Optimization: Partition Pruning with Subqueries vs SQL Variables

📋 Overview
BigQuery Partitioning divides a large table into smaller, manageable segments called partitions based on the values of a specific column. BigQuery supports three types of partitioning:
- Time-unit partitioning: Partition based on a TIMESTAMP, DATE, or DATETIME column into daily, hourly, daily, monthly, or yearly segments.
- Integer-range partitioning: Partition based on ranges of values in a specific INTEGER column, defined by a start value, end value, and interval.
- Ingestion-time partitioning: Automatically partitions data based on the time when BigQuery ingests the data using a pseudo column
_PARTITIONTIME.
This strategy improves query performance by limiting the amount of data scanned and reducing the cost of the query by reading only the relevant partitions of a table. This process is known as partition pruning.
BigQuery automatically prunes partitions when the query contains filters on the partitioning column with static literal values. However, using subqueries to determine partition values can prevent BigQuery from effectively pruning partitions, leading to higher costs and slower performance.
In this article, we will explore an approach that disables partition pruning and demonstrate how to rewrite the query to enable it.
📊 The Scenario: Querying the “Latest N” Partitions
Imagine there is a large historical dataset partitioned by day on the date column. This table holds years of data, amounting to several gigabytes. The task is to analyze only the data from the 10 most recent days.
The following sample dataset is used throughout this article to demonstrate partition pruning behavior.
| Property | Value |
|---|---|
| Project ID | xxxxxxxxxxx |
| Dataset ID | lab_dataset |
| Table Name | lcl_data |
| Table Size | 9.37 GB |
| Number of Partitions | 830 |
| Partitioning Column | date_processed (DATE type) |
| Partition Type | Time-unit partitioning (daily) |
| Number of records | 167932474 |
🕵️♂️ The Flawed Approach: Dynamic Subquery in the WHERE Clause
A common approach is to use a subquery to find the latest partition dates and feed them into the main query’s WHERE clause.
| |
⚠️ Critical Issue: While this looks correct, it forces BigQuery to scan the entire table because the query planner cannot resolve the list of dates from the subquery before it executes the query, so it must scan all partitions.
After running the query, a look at the job information will show a high number for “Bytes billed.”
💸 Result: Scanning full table (9.37 GB) instead of just 10 partitions.

🛠️ The Solution: Replace the Subquery with a SQL Variable
✨ Key Insight: The correct approach is to decouple the discovery of partitions from the main data query. You do this by finding the partition dates first and storing them in a SQL variable. The main query then uses this SQL variable which has the partition dates in it, allowing the query planner to perform pruning correctly.
This single principle can be applied whether you are working in the BigQuery console or programmatically.
1️⃣ In the BigQuery Console (using SQL Scripting)
You can use a BEGIN...END block to run a multi-statement script. The DECLARE statement runs first, populating the variable. The main query then uses it.
| |
2️⃣ Programmatically (e.g., in Python)
The same two-step logic applies. First, you run a query to get the dates. Second, you use those dates as a parameter in your main query.
| |
📚 Full code example: Check out the 🔗 bq_partition_pruning_subquery_vs_sqlparam_demo.py script
✅ Result: Partition Pruning Enabled!
🎉 After implementing the above solution, running the query will show a significantly lower “Bytes billed” value, indicating that partition pruning is now effectively enabled. It has 2 queries instead of 1:
- 🔍 First query gets the partition dates
- 📊 Second query fetches data from only those partitions
📈 Breakdown of bytes billed:
- 🔹 First query (partition discovery): ~10 MB
- 🔹 Second query (main data query with pruning): ~124 MB
- 🟢 Total: ~134 MB (vs. ❌ 9.37 GB without pruning) — 98% savings!
Below is a parent-and-children layout to present the result meaningfully: the parent shows both queries (BEGIN…END), and the two children show each job detail.

![]() Child 1: Job information for partition dates query. | ![]() Child 2: Job information for pruned partitions data query. |
🔍 Verify Only 10 Partitions Were Scanned
✔️ To confirm that partition pruning is working correctly, you can query the INFORMATION_SCHEMA.PARTITIONS table to calculate the total billable bytes for the 10 most recent partitions:
| |
✅ Expected Result: The total_billable_bytes_sum column should be around ~124 MB (not 9.37 GB!)

💰 The Financial Impact: A Stunning Difference
🚀 This single optimization can dramatically reduce your BigQuery costs. By eliminating the subquery and using a SQL variable, you ensure BigQuery only scans the partitions you actually need.
📊 The table below compares the resource consumption before and after implementing this fix, revealing the substantial impact on both performance and cost:
| Metric | ❌ Subquery Approach | ✅ SQL Variable Approach |
|---|---|---|
| Bytes Billed | 9.37 GB | ~134 (10 + 124) MB |
| Partition Pruning | ❌ NO (All partitions scanned) | ✅ YES (Only 10 partitions scanned) |
| Cost Savings | Baseline | Over 98% |
🎯 Conclusion
📌 Key Takeaway: Using subqueries in the WHERE clause to determine partition values can disable BigQuery’s partition pruning, leading to inefficient queries that scan the entire table.
✨ The Fix: By separating the partition discovery step from the main data query and using SQL variables or query parameters, you can enable partition pruning effectively.
🏆 The Impact: This approach results in significant performance improvements and cost savings (often 90%+) when querying large partitioned tables in BigQuery.

