Insights

Unlock the Power of Materialized Views BigQuery

by Shannon Gantt on Aug 13, 2024

When diving into the world of Google BigQuery, you'll quickly encounter two powerful tools: regular views and materialized views. Both are incredibly useful for different purposes, but understanding their unique benefits and features can help you make the most of your data. Let's explore how materialized views stand out compared to regular views.

Regular Views

Regular views in BigQuery are pretty straightforward. Views are logical tables that are created by using a SQL query. It is helpful to think of them as saved SQL queries that you can run whenever you need. Each time you access a regular view, BigQuery executes the underlying SQL query to fetch the most current data. This means you're always working with the latest information, which is a significant advantage if your data changes frequently.

Regular views are easy to set up and maintain. Since they don't store any data themselves, there's no additional storage cost involved. You're essentially saving the SQL script and running it on demand. This simplicity makes regular views a go-to choice for many quick and flexible data access needs.

Some common uses of regular views include:

  • Abstract and store calculation and join logic in a common object to simplify query use
  • Provide access to a subset of data and calculation logic without accessing to the base tables
  • Using a view as a data source for a visualization tool such as Looker Studio.

See more about the official introduction to views in Google Cloud BigQuery documentation.

Materialized Views

Materialized views in BigQuery are a bit more sophisticated. Instead of just saving the SQL query, materialized views store the precomputed results of the query. This means that when you access a materialized view, BigQuery doesn't have to re-run the query—it simply fetches the already computed data. This can lead to significant performance improvements, especially with complex or resource-intensive queries.

One of the standout features of materialized views is their automatic refresh capability. BigQuery can keep materialized views up-to-date with changes in the underlying data, either on a schedule or as data changes. This balance between up-to-date information and performance optimization is where materialized views truly shine.

Materialized view support several options and key concepts which allow for fine tuning the freshness of the view data:

  • enable_refresh: By default, materialized views are automatically refreshed within 5 minutes of a change to the base tables, but no more frequently than every 30 minutes. Automatic refresh can be enabled or disabled at any time using this option.
  • refresh_interval_minutes: You can configure a frequency cap on how often automatic refresh is run. The minimum refresh frequency cap is 1 minute. The maximum refresh frequency cap is 7 days.
  • max_staleness: With the max_staleness parameter, you can adjust the freshness of the results to tune query performance

There are two types of materialized views — incremental and non-incremental. Understanding that they have pretty significant differences as well as different limitations can help you choose the best approach for your data needs.

Incremental Views

Incremental views is the default type of view in BigQuery. They update only the parts of the data that have changed since the last time the view was refreshed.

This method is most efficient because instead of reprocessing the entire dataset, BigQuery only processes the new or modified data, saving a lot of time and computational resources. Imagine you're updating a report that summarizes daily sales. With an incremental view, you don't need to re-calculate the entire year's sales every day—just the sales from the most recent day.

Incremental Views may require more setup initially, but the performance benefits often outweigh the initial effort.

The following example creates an incremental materialized view with a refresh interval and optional max staleness set:

    
        CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
        OPTIONS (enable_refresh = true, refresh_interval_minutes = 60,
            max_staleness = INTERVAL "4:0:0" HOUR TO SECOND)
        AS
        SELECT date, AVG(net_paid) AS avg_paid
        FROM project-id.my_dataset.my_base_table 
        GROUP BY date
    

Non-Incremental Views

Non-Incremental views are the more traditional type of views you're likely familiar with. Whenever you query a non-incremental view, BigQuery executes the entire SQL query defined by that view from scratch, every single time.

If you only need a specific record or subset of records, BigQuery must reprocess all the data each time, which can be time-consuming and resource-intensive, especially if your dataset is huge. This ensures you're always working with the freshest data, but it can be a bit slow and expensive if the query is complex.

Non-Incremental Views are simpler to set up because they don't require additional configuration for incremental processing.

The following example creates an non-incremental view materialized using the allow_non_incremental_definition option:

    
        CREATE MATERIALIZED VIEW project-id.my_dataset.my_mv_table
        OPTIONS (allow_non_incremental_definition = true
            enable_refresh = true, refresh_interval_minutes = 60,
                max_staleness = INTERVAL "4:0:0" HOUR TO SECOND)
        AS
        SELECT date, AVG(net_paid) AS avg_paid
        FROM project-id.my_dataset.my_base_table 
        GROUP BY date
        
    

A Performance Edge

The biggest advantage of materialized views is speed. Since the results are precomputed, queries against materialized views are much faster than those against regular views. This can be a game-changer for applications requiring quick access to data, such as dashboards or reporting tools. Instead of waiting for a complex query to run, you get near-instant results, which enhances the user experience and boosts productivity.

This is especially true of incremental materialized views because they only process the new or changed data.

Cost, Data Freshness and Performance Considerations

While materialized views offer improved performance, they do come with storage costs since they store the precomputed results. However, these costs are often outweighed by the savings in query costs. Running a complex query repeatedly can be expensive, but accessing a materialized view is much cheaper because it avoids the need to re-run the entire query each time.

An additional key difference between regular and materialized views is how they handle data freshness. Regular views always provide the most up-to-date data because they run the query every time you access them. Materialized views, while not always 100% current, can be set to refresh frequently enough to meet your needs. For many applications, the slight delay in data freshness is a reasonable trade-off for the significant performance gains.

We Can Help

Choosing between regular and materialized views in BigQuery depends on your specific needs. If you need real-time data and simplicity, regular views are the way to go. But if performance and cost efficiency are your priorities, especially for complex queries, materialized views offer substantial advantages.

Calibrate Analytics can help you understand the strengths of materialized views so that you can make an informed decision and unlock the full potential of your BigQuery environment.

Get in Touch

Share this post:
  • Shannon Gantt

    About the Author

    Shannon is head of technology at Calibrate Analytics. With over 24 years of experience focused on delivering technology solutions via a customer-first approach. Having successfully overseen the development and delivery of large-scale applications that span cloud, he is focused on developing creative business intelligence and e-commerce products.