Insights

Getting Started With the Google Analytics 4 BigQuery Export

by Paul Cote on Oct 10, 2023

Monitoring and analyzing data from your website or app through Google Analytics 4 is essential to optimizing your business, but viewing that data directly in GA4 is just the beginning. To take your analytics to the next level, you'll want to export that data to BigQuery, where you'll have the freedom to combine it with other datasets, manipulate it to uncover new insights, and connect it to visualization and business intelligence apps.

Set Up a Google Cloud Platform Account

In order to establish the GA4 BigQuery export, you'll need a Google Cloud Platform (GCP) account with an active billing account. Be sure to take advantage of the $300 credit Google offers for new accounts.

Establish the Connection Between GA4 and BigQuery

After you have created your GCP account, you should be able to establish the connection between Google Analytics 4 and Google BigQuery. You'll want to make sure that the user who is going to establish the connection has admin permissions in both systems and ideally is using an account owned and controlled by someone in your organization.

Note: If you are moving your GA4 connection to a new BigQuery account, you'll need to manually configure Google's GCP service account on the new account. This does not happen automatically when you are switching the BigQuery export to a new GCP account.

Familiarize Yourself With the Data and Schema

Before you begin to incorporate the data into your dashboards, you'll want to make sure you are familiar with the GA4 schema and the underlying tagging setup (usually in Google Tag Manager). GA4 captures all of the site activity as individual events and connects the events via User IDs and Session IDs.

Use the following query to see the list of events that are being logged by GA4 on your website or app.

    
        SELECT event_name FROM `your-project.analytics_your-dataset.events_20231006`
        GROUP BY event_name
    

The results of this query will show you all of the standard GA4 events that are collected automatically by the GA4 configuration tag and data layer, as well as any custom events you have set up on the website.

After you have familiarized yourself with the events that are being tracked, you'll want to look at how the events have been decorated with parameters. Within each event you'll notice there are nested records with additional information about the event. To access the event parameters and display them in a visualization, you'll need to familiarize yourself with the UNNEST function in Google BigQuery. Unnesting records can be a lot to wrap your head around when you're first starting, so don't get discouraged — there is still a wealth of information that can be extracted from the parent-level records, and the unnesting will become easier for you as you continue to work with this data.

Run a Basic Query

The first query you run should be basic in nature and only include values from the parent record. Use the schema screen in BigQuery to determine which dimensions and metrics are parent vs. nested. Nested records will be marked as “REPEATED” in the Mode column (as in the screenshot below).

Try the following query and be sure to replace your project and your dataset IDs where indicated in the FROM clause:

Dimensions

  • date
  • source
  • medium
  • devicecategory

Metrics

  • users
    
    SELECT  
        PARSE_DATE("%Y%m%d", event_date) AS date,
        traffic_source.medium AS medium,
        traffic_source.source AS source,
        device.category AS device_type,
        COUNT(DISTINCT user_pseudo_id) AS users
    
    FROM `your-project.analytics_your-dataset.events_202309*` /* month of September 2023 */
    
    GROUP BY
        date,
        medium,
        source,
        device_type
    
    ORDER BY users desc
    

Note: In the above query, the source and medium is the first way the user found the site. Source and medium are more commonly used at the session level with attribution logic applied. The session source and medium is decorated on the session events and will need to be unnested if you're going to try and mirror the attributed channels in GA4 as closely as possible.

Learn to UNNEST Records

If you've made it this far and your head hasn't exploded, then you're ready to begin unnesting data from the child records and joining them with their parent records. To do this, you'll need to use a CROSS JOIN function along with the UNNEST function. Use the following examples of CROSS JOIN and UNNEST to start your journey with unnesting records in the GA4 BigQuery events table.

Example UNNEST Query With CROSS JOIN:

    
    SELECT  
        PARSE_DATE("%Y%m%d", event_date) AS date,
        value.int_value AS session_id
       
    FROM `your-project.analytics_your-dataset.events_20231006`
    
    CROSS JOIN UNNEST(event_params)
    
    WHERE key = 'ga_session_id' AND event_name = 'session_start'
    
    GROUP BY
        date,
        session_id
    

Example UNNEST Query Without CROSS JOIN:

    
    SELECT  
        PARSE_DATE("%Y%m%d", event_date) AS date,
        (
            SELECT value.int_value
            FROM UNNEST (event_params)
            WHERE key = 'ga_session_id' AND event_name = 'session_start'
        ) AS session_id
       
    FROM `your-project.analytics_your-dataset.events_20231006`
    
    WHERE event_name = 'session_start'
    
    GROUP BY
        date,
        session_id
    

Time to Level Up

Once you've become comfortable working with your GA4 data in BigQuery, you'll likely begin to fall in love (as we have) and look for ways to load data from other systems that hold valuable information about your business. The easiest way to do this is by leveraging an extract, transform, and load (ETL) application like Launchpad. Launchpad connects to the most common digital marketing platforms used today and can consume data from any application that offers an email-able report as a CSV file. We've also decorated our help section with a wealth of tutorials (like this one), sample BigQuery queries, and solutions that we've implemented directly for customers with similar needs.

Try Launchpad

Share this post:
  • Paul Cote

    About the Author

    Paul is head of analytical products at Calibrate Analytics. He is responsible for creating digital analytical solutions that enable better business decisions. He has over 19 years of digital focused leadership, along with vast experience in analytics solutions aiming to deliver the right insights.