As more and more users begin to explore the BigQuery export feature that's now available to all Google Analytics users with GA4, we're seeing a lot of confusion around the discrepancy between sessions and users metrics in BigQuery dataset and the values displayed in the GA4 admin for the same time period.
How Do I Query Users and Sessions in the BigQuery GA4 Dataset?
One of our favorite things about the GA4 dataset is how easy it is to derive these metrics. We can simply count the unique user and session IDs that were collected, and BQ brings back the exact counts that were measured over the desired time frame. You can use the following SQL examples to query sessions and users.
- Sessions: is the COUNT of all DISTINCT ga_session_id values
- Users: is the COUNT of all DISTINCT user_pseudo_id values
Sample Query: GA4 Sessions and Users
SELECT
COUNT(DISTINCT(SELECT value.int_value FROM UNNEST(event_params) WHERE key = 'ga_session_id')) AS sessions,
COUNT(DISTINCT user_pseudo_id) AS users
FROM `your-project-id.your-dataset.events_20230815`
Sessions and Users Example Query Result
Sessions and Users in GA4 Interface for the Same Time Period
Okay, So Why Doesn't the GA4 Interface Match the BigQuery Value?
In GA4, Google uses an approximation algorithm in all instances when you request the sessions or users metrics. The approximation algorithm is different from sampling because it's always present and there is no way to work around it (with sampling, you can get around this by reducing your query size). The details of the approximation algorithm Google is using on the sessions and users metrics can be found here: Unique count approximation in Google Analytics
Why Don't My Users Match When I Download From the GA API?
Trying to obtain an accurate measurement of your users when loading data from the GA API is very challenging, and it's much less accurate if you're loading the data in a database and not sending live queries. The reason behind this is that users often visit websites across multiple days, months, and years. For example, if your API query is only requesting one day of activity, it will return all the users that visited the website on that particular date. When you aggregate across all the days of your stored data, the query does not know if the user visited on multiple days, resulting in an inflated metric.
Sessions, on the other hand, typically do not span multiple days (although technically possible), so downloading this metric by day, week, or month via the API and aggregating it should yield a value that is similar to the one displayed directly in the GA interface.
If your dashboard app (such as Looker Studio) is able to send live queries to the API with a dynamic start and end date, the users value in your dashboard will match the GA admin exactly. Using an app that can send live queries is usually the best approach for smaller organizations that do not require scale and/or complex transformations with their GA data.
So What's the Right Answer? BigQuery or the API?
Well, if you're determined to display the exact figure for these metrics, then the right answer is to use the GA4 dataset in BigQuery. The figures will always be exact, and while you'll need to pay for the queries, the associated costs are typically pretty insignificant for most businesses. There's one catch though — you'll need to be prepared for your end stakeholders to question these metrics until the end of time. It doesn't sit well with most people when the values in your reports do not match the values they are seeing in Google Analytics. Explaining the reason for the discrepancy and making the case that the BigQuery value is more accurate can become exhausting and distract from the important conversations you should be having about actioning the data.
If you're a bigger fan of avoiding the accuracy questions and the doubt that comes with them, then using the API is probably the better solution for you. You will still have issues with the users metric (as described above), but if you batch your requests by month, you can generally get pretty close to the value that is being displayed in the GA interface. It is also worth noting that the source, medium, and channel attribution in the BigQuery dataset cannot be matched exactly, and you'll need to develop your own logic to determine how you attribute credit across multiple sessions. You can also use the API to query the values exactly as they appear in the admin.
If you've made it this far and are leaning toward using the API for your reports, be sure to try Launchpad, our no-code data transfer application for GA4. We've mapped the entire GA4 schema and have solutions outlined in our help documentation that address many of the challenges digital marketers have encountered when trying to produce reports outside of the GA4 interface.
You may also find our Five Practical Reasons to Use the GA4 API Over the Native BigQuery Transfer article helpful.
How can we help?
If you need any assistance transferring your data and building insightful reports, then the experts at Calibrate Analytics would be happy to help. We can also walk you through our full-service ETL application, Launchpad, which lets you automate your reporting with ease.