Observations and Musings
I spent a while this month in a rabbit hole named “last non-direct click session attribution in GA4.” If you happen to pass by the same rabbit hole, I recommend staying the #$%& away.
My journey down the hole started because I’d set up a GA4 BigQuery data pipeline for a client, but my session numbers weren’t matching GA4. I’d created basically the same pipeline for other clients, and everything matched up well, but this client was different because the majority of their traffic is repeat visitors, and completing a purchase often spans multiple sessions.
If you started to nod off in that last paragraph, I’ll leave you with this and recommend that you skip ahead:
Being good at working with data and understanding data are not the same thing.
If you work with someone who is really proficient at data engineering, don’t let your guard down. Start with the assumption that what they tell you is wrong and look for evidence to support your assumption. I know, I sound like “Mr. Negative,” but optimism has no place in data analysis.
Here are some of the issues I had to deal with to get my numbers to match GA4:
- If a given session is attributed to direct, you need to look back 90 days to see if the same user arrived via a different source previously. Thank god for window functions.
- You also need to look to see if the user had sessions previously on the same day – that was the main source of discrepancy in my case. This is a non-issue if you can afford to query the raw export back 90 days every time, but doing so can get expensive for higher-volume properties.
- Beware of all of the attribution fields that have been added to the GA4 export. Some of them are not accurately documented, some of them are just plain inaccurate, and NONE of them are backwards compatible.
- The export still mis-attributes some Google Ads traffic to Google organic, which has been an issue from the beginning. I’d heard rumors that this had gotten better, but the rumors appear to be false.
- The client I was working with gets a LOT of traffic, so the 90-day lookback could get expensive. I was able to bring costs way down by storing the data in an optimized helper table, then querying that table for the lookback.
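The lookback logic from the first two bullets can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pipeline described above and not GA4's actual implementation: the `Session` fields, the `"(direct)"` label, and the `attribute` function are hypothetical names chosen for illustration. In BigQuery the same idea would typically be a window function partitioned by user, ideally run against a slimmed-down helper table rather than the raw export.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

LOOKBACK = timedelta(days=90)  # lookback window described above
DIRECT = "(direct)"            # assumed label for direct traffic


@dataclass
class Session:
    user_id: str                      # hypothetical user key
    start: datetime                   # session start time
    source: str                       # source recorded on the session itself
    attributed: Optional[str] = None  # filled in by attribute()


def attribute(sessions: list[Session]) -> list[Session]:
    """Give each session its last non-direct source within the lookback."""
    last_non_direct: dict[str, Session] = {}
    # Process each user's sessions in chronological order.
    ordered = sorted(sessions, key=lambda s: (s.user_id, s.start))
    for s in ordered:
        if s.source != DIRECT:
            # A non-direct session keeps its own source and becomes
            # the new reference point for later direct sessions.
            s.attributed = s.source
            last_non_direct[s.user_id] = s
            continue
        prev = last_non_direct.get(s.user_id)
        if prev is not None and s.start - prev.start <= LOOKBACK:
            # Direct session inherits the earlier source, including one
            # from a session earlier on the same day.
            s.attributed = prev.source
        else:
            s.attributed = DIRECT
    return ordered


demo = attribute([
    Session("u1", datetime(2025, 1, 1, 9), "google"),
    Session("u1", datetime(2025, 1, 1, 15), DIRECT),  # same day
    Session("u1", datetime(2025, 3, 1, 10), DIRECT),  # inside 90 days
    Session("u1", datetime(2025, 6, 1, 10), DIRECT),  # outside 90 days
])
print([s.attributed for s in demo])
# → ['google', 'google', 'google', '(direct)']
```

The same-day case is the second session in the demo: it only resolves correctly if the earlier session from that day is visible to the lookback, which is why an incremental job that processes only the newest sessions can drift from GA4's numbers.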
Now that you’ve made it this far, here’s my second piece of advice: strongly consider using a third-party solution for processing GA4 data. If someone tells you it’s easy to build reporting off of the BigQuery export, they either haven’t done it right or they haven’t done it at all. Last month I wrote about ga4dataform.com, which should be on your short list to consider. And I’m in the process of testing another solution from PipedOut, which I’ll cover in my next roundup. I did eventually make it out of the GA4 attribution rabbit hole I was in, and I learned some useful things while there, but in general I’d rather spend my time solving new problems than replicating GA4 metrics.
Slow Month for The Google Analytics Product Stack
I usually write about GA4, Looker Studio and Google Tag Manager updates that stand out, but I guess even Google engineers get a little downtime now and then. I was pretty excited when they announced a major update to charts in Looker Studio last week, but they unannounced it shortly thereafter.
I’m not sure what happened, but hopefully they’ll announce it again soon.
Google Meridian
Google Meridian is an open-source marketing-mix-modeling framework and toolset that has just been made generally available.
If you are not familiar with marketing mix modeling, here is how Wikipedia describes it:
Marketing Mix Modeling (MMM) is a forecasting methodology used to estimate the impact of various marketing tactic scenarios on product sales. MMMs use statistical models, such as multivariate regressions, and use sales and marketing time-series data. They are often used to optimize advertising mix and promotional tactics with respect to sales, revenue, or profit to maximize their return on investment.
I haven’t had a chance to play with it yet, but it fits perfectly with my priorities and goals for 2025, so expect to hear more about it in the months to come.
Content we’ve published
- Filtering GA4 Bot Traffic in Looker Studio
A short video describing an efficient way to eliminate bot traffic from your GA4 reports.
- Sortable Date Comparisons in Looker Studio
I shared a blog post about this last month – this is the video companion. It’s a bit tricky to set up, but very helpful.
- Understand Your Target Audience With Google Search Results
How to learn what SERPs are telling you to improve your content & visibility.
- Top of 2024
Our five most popular videos and blog posts of 2024.
Articles/Videos/Resources that made me smarter
- What Happens When Brands Stop Advertising?, Matt Voda
I’ve experienced much of what Matt describes, and have seen first-hand that cutting all but bottom-funnel marketing spend can end in a death spiral. This article and the related case study do a great job of explaining why.
- Attribution is Dying. Clicks are Dying. Marketing is Going Back to the 20th Century, Rand Fishkin
This is a great companion piece to the previous article. In it, Rand provides detail and data on many of the technical and behavioral changes that are affecting measurability. He also got me thinking about the vast universe of organic social media and web content that plays a big role in consumer decision making, and a relatively small role in most marketing strategies.
- How GA4 BigQuery raw data impacted by Consent Mode and what analysis we can do, Anton Konchakovskyi
Some good suggestions for working with GA4 event data for users who have declined consent. He also makes some bad suggestions – circumventing a user’s wishes by adding your own unique user_id is a clear violation of GDPR and other privacy regulations.
- 10 Optimisation Tips to Reduce Your Mighty BigQuery Cost, Growthrunner
Great tips. I didn’t know about TABLESAMPLE at all, and they shed light on some misconceptions I had about clustering, partitioning and sharding.
- A client asked if I could add an average daily revenue scorecard to a dashboard. Dana DiTomaso for the win with this clever hack: How to Calculate the Number of Days Between Start and End Dates Using DATE_DIFF in Looker Studio
Don’t Miss our Next Analytics Roundup
Sign up for our newsletter to get Nico’s monthly Analytics Roundups delivered to your inbox.