
ancianita

(36,130 posts)
Thu Jan 11, 2024, 07:54 AM

Bellingcat to techies: Using the Wayback Machine and Google Analytics to Uncover Disinformation Networks

Not seeing a dedicated tech forum, I figured I might as well post this to the general DU community -- and intrepid techies among us -- going into 2024.

Previous related post: https://www.democraticunderground.com/11744393


Google Analytics is a popular service for tracking and analysing traffic to a website. Through a short code placed in the source of a website, a user can monitor the performance of all their online properties. These tracking codes can also clearly indicate when multiple websites are run by a single user or entity — meaning they have been a particularly useful breadcrumb for open source researchers.

But there is a catch: Google is phasing out these codes and replacing them with ones that contain less data and make it harder to track who controls sites.

To address this problem, Bellingcat has developed a lightweight open source research tool, Wayback Google Analytics, which automates the collection of tracking codes and discovery of relationships between websites using copies of sites maintained by The Internet Archive’s Wayback Machine. This will help researchers sidestep recent changes to how Google manages its analytics data...

What is a Google Analytics Code?
Google uses a series of unique tracking codes to gather analytics data on websites. For over a decade, the most popular Google tracking code was the Universal Analytics (UA) ID: a small tracker buried in a script tag in a webpage’s source code.

A UA code looks like this:

UA-123456789-1

There’s a lot of useful information here. The central number is the account ID, shared by every website managed under the same Google Analytics account, which usually means the same user or entity. The trailing number distinguishes the individual online properties within that account (e.g., UA-123456789-1, UA-123456789-2, etc.).
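As a rough illustration of that structure, here is a minimal Python sketch that splits a UA code into its two parts and checks whether two codes share the same central number. The function names are made up for this example and are not part of Bellingcat's tool.

```python
import re

# Assumed shape of a legacy Universal Analytics ID: "UA-<account>-<property>".
UA_CODE = re.compile(r"^UA-(\d+)-(\d+)$")


def parse_ua_code(code):
    """Split a UA code into (account_id, property_index), or return None."""
    match = UA_CODE.match(code.strip())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def same_account(code_a, code_b):
    """True when both codes are valid UA codes with the same central number."""
    a = parse_ua_code(code_a)
    b = parse_ua_code(code_b)
    return a is not None and b is not None and a[0] == b[0]


if __name__ == "__main__":
    print(parse_ua_code("UA-123456789-1"))
    print(same_account("UA-123456789-1", "UA-123456789-2"))
```

Two sites whose codes pass same_account() are not proven to have the same owner, but the match is a strong lead worth checking against other evidence.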

Tracking UA codes is a staple strategy in the OSINT toolkit that is regularly used by investigators. In 2017, journalists in South Africa used Google Analytics data to uncover a coordinated disinformation campaign funded and managed by a member of the notorious billionaire Gupta family. In 2015, Bellingcat contributor Lawrence Alexander used this same method to connect dozens of websites pushing pro-Kremlin narratives about Syria and Ukraine to a single individual based in St. Petersburg, Russia. In both cases, shared UA codes between multiple web pages were a key data point in the investigation.

However, conducting such an investigation in 2023 is much more difficult due to sweeping changes in how Google manages its tracking IDs.

Google Analytics 4
Earlier this year, Google rolled out Google Analytics 4 — a new analytics framework that replaces UA codes with less uniform tracking IDs that are significantly more difficult to glean information from. It is no longer possible to obtain a new UA code, and most major websites have updated their tracking IDs to the new G and GTM codes.

This is bad news for investigators, as ProPublica reporter Craig Silverman explained earlier this year. While GTM and G codes are still worthwhile breadcrumbs, fewer online services keep databases of these trackers. Moreover, gone is the useful suffix that helps indicate when multiple sites are using the same tracking code.
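For anyone who wants to collect the newer breadcrumbs by hand, the sketch below pulls G and GTM codes out of a page's source. It assumes the IDs look like "G-" or "GTM-" followed by 6 to 12 uppercase letters and digits, which is a guess at the usual shape rather than a documented rule, and the sample HTML and IDs are invented.

```python
import re

# Assumption: G / GTM IDs are the prefix plus 6-12 uppercase letters or digits.
NEW_CODES = re.compile(r"\b(?:GTM|G)-[A-Z0-9]{6,12}\b")


def extract_new_codes(html):
    """Return the sorted, de-duplicated G and GTM codes found in page source."""
    return sorted(set(NEW_CODES.findall(html)))


if __name__ == "__main__":
    # Invented page source for illustration only.
    sample_html = """
    <script async src="https://www.googletagmanager.com/gtag/js?id=G-ABCDE12345"></script>
    <script>loadContainer('GTM-ABC1234'); gtag('config', 'G-ABCDE12345');</script>
    """
    print(extract_new_codes(sample_html))
```

Because these IDs carry no account suffix, a match only shows that two pages load the very same tag, not that two different tags belong to one owner.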

There is some good news, though: Google says it isn’t planning to force websites to remove existing UA codes and, with time, more services will likely begin to catalogue G and GTM codes to help find relationships between websites.

In the meantime, we can still extract legacy UA codes from websites that continue to use them. Plus, we can also use the Wayback Machine to examine the source code of websites in the past and find any overlapping UA codes.
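The overlap check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the site names and HTML snippets are invented, and the page sources (current or archived) are assumed to have been downloaded already.

```python
import re
from collections import defaultdict

# Legacy UA codes anywhere in a page's source: "UA-<account>-<property>".
UA_PATTERN = re.compile(r"UA-(\d+)-(\d+)")


def shared_ua_accounts(pages):
    """Map each UA account number to the sites using it, keeping only
    accounts that appear on two or more sites.

    `pages` is a dict of site name -> HTML source (live or archived).
    """
    sites_by_account = defaultdict(set)
    for site, html in pages.items():
        for account_id, _property_index in UA_PATTERN.findall(html):
            sites_by_account[account_id].add(site)
    return {
        account: sorted(sites)
        for account, sites in sites_by_account.items()
        if len(sites) > 1
    }


if __name__ == "__main__":
    # Invented sources standing in for downloaded snapshots.
    pages = {
        "site-a.example": "<script>ga('create', 'UA-123456789-1', 'auto');</script>",
        "site-b.example": "<script>ga('create', 'UA-123456789-2', 'auto');</script>",
        "site-c.example": "<script>ga('create', 'UA-555555555-1', 'auto');</script>",
    }
    print(shared_ua_accounts(pages))
```

Running the same function over snapshots from different years shows when a shared code first appeared and when it was dropped.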

Using Wayback Google Analytics
Bellingcat’s Wayback Google Analytics automates the gathering of analytics codes and checking their usage on multiple websites. We can give the tool a list of websites, a time range and a desired output format (.csv, .json, etc.) so we can quickly get a bird’s-eye view of any shared data between websites.

You can read a more in-depth guide on usage and installation on our GitHub page, but let’s examine what a typical use case might look like. We’ll use a few (now-defunct) Russia-linked disinformation websites covered in an earlier Bellingcat investigation to demonstrate the tool’s usefulness.

https://yapatriot.ru
https://zanogu.com
https://whoswho.com.ua
https://adamants.ru

We’ll assume that we only want data from 2015 until the present. Since we’re looking for relationships between the websites, we’ll output the data into an Excel spreadsheet. Our command looks like this:

wayback-google-analytics -u https://yapatriot.ru https://zanogu.com https://whoswho.com.ua https://adamants.ru -s 01/01/2015 -f yearly -o xlsx
...
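The command above asks for yearly snapshots from 2015 onward. For readers curious how such snapshots can be listed by hand, here is a minimal sketch that builds a query for the Wayback Machine's public CDX API, collapsing results to one capture per year. It only constructs URLs and does not fetch anything, and it is not claimed to be how Bellingcat's tool works internally; the function names are invented for this example.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(site, start_year):
    """Build a CDX API URL listing roughly one capture per year for `site`."""
    params = {
        "url": site,
        "from": str(start_year),      # earliest capture year to include
        "output": "json",             # ask for JSON rows instead of plain text
        "fl": "timestamp,original",   # only return these two fields
        "filter": "statuscode:200",   # skip redirects and error captures
        "collapse": "timestamp:4",    # first 4 timestamp digits = year
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def snapshot_url(timestamp, original):
    """Turn one CDX row into the address of the archived page."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


if __name__ == "__main__":
    for site in ["https://yapatriot.ru", "https://zanogu.com"]:
        print(build_cdx_query(site, 2015))
```

Each row the API returns can be passed to snapshot_url(), and the archived source at that address can then be searched for tracking codes.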

Wayback Google Analytics is an open source project anyone can contribute to. Visit Bellingcat’s GitHub page to view contribution guidelines or look at this project’s active issues to get started.


See detailed images at
https://www.bellingcat.com/resources/2024/01/09/using-the-wayback-machine-and-google-analytics-to-uncover-disinformation-networks/
GreenWave

1. This is super useful!
Jan 2024

ancianita

(36,130 posts)
2. I was hoping... ! Heaven knows that when media aren't trustworthy, 2024 voters need all hands on deck.
Thu Jan 11, 2024, 10:43 AM