Problem

Analyzing data from Google Analytics is an interesting and very effective way of understanding the market of a company. In this case, the project takes information from a Google Merchandise Store with the objective of extracting valuable information and setting up strategies for further investments. The main objectives are:

  • Study the markets' temporal tendencies.
  • Predict buying users.

The main difficulties here were dealing with millions of users, while selecting only the useful indicators.

Once the data is understood and clean, Machine Learning algorithms can learn the characteristics of those costumers that buy, in order to predict those costumer conversions, helping the owner to set up strategies for efficient marketing campaigns.

Solution

When dealing with big amounts of data, there are different approaches to the problem. In this case, the information was loaded by fragments, cleaned and the uploaded to a database in MongoDB. Once in MongoDB, the access to the information is faster, though it's still needed to be handed in batches so that the Machine Learning algorithms can process that information.

Once cleaned, the temporal tendency study showed no interesting results, as the tendency component is flat (blue line), therefore all left is the seasonal component (orange line), which is not of much interest due to the short time fragment of the data.

Another important thing to point out is that unbalanced datasets, such as this one, where the e-commerce rule of 80/20 is more than confirmed, requires certain strategies. This rule dictates that the percentage of users that buy is around the 20%. In this case, it was around the 1%. That´s why the number of users that don't buy is highly superior, leading to several problems when teaching the Machine Learning algorithms.

Distribution of buyers (revenue) and non buyers (non revenue)

The strategy applied in this case was the under-balance, which consists in using a balanced fragment of the dataset where the number of buyers is equal tu the value of non buyers.

Balanced strategies.

Once the algorithm was trained, the predictions where made. The results where highly dependant of the balanced used (50/50,33/66,25/75). And the results given opened the horizon for different marketing strategies.

For further understanding, you can download the full report.

Google Analytics Customer Revenue Prediction

Study of Google Analytics data for an e-commerce for predicting the customers revenue.

Client:
Release Date:
April 2020
Category:
Analysis + Machine Learning
Full project here

More Case Studies