January 23, 2022


Bellabeat Health-Tracker Analysis 👣

Scenario

Stakeholders and products

Stakeholders

Urška Sršen: Bellabeat’s cofounder and Chief Creative Officer
Sando Mur: Mathematician and Bellabeat’s cofounder; key member of the Bellabeat executive team
Bellabeat marketing analytics team: A team of data analysts responsible for collecting, analyzing, and reporting data that helps guide Bellabeat’s marketing strategy. You joined this team six months ago and have been busy learning about Bellabeat’s mission and business goals, as well as how you, as a junior data analyst, can help Bellabeat achieve them.

Products

Bellabeat app: The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
Leaf: Bellabeat’s classic wellness tracker can be worn as a bracelet, necklace, or clip. The Leaf tracker connects to the Bellabeat app to track activity, sleep, and stress.
Time: This wellness watch combines the timeless look of a classic timepiece with smart technology to track user activity, sleep, and stress. The Time watch connects to the Bellabeat app to provide you with insights into your daily wellness.
Spring: This is a water bottle that tracks daily water intake using smart technology to ensure that you are appropriately hydrated throughout the day. The Spring bottle connects to the Bellabeat app to track your hydration levels.
Bellabeat membership: Bellabeat also offers a subscription-based membership program for users. Membership gives users 24/7 access to fully personalized guidance on nutrition, activity, sleep, health and beauty, and mindfulness based on their lifestyle and goals.

About the company

Sršen knows that an analysis of Bellabeat’s available consumer data would reveal more opportunities for growth. She has asked the marketing analytics team to focus on a Bellabeat product and analyze smart device usage data in order to gain insight into how people are already using their smart devices. Then, using this information, she would like high-level recommendations for how these trends can inform Bellabeat marketing strategy.

Ask

Sršen asks you to analyze smart device usage data in order to gain insight into how consumers use non-Bellabeat smart devices. She then wants you to select one Bellabeat product to apply these insights to in your presentation. These questions will guide your analysis:

What are some trends in smart device usage?
How could these trends apply to Bellabeat customers?
How could these trends help influence Bellabeat marketing strategy?

Exploratory Data Analysis (EDA)

1. Our data

From the datasets provided, I selected those most likely to yield insight into the metrics that matter for a healthcare application:

daily_activity - provides information about clients’ daily (daytime) activities.

The columns in this dataframe include: Id, ActivityDate, TotalSteps, TotalDistance, TrackerDistance, LoggedActivitiesDistance, VeryActiveDistance, ModeratelyActiveDistance, LightActiveDistance, SedentaryActiveDistance, VeryActiveMinutes, FairlyActiveMinutes, LightlyActiveMinutes, SedentaryMinutes, Calories
sleep_day - provides clients’ night-time information, which may be crucial for understanding sleeping behavior.

The columns in this dataframe include: Id, SleepDay, TotalSleepRecords, TotalMinutesAsleep, TotalTimeInBed
heart_rate - provides information on the clients’ heart rates as recorded by the trackers.

The columns in this dataframe include: Id, Time, Value
weight_log - provides information about clients’ weights and body mass index.

The columns in this dataframe include: Id, Date, WeightKg, WeightPounds, Fat, BMI, IsManualReport, LogId

Note:

Client names have been removed from the data for anonymity, so we will be working with assigned IDs.
Not all clients made their data available for every dataset; we explore this further in the next section.

Understanding some general information on our data collection

How many unique participants are there in each dataframe? It looks like there may be more participants in the daily activity dataset than the sleep dataset.

library(dplyr)  # provides n_distinct()
n_distinct(daily_activity$Id)
## [1] 33
n_distinct(sleep_day$Id)
## [1] 24
n_distinct(heart_rate$Id)
## [1] 14
n_distinct(weight_log$Id)
## [1] 8

How many observations are there in each dataframe?

nrow(daily_activity)
## [1] 940
nrow(sleep_day)
## [1] 413
nrow(heart_rate)
## [1] 2483658
nrow(weight_log)
## [1] 67

Key takeaways:

Only 8 members provided their weight info
heart_rate data has over 2 million rows
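
The uneven participation across datasets can also be checked in one pass. The following is a minimal sketch in base R; the three small data frames are hypothetical stand-ins for the real tables, not the actual data.

```r
# Toy stand-ins for the real tables (hypothetical values, not the Fitbit data)
daily_activity <- data.frame(Id = c(1, 1, 2, 3, 3, 4))
sleep_day      <- data.frame(Id = c(1, 2, 2, 3))
weight_log     <- data.frame(Id = c(2, 2))

datasets <- list(daily_activity = daily_activity,
                 sleep_day      = sleep_day,
                 weight_log     = weight_log)

# Number of distinct participants in each table
participants <- sapply(datasets, function(df) length(unique(df$Id)))
print(participants)

# Participants that appear in every table
in_all <- Reduce(intersect, lapply(datasets, function(df) unique(df$Id)))
print(in_all)
```

Knowing which IDs appear in every table matters before joining datasets, because any joined analysis is limited to that smaller group.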

We can see that most of our data is numeric, so our visualization and exploration methods will reflect that.

2. Data Visualization

density plots

daily_activity

sleep_day

heart_rate

weight_log

Key takeaways:

Unimodal distributions

TotalMinutesAsleep
TotalTimeInBed

Multi-modal distributions include

SedentaryMinutes
Calories
WeightKg
WeightPounds
BMI

Skewed distributions include

TotalSteps
TotalDistance
TrackerDistance
Value
BMI

uniform distribution

Fat
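
As a rough illustration of how modality can be read off a density estimate, here is a sketch using base R's density(). The sedentary-minute values are made up for the example, and counting local maxima of the smoothed curve is only a heuristic that depends on the bandwidth.

```r
# Hypothetical sedentary-minute values forming two groups (not the real data)
sedentary <- c(620, 650, 700, 710, 740, 780,
               1100, 1150, 1180, 1200, 1250, 1290)

# Kernel density estimate with the default bandwidth
dens <- density(sedentary)

# A local maximum is a point where the slope changes from positive to negative
slope_sign <- sign(diff(dens$y))
is_peak    <- diff(slope_sign) == -2
n_peaks    <- sum(is_peak)

# Approximate x positions of the peaks
peak_positions <- dens$x[which(is_peak) + 1]
print(n_peaks)
print(round(peak_positions))
```

Calling plot(dens) draws the curve itself; more than one clear peak is what the multi-modal label above refers to.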

histograms

daily_activity

weight_log

Key takeaways:

The most common sedentary minutes are around 600-800, 1050-1300, and 1450
Sedentary minutes peak at a recorded maximum of 1450
The most common calorie values are around 2000 and 3000
The most common weights were recorded at 55-65 kg and 85-95 kg
There is an outlier weight at around 130 kg
Most BMI values fall between 24 and 26, with an outlier at 48
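
The "most common" ranges can be read from the histogram bin counts rather than by eye. A small sketch with base R's hist() follows; the calorie values and the bin edges are assumptions chosen for illustration.

```r
# Hypothetical daily calorie values (not the real data)
calories <- c(1500, 1800, 1950, 2000, 2050, 2100, 2200,
              2500, 2900, 3000, 3050, 3100, 3600)

# Compute the histogram without drawing it; bins are 500 calories wide
h <- hist(calories, breaks = seq(1250, 3750, by = 500), plot = FALSE)

# One row per bin, most populated bins first
bins <- data.frame(from  = head(h$breaks, -1),
                   to    = tail(h$breaks, -1),
                   count = h$counts)
bins <- bins[order(-bins$count), ]
print(bins)
```

Here the two fullest bins are the ones centred on 2000 and 3000, which is the kind of pattern the takeaway above describes.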

box plots

daily_activity

weight_log

Key takeaways:

A cluster of data points at zero for TotalSteps, TotalDistance, and TrackerDistance, and at the maximum value of SedentaryMinutes, might imply that no data was recorded; clients might not be wearing their trackers often.
There are three outliers in the boxplots for TotalSteps, TotalDistance, and TrackerDistance. It would be interesting to see whether the same three individuals appear in all three charts; they would be our top performers.
There is a single outlier in both WeightKg and BMI.
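
Both observations, the pile-up at zero and the high outliers, can be quantified with base R. This is a sketch on invented step counts; boxplot.stats() applies the same 1.5 x IQR rule that boxplot() uses to draw outlier points.

```r
# Hypothetical daily step counts (not the real data)
steps <- c(0, 0, 0, 4200, 5100, 6300, 7000, 7400,
           8100, 8800, 9500, 10200, 28000)

# Days with no steps at all, which may mean the tracker was not worn
zero_days <- sum(steps == 0)

# Values beyond 1.5 x IQR from the hinges, as drawn by boxplot()
outliers <- boxplot.stats(steps)$out

print(zero_days)
print(outliers)
```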

time-frame bar plots

sleep_day

heart_rate

Inspect if there are patterns in the data with 0 activity

Inspect if there are patterns in the data with max SedentaryMinutes

Key takeaways:

In the sleep_day dataset, resting periods are generally elevated on Sundays.
In the heart_rate dataset, fluctuations in heart rate values are heightened on Mondays.
The records with 0 active minutes and those with maximum sedentary minutes seem to come from the same clients, hence the matching patterns of occurrence; beyond that, the patterns do not tell us much more.
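
A weekday pattern like the Sunday one can be summarised by averaging per day of the week. The sketch below uses a few invented sleep_day rows and assumes SleepDay has already been parsed into a Date.

```r
# Hypothetical sleep records (not the real data); SleepDay is assumed to be a Date
sleep_day <- data.frame(
  SleepDay = as.Date(c("2016-04-10", "2016-04-11", "2016-04-12",
                       "2016-04-17", "2016-04-18")),
  TotalMinutesAsleep = c(480, 400, 420, 500, 410)
)

# "%u" is the ISO weekday number (1 = Monday ... 7 = Sunday), independent of locale
sleep_day$weekday <- as.integer(format(sleep_day$SleepDay, "%u"))

# Mean minutes asleep for each weekday
by_day <- aggregate(TotalMinutesAsleep ~ weekday, data = sleep_day, FUN = mean)
print(by_day)
```

The weekday number is used instead of weekdays() so that the result does not depend on the language settings of the machine running it.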

Inspect the outliers

Key takeaways:

The three outlier points were readings from two candidates.
We might need to store their IDs in case we need them for future analysis or for a possible reward incentive.
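
Tracing outlier points back to the clients who produced them might look like the sketch below. The IDs and step counts are invented so that three outlier readings come from two clients, mirroring the finding above.

```r
# Hypothetical activity rows (not the real data)
daily_activity <- data.frame(
  Id = rep(c(101, 102, 103, 104), each = 4),
  TotalSteps = c(4000, 5000, 27000, 30000,
                 4500, 5500, 6000, 28000,
                 6500, 7000, 7500, 8000,
                 8500, 9000, 9500, 10000)
)

# Outlier values according to the boxplot rule (1.5 x IQR beyond the hinges)
out_vals <- boxplot.stats(daily_activity$TotalSteps)$out

# Rows holding those values, and the distinct clients behind them
top_rows <- daily_activity[daily_activity$TotalSteps %in% out_vals, ]
top_ids  <- unique(top_rows$Id)

print(top_rows)
print(top_ids)
```

Keeping top_ids in a variable is one simple way to store the IDs for later use.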

Cluster Analysis

Let's perform cluster analysis with our daily_activity dataset, since it has data for the most users.

1. create statistics per client
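
A minimal sketch of this step, assuming the per-client statistic is the mean of each variable (the original summary function is not shown) and using invented rows:

```r
# Hypothetical daily rows for three clients (not the real data)
daily_activity <- data.frame(
  Id               = c(1, 1, 2, 2, 3, 3),
  TotalSteps       = c(10000, 12000, 3000, 5000, 8000, 6000),
  SedentaryMinutes = c(600, 700, 1200, 1300, 800, 900),
  Calories         = c(2800, 3000, 1900, 2100, 2300, 2500)
)

# Collapse the daily rows into one row of means per client
client_stats <- aggregate(
  cbind(TotalSteps, SedentaryMinutes, Calories) ~ Id,
  data = daily_activity,
  FUN  = mean
)
print(client_stats)
```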

2. scale our variables
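
Scaling puts variables measured in different units (steps, minutes) on a comparable footing before distances are computed. A sketch with base R's scale(), on invented per-client values:

```r
# Hypothetical per-client means (not the real data)
client_stats <- data.frame(
  Id               = c(1, 2, 3),
  TotalSteps       = c(11000, 4000, 7000),
  SedentaryMinutes = c(650, 1250, 850)
)

# Standardise each variable to mean 0 and standard deviation 1; Id is left out
scaled <- scale(client_stats[, c("TotalSteps", "SedentaryMinutes")])
rownames(scaled) <- client_stats$Id

print(round(scaled, 2))
```

Without this step, TotalSteps would dominate any Euclidean distance simply because its numbers are larger.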

3. perform clustering

The complete method is more suitable for our data, as it clusters our data points better (a better distribution of the data points).
Initial exploratory analysis suggested that we have 2-3 groups in our data (ref. the multimodal distributions derived from the density plots).
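
The reference to the "complete" method suggests agglomerative hierarchical clustering with complete linkage. The sketch below assumes that, using base R's hclust() on Euclidean distances; the six clients and their values are invented.

```r
# Hypothetical per-client means for six clients (not the real data)
client_stats <- data.frame(
  TotalSteps       = c(11000, 10500, 12000, 4000, 3500, 4500),
  SedentaryMinutes = c(650, 700, 600, 1250, 1300, 1200),
  row.names        = c("A", "B", "C", "D", "E", "F")
)

# Euclidean distances between clients on the scaled variables
d <- dist(scale(client_stats))

# Complete linkage merges clusters by their farthest pair of members
hc_complete <- hclust(d, method = "complete")

# Other linkages can be fitted the same way for comparison
hc_single  <- hclust(d, method = "single")
hc_average <- hclust(d, method = "average")

# Membership when the tree is cut into two groups
groups <- cutree(hc_complete, k = 2)
print(groups)
```

Comparing table(cutree(hc, k = 2)) across linkages is one way to judge which method distributes the clients most evenly.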

4. split our data and visualize

We are going to go with the 2-group clustering (on the left), as a third cluster with only one client does not make much sense.
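
The choice between two and three groups can be checked by comparing cluster sizes at each cut. In this sketch the seven clients are invented so that one very active client ends up alone when three groups are requested; hierarchical clustering with complete linkage is an assumption.

```r
# Hypothetical per-client means; client G is far more active than the rest
client_stats <- data.frame(
  TotalSteps       = c(11000, 10500, 12000, 4000, 3500, 4500, 20000),
  SedentaryMinutes = c(650, 700, 600, 1250, 1300, 1200, 400),
  row.names        = c("A", "B", "C", "D", "E", "F", "G")
)

hc <- hclust(dist(scale(client_stats)), method = "complete")

# Number of clients per cluster for each candidate cut
sizes_k2 <- table(cutree(hc, k = 2))
sizes_k3 <- table(cutree(hc, k = 3))

print(sizes_k2)
print(sizes_k3)
```

A cluster of size one in sizes_k3 is the situation described above, where the third group adds little for segmentation.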

5. assign our data with their respective clusters
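
One way to carry the per-client cluster back onto the daily rows is a join on Id. This sketch uses base R's merge(); the cluster assignments and the column name cluster are hypothetical.

```r
# Hypothetical cluster assignment, one row per client
client_clusters <- data.frame(Id = c(1, 2, 3), cluster = c(1, 2, 1))

# Hypothetical daily rows (not the real data)
daily_activity <- data.frame(
  Id         = c(1, 1, 2, 2, 3),
  TotalSteps = c(10000, 12000, 3000, 5000, 8000)
)

# Every daily row receives the cluster of the client it belongs to
labelled <- merge(daily_activity, client_clusters, by = "Id")
print(labelled)
```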

6. compute general stats for each cluster

We can see contrasting behavior between cluster 1 and cluster 2, for instance;

Cluster 1 clients are more active overall.
Cluster 2 clients clocked almost twice the amount of SedentaryMinutes.
Cluster 1 generally burnt more calories as a result.
Cluster 2 recorded 0 VeryActiveMinutes, which might mean these clients do not have an exercise schedule or take their trackers off during exercise; a further survey might be required.
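
A comparison like this can be produced by averaging each variable within a cluster. The rows below are invented to echo the pattern described, not taken from the real data, and the mean is an assumed choice of summary.

```r
# Hypothetical labelled rows (not the real data)
labelled <- data.frame(
  cluster           = c(1, 1, 1, 2, 2, 2),
  VeryActiveMinutes = c(30, 45, 20, 0, 0, 0),
  SedentaryMinutes  = c(600, 650, 700, 1200, 1250, 1300),
  Calories          = c(2800, 3000, 2600, 1900, 2000, 2100)
)

# Mean of every other column within each cluster
cluster_stats <- aggregate(. ~ cluster, data = labelled, FUN = mean)
print(cluster_stats)

# How many times more sedentary cluster 2 is than cluster 1
sedentary_ratio <- cluster_stats$SedentaryMinutes[2] /
  cluster_stats$SedentaryMinutes[1]
print(round(sedentary_ratio, 2))
```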

The main point of clustering the data was to segment our customers so that we know which marketing approach suits which customer. Now that all our data is labelled as cluster 1 or cluster 2, we can take a more targeted approach.

We might decide to give the clusters more descriptive labels, such as “active” and “less active”.
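
Relabelling could be as simple as turning the cluster number into a factor. In this sketch the mapping of cluster 1 to "active" follows the comparison above, and the cluster vector itself is hypothetical.

```r
# Hypothetical cluster numbers for five clients
cluster <- c(1, 2, 1, 2, 2)

# Cluster 1 showed the more active behaviour, so it is mapped to "active"
segment <- factor(cluster, levels = c(1, 2),
                  labels = c("active", "less active"))

print(table(segment))
```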

Combined Key Findings

Let's consolidate all our key takeaways

Only 8 members provided their weight info
heart_rate data has over 2 million rows
TotalMinutesAsleep - Unimodal distribution
TotalTimeInBed - Unimodal distribution
SedentaryMinutes - Multi-modal distribution
Calories - Multi-modal distribution
WeightKg - Multi-modal distribution
WeightPounds - Multi-modal distribution
BMI - Multi-modal distribution
TotalSteps - Skewed distribution
TotalDistance - Skewed distribution
TrackerDistance - Skewed distribution
Value - Skewed distribution
BMI - Skewed distribution
Fat - Uniform distribution
The most common sedentary minutes are around 600-800, 1050-1300, and 1450
Sedentary minutes peak at a recorded maximum of 1450
The most common calorie values are around 2000 and 3000
The most common weights were recorded at 55-65 kg and 85-95 kg
There is an outlier weight at around 130 kg
Most BMI values fall between 24 and 26, with an outlier at 48
A cluster of data points at zero for TotalSteps, TotalDistance, and TrackerDistance, and at the maximum value of SedentaryMinutes, might imply that no data was recorded; clients might not be wearing their trackers often.
There are three outliers in the boxplots for TotalSteps, TotalDistance, and TrackerDistance. It would be interesting to see whether the same three individuals appear in all three charts; they would be our top performers.
There is a single outlier in both WeightKg and BMI.
In the sleep_day dataset, resting periods are generally elevated on Sundays.
In the heart_rate dataset, fluctuations in heart rate values are heightened on Mondays.
The records with 0 active minutes and those with maximum sedentary minutes seem to come from the same clients, hence the matching patterns of occurrence; beyond that, the patterns do not tell us much more.
The three outlier points were readings from two candidates (not from three candidates as previously assumed).
We might need to store their IDs in case we need them for future analysis or for a possible reward incentive.
Now that we have our customers segmented into two definitive clusters (“active” and “less active”), we can derive our targeted marketing strategies from the above key points and apply them to the most relevant cluster!
