How I lifted 2.8 tonnes with the power of pandas. (Part II update here)
Project Design
I wanted to design a database to track my workouts.
Part of it was about combining my data skills with tracking my workouts. I wanted a reason to practice pandas
, and was inspired by this excellent video — it looks doable.
I also spent some money on sessions with a personal trainer, so I wanted to track my workouts more seriously. Personal training sessions are expensive, and I wanted to be sure I got full value from them.
Also, my handwriting is atrocious, and my notes are very messy:
Data structure design
a. Data Collection
When the notes are transcribed, you can see that actually, the workout is not all that complicated:
- Session 1: Arms
- Deadlifts (2 warmup x 5 reps @ 30kg, 5 sets x 8 reps @ 50 kg)
- Diagonal Dumbbell Chest Press (4 sets [12, 12, 15, 15] @ 7.5 kg)
- Close-Grip Dumbbells Chess Press (4 sets [15, 15, 18, 12] @ 7.5 kg)
- Supine Lat Pulldown (4 sets x 12 reps @ 13.5 kg)
- Seated Row (4 sets [12, 12, 15, 15] @ 17.5 kg)
- Session 2: Legs
- Barbell Squats (2 warmup x 30 reps @ empty bar, 5 sets x 8 reps @ 50 kg)
- Sumo Goblet Squats (4 sets x 15 reps @ 27.5 kg)
- One-Sided Dumbbell Split Squats (4 sets x 8 reps @ 7.5 kg)
- Heel-Elevated Squats (20 bodyweight, 15 @ 10kg, 15 @ bodyweight)
But already, we can see some issues here:
- Many moving parts — Each exercise has different numbers of sets; each set has different reps, and is sometimes done with different reps.
I decided to create three columns:sets
,reps
, andweight_kgs
to track for eachactivity
. I chose to track in kilos because free weights are set in kilograms, but machine plates are often set in pounds – it seems important to track that.
I could also have set one column for weight and just have five different columns for five (maximum) sets, but I decided to create vertical data. I’m not entirely sure I understand why yet, but for now it means I don’t have empty cells in the fifth set column.
- What about warmups? — Leg swings and stretches are done for 30 seconds with no weight. I decided to encode the
weight_kgs
as n/a, theset
astime_seconds
and thereps
as the seconds. It’s not ideal, but it allows me to store timed warmups in the columns
After these considerations, I created a Google Sheets with all the exercises carefully encoded:
I also separated out session data in a seperate sheet. Session data contains date/time/duration stamp, but I didn’t want to repeat this for every single exercise.
b. Data Cleaning
I downloaded the Google Sheets as a csv and imported it into VScode:
First lesson of data analyst: read the head()
of the dataframe. I also found a nifty command, dtypes
to identify the types of each column:
That’s weird — why is weight_kgs
a float instead of an integer? A little rooting around showed that using the 7.5 dumbbells means that some numbers are decimal for now.
There’s another mistake – the first warmup activity just says “front”:
This gives me a chance to practice using the .replace()
, which is a method that’s typed like an object attribute but performs as a function:
This would’ve taken 30 seconds in Excel, but I’m proud of taking 2 hours to do this:
c. Playing with Pandas
Alright, now to practice some math. I spent a lot of time moving that damn barbell; how much weight did I actually lift?
First, I added a column for weight lifted
:
And then I used .sum()
to add up the columns. Strings/objects were concatenated, but the final number under lifted
is 4900.0 kg:
Looking only at session 2, it’s only 2600 kg – which is 2.8 tonnes over 50 reps. Not bad.
As a final step, I wanted to figure out how to update the csv:
Reflections
This looked easy, but the path was strewn with tons of troubleshooting and Googling errors:
In particular, I used .copy()
due to an obscure SettingWithCopyWarning in pandas. I’m not entirely sure I understand the issue, but I figured out a way around it.
I also wonder if I could design the tables a little better. On the one hand, since each activity
usually targets one muscle_group
, it’s repeated information. I was also thinking of building out an activation
table with one-hot encoding to create a database of potential exercises by muscle groups targeted.
Ideas For Future Projects
Visualization — Growth in weights over time
- The goal being to visualize how much more I can lift
- Should be possible after 2 months of regular working out
Workout Planner — What should I work out next?
- My idea is to write a function in Python that spits out the next set of exercises in the gym
- The MVP for the function would just loop between arms/legs and auto-input the warmup
- A more complicated function would check for several conditions
- Progressive Loading: if all reps are hit, increase weight by 2.5 kg (or next weight increment)
- Failure : if deload is necessary, maintain the weight
- Skipped: if reps = 0, re-schedule exercise for next session
Your Year In Squats — Dashboard
- Like Strava and Spotify, to give a sense of accomplishment at the end of the year
- It’d be nice to know how often I’ve helped the barbell defy gravity.
Read more in the [Data x Workouts] series: