Designing a data structure to track workouts

How I lifted 2.8 tonnes with the power of pandas. (Part II update here)

pandas, not pandas

Project Design

I wanted to design a database to track my workouts.

Part of it was about combining my data skills with tracking my workouts. I wanted a reason to practice pandas, and was inspired by this excellent video — it looks doable.

I also spent some money on sessions with a personal trainer, so I wanted to track my workouts more seriously. Personal training sessions are expensive, and I wanted to be sure I got full value from them.

Also, my handwriting is atrocious, and my notes are very messy:

My arms were trembling after the workout.

Data structure design

a. Data Collection

When the notes are transcribed, you can see that actually, the workout is not all that complicated:

  • Session 1: Arms
    • Deadlifts (2 warmup x 5 reps @ 30kg, 5 sets x 8 reps @ 50 kg)
    • Diagonal Dumbbell Chest Press (4 sets [12, 12, 15, 15] @ 7.5 kg)
    • Close-Grip Dumbbells Chess Press (4 sets [15, 15, 18, 12] @ 7.5 kg)
    • Supine Lat Pulldown (4 sets x 12 reps @ 13.5 kg)
    • Seated Row (4 sets [12, 12, 15, 15] @ 17.5 kg)
  • Session 2: Legs
    • Barbell Squats (2 warmup x 30 reps @ empty bar, 5 sets x 8 reps @ 50 kg)
    • Sumo Goblet Squats (4 sets x 15 reps @ 27.5 kg)
    • One-Sided Dumbbell Split Squats (4 sets x 8 reps @ 7.5 kg)
    • Heel-Elevated Squats (20 bodyweight, 15 @ 10kg, 15 @ bodyweight)

But already, we can see some issues here:

  • Many moving parts — Each exercise has different numbers of sets; each set has different reps, and is sometimes done with different reps.

    I decided to create three columns: sets, reps, and weight_kgs to track for each activity. I chose to track in kilos because free weights are set in kilograms, but machine plates are often set in pounds – it seems important to track that.

    I could also have set one column for weight and just have five different columns for five (maximum) sets, but I decided to create vertical data. I’m not entirely sure I understand why yet, but for now it means I don’t have empty cells in the fifth set column.
  • What about warmups? — Leg swings and stretches are done for 30 seconds with no weight. I decided to encode the weight_kgs as n/a, the set as time_seconds and the reps as the seconds. It’s not ideal, but it allows me to store timed warmups in the columns

After these considerations, I created a Google Sheets with all the exercises carefully encoded:

It looks silly to have exercise_id

I also separated out session data in a seperate sheet. Session data contains date/time/duration stamp, but I didn’t want to repeat this for every single exercise.

Database Begins

b. Data Cleaning

I downloaded the Google Sheets as a csv and imported it into VScode:

This is where the fun begins

First lesson of data analyst: read the head() of the dataframe. I also found a nifty command, dtypes to identify the types of each column:

We all float64 here

That’s weird — why is weight_kgs a float instead of an integer? A little rooting around showed that using the 7.5 dumbbells means that some numbers are decimal for now.

There’s another mistake – the first warmup activity just says “front”:

And this is true for the activity in the second session as well:

This gives me a chance to practice using the .replace(), which is a method that’s typed like an object attribute but performs as a function:

This would’ve taken 30 seconds in Excel, but I’m proud of taking 2 hours to do this:

Maybe I should’ve called it activity_id

c. Playing with Pandas

Alright, now to practice some math. I spent a lot of time moving that damn barbell; how much weight did I actually lift?

First, I added a column for weight lifted:

Once, I aspired to be a power lifter

And then I used .sum() to add up the columns. Strings/objects were concatenated, but the final number under lifted is 4900.0 kg:

Looking only at session 2, it’s only 2600 kg – which is 2.8 tonnes over 50 reps. Not bad.

As a final step, I wanted to figure out how to update the csv:

to_csv, to_csv, to_csv we go

Reflections

This looked easy, but the path was strewn with tons of troubleshooting and Googling errors:

I am listing “Searching on StackOverflow” as a skill on my LinkedIn

In particular, I used .copy() due to an obscure SettingWithCopyWarning in pandas. I’m not entirely sure I understand the issue, but I figured out a way around it.

I also wonder if I could design the tables a little better. On the one hand, since each activity usually targets one muscle_group, it’s repeated information. I was also thinking of building out an activation table with one-hot encoding to create a database of potential exercises by muscle groups targeted.

Ideas For Future Projects

Visualization — Growth in weights over time

  • The goal being to visualize how much more I can lift
  • Should be possible after 2 months of regular working out

Workout Planner — What should I work out next?

  • My idea is to write a function in Python that spits out the next set of exercises in the gym
  • The MVP for the function would just loop between arms/legs and auto-input the warmup
  • A more complicated function would check for several conditions
    • Progressive Loading: if all reps are hit, increase weight by 2.5 kg (or next weight increment)
    • Failure : if deload is necessary, maintain the weight
    • Skipped: if reps = 0, re-schedule exercise for next session

Your Year In Squats — Dashboard

  • Like Strava and Spotify, to give a sense of accomplishment at the end of the year
  • It’d be nice to know how often I’ve helped the barbell defy gravity.

Read more in the [Data x Workouts] series:

  1. Designing a data structure to track workouts (you are here)
  2. Building a dashboard to track workouts