RUNNING DATA ANALYSIS

View associated files on GitHub

Tools and Languages: SQL, SQL Server

Description and Intent: This project used Microsoft SQL Server to analyse data collected in 2019 from fitness devices linked with the Strava API. The dataset, obtained from Kaggle, sampled 116 amateur runners who consented to share nearly 42 000 runs.

Where relevant, runs were filtered to exclude those shorter than 100 m and those with recorded times of 0 s. The dataset source claimed that it had been cleared of outliers, but initial checks highlighted running speeds vastly above 10.44 m/s, the average speed of Usain Bolt's 2009 world record over 100 m (9.58 s); rows yielding such speeds were filtered out where relevant. The data was then explored to investigate run distances, times and speeds; heart rates; and when runs occurred during the year. SQL techniques employed included Aggregate Functions, Converting Data Types, Renaming Columns, Joins, CTEs, Case Statements, Subqueries, and Window Functions.
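The filtering described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual query: the table name `runs` and the columns `athlete`, `distance_m` and `elapsed_s` are hypothetical, and an in-memory SQLite database stands in for SQL Server so that the example runs on its own.

```python
import sqlite3

# Hypothetical schema and sample rows; the real Kaggle dataset may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (athlete INTEGER, distance_m REAL, elapsed_s REAL)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?, ?)",
    [
        (1, 10000.0, 3000.0),  # plausible run, about 3.33 m/s
        (1, 50.0, 20.0),       # shorter than 100 m
        (2, 5000.0, 0.0),      # recorded time of 0 s
        (3, 8000.0, 100.0),    # 80 m/s, far above any human speed
    ],
)

# Cutoff taken from the text: Bolt's 2009 record, 100 m / 9.58 s.
MAX_SPEED_MPS = 10.44

FILTER_SQL = """
WITH valid_runs AS (
    -- Drop very short runs and zero-time runs before computing speed.
    SELECT athlete, distance_m, elapsed_s,
           distance_m / elapsed_s AS speed_mps
    FROM runs
    WHERE distance_m >= 100 AND elapsed_s > 0
)
SELECT athlete, distance_m, elapsed_s, speed_mps
FROM valid_runs
WHERE speed_mps <= ?
ORDER BY athlete
"""

clean_rows = conn.execute(FILTER_SQL, (MAX_SPEED_MPS,)).fetchall()
for row in clean_rows:
    print(row)
```

Only the first sample row survives both filters. Placing the distance and time checks inside the CTE keeps the speed calculation away from zero-time rows, which matters in SQL Server, where dividing by zero raises an error.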

Insights and Reporting: Analyses were conducted across all runners before breaking results down by gender. Runners who identified as 'male' and 'female' made up approximately 76% and 22% of the data, respectively, making comparisons between them interesting, but not entirely reliable. Nonetheless, the average distance run was approximately 11 km, close to the typical 10 km average that was anticipated, but the men's data included notably longer distances, longer times, and greater speeds. Interestingly, the women's data indicated a noticeably lower average running heart rate (78 bpm compared to the men's 85 bpm).
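A gender breakdown of this kind might look something like the sketch below, which combines an aggregate with a window function to get each group's share of the runs. The table `runs` and its columns `gender` and `distance_m` are assumptions made for illustration, the sample values are invented, and SQLite is used in place of SQL Server to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema and invented sample values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (gender TEXT, distance_m REAL)")
conn.executemany(
    "INSERT INTO runs VALUES (?, ?)",
    [("M", 12000.0), ("M", 10000.0), ("M", 14000.0), ("F", 9000.0)],
)

GENDER_SQL = """
SELECT gender,
       COUNT(*) AS run_count,
       -- Window function over the grouped counts gives the overall total.
       ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct_of_runs,
       ROUND(AVG(distance_m) / 1000.0, 2) AS avg_distance_km
FROM runs
GROUP BY gender
ORDER BY gender
"""

gender_summary = conn.execute(GENDER_SQL).fetchall()
for row in gender_summary:
    print(row)
```

With an uneven split like the one in the real data, reporting the share alongside each average makes it easier to judge how much weight a comparison can bear.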

Formal handling of date and time data types was not a focus of this project; instead, time values were kept as varchar types and analysed by isolating the relevant parts with substring functions. There was an overall increase in collected data in the years leading up to 2019, which raises additional interesting questions about growth in the number of runners worldwide, growth in the number of Strava and other fitness app users, and so on. An approximately equal number of runs occurred across all four seasons, with a small drop in summer that ran counter to my assumption that warmer weather would see more runners active and outside.
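As a rough sketch of the substring approach, the example below pulls the month out of a text timestamp and buckets it into seasons with a CASE statement. Several things here are assumptions rather than details from the project: the column name `started_at`, the `YYYY-MM-DD HH:MM:SS` text format, and the northern-hemisphere meteorological season boundaries. SQLite's `substr` is used for a self-contained example; SQL Server's equivalent is `SUBSTRING`.

```python
import sqlite3

# Hypothetical column name and timestamp format; stored as text, not datetime.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (started_at TEXT)")
conn.executemany(
    "INSERT INTO runs VALUES (?)",
    [
        ("2019-01-15 07:30:00",),
        ("2019-07-04 18:00:00",),
        ("2019-07-20 06:45:00",),
        ("2018-10-02 12:00:00",),
    ],
)

SEASON_SQL = """
WITH run_months AS (
    -- Characters 6-7 of 'YYYY-MM-DD ...' hold the month.
    SELECT CAST(substr(started_at, 6, 2) AS INTEGER) AS month_num
    FROM runs
)
SELECT CASE
           WHEN month_num IN (12, 1, 2) THEN 'winter'
           WHEN month_num IN (3, 4, 5)  THEN 'spring'
           WHEN month_num IN (6, 7, 8)  THEN 'summer'
           ELSE 'autumn'
       END AS season,
       COUNT(*) AS run_count
FROM run_months
GROUP BY season
ORDER BY season
"""

season_counts = dict(conn.execute(SEASON_SQL).fetchall())
print(season_counts)
```

This works only while every timestamp follows the same fixed-width layout, which is the main trade-off of treating times as text instead of converting them to a proper date type.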

Finally, a small set of running data from a personal contact, collected with the Zeopoxa app, was compared to the main table through a crude, albeit sufficient, join. His average running speed fell between the average male and female Strava speeds.
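The write-up does not say how the join was built, so the following is only one plausible way to make such a comparison: average speeds are computed separately for each table and then cross-joined. The table names `strava_runs` and `friend_runs`, their columns, and all sample values are hypothetical, and SQLite again stands in for SQL Server.

```python
import sqlite3

# Hypothetical tables and invented values, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE strava_runs (gender TEXT, distance_m REAL, elapsed_s REAL)")
conn.execute("CREATE TABLE friend_runs (distance_m REAL, elapsed_s REAL)")
conn.executemany(
    "INSERT INTO strava_runs VALUES (?, ?, ?)",
    [("M", 10000.0, 3125.0), ("M", 5000.0, 1562.5), ("F", 10000.0, 4000.0)],
)
conn.execute("INSERT INTO friend_runs VALUES (6000.0, 2000.0)")

COMPARE_SQL = """
WITH strava_avg AS (
    SELECT gender, AVG(distance_m / elapsed_s) AS avg_speed_mps
    FROM strava_runs
    GROUP BY gender
),
friend_avg AS (
    SELECT AVG(distance_m / elapsed_s) AS avg_speed_mps
    FROM friend_runs
)
-- One friend row paired with every gender row.
SELECT s.gender,
       s.avg_speed_mps AS strava_speed_mps,
       f.avg_speed_mps AS friend_speed_mps,
       f.avg_speed_mps - s.avg_speed_mps AS difference_mps
FROM strava_avg AS s
CROSS JOIN friend_avg AS f
ORDER BY s.gender
"""

comparison = conn.execute(COMPARE_SQL).fetchall()
for row in comparison:
    print(row)
```

In this invented sample the friend's speed sits above the female average and below the male average, mirroring the shape of the result reported above.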

Copyright © 2023, Brandon O'Donnell
