Although football is often perceived as a safe sport, Venturelli et al. (2011) show that it has a higher injury rate per hour of exposure than both rugby and American football (sports widely considered dangerous). Indeed, players are around 1,000 times more likely to suffer injuries than workers in the most dangerous industrial professions (Drawer and Fuller 2002). These injuries have wide-ranging consequences: they can impair a team’s overall performance (Hägglund et al. 2013), hinder a player’s career progression (Bangert et al. 2024) and affect a player’s long-term health (Salzmann et al. 2017). The financial implications are equally severe. Ekstrand (2013) estimates that a single injury can cost a club up to £500,000 per month, a figure driven by continued salary payments during the player’s unavailability (PFA 2023).
Because of these wide-ranging impacts, the importance of injury prediction in football cannot be overstated. Yet many existing methods remain limited in their effectiveness. Traditional regression-based approaches typically assume linear relationships and therefore fail to capture the complex, non-linear interactions between multiple factors (Bittencourt et al. 2016; Bullock et al. 2022). Capturing these interactions is key to predicting injuries, given that injuries result from a range of physical, environmental, socio-cultural and psychological variables (Wiese-Bjornstal et al. 1998; Ruddy et al. 2019).
In contrast, machine learning (ML) models offer an adaptive, data-driven alternative. They can detect non-linear patterns in large, multidimensional datasets, and their predictive power can improve as they learn from new data (Chen et al. 2024; Rossi et al. 2021). Yang et al. (2022) found that ML models outperformed logistic regression in a large-scale medical risk prediction setting, supporting this broader methodological premise. Yet Bullock et al. (2022) report that 60% of injury prediction studies still rely on traditional techniques.
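The contrast between linear regression-based models and tree-based ML can be illustrated with a minimal Python sketch. It is a toy illustration under stated assumptions, not the method used in this study. It uses hypothetical synthetic data with a single integer "load score" per player-season (0 to 15), where injury occurs only at the low and high extremes, giving a U-shaped, non-linear risk pattern. All names (`load_scores`, `injured`, `fit_logistic`, `build_tree`) are illustrative. Only the standard library is used: logistic regression is fitted by plain gradient descent, and the tree is a greedy Gini-split (CART-style) tree limited to depth 2.

```python
import math

# Hypothetical synthetic data (NOT study data): one integer "load score"
# per player-season, 0-15. Injury (label 1) occurs only at the low (< 3)
# and high (> 10) extremes, i.e. a U-shaped, non-linear risk pattern.
load_scores = list(range(16))
injured = [1 if (x < 3 or x > 10) else 0 for x in load_scores]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.01, epochs=5000):
    # Logistic regression that is linear in x, fitted by batch gradient descent.
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b


def gini(labels):
    # Gini impurity for binary 0/1 labels.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def build_tree(xs, ys, depth=0, max_depth=2):
    # Stop at a pure node or the depth limit; the leaf predicts the majority class.
    if depth == max_depth or len(set(ys)) == 1:
        return {"leaf": 1 if 2 * sum(ys) >= len(ys) else 0}
    values = sorted(set(xs))
    best = None
    # Try midpoints between consecutive values; keep the lowest weighted Gini.
    for a, c in zip(values, values[1:]):
        t = (a + c) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if best is None or score < best[0]:
            best = (score, t)
    if best is None:  # no valid split (all x identical)
        return {"leaf": 1 if 2 * sum(ys) >= len(ys) else 0}
    t = best[1]
    lp = [(x, y) for x, y in zip(xs, ys) if x <= t]
    rp = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,
        "left": build_tree([x for x, _ in lp], [y for _, y in lp], depth + 1, max_depth),
        "right": build_tree([x for x, _ in rp], [y for _, y in rp], depth + 1, max_depth),
    }


def predict_tree(node, x):
    # Follow thresholds down to a leaf.
    while "leaf" not in node:
        node = node["left"] if x <= node["threshold"] else node["right"]
    return node["leaf"]


def accuracy(preds, ys):
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


w, b = fit_logistic(load_scores, injured)
lr_acc = accuracy([1 if sigmoid(w * x + b) >= 0.5 else 0 for x in load_scores], injured)
tree = build_tree(load_scores, injured)
tree_acc = accuracy([predict_tree(tree, x) for x in load_scores], injured)
print(f"logistic regression accuracy: {lr_acc:.3f}")
print(f"depth-2 decision tree accuracy: {tree_acc:.3f}")
```

On this toy data, the depth-2 tree finds both cut-points (10.5, then 2.5) and classifies every point correctly. A logistic model that is linear in the single feature can only draw one threshold, so it must misclassify at least one extreme (at best 13 of 16 points). Adding a squared term would let logistic regression fit the U shape. The point is only that the non-linearity must be specified in advance, whereas the tree learns it from the data.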
The aims of this study are therefore to: (1) critically review current literature on football injury risk factors and existing prediction methodologies; (2) design and implement a Decision Tree classifier capable of predicting whether a player will sustain an injury during a season; and (3) evaluate the model’s predictive performance, interpretability and potential application to injury prevention strategies, while situating the findings within the broader academic and professional discourse on football injury prevention.