Developing Machine Learning Models for Assessing Fantasy Premier League Player Performance

Andrew Jones


Supervised by Federico Liberatore; Moderated by Martin Caminada


This project creates machine learning models that use open-source event data and Fantasy Premier League (FPL) data to evaluate player performance. For example: is a player over- or under-performing, and is that performance sustainable or likely to regress to the mean? A model for 'expected points' can be compared with the actual points a player has scored to help inform a person's decisions when selecting a fantasy football team.

The Problem in More Depth

In recent years football has made huge leaps in its use of statistics and data to inform decisions, to the point that analytics and data scouting are now common practice in the professional game. FPL is played by millions of people, yet it has failed to keep pace with the Premier League's advances in decision making, despite far more people being directly involved. Advanced models and statistics exist for football analysis that are designed to avoid outcome bias. For example, a statistic such as goals scored by a player over a small sample of 10 games is highly outcome biased, whereas 'expected goals (xG)' looks at how many goals the player was likely to have scored from the chances they had. FPL points are based on outcome-biased statistics, so points scored are poor predictors of future performance. Many FPL 'managers' are interested in the statistical analysis of football and use it to inform their decisions, but there are no tools available for a more casual manager, or for a manager who does not know which statistics to look at.
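To make the contrast between goals and xG concrete, the following is a minimal sketch rather than part of the project itself. It assumes each chance already has a scoring probability from some shot model (the values and the function name expected_goals are hypothetical) and treats a player's xG as the sum of those probabilities.

```python
def expected_goals(shot_probabilities):
    """Sum per-shot scoring probabilities to get a player's xG.

    Assumes each probability comes from a separate shot model and
    lies between 0 and 1.
    """
    for p in shot_probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return sum(shot_probabilities)


# Hypothetical chances for one player over a small sample of games.
shots = [0.05, 0.10, 0.40, 0.08, 0.76]
actual_goals = 3

xg = expected_goals(shots)
# A positive difference suggests the player has scored more than the
# quality of their chances would predict, so regression is plausible.
overperformance = actual_goals - xg
```

A player whose goals sit well above their xG in a short run of games is a candidate for regression to the mean, which is the kind of signal raw FPL points hide.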

My Proposed Solution

My proposal is to create a machine learning model, based on underlying metrics, whose insights can help FPL managers build their team. The models will produce an expected points statistic. This will be the primary statistic for evaluating players, and it will be calculated from several metrics depending on the position the player occupies. Expected points from goals are likely to be linear in xG, since the number of points awarded for a goal can be multiplied by the player's xG. Other metrics that contribute to points will require more complex modelling; for example, there is no corresponding 'expected clean sheets' metric for clean sheets. There can also be variations on the statistic, such as expected points per 90 minutes played or expected points per unit of player cost.
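The linear goal component and the per-90 and per-cost variations described above could be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the points-per-goal table is an assumed set of FPL scoring values that should be checked against the official rules for the season in question, and the PlayerStats fields and function names are hypothetical.

```python
from dataclasses import dataclass

# Assumed points per goal by position; verify against the official
# FPL scoring rules for the relevant season before relying on them.
GOAL_POINTS = {"DEF": 6, "MID": 5, "FWD": 4}


@dataclass
class PlayerStats:
    name: str
    position: str   # one of the keys in GOAL_POINTS
    xg: float       # expected goals over the sample
    minutes: int    # minutes played over the sample
    cost: float     # FPL price, in millions


def expected_goal_points(player: PlayerStats) -> float:
    """Linear component: points per goal multiplied by xG."""
    return GOAL_POINTS[player.position] * player.xg


def per_90(value: float, minutes: int) -> float:
    """Scale a cumulative value to a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return value * 90 / minutes


def per_cost(value: float, cost: float) -> float:
    """Scale a cumulative value by player cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return value / cost


# Hypothetical midfielder, used only to show the calculation.
player = PlayerStats("Example Player", "MID", xg=4.2, minutes=900, cost=8.0)
xpts_goals = expected_goal_points(player)
xpts_goals_per_90 = per_90(xpts_goals, player.minutes)
xpts_goals_per_cost = per_cost(xpts_goals, player.cost)
```

Only the goal component is shown because it is the one the text describes as linear; components such as clean sheets would need their own models and would be added to the total separately.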

Final Report (22/09/2022) [Zip Archive]

Publication Form