Leveraging Ensemble Learning for Calculating Expected Goals (xG)

Chaitanya Sanjay Taru


Supervised by Oktay Karakus; Moderated by Xianfang Sun

Expected goals, or xG for short, is one of the most important football statistics for analysing the modern game quantitatively. Its importance and use have grown considerably over the last few years, and xG is now crucial for football teams, scouts, betting firms and others.

Specifically, xG is a football statistic that estimates the probability of a shot being converted into a goal. This allows match results to be explained better through predictive analysis, because in football the scoreline does not always tell the whole story of a fixture. Since goals are relatively infrequent, it is worth assessing the value of the chances created in a game to indicate the likelihood of a win. Each shot event is assigned a probability, and these probabilities are combined to create an aggregate xG scoreline per fixture.
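As a minimal sketch of how per-shot probabilities could be combined into a fixture-level picture, the Python below assumes that the combination is a simple sum per team (the usual convention) and that shots are independent of one another. The function names and the shot values are hypothetical and are not taken from the project.

```python
from collections import defaultdict


def aggregate_xg(shots):
    """Sum per-shot goal probabilities into one xG total per team.

    `shots` is an iterable of (team, probability) pairs; this layout is an
    assumption made for illustration only.
    """
    totals = defaultdict(float)
    for team, prob in shots:
        if not 0.0 <= prob <= 1.0:
            raise ValueError(f"probability out of range: {prob}")
        totals[team] += prob
    return dict(totals)


def goal_distribution(probs):
    """Exact distribution of goals scored, treating each shot as an
    independent Bernoulli trial (a simplifying assumption)."""
    dist = [1.0]  # dist[k] = probability of exactly k goals so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)  # shot missed
            nxt[k + 1] += mass * p      # shot scored
        dist = nxt
    return dist


def outcome_probabilities(home_probs, away_probs):
    """Return (home win, draw, away win) probabilities for one fixture."""
    home = goal_distribution(home_probs)
    away = goal_distribution(away_probs)
    win = draw = loss = 0.0
    for h, ph in enumerate(home):
        for a, pa in enumerate(away):
            if h > a:
                win += ph * pa
            elif h == a:
                draw += ph * pa
            else:
                loss += ph * pa
    return win, draw, loss


if __name__ == "__main__":
    # Hypothetical shot values, not real match data.
    shots = [("Home", 0.5), ("Home", 0.25), ("Away", 0.125)]
    print(aggregate_xg(shots))
    print(outcome_probabilities([0.5, 0.25], [0.125]))
```

The sum gives the aggregate xG scoreline, while the distribution step shows one way the same probabilities could indicate the likelihood of a win, draw or loss.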

Despite this importance, the mathematics behind xG models is not widely known, and for commercial reasons most companies and football teams keep their models secret. This project asks a crucial question: is a better model of xG possible? Which statistical aspects of football have a strong bearing on the likelihood of a goal, and which do not? Can something new be done, or a new model be developed, to compete with leading companies such as Statsbomb?

This project will seek answers to these questions. We will develop an ensemble learning approach to model the xG statistic in football matches. The large volume of statistical data available on teams and players will be investigated, and detailed data analysis and fusion approaches will be developed to propose a robust machine learning (ML) model of xG.
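The abstract does not say which ensemble technique is used, so the following is only a sketch of one common form, soft voting, in which the goal probabilities predicted by several base models are averaged. The two base models, their feature names (`distance_m`, `angle_rad`) and their coefficients are hypothetical placeholders, not fitted values from the project.

```python
import math


def logistic(z):
    """Standard logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def distance_model(shot):
    # Hypothetical base model: closer shots get a higher probability.
    # The coefficients are placeholders, not fitted values.
    return logistic(1.0 - 0.15 * shot["distance_m"])


def angle_model(shot):
    # Hypothetical base model: a wider view of goal gets a higher probability.
    return logistic(-2.5 + 2.0 * shot["angle_rad"])


def soft_vote(shot, models, weights=None):
    """Weighted average of the base models' predicted goal probabilities."""
    if weights is None:
        weights = [1.0] * len(models)
    if len(weights) != len(models) or sum(weights) <= 0:
        raise ValueError("weights must match models and sum to a positive value")
    total = sum(w * m(shot) for w, m in zip(weights, models))
    return total / sum(weights)


if __name__ == "__main__":
    # Hypothetical shot: 11 metres out with a 0.6 radian view of goal.
    shot = {"distance_m": 11.0, "angle_rad": 0.6}
    print(soft_vote(shot, [distance_model, angle_model]))
```

Averaging keeps the output a valid probability and lets weaker base models be down-weighted; stacking or boosting would be alternative ensemble designs.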

The aims of the project are to: (1) build a competitive ensemble learning model that is more reliable and robust than existing ML models; and (2) make information on xG and football analytics more available to academic institutions, as models in the industry are currently kept private.

Final Report (22/09/2023) [Zip Archive]

Publication Form