This project aims to build a historical data set of NCAA player statistics and combine it with NBA draft information to predict how a rookie will fare in the NBA: will the player be a "Boom" or a "Bust"? The data set will be used to train a machine learning model whose prediction target is a "rating point", namely the expected Career Rating (xCR), which indicates how the player is likely to progress in the early years of their career.
This project involves handling a large amount of data: scraping it from various internet sources, cleaning and preprocessing it, and carrying it through the other stages of a data science pipeline.
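As a rough illustration of the combine, clean, and train stages, the sketch below joins a toy NCAA table with a toy draft table and fits a simple regressor. Everything in it is an assumption made for illustration: the column names (`pts_per_game`, `reb_per_game`, `draft_pick`, `xcr`), the helper `build_dataset`, the placeholder rows, and the choice of linear regression as a stand-in model. The definition of xCR and the real scraped sources are not specified here.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical placeholder rows; real data would come from scraped sources.
ncaa = pd.DataFrame({
    "player": ["A", "B", "C", "D", "E"],
    "pts_per_game": [18.2, 12.5, None, 21.0, 9.8],  # None simulates a missing stat
    "reb_per_game": [7.1, 4.3, 5.0, 8.8, 3.2],
})
draft = pd.DataFrame({
    "player": ["A", "B", "C", "D"],   # player E was not drafted
    "draft_pick": [3, 25, 14, 1],
    "xcr": [7.5, 3.0, 5.2, 9.1],      # made-up target values, not real ratings
})

FEATURES = ["pts_per_game", "reb_per_game", "draft_pick"]
TARGET = "xcr"


def build_dataset(ncaa_df: pd.DataFrame, draft_df: pd.DataFrame) -> pd.DataFrame:
    """Join college stats with draft info and drop incomplete rows."""
    # Inner join keeps only players present in both tables.
    merged = ncaa_df.merge(draft_df, on="player", how="inner")
    # Simplest possible cleaning step: discard rows with missing values.
    merged = merged.dropna(subset=FEATURES + [TARGET])
    return merged.reset_index(drop=True)


dataset = build_dataset(ncaa, draft)

# Linear regression is only a stand-in; any regressor could be used here.
model = LinearRegression().fit(dataset[FEATURES], dataset[TARGET])
predicted = model.predict(dataset[FEATURES])
```

In a real pipeline the join key would need more care than a bare player name (for example, name plus school plus draft year), and the model would be evaluated on held-out seasons rather than on its own training rows.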
Knowledge of statistics and general basketball rules is essential. The project also requires good programming skills (preferably in Python, although other tools such as R can also be used) and knowledge of data analysis and machine learning tools such as sklearn, pandas, statsmodels, and PyMC3.