Stock prediction and quantitative analysis based on machine learning method

Han Zheng

18/09/2020

Supervised by Yukun Lai; Moderated by Matthias Treder

Project Aims & Objectives:

• Select features from different aspects of stocks on the A-shares market using the multifactor model.

• Implement a machine learning method to select a basket of stocks with higher predicted return yield.

• Compare the return of the selected stocks with the CSI 300 Index benchmark return based on historical data.

• Conduct a preliminary exploration of automated trading.

Brief description of the project:

The stock market is affected by many variables, such as politics, finance and various kinds of information, which lead to drastic internal changes that are hard to predict. Traditional stock analysis usually starts with manual fundamental or technical analysis, which is subject to emotional influence. With the rapid development of the A-shares market, large quantities of stock data have been generated. These data are suitable for machine learning methods, which can analyze and learn from them to find underlying patterns. Quantitative analysis can also track thousands of stocks at the same time, a significant advantage over human analysts. It therefore has strong prospects and is a topic worth studying.

There are multiple ways to access the historical data of thousands of stocks on the A-shares market, such as Wind, Tushare, Rqalpha, Yahoo Finance and CSMAR.

The study will start with feature selection. A stock may have dozens or even hundreds of candidate features. By school of analysis, features can be split into fundamental and technical. Fundamental analysis is based on the long-term intrinsic value of a company, and its factors can be divided into growth and value factors. Basic financial information about the company, such as price-earnings ratio, return on equity and gross profit rate, belongs to this category. Technical analysis, by contrast, focuses on short-term measures such as price and volume (Li 2019). Finding effective features takes several steps, such as feature preprocessing and feature correlation testing (Fang 2018). After feature selection, tens of factors should remain.
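The preprocessing and correlation-test steps could be sketched as follows. This is a minimal illustration, not the method of Fang (2018): the percentile-based winsorization, the z-score standardization, the greedy correlation filter, the 0.8 threshold and all function and feature names are assumptions chosen for the example, and the data are synthetic.

```python
import numpy as np

def winsorize(x, lower=1.0, upper=99.0):
    """Clip each feature column to its [lower, upper] percentile range."""
    lo, hi = np.percentile(x, [lower, upper], axis=0)
    return np.clip(x, lo, hi)

def standardize(x):
    """Z-score each feature column across stocks."""
    std = x.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    return (x - x.mean(axis=0)) / std

def drop_correlated(x, names, threshold=0.8):
    """Greedily keep a feature only if its absolute correlation with
    every feature kept so far is below the threshold (assumed value)."""
    corr = np.abs(np.corrcoef(x, rowvar=False))
    kept = []
    for j in range(x.shape[1]):
        if all(corr[j, k] < threshold for k in kept):
            kept.append(j)
    return x[:, kept], [names[j] for j in kept]

# Synthetic cross-section of 200 stocks (illustrative values only).
rng = np.random.default_rng(0)
n_stocks = 200
pe = rng.normal(20, 5, n_stocks)
roe = rng.normal(0.1, 0.03, n_stocks)
pe_scaled = pe * 2 + rng.normal(0, 0.01, n_stocks)  # nearly collinear with pe
features = np.column_stack([pe, roe, pe_scaled])
names = ["pe_ratio", "roe", "pe_ratio_scaled"]

clean = standardize(winsorize(features))
selected, selected_names = drop_correlated(clean, names)
print(selected_names)
```

In this sketch the redundant copy of the price-earnings ratio is filtered out because it is almost perfectly correlated with a feature already kept. A real study would likely also test each factor against future returns before keeping it.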

The next step is to use machine learning to select a basket of stocks with higher predicted returns. The project will choose a machine learning method that performs well in stock selection. Hyperparameter tuning and its influence, model selection and the choice of feature count can also be studied in this part. Once results are available, the returns of the selected stocks can be compared with and analyzed against the CSI 300 Index benchmark return on historical data.
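As a rough sketch of this step, the example below fits a model on one period, ranks stocks by predicted return for the next period, and compares an equal-weighted basket with a benchmark. The proposal does not fix the model, so ordinary least squares is used purely as a stand-in. The data are synthetic, the basket size of 20 is arbitrary, and the mean return of the synthetic universe is assumed as a placeholder for the CSI 300 Index return.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept (a stand-in model)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Predicted return for each stock given its factor values."""
    return coef[0] + X @ coef[1:]

def top_k(pred, k):
    """Indices of the k stocks with the highest predicted return."""
    return np.argsort(pred)[::-1][:k]

def excess_return(realized, picks, benchmark_return):
    """Equal-weighted basket return minus the benchmark return."""
    return realized[picks].mean() - benchmark_return

# Synthetic factor data for 300 stocks over two periods (assumed values).
rng = np.random.default_rng(42)
true_w = np.array([0.02, -0.01, 0.0])
X_train = rng.normal(size=(300, 3))
y_train = X_train @ true_w + rng.normal(0, 0.01, 300)
X_next = rng.normal(size=(300, 3))
realized_next = X_next @ true_w + rng.normal(0, 0.01, 300)

coef = fit_linear(X_train, y_train)
picks = top_k(predict(coef, X_next), k=20)
benchmark_return = realized_next.mean()  # placeholder for the CSI 300 return
excess = excess_return(realized_next, picks, benchmark_return)
print(f"basket excess return over benchmark: {excess:.4f}")
```

Training on one period and evaluating on the next keeps the comparison out of sample, which matters when judging whether the selected basket actually beats the benchmark.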

Automated trading usually sends orders for execution through an API. At present, the government has not opened quantitative trading APIs to the public market. However, an automated trading system is an essential part of quantitative analysis. It should include functions such as buy, sell, get position and get current price. There is no particularly mature solution at present. The Python library easytrader uses the pywinauto library to simulate keyboard and mouse actions to complete these operations, which is a source of inspiration, but many detailed problems remain in actual use. This can be explored further to see whether better solutions exist.
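One way to explore this is to define the four functions as an abstract interface, so that the strategy code does not depend on how orders are actually sent. The sketch below is hypothetical throughout: the class names, method signatures and the in-memory paper-trading implementation are assumptions for illustration and are not the easytrader API. The symbol "600000.SH" is only a placeholder.

```python
from abc import ABC, abstractmethod

class Broker(ABC):
    """Hypothetical minimal interface for an automated trading backend."""

    @abstractmethod
    def get_current_price(self, symbol: str) -> float: ...

    @abstractmethod
    def get_position(self, symbol: str) -> int: ...

    @abstractmethod
    def buy(self, symbol: str, quantity: int) -> None: ...

    @abstractmethod
    def sell(self, symbol: str, quantity: int) -> None: ...

class PaperBroker(Broker):
    """In-memory simulation used to test strategy logic without real orders."""

    def __init__(self, cash: float, prices: dict):
        self.cash = cash
        self.prices = dict(prices)  # symbol -> fixed simulated price
        self.positions = {}         # symbol -> number of shares held

    def get_current_price(self, symbol: str) -> float:
        return self.prices[symbol]

    def get_position(self, symbol: str) -> int:
        return self.positions.get(symbol, 0)

    def buy(self, symbol: str, quantity: int) -> None:
        cost = self.get_current_price(symbol) * quantity
        if cost > self.cash:
            raise ValueError("insufficient cash")
        self.cash -= cost
        self.positions[symbol] = self.get_position(symbol) + quantity

    def sell(self, symbol: str, quantity: int) -> None:
        if quantity > self.get_position(symbol):
            raise ValueError("insufficient position")
        self.cash += self.get_current_price(symbol) * quantity
        self.positions[symbol] = self.get_position(symbol) - quantity

# Usage with placeholder values.
broker = PaperBroker(cash=100_000.0, prices={"600000.SH": 10.0})
broker.buy("600000.SH", 1000)
broker.sell("600000.SH", 400)
print(broker.get_position("600000.SH"), broker.cash)
```

A backend that drives a desktop trading client, or a future official API, could implement the same interface, so the stock selection logic would not need to change when the execution method does.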

Resource Requirement (hardware/software requirement):

Python; APIs of quantitative financial platforms

References:

Fang, P. 2018. [A stock prediction and quantitative investment system based on machine learning]. MSc Dissertation, Zhejiang University. (in Chinese)

Li, Y. 2019. [Design and Implementation of Stock Intelligence Forecasting System based on Deep Network]. MSc Dissertation, Northwest University. (in Chinese)

Final Report (18/09/2020) [Zip Archive]
