
5 Reasons Why Stock Prediction Projects Fail

 4 years ago
source link: https://towardsdatascience.com/5-reasons-why-stock-prediction-projects-fail-a3dddf30d242?gi=957c1cd99043


Photo by Patrick Tomasso on Unsplash

With the resurgence of machine learning and artificial intelligence, it has never been easier to implement predictive algorithms, both new and old. With just a few lines of code, state-of-the-art models are at the fingertips of the budding data enthusiast, ready to conquer whatever insurmountable digital task may lie at hand. But a little knowledge can be a dangerous thing. While much of machine learning comes down to statistics and programming, what is equally important, but often skipped over in favor of instant gratification, is domain knowledge.

Nowhere is this more true than in investing. While the environment is rich with stock price and fundamental data that is both accessible and free, indiscriminate application of pre-processing techniques and machine learning algorithms will produce indiscriminate results. Financial time series data are incredibly nuanced, and the signal-to-noise ratio is systematically low. Practitioners spend their careers trying to generate consistent outperformance, and only a few succeed. A more intimate understanding of the data is therefore needed to achieve some semblance of success. This article aims to shed light on some common reasons why stock prediction projects may fail once put into production.

1. Selection Bias

Many projects start with the arbitrary selection of a stock to which an algorithm is applied. This is often a tech stock such as Apple or Amazon, simply because these companies are well known and ingrained in the everyday lives of consumers. This is problematic because stock selection is not an arbitrary process: it is part of the investment decision-making process and requires a model in itself.

Take Apple, for example. If we look at its performance in 2019 versus the broader S&P 500 index, we see that it outperformed the index by nearly 60%.

[Figure: Apple vs S&P500]

The profile is broadly the same for Amazon, Microsoft and Google, as US tech was the best-performing sector over 2019. Arbitrarily picking a stock from that sector as a starting point will materially misrepresent the characteristics of the investment opportunity set.

2. Portfolio Construction

Controlling risk is as important to a robust investment strategy as generating returns. If stock selection is the first part of the investment process, then portfolio construction is the vital next step. Many projects will suggest a strategy of buying or selling a particular stock, but often with the assumption that 100% of the portfolio will be invested in that stock. This is rarely the case in practice: a single exposure leaves an investor vulnerable to a tremendous amount of concentration risk. Prudently constructed portfolios are well diversified, as diversification is one of the most important sources of risk control. A viable machine learning investment strategy should consider both stock selection and portfolio construction.
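To make the concentration risk point concrete, here is a minimal sketch of the standard portfolio variance formula for an equal-weighted portfolio. It rests on a simplifying assumption that is not from the article: every stock has the same volatility and the same pairwise correlation. The function name and the example numbers are hypothetical.

```python
import math


def equal_weight_volatility(n_stocks: int, stock_vol: float, correlation: float) -> float:
    """Volatility of an equal-weighted portfolio of n_stocks.

    Simplifying assumption (illustrative only): every stock has volatility
    stock_vol and every pair of stocks has the same correlation.
    """
    if n_stocks < 1:
        raise ValueError("n_stocks must be at least 1")
    weight = 1.0 / n_stocks
    # w' * Sigma * w collapses to sigma^2 * (1/n + (1 - 1/n) * rho)
    variance = stock_vol ** 2 * (weight + (1.0 - weight) * correlation)
    return math.sqrt(variance)


# Hypothetical inputs: 30% annual volatility per stock, 0.2 pairwise correlation.
single_stock = equal_weight_volatility(1, 0.30, 0.2)
diversified = equal_weight_volatility(20, 0.30, 0.2)
print(f"1 stock: {single_stock:.3f}, 20 stocks: {diversified:.3f}")
```

Under these assumptions the 20-stock portfolio carries roughly half the volatility of the single-stock position. Real correlations and volatilities differ across stocks and over time, so this only illustrates the direction of the effect.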

3. Incorrect Application of Pre-processing

Standard rinse, wash and repeat data pre-processing techniques like standardization cannot be directly applied to stock prices. The below plot of the yearly distribution of S&P500 price levels should give some intuition as to why:

[Figure: S&P500 Index Price Levels from 2007–2008]

Within the standard train/test split paradigm of machine learning, pre-processing is applied by fitting a transformation on the training set and applying it, with those same parameters, to the test set. This carries the explicit assumption that the training and test samples are drawn from the same distribution.

We can clearly see that the distribution of stock prices changes from year to year, meaning that the mean and standard deviation will also change. This property of financial time series is called non-stationarity, and it remains an open problem in financial prediction. It can also be observed that the distributions are rarely normal, which makes parametric measures such as the mean and standard deviation poor summaries of the data.

In addition, applying other common procedures such as min-max normalization does not solve this problem, as the lower bound will also change from year to year and there is no theoretical upper bound on prices. Practitioners will often apply a price differencing transformation (stock price returns); however, this does not entirely remove the unfavorable properties of stock prices.
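The effect of a shifting price level on standardization can be sketched as follows. The prices are made up for illustration, and the helper names are hypothetical. The test window simply sits at a higher level than the training window, which is the situation described above.

```python
from statistics import mean, stdev


def fit_standardizer(train):
    """Estimate the z-score parameters on the training window only."""
    return mean(train), stdev(train)


def standardize(values, mu, sigma):
    """Apply z-scoring with previously fitted parameters."""
    return [(v - mu) / sigma for v in values]


def simple_returns(prices):
    """Period-over-period percentage change (price differencing)."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]


# Hypothetical price levels: the test window trades at a higher level.
train_prices = [100, 102, 101, 105, 107, 110]
test_prices = [150, 153, 149, 155, 158, 160]

mu, sigma = fit_standardizer(train_prices)
train_z = standardize(train_prices, mu, sigma)
test_z = standardize(test_prices, mu, sigma)

# Test z-scores land far outside anything seen in training.
print("max train z:", round(max(train_z), 2), "min test z:", round(min(test_z), 2))

# Returns are on a comparable scale in both windows.
print("train returns:", [round(r, 3) for r in simple_returns(train_prices)])
print("test returns:", [round(r, 3) for r in simple_returns(test_prices)])
```

A model trained on the standardized training prices would see test inputs it has never encountered. Returns put both windows on a similar scale, although, as noted above, they do not remove every awkward property of the series.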

4. Look Ahead Bias

While nowadays it only takes a couple of lines of code to pull down a meaningful history of stock and macroeconomic fundamental data, we need to be cognizant that this data is plagued with look-ahead bias. Frequently, observations associated with a particular date would not actually have been available on that date. For example, stock fundamental data is quoted as at an effective date, which usually corresponds to the company's fiscal calendar. However, the report is not released until months after the effective date, reflecting the time needed for preparation.

In macroeconomic data, this bias arises from revisions to prior-period data, which are sometimes made a quarter after the initial information was released. This is particularly problematic for short-term trading strategies. Any project using these datasets should take the appropriate lags and revisions into consideration.
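One way to respect reporting lags is to key every lookup on the date a figure was released rather than the period it describes. The sketch below assumes a hypothetical record layout (`period_end`, `released`, `eps`) and made-up values. Real data vendors expose this information in different ways, if at all.

```python
from datetime import date

# Hypothetical fundamental records: the fiscal period they describe,
# the date they became public, and the reported value.
REPORTS = [
    {"period_end": date(2019, 3, 31), "released": date(2019, 5, 15), "eps": 1.10},
    {"period_end": date(2019, 6, 30), "released": date(2019, 8, 14), "eps": 1.25},
]


def naive_lookup(reports, as_of):
    """Biased: treats a figure as known from its period end date."""
    usable = [r for r in reports if r["period_end"] <= as_of]
    return max(usable, key=lambda r: r["period_end"]) if usable else None


def known_as_of(reports, as_of):
    """Point-in-time: only figures already released on as_of are visible."""
    usable = [r for r in reports if r["released"] <= as_of]
    return max(usable, key=lambda r: r["period_end"]) if usable else None


trade_date = date(2019, 7, 15)
print("naive:", naive_lookup(REPORTS, trade_date)["eps"])
print("point-in-time:", known_as_of(REPORTS, trade_date)["eps"])
```

On the example trade date, the naive lookup returns the June quarter figure even though it was not published until August. The point-in-time lookup falls back to the March quarter. Handling revisions would additionally require storing each vintage of a figure with its own release date.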

5. The Project is Unfinished!

Many stock prediction projects conclude like a regular machine learning project: a performance metric such as accuracy or RMSE is disclosed alongside a line plot of test versus training performance, with the reasoning that if the two lines are sufficiently close and the error reasonably low, the project was a success. This premature conclusion omits a vital step in discovering a successful strategy: testing the financial outcome. Investing cannot be reduced to a simple exercise of minimizing an intangible error rate, as the consequences of being wrong are very real. The final step should be to backtest the strategy as if it had been held through time and to calculate profit/loss or returns. The tester also needs to consider whether, if the portfolio had suffered significant drawdowns during testing before recovering, they would have had the risk tolerance to continue with the strategy given the losses.

A simple example can be drawn using an Exponentially Weighted Moving Average (EWMA) strategy, which uses a decaying average of past prices as a prediction of future prices.
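The financial outcome step can be sketched with three small helpers that turn positions into returns, compound them into a wealth curve, and measure the worst peak-to-trough loss. The function names, the +1/0/-1 position convention and the sample returns are assumptions for illustration. The sketch also ignores transaction costs and slippage, which a real backtest should include.

```python
def strategy_returns(positions, asset_returns):
    """Per-period strategy return.

    positions[t] is the exposure held over period t (+1 long, 0 flat,
    -1 short). It must be decided before asset_returns[t] is known.
    """
    return [pos * ret for pos, ret in zip(positions, asset_returns)]


def cumulative_wealth(returns, start=1.0):
    """Compound a return series into a wealth curve starting at start."""
    wealth = [start]
    for r in returns:
        wealth.append(wealth[-1] * (1.0 + r))
    return wealth


def max_drawdown(wealth):
    """Largest peak-to-trough loss, as a negative fraction of the peak."""
    peak = wealth[0]
    worst = 0.0
    for value in wealth:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


# Hypothetical period returns for an asset held long throughout.
asset = [0.10, -0.50, 1.00]
curve = cumulative_wealth(strategy_returns([1, 1, 1], asset))
print("final wealth:", round(curve[-1], 4))
print("max drawdown:", round(max_drawdown(curve), 4))
```

In this made-up series the position ends with a gain, yet the investor would have had to sit through a 50% drawdown to get there. That is the risk tolerance question raised above.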


At first glance, the EWMA predicts the S&P500 extremely well, but if we take a closer look at the period of market drawdown early this year, we see that things aren’t as they appear.

[Figure: EWMA strategy vs S&P500 return]

Even though the blue and orange lines still appear closely anchored, the EWMA strategy cannot navigate the intra-day volatility because it only incorporates past information. It is constantly chasing the true price level, often predicting the market to be up when it is actually down, and vice versa. Following this strategy over this period would have underperformed the S&P500.
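The chasing behaviour can be reproduced on a toy series. The sketch below is not the article's model: the smoothing factor, the prices and the rule for turning a forecast into a direction (forecast above the latest price means "up") are all assumptions chosen for illustration.

```python
def ewma(prices, alpha):
    """Exponentially weighted moving average, seeded with the first price."""
    smoothed = [prices[0]]
    for price in prices[1:]:
        smoothed.append(alpha * price + (1.0 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical prices during a steady decline.
prices = [100, 98, 95, 91, 88, 84]
forecast = ewma(prices, alpha=0.3)

# forecast[t] uses prices up to t and serves as the prediction for t + 1.
# Assumed direction rule: call "up" when the forecast exceeds the latest price.
steps = range(len(prices) - 1)
predicted_up = [forecast[t] > prices[t] for t in steps]
actual_up = [prices[t + 1] > prices[t] for t in steps]

hits = sum(p == a for p, a in zip(predicted_up, actual_up))
print("predicted up:", predicted_up)
print("actual up:   ", actual_up)
print(f"direction hit rate: {hits}/{len(predicted_up)}")
```

Because the average lags behind a falling price, it sits above the latest price and keeps signalling "up" while the series continues down. On a chart the two lines can still look close, which is why the level plot alone is misleading.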

Conclusion

Before embarking on a stock prediction project, especially one that you intend to invest real money in, it pays to do some prior research on the subject and to understand the data. If the results seem too good to be true (e.g. accuracy materially over 50%), then they probably are. With the number of participants and their increasing sophistication, the market is extremely efficient at price discovery, especially in stocks. Although this does not preclude the possibility of opportunities, it does mean that finding them requires a bit more effort than an out-of-the-box algorithm and standard pre-processing techniques.

Disclaimer: This post is purely an expression of personal views and opinions. It does not represent advice in any way.

