Keras provides an abstraction layer that lets you start experimenting much more quickly. It can run on top of several machine learning backends; our choice is TensorFlow.
To begin with supervised learning, the first step is to teach the model the ideal output for the training data. Since you have all the training data, the best approach is to act as if you could see the future.
Generate an output vector Y with the values you would like the machine to give if it too could see the future.
As a first approach, buy/sell can be treated as a binary classification problem. The closer the prediction is to 1, the more likely it is a good time to sell; conversely, as the prediction approaches 0, it is more likely a good time to buy.
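As a rough illustration of the binary classification setup, here is a minimal sketch using Keras on top of TensorFlow. The layer sizes, the random placeholder data and the 0.5 decision threshold are assumptions made for the example, not choices taken from the text.

```python
import numpy as np
from tensorflow import keras


def build_classifier(n_features):
    """Small feed-forward binary classifier; the layer sizes are arbitrary."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # output between 0 and 1
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# Placeholder data: 64 samples with 4 features and 0/1 targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4)).astype("float32")
Y = (X[:, 0] > 0).astype("float32")

model = build_classifier(n_features=4)
model.fit(X, Y, epochs=2, batch_size=16, verbose=0)

predictions = model.predict(X, verbose=0)  # shape (64, 1)
# Assumed convention from the text: near 1 means sell, near 0 means buy.
signals = np.where(predictions[:, 0] > 0.5, "sell", "buy")
```

The sigmoid output paired with a binary cross-entropy loss is what lets the prediction be read as "how close to sell (1) or buy (0)".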
You can, for example, filter the data. All causal filtering induces a delay, but since you now play as if you could see the future, you can undo that delay to generate ideal outputs. For example, an SMA over m samples induces an average age of (m-1)/2 samples, while an EMA with smoothing factor α induces (1-α)/α.
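One way to picture "undoing the delay" is a centered moving average. A window of m samples holds data of ages 0 to m-1, so shifting the SMA back by (m-1)/2 samples lines it up with the prices again. The sketch below does that with NumPy and then derives 0/1 targets; the labeling rule and the `horizon` parameter are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np


def centered_sma(prices, m):
    """SMA with an odd window m, shifted back by (m - 1) / 2 samples.

    This is only possible offline, when the future samples are already known.
    """
    if m % 2 == 0:
        raise ValueError("use an odd window so the shift is a whole number of samples")
    kernel = np.ones(m) / m
    # mode="valid" returns len(prices) - m + 1 averages; the first one covers
    # prices[0:m] and is therefore centered on index (m - 1) // 2.
    valid = np.convolve(prices, kernel, mode="valid")
    half = (m - 1) // 2
    out = np.full(len(prices), np.nan)
    out[half:len(prices) - half] = valid
    return out


def ideal_labels(prices, m=5, horizon=3):
    """Hypothetical rule: 1 (sell) if the smoothed price is lower `horizon`
    steps ahead, 0 (buy) otherwise, NaN where the smoothed value is unknown."""
    smooth = centered_sma(prices, m)
    future = np.full(len(prices), np.nan)
    future[:-horizon] = smooth[horizon:]
    known = ~np.isnan(smooth) & ~np.isnan(future)
    labels = np.full(len(prices), np.nan)
    labels[known] = (future[known] < smooth[known]).astype(float)
    return labels


prices = np.array([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1], dtype=float)
y = ideal_labels(prices, m=3, horizon=1)
```

On the rising part of this toy series the labels are 0 (buy) and they switch to 1 (sell) exactly at the peak, which is the behaviour a delayed filter could not produce.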
It is very important to make appropriate choices of input features depending on available data:
- Beware of sparse classes: they will either have little influence or cause your model to overfit. This matters, because overfitting is the No. 1 cause of bad real-life performance.
- Look for correlations. Correlations with the target variable are desirable; correlations between input features might indicate that they could be merged.
- Pivot, shift and melt data rows as needed. Generate a structure that makes it easy to work with the data. Pandas dataframes are very convenient for that.
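The checks in this list can be sketched with pandas as follows. The synthetic random-walk prices, the column names and the "next close is lower" target are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic random-walk closing prices (placeholder data).
rng = np.random.default_rng(0)
df = pd.DataFrame({"close": 100 + rng.normal(size=200).cumsum()})

# Shift rows to build lagged features and a look-ahead target.
df["return_1"] = df["close"].pct_change()        # one-step return
df["return_1_lag1"] = df["return_1"].shift(1)    # previous step's return
df["sma_5"] = df["close"].rolling(5).mean()
df["sma_6"] = df["close"].rolling(6).mean()      # nearly redundant with sma_5
# Hypothetical target: 1 if the next close is lower (sell), else 0 (buy).
df["target"] = (df["close"].shift(-1) < df["close"]).astype(int)
# The last row has no next close, and the first rows have incomplete windows.
df = df.iloc[:-1].dropna()

corr = df.corr()
print(corr["target"].drop("target"))   # feature/target correlations
print(corr.loc["sma_5", "sma_6"])      # feature/feature: candidates to merge

# Class balance: a very rare class is a warning sign.
print(df["target"].value_counts(normalize=True))

# Melt to long format, one row per (sample, feature) pair.
long_df = df.reset_index().melt(id_vars="index", var_name="feature", value_name="value")
```

Here `sma_5` and `sma_6` come out almost perfectly correlated, which is the kind of pair the second bullet suggests merging or dropping.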
Hyperparameters are parameters that you must decide before fitting the model, because they cannot be learned from the data.
You will want an estimate of how well the model represents the data. X-fold cross-validation splits the data into X sets, trains the model on X-1 of them and tests it against the remaining one, rotating until every set has served as the test set. Using cross-validation, you can test many algorithms, with many hyperparameter values, and find out which combination performs best.
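To make the procedure concrete, here is a small hand-rolled sketch of X-fold cross-validation with NumPy. The model (a polynomial fit), the hyperparameter being compared (the polynomial degree) and the synthetic data are assumptions chosen to keep the example short.

```python
import numpy as np


def k_fold_indices(n_samples, k, rng):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    return np.array_split(rng.permutation(n_samples), k)


def cross_val_mse(x, y, degree, folds):
    """Average test MSE of a polynomial fit of the given degree over all folds."""
    errors = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th, test on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        pred = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((pred - y[test_idx]) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=60)
y = x ** 2 + rng.normal(scale=0.1, size=60)   # synthetic, mildly noisy data

folds = k_fold_indices(len(x), k=5, rng=rng)
# The degree is a hyperparameter: it is fixed before fitting, not learned.
scores = {degree: cross_val_mse(x, y, degree, folds) for degree in (1, 2, 6)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

The folds here are shuffled, which suits independent samples. For time-ordered data such as prices, one would normally keep the folds in chronological order so that the model is never tested on data older than what it was trained on.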
Linear regression tends to overfit as the number of input features rises. Intuitively, it cannot adapt to non-linearities.
A first step towards handling this better is regularization, which penalizes large coefficients and so keeps them similar in size. Common regularized algorithms are Lasso, Ridge and Elastic-Net. To better adapt to non-linearities, we can use Decision Trees. To avoid their tendency to overfit, we would use Bagging or Boosting algorithms such as Random Forest or Boosted Trees.
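The effect of a coefficient penalty can be seen in a short NumPy sketch of Ridge regression, using its closed-form solution. The data, the two nearly identical features and the value `alpha=1.0` are assumptions made for the illustration.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    """Closed-form ridge solution w = (X^T X + alpha * I)^-1 X^T y, no intercept.

    With alpha = 0 this reduces to ordinary least squares.
    """
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=30)   # two nearly identical features
y = X[:, 0] + 0.1 * rng.normal(size=30)

w_ols = ridge_fit(X, y, alpha=0.0)     # unregularized
w_ridge = ridge_fit(X, y, alpha=1.0)   # penalized

print(np.linalg.norm(w_ols), np.linalg.norm(w_ridge))
print(w_ols[:2], w_ridge[:2])          # weights on the two correlated features
```

With few samples and correlated inputs, the unpenalized fit is free to put large, opposing weights on the two near-duplicate features, while the penalty pulls the coefficient vector towards smaller values. In practice `alpha` is a hyperparameter that would be chosen by cross-validation.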