WallStreetBots: Stock Price Prediction

Description

Price prediction and portfolio/market analysis using NLP and reinforcement learning.

Introduction

Creating a stock trading AI is far more sophisticated than predicting future prices based on numerical methods such as momentum or machine learning methods such as LSTM or transformers. While the latter involves pure data analysis and machine learning, the former covers much greater scopes, including price prediction, portfolio management strategies, market condition analysis, stock sentiment analysis, data communications, and server deployment. In this project, we aim to create an AI that is practically useful—yes, that means it makes money. We will try a variety of ML algorithms, from simple RNN to NLP, and hopefully reinforcement learning—and in the end, our work will be deployed as a bot to trade with paper money. Will it be able to make money? Let’s find out!

Framework

Dataset

To obtain real-time data such as order types, store stock information, and simulate the trading environment realistically, we built our framework on top of the Alpaca paper trade API. Whenever a user places an order at wallstreetbots.org, the same information is updated at Alpaca paper trade and vice versa.

Webapp

The web app was built using the core technologies of Docker, Python+Django, and PostgreSQL. The web app is roughly split into the basic CRUD functionality of the dashboard for showing portfolios and entering orders, and the actual stock trading engine that makes the decisions. ML strategies are fed into the generic pipeline that is called by the web app every day to automatically rebalance for each user. Each new machine learning strategy inherits from a generic strategy and overrides with a different rebalancing method. This ensures that adding new strategies is as easy as possible.

Functionalities

Our ML models predict next-day closing/open stock prices by probing into both traditional market data and sentiment analysis from real-time news headlines and Reddit discussions. The predictions are then interpreted by our portfolio balancing pipeline, where we place trade orders based on the differences between the optimal weights and our current portfolio weights each day.

Architecture

The image above summarizes the entire pipeline of WallStreetBots.

Besides the market data obtained from Alpaca, we also pulled textual information for sentiment analysis from:

NYT News Archive

FinViz News Headlines

r/wallstreetbets Daily Discussion Comments

RSS feeds from various news sources

These textual information are combined with traditional market data when fed into our distilled LSTM model. The input features are summarized in the table below:

Results

Performance of our best model -- Distilled LSTM

Future Works

Building on top of the success of this project in 2022, we will be expanding our framework to the market of cryptocurrencies, where quantitative analysis is expected to yield much better correlations. We are expecting to perform price predictions for the 14 most common cryptos by the end of this year. For more information regarding our proposal and recruitment, please visit https://utmist.gitlab.io/projects/wallstreetbots2/.

We also plan to improve our textual analysis pipeline by incorporating Twitter data, which has been shown to influence stock prices over a wide range of time spans. Our NLP subteam will be experimenting with novel methodologies this year to understand the correlations between tweets and their temporal influence on stock/crypto prices.

The Team