Stock Market Prediction with OpenAI and Economic Events

Diving deep into the forces that impact the stock price direction with AI

Nikhil Adithyan
Level Up Coding



Introduction

Amidst the rise of Artificial Intelligence (AI) and Machine Learning (ML), the task of analyzing vast troves of economic data has become not only feasible but increasingly efficient.

In this article, we will explore how AI, specifically the GPT-3.5 model, can be combined with economic event data sourced from EODHD’s Economic Events Data API.

So, without further ado, let us embark on this interesting exploration of the intersection of OpenAI and economic events for predicting the movement of the stock market.

Understanding Economic Event Data

Economic event data is the foundation for understanding and predicting economic trends. Imagine it as a giant puzzle with many pieces, each piece representing a significant economic event. By fitting these pieces together, we can get a clearer picture of the overall economic health.

What economic events are included?

Economic event data encompasses a wide range of important indicators like:

  • Gross Domestic Product (GDP) Growth Rate: Measures the economy’s overall growth or decline.
  • Unemployment Rate: Indicates the percentage of the labor force that is unemployed.
  • Consumer Sentiment Index: Reflects how confident consumers are about the economy, impacting spending habits.
  • Manufacturing Output: Tracks the production of goods in a country, signifying economic activity.
  • Interest Rates: The rate at which banks lend money, impacting borrowing costs for businesses and consumers.

What details does each data point include?

Each piece of economic event data typically includes specific details to provide context:

  • Type of Event: Identifies the economic indicator being measured (e.g., GDP growth rate, unemployment rate).
  • Country or Region: Specifies the geographic location where the event occurred.
  • Date and Time: Indicates when the event happened.
  • Actual Value: The real number recorded for the economic indicator.
  • Previous Value: The value of the indicator before the current event.
  • Estimate: The predicted value of the indicator before the actual data release.
  • Change: The difference between the actual value and the previous value.
  • Change Percentage: The change expressed as a percentage of the previous value.
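The last two fields follow directly from the actual and previous values. A quick sketch with made-up, PMI-style numbers:

```python
# How "change" and "change percentage" relate to actual and previous values
actual, previous = 52.3, 52.0
change = actual - previous                    # absolute difference
change_percentage = change / previous * 100   # difference relative to previous
```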

Example of Economic Event Data Format

Here’s an example of economic event data you would get from EODHD’s Economic Events Data API, presented in a JSON format:

EODHD Economic Events Data API example output (Image by Author)

In this example, the data point shows the Markit Manufacturing Purchasing Managers’ Index (PMI) for the United States on October 25th, 2023. The PMI indicates manufacturing activity, with a value above 50 suggesting expansion and a value below 50 suggesting contraction. Here, the actual PMI (52.3) was slightly higher than both the previous value (52.0) and the estimate (51.8), indicating that the manufacturing sector continued to expand, and did so slightly faster than in the prior period and than analysts expected.
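Since the screenshot may not render everywhere, here is an illustrative data point mirroring that example as a Python dictionary (the field names are assumed from the feature list in this article; the API’s exact keys may differ slightly):

```python
# Illustrative economic event data point matching the PMI example above
# (field names assumed; values taken from the discussion in the text)
event = {
    "type": "Markit Manufacturing PMI",
    "country": "US",
    "date": "2023-10-25",
    "actual": 52.3,
    "previous": 52.0,
    "estimate": 51.8,
    "change": 0.3,
    "change_percentage": 0.58,
}
```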

By analyzing vast amounts of economic event data, we can identify trends, predict future events, and make more informed decisions in various sectors. Now, let’s explore how AI and Machine Learning can be used with economic event data for powerful economic forecasting!

Fetching & Cleaning Up the Data: Preprocessing Economic Event Data

Before we can use economic event data for analysis or AI models, we need to get it in tip-top shape! This process is called data preprocessing. Imagine you have a messy toolbox filled with tools — some rusty, some missing parts. Preprocessing is like cleaning and organizing your toolbox so you can use the tools effectively.

Why is preprocessing important?

  • Missing Values: Some data points might be missing information (like “change” being blank). We need to decide what to do with these missing pieces.
  • Data Types: Data might be stored in different formats (e.g., dates as text, numbers as strings). We need to convert them to a consistent format for analysis.
  • Scaling and Normalization: Values might be measured in different units (e.g., GDP in trillions, unemployment rate as a percentage). We might need to adjust them to a similar scale for better comparison.
  • Categorical Variables: Some data points might be categories (e.g., “type” of event). We need to convert these into a format usable by AI models.
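Each of these issues can be handled in a few lines of pandas. A minimal sketch on a toy frame (the column names and values are made up for illustration):

```python
import pandas as pd

# Toy frame illustrating the four issues above
df = pd.DataFrame({
    "type":   ["PMI", "GDP Growth Rate", "PMI"],
    "date":   ["2023-10-25", "2023-10-26", "2023-10-27"],
    "actual": ["52.3", "2.1", None],
})

df["date"] = pd.to_datetime(df["date"])                   # consistent datetime type
df["actual"] = pd.to_numeric(df["actual"])                # strings -> floats
df["actual"] = df["actual"].fillna(df["actual"].mean())   # fill the missing value
# Min-max scale to [0, 1] so differently sized indicators are comparable
df["actual_scaled"] = (df["actual"] - df["actual"].min()) / (
    df["actual"].max() - df["actual"].min())
df = pd.get_dummies(df, columns=["type"])                 # categories -> indicator columns
```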

Step-by-Step Preprocessing:

1. Importing Packages

Importing the required packages into the Python environment is a non-skippable step. The packages which we are going to use are:

  • Pandas — for data formatting, cleaning, manipulation, and other related purposes
  • NumPy — for numerical functions
  • eodhd — for extracting the economic events data

The following code imports all three packages into our Python environment:

from eodhd import APIClient
import pandas as pd
import numpy as np

Now that we have imported all necessary packages into our environment, let’s move on to activating the API key.

2. API Key Activation

We are going to use the data provided by the EODHD API. To get access to the data, you need to register on EODHD (it’s free), after which you will receive your active API key.

To implement the API key into our project, use:

api_key = '<YOUR API KEY>'
client = APIClient(api_key)

Replace <YOUR API KEY> with your secret EODHD API key.

Apart from directly storing the API key as text, there are more secure options, such as using environment variables.
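For example, the key can be read from an environment variable so it never appears in the source code (the variable name EODHD_API_KEY here is an assumption; pick whatever name you like):

```python
import os

# Read the key from an environment variable instead of hard-coding it
# ("EODHD_API_KEY" is an assumed variable name)
api_key = os.environ.get("EODHD_API_KEY", "<YOUR API KEY>")
```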

3. Fetching Data:

We’ll use EODHD’s official Python library to access the Economic Events Data API. The following code uses the library to extract economic events data:

# Function to fetch economic events data from the API
def fetch_data(start_date, end_date):
    data = client.get_economic_events_data(date_from=start_date, date_to=end_date, offset=1000, limit=1000)
    return data

df = pd.DataFrame(fetch_data('2021-01-01', '2023-12-31'))
df.tail()
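Note that a single call returns at most limit records starting at offset. To cover a long date range, a small helper can walk through the pages until the API returns a short page. This is a sketch under that assumption, where fetch_page is any callable wrapping the client call:

```python
def fetch_all_events(fetch_page, page_size=1000):
    # fetch_page(offset, limit) returns a list of records, e.g.
    # lambda o, l: client.get_economic_events_data(
    #     date_from=start, date_to=end, offset=o, limit=l)
    records, offset = [], 0
    while True:
        page = fetch_page(offset, page_size)
        records.extend(page)
        if len(page) < page_size:  # a short page means we reached the end
            return records
        offset += page_size
```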

4. Extracting Features:

We extract relevant features from each data point:

  • event_type: Type of economic event (e.g., PMI)
  • country: The country where the event occurred
  • date: Date of the event
  • actual: Actual value of the economic indicator
  • previous: Previous value of the indicator
  • estimate: Estimated value before the release
  • change: Difference between actual and previous value
  • change_percentage: Change as a percentage

5. Feature Engineering (Optional):

We can create new features based on existing ones. For example, we might calculate the difference between the estimate and the actual value.
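One such engineered feature is the “surprise”: how far the actual release deviated from the consensus estimate. A sketch (the column names follow the feature list above; the numbers are illustrative):

```python
import pandas as pd

# "Surprise": how far the actual release deviated from the estimate
events = pd.DataFrame({"actual": [52.3, 1.9], "estimate": [51.8, 2.1]})
events["surprise"] = events["actual"] - events["estimate"]
events["beat_estimate"] = events["surprise"] > 0
```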

6. Encoding Categorical Variables:

The event_type is a category. We might convert it into a numerical format that AI models can understand (e.g., using one-hot encoding).
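A sketch of one-hot encoding with scikit-learn (the category values are made up):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One-hot encode event types: each row gets a single 1 in its category column
event_types = np.array([["PMI"], ["GDP"], ["PMI"]])
onehot = OneHotEncoder().fit_transform(event_types).toarray()
```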

Putting it all Together:

By following these steps, we transform raw economic event data into a clean and usable format for further analysis or feeding into AI models. This allows us to identify patterns, make predictions, and gain valuable insights from the economic landscape.

from sklearn.preprocessing import LabelEncoder

def preprocess_data(data):
    # Keep only data points with a 'change' value available
    filtered = [item for item in data if item['change'] is not None]

    # Fit one encoder per categorical column over the whole dataset so that
    # identical categories always receive the same numerical code
    event_type_codes = LabelEncoder().fit_transform([item['type'] for item in filtered])
    country_codes = LabelEncoder().fit_transform([item['country'] for item in filtered])

    processed_data = []
    for type_code, country_code, item in zip(event_type_codes, country_codes, filtered):
        # date = pd.to_datetime(item['date'])  # parse the timestamp if needed
        # Additional feature engineering can be done here
        processed_data.append([type_code, country_code, item['actual'], item['previous'],
                               item['estimate'], item['change'], item['change_percentage']])

    return processed_data

The preprocess_data function iterates through the dataset, keeping only the items with a ‘change’ value available. It extracts the relevant features, such as event type, country, actual value, previous value, estimate, change, and change percentage. The categorical variables, event type and country, are label encoded with scikit-learn’s LabelEncoder class, converting them into numerical codes. The encoded features are then compiled into a list named processed_data for subsequent analysis.

Harnessing the Power of GPT-3.5

Imagine having a writing assistant that can not only complete your sentences but also predict future events based on vast amounts of information! This is the potential of GPT-3.5, a powerful language model from OpenAI.

What is GPT-3.5?

GPT-3.5 is a cutting-edge AI model trained on a massive dataset of text and code. It can generate human-quality text, translate languages, write different kinds of creative content, and answer your questions in an informative way.

How can GPT-3.5 help with economic forecasting?

By feeding GPT-3.5 carefully crafted prompts that include historical economic data and relevant context, we can potentially leverage the model to generate predictions about future economic events.

Example Prompt:

Here’s a simplified example prompt we could use for GPT-3.5:

Based on historical data and current trends, predict the change in the Markit Manufacturing PMI for Australia in the next quarter. 

**Context:**

* Recent economic reports suggest a slowdown in global manufacturing activity.
* Australia's central bank recently raised interest rates.

**Data:**

* Include historical PMI data for Australia and other developed economies.
* Include recent economic indicators for Australia (e.g., inflation rate, consumer spending).

Important Note:

While GPT-3.5’s predictions can be intriguing, it’s crucial to remember that they are not guaranteed to be accurate. It’s essential to use them as a starting point for further analysis and combine them with other forecasting techniques.

Generating a Prediction:

In this section, I’ll walk you through the process of setting up the necessary tools and then using the GPT-3.5 model to generate a prediction for a specific economic event. We’ll start by installing the OpenAI Python library, which provides an interface to interact with OpenAI’s language models, including GPT-3.5. Once we have the library installed, we can import the required modules and create a client instance with an API key for authentication.

Next, we’ll define the details of the economic event for which we want to generate a prediction. We’ll store these details in variables, such as event_type, country, and date_str.

With the event details in hand, we’ll generate a prompt for the GPT-3.5 model. This prompt will likely include the event details, as well as any additional context or historical data that might be relevant to the prediction task. The quality and completeness of the prompt can significantly impact the accuracy and usefulness of the model’s response.

To interact with the GPT-3.5 model, we’ll define a function called get_completion. This function takes the prompt and the model name as arguments, formats the prompt into a message suitable for the model, and sends it to the OpenAI API to generate a response. The response from the GPT-3.5 model is then retrieved, and the content of the most likely response (the first choice) is printed to the console.

Example #1:

from openai import OpenAI

# Function to generate a prompt for a new economic event
def generate_prompt(event_type, country, date_str):
    prompt = f"You are a financial expert and have great knowledge of predicting the movement of the stock market after a certain economic event. A new economic event has occurred. It is an '{event_type}' event in {country} on {date_str}. Predict the change after this event."
    return prompt

# Use GPT-3.5 to predict the change
# (a separate name so it does not shadow the EODHD client)
openai_client = OpenAI(api_key="YOUR OPENAI API KEY")

def get_completion(prompt, model="gpt-3.5-turbo"):
    messages = [{"role": "user", "content": prompt}]
    response = openai_client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,  # the degree of randomness of the model's output
    )
    return response

# Generate a prompt for a new economic event
event_type = df['type'][0]
country = df['country'][0]
date_str = df['date'][0]
prompt = generate_prompt(event_type, country, date_str)
r = get_completion(prompt)
print(r.choices[0].message.content)

Output:

OpenAI API response (Image by Author)

The example considered here predicts the impact of an Inflation Rate event taking place in the United States. The model takes these inputs to predict and interpret the movement of the stock market. As shown in the picture above, it does a good job of assessing the economic event and making a plausible prediction of the market’s movement.

Example #2:

# Generate a prompt for a new economic event

event_type = df['type'][1]
country = df['country'][1]
date_str = df['date'][1]
prompt = generate_prompt(event_type, country, date_str)
r = get_completion(prompt)
print(r.choices[0].message.content)

Output:

OpenAI API response (Image by Author)

These predictions give us useful insight into what might happen when key economic variables shift, such as inflation or manufacturing activity. For example, given an event about manufacturing performance in Australia or price changes in the United States, the model weighs different scenarios, what they might mean for the economy, and how they could affect the stock market. To a great extent, it helps us understand what could happen next after these events.

Conclusion

The combination of economic event data API, such as that provided by EODHD, and AI models like GPT-3.5 presents an exciting opportunity for more accurate and insightful economic forecasting and market prediction.

This exploration is a good starting point for the larger task of predicting economic events with LLMs. The explanations and interpretations generated by the model can help investors and economists make calculated, informed decisions in the long run.

As for improvements, there is plenty of room: performance can be boosted with RAG techniques, better prompting, fine-tuning the LLM on the dataset, and other evolving methods in large language model usage.

In summary, this article only scratches the surface of LLMs and market prediction; there is much more to explore and research. This promising field excites not only developers but also traders and investors, who now have a great opportunity to capitalize on these emerging technologies.

With that being said, you’ve reached the end of the article. Hope you learned something new and useful today. Thank you very much for your time.
