- A group of researchers questioned whether OpenAI’s GPT-4 could effectively perform financial analysis.
- The researchers found that in some cases the tool could outperform humans.
- GPT-4 performed better in financial analyses of larger, more mature companies, the researchers told BI.
According to a study, OpenAI's GPT-4 can perform financial statement analysis and, in some cases, predict a company's future performance more accurately than human analysts.
Three researchers at the University of Chicago Booth School of Business, Alex Kim, Maximilian Muhn and Valeri Nikolaev, wanted to see whether GPT-4 could analyze financial statements in purely numerical terms, so they did not give their large language model any textual context.
The analysis excluded text that typically accompanies quarterly earnings reports, such as the management discussion and analysis (MD&A) section. “Textual information can be easily integrated, but our main interest is in understanding the LLM’s ability to analyze and integrate purely financial numbers,” the researchers wrote.
The researchers looked at more than 150,000 firm-year observations (data collected on a company over a one-year period) for roughly 15,000 companies between 1968 and 2021.
Muhn told Business Insider that the data allows him and his colleagues to evaluate the performance of financial analysts' predictions.
For example, the study found that analysts were 53% accurate in making one-month forecasts about the direction of future earnings.
According to the study, the researchers anonymized the data and provided the financial statements without textual information so that the model would not know which company's data it was analyzing.
When GPT-4 was given “simple” prompts that didn't use chain-of-thought prompting (the researchers asked the model to answer without breaking the request down into step-by-step instructions), the model's accuracy was slightly lower than that of the analysts, at 52%, the study said.
But when the researchers used chain-of-thought prompts, the model's performance changed: Given more instructions and guidance, GPT-4 achieved 60% accuracy, according to the study.
The study found that when GPT-4 was given more instructions and approached the analysis more like a human would, the model “is able to outperform human analysts,” even without important text information typically found in financial reports.
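The difference between the two prompting styles, and the accuracy figure used to compare them, can be illustrated with a minimal Python sketch. The prompt wording, the line items and the function names below are hypothetical and are not taken from the study; the sketch only assumes that a "simple" prompt asks for the answer directly, while a chain-of-thought prompt walks the model through analyst-style steps first.

```python
def format_statement(items):
    """Render anonymized line items as plain 'label: value' lines."""
    return "\n".join(f"{label}: {value}" for label, value in items.items())


def simple_prompt(items):
    # Hypothetical wording: ask for the answer directly, with no steps.
    return (
        "Here are a company's anonymized financial statements.\n"
        f"{format_statement(items)}\n"
        "Will earnings increase or decrease in the next period? "
        "Answer with one word."
    )


def chain_of_thought_prompt(items):
    # Hypothetical wording: guide the model through analyst-style steps.
    steps = [
        "Identify notable changes in the line items.",
        "Compute key financial ratios.",
        "Interpret what the ratios suggest about the company.",
        "Conclude whether earnings will increase or decrease.",
    ]
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    return (
        "Here are a company's anonymized financial statements.\n"
        f"{format_statement(items)}\n"
        "Work through the following steps before answering:\n"
        f"{numbered}"
    )


def directional_accuracy(predicted, actual):
    """Share of cases where the predicted direction matches the actual one."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one prediction is required")
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(predicted)
```

With made-up directions, three matches out of five predictions would give an accuracy of 0.6, which is how a figure such as 60% should be read.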
The researchers also noted in their study that financial analysis and forecasting is a highly complex task that requires judgment, common sense, and intuition that can stump both humans and machines, which may explain why neither group achieves anywhere near 100% accuracy in their analysis.
Will analysts be replaced by AI?
An anecdotal observation that Muhn shared with BI, but that is not shown in the study, is that GPT-4 seems to excel at analyzing large enterprises. Think of a company like Apple, he said.
“For example, large companies like Apple appear to perform relatively better. This may have to do with the fact that, in general, and as shown in the prior literature, large and mature companies are inherently less specific,” Muhn said.
For example, for smaller biotech companies, factors such as the success of clinical trials can make profitability fluctuate widely from year to year, making it difficult for GPT to make predictions based on financial statements alone, the researchers said.
The study also noted that GPT-4 may be more effective in its analysis because humans can be biased, for example in how rationally they take in information.
Kim acknowledged to BI that LLMs have biases, but said bias is hard to define because people sometimes mean political or positive bias in the model.
“But if the LLMs had very strong biases in making earnings-related predictions, they would have been very poor at predicting outcomes,” Kim said in an interview. “But on average, they seem to have done pretty well.”
So, the billion-dollar question arises: Will LLMs replace human financial analysts?
“At this point, I would say no,” Kim told BI. “It's still a complement. Technology develops over time, and who knows what it will be like after that. Two years ago, I never thought we'd see technology like this.”
Kim stressed that the research does not suggest that analysts will be replaced by machines, but that there are currently areas where human experts can perform better, and vice versa.
Still, the study offers a glimpse into a tool that financial analysts may soon have at their disposal to more accurately judge a company's health.
OpenAI and Apple did not immediately respond to requests for comment.