Late Lucid Lectures Guild

Science, softly spoken.

Jing Gao

  • Understanding Behavioral Biases in Machine Learning Predictions of Corporate Earnings

    Behavioral Machine Learning? Computer Predictions of Corporate Earnings also Overreact

    By Murray Z. Frank, Jing Gao, Keer Yang

    DOI: https://doi.org/10.48550/arXiv.2303.16158

    Abstract

    Machine learning algorithms are known to outperform human analysts in predicting corporate earnings, leading to their rapid adoption. However, we show that leading methods (XGBoost, neural nets, ChatGPT) systematically overreact to news. The overreaction is primarily due to biases in the training data, and we show that it cannot be eliminated without compromising accuracy. Analysts with machine learning training overreact much less than do traditional analysts. We provide a model showing that there is a trade-off between predictive power and rational behavior. Our findings suggest that AI tools reduce but do not eliminate behavioral biases in financial markets.

    Overview

    This paper investigates how machine learning algorithms, which are known for their superior accuracy in forecasting corporate earnings compared to human analysts, may still inherit behavioral biases—specifically, an overreaction to news. Although these algorithms can be up to 15% more accurate, the authors find that they systematically overreact, much like human forecasters do. The paper explains why this happens and discusses a trade-off between predictive accuracy and rational (i.e., unbiased) behavior.


    Key Sections and Their Insights

    • Main Claim:
      Machine learning (ML) models outperform traditional human analysts in predicting earnings but tend to overreact to both firm-specific and market-wide news.

    • Root Cause:
      The overreaction stems primarily from biases in the training data. Since historical data include human behavioral biases, these biases get passed on to the ML models.

    • Trade-Off Highlight:
      Attempts to reduce overreaction (making predictions more “rational”) lead to a reduction in overall predictive accuracy.

    • Additional Finding:
      Analysts with a technical background who are more familiar with ML tools show less overreaction compared to traditional analysts.

    Introduction

    • Setting the Stage:
      The paper challenges the common assumption that because machine learning algorithms lack human emotions, they must naturally produce rational predictions. The authors ask whether ML forecasts truly meet the standard of rational expectations—that is, being unbiased and efficiently incorporating available information.

    • Context and Data Sources:
      The analysis is based on several datasets:

      • Earnings forecasts from a well-known financial database.

      • Firm financial data and stock returns from comprehensive market data sources.

      • Macroeconomic indicators from a Federal Reserve Bank dataset.

      • Analyst background information collected manually from professional networking sites.

    • Empirical Focus:
      The study uses historical data spanning 1994 to 2018 to compare predictions made by machine learning algorithms (like gradient boosting regression trees, neural networks, and even large language models such as ChatGPT) against those made by human analysts.

    • The Overreaction Test:
      The authors apply a standard framework (inspired by earlier behavioral finance research) to test whether the forecasts systematically “overshoot” in reaction to new information. In simple terms, if a company announces positive news, a rational forecast would adjust predictions moderately, but an overreacting forecast would adjust them excessively.
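    In the behavioral-finance literature, this test is commonly written as an error-on-revision regression. The specification below is the standard form of that convention, not a quotation from the paper:

    ```latex
    % Regress the forecast error on the forecast revision:
    %   beta = 0 under rational expectations;
    %   beta < 0 indicates overreaction (errors lean against the revision).
    \[
    \underbrace{E_{i,t+1} - F_{t}[E_{i,t+1}]}_{\text{forecast error}}
      = \alpha
      + \beta \underbrace{\left( F_{t}[E_{i,t+1}] - F_{t-1}[E_{i,t+1}] \right)}_{\text{forecast revision}}
      + \varepsilon_{i,t}
    \]
    ```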

    Methodology and Technical Explanations

    • Machine Learning Models:
      The paper primarily discusses the gradient boosting regression tree method. This approach works by:

      • Starting with simple decision trees (a tool that splits data based on features).

      • Iteratively correcting the errors of previous trees (a process called boosting).

      • Combining all the trees’ predictions to produce a final output.

      Example: Imagine predicting a company’s earnings like predicting tomorrow’s temperature. A simple decision tree might look at factors like today’s weather and the season. With gradient boosting, the model first makes a rough prediction, then examines where it was wrong (for example, if it underestimated the chance of rain) and adjusts its prediction accordingly. This iterative refinement leads to more accurate forecasts overall; a runnable sketch of the idea follows.
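    The snippet below is a minimal, hand-rolled gradient-boosting loop for squared-error loss, using shallow scikit-learn trees as the weak learners. The toy data and hyperparameters are illustrative assumptions, not the paper's configuration.

    ```python
    # Hand-rolled gradient boosting for squared-error loss: each round fits a
    # shallow tree to the current residuals, then adds a shrunken correction.
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def boosted_fit_predict(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
        pred = np.full(len(y), y.mean())          # round 0: predict the mean
        for _ in range(n_rounds):
            residual = y - pred                   # where the ensemble is still wrong
            tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
            pred += learning_rate * tree.predict(X)  # correct a fraction of the error
        return pred

    # Toy demonstration: "earnings" as a noisy function of two firm features.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 2))
    y = 0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)
    pred = boosted_fit_predict(X, y)
    print("in-sample MSE:", round(float(np.mean((y - pred) ** 2)), 4))
    ```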

    • Regularization:
      Regularization is a method used to prevent overfitting (i.e., making a model too tailored to past data). However, the paper shows that regularization techniques can also inadvertently underweight certain signals, which contributes to the overreaction issue.
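    As a generic illustration of this shrinkage at work (not the paper's estimation setup): ridge regression pulls coefficients toward zero relative to ordinary least squares, which systematically underweights true signals.

    ```python
    # Regularization as shrinkage: ridge pulls coefficients toward zero, so
    # true signals are systematically underweighted relative to OLS.
    import numpy as np
    from sklearn.linear_model import LinearRegression, Ridge

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    true_beta = np.array([1.0, 0.5, 0.0])
    y = X @ true_beta + rng.normal(scale=1.0, size=200)

    ols = LinearRegression().fit(X, y)
    ridge = Ridge(alpha=50.0).fit(X, y)
    print("true coefficients:", true_beta)
    print("OLS estimates:    ", ols.coef_.round(3))   # close to the truth
    print("ridge estimates:  ", ridge.coef_.round(3)) # shrunk toward zero
    ```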

    • Empirical Tests:
      The study employs linear regression and other statistical methods to compare the forecast errors of ML models and human analysts. A key idea is that if forecasts were truly rational, the errors would be random and uncorrelated with any available information. Instead, they find systematic patterns in these errors, indicative of overreaction.
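    A minimal sketch of such a test, assuming a pandas DataFrame with one row per firm-period and illustrative column names (the paper's exact variables and controls will differ):

    ```python
    # Error-on-revision test: under rational expectations the slope is zero;
    # a significantly negative slope indicates overreaction to news.
    import pandas as pd
    import statsmodels.api as sm

    def overreaction_test(df):
        error = df["actual_eps"] - df["forecast_eps"]             # forecast error
        revision = df["forecast_eps"] - df["prior_forecast_eps"]  # forecast revision
        X = sm.add_constant(revision.rename("revision"))
        return sm.OLS(error, X, missing="drop").fit()

    # Toy usage with hypothetical firm-quarter rows:
    toy = pd.DataFrame({
        "actual_eps":         [1.00, 0.80, 1.20, 0.95],
        "forecast_eps":       [1.10, 0.70, 1.35, 0.90],
        "prior_forecast_eps": [1.00, 0.85, 1.10, 0.95],
    })
    print(overreaction_test(toy).params)
    ```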

    Findings and Implications

    • Systematic Overreaction:
      Despite being more accurate on average, ML predictions react too strongly to new information. This overreaction happens for both company-specific news and broader market news.

    • Influence of Training Data:
      The training data that feed into these ML models are themselves biased because they come from historical records that reflect human overreaction. Thus, even in the absence of emotions, ML models can learn and replicate these biases.

    • Comparison with Human Analysts:

      • Traditional Analysts: Tend to overreact more than the machine learning models do.

      • Technically Trained Analysts: Those with a background in machine learning and advanced statistics tend to produce forecasts that are less biased.

    • The Trade-Off:
      The paper establishes that there is a delicate balance:

      • High Predictive Accuracy: Requires using the full power of the training data, including its biases.

      • Reducing Overreaction: Can be achieved by adjusting the algorithms, but this comes at the cost of losing some predictive accuracy.

    • Broader Implications:
      This study challenges the assumption that machine learning is free from behavioral biases. It also provides insights for financial professionals on how to balance accuracy with rational forecasting, and it emphasizes the need for careful handling of training data to mitigate inherited biases.


    Conclusion

    The paper concludes by reinforcing that machine learning algorithms, despite their technical superiority in many respects, are not immune to the behavioral biases that affect human decision-making. The key takeaways include:

    • Behavioral Bias in ML:
      ML models overreact to news because the training data are biased. Simply put, the models are learning from data that already include human errors.

    • Trade-Off Reality:
      There is a fundamental trade-off between achieving high predictive accuracy and maintaining unbiased, rational forecasts. Attempts to correct overreaction may compromise the overall performance of the model.

    • Future Directions:
      The findings suggest that efforts to improve financial forecasting should focus not only on increasing accuracy but also on addressing the inherent biases in the data. This may involve developing new techniques or modifying existing ones to balance these competing goals.



    Final Thoughts

    In simple terms, the paper reveals that while machine learning can predict corporate earnings better than human analysts on average, it isn’t perfect. The algorithms tend to “overreact” to new information, albeit less strongly than humans, partly because they learn from historical data that include similar overreactions. This insight is important for anyone relying on ML for financial forecasting, as it highlights the need to consider both the power and the limitations of these advanced tools.

    Prompt used with ChatGPT

    For each EPS announcement month of a given firm, we use information from the 12 months before that date to make predictions. We do this, rather than using all dates prior to the announcement, to save computation time. Then, for each firm-month, we submit the prompt below to ChatGPT-4o and save the EPS it returns; we use this EPS as the AI prediction. Finally, we calculate the difference between this prediction and the actual EPS to obtain the AI prediction error.

    Below is an example of the prompt we used to generate GenAI prediction:

    You are a helpful assistant. I would like you to make predictions about firm’s earnings per share using the information I give you. I know the following information about a public firm in the US. Please predict the firm’s earning per share. Please just give a single number of EPS, and I do not want a range. When you report the EPS, please keep 3 decimals and use the following format: ‘the eps I predict is 4.000’
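
    A minimal sketch of how such a prompt could be submitted programmatically, assuming the openai Python client; the model name, message wrapping, and the regex used to parse the reply are illustrative assumptions rather than the authors’ exact pipeline:

    ```python
    # Submit the EPS prompt for one firm-month and parse the predicted value.
    import re
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT_TEMPLATE = (
        "You are a helpful assistant. I would like you to make predictions about "
        "firm's earnings per share using the information I give you. I know the "
        "following information about a public firm in the US:\n{firm_info}\n"
        "Please predict the firm's earning per share. Please just give a single "
        "number of EPS, and I do not want a range. When you report the EPS, please "
        "keep 3 decimals and use the following format: 'the eps I predict is 4.000'"
    )

    def predict_eps(firm_info: str) -> float | None:
        """Query the model once and extract the EPS number from its reply."""
        reply = client.chat.completions.create(
            model="gpt-4o",  # illustrative; the paper reports using ChatGPT-4o
            messages=[{"role": "user",
                       "content": PROMPT_TEMPLATE.format(firm_info=firm_info)}],
        ).choices[0].message.content
        match = re.search(r"the eps I predict is (-?\d+\.\d+)", reply)
        return float(match.group(1)) if match else None

    # AI prediction error for a firm-month = predicted EPS minus actual EPS.
    ```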