Analysis of Refinitiv data reveals that over one million restatements and reclassifications have been made to original filings since 2010. We explore how this reinforces the need for quants and data scientists to implement point-in-time data.
- Since 2010, Refinitiv has processed 1,002,184 restatements and reclassifications globally.
- These statistics demonstrate how standard datasets can create biases and inaccuracies in backtests and machine-learning models — and why quants and data scientists should instead use point-in-time data.
- Refinitiv offers the industry’s most comprehensive range of point-in-time data, including exclusive new ESG data.
Data shows that, over the last decade, the number and distribution of reclassifications and restatements have been significant. The adjustments affect most major global markets, as shown below, and the figures for more recent years are expected to grow further as time goes on.
The reasons for reclassifications and restatements vary — from mergers and updates to accounting policies, to mandated changes from auditors and regulators.
The impact, however, is always the same: Any backtests or machine-learning models based on this data will be inaccurate, because the figures available today are different to those that were available in the past.
Point-in-time data for restatements and reclassifications
Point-in-time data is the only type of dataset that can overcome the challenges presented by reclassifications and restatements.
It is unique in including all reported values in the order they were published, timestamped to the moment they were made available to the market. Consequently, the original values are not overwritten by reclassifications and restatements, and the data gives a more realistic view of history.
This allows tests and models to reference the values that were actually available at the time — not those that came later.
Standard datasets, on the other hand, include only the latest reported values, and overwrite original values with updates and adjustments.
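To make the contrast concrete, here is a minimal sketch of the two views. It is not Refinitiv's data model or API: the `Report` record, the `HISTORY` values and the `as_of` and `latest` helpers are all hypothetical, invented for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Report:
    period_end: date  # fiscal period the value describes
    published: date   # when the value became available to the market
    value: float      # reported figure (for example, EPS)


# Hypothetical history for one company and one fiscal period:
# an original filing followed by a later restatement.
HISTORY = [
    Report(date(2015, 12, 31), date(2016, 2, 15), 1.20),  # original value
    Report(date(2015, 12, 31), date(2016, 11, 3), 0.95),  # restated value
]


def as_of(history: List[Report], period_end: date, when: date) -> Optional[float]:
    """Point-in-time view: the latest value published on or before `when`."""
    known = [r for r in history
             if r.period_end == period_end and r.published <= when]
    if not known:
        return None  # nothing had been published yet
    return max(known, key=lambda r: r.published).value


def latest(history: List[Report], period_end: date) -> Optional[float]:
    """Standard-dataset view: only the most recent value survives."""
    rows = [r for r in history if r.period_end == period_end]
    return max(rows, key=lambda r: r.published).value if rows else None
```

In this toy example, a backtest dated 1 March 2016 that calls `as_of(HISTORY, date(2015, 12, 31), date(2016, 3, 1))` sees the original 1.20, whereas `latest(HISTORY, date(2015, 12, 31))` returns the restated 0.95, a value that was not known until November 2016.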
Evaluating the success of strategies and models
When training specific machine-learning models and backtesting quantitative strategies, an old adage remains true: Garbage in, garbage out. That is to say, a foundation of clean data is vital to ensure that results are accurate and new strategies or models are effective.
When used for historical analysis, standard datasets are far from clean.
Beyond the issue of restatements and reclassifications, there are often additional inaccuracies — for example the publication date of a data point being recorded as the end of the reporting period. As Marcos Lopez de Prado explains in Advances in Financial Machine Learning (2018, John Wiley & Sons, Inc.), in reality the publication date is always significantly later.
He says: “Fundamental data published by Bloomberg is indexed by the last date included in the report, which precedes the date of the release (often by 1.5 months). In other words, Bloomberg is assigning those values to a date when they were not known.”
Lag assumptions are often implemented during backtesting to account for these reporting delays and make data more realistic. However, these lags are also imprecise.
Complicated by filing regulations that change over time and differ between countries and companies, they are often an ineffective solution to the problem and lead to further errors, as shown in Figures 6 and 7.
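The imprecision of a fixed lag can be illustrated with a short sketch. The two filers, their dates and the 45-day lag below are hypothetical values chosen for illustration, not figures from Refinitiv data.

```python
from datetime import date, timedelta

# Hypothetical filings: name -> (period end, actual publication date)
FILINGS = {
    "fast_filer": (date(2019, 12, 31), date(2020, 1, 28)),
    "slow_filer": (date(2019, 12, 31), date(2020, 3, 20)),
}


def lag_error_days(period_end: date, published: date, lag_days: int) -> int:
    """Days between the assumed availability date and the real publication.

    Negative: the backtest uses the value before it was public (look-ahead).
    Positive: the backtest waits longer than necessary (stale signal).
    """
    assumed = period_end + timedelta(days=lag_days)
    return (assumed - published).days


# Apply the same 45-day lag assumption to every company.
errors = {
    name: lag_error_days(period_end, published, lag_days=45)
    for name, (period_end, published) in FILINGS.items()
}
```

Here `errors` comes out as 17 for the fast filer and -35 for the slow filer: a single lag leaves one signal idle for more than two weeks while giving the other a 35-day look-ahead. Timestamped publication dates remove the need to guess.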
Another common problem arising from standard datasets is incorrect quintile assignment.
As represented in Figure 26, on average a significant 40 percent of companies are mis-assigned. This results in those companies being wrongly identified as the most attractive (or vice versa), and the wrong stocks being selected for buying and selling.
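A rough sketch of how such mis-assignment can be measured is shown below. The ten companies and their factor values are made up, the rank-based `quintiles` helper is an assumption rather than the method behind the figure, and the resulting share is not meant to reproduce the 40 percent quoted above.

```python
from typing import Dict


def quintiles(values: Dict[str, float]) -> Dict[str, int]:
    """Map each name to a quintile 1..5 by rank (1 = highest value)."""
    ranked = sorted(values, key=values.get, reverse=True)
    n = len(ranked)
    return {name: (i * 5) // n + 1 for i, name in enumerate(ranked)}


# Hypothetical factor values as they were known on the rebalance date.
POINT_IN_TIME = {
    "A": 9.1, "B": 8.4, "C": 7.7, "D": 6.9, "E": 6.2,
    "F": 5.5, "G": 4.8, "H": 4.0, "I": 3.3, "J": 2.5,
}

# The same universe in a standard dataset, where company C's value
# has since been overwritten by a (hypothetical) restatement.
RESTATED = dict(POINT_IN_TIME, C=8.8)

pit_q = quintiles(POINT_IN_TIME)
std_q = quintiles(RESTATED)

# Companies whose quintile differs between the two views.
mismatched = sorted(name for name in pit_q if pit_q[name] != std_q[name])
share = len(mismatched) / len(pit_q)
```

In this toy universe a single restatement moves C into the top quintile and pushes B out of it, so `mismatched` is `['B', 'C']` and `share` is 0.2. One overwritten value is enough to change which stocks a long-short strategy would have bought and sold.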
These inaccuracies can all lead to projected performance being very different to the reality — as demonstrated in the chart below.
Here, two tests of the same strategy (one completed using standard data, and one using point-in-time data) deliver notably different portfolio performance projections due to companies being incorrectly assigned to deciles in the standard dataset.
As Saeed Amen explained, these misleading results could cause organizations to invest in a strategy or model that is not as effective as tests suggested.
Point-in-time data is the only solution to ensure that organizations are empowered to make more informed decisions about which strategies or models to pursue.
Comprehensive range of point-in-time data
With Worldscope Fundamentals, Refinitiv Financials, I/B/E/S, Economics and Reuters real-time news, Refinitiv offers a broad range of point-in-time data. And, as the demand for alternative datasets continues to grow, we will also offer ESG point-in-time data, exclusive to Refinitiv, in an upcoming launch through our partnership with MarketPsych.