Saeed Amen, a quant and co-author of The Book of Alternative Data (published in July), explains why understanding point-in-time data is crucial when performing historical analysis of the market.
- If data is not recorded point-in-time, we could be using ‘future’ data in our backtest. This pollutes the analysis with hindsight bias, which can make results look artificially better.
- It is crucial to ensure that your data vendor provides point-in-time data. This is particularly the case for datasets that are often subject to revision, such as macroeconomic and company fundamental data.
- Point-in-time recording must also be considered when analyzing alternative datasets.
For more data-driven insights in your Inbox, subscribe to the Refinitiv Perspectives weekly newsletter.
What’s ‘point-in-time’ when it comes to data?
Point-in-time data is a very important consideration when performing any sort of historical market analysis. We can illustrate how it works with an example, before digging deeper into the subject.
How is point-in-time data applied to analysis?
Let’s say we are backtesting a trading strategy and we are using the past five years of historical data as our input.
Suppose our model trades once a day at the market close, and we are calculating the trading signal for 1 January 2020 in our backtest. At that point, we should only have data for 1 January 2020, 31 December 2019, 30 December 2019 and earlier.
In other words, our backtest should only see historical data, never future data. If future data leaks in, our results will look better than they really are.
However, a strategy that relies on future data is untradable in a live environment.
An example of a very simple untradable trading strategy, which employs a healthy dose of hindsight bias, would use tomorrow’s stock price as an input. The strategy uses tomorrow’s price to decide whether to buy or sell today: if the price goes up tomorrow, assign a ‘buy’ signal for today. Trading is easy if you already know the outcome.
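To make the hindsight bias concrete, here is a minimal sketch that scores two signals on the same series of closes: one that peeks at tomorrow’s close, and one that only uses today’s and yesterday’s closes. The prices are hypothetical, and the function names and the "next-day hit rate" metric are illustrative choices, not a standard API.

```python
# Illustrative only: compares a hindsight-biased signal with a tradable one.
# Closes are hypothetical daily prices, oldest first.


def lookahead_signals(closes: list[float]) -> list[str]:
    """Untradable: today's signal peeks at tomorrow's close."""
    signals = []
    for today in range(len(closes) - 1):
        tomorrow = closes[today + 1]  # future data, unavailable when trading live
        signals.append("buy" if tomorrow > closes[today] else "sell")
    return signals


def tradable_signals(closes: list[float]) -> list[str]:
    """Tradable: today's signal uses only today's and yesterday's closes."""
    signals = []
    for today in range(1, len(closes)):
        signals.append("buy" if closes[today] > closes[today - 1] else "sell")
    return signals


def hit_rate(signals: list[str], closes: list[float], first_day: int) -> float:
    """Fraction of signals whose direction matches the next day's move.

    signals[i] is assumed to be generated at the close of day first_day + i.
    """
    hits = total = 0
    for offset, signal in enumerate(signals):
        day = first_day + offset
        if day + 1 >= len(closes):
            break  # no next-day close to score this signal against
        went_up = closes[day + 1] > closes[day]
        hits += (signal == "buy") == went_up
        total += 1
    return hits / total


closes = [100.0, 101.0, 99.0, 102.0, 103.0, 101.0]
print(hit_rate(lookahead_signals(closes), closes, first_day=0))
print(hit_rate(tradable_signals(closes), closes, first_day=1))
```

The lookahead signal scores a perfect hit rate by construction, whatever prices you feed it, while the tradable signal gets whatever the market actually gives it (here 0.25). That gap is exactly the kind of flattering result a leaky backtest produces.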
When we are trading live, we have no crystal ball to see future prices: they don’t exist yet, so we can’t download them. In a backtest, however, it is trickier to ensure that future data doesn’t somehow seep into our dataset.
In a backtest, we need to avoid loading future prices from our database whenever we do a calculation. For example, if we are calculating what the trading signal would have been on 1 January 2020, we need to ensure that the price for 2 January 2020 (or indeed any data generated after 1 January 2020) is not used.
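One simple way to enforce this is to filter the price history to the backtest’s current date before every signal calculation. The sketch below assumes prices are held in a dictionary keyed by date, and uses a hypothetical two-day moving-average rule purely for illustration; the prices themselves are made up.

```python
from datetime import date
from typing import Optional


def prices_as_of(history: dict[date, float], as_of: date) -> dict[date, float]:
    """Keep only closes dated on or before `as_of` (the backtest's 'today')."""
    return {day: close for day, close in history.items() if day <= as_of}


def moving_average_signal(
    history: dict[date, float], as_of: date, window: int = 2
) -> Optional[str]:
    """Hypothetical rule: buy if the latest known close is above its moving average."""
    visible = prices_as_of(history, as_of)  # never look past `as_of`
    closes = [visible[day] for day in sorted(visible)]
    if len(closes) < window:
        return None  # not enough history yet
    average = sum(closes[-window:]) / window
    return "buy" if closes[-1] > average else "sell"


# Hypothetical closes: the 2 January 2020 price must not affect the 1 January signal.
history = {
    date(2019, 12, 30): 100.0,
    date(2019, 12, 31): 101.0,
    date(2020, 1, 1): 102.0,
    date(2020, 1, 2): 90.0,
}
print(moving_average_signal(history, date(2020, 1, 1)))
```

Routing every calculation through a single as-of filter, rather than trusting each signal function to index correctly, makes it harder for a future price to slip in by accident.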
If we are not using point-in-time data, we could end up with a backtest that does not properly represent a strategy’s historical performance. Any trading decision based on such a backtest would rest on inaccurate data. We could end up investing in a strategy that is less attractive than our statistical analysis suggests, because of hindsight bias introduced by data that isn’t point-in-time.
Checking data is recorded point-in-time
Avoiding future prices is not enough, however. We also need to ensure that any data we use is recorded point-in-time, so that every data point carries a timestamp for when it was collected or updated.
We need to inspect this timestamp alongside each data point, so that we don’t inadvertently use future revisions when backtesting. Macroeconomic and company fundamental datasets, and more broadly alternative data such as machine-readable news, can be revised many times.
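The difference between a point-in-time dataset and one that is not comes down to whether revisions overwrite earlier values. The sketch below contrasts the two using a hypothetical fundamental series (the period label, values and dates are placeholders, not real figures): the overwriting table keeps only the final number, while the point-in-time table keeps every revision together with the date it was recorded.

```python
from datetime import date

# Hypothetical revisions of one data point, in the order they arrived.
# (period, value, recorded_at); the values are placeholders, not real statistics.
updates = [
    ("FY2019", 2.00, date(2020, 2, 10)),
    ("FY2019", 2.05, date(2020, 3, 2)),
    ("FY2019", 2.10, date(2020, 4, 15)),
]


def overwrite_store(updates):
    """Not point-in-time: each revision replaces the previous value."""
    table = {}
    for period, value, _recorded_at in updates:
        table[period] = value  # timestamp discarded, earlier values lost
    return table


def point_in_time_store(updates):
    """Point-in-time: keep every revision with the timestamp it was recorded."""
    table = {}
    for period, value, recorded_at in updates:
        table.setdefault(period, []).append((recorded_at, value))
    for rows in table.values():
        rows.sort()  # order revisions by recording time
    return table


print(overwrite_store(updates))
print(point_in_time_store(updates))
```

A backtest reading from the overwriting table can only ever see the final revision, even for dates before it was published; the point-in-time table still has enough information to answer "what was known on this date?".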
Take a macroeconomic data release such as Q1 2019 U.S. GDP (covering 1 January to 31 March 2019), which is published several times for the same period:
- Advance GDP estimate: 25 April 2019
- Second GDP estimate: 30 May 2019
- Third GDP estimate: 27 June 2019
If we want to backtest a trading strategy that uses U.S. GDP as an input, we need to make sure that whichever GDP figure we use had already been released by our trading date.
For example, we can only use the advance estimate from 25 April onwards in our trading calculations, the second estimate from 30 May onwards, and so on.
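The rule above amounts to an "as-of" lookup on a sorted list of release dates. Here is a minimal sketch using the three release dates listed above; the GDP values are deliberately left out, so the function only reports which estimate a backtest may use on a given day. It assumes that a figure released on the trading date is usable, which matches a model that trades at the market close.

```python
from bisect import bisect_right
from datetime import date

# Release dates for Q1 2019 U.S. GDP, as listed above (must be sorted by date).
# Values are omitted; we only track which estimate is usable on a given day.
Q1_2019_GDP_RELEASES = [
    (date(2019, 4, 25), "advance estimate"),
    (date(2019, 5, 30), "second estimate"),
    (date(2019, 6, 27), "third estimate"),
]


def estimate_as_of(releases, trade_date):
    """Return the latest estimate released on or before trade_date, else None."""
    release_dates = [released for released, _ in releases]
    idx = bisect_right(release_dates, trade_date)  # releases dated <= trade_date
    return releases[idx - 1][1] if idx else None


print(estimate_as_of(Q1_2019_GDP_RELEASES, date(2019, 5, 29)))
```

On 29 May 2019 the lookup returns the advance estimate, even though the second and third estimates exist in the database by the time we run the backtest.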
How does point-in-time apply to alternative datasets?
When using alternative datasets, we must also be aware of the impact of point-in-time issues.
Let’s say we are constructing a dataset that records car counts in retailers’ car parks, which involves collecting satellite imagery. We need to be sure that our car count observations carry the same timestamps as the images they were derived from, and that we are not inadvertently using ‘future’ images to backfill our car count dataset.
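A basic audit for this kind of dataset is to compare each count’s timestamp with the capture time of the image behind it. The sketch below assumes a hypothetical record schema (`CarCount` with `observed_at` and `image_captured_at` fields); a real satellite data feed will have its own field names and may need a tolerance rather than exact equality.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CarCount:
    """One car park observation (hypothetical schema)."""
    observed_at: datetime        # timestamp stored in the car count dataset
    image_captured_at: datetime  # timestamp of the satellite image it was derived from
    count: int


def audit_car_counts(records: list[CarCount]) -> list[tuple[CarCount, str]]:
    """Return records whose timestamps break point-in-time, with a reason."""
    problems = []
    for record in records:
        if record.image_captured_at > record.observed_at:
            # The count claims to describe a moment before its image existed:
            # a 'future' image has been used to backfill the dataset.
            problems.append((record, "backfilled from future image"))
        elif record.image_captured_at != record.observed_at:
            problems.append((record, "timestamp does not match image"))
    return problems
```

Separating the two reasons is a design choice: a count stamped earlier than its image is a direct source of hindsight bias, while other mismatches may just be sloppy timestamping that still deserves a look.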
Point-in-time data enables investors to assess the evolution of a particular data point, rather than just observing the final revised figure in the data history.
As well as being key to an accurate and representative backtest, access to multiple revisions can provide additional inputs for your model. You can go back in time to see each revision of the same data point, together with a timestamp for its release.
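As one example of using revisions as an input, a model could look at how far a data point has been revised since its first release. The sketch below computes that difference using only the vintages known at the backtest date, so the feature itself stays point-in-time. The feature definition, dates and values are all hypothetical.

```python
from datetime import date
from typing import Optional


def revision_since_first_release(
    vintages: list[tuple[date, float]], as_of: date
) -> Optional[float]:
    """Change from the first release to the latest revision known at `as_of`.

    `vintages` holds (release_date, value) pairs for one data point.
    Returns None if nothing had been released by `as_of`.
    """
    known = sorted(v for v in vintages if v[0] <= as_of)  # ignore future revisions
    if not known:
        return None
    first_value = known[0][1]
    latest_value = known[-1][1]
    return latest_value - first_value


# Hypothetical vintages of a single data point (not real statistics).
vintages = [
    (date(2021, 4, 29), 2.0),
    (date(2021, 5, 27), 2.5),
    (date(2021, 6, 24), 2.25),
]
print(revision_since_first_release(vintages, date(2021, 5, 27)))
```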
The issue of point-in-time can seem very subtle. However, ignoring it can render historical market analysis unrealistic and result in investors coming to the wrong conclusions. To prevent running into this problem, it’s crucial to make sure that your data vendor offers point-in-time datasets.