Bricks that look like data

Safe Withdrawal Rates for Aussies — Part 2: Building and validating a data set (updated)

Posted by on 17 November 2018 in Safe Withdrawal Rate Series

Ever heard of the phrase “garbage in, garbage out”? It’s the idea that no matter how good the analysis is, if the data is wrong then the results will be wrong too.

The challenge with safe withdrawal rates is that you need historical data to run the calculations. And historical data, particularly early data from the late 1800s and early 1900s, is notoriously difficult to find. So before I dive into the deep, dark chasm that is safe withdrawal rate calculation, I’m going to spend some time ensuring that I’m using solid data.

Apologies in advance. This is going to be a dull article for those of you who aren’t interested in validating data sources. But it’s a necessary evil.

How did I validate the data?

U.S. Data

I have two options when it comes to U.S. data. I can either try to stitch together a consistent and comprehensive U.S. inflation, equities and bonds data set. Or I could just validate that Big ERN’s data is accurate and use that.

So yeah, I’m going to validate Big ERN’s data us accurate and use that.

U.S. CPI/Inflation Data

The simplest way to validate a data set is to cross-reference it with other data sets. If the data is broadly consistent, then happy days. If the data is all very different, then good luck figuring out which one is correct.

I compared Big ERN’s inflation data against two other sources.

The first source comes from Professor Robert Shiller, who is the Sterling Professor of Economics at Yale University and the winner of the 2013 Nobel Prize for Economics. Professor Shiller shares the data that he used in his book Irrational Exuberance on his website. The data set includes U.S. historical inflation, stock and bond data, some of which I’ll also use to validate other data later on.

The second source comes from the Jordà-Schularick-Taylor Macrohistory database. This data set covers 17 advanced economies and includes inflation, equity returns, bond yields, house prices, GDP, consumer lending, business lending, and more. It’s literally a gold mine of financial data (please do not email me to tell me that it’s figuratively a gold mine).

All three of the sources that we are looking at report U.S. inflation as indices with different starting points. So to compare on a like-for-like basis, I reset each index to 100 and plotted the data.

Wow. The data sets are so similar that it’s difficult to make our each of the lines on the chart. I since learned that each data set sourced the majority of their data from the U.S. Bureau of Labor Statistics, so I’m confident that it’s accurate.

U.S. Equities Data

I used the same approach to validate Big ERN’s U.S. equities data. But I added a couple more data sets for comparison.

The trick with equities is that most indices only measure capital/price growth (this was explained in detail in the Ultimate Guide to the Australian Share Market article) but we want to measure both capital growth and income/dividend growth.

In addition to the Jordà-Schularick-Taylor and Shiller data I used for inflation data, I also compared Big ERN’s equities data with data made available by Professor Aswath Damodaran of the Stern School of Business at the New York University, as well as the S&P500 Total Returns Index.

As you can see in the chart below, the data sources show quite different results. The JST, Shiller and Damodaran indices are much, much lower than the S&P500 Total Returns Index. It looks like they are measuring the S&P500 Price Index. They do not account for dividend income.

You will also notice that the Big ERN data correlates perfectly with the S&P500 Total Return Index data. I was only able to source the index data from 1989 onwards but I think it’s safe to assume that Big ERN’s equities data is accurate even for earlier periods.

U.S. Bonds Data

For those of you who are new to this, bonds are fixed income contracts whereby an investor loans money to the bond issuer (usually a Government or corporation) in return for fixed interest payments over an agreed timeframe. At the end of the timeframe, the bondholder will also receive their money back.

The challenge with measuring bonds is that they can also be traded on the secondary market. That is, bondholders can sell their bonds to other schmucks. These bonds are worth more when interest rates fall and worth less when interest rates rise.

This means that any calculation of bonds returns needs to account for both the interest payments (known as yield) and the capital appreciation due to the the price of the bond increasing.

After chatting to Big ERN about his data set, he explained how his bond returns were calculated:

The bond returns are assuming you constantly roll into the so-called 10-year benchmark bond. The reason you constantly have to roll into a different bond is that once the current bond in your portfolio slips too far below 10 years maturity you replace it with the next benchmark bond that’s closer to the 10-year target maturity.

This means that his data should take into account yield and capital appreciation. We know that capital appreciation should be included because ‘rolling’ into the benchmark bond requires constantly re-purchasing that bond.

But I trust no man. So to validate Big ERN’s bonds data we cross referenced it with our old favourites from Jordà-Schularick-Taylor, Shiller and Damorodan, as well as bond returns data from (possibly a reputable source, possibly not?). All of these relate to 10 Year Treasury Bonds.

The complexity of bonds calculations results in some slight differences between each of the data sets. As you can see, the Aswath data understates bond returns slightly. But overall, we can be comfortable that Big ERN’s data is close enough to consensus to be valid.

Australian Data

Finding a complete set of Australian inflation, equity returns and bonds returns data is very difficult.

The biggest challenge is that data records get worse the further back we look. For example, Australian Stock Exchange only came into existence in 1987. Before that, there were multiple state-based exchanges which each listed and traded shares of Australian companies.  This makes it hard to find a single, accurate source of historical Australian equities data.

Similarly, in the early part of the 1900s, most Australian companies offered bonds in London and not in Australia. And even after that, there is a little historical bonds data available.

As you’d expect, there is no complete source of historical Australian inflation, equities and bonds data available for the average punter. If you’re flush with cash and are willing to drop $7,500 AUD on a dataset, you can check out the Dimson, Marsh and Staunton data set. But there’s no way on Earth I am willing to do that. I’m going to build my own.

Australian CPI/Inflation Data

There’s one ‘source of truth’ when it comes to Australian inflation data and that’s the Australian Bureau of Statistics (ABS). Ideally we’d use that data and call it a day. But the ABS only report data back to about 1948, which is a real pain.

Fortunately, I found two sources of inflation data that cover from about 1870 to the early 2010s. The first is the Jordà-Schularick-Taylor data set that I spoke about in earlier sections. This is the same source that I used to validate Big ERN’s data.

The second is an academic paper by Braisford, Hanley and Maheswaran that was published in 2012. The paper itself isn’t particularly relevant. But buried deep in an appendix is yearly summaries of the $7,500 AUD DMS data set — cha-ching!

As usual, I converted all the data sets to an index and plotted them on the chart below.

The chart shows the JST data set is consistent with our ‘source of truth’ at the ABS. Surprisingly, the DMS data (which costs a small fortune) looks completely off-base. Glad I didn’t buy that shit!

Australian Equities Data

You should recall from the U.S. Equities section that we want to measure both capital appreciation and dividends from equities. This means that we should be comparing our data to the official ASX200 Accumulation Index (XJOA).

Again we will compare the JST and DMS data sets to our source of truth. And again I converted all the data sets to an index and plotted them on a chart.

The chart shows that the DMS data set is consistent with the official XJOA index. Can’t get much better than that.

The JST data set looks like a complete cardiac flatline. That’s not surprising though; we discovered in the U.S. Equities section that the JST data set doesn’t take into account dividends and thus severely understates equity returns. Into the bin you go!

Australian Bonds Data

I’m starting to hate bonds. Not only because Big ERN tells me that they both low return and not particularly useful as a defensive asset, but also because it’s impossible to find a good source of bond returns.

It’s quite easy to find bond yields but that’s only half the equation. We’re still missing the return from price appreciation.

The very generous Professor Damodaran of NYU shared a method of estimating total bond returns on his website. I won’t go into detail about how it works here. But suffice to say that we already showed that his estimates were consistent with actual bond returns data from Big ERN. If you can’t remember this, shame on you. Go back to the U.S. Bonds section and look at the pretty chart.

And when you’re done with that look at the pretty chart below. I’ve plotted the JST data set against data from the Reserve Bank of Australia. As with the U.S. bonds data, all of these relate to 10 Year Treasury Bonds.

At first glance, it looks like the JST and RBA data diverge. However, they’re actually both using the same underlying data set. This difference is simply an artefact of converting everything to an index, and one line calculating returns of mid-year numbers and the other line calculating returns of year-end numbers.

I told you I’m starting to hate bonds.

And now… what?

If you’ve read this far, kudos! I really didn’t think that anybody else would be as interested in data validation as me (I’m not lying, it’s Saturday night and I’m sitting on my computer plotting indices).

Now that we are comfortable with the data that we will use in our safe withdrawal rate analysis, we can start the analysis itself.

Cross your fingers that the analysis doesn’t spit out anything completely counter-intuitive. You know what that means: the data is f*&#ed. I don’t think I could handle that. It look me like 2 weeks to source and validate this data.

I’ll report back in the next article and show you how it went.

The information in this website and the links provided are for general information only and should not be taken as constituting professional advice. Ordinary Dollar is not a financial adviser. You should consider seeking independent legal, financial, taxation or other advice to check how the website information relates to your unique circumstances. Ordinary Dollar does not guarantee the accuracy of any information on this website, and is not liable for any loss caused, whether due to negligence or otherwise arising from the use of, or reliance on, the information provided directly or indirectly, by use of this website.

There are 1 comments on this article

  1. Avatar for Dan Montgomery


    Awesome research, nice work pulling together such a critical and robust dataset. Would you be willing to share it more widely (or privately) so folks like me can run our own custom analysis using tools like ERNs SWR spreadsheet?

Leave a Reply