Financial statistics are hard. It’s really difficult to determine the average return of individual stocks. You would think defining typical stock properties would be straightforward, but interestingly it’s not, and I fear many have drawn false conclusions about stock returns.
Take the question: what is a typical return of a stock in the S&P 500? What average do you take to find this answer?
Indices Distort Returns
Indices have a problem, a bias, built into them which causes the measured average stock return to be lower than it truly is. The problem comes from how stocks fall into and out of the index at only the “bottom” and not the top.
In the S&P 500 there are 500 stocks which are roughly the 500 largest stocks in US markets. This list isn’t stable though. Some stocks go down in value, others go up.
There is a theoretical threshold at the 500th-ranked stock. Stocks move across this threshold in both directions, some falling out of the index and some moving in. The same thing happens in many other indices, like the Russell 3000. The key point in understanding this post is that this threshold for most indices is always, and only, on the bottom.1
Now if a stock falls out of the S&P 500 it doesn’t go away. It’s still there. It’s just lower in price now. Same is true of other indices. Falling below the threshold of being included in an index doesn’t mean the stock is gone, just that it’s not large enough any longer to meet the criteria of the index.2
What’s the Average Stock Return?
Ok, so the question is, what is a stock’s average return? How do you compute the average returns of a stock in the S&P 500?
Over a 20 year period, there will be say 2000 stocks that were in the S&P 500 over that period (I just made that number up, don’t quote me). Therefore studies will often average the returns of stocks that are in the S&P 500 over that period. As in, they will take those 2000 stocks, figure out their individual compound growth rate (geometric return), and then average them to determine the returns of individual stocks in the index. But this method understates the true return of stocks.
To understand why, let’s flip some coins.
Index Coin Flip Game
Here’s the game:
- Heads up 30%
- Tails down 20%
- Arithmetic Return = 5.0%
- Geometric Return = 1.98%
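The two figures in the list above can be checked directly. This is a minimal sketch that assumes a fair coin; the variable names are mine.

```python
import math

UP, DOWN = 0.30, -0.20  # heads: +30%, tails: -20% (fair coin assumed)

# Arithmetic return: plain average of the two equally likely outcomes.
arithmetic = 0.5 * UP + 0.5 * DOWN

# Geometric return: per-flip compound growth of one heads plus one tails.
geometric = math.sqrt((1 + UP) * (1 + DOWN)) - 1

print(f"arithmetic = {arithmetic:.2%}, geometric = {geometric:.2%}")
# prints: arithmetic = 5.00%, geometric = 1.98%
```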
Now, what happens if we set up our random coin flips to work like an index?
Start with 100 “CoinStocks”. Each CoinStock begins at $100. With each flip, its value goes up or down per the rules above. We will flip a coin 100 times to determine the CoinStock’s final value. However, if a CoinStock ever falls below $50, it will be replaced in our index with a new coin, which will “come into the index” at $100.
This way we are simulating how an index works. CoinStocks will start in the index and if they have a bad run, they will be dropped from the index. A new CoinStock will take its place. We will calculate the compound growth of each CoinStock per flip while it is in the index and average the compound growth of the CoinStocks to determine their average compound growth rate.
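Here is a rough sketch of how this game could be coded. It is not the code behind the results in this post. The function name `simulate_index`, its parameters, and the seed are all hypothetical, and I am assuming a replacement CoinStock starts flipping on the round after it enters.

```python
import random

def simulate_index(n_slots=100, n_flips=100, start=100.0, floor=50.0, seed=0):
    """Return one (per-flip growth factor, flips in index) pair per CoinStock."""
    rng = random.Random(seed)
    slots = [(start, 0)] * n_slots  # (current value, flips so far) per index slot
    finished = []                   # CoinStocks that were dropped from the index
    for _ in range(n_flips):
        for i, (value, flips) in enumerate(slots):
            value *= 1.30 if rng.random() < 0.5 else 0.80
            flips += 1
            if value < floor:
                # Dropped: lock in its compound growth, bring in a new coin.
                finished.append(((value / start) ** (1 / flips), flips))
                slots[i] = (start, 0)
            else:
                slots[i] = (value, flips)
    # Survivors; a coin that entered on the last round has no flips and is skipped.
    finished += [((v / start) ** (1 / f), f) for v, f in slots if f > 0]
    return finished

records = simulate_index()
simple_avg = sum(g - 1 for g, _ in records) / len(records)
print(f"{len(records)} CoinStocks, simple average of geometric returns = {simple_avg:.2%}")
```

Every slot flips exactly once per round, so the flips across all CoinStocks always add up to 100 x 100 = 10,000, no matter how many replacements occur.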
This is what we get from one trial:
Hmm. Ok that’s quite a bit lower than the 1.98% geometric return we expected.
The standard deviation of a single coin flip is 25%. This test uses 100 x 100 = 10,000 flips. Therefore the t-stat of our sample is:
4.8% / (25% / sqrt(10,000)) = 4.8% / 0.25% = 19.2
which is way outside the range of reasonable randomness.
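The 25% standard deviation and the resulting standard error are easy to verify. This sketch treats the two outcomes as the full distribution of a fair coin, so it divides by n rather than n - 1. The variable names are mine.

```python
import math

outcomes = [0.30, -0.20]  # the two equally likely results of one flip

mean = sum(outcomes) / len(outcomes)
# Population standard deviation of a single flip.
std_dev = math.sqrt(sum((x - mean) ** 2 for x in outcomes) / len(outcomes))

n_flips = 100 * 100  # 100 CoinStock slots x 100 flips each
std_error = std_dev / math.sqrt(n_flips)

print(f"std dev per flip = {std_dev:.2%}, standard error = {std_error:.2%}")
# prints: std dev per flip = 25.00%, standard error = 0.25%
```

Any shortfall of a few percent is many multiples of that 0.25% standard error, which is why chance alone can’t explain it.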
Of course it’s also easy to see this by just running the random trial again and again and again, and getting nearly the same number every time. It’s never close to 1.98%.
Why is this average so far off from the true underlying average?
The Lower Threshold Distorts The Data
We know what the true geometric return is for this data. It’s 1.98%. So how can our sample miss low by a meaningful amount every time? That can’t be random.
Well, the $50 threshold takes losers out of the sample, and locks them in as losers. If a CoinStock over 19 flips falls down to $43 and then drops out, those losses are locked in forever at -4.3% return per flip.
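The locked-in figure comes from the usual compound growth formula. A quick check, using the $43 and 19 flips from the example above (variable names are mine):

```python
start_value = 100.0
exit_value = 43.0   # value when the CoinStock drops out
flips_in_index = 19

# Per-flip compound growth rate while the CoinStock was in the index.
locked_in = (exit_value / start_value) ** (1 / flips_in_index) - 1

print(f"locked-in return per flip = {locked_in:.2%}")
# prints: locked-in return per flip = -4.34%
```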
The CoinStock could still keep flipping over the remaining 81 flips, and if it did it would probably trend towards the 1.98% geometric return, diluting the prior run of bad flips. It may still end up a loser, but over the full 100 flips it won’t be as bad of a loser.
But because the CoinStock fell out of the “index”, the coin’s losses are locked in forever. This CoinStock is then averaged at the same weight as a CoinStock which was in the “index” for the entire 100 flips, which gives these losses more weight in our calculation than they deserve relative to the CoinStocks which “survived” and had good returns.
Distribution of Returns
Here we can see the geometric returns of every CoinStock from the random trial. These are the values which are averaged.
One hundred and five of the CoinStocks fell below the threshold and were “delisted”. Notice how skewed downwards the results are. The 105 delisted CoinStocks saw an average of 18.5 flips and all lost money. Yet we average them equally with the 100 surviving CoinStocks that saw 80 flips on average. That’s where the skew comes from.
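To put numbers on that mismatch, here is a small sketch using only the counts quoted above. The flip counts are rounded averages, so the total comes out slightly under 10,000.

```python
delisted, survivors = 105, 100
avg_flips_delisted, avg_flips_survivors = 18.5, 80  # rounded averages from the trial

# Share of the simple average controlled by delisted CoinStocks.
share_of_average = delisted / (delisted + survivors)

# Share of the actual coin flips that those same CoinStocks account for.
delisted_flips = delisted * avg_flips_delisted
survivor_flips = survivors * avg_flips_survivors
share_of_flips = delisted_flips / (delisted_flips + survivor_flips)

print(f"weight in simple average = {share_of_average:.1%}, share of flips = {share_of_flips:.1%}")
# prints: weight in simple average = 51.2%, share of flips = 19.5%
```

The delisted CoinStocks supply about a fifth of the data but get about half of the say in the simple average.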
The Distortions are Widespread
This problem exists in more statistics than just the geometric average. The $50 threshold at the bottom of our imaginary index distorts nearly every statistic.
The Arithmetic Average is 5% too low. The standard deviation is over 3 times too high. There is a negative skew, when there shouldn’t be one. Fifty-five percent of the CoinStocks are losers, when math says only 18% should be losers.
The method of taking the average of individual CoinStocks’ properties doesn’t seem to be working. It’s not finding anything truthful about our underlying coin. The returns are low, the standard deviations are high, and more stocks than we would expect lose money.
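The 18% figure can be reproduced with the binomial distribution. This sketch assumes a fair coin and a CoinStock that flips all 100 times with no threshold; it counts every number of heads that leaves the final value below the starting value.

```python
from math import comb

N_FLIPS = 100

# Sum the probabilities of every heads count that ends below break-even.
p_loser = sum(
    comb(N_FLIPS, heads)
    for heads in range(N_FLIPS + 1)
    if 1.30 ** heads * 0.80 ** (N_FLIPS - heads) < 1.0
) / 2 ** N_FLIPS

print(f"probability of losing money over {N_FLIPS} flips = {p_loser:.1%}")
```

A CoinStock needs at least 46 heads out of 100 to break even, so the losers are the ones with 45 heads or fewer.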
Now is it wrong to say that 55% of the CoinStocks lost money? No, that’s an accurate statement.
But is it correct to extrapolate this and say that CoinStocks have a negative Geometric Return? Nope. That conclusion is clearly incorrect.
How to Fix This
Instead of taking the arithmetic average of geometric returns to determine the geometric return of our coin flips (which should have been a red flag upfront)3, you can take the weighted geometric average of the CoinStocks.4
Product over all CoinStocks of { (CoinStock’s compound growth rate) ^ (months in index / total months for all CoinStocks) }, where the compound growth rate is expressed as a growth factor (1 + return per period) and a “month” is one flip in the coin game.
The arithmetic return can also be averaged on a weighted basis. If you do this, then the results fix themselves and become nearly dead on to the true statistical values.
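Here is a minimal sketch of the weighted geometric average. The function name and the two-CoinStock sample are hypothetical, made up for illustration. It works in logs, which is the same as the product above but safer numerically.

```python
import math

def weighted_geometric_return(records):
    """records: (per-period growth factor, periods in index) per CoinStock."""
    total_periods = sum(periods for _, periods in records)
    # Weight each CoinStock's log growth by the time it spent in the index.
    log_growth = sum(periods * math.log(g) for g, periods in records)
    return math.exp(log_growth / total_periods) - 1

# Hypothetical sample: one CoinStock dropped at $43 after 19 flips,
# one survivor that finished at $300 after 100 flips.
records = [
    ((43 / 100) ** (1 / 19), 19),
    ((300 / 100) ** (1 / 100), 100),
]

simple = sum(g - 1 for g, _ in records) / len(records)
weighted = weighted_geometric_return(records)
print(f"simple average = {simple:.2%}, weighted geometric = {weighted:.2%}")
```

In this made-up sample the simple average comes out negative while the weighted version is slightly positive, because the 19 bad flips no longer count as much as the 100 flips of the survivor.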
False Equivalency Made Within Financial Studies?
Now let’s apply what we’ve seen in the coin flips to the actual stock market and any study that averages stock returns.
When a stock falls out of a sample from the bottom, its underperformance will be locked in. The bottom threshold in a sample ensures this.
Therefore averaging individual stocks overweights the losers and underweights the winners. It distorts the true return of stocks.5
Some people interpret these averages as being equivalent to the return properties of stocks. But, now you should doubt that conclusion. Averaging the returns of stocks may not tell you anything meaningful about the underlying properties of individual stocks. The threshold may be distorting the statistics. A simple average of stock returns may not be equivalent to the true underlying return properties of the stocks.
This effect is certainly true if the threshold is arbitrary, like the S&P 500. But it’s still true even if the threshold is an actual “end” to the stock. If a stock delists after a few years, it can’t be fairly averaged evenly against a stock which has been around for 40 years. This method overweights the bad returns and underweights the good returns.
Averages are a Bitch
Financial statistics are hard. I’ve spent lots of pages describing the differences between the arithmetic average and the Geometric Average. I’ve talked about how the hot hand fallacy applies to mean reversion in stocks. This is a similar phenomenon.
Systems with thresholds have to be averaged very carefully. Typical averages don’t necessarily reveal accurate information, as we saw above with the CoinStocks. Many samples of financial data use thresholds to define that data, and many times the lower threshold is the most important threshold. This lower threshold distorts the true underlying properties of the returns. Keep this in mind when looking at studies and articles about average stock returns. The thresholds may be distorting the true returns.
1-Even when an index has a top threshold it’s more common for stocks to cross the bottom of the index than the top.
2-I focused on indices in this post because I felt it was more relatable, and the data I had was from an index. But the mathematical issue is really about having a changing sample with unequal thresholds and not about an index. Therefore it applies to pretty much any study of data in the financial markets.
3-With real life data, this method can be a bit tricky, as a single zero will send the entire geometric return to zero, which isn’t a problem with our coins. This stuff is certainly not simple in real life.
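4-A more properly phrased equation:
$$G = \prod_{i=1}^{N} G_i^{\,m_i / M}$$
N-number of CoinStocks
G_i-Geometric Return of CoinStock i, expressed as a growth factor (1 + return); G is the weighted result for the whole sample
m_i-months CoinStock i is in the index
M-total sum of all months CoinStocks have traded (10,000 in our examples).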
5-The inspiration for this post is from this Twitter conversation. Thanks to @therobotjames for the conversation and data on the Russell 3000. You can see his reference to two different returns depending on how you take the average. One average is far, far different than the other.
This conversation led me to re-investigate the famous Bessembinder study which shows that most stocks underperform the risk free rate. There’s nothing wrong with the study per se. Most stocks do underperform the risk free rate. However, I get the sense that most people interpret the study to mean that we should expect an individual stock to have a negative geometric return, and I’m not sure that’s a correct interpretation. Hopefully you can see from the coin flip example that the negative geometric return can be a function of the threshold, and not an indication that the true statistical properties of the stocks are negative.
> The problem comes from how stocks fall into and out of the index at only the “bottom” and not the top. […] Same is true of other indices.
If you want an index where stocks also fall into and out of the index at the top, they exist: S&P 400, S&P 600, Russell 2000…
And even for the S&P 500 it is not true that stocks go in and out only at the bottom. For example, Tesla.
Yes, those indexes have both a top threshold and a bottom threshold. In those cases, what matters is which threshold it is more common for stocks to “cross”. Do more stocks leave the index by falling out of the bottom or moving through the top? Whichever direction is more common is the direction the index will be biased.
Tesla came in FROM the bottom because its price ROSE enough for it to be included. Yes, its price rose so much that it entered as one of the largest stocks, but the threshold it crossed to get there was from stock 501 to 500. And whenever it leaves the S&P 500 it will fall back through the bottom threshold.