Why the SP500 is better to store value than your bank (Part II) 💸

By simulating the future behaviour of the SP500, we conclude that it is a very strong asset to increase our wealth over time.

Nov 15, 2021

In the first part of this article we simulated the SP500's performance over the next 20 years a thousand times. This gave us a thousand annual returns, and a histogram showed us how those returns were distributed.

In this second part we will estimate the distribution of the simulated returns and use it to calculate some probabilities that are useful when making investment decisions, such as the probability of obtaining an annual return lower than 0% or higher than 5%.

Computing the distribution of yields

If we knew the distribution of the SP500's daily increments (for example, if the index rose or fell each day by an independent random amount drawn from a uniform distribution), we could appeal to the central limit theorem and fit the returns obtained in the simulations to a normal distribution. But finance experts have been trying for decades to predict the stock market with little success, which suggests that this is not such an easy task. In reality, we do not know how stock market increments are distributed, so we have no way of deriving the distribution of the simulated returns theoretically. In fact, if we apply a normality test to the results, we obtain a p-value close to zero, i.e. we reject the hypothesis that the returns follow a normal distribution, something that is not easy to deduce just by looking at the histogram. For this reason we must estimate the distribution of the results empirically.

In R, a call to the density function gives us an estimate of the probability density function, and ecdf (empirical cumulative distribution function) gives us the cumulative distribution function, which is essential for answering the questions we posed.
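A normality check of this kind could look as follows. This is only a sketch: the article does not say which test was used, so the Shapiro-Wilk test (shapiro.test from base R) is an assumption, and annual.yields is filled here with deliberately skewed stand-in data rather than the real simulated yields.

```r
# Stand-in data (assumption): skewed gross annual yields, NOT the article's simulations
set.seed(2021)
annual.yields <- rlnorm(1000, meanlog = log(1.08), sdlog = 0.3)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
normality <- shapiro.test(annual.yields)

# A p-value close to zero means we reject normality
normality$p.value
```

Note that shapiro.test only accepts between 3 and 5000 observations, so a larger set of simulations would have to be subsampled first, for example with sample(annual.yields, 5000).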

Keeping in mind that the 1000 simulated yields in the first part are stored in a vector called annual.yields, we will calculate both curves thanks to the following code extract:
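A minimal sketch of this step is shown below. It assumes that annual.yields holds gross annual yields (1.05 meaning +5%), and since the vector from the first part is not available here, it is replaced by stand-in data. The names yields.density and yields.cdf are illustrative.

```r
# Stand-in for the 1000 simulated yields of Part I (assumption: gross yields, 1.05 = +5%)
set.seed(123)
annual.yields <- rlnorm(1000, meanlog = log(1.08), sdlog = 0.04)

# Kernel density estimate of the probability density function
yields.density <- density(annual.yields)

# Empirical cumulative distribution function; ecdf() returns a step function
yields.cdf <- ecdf(annual.yields)

# Draw both curves
plot(yields.density, main = "Estimated density of annual yields")
plot(yields.cdf, main = "Empirical CDF of annual yields")

# The CDF can be evaluated directly: share of simulations with a yield of at most 1 (0%)
yields.cdf(1)
```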

Thanks to the cumulative distribution function, we can estimate how low the probability of obtaining an annual return of less than 0% over 20 years is, or how high the probability of obtaining an annual return of more than 5% is. But these estimates rest on only 1000 simulations, which is quite a small number. This raises the second problem we have to deal with: the number of simulations.

Simulations with apply family functions

The simulations function created in the first part tries to explain, as simply as possible, the methodology used to perform these simulations. But there are much more efficient ways to obtain the same results.

In R, the apply family of functions offers a solution to the efficiency problem. In addition, thanks to the parallel library, we can parallelise the simulation process, drastically decreasing the execution time. As this article is informative in nature, I am not going to explain how these functions are used. I will simply point out that we create, in the simulations.R script, a new function called simulate.apply, which carries out the simulations in the same way as in the first part of the article. Through an optional parameter, this function lets us choose whether or not to parallelise the process (if your computer has more than 2 cores, I would recommend parallelising it). Its implementation is the following:

There are probably more efficient implementations than ours, but it is enough for our purposes.
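The sketch below illustrates the idea and is not the code from simulations.R. Only the name simulate.apply comes from the article. Its arguments (n.sim, increments, years, parallelise), the helper simulate.one, the resampling scheme (drawing daily increments with replacement) and the synthetic daily.increments vector are all assumptions made for illustration.

```r
library(parallel)

# Hypothetical stand-in for the historical daily increments (gross, 1.001 = +0.1%)
set.seed(42)
daily.increments <- 1 + rnorm(5000, mean = 0.0003, sd = 0.01)

# One simulated path: resample daily increments and annualise the total growth
simulate.one <- function(i, increments, years, days.per.year = 252) {
  path <- sample(increments, size = years * days.per.year, replace = TRUE)
  prod(path)^(1 / years)
}

# Run n.sim simulations, optionally spread over several cores
simulate.apply <- function(n.sim, increments, years = 20, parallelise = FALSE) {
  if (!parallelise) {
    return(sapply(seq_len(n.sim), simulate.one,
                  increments = increments, years = years))
  }
  n.cores <- max(1, detectCores() - 1, na.rm = TRUE)
  cl <- makeCluster(n.cores)
  on.exit(stopCluster(cl))        # always release the workers
  clusterSetRNGStream(cl, 42)     # reproducible random streams on the workers
  parSapply(cl, seq_len(n.sim), simulate.one,
            increments = increments, years = years)
}

# Sequential run with a small number of simulations
annual.yields <- simulate.apply(1000, daily.increments)
```

Passing parallelise = TRUE switches to parSapply on a local cluster. For a small number of simulations the cost of starting the workers can outweigh the gain, so the option is mainly useful for large runs.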

Recomputing the distribution of yields

Once the new simulation function is done, we reload the simulations.R script, increase the number of simulations drastically, e.g. to 1 million, and recalculate the annual returns.

Again, we obtain the density and distribution functions from the results obtained from the million simulations.

We can see that the results are “similar” to the first ones, but much more accurate, since we have increased the number of simulations by a factor of 1000. We can also see how the simulated yields are distributed. In the probability density function, the area over x <= 1 is minuscule compared to the area over the rest of the domain, i.e., we have quite a low probability of ending up losing money. Another way to reach this conclusion is to look at the height of the cumulative distribution function at x = 1, which is indeed quite low. The cumulative distribution function also shows a high probability of an annual return between 1 and 1.2, i.e. between 0% and 20%, because the curve climbs most of the way from 0 to 1 between these two values.

Now, to answer the questions posed at the beginning, i.e. to calculate the probability of a given event, we implement the following function.
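One possible shape for such a function is sketched here. The name prob.between and its arguments are hypothetical, and annual.yields is again stand-in data (gross yields, 1.05 = +5%), so the numbers it produces are not the article's results.

```r
# Stand-in for the simulated yields (assumption: gross yields, 1.05 = +5%)
set.seed(7)
annual.yields <- rlnorm(1000, meanlog = log(1.08), sdlog = 0.04)

yields.cdf <- ecdf(annual.yields)

# P(lower < annual yield <= upper) under the empirical distribution
prob.between <- function(cdf, lower = -Inf, upper = Inf) {
  cdf(upper) - cdf(lower)
}

# Probability of an annual return of 0% or less (yield <= 1)
prob.loss <- prob.between(yields.cdf, upper = 1)

# Probability of an annual return above 5% (yield > 1.05)
prob.above.5 <- prob.between(yields.cdf, lower = 1.05)
```

Leaving one bound at its default gives a one-sided probability, while setting both gives the probability of landing in an interval, e.g. prob.between(yields.cdf, 1, 1.2).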

The last step is getting the required probabilities.

We therefore find that the probability of getting an annual return of less than 0% over a 20-year period, i.e., of ending up losing capital, is 0.0205, or, in other words, 2.05%. In addition, the probability of obtaining an annual return above 5% over that period is 87.75%. In the same way, we can calculate any probability expressed in terms of the annual return.

Finally, we will go a step further and see how the probability of obtaining an annual return lower than 0% changes over time. To do so, we are going to use the following code:
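A rough sketch of how such a curve could be produced is given below. It is not the article's code: the resampling scheme, the synthetic daily.increments vector and the names growth.by.year and loss.probability are assumptions. To keep it cheap, each simulated path is reused for every horizon, which works because an annual return below 0% over k years is the same event as a cumulative growth below 1 after k years.

```r
# Hypothetical stand-in for the historical daily increments (gross, 1.001 = +0.1%)
set.seed(1)
daily.increments <- 1 + rnorm(10000, mean = 0.0004, sd = 0.01)

max.years <- 30
days.per.year <- 252
n.sim <- 1000

# For one simulated path, return the cumulative growth at the end of each year
growth.by.year <- function(i) {
  path <- sample(daily.increments, max.years * days.per.year, replace = TRUE)
  year.ends <- seq(days.per.year, by = days.per.year, length.out = max.years)
  cumprod(path)[year.ends]
}

# Matrix with one row per horizon (in years) and one column per simulation
growth <- sapply(seq_len(n.sim), growth.by.year)

# Share of simulations whose capital is below its starting value at each horizon
loss.probability <- rowMeans(growth < 1)

plot(seq_len(max.years), loss.probability, type = "l",
     xlab = "Years invested", ylab = "Probability of a negative annual return")
abline(h = 0.01, lty = 2)   # the 1% threshold

# First horizon at which the estimated probability drops below 1% (NA if never)
which(loss.probability < 0.01)[1]
```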

In the previous chart we can see the hyperbola-like shape followed by the probability of a negative annual return. In fact, this curve meets the straight line y = 0.01 between years 22 and 23, so if we place our capital in an index fund with the simulated characteristics (such as the Vanguard U.S. 500 Stock Index Fund) for 23 years, the simulations give us a 99% probability of not ending up losing money.

Conclusions

Thanks to the methodology used in this article to simulate the performance of the SP500 stock index, we have seen how index funds are a very powerful tool to grow your capital with very low risk, provided you invest for the long term.

In the 20-year simulations, we obtained an annual return higher than 5% in 87.75% of cases. Thanks to compound interest calculators, like this one, we can estimate how our savings would evolve with this kind of financial product. Such tools assume that we get the same return every year, which does not happen in real life: the evolution of the capital will not be so “smooth” and will have ups and downs, but the result at the end of the investment period will be very similar. Suppose that we deposit an initial capital of $5,000 in an SP500 index fund that gives us an average annual return of 5% over 20 years (a rather unfavourable scenario), and that we contribute $500 every month. Our wealth at the end of this period would be around $220,000, compared to the $125,000 that we would have if we had deposited the same money in a bank account that pays no interest. If our annual return over that period were somewhat closer to the expected value, say 9%, we would end up with $366,795 after 20 years and about $1 million after 30.
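The arithmetic behind such a calculator can be sketched as follows. The function future.value is hypothetical, and it assumes monthly compounding at one twelfth of the annual rate, with contributions made at the end of each month. The calculator linked above may use slightly different conventions, so its figures can differ by a few thousand dollars.

```r
# Future value of an initial deposit plus fixed monthly contributions.
# Assumptions: monthly compounding at annual.rate / 12, contributions at month end.
future.value <- function(initial, monthly, annual.rate, years) {
  n <- 12 * years
  r <- annual.rate / 12
  if (r == 0) {
    return(initial + monthly * n)   # no interest: just the sum of the deposits
  }
  growth <- (1 + r)^n
  initial * growth + monthly * (growth - 1) / r
}

bank.account <- future.value(5000, 500, 0.00, 20)   # money kept without interest
index.fund.5 <- future.value(5000, 500, 0.05, 20)   # unfavourable scenario
index.fund.9 <- future.value(5000, 500, 0.09, 20)   # closer to the expected value
```

With no interest, the result is simply the $5,000 initial deposit plus 240 contributions of $500, which gives the $125,000 quoted for the bank account.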

We can see that time is an investor's best ally, since it gives us a greater probability of success and a higher accumulated return. Index funds are a great way to increase our capital and fight the loss of purchasing power that we are increasingly suffering in our society.

This is the end of this article. I hope you enjoyed reading it. I leave you the link to the GitHub repository where you can find the code used in this article.
Best regards, and see you in the next post.