Should you trade this portfolio? Interpreting Backtest Results

Congrats! You just ran your first backtest and made a bajillion dollars for your (simulated) portfolio!

Should you go trade it? Try again for a bajillion-and-one? How do you know you have a good system?

We try to make backtesting and trading easy around here with fast, no-code software, but we can't tell you whether you've got a good strategy or not. That's up to you to decide. What we can give you are the metrics and data to look at.

Before the Portfolio Backtest

Prior to building your strategy, you should have an idea of what you are aiming for in the first place.

What kind of strategy are you running? Mean reversion or trend following?

Depending on your decision, you can expect a very different trading experience!

Profitable trend following strategies may only make money on 30-40% of their trades. When they lose money, they lose a little bit, and when a trade works, they make a lot to compensate for the small losses along the way.

Mean reversion strategies tend to work the other way with lots of small wins (>65% of trades make money) but occasionally lose a lot of money.
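Win rate alone doesn't tell you whether either style makes money; what matters is the win rate combined with the size of the average win and loss (often called expectancy). The sketch below uses made-up win rates and win/loss sizes, chosen only to mirror the profiles described above, not measured from any real system. `expectancy` is a hypothetical helper name.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Expected profit per trade, in the same units as avg_win and avg_loss.

    avg_loss is passed as a positive number (the size of a typical loss).
    """
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


# Hypothetical profiles (illustrative numbers, not real results):
# trend following: wins 35% of the time, winners 3x the size of losers
trend = expectancy(win_rate=0.35, avg_win=3.0, avg_loss=1.0)
# mean reversion: wins 70% of the time, losers 2x the size of winners
mean_rev = expectancy(win_rate=0.70, avg_win=1.0, avg_loss=2.0)
# same win rate, but the occasional loss is 3x a typical win
mean_rev_big_losses = expectancy(win_rate=0.70, avg_win=1.0, avg_loss=3.0)

print(f"trend following: {trend:+.2f} per trade")
print(f"mean reversion:  {mean_rev:+.2f} per trade")
print(f"mean reversion, bigger losses: {mean_rev_big_losses:+.2f} per trade")
```

Note how the mean reversion profile flips from positive to negative just by making its rare losses a bit larger, even though it still wins 70% of the time.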

From there, think about what you're trying to optimize for. Total returns are a great way to go, but if you take too much risk on the way, you're not going to sleep well at night and are likely to abandon your strategy, even if it is making money. Whatever metric (or set of metrics) you choose, check it relative to a baseline like the S&P 500, or your current system and strategy.

The S&P 500 returns ~10.5% per year over the long run, so if you're going to trade for returns, that seems like a decent hurdle to shoot for. Using the average 10-year Treasury yield as the risk-free rate, we can estimate a historical Sharpe ratio of about 0.3 for the S&P 500.
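If you want to compute a Sharpe ratio yourself, a minimal sketch is below: average excess return over the risk-free rate, divided by the standard deviation, then annualized. It assumes simple daily returns and 252 trading days per year, and converts the annual risk-free rate to a daily rate by simple division (an approximation). The function name and the sample returns are hypothetical, and the 4% risk-free rate is an arbitrary placeholder, not the 10-year average used above.

```python
import math
import statistics


def annualized_sharpe(daily_returns, annual_risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio from periodic simple returns (0.01 == 1%)."""
    if len(daily_returns) < 2:
        raise ValueError("need at least two returns")
    # approximate per-period risk-free rate by simple division
    rf_per_period = annual_risk_free / periods_per_year
    excess = [r - rf_per_period for r in daily_returns]
    sd = statistics.stdev(excess)  # sample standard deviation
    if sd == 0:
        raise ValueError("returns have zero volatility")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


# Hypothetical daily returns, for illustration only
daily = [0.004, -0.002, 0.001, 0.003, -0.001, 0.002, -0.003, 0.005]
print(f"Sharpe (rf = 0%): {annualized_sharpe(daily):.2f}")
print(f"Sharpe (rf = 4%): {annualized_sharpe(daily, annual_risk_free=0.04):.2f}")
```

Compare whatever number you get against your baseline (the S&P 500 or your current system) rather than in isolation.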

These are achievable values and many retail traders can find valuable, systematic strategies with higher returns and a higher Sharpe ratio. If these get too high, however, alarm bells should start going off!

Use Some Common Sense

Too many backtests are simply too good to be true. It's easy to get sucked into the allure of a great equity curve and low drawdowns and think that you're the next Jim Simons. Instead, you should immediately be suspicious that your algorithm overfit to the data and isn't going to generalize to live trading.

From our own experience and reading widely on the subject, it seems that most retail traders can devise trading systems with Sharpe ratios from 1-2 depending on their strategy. If you get beyond that, become highly skeptical of your results!

Likewise, if your annual returns balloon over 20%, you've probably overfit and have a strategy that's going to fall flat in live trading.
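These rules of thumb are easy to encode as an automatic check on every backtest you run. The sketch below uses the thresholds from the text (Sharpe above ~2, annual returns above ~20%) as defaults; they are heuristics, not hard limits, and `too_good_to_be_true` is a hypothetical helper name.

```python
def too_good_to_be_true(sharpe, annual_return, max_sharpe=2.0, max_annual_return=0.20):
    """Return warning strings when backtest results look suspiciously good.

    annual_return is a fraction (0.12 == 12%). Thresholds are rules of thumb.
    """
    flags = []
    if sharpe > max_sharpe:
        flags.append(f"Sharpe {sharpe:.2f} > {max_sharpe}: check for overfitting")
    if annual_return > max_annual_return:
        flags.append(
            f"annual return {annual_return:.1%} > {max_annual_return:.0%}: "
            "check for overfitting"
        )
    return flags


print(too_good_to_be_true(sharpe=1.4, annual_return=0.12))  # []
print(too_good_to_be_true(sharpe=3.1, annual_return=0.45))
```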

Also keep in mind how many backtests you're running. If you run 100 strategies and one turns out to be profitable, then you're likely guilty of the trader version of p-hacking: repeating a test until you get the result you're looking for.
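To see why, you can simulate 100 "strategies" that have no edge at all (random daily returns with a true mean of zero) and look at the best one. This is a minimal sketch assuming normally distributed returns and five years of daily data per strategy; all names and parameters are hypothetical.

```python
import math
import random
import statistics


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


def zero_edge_sharpes(n_strategies=100, n_days=252 * 5, daily_vol=0.01, seed=42):
    """Sharpe ratios of strategies whose TRUE mean return is zero."""
    rng = random.Random(seed)
    return [
        annualized_sharpe([rng.gauss(0.0, daily_vol) for _ in range(n_days)])
        for _ in range(n_strategies)
    ]


sharpes = zero_edge_sharpes()
best, median = max(sharpes), statistics.median(sharpes)
print(f"best Sharpe of 100 zero-edge strategies: {best:.2f}")
print(f"median Sharpe: {median:.2f}")
```

With only five years of data, a Sharpe estimate has a standard error of roughly 1/sqrt(5) ≈ 0.45, so the best of 100 pure-noise strategies will typically look like a respectable strategy (a Sharpe somewhere around 1), even though none of them has any real edge.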

If your strategy is finding a reliable signal, then small perturbations to the parameters ought to produce similar results.

For example, say you're running a trend following model with a simple moving average cross-over that gets you 12% annualized returns. If you change from a 50-day to a 51-day moving average and your returns drop to 3% per year, then you're looking at something that's probably overfit and very brittle. On the other hand, if you run it with a 45, 46, 47...55-day period and get roughly similar results, then you've probably got a reasonable strategy to work with.
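A parameter sweep like this is straightforward to script. The sketch below assumes a simple price-vs-SMA rule (long when the prior close is above its N-day simple moving average, flat otherwise) and runs it on a synthetic random-walk price series; swap in your own data and rule. The generated prices are hypothetical, so the results say nothing about any real market. What matters is how much the annualized return moves as the window changes from 45 to 55 days.

```python
import random


def synthetic_prices(n_days=252 * 10, drift=0.0003, vol=0.01, seed=7):
    """Hypothetical price series (geometric random walk); replace with real data."""
    rng = random.Random(seed)
    prices = [100.0]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * (1.0 + rng.gauss(drift, vol)))
    return prices


def sma_strategy_annual_return(prices, window, periods_per_year=252):
    """Long when the prior close is above its trailing `window`-day SMA, else flat.

    The signal for day t uses prices through day t-1 only (no look-ahead).
    """
    equity = 1.0
    running_sum = sum(prices[:window])
    for t in range(window, len(prices)):
        sma = running_sum / window  # SMA of prices[t-window : t]
        if prices[t - 1] > sma:
            equity *= prices[t] / prices[t - 1]
        running_sum += prices[t] - prices[t - window]  # slide the window forward
    years = (len(prices) - window) / periods_per_year
    return equity ** (1.0 / years) - 1.0


prices = synthetic_prices()
results = {n: sma_strategy_annual_return(prices, n) for n in range(45, 56)}
for n, r in results.items():
    print(f"{n}-day SMA: {r:.1%} annualized")
spread = max(results.values()) - min(results.values())
print(f"spread across windows: {spread:.1%}")
```

A small spread across neighboring windows is the "roughly similar results" you're looking for; a single window that stands far above its neighbors is the brittle case.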

Another thing to keep in mind is the number of trades. If your 20-year backtest looks great but had only 3 trades, then you probably just got lucky and can't make any generalizations from these results. The number of trades is going to differ based on your trading speed and how many instruments are in your system. If you have some long-term indicators (e.g. 200+ day moving averages) you're going to have fewer trades than shorter-term models. To increase your sample size, you'll want to test over longer periods (e.g. a few decades of data) and across more instruments.
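One rough way to quantify "you probably just got lucky" is the t-statistic of your average per-trade return: the mean divided by its standard error. The sketch below assumes trades are independent (only approximately true) and uses made-up trade returns. Repeating the same three trades 20 times is artificial, but it isolates the effect of sample size because the average trade stays the same.

```python
import math
import statistics


def trade_t_stat(trade_returns):
    """t-statistic of the mean per-trade return (mean / standard error).

    As a loose rule of thumb, values well below ~2 mean the average
    could easily be noise. Assumes independent trades.
    """
    n = len(trade_returns)
    if n < 2:
        raise ValueError("need at least two trades")
    se = statistics.stdev(trade_returns) / math.sqrt(n)
    return statistics.mean(trade_returns) / se


few = [0.25, -0.05, 0.30]  # hypothetical: 3 trades
many = few * 20            # 60 trades with the same average return
print(f"3 trades:  t = {trade_t_stat(few):.2f}")
print(f"60 trades: t = {trade_t_stat(many):.2f}")
```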

Other Metrics to Look At

While the Sharpe ratio is a standard risk-adjusted metric, it has its drawbacks. Other metrics, like the Sortino ratio, address one of them by penalizing you only for downside volatility rather than overall volatility.
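Here is a minimal Sortino sketch. It computes downside deviation as the root-mean-square of shortfalls below a target (zero by default), averaged over all periods; that is one common convention, and others exist. The two return series are hypothetical mirror images with the same mean and standard deviation, so their Sharpe ratios match, but one has its big move on the upside and the other on the downside.

```python
import math
import statistics


def annualized_sortino(returns, target=0.0, periods_per_year=252):
    """Mean excess return over downside deviation, annualized.

    Only returns below `target` count toward the denominator.
    """
    excess = [r - target for r in returns]
    downside_sq = [min(0.0, x) ** 2 for x in excess]
    downside_dev = math.sqrt(sum(downside_sq) / len(downside_sq))
    if downside_dev == 0:
        raise ValueError("no returns below target; Sortino is undefined")
    return statistics.mean(excess) / downside_dev * math.sqrt(periods_per_year)


# Hypothetical series: same mean (1%) and same standard deviation
upside_spike = [0.06, -0.01, 0.00, 0.00, 0.00]
downside_spike = [-0.04, 0.03, 0.02, 0.02, 0.02]

for name, rets in [("upside spike", upside_spike), ("downside spike", downside_spike)]:
    print(f"{name}: stdev {statistics.stdev(rets):.4f}, "
          f"Sortino {annualized_sortino(rets):.2f}")
```

The Sharpe ratio can't tell these two apart; the Sortino ratio rates the series with the big gain well above the one with the big loss.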

Maximum drawdown is the largest peak-to-trough decline your portfolio has taken. Every system will have losses and drawdowns, but the key is to keep these small and manageable. The larger they get, the more likely you'll abandon your system, even if it is profitable. The same goes for drawdown duration: can you sit through a few years of losses? Maybe you can, but I wouldn't want to bet on it.
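Both numbers can be read straight off an equity curve. The sketch below measures drawdown against the running peak and defines duration as the longest stretch of consecutive periods spent below a prior high (a drawdown still open at the end of the data counts). The function name and sample equity values are hypothetical.

```python
def drawdown_stats(equity):
    """Return (max drawdown as a positive fraction, longest drawdown in periods)."""
    peak = equity[0]
    max_dd = 0.0
    longest = 0
    current = 0  # consecutive periods below the running peak
    for value in equity:
        if value >= peak:
            peak = value  # new high: drawdown (if any) is over
            current = 0
        else:
            current += 1
            max_dd = max(max_dd, (peak - value) / peak)
            longest = max(longest, current)
    return max_dd, longest


# Hypothetical equity curve (e.g. month-end portfolio values)
curve = [100, 120, 90, 60, 80, 130, 125]
max_dd, duration = drawdown_stats(curve)
print(f"max drawdown: {max_dd:.0%}, longest drawdown: {duration} periods")
```

Here the portfolio falls from 120 to 60 (a 50% drawdown) and spends three periods below its prior high before recovering.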

Average return per trade is important because if it's too low, you're likely to have an unprofitable system in live trading. Unfortunately, trading isn't free. Even "free trading" platforms have to make money, and they wind up doing this by selling order flow, which can mean your orders get filled at a worse price than you would get otherwise. This slippage may not be much, but it eats into your profitability and reduces compounding over time. Slippage estimates are always imperfect, so if you have no buffer on your average trade - likely because you're trading too quickly - then your profitable backtest is likely to fall down in the real world.
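A quick way to check your buffer is to subtract an assumed round-trip cost (entry plus exit) from your backtest's average trade. In the sketch below, the 0.1% per-side slippage and the gross per-trade returns are hypothetical assumptions for illustration; real slippage depends on the instrument, order size, and broker.

```python
def net_avg_trade_return(gross_avg_return, slippage_per_side, commission_per_side=0.0):
    """Average per-trade return after round-trip costs.

    All inputs are fractions of trade value (0.001 == 0.1%).
    """
    round_trip_cost = 2.0 * (slippage_per_side + commission_per_side)
    return gross_avg_return - round_trip_cost


# Hypothetical: same cost assumption, very different average trades
fast = net_avg_trade_return(gross_avg_return=0.0015, slippage_per_side=0.001)
slow = net_avg_trade_return(gross_avg_return=0.0300, slippage_per_side=0.001)
print(f"fast system: {fast:+.2%} per trade after costs")
print(f"slow system: {slow:+.2%} per trade after costs")
```

The fast system's thin 0.15% average edge is wiped out by a 0.2% round-trip cost, while the slow system barely notices it.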

Examining the Equity Curve

Take a look at the equity curve below.

We see our single-stock trading strategy against a buy-and-hold strategy holding the underlying. The PSAR-based (Parabolic SAR) model did fantastically for a while, but began to tank in 2016, giving back much of its spectacular 7,000% gain. It still finished up 2,200%, for an annualized 15.6% return (vs. 14.6% for the underlying). Even before we look at some of the key metrics, does this look like a strategy you want to trade?
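Total and annualized returns are linked by compounding: the ending multiple equals (1 + annual rate) raised to the number of years. The sketch below treats "up 2,200%" as a total gain (a 23x ending multiple); the function names are hypothetical.

```python
import math


def cagr(total_gain_pct, years):
    """Compound annual growth rate from a total percentage gain."""
    multiple = 1.0 + total_gain_pct / 100.0
    return multiple ** (1.0 / years) - 1.0


def years_implied(total_gain_pct, annual_rate):
    """Years of compounding at `annual_rate` needed to produce the total gain."""
    return math.log(1.0 + total_gain_pct / 100.0) / math.log(1.0 + annual_rate)


y = years_implied(2200, 0.156)
print(f"2,200% at 15.6%/yr implies about {y:.1f} years")
print(f"check: CAGR over that span = {cagr(2200, y):.1%}")
```

Plugging in the two figures quoted above implies a backtest of roughly 21.6 years. That period is arithmetic from those numbers, not something stated in the backtest itself.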

I would say, "no." This is supposed to be a trend following strategy of sorts, but it was flat to down during some of the biggest trends in the underlying. The model took a huge, ~70% drawdown from its peak in late 2015 to early 2016 by being caught long during a large down day and switching to short as soon as the price rebounded.

This doesn't mean the strategy can't be a good starting point for a robust trading algorithm - perhaps with some better risk management it could perform very well - but this kind of volatility, and the fact that it clearly missed the trends we want, make this a pass.

This next one comes from adding a trend intensity indicator to the PSAR to try to capture those longer-term trends.

While this model underperformed the underlying (10.8% annualized returns vs. 14.6%), its equity curve is more reasonable than the previous model's. It's much smoother, tends to follow the underlying asset on the upside, and gives up less on the downside. It's not perfect by any means - trend following models tend to have a decent amount of volatility, as we see here - but it looks like a good starting point to build on.

Taking your System for a Test-Drive

After all the checks are passed and you're confident in your system, what comes next? It's probably best to do some paper trading with your shiny new algo, especially if you're a new trader.

Paper trading (also called forward testing) is simulating your trading system in real time with fake money. You could manage it yourself or just plug into a brokerage that offers paper trading accounts and run your system as if it were trading with actual money. This provides another level of testing to be sure that you're comfortable with the model and that your algo is doing what you expect it to do.

Unfortunately, paper trading isn't a perfect simulation of the market, no matter how hard you try. Slippage remains a difficult cost to simulate, and paper trading takes a lot of time. If you've got a model that you expect to trade a dozen or so times per year (like this starter system), you might be waiting 2, 3, or 4 months for a few trades to execute, which means you're losing time in the market if you've got a good trading algorithm. On the other hand, if your system is bad and should be trashed, this could be valuable time that keeps you from losing money.

Finally, paper trading is no substitute for experience doing the real thing. It may give you practice at the mechanics of implementing your system, but it's impossible to simulate the emotional mindset and issues of trading with real money. Fully automated algorithmic trading - which we love - takes the day-to-day emotions and second-guessing out of your hands, but you can always abandon your algo or turn it off when things get tough. Your testing procedure (backtest and paper trading) should help prepare you for those tough times because you'll have seen them before (hopefully), but looking at a max drawdown of 40% in your backtest and living through one over the course of a few months aren't the same thing, and paper trading doesn't quite cut it in these situations.

Building your Own Trading System

We believe that algorithmic trading is a great way for average investors to get started. A system helps you define your portfolio risk, control your costs, and control your emotions - all of which are critical for being a profitable trader!

Of course, we can't tell you what to trade - this isn't financial advice - but hopefully these examples give you some idea of what to look for when you run your backtest and some of the principles to think through. There is no perfect model and all trading has its risks, so it's up to you to find the set of tradeoffs that you're willing to live with so that you can be profitable and stick with your system over the long run.

If you're interested in building your own trading algo, check out the free demo of our no-code trading platform here and see what you can come up with!