Backtesting

AlphaNova
4 min read · Jan 26, 2025


So, not sure if I ever mentioned this, but I hold a Ph.D. in backtesting — with a focus on overfitting. This illustrious degree was unofficially granted during the years 2011 to 2014 when I invested an absurd amount of time tinkering on platforms like Quantopian and QuantConnect. I tested thousands of ideas, optimized parameters, and watched pnl curves morph into something aesthetically pleasing. And let me tell you, I wasn’t alone. I’ve seen brilliant individuals, with actual Ph.D.s from top-tier universities, do the exact same thing right in front of me. It’s mesmerizing — staring at a screen, waiting for backtests to finish, only to scream at the heavens when the results disappoint, and then starting over. It’s eerily reminiscent of those casino regulars glued to slot machines for hours on end.

That said, there seems to be a growing chorus of voices criticizing backtesting itself lately. Ironically, many of these detractors have likely never run a backtest in their lives. It’s a dangerous trend, akin to the proverbial taxi driver waxing poetic about Bitcoin or the inexplicable hatred children once had for Barney the dinosaur. Remember that? “We hate Barney!” — a bizarrely viral phenomenon.

Let’s be real: if you’re developing a trading strategy or signal, you have to backtest. There’s no way around it. You can’t spend countless hours refining a model, mulling over its philosophical implications — “Is it causality? Does it capture market dynamics?” — and then skip straight to production without ever running a backtest. Even the legendary Jim Simons, in his characteristically understated way, once said, “You try different things and see how they worked on historical data.” Translation? You backtest. Call it research or call it something else — semantics don’t change the necessity. Imagine a scientist proclaiming, “STOP! DO NOT TEST YOUR THEORY — testing isn’t research!” Absurd, right? (Well, except for geology or palaeontology, where live testing isn’t exactly feasible.)

Here’s the nuance: there’s a critical distinction between an alpha/signal/prediction and a trading strategy — and the way you backtest each is different.

An alpha is a model or function that predicts some future outcome. For instance, it might forecast the next hour’s average temperature at a weather station. A trading strategy, on the other hand, is a more complex construct. It combines one or more alphas with additional rules like thresholds, position sizing, and other parameters. My advice? Do not try to optimize everything at once — alphas, thresholds, sizing, and the whole strategy — right out of the gate. That’s like taking an untested car to the racetrack without first validating its engine or tires. Start with the alphas. Test their predictive power and robustness against overfitting. Treat them as modular components — refined jewels you can reuse in various contexts. Only once you’ve nailed that should you evaluate the broader strategy.

Let’s keep it simple and focus on a univariate alpha — a prediction of a single asset’s return (relative, not absolute). In finance, the goal isn’t necessarily to match the prediction to the outcome perfectly. What matters is alignment — how closely the prediction vector correlates with the outcome vector. In other words, you want a small angle between those vectors, i.e. high correlation (cosine similarity). This is especially true if your utility function is the Sharpe Ratio, which on an unscaled basis is just the ratio of the expected value of pnl to its standard deviation (where s is the alpha and r is the return):

SR = E[s·r] / σ(s·r)
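As a quick sketch of the alignment idea (numpy code of my own; the variable names and the toy alpha are assumptions, not the author's), the cosine similarity of the demeaned prediction and outcome vectors is exactly their Pearson correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: realized relative returns r, and a noisy alpha s
# that is partly correlated with them.
r = rng.normal(size=1000)
s = 0.5 * r + rng.normal(size=1000)

# Correlation = cosine similarity of the demeaned vectors
# (i.e. a small angle between them means a high correlation).
s0, r0 = s - s.mean(), r - r.mean()
cosine = s0 @ r0 / (np.linalg.norm(s0) * np.linalg.norm(r0))
corr = np.corrcoef(s, r)[0, 1]

assert np.isclose(cosine, corr)
```

The point is simply that "small angle" and "high correlation" are the same statement about the two vectors.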

Most machine learning models minimize the residual variance (the squared distance between predictions and targets). Conveniently, that also maximizes the correlation between predictions and outcomes — boosting the Sharpe Ratio. For zero-mean alphas and returns (assuming Gaussian distributions — cue the groans, and yes, this won't be valid for nastier asset classes), the link between Sharpe Ratio and lead-lag correlation can be expressed as:

SR = ρ / √(1 + ρ²) ≈ ρ for small ρ, where ρ = corr(s, r)
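Under those zero-mean Gaussian assumptions, the unscaled Sharpe of trading proportionally to the alpha works out to ρ/√(1 + ρ²), where ρ is the lead-lag correlation between s and r. Here's a Monte Carlo sanity check of that relation (my own sketch, not the author's code):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.1  # assumed lead-lag correlation between alpha s and return r

# Draw zero-mean jointly Gaussian (s, r) with correlation rho.
cov = [[1.0, rho], [rho, 1.0]]
s, r = rng.multivariate_normal([0.0, 0.0], cov, size=1_000_000).T

pnl = s * r                              # trade nominally proportional to s
sr_empirical = pnl.mean() / pnl.std()    # unscaled Sharpe of the pnl stream
sr_theory = rho / np.sqrt(1.0 + rho**2)  # closed form for the Gaussian case
```

With a million draws, the empirical and closed-form values agree to a few decimal places.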

Backtesting an alpha, in essence, involves pretending you’re trading nominally proportional to the alpha and computing its Sharpe Ratio, potentially visualizing cumulative pnl. It’s just a way to measure the signal’s lead-lag correlation with the outcome. This principle holds across domains — whether you’re trading asset prices or forecasting hourly temperatures.
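The recipe above can be sketched in a few lines (a minimal illustration of mine, assuming s[t] is already aligned to predict the forward return r[t]; in a real backtest you must lag the alpha carefully to avoid lookahead):

```python
import numpy as np

def backtest_alpha(s, r, periods_per_year=250):
    """Pretend to trade nominally proportional to alpha s against returns r."""
    pnl = s * r                           # per-period pnl, position ∝ alpha
    sharpe = pnl.mean() / pnl.std()       # unscaled Sharpe Ratio
    return pnl.cumsum(), sharpe * np.sqrt(periods_per_year)

# Toy data: four years of daily returns and a hypothetical alpha.
rng = np.random.default_rng(2)
r = rng.normal(scale=0.01, size=250 * 4)
s = 0.5 * r + rng.normal(scale=0.01, size=r.size)

cum_pnl, ann_sharpe = backtest_alpha(s, r)
```

The cumulative pnl curve here is purely a visualization of the signal's lead-lag correlation with the outcome — the same machinery would apply to hourly temperature forecasts.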

And yes, visualization matters. Consider pnl curves with the same Sharpe Ratio. They might share identical statistical properties (per-period Sharpe ≈ 0.173, which annualizes to ≈ 2.7 at √250), yet look wildly different.
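One way to see this (a toy construction of mine, not the author's figure): reorder the very same per-period pnls. Mean, standard deviation, and hence Sharpe are invariant under permutation, but the cumulative curves diverge dramatically.

```python
import numpy as np

rng = np.random.default_rng(3)
pnl = rng.normal(loc=0.173, scale=1.0, size=1000)  # per-period Sharpe ≈ 0.173

# Reordering the same pnls leaves mean, std, and Sharpe untouched...
reordered = np.sort(pnl)[::-1]   # all the good days first

sharpe_a = pnl.mean() / pnl.std()
sharpe_b = reordered.mean() / reordered.std()
assert np.isclose(sharpe_a, sharpe_b)

# ...but the cumulative curves differ wildly:
curve_a = pnl.cumsum()           # a steady grind upward
curve_b = reordered.cumsum()     # a rocket, then a long bleed
```

Identical Sharpe, very different investor experience — which is exactly why the curve is worth looking at.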

Clearly, Sharpe alone doesn’t tell the whole story.

But that’s a discussion for another day.
