Chapter 14: Advanced Techniques & Backtesting

"The first principle is that you must not fool yourself — and you are the easiest person to fool."
— Richard Feynman

The Backtest Illusion

Most backtests are worthless. Not because backtesting is flawed — done correctly, it's the closest thing to a crystal ball for understanding whether an approach has edge. The problem is how traders backtest. They do it wrong in predictable ways.

I've built systems with 80% win rates and 3:1 R:R that looked like money-printing machines. Beautiful equity curves. Then I traded them live. The charts showed profits while my account showed losses. The backtest was lying — or more precisely, I was lying to myself through the backtest.

Why backtests fail:

The Hindsight Problem. You know what happened next. Your eye is drawn to patterns that preceded big moves. Manual backtesting — scrolling through charts marking where you "would have" traded — is almost always worthless. You're confirming biases with hindsight.
The Selection Problem. You test on stocks you know performed well ("let me backtest on NVDA 2020-2024", a stock that rose more than 1,000% over that period). You avoid difficult periods. You test only on stocks that still exist, which is survivorship bias; CRSP, Norgate Data, and Sharadar provide data that includes delisted stocks.
The Optimization Trap. You tweak parameters until the backtest looks perfect. RSI at 14 doesn't work? Try 12. Try 9. Try 7. You've found settings that fit THIS data by chance, not settings that capture genuine market dynamics.
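The Optimization Trap is easy to reproduce on data that has no edge at all. The sketch below is illustrative only: the returns are random noise, and the momentum rule and the `lookback` parameter are hypothetical stand-ins for whatever setting you would be tempted to tune. Because every lookback is scored on the same noise, the "best" one is best by luck, and there is no reason to expect it to hold up on the second, unseen series.

```python
import random

def simulate_returns(n_days, seed):
    """Zero-mean random daily returns: by construction there is no edge to find."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 0.01) for _ in range(n_days)]

def momentum_pnl(returns, lookback):
    """Hypothetical rule: be long today if the previous `lookback` days summed positive."""
    pnl = 0.0
    for t in range(lookback, len(returns)):
        if sum(returns[t - lookback:t]) > 0:
            pnl += returns[t]
    return pnl

in_sample = simulate_returns(500, seed=1)
out_of_sample = simulate_returns(500, seed=2)

# Sweep the parameter the way an over-eager optimizer would.
results = {k: momentum_pnl(in_sample, k) for k in range(2, 41)}
best_lookback = max(results, key=results.get)

print(f"best lookback in-sample: {best_lookback}, pnl {results[best_lookback]:+.3f}")
print(f"same lookback on fresh data: pnl {momentum_pnl(out_of_sample, best_lookback):+.3f}")
```

Try different seeds: the winning lookback changes from run to run, which is what fitting noise looks like.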

"A backtest that can't fail isn't testing anything." If you keep adjusting until results look good, you're not discovering edge — you're creating an illusion. The backtest should be a trial, not a confirmation.

Proper Backtesting Methodology

Treat backtesting like a science experiment, not a treasure hunt. Define your rules completely before looking at a single chart.

Rigorous Backtesting Process

Step 1

Define rules BEFORE data. Complete entry, exit, stop, sizing, disqualification criteria. If it's not written, it's discretion — and discretion becomes hindsight bias.

Step 2

Select test universe. Include winners AND losers. 5+ years minimum. Include different regime periods. Use survivorship-bias-free data.

Step 3

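Split your data chronologically. Training (60-70%), Validation (15-20%), Test (15-20%), with the test set drawn from the most recent period so that no future information leaks into training. NEVER peek at the test set until the very end.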

Step 4

Execute mechanically. Apply rules exactly. No discretion. Log every trade. Include realistic costs (commissions + spread + slippage ≈ $0.10-0.30/share round trip).

Step 5

Analyze honestly. Win rate, avg win/loss, profit factor, expectancy, max drawdown, recovery time, performance BY REGIME (critical).

Step 6

Out-of-sample validation. Apply to validation set WITHOUT changes. If it fails → back to Step 1. Test set is FINAL — no adjustments, no do-overs.
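Steps 3 through 5 can be sketched in a few lines. This is a minimal illustration, not a full backtester: it assumes your rows are already in time order, that each trade is reduced to a single P&L number, and that costs are a flat amount per trade. The function names and the `cost_per_trade` parameter are made up for this example.

```python
def chronological_split(rows, train_pct=60, validation_pct=20):
    """Split time-ordered rows into train / validation / test without shuffling."""
    n = len(rows)
    i = n * train_pct // 100
    j = n * (train_pct + validation_pct) // 100
    return rows[:i], rows[i:j], rows[j:]

def summarize(trade_pnls, cost_per_trade=0):
    """Step 5 metrics from per-trade P&L values, net of a flat cost per trade."""
    if not trade_pnls:
        raise ValueError("no trades to summarize")
    net = [p - cost_per_trade for p in trade_pnls]
    wins = [p for p in net if p > 0]
    losses = [p for p in net if p < 0]
    gross_win, gross_loss = sum(wins), -sum(losses)
    # Max drawdown: largest peak-to-trough drop of the cumulative P&L curve.
    equity = peak = max_dd = 0.0
    for p in net:
        equity += p
        peak = max(peak, equity)
        max_dd = max(max_dd, peak - equity)
    return {
        "trades": len(net),
        "win_rate": len(wins) / len(net),
        "avg_win": gross_win / len(wins) if wins else 0.0,
        "avg_loss": gross_loss / len(losses) if losses else 0.0,
        "profit_factor": gross_win / gross_loss if gross_loss else float("inf"),
        "expectancy": sum(net) / len(net),
        "max_drawdown": max_dd,
    }

train, validation, test = chronological_split(list(range(10)))
stats = summarize([300, -100, 200, -100, -100, 400], cost_per_trade=20)
print(stats)
```

Running `summarize` with and without costs on the same trade log is a quick way to see how much of the apparent edge survives realistic friction.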

Regime-specific analysis is where most backtests fail. A 2.0 profit factor overall might hide: Regime 1 = 3.5 PF, Regime 3 = 0.7 PF. The system loses money in ranging conditions. If you don't know this, Regime 4 will destroy you. Always analyze by regime.
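To see how an acceptable overall number can hide a losing regime, here is a small sketch that groups trades by regime before computing profit factor. The trade list and the labels "trending" and "ranging" are invented for illustration; in practice the label would come from whatever regime classification you use.

```python
from collections import defaultdict

def profit_factor(pnls):
    """Gross profit divided by gross loss."""
    gross_win = sum(p for p in pnls if p > 0)
    gross_loss = -sum(p for p in pnls if p < 0)
    return gross_win / gross_loss if gross_loss else float("inf")

def profit_factor_by_regime(trades):
    """trades: iterable of (regime_label, pnl) pairs."""
    buckets = defaultdict(list)
    for regime, pnl in trades:
        buckets[regime].append(pnl)
    return {regime: profit_factor(pnls) for regime, pnls in buckets.items()}

# Hypothetical trade log, in dollars.
trades = [
    ("trending", 200), ("trending", 200), ("trending", 300),
    ("trending", -100), ("trending", -100),
    ("ranging", 100), ("ranging", 40), ("ranging", -100), ("ranging", -100),
]

overall = profit_factor([pnl for _, pnl in trades])
by_regime = profit_factor_by_regime(trades)
print(f"overall PF: {overall:.2f}")          # 2.10
for regime, pf in by_regime.items():
    print(f"{regime:>9} PF: {pf:.2f}")       # trending 3.50, ranging 0.70
```

The overall figure of 2.1 looks healthy, yet every dollar of it comes from the trending trades.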

The Curve-Fitting Trap

Imagine predicting rain. You notice your neighbor wore blue on the last ten rainy days. Rule: "It rains when neighbor wears blue." 100% backtest accuracy. Obviously useless — no causal relationship. This is exactly what curve-fitting does to trading systems.

Warning Signs of Curve-Fitting

⚠ Too many parameters: More than 3-4 adjustable settings is the danger zone

⚠ Unusual values: RSI at 13? MA at 47? Conventional defaults (14, 20, 50, 200) are safer because nobody chose them to fit your data

⚠ Too-good results: Win rate 70%+ with good R:R? PF above 3.0? No losing months? Almost certainly overfit

⚠ Performance cliff on out-of-sample: Validation much worse than training means the system fit noise, not signal

⚠ Logic doesn't make sense: Can you explain WHY the rules should work? If not, the results might be coincidence

⚠ Brittle to small changes: RSI at 13 works but 14 fails? Robust systems work across similar parameters
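The brittleness check in the last warning sign can be made mechanical. The sketch below assumes you have already recorded one performance number (here, profit factor) per parameter value; the function name, the `radius` of two steps, and the 50% threshold are arbitrary choices for illustration, not established cutoffs.

```python
def neighbors_hold_up(results, chosen, radius=2, min_fraction=0.5):
    """True if every tested setting within `radius` of `chosen` keeps at least
    `min_fraction` of the chosen setting's performance."""
    best = results[chosen]
    if best <= 0:
        return False
    nearby = [v for k, v in results.items()
              if k != chosen and abs(k - chosen) <= radius]
    # With no neighbors tested there is nothing to support a robustness claim.
    return bool(nearby) and all(v >= min_fraction * best for v in nearby)

# Hypothetical profit factors keyed by RSI length.
spiky = {11: 0.8, 12: 0.9, 13: 2.6, 14: 0.7, 15: 0.9}
plateau = {11: 1.5, 12: 1.6, 13: 1.7, 14: 1.6, 15: 1.5}

print(neighbors_hold_up(spiky, 13))    # False: 13 is an isolated peak
print(neighbors_hold_up(plateau, 13))  # True: nearby settings perform similarly
```

A plateau with a lower peak is usually the better bet than a spike with a higher one.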

The defense: simplicity. Every parameter is an opportunity to overfit. A system with 2-3 parameters can't easily be bent to fit the data. If it works with so few moving parts, it likely reflects genuine dynamics.

"Simplicity is the ultimate sophistication" (commonly attributed to Leonardo da Vinci). In backtesting, complexity kills. The system that looks slightly worse in backtesting often performs better live. Accept good enough. Perfect is the enemy of profitable.

Walk-Forward Analysis is the gold standard. Train on months 1-12, test on 13. Train on 2-13, test on 14. Continue through your history. Consistent performance across all windows = genuine edge. Great in some, terrible in others = noise. Tools: AmiBroker (built-in), QuantConnect (cloud), Python backtrader/vectorbt (custom).
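The rolling schedule described above (train on 1-12, test on 13, then slide forward by one) can be generated with a short helper. This is a sketch of the window bookkeeping only; fitting and scoring the system inside each window is left to whichever tool you use, and the function name and defaults are illustrative.

```python
def walk_forward_windows(n_periods, train_size=12, test_size=1):
    """Yield (train_periods, test_periods) for rolling windows, numbered from 1."""
    start = 1
    while start + train_size + test_size - 1 <= n_periods:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        start += test_size  # slide forward by the length of the test window

windows = list(walk_forward_windows(n_periods=15))
for train, test in windows:
    print(f"train {train[0]}-{train[-1]}, test {test[0]}-{test[-1]}")
# train 1-12, test 13-13
# train 2-13, test 14-14
# train 3-14, test 15-15
```

Each test window sits strictly after its training window, so every out-of-sample result comes from data the parameters never saw.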

From Backtest to Live

Forward Testing Protocol (3-6 months minimum)

Phase 1: Paper Trading (2-4 weeks)

Execute every signal exactly as rules dictate. Compare to backtest expectations. If significantly different, investigate.

Phase 2: Micro Size (4-8 weeks)

25% of normal size with real money. Tests emotional execution. Losses hurt but can't damage account. Are you following rules? Be brutally honest.

Phase 3: Half Size (4-8 weeks)

50% of normal. Emotions more engaged. Watch for rule deviation under pressure.

Phase 4: Full Size

Only after Phases 1-3 confirm. Only after you've experienced a real drawdown. Only after you trust the process in your gut. If you haven't hit a losing streak yet, wait.
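"Significantly different" in Phase 1 can be given a rough number. One simple approach, offered here as an assumption rather than the only valid test, is to treat each trade as an independent coin flip at the backtest win rate and ask how likely it is to see as few wins as you actually got. Real trades are not fully independent, so read the result as a sanity check, not a verdict.

```python
from math import comb

def prob_at_most(wins, trades, backtest_win_rate):
    """P(observing <= `wins` winners in `trades` trades) under a binomial model
    where the true win rate equals the backtest win rate."""
    p = backtest_win_rate
    return sum(comb(trades, k) * p**k * (1 - p)**(trades - k)
               for k in range(wins + 1))

# Hypothetical case: backtest win rate 55%, forward test shows 8 wins in 20 trades.
p_value = prob_at_most(8, 20, 0.55)
print(f"chance of 8 or fewer wins if the backtest rate holds: {p_value:.1%}")
```

A small probability says the shortfall is hard to explain by luck alone and is worth investigating; a large one says the sample is still consistent with the backtest.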

Continuous Improvement — Evolve or Die

The edge you find today may not exist in five years. But adaptation must be disciplined — not panic.

Adaptation Decision Framework

TRUST the system when: Drawdown within historical norms. Losing streak within expected range. Executing rules correctly. Regime matches system's strength. Nothing fundamental changed.

INVESTIGATE when: Drawdown exceeds historical max. Win rate dropped over 50+ trades. Persistent regime mismatch. Fundamental market structure change.

ADAPT when: Investigation reveals genuine edge decay. 100+ trades show systematic weakness. Walk-forward on recent data shows degradation.

NEVER adapt based on: one bad trade, one bad week, a "feeling," or anything without supporting data. Adaptation requires evidence, not emotion.
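The framework above can be written down as an explicit rule so the decision is made by data and not by mood. The sketch below is one possible encoding: the field names are hypothetical, the 50- and 100-trade thresholds come from the framework, and requiring all three ADAPT conditions together is an assumption about how the criteria combine.

```python
from dataclasses import dataclass

@dataclass
class SystemHealth:
    current_drawdown: float         # e.g. 0.12 for a 12% drawdown
    historical_max_drawdown: float  # worst drawdown seen in testing
    trades_since_review: int
    win_rate_dropped: bool          # recent win rate clearly below backtest
    walk_forward_degraded: bool     # recent walk-forward windows look worse

def adaptation_decision(h: SystemHealth) -> str:
    """Return TRUST, INVESTIGATE, or ADAPT based on evidence only."""
    # Assumption: ADAPT needs a large sample AND both signs of edge decay.
    if h.trades_since_review >= 100 and h.win_rate_dropped and h.walk_forward_degraded:
        return "ADAPT"
    if h.current_drawdown > h.historical_max_drawdown:
        return "INVESTIGATE"
    if h.trades_since_review >= 50 and h.win_rate_dropped:
        return "INVESTIGATE"
    return "TRUST"

print(adaptation_decision(SystemHealth(0.08, 0.15, 30, False, False)))   # TRUST
print(adaptation_decision(SystemHealth(0.20, 0.15, 30, False, False)))   # INVESTIGATE
print(adaptation_decision(SystemHealth(0.10, 0.15, 120, True, True)))    # ADAPT
```

Note what is absent from the inputs: one bad trade, one bad week, and feelings have no field to live in.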

"The best system you abandon is worse than the mediocre system you stick with." Jumping between systems means experiencing every system's drawdowns and none of their recoveries. Consistency beats optimization over the long run.