The sample report says the strategy made $18,420. That is the least interesting number on the page.
The better questions are uglier: how many trades created that profit, what happened during the worst week, did the model pay for slippage, did one morning session do all the work, and would the fills survive a real futures market?
This walkthrough uses a hypothetical futures strategy report. The numbers are not a claim of performance. They are a teaching artifact. Your job is to learn how to interrogate a report before it seduces you.
The Sample Report
Here is the fictional report we are reviewing:
| Metric | Sample Result | First Question |
|---|---|---|
| Market | MES, regular session | does it survive ES or MNQ? |
| Trades | 438 | enough across regimes? |
| Net expectancy | $42 / trade | after costs and slippage? |
| Win rate | 48% | what is average win vs average loss? |
| Profit factor | 1.42 | stable by quarter? |
| Max drawdown | -7.8R | can the trader actually sit through it? |
| Decision | research pass, not live pass | what must be tested next? |
This is not a bad report. That is the point. Bad reports are easy to reject. The dangerous ones look good enough to make you lazy.
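The win rate and profit factor rows already constrain the answer to "what is average win vs average loss?". The sketch below backs out the implied payoff ratio from the table. It assumes there are no breakeven trades, that profit factor means gross profit divided by gross loss, and that the $42 figure is the per-trade average; none of those definitions are stated in the sample report.

```python
# Values taken from the sample report table.
win_rate = 0.48
profit_factor = 1.42
expectancy = 42.0  # dollars per trade, as rounded in the table

# Assumption: every trade is either a win or a loss (no breakeven trades).
loss_rate = 1 - win_rate

# profit_factor = (win_rate * avg_win) / (loss_rate * avg_loss)
# so avg_win / avg_loss = profit_factor * loss_rate / win_rate
payoff_ratio = profit_factor * loss_rate / win_rate

# expectancy = win_rate * avg_win - loss_rate * avg_loss
# Substitute avg_win = payoff_ratio * avg_loss and solve for avg_loss.
avg_loss = expectancy / (win_rate * payoff_ratio - loss_rate)
avg_win = payoff_ratio * avg_loss

print(f"implied avg win / avg loss: {payoff_ratio:.2f}")
print(f"implied average loss: ${avg_loss:.2f}")
print(f"implied average win:  ${avg_win:.2f}")
```

Under those assumptions the average win is roughly 1.5 times the average loss. If the report's own average win and loss figures disagree with this, one of the headline metrics is defined differently than you think.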
Step 1: Read Expectancy, Not Net Profit
Net profit is emotionally loud. Expectancy is more useful.
Expectancy asks: after wins, losses, costs, and position size, what did the average trade earn or lose?
If a strategy made $18,420 across 438 trades, the surface expectancy is about $42 per trade. That sounds decent for MES only if commissions, fees, slippage, and realistic fill assumptions are already included. If costs were ignored, the report is unfinished.
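As a rough sketch of that arithmetic: the profit and trade count come from the sample report, while the commission, the slippage, and the one-contract-per-trade sizing are illustrative assumptions, not figures from the report.

```python
# Figures from the sample report.
net_profit = 18_420.00
trade_count = 438

# Hypothetical per-trade costs (assumptions for illustration only).
commission_round_trip = 1.50  # dollars per contract, assumed
slippage_ticks = 2            # total ticks lost per round trip, assumed
mes_tick_value = 1.25         # dollars per tick for one MES contract


def expectancy(total_profit: float, trades: int) -> float:
    """Average dollars earned or lost per trade."""
    if trades <= 0:
        raise ValueError("trades must be positive")
    return total_profit / trades


surface = expectancy(net_profit, trade_count)

# Assumes one contract per trade; scale by position size otherwise.
cost_per_trade = commission_round_trip + slippage_ticks * mes_tick_value

# Only relevant if the report's profit figure ignored these costs.
after_costs = surface - cost_per_trade

print(f"surface expectancy: ${surface:.2f}")            # $42.05
print(f"assumed cost per trade: ${cost_per_trade:.2f}")  # $4.00
print(f"expectancy after assumed costs: ${after_costs:.2f}")  # $38.05
```

The point of the exercise is the gap between the first and last number, not the specific cost values. Swap in your own broker and fill assumptions.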
This connects directly to "How to Backtest a Trading Strategy Without Fooling Yourself." A pretty equity curve without cost modeling is not research. It is decoration.
Step 2: Find the Trade Distribution
The second question: did many trades contribute, or did three monster trades save the system?
Distribution Review
- How many trades produced 80% of the profit?
- What happens if the top five winners are removed?
- Are losses clustered by session, day, volatility, or news?
- Does the strategy make money outside one narrow market condition?
If the strategy dies when the top winners are removed, it may still be valid, but now you know the edge depends on rare expansion. That strategy needs different sizing, more patience, and a trader who can survive long flat periods.
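The first two distribution questions can be answered with a few lines over a list of per-trade P&L. This is a minimal sketch; the function names and the sample numbers are made up for illustration.

```python
def trades_for_profit_share(pnls, share=0.80):
    """Smallest number of top trades whose summed P&L reaches `share` of net profit."""
    net = sum(pnls)
    if net <= 0:
        return None  # no net profit to attribute
    target = share * net
    running = 0.0
    for count, pnl in enumerate(sorted(pnls, reverse=True), start=1):
        running += pnl
        if running >= target:
            return count
    return None


def net_without_top_winners(pnls, n=5):
    """Net P&L after dropping the n largest trades."""
    return sum(sorted(pnls, reverse=True)[n:])


# Made-up per-trade P&L in dollars, for illustration only.
sample_pnls = [900, 650, -120, -110, 80, -130, 60, -90, 40, -100, 30, -95]

print(sum(sample_pnls))                         # 1115
print(trades_for_profit_share(sample_pnls))     # 1
print(net_without_top_winners(sample_pnls, 5))  # -615
```

In this toy sample a single trade covers 80% of the profit and the system is negative without its top five winners, which is exactly the "rare expansion" profile described above.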
Step 3: Treat Max Drawdown Like a Job Interview
The report says max drawdown was -7.8R. Most traders read that and move on.
Do not. Open the worst drawdown and inspect it trade by trade.
- Was it caused by normal losses or rule failure?
- Did it happen during chop, trend, news, or low liquidity?
- Did the strategy keep trading after conditions changed?
- Would your daily loss limit have stopped the sequence earlier?
Pair this with the daily loss limit guide and the losing-streak reset. A strategy can be statistically fine and psychologically untradeable.
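To open the worst drawdown trade by trade, you first need to know where it starts and ends. The sketch below finds the deepest peak-to-trough drop in a sequence of R-multiples and returns the trade indices that bound it. The sample sequence is invented, and the equity curve is assumed to start at zero.

```python
def max_drawdown(r_multiples):
    """Return (depth, peak_index, trough_index) of the deepest peak-to-trough drop.

    depth is zero or negative, in R. peak_index is the trade that set the
    equity peak (-1 if the peak is the starting balance). trough_index is the
    trade at which the drawdown was deepest.
    """
    equity = 0.0
    peak = 0.0
    peak_idx = -1
    worst = (0.0, -1, -1)
    for i, r in enumerate(r_multiples):
        equity += r
        if equity > peak:
            peak, peak_idx = equity, i
        drop = equity - peak
        if drop < worst[0]:
            worst = (drop, peak_idx, i)
    return worst


# Invented R-multiples, for illustration only.
sample_r = [1.5, -1, -1, 2, -1, -1, -1, 0.5, -1, 1.5]

depth, start, end = max_drawdown(sample_r)
drawdown_trades = sample_r[start + 1 : end + 1]

print(depth)            # -3.5
print(start, end)       # 0 8
print(drawdown_trades)  # [-1, -1, 2, -1, -1, -1, 0.5, -1]
```

The slice `drawdown_trades` is the list to inspect against the questions above: which of those trades were normal losses, and where a daily loss limit would have cut the sequence short.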
Step 4: Audit Fill Realism
This is where many futures backtests quietly lie.
If the strategy buys every breakout at the exact trigger price, exits every stop cleanly, pays no slippage, and assumes every touch fills, the report is probably too generous.
Ask these questions:
- Was slippage included?
- Were commissions and exchange fees included?
- Were stops and targets filled intrabar or only on bar close?
- Did the test use enough granularity for the strategy timeframe?
- Were limit orders assumed filled just because price touched them?
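The last question is the easiest to make concrete. One common conservative convention, shown here as an assumption rather than as any platform's actual fill engine, is to count a limit order as filled only when price trades through the limit by at least one tick. The function name and parameters are hypothetical.

```python
def buy_limit_filled(bar_low, limit_price, tick_size=0.25, trade_through_ticks=1):
    """Conservative fill rule for a buy limit order.

    The order counts as filled only if the bar traded at least
    `trade_through_ticks` below the limit price. Setting
    trade_through_ticks=0 reproduces the optimistic "touch fills" rule.
    """
    return bar_low <= limit_price - trade_through_ticks * tick_size


# Price touched 5000.00 but never traded below it.
print(buy_limit_filled(bar_low=5000.00, limit_price=5000.00))  # False
print(buy_limit_filled(5000.00, 5000.00, trade_through_ticks=0))  # True

# Price traded one tick through the limit.
print(buy_limit_filled(bar_low=4999.75, limit_price=5000.00))  # True
```

Rerunning a backtest under both rules and comparing trade counts is a quick way to see how much of the result depends on touch fills.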
Use the tick value cheat sheet to translate small fill errors into dollars. Two ticks of extra slippage on NQ is $10 per contract per trade, which is not a rounding error when repeated across hundreds of trades.
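A small sketch of that translation follows. The tick values are the standard per-contract figures for these CME equity index futures at 0.25 index points per tick; confirm them against current exchange specifications before relying on them. The trade count is borrowed from the sample report, and one contract per trade is an assumption.

```python
# Dollars per tick per contract (0.25 index points per tick).
TICK_VALUE = {"ES": 12.50, "MES": 1.25, "NQ": 5.00, "MNQ": 0.50}


def slippage_cost(symbol, extra_ticks, trades, contracts=1):
    """Total dollars lost to `extra_ticks` of slippage per trade."""
    return TICK_VALUE[symbol] * extra_ticks * trades * contracts


# Two extra ticks per trade across the sample report's 438 trades.
print(slippage_cost("NQ", 2, 438))   # 4380.0
print(slippage_cost("MES", 2, 438))  # 1095.0
```

Compare the result with the report's net profit. If a plausible slippage error removes a large share of it, the fill model deserves more scrutiny than the equity curve.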
Step 5: Split the Report by Regime
A futures strategy that works only in one regime is not bad. But you need to know that before you trade it.
Split the sample report by:
- Trend days vs range days.
- Positive vs negative gamma exposure (GEX) days.
- First hour vs midday vs closing hour.
- News days vs normal sessions.
- High volatility vs low volatility weeks.
If the strategy is a breakout pullback system, it should probably do better in negative GEX continuation conditions than on dampened range days. If it is a fade system, it should not be forced into sessions built for expansion.
For discretionary overlays, compare the report against the futures pre-market checklist. If the backtest makes money only when the live checklist would have blocked the trade, the rules are not aligned.
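Each of the splits listed in this step is the same operation: group trades by a label and recompute the trade count and expectancy per group. A minimal sketch, assuming each trade is a dict that already carries hypothetical regime labels such as `day_type` and `session`:

```python
from collections import defaultdict


def expectancy_by(trades, key):
    """Map each label under `key` to (trade_count, average_pnl)."""
    buckets = defaultdict(list)
    for trade in trades:
        buckets[trade[key]].append(trade["pnl"])
    return {label: (len(p), sum(p) / len(p)) for label, p in buckets.items()}


# Invented trades with hypothetical regime labels, for illustration only.
trades = [
    {"pnl": 120, "day_type": "trend", "session": "first_hour"},
    {"pnl": -60, "day_type": "range", "session": "midday"},
    {"pnl": 200, "day_type": "trend", "session": "first_hour"},
    {"pnl": -80, "day_type": "range", "session": "first_hour"},
    {"pnl": -40, "day_type": "trend", "session": "midday"},
    {"pnl": 20, "day_type": "range", "session": "close"},
]

by_day = expectancy_by(trades, "day_type")
by_session = expectancy_by(trades, "session")

print(by_day["range"])           # (3, -40.0)
print(by_session["first_hour"])  # (3, 80.0)
print(by_session["midday"])      # (2, -50.0)
```

Keeping the trade count next to each average matters. A regime with a great expectancy and six trades is an anecdote, not a finding.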
Step 6: Review the Worst Trade Cluster
The single worst trade is useful. The worst cluster is better.
Find the stretch where the strategy lost the most money or gave back the most open profit. Then label each trade:
- Valid setup, normal loss.
- Valid setup, bad fill.
- Wrong regime.
- Late entry.
- Stop too wide for target.
- Rule violated.
This is the bridge between backtesting and journaling. The futures trading journal should use the same labels live so you can compare the model against real execution.
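One way to locate the worst stretch is to search for the contiguous run of trades with the lowest summed P&L, which is Kadane's algorithm run as a minimization. This is a sketch of one reasonable definition of "worst cluster", not the only one. It ignores given-back open profit, and the trades and labels are invented.

```python
from collections import Counter


def worst_cluster(pnls):
    """Return (total, start, end) for the contiguous run with the lowest summed P&L."""
    if not pnls:
        raise ValueError("pnls must not be empty")
    best_total = cur_total = pnls[0]
    best_start = best_end = cur_start = 0
    for i in range(1, len(pnls)):
        if cur_total > 0:
            # A positive running sum only helps; restart the run here.
            cur_total, cur_start = pnls[i], i
        else:
            cur_total += pnls[i]
        if cur_total < best_total:
            best_total, best_start, best_end = cur_total, cur_start, i
    return best_total, best_start, best_end


# Invented (pnl, label) pairs, for illustration only.
trades = [
    (80, "winner"),
    (-50, "valid setup, normal loss"),
    (-70, "wrong regime"),
    (30, "winner"),
    (-90, "wrong regime"),
    (-40, "late entry"),
    (120, "winner"),
    (-20, "valid setup, normal loss"),
]

total, start, end = worst_cluster([pnl for pnl, _ in trades])
cluster = trades[start : end + 1]
loss_labels = Counter(label for pnl, label in cluster if pnl < 0)

print(total, start, end)           # -220 1 5
print(loss_labels.most_common(1))  # [('wrong regime', 2)]
```

The label tally is the useful output. A cluster dominated by "wrong regime" points at a filter problem, while one dominated by "valid setup, normal loss" is simply the cost of the edge.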
Step 7: Decide What the Report Actually Allows
There are four possible decisions:
| Decision | What It Means | Next Action |
|---|---|---|
| Reject | edge disappears after realism checks | archive the lesson |
| Research pass | promising but unproven | run out-of-sample |
| Paper pass | logic survives but execution untested | paper trade same rules |
| Live pilot | small-size monitoring only | trade minimum size with kill rules |
Our hypothetical report gets a research pass. The sample is large enough to continue, but the evidence is not strong enough to risk real money. The next step is out-of-sample validation, then paper trading with identical rules.
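The table can be read as a sequence of gates, each of which must be passed before the next decision becomes available. The function below is one interpretation of that ordering; the three boolean inputs are hypothetical names, not fields from any report format.

```python
def decide(survives_realism_checks, passed_out_of_sample, passed_paper_trading):
    """Map audit results to a decision and its next action.

    Gates are checked in order; the first failed gate determines the decision.
    """
    if not survives_realism_checks:
        return "reject", "archive the lesson"
    if not passed_out_of_sample:
        return "research pass", "run out-of-sample"
    if not passed_paper_trading:
        return "paper pass", "paper trade same rules"
    return "live pilot", "trade minimum size with kill rules"


# The sample report: survives the audit so far, nothing else tested yet.
print(decide(True, False, False))  # ('research pass', 'run out-of-sample')
```

Writing the gates down this way makes it harder to skip one because the equity curve looks good.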
The Report Notes I Want to See
A premium backtest report should include a notes section. Without it, the metrics are too easy to misread.
Minimum Notes
- Exact market, timeframe, session template, date range, and data source.
- Strategy rules in plain English, not just code parameters.
- Commission, fee, and slippage assumptions.
- Fill model and intrabar handling.
- Parameter count and optimization method.
- Known failure modes and conditions where the strategy is disabled.
If those notes are missing, you cannot audit the result later. You only have a screenshot and a feeling.
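The minimum notes can be enforced as a simple completeness check. The sketch below models them as a dataclass and lists whichever fields were left blank. The class and field names are hypothetical and only mirror the bullet list above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ReportNotes:
    market: str = ""
    timeframe: str = ""
    session_template: str = ""
    date_range: str = ""
    data_source: str = ""
    rules_plain_english: str = ""
    cost_assumptions: str = ""  # commission, fees, slippage
    fill_model: str = ""        # including intrabar handling
    parameter_count: Optional[int] = None
    optimization_method: str = ""
    failure_modes: str = ""     # including conditions that disable the strategy


def missing_notes(notes):
    """Names of fields left empty. A parameter_count of 0 counts as filled in."""
    return [f.name for f in fields(notes) if getattr(notes, f.name) in ("", None)]


# The sample report only tells us the market and the session.
notes = ReportNotes(market="MES", session_template="regular session")
print(missing_notes(notes))
```

Run against the sample report, nine of the eleven fields come back missing, which is another reason it earns only a research pass.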
How This Connects to Live Trading
Backtesting does not remove risk. It makes risk visible earlier.
Before a strategy goes live, define:
- maximum daily loss,
- maximum weekly drawdown,
- minimum sample size before increasing size,
- slippage threshold that pauses trading,
- market conditions that disable the system,
- and the review schedule.
Use the futures position size calculator, the position sizing guide, and the R-multiple calculator before any live pilot. A backtest that cannot be translated into controlled size is not ready.
For the first live monitoring batch, keep size conservative with micro futures position sizing, then review every mismatch with the futures trading journal process. The point is not to prove the model right. The point is to catch where live execution disagrees with it.
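The pre-live limits from this section are easier to honor when they are written down as data and checked mechanically. A minimal sketch, with hypothetical field names and placeholder threshold values that are not recommendations:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveLimits:
    max_daily_loss_r: float
    max_weekly_drawdown_r: float
    min_trades_before_size_increase: int
    pause_slippage_ticks: float


def pause_reasons(limits, day_pnl_r, week_drawdown_r, avg_slippage_ticks):
    """Return the list of limits currently breached (empty means keep trading)."""
    reasons = []
    if day_pnl_r <= -limits.max_daily_loss_r:
        reasons.append("daily loss limit hit")
    if week_drawdown_r <= -limits.max_weekly_drawdown_r:
        reasons.append("weekly drawdown limit hit")
    if avg_slippage_ticks >= limits.pause_slippage_ticks:
        reasons.append("slippage above threshold")
    return reasons


# Placeholder thresholds, for illustration only.
limits = LiveLimits(
    max_daily_loss_r=2.0,
    max_weekly_drawdown_r=5.0,
    min_trades_before_size_increase=50,
    pause_slippage_ticks=1.5,
)

print(pause_reasons(limits, day_pnl_r=-2.5, week_drawdown_r=-3.0, avg_slippage_ticks=1.0))
# ['daily loss limit hit']
print(pause_reasons(limits, day_pnl_r=-0.5, week_drawdown_r=-1.0, avg_slippage_ticks=0.5))
# []
```

Market-condition filters and the review schedule are left out here because they depend on the strategy. The numeric limits are the part that can be checked the same way for any system.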
Source and risk notes
- NinjaTrader's Strategy Analyzer documentation states that Strategy Analyzer runs historical analysis on NinjaScript-based automated strategies: Strategy Analyzer.
- NinjaTrader's backtest documentation explains that a backtest analyzes historical strategy performance and requires historical data and a custom NinjaScript strategy: Backtest a Strategy.
- NinjaTrader's historical fill processing documentation discusses methods for improving historical backtest realism: Understanding Historical Fill Processing.
- NinjaTrader's optimization documentation defines optimization as iterative backtests across input ranges to determine optimal values over the historical test period: Optimize a Strategy.
- NinjaTrader's backtest logs documentation notes that Strategy Analyzer logs can preserve backtest history and code snapshots for comparison: Backtest Logs.
- CME's position and risk management education emphasizes risk management and managing losing positions: CME Position and Risk Management.
- This article is educational. Backtests are historical simulations, not performance guarantees. Data quality, fill assumptions, costs, slippage, regime shifts, leverage, and trader behavior can make live results materially different.
Final rule: a backtest report should make you more skeptical, not more confident. If the strategy is real, it can survive questions. If it falls apart under basic audit, the report did its job before your account had to.