Hello, I am Darwin, and today I want to talk about the limitations that Walk Forward Analysis suffers from. This is my third article, so if you do not know how WFA works, please read the other two first (you can find them here on the forum).
The target audience of this article is everyone who deals with Expert Advisors and backtesting / Walk Forward Analysis.
Some of you might already have seen a few of my posts where I talk about the research I am doing in the field of trading system analysis (in the course of writing a meta-algorithm that can build, analyse and trade strategies on its own). The goal is to write an algorithm so powerful that it can take any EA and, through in-depth analysis, tell you how and when to trade it in order to make a profit, no matter how good or bad the underlying EA is.
So here is a new article in which I would like to share some insights I gained in the process of writing this algorithm (DATFRA, Darwin's Algorithmic Trading Framework).
Well, let's begin. My first concern is that the design of Walk Forward Analysis is, by its nature, unrewarding and not the kind of analysis a trader actually wants.
I also claim that the results of a WFA are more or less random, and that if a system works well after a successful WFA, it is not because the test was successful, but because the trader designing the system did a good job.
In this article I do not yet want to show how these problems can be solved; I just want to demonstrate that they exist. In my next article I will explain how I think all of this can be solved in an elegant way.
The fundamental design problem
Walk Forward Analysis is designed to evaluate a trading construct you give to it.
This construct consists of the following parts (see the code sketch after this list):
- Trading system (e.g. an Expert Advisor)
- Market/timeframe (e.g. “EURUSD / H4”)
- The system’s parameter ranges (e.g. “Moving Average period from 50-150”)
- Optimisation (in-sample) timespan (e.g. “optimise on 2 years of data”)
- Forward trading (out-of-sample) timespan (e.g. “forward trade for one month”)
- Preferred characteristic (e.g. “forward trade the candidate with the highest profit”)
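To make this concrete, here is a minimal sketch of such a construct as a plain data structure (Python; all field names are mine and purely illustrative, not taken from any real WFA tool):

```python
from dataclasses import dataclass

@dataclass
class TradingConstruct:
    """Everything a trader must pre-determine before a WFA can even start.
    All names here are illustrative, not part of any real WFA tool."""
    system: str               # the trading system, e.g. an Expert Advisor
    market_timeframe: str     # e.g. "EURUSD / H4"
    parameter_ranges: dict    # e.g. {"ma_period": range(50, 151)}
    in_sample_span: str       # optimisation window, e.g. "2 years"
    out_of_sample_span: str   # forward trading window, e.g. "1 month"
    selection_criterion: str  # e.g. "highest profit"

# Example instance matching the list above:
construct = TradingConstruct(
    system="MyExpertAdvisor",
    market_timeframe="EURUSD / H4",
    parameter_ranges={"ma_period": range(50, 151)},
    in_sample_span="2 years",
    out_of_sample_span="1 month",
    selection_criterion="highest profit",
)
```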
All of this has to be pre-determined by the trader, out of intuition and not based on actual facts and data. But these are the most important decisions of all; how is one supposed to “guess” them?!
And then, WFA will only be able to tell you whether this construct would have worked in the past or not; that's it.
So in order to find the best trading construct, you have to use trial and error and repeat the WFA multiple times. Step by step, this leads to the worst case: your “unseen” out-of-sample data slowly becomes “known” in-sample data, and the whole advantage of WFA over backtesting fades away completely.
These design-related problems already show that WFA cannot be the end of the road in terms of system analysis.
In a perfect world, you would give the analysis algorithm only the trading system and the market/timeframe, and no other parameters. The algorithm should then tell you the best choices for all the other parts of the trading construct, based on data and facts, not the other way round.
Side note: it should NOT just tell you how to trade your systems; it should also give you the possibility to look into a system's characteristics on your own. You should never be forced to trust any algorithm without the possibility to check its findings!
This is very, very important. Evaluating a single trading construct is of little value, but it is a gamechanger if you can look into your strategies in a way that lets you just “see” how they work and which trading construct will work best (more on this in my next article).
Even worse: unreliable results because of missing data
OK, so even if a trader could come up with a good trading construct out of intuition and knowledge, WFA would still be a more or less random affair. But first, let's make a rough calculation:
An example trading system and a rough estimate of its parameter-space size
Consider a system that enters trades based on a Moving Average crossover and an RSI indicator, and exits them using a different Moving Average crossover. It has at least 5 parameters (2x2 MA periods + the RSI threshold), and 6 if you also count the StopLoss.
Let's say the “fast” Moving Average periods can be 10-50 and the “slow” ones 50-250, the RSI threshold can be 1-100 and the StopLoss 50-150 pips (this is not a real system, just an example!).
So this system can already be traded in 40 * 200 * 40 * 200 * 100 * 100 different ways. That is 640 billion (640,000,000,000), which is quite a huge number.
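If you want to verify that arithmetic yourself, here is a quick sketch using the (rounded) number of values per parameter from the example above:

```python
# Rounded count of values per parameter, as in the text above:
fast_ma = 40      # fast MA period 10-50
slow_ma = 200     # slow MA period 50-250
rsi = 100         # RSI threshold 1-100
stop_loss = 100   # StopLoss 50-150 pips

# Two fast MAs and two slow MAs (entry crossover + exit crossover):
total = fast_ma * slow_ma * fast_ma * slow_ma * rsi * stop_loss
print(f"{total:,}")  # 640,000,000,000
```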
One might question my exact example strategy, but one cannot question the millions or billions of possible parameter combinations, even for small systems.
Thankfully, since a lot of these parameter combinations behave very similarly, we do not need to evaluate them all; but we do need at least a meaningful sample, like a few hundred thousand or a few million.
So keep this huge amount in mind, even for small systems, because with every new dimension of our optimisation problem's solution space (every new parameter), the number of possible parameter combinations grows exponentially.
Walk Forward Analysis - missed data during optimisation
OK, now let's look at the first step of WFA, and the first problem: missed data because of inefficient algorithm design and computing-time concerns.
During the optimisation step of WFA, the algorithm should, in a perfect world, evaluate all 640 billion combinations in order to determine which of them work best. Of course this is not possible, but a “meaningful” sample (let's say 500,000) would be both feasible and necessary if we want to look at the “real” picture.
The problem is that, due to the limitations of WFA algorithms, the optimisation has to be done in every single Walk Forward window.
Let's say we do a WFA on 10 years of data and our forward trading timespan is 2 weeks: that makes roughly 240 Walk Forward windows. 500,000 tested parameter combinations per window would then require 120,000,000 single simulations.
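As a quick sanity check of these numbers (a sketch; I am assuming roughly 24 two-week windows per year here, which is the rounding behind the 240 figure):

```python
# Rough cost of a "meaningful" WFA, as estimated above.
years = 10
windows_per_year = 24   # assumption: ~ two 2-week forward windows per month
sample_size = 500_000   # parameter combinations tested per window

wf_windows = years * windows_per_year          # 240
total_simulations = wf_windows * sample_size   # 120,000,000
print(wf_windows, f"{total_simulations:,}")
```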
And then remember that WFA relies on a trial-and-error principle, so you will most likely have to do all of this several times.
You see? Evaluating the “real” picture would take very, very long. Therefore most WFA implementations are forced to evaluate only a heavily cropped fraction of the actual parameter space, because it is simply not possible to evaluate the whole space (or a meaningful sample of it) in a reasonably short timespan when the optimisation has to be repeated in every single WF window.
This means WFA most likely does not evaluate 500,000 parameter combinations per window, but only 10,000 or 50,000 or thereabouts. So we already lose something like 90% of all data in this step.
This is a problem that could be solved if the trader had unlimited time for his or her analysis (which is not likely, especially given the trial-and-error method), or with a more efficient design of these algorithms. Nevertheless, in practice, this problem is ever-present.
For comparison: DATFRA, my private research project, only has to do one single simulation per parameter combination, no matter how many WF windows it analyses. In the above example, that alone would decrease the computing time by a factor of 240.
Parenthesis: what kind of data do we look at when analysing trading systems? What is a “datapoint”?
I will talk about “datapoints” and “data” quite frequently in this article and in my posts, so here is an explanation.
When analysing systems, it is always about a trinity of information. Remember how WFA works:
So a datapoint, of which one is generated per Walk Forward window, consists of (see the code sketch after this list):
- The performance in the RED optimisation window
- The performance in the GREEN forward trading window
- The parameter combination used for this specific test
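Expressed in code, a datapoint is nothing more than this (a minimal sketch; the field names are mine, just for illustration):

```python
from dataclasses import dataclass

@dataclass
class Datapoint:
    """One WFA datapoint: the trinity described above."""
    parameters: dict            # the parameter combination used for this test
    in_sample_perf: float       # performance in the (red) optimisation window
    out_of_sample_perf: float   # performance in the (green) forward window
```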
So, in our example, a WFA would generate 240 of them, whereas 120 million (500,000 * 240) would be possible for our example system. That alone should already give you a headache.
Walk Forward Analysis - tons of missed data during forward trading
OK, now let's look at the second step of WFA, and the second problem: missed data because of flawed algorithm design and computing-time concerns.
Now remember: a meaningful sample of our trading system's parameter space would be 500,000 combinations, and we have 240 WF windows. That makes a total of 120,000,000 optimisation candidates. And out of this huge amount, a WFA algorithm takes only the very best per window: 240 in this example.
That is 0.0002% (240 / 120,000,000) of the total number of datapoints we could use to describe and analyse this system and its ability to produce good forward trading results based on good optimisation results.
And then WFA takes these few datapoints and claims to give a somewhat realistic view of a trading system's performance and robustness.
That's nonsense! You would not judge a picture's colour by looking at one pixel either, would you?
A word about fluctuations, and why the “very best” parameter combination is not meaningful
You could argue that it is not important whether we forward trade all 500,000 candidates per window, because we are only interested in the top performers, as they are the ones we trade in reality.
Well, this argument would only work if:
- We ignored the ~90% of data lost in the optimisation step
- The very best candidate were meaningful, i.e. if all the candidates following it (the next 10 or 20 or 50, which is not much compared to 500,000) behaved in much the same way.
But reality is different: the performance of the top candidates per window fluctuates considerably, and taking the “very best” therefore leads to more or less random outcomes.
Experiment 1
Here are some examples. I plotted the forward trading performance of the best candidate (left) and the next 4 candidates of some random strategies I created and evaluated with DATFRA. Most of the analysed WF windows looked like these:
These were just a few examples to illustrate my point; I could show hundreds or thousands of them.
So, for the real picture, you would AT LEAST need to evaluate a few hundred of the top candidates, not just one, as a single candidate does not show the “real” picture. Its performance is more or less random!
A perfect analysis algorithm would evaluate every single candidate that made at least $1 of profit during optimisation. That would give the real picture and most likely 1,000 or 10,000 times as many datapoints as a WFA provides.
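The difference between the two approaches fits in a few lines. Here is a minimal sketch (hypothetical helper functions, assuming the in-sample and out-of-sample performance per candidate is already known):

```python
def classic_wfa_pick(candidates):
    """Classic WFA: keep only the single best in-sample candidate per window."""
    return max(candidates, key=lambda c: c["in_sample_perf"])

def broad_evaluation(candidates):
    """What I argue for: keep every candidate that was profitable in-sample,
    so the whole out-of-sample distribution becomes visible."""
    return [c for c in candidates if c["in_sample_perf"] > 0]

# Example: three candidates in one WF window
window = [
    {"params": {"ma": 60}, "in_sample_perf": 1200.0, "out_of_sample_perf": -80.0},
    {"params": {"ma": 75}, "in_sample_perf": 1150.0, "out_of_sample_perf": 310.0},
    {"params": {"ma": 90}, "in_sample_perf": -40.0,  "out_of_sample_perf": 50.0},
]
print(classic_wfa_pick(window)["out_of_sample_perf"])                # -80.0, one noisy point
print([c["out_of_sample_perf"] for c in broad_evaluation(window)])  # [-80.0, 310.0]
```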
Experiment 2
Here are some more examples. This time I plotted the overall WFE (red) and the WFE of the single windows (green) of some random strategies I created and evaluated with DATFRA.
WFE (Walk Forward Efficiency) is a measure that compares in-sample and out-of-sample performance and is used as THE statistic for system robustness in WFA (Google it if you want to know more about it).
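For reference, here is how the per-window WFE is usually computed. Note that I am assuming the common annualised-ratio definition here, so treat the exact formula as an assumption:

```python
def wfe(in_sample_profit, in_sample_days, out_of_sample_profit, out_of_sample_days):
    """Walk Forward Efficiency: annualised out-of-sample profit divided by
    annualised in-sample profit (the common definition; a WFE near 1.0 means
    the forward results kept pace with the optimisation results)."""
    annualised_is = in_sample_profit / in_sample_days * 365
    annualised_oos = out_of_sample_profit / out_of_sample_days * 365
    return annualised_oos / annualised_is

# Example: 2-year optimisation window, 2-week forward window
print(wfe(in_sample_profit=5000, in_sample_days=730,
          out_of_sample_profit=70, out_of_sample_days=14))  # 0.73
```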
This clearly shows the fluctuating nature of the results a WFA generates, and that the end result does not really tell you much about your expected live trading performance.
Btw: to keep the plot scale within limits, I mapped all points > 2.5 to 2.5 and all points < -2.5 to -2.5, so reality is even a lot worse. That is also the reason why the second image in the second row does not look “right”.
A word about feasibility
Please do not think I am only talking grey theory here, along the lines of “it is not possible to do this kind of simulation in a short enough amount of time anyway”.
If the algorithm is designed well, not a single additional simulation is needed to determine the forward trading profit, and no new optimisation procedure is needed for each WF window.
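One way to achieve this (a minimal sketch of the idea, not DATFRA's actual code): simulate each parameter combination once over the full history, store its return series, and then every WF window, in-sample or out-of-sample, is just an array slice of that series:

```python
import numpy as np

def window_stats(daily_returns, is_len, oos_len):
    """Slice one full-history return series into rolling
    (in-sample, out-of-sample) windows and sum the profit per slice.
    Each parameter combination is simulated ONCE to get `daily_returns`;
    everything after that is just array slicing."""
    datapoints = []
    start = 0
    while start + is_len + oos_len <= len(daily_returns):
        is_perf = daily_returns[start:start + is_len].sum()
        oos_perf = daily_returns[start + is_len:start + is_len + oos_len].sum()
        datapoints.append((is_perf, oos_perf))
        start += oos_len  # roll forward by one out-of-sample window
    return datapoints

# One simulated 10-year daily series -> a couple of hundred
# (optimisation, forward trade) datapoints from a single simulation
returns = np.random.normal(0.0, 1.0, 3650)
print(len(window_stats(returns, is_len=730, oos_len=14)))
```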
With the example above, DATFRA can generate 34,000,000 “optimisation => forward trade” datapoints in ~24 hours on a mid-range PC (8 GB RAM, quad-core 3 GHz).
Still not 120 million, sure, but compared to 240 I think it is a very good result.
So it IS feasible to analyse a system with this level of insight, even on today's hardware.