[TUTORIAL] Complete Backtesting and Analysis Setup (100% Free)

ClarkFX, may I ask you something?

For some reason the backtester stops before the end of the test. I mean, I set it up for 2007/05/01 to 2013/03/28 and it stops at about 2010/07.

What could be the problem?

Thanks!

EDIT

If I use control points instead of every tick, I get the complete results.

For a strat that mainly uses daily bars, do you think that backtesting with control points on the 1-minute timeframe (the EA makes the necessary adjustments to use daily data) is enough to get a good result?

Thanks!

I am guessing that your backtest is stopping earlier than it should because you've run out of money. I would look at the log to see if that is the case. Obviously, if your backtest has reached a level where it can no longer trade, it will stop.

I would still use Every Tick for the highest accuracy.

I checked and that is not the problem. The equity never drops more than 13%.

What do your logs say?

DISREGARD THIS MESSAGE…

I just realized what was going on… the image server you are using, postimg.org, is for some reason flagged as an unauthorized website by my internet provider. As I am an expat in the UAE, it's sometimes really frustrating to deal with these things…

I can see your post entirely only by going through my VPN, witopia.org.

Problem solved…

Hello ClarkFX,

I am new to the forum and have not posted for almost a year.

I was interested in your post, but unfortunately I cannot see the posted images.

Is it only me and my setup, or are they just broken links?

Regards,
Pete.

Glad you found a way around it. :slight_smile: Enjoy.

Hi Clark,

I used the EA with 4 pairs on a demo account… how can I do a backtest with 4 pairs? In the MT4 Strategy Tester we can only use one symbol.

I'm having problems with AUDUSD. I found two bars with values out of range. For example, on April 4, 2007 there is one with a high of 234, and on April 1, 2007 the high and close are 1.6, double the open.

I don't know if the problem is in the original data or in the Python script.

The best way would be to test each market individually, save the backtest reports, and merge them using StrategyQuant's EA Analyzer.

I looked at the code and there shouldn't be any issue. If there is, the most likely cause is that the Dukascopy data itself had some sort of error during those times. You can take a look in the History Center, and if you can find the bad bars, you can edit them there.

Hi Clark

I see this error in the data source, on NZDUSD:

2007.03.30 21:00:00.097,0.7138,0.7145,6.40,6.40
2007.03.30 21:00:00.638,0.7138,0.7145,2.40,2.40
2007.03.30 21:00:00.915,1.6231,1.6238,1.20,8.00

As you can see, the last row for that specific 1-minute bar shows a value more than twice the market price.

I think that is wrong data, isn't it?

At the open of the next day the price is 0.7154, so I think this is corrupted data.

I tried downloading that specific file again and it is still corrupt.

Have you seen a problem like this?

Thanks.

EDIT:

Look at this other error:

High corrupted?: 157.2
Error in row - extreme price:
 this row
Array
(
    [0] => 2007.04.01 21:38
    [1] => 0.715
    [2] => 157.2    <-- how can the "high" be 157.2 when the prices are 0.715x?
    [3] => 0.715
    [4] => 0.7152
    [5] => 10
)
 prev row 
Array
(
    [0] => 2007.04.01 21:37
    [1] => 0.715
    [2] => 0.715
    [3] => 0.715
    [4] => 0.715
    [5] => 2
)
 next row
Array
(
    [0] => 2007.04.01 21:39
    [1] => 0.7153
    [2] => 0.7154
    [3] => 0.7153
    [4] => 0.7154
    [5] => 6
)

Also, there are problems with dates; check this:

 this row, date 2007.12.12 18:02
Array
(
    [0] => 2007.12.12 18:02
    [1] => 0.78765
    [2] => 0.78765
    [3] => 0.78765
    [4] => 0.78765
    [5] => 2
)
 prev row, date 2007.12.25 04:43
Array
(
    [0] => 2007.12.25 04:43
    [1] => 0.78795
    [2] => 0.78795
    [3] => 0.7878
    [4] => 0.7878
    [5] => 13
)
 next row, date 2007.12.12 18:03
Array
(
    [0] => 2007.12.12 18:03
    [1] => 0.78765
    [2] => 0.78785
    [3] => 0.78765
    [4] => 0.78785
    [5] => 12
)

As you can see, the previous row has a date greater than the current row, which is wrong: each date must be greater than the previous one. In this case the previous row has the wrong date.
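To catch these out-of-order rows automatically, a simple monotonicity check over the timestamps is enough. Here is a minimal sketch in Python (the file name and column layout are assumptions, matching the 1-minute CSVs above):

import csv
from datetime import datetime

# Minimal sketch: flag any row whose timestamp is not strictly greater
# than the previous row's. File name and column layout are assumptions.
def find_out_of_order(path):
    bad = []
    prev = None
    with open(path, newline="") as f:
        for i, row in enumerate(csv.reader(f), start=1):
            ts = datetime.strptime(row[0], "%Y.%m.%d %H:%M")
            if prev is not None and ts <= prev:
                bad.append((i, row[0]))
            prev = ts
    return bad

for line_no, stamp in find_out_of_order("NZDUSD_1m.csv"):
    print("row %d: out-of-order timestamp %s" % (line_no, stamp))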

So the first line has two entries? Or was it a copy-and-paste mistake?

And yeah, I wouldn't be too shocked if Dukascopy's data source had a couple of mistakes. The best thing you can do at this point is to find them manually when you are backtesting and try to fix them.

Personally, I do things a little differently. I have MySQL store the tick data in a database, in separate tables. Then I have a piece of code that finds erroneous data by looking for extreme price values. Then I take a transform of the ticks and replace the broken tick with the value of the transform at the current period.
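Not my production code, but a rough sketch of the detect-and-replace step, using a rolling median as the "transform" (window size and threshold are illustrative, and the MySQL storage layer is left out):

from statistics import median

# Rough sketch: flag a tick as broken when it deviates too far from a
# rolling median of its neighbours, then overwrite it with that median.
# Window size and threshold are illustrative, not my actual settings.
def clean_ticks(prices, window=30, threshold=0.10):
    cleaned = list(prices)
    for i in range(len(cleaned)):
        lo = max(0, i - window)
        hi = min(len(cleaned), i + window + 1)
        neighbours = cleaned[lo:i] + cleaned[i + 1:hi]
        if not neighbours:
            continue
        m = median(neighbours)
        if abs(cleaned[i] - m) / m > threshold:
            cleaned[i] = m  # replace the broken tick
    return cleaned

# The doubled NZDUSD quote from the earlier post gets pulled back in line:
print(clean_ticks([0.7138, 0.7139, 1.6231, 0.7140, 0.7141]))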

I'll have to look at the default Dukascopy data. I took a look at the Python script and it's working as it should, I think; please let me know if there is something off. But if there were something off, there should be a lot more mistakes, I believe.

Just wanted to link this PDF on tick data filtering. Some people may find it useful.

http://www.tickdata.com/pdf/Tick_Data_Filtering_White_Paper.pdf

I also checked the Python script. It is OK; the problem is in the Dukascopy data.

It seems to be corrupted. I think the Python script could do some filtering on the data to prevent this type of error, which can cause problems when backtesting. :slight_smile:

By the way, I made some modifications to the previous post. I added more errors, now in the dates too, and fixed the paste I did.

Please check it :slight_smile:

Great idea! It's definitely something to look at. Unfortunately, I'm a little busy working on a project.

But if you could perhaps do us a favor and write the steps on how you would cleanse the data, I would be happy to quickly whip up the code thereafter. :slight_smile:

Already did. :slight_smile: I think the issue is that Dukascopy's data isn't 100% clean; perhaps 99.5% is more accurate. With that said, these erroneous ticks can pose a large problem for our backtesting.

If that is the case, let's create a tick filtering script. :wink:

This should remove duplicate lines on Linux and also sort the results chronologically (uniq -w 16 compares only the first 16 characters, i.e. the timestamp down to the minute, so only the first row per minute is kept):

sort -t, -k1,1 NZDUSD_1m.csv | uniq -w 16 > sNZDUSD_1m.csv

I think that for the price errors we can discard (or adjust based on the previous and next bars) prices that deviate more than 10% from the SMA60 on 1-minute bars.
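Something like this sketch could implement that rule, assuming the column layout from the dumps above (date, open, high, low, close, volume); the file names and the function itself are just for illustration:

import csv

# Sketch of the proposed filter: drop 1-minute bars whose close deviates
# more than 10% from a 60-bar simple moving average of closes. Discarded
# bars are excluded from the average so they cannot poison it.
def filter_bars(in_path, out_path, period=60, max_dev=0.10):
    closes = []
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for row in csv.reader(fin):
            close = float(row[4])
            if len(closes) >= period:
                sma = sum(closes[-period:]) / period
                if abs(close - sma) / sma > max_dev:
                    continue  # discard the suspect bar
            closes.append(close)
            writer.writerow(row)

filter_bars("NZDUSD_1m.csv", "fNZDUSD_1m.csv")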

That should give good results. I myself am removing those bars, because 1 or 2 missing minutes don't matter much for a strategy on a big timeframe (4 hours or greater).

That's one way to do it. I would look into replacing the bar with a closer value rather than removing it completely, though, just to prepare for future developments. :wink:
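Purely as an illustration, replacing a suspect bar could be as simple as interpolating between its neighbours; the bar layout here is assumed to be (open, high, low, close):

# Purely illustrative: instead of deleting a suspect bar, replace its
# prices with the midpoint of the previous and next bars.
def interpolate_bar(prev_bar, next_bar):
    return tuple((p + n) / 2.0 for p, n in zip(prev_bar, next_bar))

prev_bar = (0.715, 0.715, 0.715, 0.715)      # the 21:37 row from the dump
next_bar = (0.7153, 0.7154, 0.7153, 0.7154)  # the 21:39 row
print(interpolate_bar(prev_bar, next_bar))   # stand-in for the broken 21:38 bar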

BTW, it seems like you have some programming experience? Do you happen to have any PHP, ASP.NET, or MySQL experience? I'm currently working on a project and would really appreciate some help. It's never been done before. :wink: