I'm developing a strategy using the Python library backtesting.py.
My problem is that when I train my strategy on one dataset (from Yahoo, for example) I get good results, but when I change the data source for exactly the same period (data directly from the broker, for example) the results become very bad.
I want to understand the reason for this big difference. Why is the data not the same everywhere?
What does this say about my strategy? Is it very bad? Should it give consistent results regardless of the data source?
data is different in different places (that’s just one reason - there are others, too)
why should it be the same? data from brokers is just data from their own products, for which they themselves make up the prices - it isn’t objective, real or transparent - spot forex/CFDs are a decentralized market
if you want something objective and factual, you have to use futures data (that’s the same everywhere because there’s a centralized exchange)
can’t tell whether the strategy is bad, from what you said
but the faster the timeframe you’re using, the more the results are going to vary and the less reliable they will be, overall: if you’re using daily charts, the differences should be very small indeed and barely affect the outcomes at all; if you’re using 10-second charts (don’t laugh - some people do!) then the results will be all over the place and mean almost nothing
no strategy will really give consistent results on every data source - that’s a bit unrealistic
let’s hope you get a reply from some backtesting and automation expert like Greg @ProfesorPips who knows a lot more about this subject than i do, and whose answers will surely be much more helpful to you than mine are!
Thank you for your reply, that helps a lot!
That also makes me wonder if there are differences between data from a demo account and a real account for the same broker.
I'm working on EURUSD and using the 1h timeframe.
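Before judging the strategy, it can help to measure how far the two feeds actually disagree over the same period. The following is only a minimal sketch using plain dictionaries of closing prices keyed by bar timestamp. The function name `compare_feeds` and the sample prices are invented for illustration, and it assumes both feeds have already been converted to the same time zone, which is often not the case out of the box.

```python
from statistics import mean

def compare_feeds(feed_a, feed_b, pip=0.0001):
    """Compare closing prices from two feeds keyed by bar timestamp.

    Reports how many bars both feeds share, which bars exist in only
    one feed, and the mean/max absolute close difference in pips.
    """
    shared = sorted(set(feed_a) & set(feed_b))
    diffs = [abs(feed_a[t] - feed_b[t]) / pip for t in shared]
    return {
        "shared": len(shared),
        "only_a": sorted(set(feed_a) - set(feed_b)),
        "only_b": sorted(set(feed_b) - set(feed_a)),
        "mean_diff_pips": mean(diffs) if diffs else 0.0,
        "max_diff_pips": max(diffs) if diffs else 0.0,
    }

# Hypothetical H1 EURUSD closes, invented for illustration only.
yahoo = {
    "2023-01-02 10:00": 1.0672,
    "2023-01-02 11:00": 1.0680,
    "2023-01-02 12:00": 1.0675,
}
broker = {
    "2023-01-02 10:00": 1.0670,
    "2023-01-02 11:00": 1.0681,
    "2023-01-02 13:00": 1.0669,
}

report = compare_feeds(yahoo, broker)
print(report)
```

If the mean difference is a fraction of a pip but the backtest results still diverge sharply, the strategy is probably very sensitive to small price changes. If many bars exist in only one feed, the data itself is the first thing to fix.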
Data from Yahoo, brokers, etc. is often not reliable enough. It has gaps or slightly wrong values, which might not matter for an overall view of the market. For backtesting, though, you should look for more reliable data, such as the data from Dukascopy and others. I’m backtesting on Dukascopy data and it works for me.
Once you have reliable data to test your EAs on, the next step is to look at the strategy and the code, if there is still a problem.
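Data gaps of the kind mentioned above can be found with a simple scan of the bar timestamps. This is a rough sketch, not a complete data-quality check: `find_gaps` is a hypothetical helper and the timestamps are made up.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, step=timedelta(hours=1)):
    """Return (previous_bar, next_bar) pairs that are more than one step apart."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > step]

# Hypothetical H1 bar times with the 12:00 and 13:00 bars missing.
bars = [
    datetime(2023, 1, 2, 10),
    datetime(2023, 1, 2, 11),
    datetime(2023, 1, 2, 14),
    datetime(2023, 1, 2, 15),
]

gaps = find_gaps(bars)
for before, after in gaps:
    print(f"gap between {before} and {after}")
```

Note that normal weekend closures will also show up as gaps with this approach, so those need to be filtered out or ignored when reading the output.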
Thank you for your response.
If I use Dukascopy data to develop my bot, does that mean I should also use live data from Dukascopy when I go live (on a demo or real account), or should I rely on the broker's data in that case?
It seems that you haven’t been trading yet. Is that correct? Your question leads me to believe so.
But that’s not a problem. I’ll be happy to explain it to you. So if you have created your bot and are testing it against the Dukascopy data, then you know how it will most likely work on a demo account.
If you then run your bot on a demo account with your broker in the next step, you can compare after some time whether the bot's trades on the backtest data (which you then update again from Dukascopy) match those on your broker’s demo account to a high percentage (it will never be 100%). After all, on the demo account the bot always runs on your broker’s data, not on the data from Dukascopy.
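One possible way to put a number on that comparison is to count how many backtest entries have a matching demo entry in the same direction within a small time window. This is only a sketch under assumptions: the helper `match_rate`, the one-hour tolerance and the trade lists are all invented for illustration, and real trade logs would first have to be exported from the backtest and from the broker platform.

```python
from datetime import datetime, timedelta

def match_rate(backtest_entries, demo_entries, tolerance=timedelta(hours=1)):
    """Fraction of backtest entries with a same-direction demo entry within tolerance.

    Each entry is a (timestamp, side) tuple; a demo entry is used at most once.
    """
    unmatched = list(demo_entries)
    hits = 0
    for bt_time, bt_side in backtest_entries:
        for candidate in unmatched:
            demo_time, demo_side = candidate
            if demo_side == bt_side and abs(demo_time - bt_time) <= tolerance:
                unmatched.remove(candidate)
                hits += 1
                break
    return hits / len(backtest_entries) if backtest_entries else 0.0

# Hypothetical trade entries, invented for illustration only.
backtest = [
    (datetime(2023, 1, 2, 10), "buy"),
    (datetime(2023, 1, 3, 14), "sell"),
    (datetime(2023, 1, 4, 9), "buy"),
]
demo = [
    (datetime(2023, 1, 2, 10), "buy"),
    (datetime(2023, 1, 3, 15), "sell"),  # one bar later than the backtest
    (datetime(2023, 1, 5, 9), "buy"),    # a day later: counted as no match
]

rate = match_rate(backtest, demo)
print(f"{rate:.0%} of backtest entries matched on the demo account")
```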
You will only find out for certain how the bot behaves on a live account once it is running on the live account, because there you will not always be able to open or close an order at exactly the time your bot wants to (#slippage). Bots on timeframes below M15 in particular struggle with this.
Since your bot runs (or is supposed to run) on H1 and on a very liquid market like EURUSD, you have already made a good choice. The higher the timeframe, the less you have to worry about slippage. And the higher the timeframe, the more likely it is that the demo trades = live trades.
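The effect of slippage on different trade sizes can be illustrated with some simple arithmetic. The sketch below assumes a fixed adverse slippage on both the entry and the exit, which is a simplification (real slippage varies from fill to fill), and the function name `apply_slippage` and all prices are hypothetical.

```python
def apply_slippage(trades, slippage_pips, pip=0.0001):
    """Return each trade's result in pips after adverse slippage on entry and exit.

    Each trade is a (side, entry_price, exit_price) tuple.
    """
    results = []
    for side, entry, exit_price in trades:
        direction = 1 if side == "buy" else -1
        gross_pips = direction * (exit_price - entry) / pip
        results.append(gross_pips - 2 * slippage_pips)
    return results

# Hypothetical trades: a small scalp-sized move and a larger H1-sized move.
trades = [
    ("buy", 1.0700, 1.0703),   # 3 pips gross
    ("sell", 1.0760, 1.0720),  # 40 pips gross
]

net = apply_slippage(trades, slippage_pips=0.5)
print([round(x, 2) for x in net])
```

With half a pip of slippage per fill, the 3-pip trade loses a third of its profit while the 40-pip trade loses 2.5%, which is one way to see why short timeframes suffer more.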
In my opinion: unless you are creating a scalper bot (M1/M5), the M1 data from Dukascopy should be sufficient. It works for me, and I’m working with M15/H1/H4 bots.
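If M1 data is used as the base, the higher-timeframe bars have to be built from it. A minimal sketch of that aggregation is shown here, assuming bars are (timestamp, open, high, low, close) tuples. The function `m1_to_h1` and the prices are made up for illustration, and nothing here is specific to Dukascopy's export format.

```python
from datetime import datetime

def m1_to_h1(m1_bars):
    """Aggregate (timestamp, open, high, low, close) M1 bars into H1 bars.

    Returns a dict mapping each hour's start time to (open, high, low, close).
    """
    hourly = {}
    for ts, o, h, l, c in sorted(m1_bars):
        hour = ts.replace(minute=0, second=0, microsecond=0)
        if hour not in hourly:
            hourly[hour] = [o, h, l, c]      # first bar of the hour sets the open
        else:
            bar = hourly[hour]
            bar[1] = max(bar[1], h)          # highest high so far
            bar[2] = min(bar[2], l)          # lowest low so far
            bar[3] = c                       # latest close wins
    return {hour: tuple(bar) for hour, bar in hourly.items()}

# Hypothetical M1 bars, invented for illustration only.
m1 = [
    (datetime(2023, 1, 2, 10, 0), 1.0700, 1.0705, 1.0698, 1.0703),
    (datetime(2023, 1, 2, 10, 1), 1.0703, 1.0710, 1.0701, 1.0708),
    (datetime(2023, 1, 2, 11, 0), 1.0708, 1.0709, 1.0704, 1.0706),
]

h1 = m1_to_h1(m1)
for hour, bar in h1.items():
    print(hour, bar)
```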