A Journey into Trading Using DRL

A couple of members have been trading using the NEAT neural network algorithm, but I’m going down a different route and using TensorFlow, which was created by Google and is widely used in commercial products.

Reinforcement learning seems to be the most appropriate solution, and there are several algorithms that can be used for it. The principle is largely the same in each: we create an environment, which in this case is a trading environment. The agent is given observations, here equity, open, high, low, close, open position price, RSI(14) and SMA(20). The job of the agent is to explore the environment and take an action (buy, sell or hold), and we reward the agent based on the success of that action. The agent tries to learn which actions to take in order to maximise its reward.
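To make the observation, action and reward loop concrete, here is a minimal sketch of such an environment in plain Python. It is not my actual code: the class name `TradingEnv`, the action encoding, the long-only single position and the choice to pay the reward only when a trade is closed are all assumptions for illustration, and the observation is cut down to three values.

```python
from dataclasses import dataclass

HOLD, BUY, SELL = 0, 1, 2  # hypothetical action encoding


@dataclass
class TradingEnv:
    """Toy long-only environment with a single position over a list of closes."""
    closes: list
    step_idx: int = 0
    entry_price: float = 0.0  # 0.0 means no open position
    equity: float = 0.0       # realised profit in price units

    def reset(self):
        self.step_idx = 0
        self.entry_price = 0.0
        self.equity = 0.0
        return self._observe()

    def _observe(self):
        # A fuller observation would also carry OHLC, RSI(14), SMA(20), etc.
        return (self.equity, self.closes[self.step_idx], self.entry_price)

    def step(self, action):
        price = self.closes[self.step_idx]
        reward = 0.0
        if action == BUY and self.entry_price == 0.0:
            self.entry_price = price              # open a long position
        elif action == SELL and self.entry_price != 0.0:
            reward = price - self.entry_price     # assumed: reward only on close
            self.equity += reward
            self.entry_price = 0.0
        self.step_idx += 1
        done = self.step_idx >= len(self.closes) - 1
        return self._observe(), reward, done


env = TradingEnv(closes=[100.0, 101.0, 103.0, 102.0])
env.reset()
env.step(BUY)                       # opens at 100
obs, reward, done = env.step(SELL)  # closes at 101
```

An RL library would wrap the same `reset`/`step` pair; the agent only ever sees the observation tuple and the reward.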

I started off with a PPO algorithm because it can run multiple environments simultaneously to make the learning more efficient. However, I couldn’t get this agent to do very much at all: it always chooses the same action regardless of input. It may well be that it hasn’t learnt enough to know what to do, so I may set it a large training task in the future.

A2C is a similar algorithm and gave similar results, so I didn’t pursue it for long. DQN has been the most promising so far. It hits a sweet spot: it does very little at first, then takes a lot of trades, then starts to reduce the number of trades as it learns that it can’t always win. Catching it at the sweet spot is difficult. I’ve been saving the agent every 200k data points, but I will save more often because that has only produced one agent that works.

On the face of it, that agent performs well. Trained from 2000 until 2010 on the daily timeframe and then tested from 2010 until 2022, it gives over 3000 pips profit. However, it made a bizarre decision to hold a trade for years at a 7500 pip loss. I think that’s because I’m differentiating between sell and close buy to let the agent realise those are different things, but if we get a sell signal with a buy position open, we should probably close it regardless.
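The "close it regardless" rule could be sketched like this. It assumes a simplified three-action setup (no separate close action) and a hypothetical `resolve` helper, so it shows the idea and not how my environment is currently wired.

```python
HOLD, BUY, SELL = 0, 1, 2      # hypothetical action encoding
FLAT, LONG, SHORT = 0, 1, -1   # hypothetical position states


def resolve(position, action):
    """Return the new position after applying an action.

    An opposing signal closes the open position instead of being ignored.
    """
    if action == BUY:
        return FLAT if position == SHORT else LONG
    if action == SELL:
        return FLAT if position == LONG else SHORT
    return position  # HOLD leaves the position unchanged
```

With this rule a sell signal while long always flattens the position, so the agent cannot sit on a losing buy just because it never picked the dedicated close action.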

Next steps are to try and remove those 3 big losses and ideally get more frequent trading. Then I will try it on 5 minute and hourly chart data and then different commodities.


Great to see other traders experimenting with neural networks :slight_smile:
Are you planning to share more stats (std dev, max drawdown, expected value etc.) for your networks?

Good luck and have fun! :slight_smile:

Keen to hear more about how you get on!


Yes, I’m at the very start of the journey, and I will share more as I progress. Right now I’ve only just learnt about neural networks and created the environment and setup. I feel like I may never end up with something that I’d set free to trade for real, but it’s very interesting and I’ve learnt Python as part of this too.

Right now I’m battling Colab. It keeps disconnecting in the middle of training and deleting all of my agents. My PC isn’t up to churning through everything, might have to try it on the work laptop that’s much more powerful.

It’s been a bit of a slog trying to get something that works. Part of the problem seems to be that the reward is calculated whilst holding a position, so training on a trending market teaches the agent to hold for a long time before getting out. The problem is that it does the same when the trade is going the wrong way too.

I thought scalping might work better, but the spread is absolutely crippling, struggle to make anything worth the risk.

Hourly has been difficult too, although I did stumble upon this beauty that gave 100% win rate over 3.5 years. Not many trades, but who cares if we have lots of very good robots. Only downside is that I accidentally trained it on 100% of my data, so I have no idea if it has overfit or actually has something. I’ll try some more analysis to see what it actually did later.

On the hourly I’ve found a bot that seems to work very well at picking up 1 candle moves during an uptrend. I’m not sure how it identifies an uptrend without a memory layer or a long period MA, but it certainly seems to, and it performs well outside of the trained area too. The main downside is that it clusters trades together and then waits a long time to take more.

This will be the first agent of an ensemble that I will try to use.


Everything got a bit messy as I was trying to change inputs and I couldn’t really remember what went where so I’ve started again before they ever got to a forward test. Got some better code and analytics, so modifications should be easier as long as I don’t need more than 7 inputs.

Indices definitely seem to give better performance than forex. The first new one is DAX 4H. Training uses 50% of the data I have from the broker, then I test on 100% of the data. An agent has to get over 65% win rate with an average win of over 10 pips. I also like to see a high mean/median ratio; if it’s low, it means it’s holding onto losers for a long time.
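As a rough sketch of that filter, assuming the results are available as a list of per-trade pip values and that the mean/median ratio is taken over that same list (the function names and the sample trades are made up):

```python
from statistics import mean, median


def summarise(trade_pips):
    """Win rate, average winning trade and mean/median ratio of all trades."""
    wins = [p for p in trade_pips if p > 0]
    med = median(trade_pips)
    return {
        "win_rate": len(wins) / len(trade_pips),
        "avg_win": mean(wins) if wins else 0.0,
        # Undefined when the median is zero, so report NaN in that case.
        "mean_median": mean(trade_pips) / med if med else float("nan"),
    }


def passes(stats, min_win_rate=0.65, min_avg_win=10.0):
    """Apply the two hard thresholds; the ratio is left for eyeballing."""
    return stats["win_rate"] > min_win_rate and stats["avg_win"] > min_avg_win


# Hypothetical trades: five modest winners and one large loser.
stats = summarise([12.0, 15.0, 20.0, -30.0, 18.0, 11.0])
```

In this made-up sample the agent passes both thresholds, but the single large loser drags the mean well below the median, which is the pattern the ratio is meant to flag.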

Got some pretty huge returns, but I was testing on old data. I will be retesting the best 6 on data including the last 5 months and see if they still have a good performance in that time too.

One thing I have thought of is that rather than using reinforcement learning to trade, which is notoriously difficult, I could use a feedback (supervised) type of AI where I tell it whether it’s an uptrend, range or downtrend and it tries to predict the turning points based on my feedback. That is a more definitive type of neural network because it’s either right or wrong (based on my feedback). I don’t have time at the moment to look into this, but it’s another possibility and potentially more accurate if it would work.

Interestingly, those results were when I had the max observation set to 200, so the only input that was changing was RSI. I also think OHLC is kind of irrelevant to what direction things are going to move. A neuron and weight can’t really use that information to decide what will happen, so I suspect it was ultimately overfitting to the max and min in the data and that was causing the long trades.
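One way to picture the max observation effect, assuming values above the bound were simply clipped to it (the helper name and the DAX-like numbers are made up):

```python
def clip_observation(obs, low=0.0, high=200.0):
    """Clamp every observation value into [low, high]."""
    return [min(max(x, low), high) for x in obs]


# Hypothetical bars: open, high, low, close, RSI(14)
bar_a = [15210.0, 15290.0, 15180.0, 15260.0, 41.5]
bar_b = [15900.0, 15950.0, 15870.0, 15940.0, 68.2]

clipped_a = clip_observation(bar_a)
clipped_b = clip_observation(bar_b)
```

Under that assumption both bars reach the network with identical price inputs pinned at the bound, and only the RSI column (which lives in 0 to 100) still varies.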

I’m going to try and run predictions using only indicators and ignore price to see what happens. It should realistically be able to reproduce something similar since that’s what it was doing before.

Some testing has shown that my previous prediction appears to be correct that price isn’t a big factor in predicting the moves. Much better results by using several indicators and letting the AI learn when a good time to enter is. It appears to be better at learning when it doesn’t buy and sell, although a couple of good results with a nice smooth equity curve. I feel like I now have enough to start integrating with MT5, so a bit of learning needed to get that up and running before demo.

One thing it’s not great at is getting out when a big event goes wrong (covid, Ukraine war etc). So it would need oversight to make sure it isn’t left to run through major events. I’m definitely having the best results with stocks so far, so I’m a bit sceptical as to whether it is truly learning properly or just riding the long term uptrend. I’m trying more symbols to see if I can get good results on a swinging market rather than just a trending one, although it’s not necessarily critical to performance.

Some of the best results from S&P500. The curve is because of increased volatility and growth of the index, not compounding. With compounding, the results get very impressive although I suspect unrealistic…

And for fun, my favourite one of all, and weirdly the smoothest plot:

Indices will most likely be easier to model, as the stock market trends in general :slight_smile: Forex is more like a never-ending huge range. Good idea to switch to indices!


The only question is, would it be better to just put money into a tracking fund, or is it worth CFD/spreadbetting on indices?

In 10 years, the profit of my AI is slightly better than if I’d just bought and held. Obviously can’t do that on a CFD because the swap would kill you, but on a tracker fund you’d get interest as well as growth. The benefit of CFD is leverage and being able to have a much higher return although obviously a bigger risk.

I guess if you’d put £1000 in a tracker fund 10 years ago you’d have a bit more than £3000 now, which is a good return. If you had £1000 on my bot and used, say, 15:1 leverage, you’d have £18000 now (assuming fixed lot size). Weirdly, that isn’t simply 15 times bigger as you might expect, but those are rounded real numbers from my broker.

I guess it seems worth persevering, because the rewards are potentially so much greater. And as long as you don’t enter immediately before a covid type event, you’re unlikely to wipe out with buy as the only option, as that’d need about a 7% swing down in the market.
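The 7% figure follows from the leverage. A quick check, assuming the whole stake is used as margin at 15:1 and ignoring spread, swap and any broker stop-out level (the function name is made up):

```python
def wipeout_move(leverage):
    """Adverse move, as a fraction of price, that consumes the full stake."""
    return 1.0 / leverage


pct = round(wipeout_move(15) * 100, 2)  # 6.67
```

So at 15:1 a drop of roughly 6.7% against a fully leveraged long takes the entire stake, and a smaller position size pushes that threshold further away.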

DCA (dollar cost averaging) with index ETFs is a good old strategy proven to be profitable in the long run. So yeah, you don’t need an AI to do this :slight_smile:

I have used this for years on my self-managed retirement fund.

I found an absolute beast of a system on DAX where it made 30000 pips in 7 years. Then I realised I had pasted the moving average data a month out of alignment, so it knew the future. With the data properly aligned, it’s incredibly difficult to find a reliable indicator that works on that index. It is a very volatile one, which opens up more opportunity if you can get it right, but 54% seems to be the best win rate.
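A cheap guard against this kind of leak is to change only the final bar and check that no earlier indicator value moves. The sketch below uses a plain trailing SMA; `leaky_sma` is a made-up stand-in for a misaligned column and not the actual mistake from my spreadsheet.

```python
def sma(values, period):
    """Trailing SMA: entry i averages values[i-period+1 .. i]; None until warm."""
    out = []
    for i in range(len(values)):
        if i + 1 < period:
            out.append(None)
        else:
            window = values[i + 1 - period : i + 1]
            out.append(sum(window) / period)
    return out


def leaky_sma(values, period, shift=2):
    """Hypothetical bug: the SMA column sits `shift` rows too early."""
    s = sma(values, period)
    return s[shift:] + [None] * shift


def uses_future(indicator_fn, values, period):
    """True if changing the final bar alters any earlier indicator value."""
    base = indicator_fn(values, period)
    bumped = indicator_fn(values[:-1] + [values[-1] + 1000.0], period)
    return base[:-1] != bumped[:-1]


closes = [float(x) for x in range(1, 11)]
clean = uses_future(sma, closes, 3)        # False
leaky = uses_future(leaky_sma, closes, 3)  # True
```

Running the same check over every input column before training would have flagged the shifted moving average straight away.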

I made some custom indicators that I used a while back and have fed that data into the training and it’s bumped the win rate up to around 70%. I accidentally put a bug in the environment that means it will have a trade in place 100% of the time, so I’ll analyse the results, then redo it without the bug and see if that helps or not. I think I’ll try putting these indicators into the S&P data too and see if that helps. I forgot about those indicators, but remember they gave good results backtesting a while back.