The algo trader

Hi!

Over the past few months I have been building an algorithm to trade EUR/USD. From a forex perspective it's my first stab at it; however, I am no noob to the algo world. I have a master's degree in predictive analytics and an undergrad in business management. I am not a math or computer science geek, just a business guy trying to use technology to solve business problems.

I've always had a strong affinity for the markets. I traded from 2000 to 2004 with little success and much frustration. Twenty years later, that trading stint combined with my professional trajectory has finally led me back to the trading world.

Over the past 5 years I have built algorithms to solve problems for the corporate world, answering questions like:

  • Which customers are likely to leave?
  • Which products are likely to be cross-sold or upsold?
  • Which prospective customers are least likely to convert to actual customers?
  • Which customers are likely to default?
  • Which products will the supply chain have insufficient supply of for the next order?
  • Which customers are likely to respond to marketing?

This will be no cakewalk. Based on exploring the predictor variables I've created and the results of my models, the markets have far more randomness baked into them than problems like "which customers will leave?". When a customer leaves, it's due to identifiable business practices like long customer service wait times, aggressive sales tactics, or products overpriced compared to market pricing norms.

In the business world that data is marked as such. I can look at the data and see how long a customer waited, how many times they have been contacted by sales, and their pricing compared to what a competitor would sell them at. The forex market has two variables: price and volume. All other variables are derivatives of those two market attributes, for example candlestick patterns, moving averages, and oscillators. That said, compared to all the other problems I have solved, randomness has never been a bigger threat than it is now.
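To make the "derivatives of price and volume" point concrete, here is a minimal sketch in Python/pandas. The column names and window lengths are illustrative assumptions, not from the post:

```python
import pandas as pd

# Assumed input: a DataFrame of 4-hour EUR/USD bars with hypothetical
# 'open', 'high', 'low', 'close', 'volume' columns and a DatetimeIndex.
def add_derived_features(bars: pd.DataFrame) -> pd.DataFrame:
    out = bars.copy()

    # Moving averages: smoothed views of price, nothing more.
    out["sma_20"] = out["close"].rolling(20).mean()
    out["ema_20"] = out["close"].ewm(span=20, adjust=False).mean()

    # A simple oscillator: 14-period RSI from close-to-close changes.
    delta = out["close"].diff()
    gain = delta.clip(lower=0).rolling(14).mean()
    loss = (-delta.clip(upper=0)).rolling(14).mean()
    out["rsi_14"] = 100 - 100 / (1 + gain / loss)

    # A crude candlestick descriptor: body size relative to full range
    # (bars where high == low will produce NaN/inf and need handling).
    out["body_ratio"] = (out["close"] - out["open"]).abs() / (out["high"] - out["low"])

    return out
```

Every one of these "new" predictors is just arithmetic on price, which is the point being made above.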

I built two models: one for long trades and one for short trades, which will hopefully be finalized this week. The pieces I still have to build out are incorporating a live data feed into my software, building a mobile application for trading alerts, debugging the live application, and finally testing with a tiny bit of cash.

I’m excited to start sharing more of this journey.

Wish me luck.

Best trades to you all,

The algo trader


Hi and welcome to BP :slight_smile:

Hi,
I look forward to following your data analytics progression. Are you contemplating using sentiment data, like Twitter APIs or Google APIs?

Backtest over about 20 years, covering both ranging and trending markets as a start, adding "what if this happens?" and high/low volume questions. Add timeframe differences, and any other question you can think of.

My pro friend has about 5 algos for different market conditions. He’s been consistently profitable for many years now.


Hi. Sentiment analysis would be interesting; however, the art of stitching sentiment to 4-hour bars would be tricky. It would be interesting to see what I could scrape from a Twitter API that's predictive. Right now it's price action and technical indicators.
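For what it's worth, a rough sketch of what the "stitching" step could look like, assuming you already had per-tweet sentiment scores with timestamps (both inputs here are hypothetical):

```python
import pandas as pd

# Hypothetical inputs: 'tweets' has a DatetimeIndex and a numeric
# 'sentiment' score per tweet; 'bars' is indexed by 4-hour bar open times.
def stitch_sentiment(tweets: pd.DataFrame, bars: pd.DataFrame) -> pd.DataFrame:
    # Aggregate tweet-level sentiment into the same 4-hour buckets as the bars.
    agg = tweets["sentiment"].resample("4h").agg(["mean", "count"])
    agg.columns = ["sentiment_mean", "tweet_count"]

    # Left-join onto the bars; buckets with no tweets come through as NaN,
    # which the model pipeline has to handle (e.g. fill with a neutral 0).
    return bars.join(agg, how="left")
```

The tricky part the post alludes to is exactly that aggregation choice: mean, sum, and decay-weighted scores can tell very different stories within the same 4-hour bucket.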

I'm using 3 years of data for modeling and then testing it out on a future 9 months. There is SOOO much that can be added to machine learning that it can get extremely complex, and eventually the complexity can suck up all your computational resources. I have to be lean at this point as I don't want to start moving everything to the cloud. That's awesome to hear about your friend.
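The modeling/testing split described here amounts to a simple walk-forward split. A sketch, reusing the hypothetical `bars` frame from the earlier example and purely illustrative dates:

```python
import pandas as pd

# Time-ordered split: fit on 3 years, hold out the following 9 months.
# The dates below are illustrative, not the author's actual windows.
train_end = pd.Timestamp("2022-12-31")
test_end = train_end + pd.DateOffset(months=9)

train = bars.loc[:train_end]
test = bars.loc[train_end + pd.Timedelta("4h"):test_end]

# Never shuffle: the test window must lie strictly after the training
# window, otherwise future information leaks into the model.
```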


Very interested to see how you progress. Thanks for keeping us in the loop.

Hi Baby Pips Community,

Update: data issues, please stop!

I have completed the engineering for two machine learning models: one for long trades and one for short trades.

There is one thing standing in my way: a quality data feed. As of right now I have built models on three different data sets.

  1. Dukascopy bank historical 4-hour increment snapshot. This data is considered the gold standard. The issue is I can only get historical data via a snapshot that is available at the end of the day. Considering my two models assume live data updated every 4 hours, I can't use this data. I emailed them, and it appears they only provide their data feed to customers, and as of now they are not accepting US customers.

  2. OANDA. The data has problems. The most significant is potentially erroneous values. The reason I say potentially is that the lots traded could be valid but not belong to the bar they are aggregated into, could be non-retail, or could indeed be erroneous and need removing. They are looking into the matter.

  3. Twelve Data. The data is unusable. I sent a call over the API for 4-hour bars and got a mix of at the very least 1-, 2-, 3-, and 4-hour bars. I had to stop looking for errors because the data was so full of them I couldn't make any logical assumption as to what was going on. They said they were going to look into it but have not gotten back to me. (A quick programmatic check for this problem follows the list.)
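The mixed-interval problem in point 3 is easy to flag mechanically. A sketch, assuming the bars land in a pandas DataFrame with a sorted DatetimeIndex (this is not Twelve Data's actual API, just a post-download check):

```python
import pandas as pd

def check_bar_spacing(bars: pd.DataFrame, expected: str = "4h") -> pd.DataFrame:
    gaps = bars.index.to_series().diff().dropna()
    expected_td = pd.Timedelta(expected)

    # Gaps larger than expected are usually weekends/holidays and are fine;
    # gaps *smaller* than expected mean mixed-interval or duplicated bars.
    too_close = gaps[gaps < expected_td]
    print(f"{len(too_close)} bar(s) closer than {expected} to their predecessor")
    return bars.loc[too_close.index]  # the offending rows, for inspection
```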

I will be moving from the 4-hour bar strategy to a daily bar strategy. At the daily bar I won't need to rely as heavily on automation, because I can manually enter the high, low, open, and close values for now. I didn't want to have to be up in the wee hours of the morning, but I guess for now I'll have to.

The data issues threw me for a loop for the past month, and I'm ready to make some changes to move forward. Hopefully I'll have some meatier stuff to show next time.


Hi @Rjmiller421,
Your last post is perhaps the most useful (and impactful) post I have read on Babypips in a long time.

It is useful because it highlights an issue that all of us should have seen on at least one occasion, whereby a massive high or low excursion cannot be realistic, since even if you take the timeframe down to 1 second it still only exists within that smallest timeframe, and must therefore be treated as an erroneous reading. That could be removed (in theory) by programming to the effect of "if price (at t seconds) is more than x above or below price (at t-1 seconds), then replace the value at t seconds with the value at t-1 seconds" (see the sketch after the list below). But you highlight other potential issues as well:
  • How do you know whether the base time reference of the data is GMT, UTC or another?
  • Why do OHLC values from a short timeframe not always appear to be a subset of those from the same "enveloping" higher timeframe?
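A sketch of the despiking rule just described, in Python/pandas; the threshold `x` is left as a parameter since no value is prescribed above (a multiple of the typical spread would be one choice):

```python
import pandas as pd

# Despiking rule from the paragraph above: if a value jumps more than
# 'x' from its (already cleaned) predecessor, carry the predecessor
# forward in its place.
def despike(prices: pd.Series, x: float) -> pd.Series:
    cleaned = prices.copy()
    for t in range(1, len(cleaned)):
        if abs(cleaned.iloc[t] - cleaned.iloc[t - 1]) > x:
            cleaned.iloc[t] = cleaned.iloc[t - 1]
    return cleaned
```

Note that a filter like this will also flatten a genuine fast move that exceeds `x`, so it is better treated as a diagnostic than a silent fix.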

This link (over a decade old) reviews the same problem but does not conclude with a solution.

https://www.forexfactory.com/thread/5284-reliable-data
No conclusion but some examples of using MT4 period converters on raw data feeds.

It is not only important to anyone trying to automate a process using datasets from API feeds; it is important to anyone whose trading plan is based on an assumption that the OHLC data is, within a few pips, a reliable representation of reality.

If you have not already gone down this route, it is my assumption that the vast majority of data feeds emanate from the same raw data, of which there are likely very few providers. The question is, besides a live data delay for commercial sensitivity reasons, what "manipulation" is done on the raw input data before the data provider creates his output data set for download? What is the relationship between an hourly and a daily OHLC dataset, and how could you be sure whether an hourly OHLC bar is indeed created from sixty one-minute bars, or whether it is the mathematical combination of tick data, one-second data or "another method"?
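One way to test that relationship empirically: rebuild the hourly bars from the minute bars yourself and diff the result against the vendor's hourly feed. A sketch, assuming a minute-bar DataFrame with a DatetimeIndex (column names are illustrative):

```python
import pandas as pd

# If hourly bars really are built from sixty 1-minute bars, this
# aggregation should reproduce them exactly; any mismatch points to a
# different construction method (ticks, seconds, or "another method").
def resample_ohlcv(minute_bars: pd.DataFrame, rule: str = "1h") -> pd.DataFrame:
    return minute_bars.resample(rule).agg(
        {"open": "first", "high": "max", "low": "min",
         "close": "last", "volume": "sum"}
    ).dropna()
```

Comparing the rebuilt bars against the vendor's hourly feed is a quick test of the "subset" property questioned above.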

The implications are dire for any backtesting, and therefore for the many members on here who have done, or are planning to do, either manual or automated backtesting. If you can't trust your input data source, how can you be confident that a backtest is a valid representation of a plan when the input data to that plan is questionable?

*Update*

Twelve Data gave me an update on their data issue, and they are still working on it. Keep in mind this company sells access to this data. The data in question is the structure of a 4-hour candlestick: it contains the "high", "low", "open", and "close" prices for each observation and the respective time stamp. They should also include volume, so you can be sure sufficient liquidity is captured at each observation.

If this data were good, I should be able to trust a few attributes of it:

  1. The time stamp for each sequential bar should be at least 4 hours apart. There are rare cases where the separation is more, and that is generally okay. Conversely, the separation should NEVER be less.

  2. The volume should have a good distribution throughout the data's interval. For example, if I am looking to build a model on many years of data, I look at the volume by year and by month. I check to make sure that year by year and month by month there are no issues: each year and month is represented, and the values appear uniform. You might find that in the early pandemic volume was lower than normal, and that is okay. (See the validation sketch after this list.)

  3. I check the extreme values of all the fields. The bottom 10 and top 10 are sufficient to analyze. The currency's price should make sense, and so should the volume. If the volume is VERY low, check the prices against Yahoo Finance or some other reliable charting source. If the volume is high, there was likely something in the news driving that demand. In my experience it is the low-volume observations that contain pricing errors.

  4. Spot-check observations. Grab a handful of observations and validate them against charting software. I look for prices to vary by as much as the underlying currency pair's spread. I would accept a variance larger than the spread as the volume declines. When the volume is small it is hard to trust that the volume presented represents the population. For example, the data might show 1,000 EUR/USD lots in 4 hours. That is likely not a good representation of the total lots that changed hands in that 4-hour window, so the pricing has a high chance of being out of whack compared to the charting software.

  5. Chart the time series and visually inspect for bad bars. There shouldn't be any bad ticks; those should be cleaned by the provider you got the data from. If there are bad ticks, email the data provider and tell them to fix it. Don't attempt to fix the data yourself, since you can never be sure what logic in the source code is creating the error.
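A sketch of checks 2 and 3 in code (check 1 is covered by the spacing test from the earlier post); `bars` is the same hypothetical OHLCV DataFrame used throughout:

```python
import pandas as pd

def volume_profile(bars: pd.DataFrame) -> pd.DataFrame:
    # Check 2: volume totals by year and month. Look for missing periods
    # or wildly non-uniform values in the resulting grid.
    return bars["volume"].groupby(
        [bars.index.year, bars.index.month]
    ).sum().unstack(fill_value=0)

def extreme_rows(bars: pd.DataFrame, col: str, n: int = 10) -> pd.DataFrame:
    # Check 3: bottom-n and top-n rows of a field, pulled out for manual
    # review against an independent charting source.
    ordered = bars.sort_values(col)
    return pd.concat([ordered.head(n), ordered.tail(n)])
```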

Hi @Rjmiller421,
I'm curious why you didn't start with just MetaTrader 4 data instead of three data sets, and then backtest your models on it over a large period of time.

For information, I'm also currently developing a trading application, but my approach is different.

Good question, Blue. I am building a machine learning model, which requires each 4-hour bar to be scored. Most people build rules-based models; rule models are essentially "if this then that". Machine learning models look for predictive patterns and then "score" future observations for how likely it is that some behavior occurs, or how much or how little we should expect of some behavior. From my understanding, MT4 doesn't support that type of modeling.
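To illustrate the scoring idea (not the author's actual model), here is a minimal scikit-learn sketch. Everything in it is a hypothetical stand-in: the feature columns come from the earlier feature sketch, `train`/`test` from the earlier split, and `long_label` is an assumed target such as "next bar closed higher":

```python
from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical engineered predictors, per 4-hour bar.
feature_cols = ["sma_20", "ema_20", "rsi_14", "body_ratio"]

model = GradientBoostingClassifier()
model.fit(train[feature_cols], train["long_label"])

# A rules-based system outputs a hard "if this then that" signal; a scored
# model instead attaches a probability to every 4-hour bar, and the trading
# layer decides what probability is worth acting on.
scores = model.predict_proba(test[feature_cols])[:, 1]
```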

I see.
I noticed you use price and volume as the only parameters for your models.

It could be useful to take into account the economic calendar (news) and/or the working days.

That is the next set of data I'll be looking to add. For now, price action will account for some of the news cycles. An easy add-on is day of the week and week number of the month; that will be the next dimension I add to the data engineering.
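Those two calendar features are cheap to derive from the bar timestamp alone. A sketch, again on the hypothetical DatetimeIndexed frame:

```python
import pandas as pd

def add_calendar_features(bars: pd.DataFrame) -> pd.DataFrame:
    out = bars.copy()
    out["day_of_week"] = out.index.dayofweek            # 0 = Monday
    out["week_of_month"] = (out.index.day - 1) // 7 + 1  # 1-based week number
    return out
```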