Basic Research and statistical analysis example for Noobs

Overview

Here we go. Noobs pay attention. This is your job as a trader. Research and statistical analysis. If you can’t preform this then the sad news is give up and walk away. You’ll save a whole heap of money and pain.

This topic came to me after a recent thread where a newbie wanted to know if a currency deprecated in value after a certain time of day like close of market. This type of question immediately raises alarms for a more experience (not necessarily meaning successful) member. Lazy trader. And there is not much room for lazy traders in this game. You have to be able to research, collect data, analysis that data and apply your findings to your strategy. There is simply no way around it.

But where to begin?

I’m going to give a working example to demonstrate how to go about it. After that it’s up to you.

But remember this. I’m no teacher, educator or mentor. I am also heavily dyslexic. So excuse me if this jumps around a bit. This thread is about how to do some basic analysis using the tools available to an everyday Joe. It’s how I do things because this is how I’ve been exposed to these tools. It certainly may not be the best or simplest way. What’s its about is demonstrating that you will need to figure out how to do your own relevant data analysis if you even think you have a chance at winning at this game.

Hopefully some others will jump on board and improve what I offer here. Best of luck , ya all gonna need it.

7 Likes

Data

First and foremost you are going to need quality data. Quality data in, quality results out. Lucky for us there is a great source for quality data available over at Dukascopy. Even better, it is completely free. But there is a catch. It’s a real bugger to download. You have one of two choices. Piss-fart-gin around manually downloading directly from Dukascopy or use third party software. You will want to use the software.

There are three software options you can choose from and I’ll briefly go over each.

  1. Birts Tickdata Suite. IMO, the gold standard. You will pay a license fee. However it makes life a whole lot simpler and a must for EA development. It’s what I use. Everything is self managed. True plug and play. However Tickdata Suite wont produce the csv. file we’ll need for data analysis, we will have to convert it which is relatively straight forward.
  2. Tickstory Lite. Another paid app. However it is a much cheaper option. It does not have the same features as Tickdata Suite. It will however allow data export as a csv file thus making it a suitable alternative.
  3. StrategyQuant Tick Downloader. This app is completely free. It does not support back testing in mt4 however. It will however export data as a csv file. The pay off, man it’s slow. I mean mind-blowing slow. It will literally take an hour to import 1 years worth of data. Still it’s quicker than manually doing it from Dukascopy and for a noob probably the best option when starting.

Hint. Data from Dukascopy is set to GMT time. Most brokers run off the brokers local server time mainly to align end of week with close of trade New York. My advice, adjust your time to your brokers server time. My brokers time which is Eastern European time adjust for US daylight saving which is standard practice.

1 Like

Generate a csv file.

A csv file stands for comma separated values. Traditionally it is used to store tabular data so it can be read in excel. It is the file type you need to open in excel to conduct your data analysis.

Data from Dukascopy comes in a bi5. file. In MT4 , the program generates hst. and fxt. Files. Excel can’t read any of these formats. So first you need to generate a csv. File for the timeframe you wish to examine.

THIS IS NOT A TUTORAL ON HOW TO GENERATE csv. FILES.

You will need to do this yourself. Each of the above software apps has a way of doing it. They go into detailed descriptions on each website. Can’t figure it out, take it as a sign.

1 Like

How to examine data

So now we have some data it’s time to examine it. IMO the best way is to work through an example.

Lets us ask the following question;

BY WHAT TIME DOES THE HIGH OR THE LOW OF THE DAY FORM ON THE GBPUSD PAIR SINCE 2012?

Step One - generate and open a csv file for the 1hr chart

The analysis you wish to preform will determine what timeframe of data you will need to produce. For our example we will work in the 1 hr chart although a 15min, 30min or 4 hr csv. could be used. Generate this file, find and open it in excel. Save as under a different name. You’ll get something that looks like this

To the more experience, we know what each column represent. However to a beginner it pays to add some headings

Step Two - Add Headings
Right click on row 1, from the drop down list click insert. This will add a row above in which we will add our headings into. From column A to G they are , Date, Time, Open, High, Low, Close and Tick Volume.
Your spreadsheet will now look like this.

Excel hint - in the View tab is a function call Freeze panes. Click on this and choose Freeze Top Row. This will allow you to scroll down through he spreadsheet while retaining your headers.

Step Three - Select your data range.
Now I have data going back to the 5/5/2003. I don’t need all this data. I only want from Jan 2012. Scroll down to Jan 2012, left click in column A (date), hold and drag across to column G (volume) then while still holding scroll down to the bottom of the spreadsheet. This will start the sheet scrolling down until you get the last row of data. Release the mouse button . You have now selected the data range you wish to examine. Right click anywhere in the selected area and copy.

Step Four - Create a new worksheet
To effectively work with our data it is best to add a new work sheet. Left click on the add new worksheet button will add a new sheet to the file. In cell A1, right click and from the drop down menu select paste. Your data will now be pasted there. As in step two, add your headers and freeze the top row. Next we are going to add the follow headers for our analysis, High, H Time, Low, L Time, High/Low. To answer our question we need to following pieces of information, the high of the day and what time it occurred, the low of the day and what time it occurred and which one happened first.

Step Five - data analysis.
For the 1hr chart naturally enough there are 24 rows of data collected. One for each hour. These 24 rows will form our array for each days analysis. The first days data goes from Row2 to Row25, The next day Row26 to Row49. And so on. As Row 25 represent the last data for the day we’ll do our calculations here. Select column H25 by left clicking on it. With cell H25 selected go to the Formulas tab, left click on More functions, hover over statistical in the drop down menu until a second dropdown menu opens, hover over that list and scroll down to MAX function.

The MAX function selects the highest valve in a range . Left click on MAX and the following Function Argument window will appear. It wants to know the range of data to examine. In our case it’s the highs of the data. Column D has the highs so we need to select from D2 to D25. Type in it into the field as such D2:D25. Once you’ve done this click OK and the high of the day will be displayed. In this case 1.55429

Knowing the high of the day is one thing but we need to know what time this occurred by. We need to match the high with time in column B. This time select cell I25. In the formula Tab, select Lookup & Reference, and select Match in the dropdown menu. This formula returns the relative position in the array of the value that matches the reference value (in this case the high). It’s important to understand this because the timeframe with determine that relative value. Lucky for us, on the 1hr chart the relative position represents time bar closed. When clicked on you’ll get the following window displayed

In the Lookup_value type H25. H25 contains the value of the day high we previously found. Next in Lookup_array, type D2:D25. This is the range we used that contains the high value. Finally in the Match_type, type 0. 0 represent exact match. Click OK and the position will be display. In our case 10.

We now know that the High of the day was formed between 11am and 12 noon. Next we are going to repeat but this time for the days low. Select cell J25. This time we want to find the low of the day. We’ll use the MIN formula found in More Functions - statistical. The low of each hour is in column E, cells E2 to E25. Type E2:E25 into the Number Field and press ok. Our return value is 1.54671.

Next we determine time the same way as we did with the high. Except this time the Lookup_value will be J25, the days low and the Lookup_array will be E2:E25. Our result the low of the day was formed between 5pm(17:00) and 6pm(18:00)

We now know the times the days high and low formed. The high formed before 12:00 and the low at 18:00. The high time being the smaller of the values was formed first. Therefore to tell which formed first we choose the MIN value. Select cell L25. From the formulas tab select More Functions - Statistical - MIN. In Number1 type I25 (the cell containing the high time. In Number2 type L25 (the cell containing the low time) then press OK. This will return the lower value. And there we have it. The high/ low for the day was formed at 12:00. We have analysis the first day’s data.

1585 to go.

What’s that you say, another 1585 times wow, that is going to take like forever. Chill, relax. This is excel. We’ll do it the quick way using the Filter function.

Filtering and working with your data.

Now we could progress along, scrolling through excel copying and pasting formulas. But this is time consuming. And at the end of the day this is excel. It’s designed to do the hard work for you. So we’re going to take advantage of the filter button

Select and left click column B (Time). Select the data tab and left click the Filter button in Sort and filter. A filter (down arrow) button will now appear in the header cell.

Left click on that and a dropdown menu will appear. We are going to filter all results other than that of end of day or 23:00hrs. Deselect “select all” the select 23:00 only. Click Ok

Now the spreadsheet only display rows containing the time 23:00. This corresponds to the row we need our formulas to be applied in to work. You should see this.

Now go to cell H25 and select from H25 to L25. Do this left clicking and hold the left button in H25 and scroll across to L25. Release. This selects our formulas.

Next move the mouse button of the bottom right corner of the selected cells, a solid sight symbol will appear. Left click, hold and scroll down to the end of your data. Release. Excel will now automatically copy all formulas to our worksheet.

Finally, in the Sort & Filter pane left click on the Clear button. This will cancel the filter option returning the worksheet back to the default view. As you can see we have now analysis all our data, 1586 days worth.

Activate Excel Data Analysis ToolPak Add-in

Now to interpret our findings. Again excel has the tools to take the hard work out of this task. But first will need to active an add-in in excel. Select File - Options - Add-ins. At the bottom of the window ensure Excel add-ins is selected the left click go.

An Add-ins window will appear. Click to add the analysis ToolPak and left click OK

The Data Analysis toolpak should now be active and displayed under the data type

Statistic analysis of our data

The final stage of our journey. Lets interpret our findings. The Data Analysis ToolPak has a histogram function. This will display frequency and cumulative percent in both tabular form and graphic chart. Perfect for our needs.

The data we are interested in is what time the high/low formed by. Column L. Use the filter button and filter out times except hour 23:00 again as this is the rows that contain our data. Select column L by the rick click and select copy from the dropdown menu

Open a new worksheet and in cell A1 paste our data. Next select cell B2 and type in Bin. This will represent each value excel will work with. Then from cell B3 to B26 type the values 1….24. Your worksheet should look like this

Now left click on the Data Analysis pane in the data tab. A dropdown menu will appear. Highlight Histogram and click OK

The histogram data window will now appear. We need to input the variables. Input range. Select the input pane, a sub window will pop up. Scroll to cell A2, left click , hold and scroll down to the last row of data and release. The sub window will copy the range. Deselect the sub window.

Select the bin range. Again a sub window will open. Select cells C3 to C26. Deselect the sub window.

In the output options select Output Range and click on the select button. Click on cell G2. This is where our results will be displayed. Finally select Cumulative Percentage and Chart Output for our visual display. Click OK when complete

The end result should look like this

To answer our question

In case you forgot the original question

BY WHAT TIME DOES THE HIGH OR THE LOW OF THE DAY FORM ON THE GBPUSD PAIR SINCE 2012?

We can see from our data that by 12:00hr East European time (10am British time) there is a 79.5% chance the high or low of the day being formed on the GBPUSD pair.

Now, one last word. We’re here to answer questions. But if you can’t demonstrate your work then it might not be the answer your looking for

1 Like

The floor is all yours

@_bob
ok… so i’m sitting here, I read this line hehe
and i’m like … HERE WE BLOODY GO :stuck_out_tongue: he’s got a glass of milo ready in front of him, he’s got a backup prep’d in the fridge if he runs out of the first one :stuck_out_tongue:
and he’s gonna teach the newbies how to trade

HERE WE BLOODY GO
BOB’S COMPREHENSIVE COURSE ON FOREX TRADING FOR DUMMIES(SORRY… NEWBIES) :stuck_out_tongue:

OK, i’m gonna read this as well (Let me make a cup of coffee first,)
be cool

1 Like

ahh yes… i remember that one

or they could ask Martin, Really Politely and Martin might be a nice guy and upload it to a cloud server that is a lot more reliable and quicker to download then dukoscopy hehe
then martin will give them a link to the direct folder they need to download from , SO THEY DON’T NEED TO INSTALL ANYTHING (as would be the case with something like dropbox or icloud) this would be MEGA that i’m using.

Because Martin has an internet connection of 115Mbps and it took him around 2 Weeks to DOWNLOAD EVERYTHING , Every Instrument, the Entire Time Frame … EVERYTHING

so… yeah, they could do that as well :stuck_out_tongue:

1 Like

Actually NO mate
Specifically speaking…
it is text based data that can be used by ANY SPREADSHEET or ANY DATABASE
so, it’s not just excel, it can be Calc it can be MySQL, … whatever
Any spreadsheet or Database can import this file type and use it

Bro happy to be corrected and have this thread enhance anyway and anyplace.

This all took me 3 coffees and 2 milos and a quick yell at the kids to complete. Just as well the ball and chain is working tonight.

Must be one of those special Australian concoctions for those for whom the Foster’s just doesn’t pack enough of a punch?

2 Likes

Might get in trouble with the robot elders for this one.

Aussie slang for “to get nowhere fast”

1 Like

@_bob
yeah… but did you put 2 spoons of milo or 3 hehe

Thank you very much for this step! I have marked this educational thread! Although, I am not a newbie, but I am sure there will be so many educational stuff‘s!

No… i don’t think you will for this one.[quote="_bob, post:16, topic:135938"]
Might get in trouble with the robot elders for this one.
[/quote]

Wow, what a ‘piece’ - must be important as there is a shed load of data here and Bob has taken the time and trouble to share and provide the process. Will save, read and try and digest. Thank you for sharing a valuable resource/insight!

1 Like