Well, I thought I would contribute something to this thread. This is an update to my original thread, which got little traction, but maybe this is the right crowd. Some of the JPEGs are dead now because of the hosting site I used, so I attached a txt file with all the stats.
http://forums.babypips.com/newbie-island/46361-datamining-quantifying-your-markets-personality.html
I do this type of very basic data mining on every instrument I trade. The data feed I am using is FXCM. The dates for each level of study are noted for each time frame, and all bars are built from 1-minute time bars.
This study is based on the closing prices of bars. I am using percentages as opposed to pure price movements for several reasons. Usually at higher prices we have higher volatility in price terms, and changing to percentages helps remove that bias. Also, financial time series are in general non-stationary and non-normal, so determining any confidence level becomes more difficult. Changing to % returns (you can use log returns as well) gives a series that is much closer to weakly stationary, which is easier and more reliable to work with.
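To make the percentage conversion concrete, here is a minimal sketch in plain Python. The function names and the sample closes are made up for illustration; they are not FXCM data and not the exact script behind the stats.

```python
import math

def pct_returns(closes):
    """Percent change between consecutive closes, in percent units."""
    return [(b - a) / a * 100.0 for a, b in zip(closes, closes[1:])]

def log_returns(closes):
    """Log return between consecutive closes, in percent units."""
    return [math.log(b / a) * 100.0 for a, b in zip(closes, closes[1:])]

# Made-up closing prices, for illustration only.
closes = [1.3000, 1.3013, 1.3026, 1.3000]

pct = pct_returns(closes)
logr = log_returns(closes)

for p, l in zip(pct, logr):
    print(f"pct={p:+.4f}%  log={l:+.4f}%")
```

For moves this small the two versions are nearly identical, so which one you pick should matter little at these bar sizes.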
Let's address what each field means.
# of periods = how many bars of that time frame were analyzed
# of Moves = how many moves occurred, where a move is a run of bars that closed consecutively in the same direction
Average period in move = how many bars the moves lasted on average
Avg AMP = the average amplitude, in %, that the moves traveled
Chance to last X+ periods = the % chance that a move lasts X or more periods/candles
Chance > X % move = the % chance that a move larger than X percent may occur.
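The fields above can be computed roughly like this. It is a sketch under my own assumptions, not necessarily how the attached stats were produced: a move's amplitude is taken as the sum of its bar-by-bar % returns, a bar with zero change ends the current move and is not counted, and the names `find_moves` and `summarize` are just illustrative.

```python
def find_moves(returns):
    """Group consecutive same-sign returns into moves.

    Returns a list of (direction, length_in_bars, amplitude_pct) tuples.
    Assumption: a zero return ends the current move and is skipped.
    """
    moves = []
    direction, length, amp = 0, 0, 0.0
    for r in returns:
        sign = (r > 0) - (r < 0)
        if sign != 0 and sign == direction:
            # Same direction as the running move: extend it.
            length += 1
            amp += r
            continue
        if direction != 0:
            # Direction changed (or flat bar): close out the running move.
            moves.append((direction, length, abs(amp)))
        direction, length, amp = sign, abs(sign), r
    if direction != 0:
        moves.append((direction, length, abs(amp)))
    return moves

def summarize(moves, direction, periods=(2, 3, 4), amps=(0.25,)):
    """Stats for moves in one direction (+1 for up, -1 for down)."""
    sel = [(n, a) for d, n, a in moves if d == direction]
    count = len(sel)
    if count == 0:
        return {"moves": 0}
    stats = {
        "moves": count,
        "avg_periods": sum(n for n, _ in sel) / count,
        "avg_amp": sum(a for _, a in sel) / count,
    }
    for k in periods:
        stats[f"last_{k}+"] = 100.0 * sum(n >= k for n, _ in sel) / count
    for x in amps:
        stats[f"amp>{x}"] = 100.0 * sum(a > x for _, a in sel) / count
    return stats

# Made-up % returns, for illustration only.
returns = [0.1, 0.2, -0.1, 0.3, 0.1, 0.2, -0.2, -0.1]
moves = find_moves(returns)
up = summarize(moves, +1)
down = summarize(moves, -1)
print(up)
print(down)
```

Running the up and down sides separately is what lets you compare symmetry between directions.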
My analysis of this data:
EUR/USD is symmetrical in both directions on all time frames, both in the number of moves and in the average amplitude of moves. This is not the case with other instruments. Stocks, for example, exhibit a long-side bias.
Also note that as the time frame decreases, the average amp (volatility) does not decrease at the same rate; it decreases significantly less. Going from the daily to the 4 hour (6 periods per day), a linear relationship would predict 1/6th of the average amplitude of the daily move (.155). You actually get .37, more than double what we would have guessed. This is true again for all time frames.
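For reference, here is the arithmetic on those two numbers, plus one extra comparison that is my own addition and not part of the original stats: the square-root-of-time rule, which is how volatility scales for a pure random walk. Applying it to the average amplitude of a move is an assumption, so treat it as a rough sanity check only. The daily figure is implied from .155 x 6.

```python
import math

four_hour_amp = 0.37            # observed 4h average amplitude, from the post
linear_guess = 0.155            # 1/6th of the daily figure, from the post
bars_per_day = 6                # 4h bars in one daily bar
daily_amp = linear_guess * bars_per_day   # implied daily figure

# Hypothetical comparison: random-walk style sqrt-of-time scaling.
sqrt_guess = daily_amp / math.sqrt(bars_per_day)

ratio = four_hour_amp / linear_guess

print(f"implied daily amp : {daily_amp:.3f}")
print(f"linear guess      : {linear_guess:.3f}")
print(f"sqrt-time guess   : {sqrt_guess:.3f}")
print(f"observed          : {four_hour_amp:.3f}")
print(f"observed / linear : {ratio:.2f}x")
```

The observed 4h number lands much closer to the square-root guess than to the linear one, which would fit the idea that amplitude shrinks far more slowly than the time frame does.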
Let's look at the characteristics of the % chance of an X+ period move; think of this as a trending characteristic. The % chances for the daily, 4h and 1h are all very similar, with only 1-2 percentage points of difference.
Now let's compare the daily and the 5 minute. Most people would say that the higher the time frame, the less noise and the “easier to trade”. So this comparison is a natural one, as the 5 minute is very “noisy” according to numerous forum posts. The 5 minute % chance of an X+ move is larger than the daily % chance in every category. Actually, it is larger than the chance of an X+ move on every other time frame. But the significance comes from the chance of a 4+ period move: it is about 6% (5.81% on average) higher than on the daily time frame, and 3.79% higher on average than on the hourly time frame.
What does this tell us? That lower time frames have “momentum”. They tend to exhibit more trending behavior than they are given credit for. However, the amplitude is very small, so market friction (slippage, commissions, execution latency, etc.) becomes a larger factor. Still, this suggests that being profitable at the 5 minute and lower level is possible and that there is an inherent edge there.
This edge is best shown by the chance to last 2+ periods. Taking a random entry (50/50 long/short), the % chance to last 2 periods is 51.96 and 52.04 respectively, basically a net 2% edge in either direction. Is the bar large enough for a retail trader to profit after those market frictions? Probably not. But you can see where the HFTs and large institutions have a fertile trading ground.
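To put the 2+ period numbers in context, here is a small sketch of the coin-flip baseline. It assumes each bar is an independent 50/50 up or down close, in which case a move that has started lasts k or more bars with probability 0.5^(k-1). The long/short labels follow the order in which I read the two figures above, and `baseline_chance` is just an illustrative name.

```python
def baseline_chance(k):
    """% chance a move lasts k or more bars if every bar is an
    independent 50/50 coin flip (assumption: no flat bars)."""
    return 100.0 * 0.5 ** (k - 1)

# Observed 5 minute chance to last 2+ periods, from the post.
observed_2plus = {"long": 51.96, "short": 52.04}

# Edge over the coin-flip baseline, in percentage points.
edge = {side: obs - baseline_chance(2) for side, obs in observed_2plus.items()}

for side, e in edge.items():
    print(f"{side}: {e:+.2f} points over the 50% baseline")

# Baselines for longer moves, handy for comparing against the stats file.
for k in (2, 3, 4):
    print(f"{k}+ periods baseline: {baseline_chance(k):.2f}%")
```

Anything measured above these baselines is what I am loosely calling "momentum" here; whether it survives costs is a separate question.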