Statistical Arb/Pairs trading strategy!

The answers are yes, and yes. I’m locking in the middle and at an arbitrary point.

I explained my method of locking on the thread, and uploaded a YouTube video explaining how I do that :slight_smile:

How to center & lock screen for Kelton’s Statarb method - YouTube

Comparison of EU-GU spreads with differing betas. All three pictures below come from the same recent 1H chart: same pair, different-looking spread curves.

The first picture shows an equal weighted EU-GU for comparison purposes. Note the spread in the bottom window with 1,2,3 standard deviation bands and mean in red.


The second picture shows EU * 1.2 - GU, so a heavier weight on EU.


The last picture shows the spread (indicator window) for EU - GU * 1.5, so an overweight on GU.
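For anyone who wants to reproduce these spread panels numerically, here is a minimal sketch in Python. The closes and beta values below are made up for illustration; the charts above come from MT4, not from this code.

```python
import statistics

def spread_with_bands(eu, gu, beta_eu=1.0, beta_gu=1.0):
    """Compute a beta-weighted spread and its 1/2/3 standard-deviation bands."""
    spread = [beta_eu * e - beta_gu * g for e, g in zip(eu, gu)]
    mean = statistics.mean(spread)
    sd = statistics.stdev(spread)
    # band k is (mean - k*sd, mean + k*sd), matching the 1,2,3 sigma lines on the chart
    bands = {k: (mean - k * sd, mean + k * sd) for k in (1, 2, 3)}
    return spread, mean, bands

# toy closes, made up for the example
eu = [1.3050, 1.3080, 1.3110, 1.3075, 1.3040]
gu = [1.5600, 1.5630, 1.5655, 1.5610, 1.5580]
spread, mean, bands = spread_with_bands(eu, gu, beta_eu=1.2)  # heavier weight on EU
```

Changing `beta_eu` / `beta_gu` reshapes the spread curve exactly as the three pictures show.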


I’m trying to understand the concept of cointegration. I think that is the right way to continue with stat arb.

On this page (in Spanish) you can see four different ways a series can behave. The cointegrated series looks like “noise” around a horizontal line.

I suppose that with cointegration one can set up a hybrid symbol that oscillates around a horizontal line with a calculated variance. That is different from the correlation approach, which can diverge a lot for some time.

Once one has that hybrid symbol, trading it is easier than bread and butter. The difficult part is finding a combination of symbols that forms a near-perfectly cointegrated hybrid symbol, one that looks like noise around that horizontal line.

I’m thinking about how to get the right combination of symbols to reach that objective. If someone has information that is more “human” hehehe, please share it so I can learn more about cointegration.

This thread has something about cointegration, but I’m still unable to understand it well enough to use it.

Medisoft, I’ll share a few thoughts with you on cointegration to help you get started because there may not be too many on this forum who know about it. What is it? A mathematical model for finding long term co-movement among time series and it is based on regression. Note the emphasis on long-term. Strategies developed with cointegration will tend to have longer holding periods than strategies developed using correlation / empirical stat arb.

On this page (in Spanish) you can see four different ways a series can behave. The cointegrated series looks like “noise” around a horizontal line.
At the bottom of that link are two articles by Carol Alexander. You should see if you can make it through those. She is one of the best authors on the subject for non-math practitioners and tends to explain things quite clearly (her textbooks are great too). Once you have a basic understanding of the theory then you can move on to some application.

I suppose that with cointegration one can set up a hybrid symbol that oscillates around a horizontal line with a calculated variance. That is different from the correlation approach, which can diverge a lot for some time.

Once one has that hybrid symbol, trading it is easier than bread and butter. The difficult part is finding a combination of symbols that forms a near-perfectly cointegrated hybrid symbol, one that looks like noise around that horizontal line.
Cointegration is a model, not a panacea. These models can fall apart, particularly if you don’t create them in the right way. (See Old Dog’s link below.)

If you want to get some hands-on experience with applied cointegration, I suggest you read through System to hedge every major dollar pair for profit / low DD and follow the instructions for setting up Arbomat with MT4 so you can follow along. The main thread (Synthetic Hedges, Cointegration…) is also a very good read with a lot of great contributions. The “System to Hedge…” thread kind of goes off the rails at the end but is a fairly good starting point.

Once you have a working understanding of the concept with Arbomat and can see it in action, read Old Dog’s Taming of the Beast? thread to see how to implement it without fooling yourself. Long story short, you want uncorrelated pairs to get a valid cointegration. Note the interesting debate in this thread between Old Dog and 7Bit. Side with Old Dog in that debate and your models will have a better chance to not fall apart.

I’m thinking about how to get the right combination of symbols to reach that objective. If someone has information that is more “human” hehehe, please share it so I can learn more about cointegration.

This thread has something about cointegration, but I’m still unable to understand it well enough to use it.
Hope that gets you started!

Hi guys,

very interesting thread, thanks for sharing all this information.

I have been trading stat arbs for about a year now.
I came to this after reading a thread on the Donnaforex forum.
There is a product (a semi automated EA) for trading this.
If you want to learn more, please search for “FX Algotrader” and “STAT ARB V.2.5”.

Once you have the arb open, this EA does the work for you.
But what I experienced when using this EA is that the problem is finding the right opportunity for arbing and the correct entry point,
because the EA uses trigger levels to open arbs automatically. The trigger levels are just like Bollinger bands around the spread.
Once price hits either the upper or lower trigger, the EA opens the arb.
But where do you put these levels? The vendor suggests looking at the left of the chart on the D1 timeframe and setting the distance of the trigger levels according to recent spread volatility.
Btw, when we talk about spread here, we mean the spread between the two arb legs (the two pairs involved). The bid/ask spread of each single pair, on the other hand, forms the so-called spread-cost channel.

So thanks again for sharing your ideas; this will help improve my arb trading skills, I think.
If you have any questions about the EA from Fxalgotrader, maybe I can give more information about how this works.

Cheers and good trading.
Shark

A topic that seems to keep resurfacing is the EG vs EU-GU debate. Earlier in the thread I mentioned my opinion and stated that I would follow up with a chart comparison to put this debate to bed. Far from being definitive, this post is simply a pictorial description of EG vs EU-GU for future reference.

Some say EG = EU-GU. Some say EG is not equal to EU-GU. I will leave the editorializing to others and simply provide two pictures that show how they appear on the chart, then describe how they are similar and how they are different. The chart is EG 1 hour in green with an overlaid DodgerBlue EU-GU plot fixed at the starting point of the chart. The data shown is 7/28/2010 - 9/7/2010. The dates were selected because they include the point on the chart where the green and DodgerBlue lines are identical. I kept the scaling identical on the two charts to show the difference exactly.

The first chart shows EG with overlaid EU*1.0 - GU*1.0 on the same scale. This is what you get if you compare EG to EU-GU side by side.


The second chart shows EG with overlaid EU*0.6 - GU*0.6.


While the (EU-GU)*0.6 overlay is not a perfect match to EG, it bears a striking resemblance. But there are minute differences in chart formation that may be hard to pick up in these smaller forum pictures. Not every peak in EG is mirrored exactly by a peak in (EU-GU)*0.6, though the chart formations are very close. Similar charts are expected because EG = EU/GU. However, as any mathematician will tell you, dividing and subtracting are not the same thing: when you subtract EU - GU, the USD portions don’t cancel out completely, leaving slight differences in chart formation. I’ll let the reader decide whether those slight differences are important or not.

The most notable thing about the first chart is the volatility of EU-GU compared to EG. As the scaling shift in the first chart shows, the volatility of EU-GU is much greater than the underlying EG volatility. What are the implications of that difference? I’ll leave that for others to say.

How big was the difference in volatility shown in the first picture? The range of EU-GU shown was about 400 pips; the range of EG was 273 pips. That makes EU-GU about 46.5% more volatile than EG over this period.
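The 46.5% figure follows directly from the two ranges quoted:

```python
range_eu_gu = 400  # pips: range of EU-GU over the period shown
range_eg = 273     # pips: range of EG over the same period

# percentage by which EU-GU's range exceeds EG's range
excess_volatility = (range_eu_gu / range_eg - 1) * 100
print(f"{excess_volatility:.1f}%")  # → 46.5%
```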

It is worth noting as a postscript that even EG vs EU*0.6 - GU*0.6 shows some significant and extended divergence at points during the chart (not shown). The 0.6 beta is not a perfect weight over the entire data series. Different weights will either increase or decrease the volatility difference between EG and EU-GU.

Hey, I am very interested in getting this to work. I spent 2 hours trying to figure it out… can you help me install it and get it all working?

Do you have skype or something so we can chat?

Thanks,

Dustin

No time to act as tech support, and this is probably the wrong forum for it. But the most common error in setting up Arbomat is not editing the file to include your R path (read the instructions in the arbomat.mq4 file). Also note that the path requires forward slashes / rather than backslashes \, so you need to replace the backslashes with forward slashes. For instance (customize this for your file structure):

#define RPATH "C:/Program Files/R/R-2.14.0/bin/x64/Rterm.exe --no-save"

The other common error is putting the files in the wrong location. The EA goes under experts; the .mqh file(s) go under experts\include.

If Win7/Vista is hiding your MT4 files, you might try copying the whole MT4 directory to a new directory that you own and running it from there. And if all else fails, read the instructions, then the forums where others were helped through their install problems. HTH!

Thanks for the reply! Will give it a shot!

FXEZ, can you tell me if I’m understanding this correctly?

I’m writing a program that tests all combinations of a large set of pairs to see which is best for cointegration.

I define the “best” as the one that is most stationary (slope == 0), has the most trades (a trade is counted when the series touches the 2nd or 3rd standard deviation and returns to the mean), and has the fewest outliers (ideally zero; an outlier for me is anything exceeding the 4th standard deviation).

What I’m doing is taking all the forex pairs with a small spread (less than 10 pips).
Then I convert the prices like this: log(price * coefficient).
Then I evaluate the equation and store the result in an array.
Then I calculate the linear regression of that array.
Finally, I store the data in a database.

After the process finishes, I simply get the best result (as defined above) from the database, which gives me the “equation”, or hybrid FX symbol.
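The scoring criteria above can be sketched in a small routine: fit a regression line to the hybrid series, count a “trade” when the series touches the 2-sigma band and then comes back near the mean, and flag outliers beyond 4 sigma. This is only a sketch of the idea, not Medisoft’s actual program; the “back near the mean” threshold of 0.1 sigma is an arbitrary choice.

```python
import statistics

def score_series(values):
    """Score a candidate hybrid series: regression slope/offset, band-touch trades, outliers."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(values)
    # least-squares fit of y = m*x + b against the bar index
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    m = num / den
    b = y_mean - m * x_mean
    sd = statistics.stdev(values)
    trades = outliers = 0
    armed = False  # True after the series has touched the 2-sigma band
    for y in values:
        dev = abs(y - y_mean)
        if dev >= 4 * sd:
            outliers += 1
        if dev >= 2 * sd:
            armed = True
        elif armed and dev < 0.1 * sd:  # back near the mean: count one mean-reversion trade
            trades += 1
            armed = False
    return {"slope": m, "offset": b, "trades": trades, "outliers": outliers}
```

A flat slope, many trades, and zero outliers score best under the definition above.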

These are some of my first results:

    M: 0.17493400 (slope)        B: -143.92413500 (offset)

equation: -3.99AUDCHF-4.00AUDJPY-4.00AUDNZD-4.00AUDUSD-4.00CADCHF-4.00CADJPY-4.00CHFJPY-4.00EURAUD-4.00EURCAD-4.00EURCHF-4.00EURGBP-4.00EURJPY-4.00EURNZD-4.00EURRUB-4.00EURUSD-4.00GBPAUD-4.00GBPCAD-4.00GBPCHF-4.00GBPJPY-4.00GBPNZD-4.00GBPRUB-4.00GBPUSD-4.00NZDCAD-4.00NZDCHF-4.00NZDJPY-4.00NZDUSD-4.00USDAED-4.00USDCAD-4.00USDCHF-4.00USDCZK-4.00USDHKD-4.00USDIDR-4.00USDINR-4.00USDJOD-4.00USDJPY-4.00USDKWD-4.00USDMXN-4.00USDOMR-4.00USDPKR-4.00USDRUB-4.00USDSAR-4.00USDSGD-4.00USDTHB-4.00USDTRY-4.00USDTWD

In this example the slope is pretty high; I think it should be less than +/- 0.0001.

Thanks

This is a better basket:

    M: -0.00000500
    B: 5.30450000

equation: 3.00AUDJPY-1.00CADJPY-2.00EURGBP+3.00EURJPY-3.00EURUSD

The slope is very flat; B is the offset (from y = mx + b, the line equation).

Then I convert the prices like this: log(price * coefficient).

Hey Medisoft! It is typically either log(price) * coef or the simpler and more commonly used price * coef. You should get very similar results from either and can compare them to what you currently have. I prefer the simpler price * coef form, but you can check the calculations for yourself quite easily.
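The three candidate transforms are easy to compare side by side (the numbers here are arbitrary):

```python
import math

price, coef = 1.3050, 3.0
a = price * coef            # price * coef (the common form)
b = math.log(price) * coef  # log(price) * coef
c = math.log(price * coef)  # log(price * coef), the form Medisoft is using
```

Note that c folds the coefficient inside the log: log(price * coef) = log(price) + log(coef), so the coefficient becomes an additive constant rather than a multiplicative weight. In a basket sum that means the weights stop reweighting the series at all, which is why this form behaves differently from the other two.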

Just as a point of information, the slope is an indication of the amount of trend, as in a trend line. As you know, a strictly stationary series will have zero slope / trend.

A stationary time series is one whose statistical properties such as mean, variance, autocorrelation, etc. are all constant over time.

So a stable mean is one part of a stationary series.

The other important part has to do with the variance. A fixed variance / standard deviation is a great blessing in system design. With price time series, a fixed standard deviation is really hard to achieve due to the volatility patterns in the underlying data. But you don’t need a strictly fixed std dev all the time to make a profit.

Getting back to the point made earlier regarding Old Dog’s thread, one way to test the validity of your resulting curves is to do an in-sample and out of sample test and compare the results. Your out of sample curves should not alter the slope dramatically, or the standard deviation in a major way. The most likely difficulty will be the trend / slope changing out of sample. This may indicate that the underlying symbols being used are not in fact cointegrated, possibly due to too high correlation between the pairs. Or possibly invalid assumptions in your model. There are lots of things you have to look out for.

As for your general approach, it seems sound. The task in system design is a bit different from optimizing the stationarity of a time series, so the method of maximizing crossovers can show good results. But I suggest that really the best way to test and maximize the goodness of the curves is to create an underlying strategy, and apply the data to that strategy both in and out of sample for comparison purposes. Most stationary doesn’t necessarily equal most profitable; more stationary is usually more profitable but also less volatile, which can mean less opportunity. It is a balancing act.
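The in-sample / out-of-sample comparison described above can be sketched like this: fit the slope and standard deviation on the first part of the series, recompute on the held-out part, and compare. The 70/30 split fraction here is an arbitrary choice for illustration.

```python
import statistics

def fit_slope(values):
    """Least-squares slope of the values against their bar index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(values)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def stability_check(series, split=0.7):
    """Compare slope and standard deviation in-sample vs out-of-sample."""
    cut = int(len(series) * split)
    ins, oos = series[:cut], series[cut:]
    return {
        "slope_in": fit_slope(ins), "slope_out": fit_slope(oos),
        "sd_in": statistics.stdev(ins), "sd_out": statistics.stdev(oos),
    }
```

A candidate whose out-of-sample slope or standard deviation shifts dramatically is suspect, as described above.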

I hope that makes sense.

I took the liberty of running the formula above in three forms on some 1-minute data in R:

Price * coef
log(Price) * coef
log(Price * coef)

(Shown as sub-graphs 1, 2, 3 in the picture.)



Note how similar sub-graphs 1 and 2 are to each other. The forms are Price * coef and log(Price) * coef. The third chart, log(Price * coef), looks quite different.

equation: 3.00AUDJPY-1.00CADJPY-2.00EURGBP+3.00EURJPY-3.00EURUSD

By way of contrast and for the sake of completeness, I ran the same pairs with Johansen cointegration using R. Below are the first three columns of eigenvectors applied to the price charted.

R code for package urca:

 m = ca.jo(p, type=c('trace'), ecdet=c('trend'), K=2, spec=c('transitory'))

Eigenvectors, normalised to first column:
(These are the cointegration relations)
AJ.l1     1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00  1.000000e+00
CJ.l1     2.845320e+01 -2.321219e+00 -6.674735e-01  1.481832e+01 -1.074078e-01
EG.l1     4.471224e+01 -1.105396e+00  6.342888e-01 -4.278603e+01 -8.107445e-01
EJ.l1    -5.219393e+01  3.519224e-01  6.367206e-01 -2.301109e+01  8.236176e-01
EU.l1     2.414314e+01  6.549944e-01 -4.443174e-01  2.967418e+01 -2.073002e+00
trend.l1 -6.072959e-06 -2.922243e-09  3.198172e-07 -7.391761e-06  7.699305e-07

So the first column is 1.0*AJ + 2.845e+01*CJ + 4.471e+01*EG - 5.219e+01*EJ + 2.414e+01*EU; this combination is shown in the first sub-graph.
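Applying an eigenvector column is just a weighted sum of the input price series. A sketch using the first-column weights from the ca.jo output above (the price vectors here are made up for illustration; the real inputs were 1-minute closes):

```python
# first-column eigenvector weights from the ca.jo output above
weights = {"AJ": 1.0, "CJ": 28.4532, "EG": 44.71224, "EJ": -52.19393, "EU": 24.14314}

def combine(prices, weights):
    """Build the cointegration spread as the weighted sum of the price series."""
    keys = list(weights)
    n = len(prices[keys[0]])
    return [sum(weights[k] * prices[k][t] for k in keys) for t in range(n)]

prices = {  # made-up closes, two bars each, for illustration only
    "AJ": [85.10, 85.20], "CJ": [83.40, 83.45], "EG": [0.8320, 0.8325],
    "EJ": [110.50, 110.60], "EU": [1.3050, 1.3060],
}
spread = combine(prices, weights)
```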


Values of teststatistic and critical values of test:

          test 10pct  5pct  1pct
r <= 4 |  3.18 10.49 12.25 16.26
r <= 3 | 14.15 22.76 25.32 30.45
r <= 2 | 27.05 39.06 42.44 48.45
r <= 1 | 54.03 59.14 62.99 70.05
r = 0  | **97.30** 83.20 87.31 **96.58**

Based on the test statistics I would conclude that only the first picture (first column of eigenvectors at the top of the post, the r = 0 row of test statistics just above) is significant at the 99% level. This is interesting because it appears that only the 2nd picture has a stationary mean.

The 3rd picture looks somewhat similar to the first two pictures in the previous post! Note that I didn’t center the charts for the final number in each column (trend.l1), which is also the intercept value in a regression.

I installed arbomat with R and the EA on a demo account.

I still don’t know how R obtains the coefficients so fast. I suppose it solves a matrix equation or something like that, compared to the brute-force search I’m doing.

I gave Arbomat some of the combinations that my own method produces, and I found that they show a cointegrated chart with zero slope, so maybe I’m doing something right.

What I need now is to obtain my coefficients faster, because it can take hours to compute them for only 6 forex pairs.

Arbomat uses the Engle-Granger method of cointegration, which is based on regression. You can see in the code the calls to lm that show what is going on. See the onOpen method in Arbomat for the real guts of the approach.

if (allow_intercept) {
   Rx("model <- lm(y ~ x)");        // fit the model with an intercept
} else {
   Rx("model <- lm(y ~ x + 0)");    // fit the model through the origin (no intercept)
}

Note the caveats, discussed in the other threads, about using allow_intercept = true with use_diff = true: it can lead to spurious regressions. I generally favor allow_intercept = true with use_diff = false.
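For readers without R handy, the Engle-Granger idea behind those lm calls can be sketched in plain Python: regress one leg on the other, and the residuals form the candidate spread that you would then test for stationarity. This is a bare-bones illustration, not a substitute for Arbomat or a proper ADF test.

```python
import statistics

def engle_granger_spread(y, x, allow_intercept=True):
    """OLS fit of y = beta*x (+ intercept); the residuals form the candidate spread."""
    if allow_intercept:
        x_mean, y_mean = statistics.mean(x), statistics.mean(y)
        beta = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y)) / \
               sum((a - x_mean) ** 2 for a in x)
        intercept = y_mean - beta * x_mean
    else:
        # regression through the origin, like lm(y ~ x + 0)
        beta = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)
        intercept = 0.0
    residuals = [b - (beta * a + intercept) for a, b in zip(x, y)]
    return beta, intercept, residuals
```

The `allow_intercept` switch mirrors the branch in the Arbomat snippet above.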

Have you considered genetic optimization instead of the brute-force approach? You provide a range of values (minimum and maximum desired parameter values) for each coef, and then randomly select parameters from that range to test. Set your tests for, say, 5000 runs this way and you can cover the landscape fairly well, getting a quicker idea of which coefs are better according to your scoring method.
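The randomized sampling just described can be sketched like this. The scoring function here is a toy stand-in (it just prefers coefficients near a known target); a full genetic optimizer would add selection, crossover, and mutation on top of this random sampling.

```python
import random

def random_search(score_fn, ranges, runs=5000, seed=42):
    """Randomly sample coefficient vectors from per-symbol ranges, keeping the best score."""
    rng = random.Random(seed)
    best_coefs, best_score = None, float("-inf")
    for _ in range(runs):
        coefs = {sym: rng.uniform(lo, hi) for sym, (lo, hi) in ranges.items()}
        s = score_fn(coefs)
        if s > best_score:
            best_coefs, best_score = coefs, s
    return best_coefs, best_score

# toy scoring function: negative squared distance to a hypothetical target basket
target = {"EURUSD": -3.0, "GBPUSD": 1.5}
score = lambda c: -sum((c[k] - target[k]) ** 2 for k in target)
best, s = random_search(score, {"EURUSD": (-4, 4), "GBPUSD": (-4, 4)}, runs=2000)
```

In practice `score_fn` would be the stationarity/trades/outliers score discussed earlier in the thread.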

Yes, I considered that, and also made my program do it.

It is better for a big combination of symbols, like 7 or more, because brute force with 7 pairs takes days to complete. With 6 or fewer it takes less than a day, and if I trade 4H that is fast enough.

By the way, I think I will need to learn more about R to understand the Arbomat code ehehhehe.

I was thinking about using a sum of vectors. Let’s say I calculate the linear regression of the majors and convert each line equation to a vector.

What I need is a resulting vector with a 0-degree angle, so I can sum all the vectors, with coefficients, to get that zero-degree result. Maybe that gives me an equation to solve instead of the brute-force / genetic approach.

Don’t these test statistics show that there is no cointegration relationship at all between those pairs? r denotes the cointegration rank. Even for r <= 1, the test fails to reject the null hypothesis at the 10% level.

Rejecting r = 0 is interpreted as there being at least one cointegrating relation, rejecting r <= 1 as there being at least two, and so on. It’s counterintuitive, I know. See this for more:

http://cran.r-project.org/web/packages/vars/vignettes/vars.pdf

Page 24:

The outcome of the trace tests is provided in Table 4. These results do indicate one cointegration relationship.
Then note table 4.
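The mechanical reading of such a trace-test table can be sketched: walk from r = 0 upward and stop at the first null hypothesis you fail to reject; the number of rejections so far is the cointegration rank. Using the trace statistics and 5% critical values from the ca.jo output posted earlier in this thread:

```python
# (null, trace statistic, 5% critical value) from the ca.jo output, ordered from r = 0 upward
rows = [
    ("r = 0",  97.30, 87.31),
    ("r <= 1", 54.03, 62.99),
    ("r <= 2", 27.05, 42.44),
    ("r <= 3", 14.15, 25.32),
    ("r <= 4",  3.18, 12.25),
]

def cointegration_rank(rows):
    """Count how many nulls are rejected before the first failure: that is the rank."""
    rank = 0
    for _, stat, crit in rows:
        if stat > crit:  # statistic exceeds the critical value: reject this null
            rank += 1
        else:
            break
    return rank

print(cointegration_rank(rows))  # → 1: one cointegrating relation at the 5% level
```

Only the r = 0 null is rejected, which matches the vignette’s reading: one cointegration relationship.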