Take a system you have optimized on the period from X to Y, run it from Y to Z, and then evaluate its characteristics on the Y to Z region to decide whether or not it passes the test.
This means that you should run an optimization, take the best result and run an out of sample test. If the result fails then the whole trading logic has no value and no other optimization results are used.
The problem is that if you start "picking" results to get those that perform well in out of sample testing, you are effectively introducing a strong selection bias, which is equivalent to running an optimization over the full testing period.
Just to reemphasize, make sure you only optimize on the in sample period. Don't use the out of sample results as a basis for system selection while you are optimizing on the in-sample data; use them only for system validation at the end.
It makes no sense to out of sample test any but the best optimized result of the "working period", since in real life you wouldn't be able to "go back" and "cherry pick" the best system for use.
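The procedure quoted above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the moving-average rule, the candidate windows, the synthetic random-walk prices and the pass threshold are all made up here, and none of them come from the quoted article. The point is the shape of the workflow: optimize on X to Y only, then run the single best result on Y to Z exactly once.

```python
import random

def moving_average_pnl(prices, window):
    """Toy strategy (hypothetical): hold long for one bar when price is above its trailing mean."""
    pnl = 0.0
    for t in range(window, len(prices) - 1):
        trailing_mean = sum(prices[t - window:t]) / window
        if prices[t] > trailing_mean:
            pnl += prices[t + 1] - prices[t]
    return pnl

def validate_once(prices, split, candidate_windows, min_oos_pnl=0.0):
    in_sample, out_of_sample = prices[:split], prices[split:]
    # Optimize on the in-sample segment only (X to Y).
    best_window = max(candidate_windows,
                      key=lambda w: moving_average_pnl(in_sample, w))
    # Run only the best in-sample result on Y to Z, and only once.
    oos_pnl = moving_average_pnl(out_of_sample, best_window)
    # If this fails, the logic is discarded; no second-best result is tried.
    return best_window, oos_pnl, oos_pnl > min_oos_pnl

# Synthetic random-walk prices, purely for demonstration.
rng = random.Random(0)
prices = [100.0]
for _ in range(999):
    prices.append(prices[-1] + rng.gauss(0.0, 1.0))

candidate_windows = [5, 10, 20, 50]
best_window, oos_pnl, passed = validate_once(prices, 500, candidate_windows)
print(best_window, round(oos_pnl, 2), passed)
```

Note that the out of sample segment is never looked at inside the optimization step, which is the whole point of the quoted advice.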
I don't necessarily agree with the following paragraph, because volatility tends to cluster, and randomly selecting periods may cause that relationship to vanish if the data is split too finely. This is also why I don't use Monte Carlo analysis when analyzing system results. It is also bad practice to intentionally introduce any selection or look ahead bias.
Another interesting factor is that using a straight X->Y->Z approach doesn't allow for the best out of sample testing solution, since you are effectively also introducing a selection bias against the Y->Z period. Although this selection bias is not that bad, you can obtain better results if you distribute the out of sample period randomly within the test, so that no particular set of market conditions is evaluated as the out of sample period. So for example, if I wanted to run a 20 year test with 10 years of optimization and 10 years of out of sample testing, I would choose 10 years at random for optimization and then run a test on the remaining 10 out of sample years. This means that any strategy developed will be inherently more robust, as it is out of sample tested across non sequential market conditions.
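For reference, the random split described in the quoted paragraph might look like the sketch below. The year range and the seed are hypothetical, and the objection raised above still applies: scattering the years like this breaks up runs of clustered volatility, so treat it as an illustration of the idea rather than a recommendation.

```python
import random

def random_year_split(years, n_optimization, seed=None):
    """Pick n_optimization years at random; the rest become the out of sample years."""
    rng = random.Random(seed)
    optimization_years = sorted(rng.sample(years, n_optimization))
    chosen = set(optimization_years)
    out_of_sample_years = [y for y in years if y not in chosen]
    return optimization_years, out_of_sample_years

years = list(range(1990, 2010))  # hypothetical 20-year history
opt_years, oos_years = random_year_split(years, n_optimization=10, seed=42)
print("optimize on:  ", opt_years)
print("out of sample:", oos_years)
```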
And finally this is a good summary:
Does this mean that out of sample tested strategies cannot fail? Certainly not. Out of sample testing merely ensures that a strategy was able to survive outside of its optimization period, and therefore it hints that the strategy can survive changes in market conditions without losing its mathematical expectancy.
Oh! I see... so forward testing could serve as an out of sample test: my optimization covers the 2000 bars before right now, and forward testing from right now to right now + 2000 bars is like the Y -> Z step.
FXEZ, I'm extracting cointegrated stocks from the US market.
I'm following these rules:
Period 2 full years of data on daily view
Low correlation (from -0.30 to 0.30)
Beta difference less than 0.2
r = 0 rejected in the cointegration test (test value above the 1pct critical value) shows cointegration
Price of stock above 5 USD
Minimum cash traded daily of 1,000,000 USD, using average volume * price
I'm also planning to check that they are in the same Sector/Industry, but right now it doesn't do that.
Do you think they are correctly obtained?
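As a rough sketch, the screening rules listed above could be written as a single filter. The `PairStats` container and its field names are made up for illustration, and the betas, prices and volumes in the example are hypothetical; only the thresholds come from the list (the correlation, test value and 1pct value in the example are borrowed from one line of the results in this post).

```python
from dataclasses import dataclass

@dataclass
class PairStats:
    """Hypothetical container for statistics computed over 2 years of daily data."""
    correlation: float
    beta_a: float
    beta_b: float
    price_a: float
    price_b: float
    avg_volume_a: float
    avg_volume_b: float
    trace_stat_r0: float   # test value for the r = 0 row
    crit_1pct_r0: float    # 1pct critical value for the r = 0 row

def passes_screen(p):
    # Daily cash traded, using average volume * price as in the rules.
    dollar_volume_a = p.avg_volume_a * p.price_a
    dollar_volume_b = p.avg_volume_b * p.price_b
    return (
        -0.30 <= p.correlation <= 0.30                      # low correlation
        and abs(p.beta_a - p.beta_b) < 0.2                  # similar beta
        and min(p.price_a, p.price_b) > 5.0                 # price above 5 USD
        and min(dollar_volume_a, dollar_volume_b) >= 1_000_000
        and p.trace_stat_r0 > p.crit_1pct_r0                # r = 0 rejected at 1pct
    )

candidate = PairStats(correlation=0.1199, beta_a=1.10, beta_b=1.00,
                      price_a=70.0, price_b=35.0,
                      avg_volume_a=4_000_000, avg_volume_b=3_000_000,
                      trace_stat_r0=31.01, crit_1pct_r0=30.45)
print(passes_screen(candidate))
```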
These are some results:
Valid relationship for MON and NE: correlation 0.1199, cointegration r = 0; test 31.01 > 1pct 30.45
Valid relationship for MON and DO: correlation 0.1909, cointegration r = 0; test 31.98 > 1pct 30.45
Valid relationship for MON and SLB: correlation -0.07077, cointegration r = 0; test 36.65 > 1pct 30.45
Valid relationship for MON and TDW: correlation 0.1469, cointegration r = 0; test 31.91 > 1pct 30.45
Valid relationship for MON and IX: correlation 0.2675, cointegration r = 0; test 37.01 > 1pct 30.45
Valid relationship for ACOM and IRIS: correlation -0.1357, cointegration r = 0; test 32.24 > 1pct 30.45
Valid relationship for LM and CEB: correlation -0.09153, cointegration r = 0; test 30.99 > 1pct 30.45
Valid relationship for LM and PKG: correlation -0.1488, cointegration r = 0; test 37.85 > 1pct 30.45
Valid relationship for LM and IRIS: correlation -0.2981, cointegration r = 0; test 38.25 > 1pct 30.45
Valid relationship for CTAS and IMH: correlation 0.288, cointegration r = 0; test 31.12 > 1pct 30.45
Valid relationship for CTAS and PVR: correlation 0.2062, cointegration r = 0; test 33.48 > 1pct 30.45
Valid relationship for OFG and AZZ: correlation -0.2328, cointegration r = 0; test 31.41 > 1pct 30.45
Valid relationship for OFG and IRIS: correlation -0.01355, cointegration r = 0; test 43.16 > 1pct 30.45
Valid relationship for OFG and CBB: correlation -0.2493, cointegration r = 0; test 34.33 > 1pct 30.45
FXEZ, can you confirm that one way to check whether a time series is I(1) is the ur.df function, checking whether the test value is greater than the 1pct value, with lag=1?
Thanks!
By the way, I found about 6000 combinations with low correlation and good cointegration in the stock market. Some of the combinations I found had cointegration with r <= 1, r <= 2 and r <= 3. I think I'm going to rerun the test with more data, maybe 4-5 years, to have longer term information.
Regarding your post #1285, the method looks sound enough. Because there is no one "right way" when it comes to putting together a strategy testing methodology, you'll have to see how the results look from your specific formulation and whether it makes sense to you.
Yes, the ADF test (ur.df is one implementation of that test) allows checking whether the cointegrated series is I(1) (or I(0), as it should be). From what I've read there is quite some statistical "art" in how to put the tests together. Statistical art is generally left for the pros, so the basics will have to do for the rest of us. There is a method for determining the number of lags to use for a given series, but I can't remember off the top of my head how it is done. There may be something on that topic in one of the vignettes (urca or vars). I think there was a good journal article on the urca package but I can't find it at the moment. I can try to locate it if you're interested.
Medisoft, I think I found the Journal of Statistics paper I mentioned in the previous post, or at least one that goes into the main topics. jstat's papers are generally very tractable, with examples so that you can follow along.
The paper goes into depth on unit roots and order of integration. Also there is a lengthy discussion on lag selection, as well as use of the various packages. It is actually a write up for the CADF package, but section 5 covers ur.df as well for contrast. Good luck!
I tested my hypothesis by creating 3 series: one that was not a time series, one built with the ts function, which should be a time series, and another imported from 1000 days of IBM quotes from Yahoo.
I evaluated the three of them with the ADF test and found that the IBM data and the data generated with the ts function gave a higher test value, telling me that they are I(1) and also I(0) for lags from 1 to 9, while the series that should not be a time series gave me the expected result, showing a much smaller test value than the required 1pct value.
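To make the direction of the comparison concrete, here is a minimal sketch of the simplest Dickey-Fuller regression (no constant, no trend, no lagged differences) in plain Python. It is not ur.df and it ignores lag selection entirely; the synthetic series and the seed are assumptions. The statistic is the t-value of gamma in diff(y)[t] = gamma * y[t-1] + e[t], and the unit root is rejected when the statistic is more negative than the critical value (about -2.58 at 1pct for this form in large samples). A series looks I(1) when the test cannot reject on the levels but does reject on the first differences.

```python
import math
import random

def dickey_fuller_stat(series):
    """t-statistic of gamma in diff(y)[t] = gamma * y[t-1] + e[t] (no constant, no lags)."""
    lagged = series[:-1]
    diffs = [b - a for a, b in zip(series[:-1], series[1:])]
    sum_yy = sum(y * y for y in lagged)
    gamma = sum(y * d for y, d in zip(lagged, diffs)) / sum_yy
    residuals = [d - gamma * y for y, d in zip(lagged, diffs)]
    sigma2 = sum(e * e for e in residuals) / (len(diffs) - 1)
    return gamma / math.sqrt(sigma2 / sum_yy)

def first_difference(series):
    return [b - a for a, b in zip(series[:-1], series[1:])]

# Synthetic data (assumption): white noise is I(0), its running sum is I(1).
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]
walk, level = [], 0.0
for shock in noise:
    level += shock
    walk.append(level)

CRITICAL_1PCT = -2.58  # approximate large-sample value for the no-constant form

for name, series in (("white noise", noise),
                     ("random walk", walk),
                     ("diff of random walk", first_difference(walk))):
    stat = dickey_fuller_stat(series)
    verdict = "reject unit root" if stat < CRITICAL_1PCT else "cannot reject unit root"
    print(f"{name}: {stat:.2f} -> {verdict}")
```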
Now I'm trying to understand how to use the information obtained from ca.jo to size the positions.
I think that ca.jo gives me the coefficients, doesn't it?
I think that they are the eigenvectors, but I have some doubts: I don't know what to do with the "trend" value, and I also don't know what the Weights W data is.
In the example below I have cointegrated series at r = 0, r <= 1 and r <= 2, so I can use the eigenvectors from columns 1, 2 and 3.
Using the first column, I buy 100 shares of security1, sell 52 of security2, sell 440 of security3 and buy 356 of security4 (or the respective values scaled to the max risk).
Is that right?
Thanks again
Test type: trace statistic , with linear trend in cointegration
Eigenvalues (lambda):
[1] 7.845393e-02 6.711211e-02 5.300111e-02 2.669163e-02 5.551115e-17
Values of teststatistic and critical values of test:
           test 10pct  5pct  1pct
r <= 3 |  13.61 10.49 12.25 16.26
r <= 2 |  41.00 22.76 25.32 30.45
r <= 1 |  75.94 39.06 42.44 48.45
r = 0  | 117.04 59.14 62.99 70.05
Eigenvectors, normalised to first column:
(These are the cointegration relations)
             Adj.Close.l1 Adj.Close.l1 Adj.Close.l1 Adj.Close.l1     trend.l1
Adj.Close.l1    1.0000000   1.00000000    1.0000000    1.0000000    1.0000000
Adj.Close.l1   -0.5283754   5.41397216  -29.4342016    6.3219781   -3.2545624
Adj.Close.l1   -4.4066380  12.40436122    9.8253392   11.6584724   -4.0106937
Adj.Close.l1    3.5658929   6.26066904    1.0843591   -5.2988737   -3.5152390
trend.l1        0.1147336   0.04972291    0.2207489    0.0156572   -0.1350147
Weights W:
(This is the loading matrix)
              Adj.Close.l1  Adj.Close.l1  Adj.Close.l1  Adj.Close.l1
Adj.Close.d   -0.081819434 -0.0103705224 -0.0058732067 -0.0039185655
Adj.Close.d   -0.004313884  0.0000633547  0.0017941945  0.0001134815
Adj.Close.d    0.002268379 -0.0015292304  0.0003956242 -0.0007759662
Adj.Close.d   -0.010012370 -0.0033857778 -0.0004904115  0.0052852698
                  trend.l1
Adj.Close.d   7.347546e-17
Adj.Close.d   1.199676e-18
Adj.Close.d  -4.006632e-18
Adj.Close.d   5.248568e-18
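One way to read the trace table in the output above is sequentially: start at r = 0 and move down while the test value exceeds the chosen critical value; the first row that is not rejected gives the number of cointegration relations. The helper below is a hypothetical sketch of that reading, with the numbers copied from the output; it is not part of urca.

```python
def cointegration_rank(trace_rows, level):
    """trace_rows: (r, test value, critical values) ordered from r = 0 upward."""
    for r, stat, crit in trace_rows:
        if stat <= crit[level]:
            return r  # first null hypothesis (rank <= r) that is not rejected
    return len(trace_rows)  # every row rejected

rows = [
    (0, 117.04, {"10pct": 59.14, "5pct": 62.99, "1pct": 70.05}),
    (1, 75.94, {"10pct": 39.06, "5pct": 42.44, "1pct": 48.45}),
    (2, 41.00, {"10pct": 22.76, "5pct": 25.32, "1pct": 30.45}),
    (3, 13.61, {"10pct": 10.49, "5pct": 12.25, "1pct": 16.26}),
]

print(cointegration_rank(rows, "1pct"))  # 3
print(cointegration_rank(rows, "5pct"))  # 4
```

At the 1pct level this gives 3 relations, which matches using columns 1, 2 and 3 of the eigenvector matrix. At 5pct every row is rejected, so the answer depends on the level you pick.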
The portfolio ratios are correct. To find out whether you should buy or sell you need to look at the current value of the portfolio in relation to its mean.
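A small sketch of both steps, assuming the first eigenvector column from the ca.jo output: scale the weights to share counts, then flip the whole basket when the portfolio value sits above its mean. The price history is hypothetical, the trend.l1 term is ignored here, and share counts are truncated toward zero only so that they match the numbers quoted in the question.

```python
def shares_from_eigenvector(weights, base_shares=100):
    """Scale eigenvector weights to share counts (truncated toward zero)."""
    return [int(base_shares * w) for w in weights]

def portfolio_value(weights, prices):
    return sum(w * p for w, p in zip(weights, prices))

def position(weights, price_history, base_shares=100):
    """Buy the basket when its value is below its mean, sell it when above."""
    values = [portfolio_value(weights, row) for row in price_history]
    mean_value = sum(values) / len(values)
    direction = -1 if values[-1] > mean_value else 1
    return [direction * s for s in shares_from_eigenvector(weights, base_shares)]

# First column of the eigenvector matrix (price rows only).
first_column = [1.0000000, -0.5283754, -4.4066380, 3.5658929]

# Hypothetical prices for security1..security4, oldest row first.
price_history = [
    [50.0, 20.0, 10.0, 15.0],
    [51.0, 20.0, 10.0, 15.0],
    [52.0, 20.0, 10.0, 15.0],
]

print(shares_from_eigenvector(first_column))  # [100, -52, -440, 356]
print(position(first_column, price_history))  # [-100, 52, 440, -356]
```

In the example the basket value has risen above its mean, so the signs are reversed: sell security1 and security4, buy security2 and security3.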
If you plot the portfolio value over time the meaning of the trend should be clear. Normally you'd want a trend close to 0 for equal long and short opportunities. But I imagine you could use weightings with a positive (negative) trend and only trade long (short).
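The trend.l1 row is the coefficient of the linear trend term in each cointegration relation (the output header says "with linear trend in cointegration"), not a constant intercept. You can optionally add that coefficient multiplied by the time index to the rest of the formula so that the computed spread no longer drifts. Shamanix likely knows more of the underlying math theory, so refer to his answer regarding the trend.l1 column.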
I don't know what the weights are either, and I haven't found any use for them in trading.
I would like to have a trend = 0, but I don't know how to get it with ca.jo. In the Arbomat program the author uses the lm function and gets trend = 0, but I still don't know what that code is doing.
Nope, I'm continuing with FX and forward testing on it, but I'm also diversifying. Adding cointegration trading with stocks makes sense too, especially if my broker's platform gives me a pair trading tool.
A problem with FX is that it is difficult to find pairs that have low enough correlation and at the same time have a low spread and are cointegrated, hehehe.
I think my "search engine" is working now for stocks. I'm forward testing (out of sample) some pairs on a demo account and also on FX (thanks to Arbomat). So far, Arbomat is giving good results, but I'm just beginning. With the stocks I don't have any information yet, because they are on the daily view, so that could take some weeks, but the stock pairs that I found have had a very long cointegration relationship: more than 4 years of pretty good cointegration.
Thanks FXEZ for your help and thanks to the Arbomat developer and the R developers, R is an excellent open source program!
My understanding was that Arbomat by 7bit is using lm to do a linear regression, e.g. it tries to fit:
EUR-USD = alpha * GBP-USD + beta * AUD-USD + ... (depending on the currencies, or it can take the time difference as well). I'm not great with R, but when he uses lm, he does lm(y ~ x + 0). In an R formula the + 0 removes the intercept, and I believe that is what gives zero trend.
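To show what dropping the intercept does, here is a single-regressor sketch in plain Python comparing a least-squares fit through the origin (the analogue of lm(y ~ x + 0)) with an ordinary fit that keeps the intercept. The data is made up, and this is not Arbomat's code, which regresses on several currencies at once.

```python
def fit_through_origin(x, y):
    """Least-squares slope for y = beta * x, with no intercept term."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def fit_with_intercept(x, y):
    """Ordinary least squares for y = alpha + beta * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    beta = sxy / sxx
    return mean_y - beta * mean_x, beta

# Hypothetical data generated from y = 0.5 + 2 * x.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.5, 4.5, 6.5, 8.5]

beta_origin = fit_through_origin(x, y)
alpha, beta = fit_with_intercept(x, y)
print(round(beta_origin, 4))             # slope absorbs the missing constant
print(round(alpha, 4), round(beta, 4))   # constant and slope fitted separately
```

Without the intercept the slope has to absorb the constant offset, so the fitted coefficients differ from the ones obtained when a constant is allowed.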
The docs for ca.jo say you can use the ecdet argument to set zero trend. So when you call ca.jo do something like:
ca.jo(..., ecdet="none")
Forcing zero trend will likely result in less statistically significant relationships, but the ones you do find may be easier to trade.
Let us know how you go... I want to try looking for short term trends (over a week or so) and making day trades based on them. Not sure if it will work, but I'm too impatient for long-term cointegration.
I'm with you. I'm testing Arbomat and its partner arbomatEA (modified and fixed by me, because the one that is available has a lot of bugs; maybe some of them are there on purpose to keep newbies from using it without learning first).
With FX it is easier to trade short time frames like 15M because the commissions are low and the pairs trade 24 hours a day, so I'm testing this on short time frames, ideally intraday. On stocks I'm testing long term cointegrations, because commissions are pretty high but the daily cointegration relationships last months if not years.