From theory to practice - page 336

 
Alexander_K2:

Last post for today.

So. The most burning question that Novaja once asked:

Why convert the current tick flow, which is in fact an Erlang flow, into an exponential flow, only to come back to the same flow, now clearly distorted???

I agree - a mistake has been made here. One should work with the existing tick flow and perform further transformations on this natural source flow, not on an artificial one.

So, the algorithm of transformation looks as follows:

1. we take the initial tick stream but read only every second tick - then look at the distributions obtained for the time intervals and increments.

2. ... every third tick is read - we look at distributions.

3. ...

Continue until the distribution of time intervals becomes a clear, pronounced Erlang distribution that satisfies the formula for its probability density function, and the distribution of increments gets closer and closer to a normal distribution.

That's what I'll do, and I'll let you know the results.

Thank you for your attention.
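The thinning loop in steps 1-3 can be sketched in a few lines. A minimal Python sketch, assuming the ticks are just a list of arrival times; a simulated Poisson stream (exponential inter-arrivals) stands in for real data here, since taking every n-th event of a Poisson stream yields exactly Erlang-n intervals, whose squared coefficient of variation (variance/mean^2) equals 1/n:

```python
import random
import statistics

def thin_intervals(times, n):
    """Keep every n-th tick time and return the inter-arrival intervals."""
    kept = times[::n]
    return [b - a for a, b in zip(kept, kept[1:])]

random.seed(42)
# Simulated tick arrival times: a Poisson stream (exponential inter-arrivals).
t, times = 0.0, []
for _ in range(50_000):
    t += random.expovariate(1.0)
    times.append(t)

# Taking every n-th event of a Poisson stream gives Erlang-n intervals,
# whose squared coefficient of variation var/mean^2 equals 1/n.
for n in (1, 2, 5, 10):
    iv = thin_intervals(times, n)
    cv2 = statistics.variance(iv) / statistics.mean(iv) ** 2
    print(n, round(cv2, 2))
```

On real ticks one would load the recorded arrival times instead of simulating them; cv^2 will approach 1/n only if the source stream is indeed close to Poisson, which is exactly what the experiment above is meant to check.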

This is just the beginning of years of painful research, based on hypotheses and with lots of ellipses.

and the conclusions are like a grail...

it's like a grail...

 
Alexander_K2:

Last post for today.

So. The most burning question that Novaja once asked:

Why convert the current tick flow, which is in fact an Erlang flow, into an exponential flow, only to come back to the same flow, now clearly distorted???

I agree - a mistake has been made here. One should work with the existing tick flow and perform further transformations on this natural source flow, not on an artificial one.

So, the algorithm of transformation looks as follows:

1. we take the initial tick stream but read only every second tick - then look at the distributions obtained for the time intervals and increments.

2. ... every third tick is read - we look at distributions.

3. ...

Continue until the distribution of time intervals becomes a clear, pronounced Erlang distribution that satisfies the formula for its probability density function, and the distribution of increments gets closer and closer to a normal distribution.

That's what I'll do, and I'll let you know the results.

Thank you for your attention.

By reading every 2nd, then every 3rd, and so on up to every n-th tick, you actually get a chart of range closing prices.

And I have already filled you in on the distributions from this chart.

At first the central peak will decrease; it will start to blur towards normal, and then the distribution will become bimodal.

To understand the process you need to study it at the edges: at n=1 you get something close to a log-normal distribution, and as n increases towards n=100 you get a bimodal distribution. This means the distribution has always been bimodal; it's just that, due to discreteness at small n, the modes overlap each other and the picture is not clear.
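The "hidden bimodality" argument can be illustrated with a toy equal-weight mixture of two normals (a hypothetical stand-in for the increment distribution, not taken from the thread's data): when the two modes sit less than about one standard deviation from the centre, the density at the centre exceeds the density at the modes and the histogram looks unimodal.

```python
from statistics import NormalDist

def mixture_pdf(x, m, s):
    """Equal-weight mixture of N(-m, s) and N(+m, s)."""
    return 0.5 * NormalDist(-m, s).pdf(x) + 0.5 * NormalDist(m, s).pdf(x)

# With modes at -m and +m and spread s, the mixture only *looks* bimodal
# once the density at the centre drops below the density at the modes.
s = 1.0
for m in (0.5, 1.0, 2.0):
    centre, at_mode = mixture_pdf(0.0, m, s), mixture_pdf(m, m, s)
    print(m, "bimodal" if centre < at_mode else "looks unimodal")
```

For this symmetric two-Gaussian case the known threshold is m = s: below it the two components merge into a single hump, exactly the "overlap" effect described above.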

So your research is reinventing the wheel.

 
Yuriy Asaulenko:

Nah, that's no way to sell an elephant.

A_K2 is characterised by a complete lack of a systematic approach, combined with digging into details. What details can there be when there is no vision of the whole?

In addition.

I sincerely wish that A_K2 would produce something that actually works. However, judging by his posts, it will be a blank shot this time too.

Progress, science and technology have always moved from simple forms (descriptions) to more complex ones. And, it has to be said, even the simple ones already worked quite well.

If you've never designed a car, there's no prospect of starting with a Mercedes. You should start with something simple, something like a Zhiguli - the principles are the same as for a Mercedes, but everything is much simpler. When your Zhiguli starts to move, you can improve, modernise, complicate and bring it up to Mercedes level. Remember what Korean cars were like 15 years ago - it was enough to make you cry.

It looks like A_K2 has again started designing a Mercedes.) He could at least have built a Zhiguli over the previous 4 months - designing one doesn't need any science, existing technical solutions are enough.)

 
Alexander_K2:

You have to work with the tick-stream that you have and make further transformations on this natural source stream, not on an artificial one.

I have already written to you about this, but apparently my voice is not enough.
Your "real ticks" are something unusual. I don't know what's behind that "DDE", but it's not at all the haphazard rubbish that forex dealing desks usually give. For one thing, the ticks are 10 times less frequent than usual, which is already alarming. The first bin of your histogram should be ~200 ms, not a second.

Please dump the last couple of thousand received ticks to CSV - no thinning, no filling of gaps with past values, only the prices that actually arrived. Maybe somebody can use them and confirm that these values are much more suitable for trading than the usual ones. And then maybe the collective intelligence in this thread will suggest how to make a grail out of them in a dozen lines of MQL code.
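For what it's worth, the requested dump takes only a few lines. A minimal Python sketch; the file name, column names, and the sample tick values are all made up for illustration:

```python
import csv

# Hypothetical "as received" ticks: (millisecond timestamp, bid) pairs,
# no thinning and no forward-filling - only prices that actually arrived.
ticks = [
    (1525164000123, 0.96412),
    (1525164000345, 0.96415),
    (1525164001010, 0.96409),
]

with open("audcad_ticks.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["time_ms", "bid"])
    writer.writerows(ticks)
```

The same structure (one header row, one row per raw tick) is trivial to reproduce from MQL or any terminal export.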

 
Alexander_K2:

And you don't have to convert anything? I don't believe it!!!!!!!! That's not interesting.

The principle of big-data processing is always to work with the raw data, which carries the maximum information, adding various metrics to it or compressing it without losing information.

If you start mangling the information, the value and adequacy of such algorithms obviously drops immediately.

 
Alexander_K2:

But six months ago we discussed that since different brokers have different tick-flows, the first task is to bring them to a single universal view. No?

One does not contradict the other.

A universal view does not mean that it is necessary to bring them to the same tick-flow...
 

Thanks. Here is a comparison of the distributions of increments, and the autocorrelations, for the last 1000 AUDCAD bid values. The top row is your ticks; the bottom row is what's in the terminal. There is a difference, but I cannot tell from the charts which is better. I like that the histogram peak is not clipped as it is in the terminal.

Some stationarity tests:

Your ticks:

> Box.test(pricesDiff, lag=20, type="Ljung-Box")

        Box-Ljung test

data:  pricesDiff
X-squared = 39.466, df = 20, p-value = 0.005832

> adf.test(pricesDiff, alternative = "stationary")

        Augmented Dickey-Fuller Test

data:  pricesDiff
Dickey-Fuller = -11.556, Lag order = 9, p-value = 0.01
alternative hypothesis: stationary

> kpss.test(pricesDiff)

        KPSS Test for Level Stationarity

data:  pricesDiff
KPSS Level = 0.44326, Truncation lag parameter = 7, p-value = 0.05851


And those in the terminal:

> Box.test(pricesDiff, lag=20, type="Ljung-Box")

        Box-Ljung test

data:  pricesDiff
X-squared = 29.181, df = 20, p-value = 0.08426

> adf.test(pricesDiff, alternative = "stationary")

        Augmented Dickey-Fuller Test

data:  pricesDiff
Dickey-Fuller = -10.252, Lag order = 9, p-value = 0.01
alternative hypothesis: stationary

> kpss.test(pricesDiff)

        KPSS Test for Level Stationarity

data:  pricesDiff
KPSS Level = 0.3404, Truncation lag parameter = 7, p-value = 0.1


The p-value in your Box-Ljung test is an order of magnitude lower, which is cool.
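For readers without R, the Box-Ljung statistic quoted above is easy to compute by hand: Q = n(n+2) * sum over lags k=1..h of r_k^2/(n-k), compared against a chi-square quantile with h degrees of freedom. A pure-Python sketch on synthetic series (not the AUDCAD ticks; the AR(1) coefficient 0.3 is an arbitrary choice):

```python
import random

def ljung_box_q(x, h):
    """Ljung-Box Q statistic over lags 1..h (pure-Python sketch)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    q = 0.0
    for k in range(1, h + 1):
        ck = sum(d[i] * d[i + k] for i in range(n - k))
        rho = ck / c0          # lag-k sample autocorrelation
        q += rho * rho / (n - k)
    return n * (n + 2) * q

random.seed(1)
white = [random.gauss(0, 1) for _ in range(1000)]   # memoryless noise
ar = [0.0]
for _ in range(999):                                 # AR(1) series with memory
    ar.append(0.3 * ar[-1] + random.gauss(0, 1))

CHI2_95_DF20 = 31.41  # 95% quantile of chi-square with 20 degrees of freedom
print(ljung_box_q(white, 20) > CHI2_95_DF20)  # noise: typically below threshold
print(ljung_box_q(ar, 20) > CHI2_95_DF20)     # memory: clearly above threshold
```

A large Q (small p-value) means the autocorrelations are jointly nonzero, which is exactly why the lower Box-Ljung p-value on the custom ticks suggests more exploitable structure.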


And most importantly, your ticks are a process with memory - very non-Markovian. I don't know how to express it in numbers, but in my model your ticks are easier to predict than ordinary ticks.


I'm wondering if there are any other tests to assess predictability?
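One standard predictability check not yet mentioned in the thread is the Lo-MacKinlay variance-ratio test: for memoryless increments the variance of q-step sums is q times the one-step variance, so the ratio stays near 1, while positive memory pushes it above 1. A minimal sketch on synthetic increments (the AR coefficient 0.4 and the horizon q=10 are arbitrary choices, not values from the thread):

```python
import random
import statistics

def variance_ratio(increments, q):
    """Var of overlapping q-step sums divided by q * Var of 1-step increments."""
    sums = [sum(increments[i:i + q]) for i in range(len(increments) - q + 1)]
    return statistics.variance(sums) / (q * statistics.variance(increments))

random.seed(7)
iid = [random.gauss(0, 1) for _ in range(5000)]   # memoryless increments
with_memory = [0.0]
for _ in range(4999):                              # AR(1) increments, coeff 0.4
    with_memory.append(0.4 * with_memory[-1] + random.gauss(0, 1))

print(round(variance_ratio(iid, 10), 2))          # near 1
print(round(variance_ratio(with_memory, 10), 2))  # well above 1
```

Applied to the two tick streams, a ratio further from 1 would be one way to "express in numbers" which series carries more memory.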

 

Distances between ticks from Alexander's file 01AUDCAD_Real 14400 (1 second increments)


 
Alexander_K2:

If that's the case, then obviously everyone should just work in a discrete stream of quotes like mine and that's it. Isn't that right?

That's what I thought a month ago too. Since you had a logarithmic (or Pascal) distribution, I wanted to get one too by thinning, hoping for something good. After a couple of weeks of trying "I'll change p to 0.71 instead of 0.72 and it'll be OK" - I never got it right; it's all roulette, not science.

The distributions of price increments and time pauses are just consequences. The most important thing is to get a stationary non-Markovian process - and the more stationary and non-Markovian, the better. I think this is the first necessary transformation, with the requirement of non-Markovian stationarity, and it does not matter what distributions it produces.
I have no idea how to achieve this non-Markovian stationarity, but it looks like the right way.

Then, for such a thinned series, one may try a second transformation tailored to the trading strategy - like achieving a gamma distribution in the returns, as you wanted for your model. Here the transformation depends on your strategy; you can create features and train a neural network instead of a second thinning.


P.S. "Non-Markovian stationarity" is purely my own layman's name for this property; in science it is probably called something else.

 
Novaja:

Distances between ticks from Alexander's file 01AUDCAD_Real 14400 (discreteness of 1 sec.)

Judging from the chart a couple of pages ago, the peak has shifted from 0 to 1. I guess it depends on the pair traded (AUDCAD vs CADJPY).


Alexander_K2:

I think it is a Pascal distribution with r=2, p=0.5, q=0.5

I tried to plot the Pascal distribution with those parameters in R; it did not match. But there are other parameterisations besides r, p, q - maybe I mixed something up.
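The mismatch may simply be a parameterisation issue: R's dnbinom(x, size, prob) counts failures before the size-th success, while some textbooks count total trials, which shifts the whole distribution by r. A sketch of both conventions for r=2, p=0.5:

```python
from math import comb

def pascal_pmf_failures(k, r, p):
    """P(k failures before the r-th success): C(k+r-1, k) * p^r * (1-p)^k."""
    return comb(k + r - 1, k) * p**r * (1 - p) ** k

def pascal_pmf_trials(n, r, p):
    """Same distribution counted as total trials n >= r (the other convention)."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

r, p = 2, 0.5
# Identical probabilities, but the support starts at 0 vs at r - an easy
# way for a plotted Pascal distribution to "not match" the data.
print([round(pascal_pmf_failures(k, r, p), 4) for k in range(5)])
print([round(pascal_pmf_trials(n, r, p), 4) for n in range(r, r + 5)])
```

So before rejecting the r=2, p=0.5 hypothesis it is worth re-plotting with the support shifted by r.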
