Bayesian regression - Has anyone made an EA using this algorithm? - page 46

 
Yuri Evseenkov:

On an attempt to apply Bayes' formula. Again.

Task. Using Bayes' theorem, determine which value of a tick that has not yet arrived is most likely.

Given. Time series x,y.

y=ax+b A line from the last tick to the future.

P(a,b|x,y)=P(x,y|a,b)*P(a)*P(b)/P(x,y); (1) Bayes formula.

P(a,b|x,y) is the probability that the coefficients a and b correspond to the x and y coordinates of a future tick.

We need to find the a and b for which this probability (or, more correctly, probability measure) is maximal.

P(x,y|a,b) - let's take the real histogram of ticks distribution by price levels as a likelihood function. The function is defined by a two-dimensional array (matrix): price range - probability, percentage ratio of ticks falling within this range to the total number of ticks.

P(b) - the normal distribution of increments is taken as the a priori probability of b. A PRNG producing normally distributed values is used.

P(a) - the coefficient a determines the slope of the straight line and the sign of the predicted increment. So far I'm thinking of using the linear regression code I posted earlier, i.e. taking the probability of the coefficient a found there as unity, and substituting into (1) a probability P(a) that accounts for the difference between this a and the a calculated for the given y.
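
A minimal sketch of how the maximization in (1) could look, under several assumptions that are not in the post: x is counted in ticks from the last tick (so x=1 is the next tick), both priors are taken as normal densities, P(a) is centred on the regression slope, P(b) is centred on the last price, and the search is a plain grid search. The names map_line, a_grid, b_grid, sigma_a and sigma_b are hypothetical. P(x,y) is dropped because it does not depend on a or b.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def histogram_likelihood(price, hist, lo, bin_width):
    """Share of past ticks in the bin containing `price`; 0 outside the histogram."""
    i = math.floor((price - lo) / bin_width)
    return hist[i] if 0 <= i < len(hist) else 0.0

def map_line(x_next, hist, lo, bin_width, a_grid, b_grid,
             a_ols, sigma_a, b_mean, sigma_b):
    """Return the (a, b) on the grid that maximizes P(x,y|a,b) * P(a) * P(b).

    P(x,y) in (1) does not depend on a or b, so it is dropped.
    Both priors are ASSUMED normal here (hypothetical choices).
    """
    best, best_score = None, -1.0
    for a in a_grid:
        p_a = normal_pdf(a, a_ols, sigma_a)       # prior centred on the regression slope
        for b in b_grid:
            p_b = normal_pdf(b, b_mean, sigma_b)  # prior centred on the last price
            y = a * x_next + b                    # price the line predicts for the next tick
            score = histogram_likelihood(y, hist, lo, bin_width) * p_a * p_b
            if score > best_score:
                best, best_score = (a, b), score
    return best, best_score

# Toy data: five price bins of width 1.0 starting at 100.0.
hist = [0.1, 0.2, 0.4, 0.2, 0.1]
best, score = map_line(
    x_next=1, hist=hist, lo=100.0, bin_width=1.0,
    a_grid=[-0.5, 0.0, 0.5], b_grid=[101.5, 102.5, 103.5],
    a_ols=0.0, sigma_a=0.5, b_mean=102.5, sigma_b=1.0,
)
print(best, score)
```

On this toy data the flat line through the fullest bin wins, because all three factors peak there. With a real tick histogram the three factors can pull in different directions, and that is where the choice of priors starts to matter.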

Perhaps you have some thoughts on how the incremental sign of each tick behaves?


Sketched out 2 indicators, working on ticks. The first defines A and B for the linear regression (for bid and ask separately, just in case), and draws a line. The second is a histogram of tick values (red - Ask, blue - Bid). Each bar of tick charts is a new tick, they do not coincide with bars of the chart itself.

This is all that I have understood from the post. What's next? If I understand the logic, I will finish it.
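
For reference, the first indicator's calculation can be sketched roughly like this. It is only an illustration in Python rather than the actual indicator code: fit_line is a hypothetical name, the x coordinate is assumed to be the tick index 0..n-1, and the bid/ask values are made up.

```python
def fit_line(prices):
    """Least-squares a, b for price = a * i + b, with i = 0..n-1 the tick index."""
    n = len(prices)
    if n < 2:
        raise ValueError("need at least two ticks")
    mean_x = (n - 1) / 2.0
    mean_y = sum(prices) / n
    sxx = sum((i - mean_x) ** 2 for i in range(n))
    sxy = sum((i - mean_x) * (p - mean_y) for i, p in enumerate(prices))
    a = sxy / sxx                 # slope: average price change per tick
    b = mean_y - a * mean_x       # intercept: fitted price at the oldest tick
    return a, b

# Made-up ticks; bid and ask are fitted separately, as in the indicator.
bid = [1.1000, 1.1002, 1.1001, 1.1004, 1.1003]
ask = [p + 0.0002 for p in bid]
a_bid, b_bid = fit_line(bid)
a_ask, b_ask = fit_line(ask)
print(a_bid, b_bid, a_ask, b_ask)
```

With a constant spread the two slopes coincide and only the intercepts differ, so fitting bid and ask separately matters only when the spread changes inside the window.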


 
Dr.Trader:

Using ticks for prediction is dangerous in my opinion, and the model should be set up for each broker separately.

About the actual ticks - they can be very different from broker to broker.

I agree. I wrote about this above, but I will say it again.

Forex consists of many brokerage companies, forex dealers and bucket shops - European, Chinese, Bahamian, Bermudan... There are a lot of them. None of them dominates or makes a decisive contribution to price formation, and neither does any single player on the market. This assumption is based on the central limit theorem (CLT) of probability theory:

"The sum of a sufficiently large number of weakly dependent random variables having approximately the same magnitude (no single summand dominates, no determinative contribution to the sum) has a distribution that is close to normal."(Wikipedia)

As I understand it in relation to forex: if we collect all the ticks of ALL brokerage companies in one M5 bar (millions of ticks), the distribution of ticks inside the bar will be close to normal. And the higher the timeframe, the closer it is. Each particular brokerage company has its own quote flow, which differs from the dominant global flow to the extent that this particular company distorts it. On the chart this dominant flow is a curve (certainly not a straight line!) from which no brokerage company can stray far.
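
The quoted theorem itself is easy to check numerically. The sketch below says nothing about whether real ticks satisfy its conditions. It only shows, on synthetic data, that a sum of many independent, equally sized terms is much closer to normal than a single term. Uniform terms and excess kurtosis as the measure of "closeness to normal" are assumptions chosen for the illustration (a normal distribution has excess kurtosis 0, a uniform one about -1.2).

```python
import random

def excess_kurtosis(xs):
    """Sample excess kurtosis: about 0 for normal data, about -1.2 for uniform data."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0

def simulate_sums(n_terms, n_samples, rng):
    """Each sample is the sum of n_terms independent uniform(-1, 1) terms."""
    return [sum(rng.uniform(-1.0, 1.0) for _ in range(n_terms))
            for _ in range(n_samples)]

rng = random.Random(42)                  # fixed seed so the run is repeatable
single = simulate_sums(1, 10000, rng)    # one term: plain uniform
many = simulate_sums(30, 10000, rng)     # thirty terms: should look close to normal
k_single = excess_kurtosis(single)
k_many = excess_kurtosis(many)
print(k_single, k_many)
```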

Here in this thread people have reacted skeptically to the Wikipedia formulation of the CLT. It also seems incorrect to me, although I have since encountered this formulation in other sources. Even the MQL4 Reference has an example of an indicator based on this wording.

Thus, I think the CLT holds for a sufficiently large number of increments during periods when the influence of fundamental factors is weak.

 
Dr.Trader:

Sketched out 2 indicators, working on ticks. The first one defines A and B for linear regression (for bid and ask separately, just in case), and draws a line. The second is a histogram of tick values (red - Ask, blue - Bid). Each bar of tick charts is a new tick, they do not coincide with bars of the chart itself.

This is all that I have understood from the post. What's next? If I understand the logic, I will finish it.


I want to calculate the probabilities using Bayes' formula. Linear regression and the coefficient a it finds are used only to calculate the a priori probability P(a).
 

Suppose there is a reference price, which is given by liquidity providers, and the quotations of brokers just bounce around it. In that case each broker's quote will dance in some range around the "main price", forming a kind of dome on the histogram. If you add up the dome histograms, you end up with something resembling a normal distribution in shape, I agree.

But it still doesn't work for us: we are working with quotes from one particular broker, and a normal distribution is unlikely. I've been watching the histogram for a while; my broker gives me a maximum of 4000 ticks (about 20 minutes), and I use them all for the histogram. It looks more like a half-ellipse lying on its side. If the price starts to rise or fall, the ellipse grows a thin tail on one side, but eventually it returns to its shape. Sometimes there are two peaks, though. You could try to describe this average figure with some other distribution (not Gaussian) and use that in the calculations. If you build the histogram on a small number of ticks, for example a hundred, it's just a shapeless, constantly jumping distribution; I don't think that will work, you need a thousand ticks or more.
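
The point about the number of ticks can be illustrated on synthetic data. The sketch below assumes ticks drawn from one fixed distribution (a standard normal, used here only as a stand-in for real prices), builds two histograms from two independent samples of the same size, and measures how much they disagree (total variation distance, 0 = identical, 1 = no overlap). The function names and the 20-bin grid are hypothetical choices.

```python
import random

def normalized_histogram(values, lo, hi, n_bins):
    """Share of values per bin; values outside [lo, hi) go into the edge bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        i = int((v - lo) / width)
        i = min(max(i, 0), n_bins - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]

def total_variation(p, q):
    """Half the sum of absolute differences between two normalized histograms."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def histogram_jitter(n_ticks, rng, n_bins=20):
    """Disagreement between histograms of two independent samples of n_ticks ticks."""
    first = [rng.gauss(0.0, 1.0) for _ in range(n_ticks)]
    second = [rng.gauss(0.0, 1.0) for _ in range(n_ticks)]
    return total_variation(normalized_histogram(first, -3.0, 3.0, n_bins),
                           normalized_histogram(second, -3.0, 3.0, n_bins))

rng = random.Random(7)                    # fixed seed so the run is repeatable
tv_100 = histogram_jitter(100, rng)       # a hundred ticks: the shape jumps around
tv_4000 = histogram_jitter(4000, rng)     # four thousand ticks: the shape is stable
print(tv_100, tv_4000)
```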

In the histogram in the picture here, the right third is the result of a rapid price change; after a while the whole thing should take on the shape of the left two thirds.

 
Dr.Trader:

Suppose there is a reference price, which is given by liquidity providers, and the quotations of brokers just bounce around it. In that case each broker's quote will dance in some range around the "main price", forming a kind of dome on the histogram. If you add up the dome histograms, you end up with something resembling a normal distribution in shape, I agree.

But it still doesn't work for us, we are working with quotes from one particular broker, and a normal distribution is unlikely.

That is another question. It concerns the practical application.

In formula (1) the likelihood function P(x,y|a,b) is the real histogram of real ticks from one real, specific broker. For example, P(x,y|a,b)=0.12 if 12% of all ticks in the window fall into the interval from y (price) to y + range (a set parameter). I build the histogram in the profile.
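
The 12% example can be written down directly. This is a sketch only: tick_share is a hypothetical name, the interval is assumed to be half-open, [y, y + range), and the ticks are made up so that exactly 12 out of 100 fall into it.

```python
def tick_share(ticks, y, price_range):
    """Fraction of ticks in the window whose price lies in [y, y + price_range)."""
    if not ticks:
        raise ValueError("empty tick window")
    hits = sum(1 for p in ticks if y <= p < y + price_range)
    return hits / len(ticks)

# Made-up window of 100 ticks: 12 inside the price range, 88 far away from it.
window = [1.10005] * 12 + [1.20000] * 88
likelihood = tick_share(window, y=1.1000, price_range=0.0001)
print(likelihood)
```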



Then there are the correction multipliers, the a priori probabilities P(a) and P(b). As P(b) I have chosen a normal distribution of price INCREMENTS. The reasons are given in the previous posts.

 

I read the document from the first post; it doesn't hold up well at all.

I couldn't get through many of the formulas, so here is just a loose paraphrase. The author has bid and ask prices for bitcoin for half a year, at 10-second intervals. He writes a program (a classifier) that takes the current prices and returns one of three signals: buy, sell, or just hold open positions. The forecast is made 10 seconds ahead. Every 10 seconds the program should receive new data and recompute everything. The source data is divided into several vectors, which are used for price forecasting. The classifier takes three arrays of data: one for the last 30 minutes, a second for the last 60 minutes, and a third for the last 120 minutes (each array is still the price at 10-second intervals). Beyond that I don't get it. The formulas look very similar to a neural network, i.e. every input value has a corresponding weight, but these weights are applied to all three arrays at once. Then it suddenly turns out that the weights cannot be found (but this is a neural network, isn't it?) and all the variants have to be tried. The author empirically derived some formula that should help with optimizing the weights by rejecting what is obviously unsuitable, and Bayesian regression is used somewhere in there. The result of the regression is also probably used as an input value for the classifier.
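
The only part of that scheme that can be reconstructed with confidence from the paraphrase is the input layout. The sketch below shows just that: with a 10-second step, the 30, 60 and 120 minute arrays hold 180, 360 and 720 prices. The three-way signal at the end is a generic threshold rule added as an assumption for illustration; it is not the formula from the document, and predicted_change and threshold are hypothetical inputs.

```python
STEP_SECONDS = 10  # sampling interval of the price series

def window_length(minutes, step_seconds=STEP_SECONDS):
    """Number of samples covering `minutes` of history."""
    return minutes * 60 // step_seconds

def last_windows(prices, minutes=(30, 60, 120)):
    """Most recent 30/60/120 minute slices of a 10-second price series."""
    windows = {}
    for m in minutes:
        n = window_length(m)
        if len(prices) < n:
            raise ValueError("not enough history for a %d minute window" % m)
        windows[m] = prices[-n:]
    return windows

def signal(predicted_change, threshold):
    """ASSUMED rule: buy/sell only when the forecast move exceeds the threshold."""
    if predicted_change > threshold:
        return "buy"
    if predicted_change < -threshold:
        return "sell"
    return "hold"

prices = [float(i) for i in range(1000)]   # stand-in series, one value per 10 seconds
windows = last_windows(prices)
print({m: len(w) for m, w in windows.items()}, signal(0.5, 0.1))
```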

It looks to me like a student's term paper done a couple of nights before it was due. There is no proof of earnings :)

Although Bayesian regression is used, it is only a small part of some complex system. Maybe, because of the optimized weights, its influence is reduced to zero. I might as well feed a random number generator or the Mayan calendar into a neural network; their influence would be reduced to zero during optimization anyway.

 
I couldn't get through the English-language document in the first post. I'm trying to understand examples of Bayes' theorem in other areas, and I'm just trying to calculate the probabilities of this or that price occurring using Bayes' formula. A normal distribution is not a required attribute at all. So far it's just one hypothesis for one of the a priori probabilities.
 

I came across two articles on the topic of this thread - they may be useful to enthusiasts

Article 1.

Bayesian regression with STAN: Part 1 normal regression

Article 2

Bayesian regression with STAN Part 2: Beyond normality

Each article is an advertisement for two books with the same title

Bayesian regression with STAN: Part 1 normal regression
  • Lionel Hertzog
  • datascienceplus.com
This post will introduce you to bayesian regression in R, see the reference list at the end of the post for further information concerning this very broad topic. Bayesian regression Bayesian statistics turn around the Bayes theorem, which in a regression context is the following: $$ P(\theta|Data) \propto P(Data|\theta) \times P(\theta) $$...
 

You - engineers, physicists, radio operators - are so weird.....

Many times I've told you here that quants, algo-traders and market-makers are not idiots, that they are GOOD at maths, that they are paid WELL (100K+ a year plus bonus) for a reason, but you all seem not to get it.

The price in the stock market is the expression of a CONNECTED system, so any useful (even marginally adequate) price model CANNOT be simple. Yes, there can be Bayesian regression inside, but only as an auxiliary numerical method. And you charge in like a herd: "this T-rex of yours, we'll trample it right here with the Bayesian method alone!"

Well, maybe this will get through to you: a list of the mathematical methods actively used in trading by big market makers, banks and hedge funds. The list is also divided by sub-specialty, i.e. by type of traded financial instrument and by type of forecasting in banks. It was put out by a former senior Citi and JPMorgan employee. The list is not secret; you can piece it together from 5-10 books on financial maths (in English). But on a Russian forum, and in such a complete form, such a list is rare.

Data Scientist, Statistician
25000 USD
Job description
PROFESSIONAL REQUIREMENTS (we value the desire to learn on the go the most)

Advanced knowledge of statistics and time series: Stochastic processes. Tools: SSA/SVD, RSSA, FIMA/ARFIMA, Nonlinear Autoregressive Exogenous Model (NARX), (N)GARCH and its derivatives, Hurst Exponent and its applications, Recurrence quantification analysis (RQA)
Programming experience (or ready to learn) in Python (and set of libraries for all things statistical)
Data analysis libraries in Python (theano, keras, Torch, Pandas, NumPy, scikit-learn) or their equivalents in R
some experience with Machine Learning, collaborative filtering, cluster analysis, Graphs Theory
Other blended approaches: ANFIS (adaptive network-based fuzzy inference system)
Neural Networks: unsupervised learning: RNN (Recurrent Neural Networks), FNN, RBF, etc.

TASKS & ENVIRONMENT:
statistical analysis of financial data, econometric applications

Moderators: please delete the link to the original source that I accidentally failed to remove in time; I cannot do it myself, and it still hangs in your forum database at the bottom of the post. Otherwise people will think it's advertising. Thank you.

All vacancies at TenViz LLC
  • www.work.ua
New York, US, based product (services) company with growing sales on 3 continents and 25−35 employees and contractors. We have a growing team in Ukraine, and are hiring full- and part-time people with...
 
Machine Learning, Neural Networks analyst
28000 UAH

Job Description

PROFESSIONAL REQUIREMENTS (we value the desire to learn on the go the most):

Knowledge of Artificial Neural Networks and Machine Learning, including:

FeedForward Neural Network (FNN)
Recurrent Neural Networks (RNN) family: including Long-Short Term Memory Model (LSTM)
CNN - Convolutional Neural Networks
Radial Basis Function (RBF)
blended approaches ANFIS (Adaptive Network-based Fuzzy Inference System)
Programming experience (or ready to learn) in Python (and set of libraries for all things statistical)
Machine Learning and data analysis libraries in Python (theano, keras, Torch, Pandas, NumPy, scikit-learn)
additionally, good knowledge of R and/or Matlab would be helpful

RELATED AREAS OF KNOWLEDGE:

advanced knowledge of Statistics and time series (Stochastic processes and Tools): ARFIMA, Nonlinear Autoregressive Exogenous Model (NARX), Wavelet Transforms
spectrum estimation models - Singular Spectrum Analysis (SSA) (SVD)
Collaborative filtering, Cluster analysis, Graphs Theory

TASKS (prioritized order):

statistical analysis of financial data, econometric applications
building services and frameworks for interactive distributed query processing over large volumes of financial market data