Machine learning in trading: theory, models, practice and algo-trading - page 3694

 
Aleksey Nikolayev #:
I have stumbled upon another new area in ML: topological data analysis.
I think there are even examples of its use in trading.
Topology in Finance & Markets (Concepts & Applications) - DayTrading.com
  • www.daytrading.com
We look at the applications of topology in finance, trading, and markets. Specific examples and a coding example.
[Deleted]  
Aleksey Nikolayev #:

I don't understand. Isn't that kind of a triangle named after Renat Akhtyamov? )

Maybe try using logarithms? Like log(xauusd) - log(xaueur) - log(eurusd).

It's probably a triangle :) but the author of the article suggests opening only 2 trades based on this spread, which should reduce the total commission. I will try logarithms later.
You can also check whether there is a lead/lag between the instruments. Maybe it is even possible to get by with only one trade.
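
A minimal sketch of both ideas, assuming hourly closes are already aligned in one DataFrame with columns xauusd, xaueur and eurusd (hypothetical names). The helpers log_spread and lead_lag_corr and the toy data are made up for illustration; the toy XAUUSD series is deliberately built to follow the synthetic price one bar late, so it says nothing about real quotes.

```python
import numpy as np
import pandas as pd


def log_spread(df: pd.DataFrame) -> pd.Series:
    """log(xauusd) - log(xaueur) - log(eurusd); zero when the triangle is exact."""
    return np.log(df["xauusd"]) - np.log(df["xaueur"]) - np.log(df["eurusd"])


def lead_lag_corr(a: pd.Series, b: pd.Series, max_lag: int = 3) -> pd.Series:
    """Correlation of a's bar-to-bar changes with b's changes shifted by k bars.

    A peak at k > 0 means b's earlier moves line up with a's later moves,
    i.e. b tends to lead a by k bars.
    """
    ra, rb = a.diff(), b.diff()
    return pd.Series({k: ra.corr(rb.shift(k)) for k in range(-max_lag, max_lag + 1)})


# Toy data (made up): two random-walk legs and a real price that lags the synthetic one.
rng = np.random.default_rng(0)
n = 500
eurusd = pd.Series(1.10 * np.exp(np.cumsum(rng.normal(0, 1e-4, n))))
xaueur = pd.Series(2000.0 * np.exp(np.cumsum(rng.normal(0, 1e-3, n))))
synthetic = xaueur * eurusd
xauusd = synthetic.shift(1).bfill()

df = pd.DataFrame({"xauusd": xauusd, "xaueur": xaueur, "eurusd": eurusd})
spread = log_spread(df)
corr = lead_lag_corr(np.log(df["xauusd"]), np.log(synthetic))
best_lag = int(corr.idxmax())  # lag (in bars) with the highest correlation
print(corr.round(3))
print("best lag:", best_lag)
```

If a stable non-zero lag showed up on real data, trading only the lagging leg would be the "one trade" variant; whether such a lag exists at all on hourly bars is exactly what would need checking.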
 
Maxim Dmitrievsky #:

The spread distribution is very strange. Is this how it should be? :)

xauusd - (xaueur * eurusd), hourly.

since 2024 it is almost always > 0

The first term is the real XAUUSD_bid.

The second term is the synthetic XAUUSD_bid.


The synthetic bid should be less favourable than the real one. Therefore the difference is greater than zero after 2024.

Before 2024 the data is desynchronised.
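
To make the argument above concrete, here is a small sketch of how a synthetic XAUUSD quote can be built from the two legs. The Quote class, the function name and all numbers are invented for illustration; it only assumes that selling through the legs hits both bids and buying pays both asks.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    bid: float
    ask: float


def synthetic_xauusd(xaueur: Quote, eurusd: Quote) -> Quote:
    """Price of XAUUSD traded through XAUEUR and EURUSD.

    Selling gold for dollars via the legs means selling at both bids;
    buying means paying both asks.
    """
    return Quote(bid=xaueur.bid * eurusd.bid, ask=xaueur.ask * eurusd.ask)


# Made-up quotes, for illustration only.
real = Quote(bid=2300.00, ask=2300.30)
xaueur = Quote(bid=2090.70, ask=2091.10)
eurusd = Quote(bid=1.10000, ask=1.10010)

synth = synthetic_xauusd(xaueur, eurusd)
diff_bid = real.bid - synth.bid  # the quantity discussed in the thread, on bid prices
real_spread = real.ask - real.bid
synth_spread = synth.ask - synth.bid

print(f"synthetic bid={synth.bid:.2f} ask={synth.ask:.2f}")
print(f"real spread={real_spread:.2f} synthetic spread={synth_spread:.2f}")
print(f"real bid - synthetic bid = {diff_bid:.2f}")
```

With these invented numbers the synthetic spread is wider than the real one and the bid difference is positive, which is the situation described above; on real quotes the sign is an empirical question.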

[Deleted]  
fxsaber #:

The first term is the real XAUUSD_bid.

The second term is the synthetic XAUUSD_bid.


The synthetic bid should be less favourable than the real one. Therefore the difference is greater than zero after 2024.

Before 2024 the data is desynchronised.

Where did you get the information about desynchronisation?

The spread does not seem to account for any advantage; it is simply the asset's value converted into dollars.
[Deleted]  
Aleksey Nikolayev #:
I've stumbled upon another new area in ML: topological data analysis.

Yeah, I've seen this approach. But I didn't understand it right away and put it aside :)

 

The spread depends on 1) which liquidity providers the original quotes are taken from and 2) how they are remapped...

The notorious r*v used to produce negative spreads even on a single instrument to advertise his PAMM.

I have personally seen how spreads and their "structure" changed in ti-ill.

The history is segmented by changes of suppliers (the dealing centre's bosses/contractors), of settings and of software versions. (That applies to the recent period; further back, everything is someone else's data.)

At some not-so-fine moment, dealing centres (DCs) started to broadcast integrated, external history, which cannot be relied upon for arbitrage or for scalping in general.

Al*i used to have great history, held out the longest and served only its own data, but not anymore. (I can't say exactly: due to known problems they had to be abandoned.)

[Deleted]  
Maxim Kuznetsov #:

The spread depends on 1) which liquidity providers the original quotes are taken from and 2) how they are remapped...

The notorious r*v used to produce negative spreads even on a single instrument to advertise his PAMM.

I have personally seen how spreads and their "structure" changed in ti-ill.

The history is segmented by changes of suppliers (the dealing centre's bosses/contractors), of settings and of software versions. (That applies to the recent period; further back, everything is someone else's data.)

At some not-so-fine moment, dealing centres (DCs) started to broadcast integrated, external history, which cannot be relied upon for arbitrage or for scalping in general.

Al*i used to have great history, held out the longest and served only its own data, but not anymore. (I can't say exactly: due to known problems they had to be abandoned.)

Well, these are time-synchronised hourly bars. It's hard to explain such a big difference by a change of LP.

Glitches in the quotes would be in the form of individual outliers.

There the divergences are 100-150 times the spread in 2023. MQ demo.
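
One way to tell isolated glitches from a persistent shift, sketched under assumptions: summarise the spread per year with robust statistics. The function yearly_summary, the 5-MAD outlier rule and the toy series are illustrative choices, not something taken from the thread's data.

```python
import numpy as np
import pandas as pd


def yearly_summary(spread: pd.Series) -> pd.DataFrame:
    """Per-year median, share of positive bars and share of robust outliers."""
    rows = {}
    for year, s in spread.groupby(spread.index.year):
        med = s.median()
        mad = (s - med).abs().median()
        # Bars further than 5 MADs from the yearly median count as outliers
        # (the factor 5 is an arbitrary choice for this sketch).
        outliers = float(((s - med).abs() > 5 * mad).mean()) if mad > 0 else 0.0
        rows[year] = {
            "median": float(med),
            "share_positive": float((s > 0).mean()),
            "share_outliers": outliers,
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Toy hourly spread: 2023 carries a persistent offset, 2024 only a few spikes.
rng = np.random.default_rng(1)
idx = pd.date_range("2023-01-01", "2024-12-31 23:00", freq=pd.Timedelta(hours=1))
values = rng.normal(0.0, 0.3, len(idx))
values[idx.year == 2023] += 3.0                    # persistent shift all year
spikes = np.flatnonzero(idx.year == 2024)[::2000]  # a handful of isolated bars
values[spikes] += 50.0
spread = pd.Series(values, index=idx)

summary = yearly_summary(spread)
print(summary.round(3))
```

A persistent divergence moves the yearly median itself, while glitches leave the median near zero and only raise the outlier share slightly.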

 

Maxim Dmitrievsky #:

time synchronised

Just in case: is it absolutely certain that no bars are skipped on any pair? I recently ran into missing weekly data on NZDUSD.
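
A quick check for skipped bars could look like the sketch below. The function find_gaps and its thresholds are hypothetical; max_gap has to be chosen above the normal weekend break, otherwise every weekend is reported.

```python
import pandas as pd


def find_gaps(index: pd.DatetimeIndex, expected: pd.Timedelta, max_gap: pd.Timedelta) -> pd.DataFrame:
    """Return the intervals between consecutive bars that are longer than max_gap."""
    idx = index.sort_values()
    deltas = idx[1:] - idx[:-1]   # time between neighbouring bars
    mask = deltas > max_gap
    return pd.DataFrame({
        "from": idx[:-1][mask].to_numpy(),
        "to": idx[1:][mask].to_numpy(),
        # Number of bars of size `expected` that would fit into the hole
        "missing_bars": ((deltas[mask] / expected).astype(int) - 1).to_numpy(),
    })


# Toy hourly index with one whole week removed (made-up data).
hour = pd.Timedelta(hours=1)
full = pd.date_range("2024-01-01", periods=24 * 28, freq=hour)
kept = full[(full < "2024-01-08") | (full >= "2024-01-15")]

gaps = find_gaps(kept, expected=hour, max_gap=pd.Timedelta(days=3))
print(gaps)
```

Running this on each symbol's index before joining shows whether a hole exists in one pair only, which the later join with dropna() would otherwise hide silently.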

[Deleted]  
Aleksey Nikolayev #:

Just in case: is it absolutely certain that no bars are skipped on any pair? I recently ran into missing weekly data on NZDUSD.

Yes, rows with at least one NaN value in any of the columns (a missing quote for that timestamp) are deleted as meaningless. Accordingly, all remaining rows correspond to timestamps that are present in all three series.

For example, there are 3 columns for xauusd, xaueur and eurusd in the same dataframe, with a single datetime index for all. Rows with NaN in at least one column are deleted, and only after that is the spread calculated.

You can put how='inner' instead of how='outer' in the join; then only timestamps present in every series are combined and dropna() is not needed. But the general picture is the same.

import pandas as pd

# hyper_params is assumed to be a dict defined elsewhere in the script,
# holding the three symbol names under 'symbol1', 'symbol2' and 'symbol3'.

def get_prices() -> pd.DataFrame:
    # Create an empty DataFrame to store the combined data
    combined_df = pd.DataFrame()

    # List of symbols to process
    symbols = [
        hyper_params['symbol1'],
        hyper_params['symbol2'],
        hyper_params['symbol3']
    ]

    for symbol in symbols:
        # Read the whitespace-separated file for the current symbol
        # (raw string, so that \s is not treated as a string escape)
        p = pd.read_csv(f'files/{symbol}.csv', sep=r'\s+')

        # Build a one-column frame of close prices named after the symbol
        temp_df = pd.DataFrame()
        temp_df['time'] = p['<DATE>'] + ' ' + p['<TIME>']
        temp_df['time'] = pd.to_datetime(temp_df['time'], format='mixed')
        temp_df[symbol] = p['<CLOSE>']

        # Set time as index (it is already datetime64, no further conversion needed)
        temp_df.set_index('time', inplace=True)

        # If this is the first DataFrame, use it as the base
        if combined_df.empty:
            combined_df = temp_df
        else:
            # Otherwise, join with the existing DataFrame on the time index
            combined_df = combined_df.join(temp_df, how='outer')

    # Keep only timestamps where all three symbols have a quote
    combined_df = combined_df.dropna()
    # Spread: symbol1 - symbol2 * symbol3, e.g. xauusd - xaueur * eurusd
    combined_df['close'] = combined_df[symbols[0]] - \
        (combined_df[symbols[1]] * combined_df[symbols[2]])
    return combined_df.drop(columns=symbols)
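
The remark about how='inner' can be checked on toy data. This is a sketch with made-up prices and hypothetical column names; it assumes the source files themselves contain no NaN, otherwise the inner join would keep those rows while dropna() would remove them.

```python
import pandas as pd

# Two series whose timestamps only partly overlap (made-up values).
idx_a = pd.to_datetime(["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 02:00"])
idx_b = pd.to_datetime(["2024-01-01 01:00", "2024-01-01 02:00", "2024-01-01 03:00"])
a = pd.DataFrame({"xauusd": [2060.0, 2061.0, 2062.0]}, index=idx_a)
b = pd.DataFrame({"eurusd": [1.1040, 1.1050, 1.1060]}, index=idx_b)

# Outer join keeps every timestamp and fills the holes with NaN,
# so dropna() is needed afterwards.
outer = a.join(b, how="outer").dropna()

# Inner join keeps only timestamps present in both frames.
inner = a.join(b, how="inner")

print(outer)
print(inner)
print("same result:", outer.equals(inner))
```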

I have checked all hypothetically doubtful variants, the result is the same:


Besides, any transformation such as the spread calculation returns NaN wherever any one of the source columns is NaN. Below is the spread computed without first removing the rows with NaN. Essentially, it is impossible to make a mistake here.


>>>  dataset
                        close
time                         
2020-01-02 01:00:00       NaN
2020-01-02 02:00:00       NaN
2020-01-02 03:00:00       NaN
2020-01-02 04:00:00       NaN
2020-01-02 05:00:00       NaN
...                       ...
2025-04-17 07:00:00  0.141732
2025-04-17 08:00:00  0.098689
2025-04-17 09:00:00  0.318368
2025-04-17 10:00:00  0.526109
2025-04-17 11:00:00 -0.395750
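
The NaN propagation described above can be reproduced in a few lines. The numbers are made up; the point is only that dropping NaN rows before or after the arithmetic leaves the same values.

```python
import numpy as np
import pandas as pd

# Toy frame with a missing quote in two different columns.
df = pd.DataFrame({
    "xauusd": [2300.0, np.nan, 2302.0, 2303.0],
    "xaueur": [2090.0, 2091.0, np.nan, 2093.0],
    "eurusd": [1.10, 1.10, 1.10, 1.10],
})

# Arithmetic on a row with any NaN input yields NaN for that row.
spread_raw = df["xauusd"] - df["xaueur"] * df["eurusd"]

# Dropping incomplete rows first and computing afterwards.
clean = df.dropna()
spread_clean = clean["xauusd"] - clean["xaueur"] * clean["eurusd"]

print(spread_raw.isna().tolist())
print(spread_raw.dropna().equals(spread_clean))
```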

 
Maxim Dmitrievsky #:

Where did you get the information about desynchronisation?

It is a logical hypothesis based on the chart provided.

The spread does not seem to account for any advantage; it is simply the asset's value converted into dollars.

I don't know what spread and dollars have to do with it. There are two XAUUSD_bid prices. The higher the price, the more favourable it is.

If the bars' close prices were built on ask, you would get a negative spread, not a positive one, because the real symbol is almost always more favourable to trade/exchange than its synthetic counterpart.