What to feed to the input of the neural network? Your ideas...

 

How do trees work in machine learning?

Trees in machine learning, in particular decision trees, are a structure used for classification and regression. The basic idea is to split the dataset into smaller subsets based on the values of the input features. Here are the main points of how trees work:

### 1. Tree structure
- **Vertices**: Each vertex of the tree is a node where the data is split.
- **Root**: The top node of the tree, where the splitting process begins.
- **Leaf nodes**: Nodes without child nodes; they represent the final results (a classification or a prediction).

### 2. The tree-building process
- **Feature selection**: At each node, a feature is chosen to split on. The choice is based on a criterion that maximises the homogeneity of the resulting subsets (for example, the Gini criterion or entropy for classification).
- **Data splitting**: The data is divided into two or more groups depending on the value of the chosen feature.
- **Recursion**: The process is repeated for each resulting group until a certain stopping condition is reached (for example, the maximum tree depth or the minimum number of samples in a node).

### 3. Quality evaluation
- **Criteria**: Various metrics are used to evaluate the quality of a split, such as:
  - **Gini**: A measure of the probability of misclassifying a randomly chosen element.
  - **Entropy**: A measure of uncertainty that shows how heterogeneous the data in a node is.
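
As a small illustration (a Python sketch added here, not part of the quoted answer), both measures can be computed directly from the class labels in a node:

```python
import numpy as np

def gini(labels):
    """Gini impurity: probability of misclassifying a randomly chosen element."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Entropy: how heterogeneous the labels in a node are."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

labels = np.array([0, 0, 1, 1, 1, 2])   # made-up labels of one node
print(gini(labels))     # ~0.611
print(entropy(labels))  # ~1.459
```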


### 4. Resistance to overfitting
- **Overfitting**: Trees can easily overfit, especially when they are too deep. To combat this, methods such as pruning are used, which remove the less significant nodes.
- **Regularisation**: Limiting the maximum tree depth or the minimum number of samples per node also helps to avoid overfitting.

### 5. Applications
- **Classification**: Trees can be used to classify objects based on their features.
- **Regression**: Trees can predict continuous values by splitting the data into subsets and computing the mean value in each leaf node.

### 6. Ensemble methods
- **Random forests**: A method that combines many trees to improve the accuracy and robustness of the model.
- **Gradient boosting**: A method that builds trees sequentially, where each new tree corrects the errors of the previous ones.

Trees are a powerful tool in machine learning thanks to their interpretability and their ability to handle both numerical and categorical data.
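
A minimal sketch of these ideas using scikit-learn (my addition; scikit-learn and its iris toy dataset are assumed here). The max_depth and min_samples_leaf limits are the regularisation settings described in section 4:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Depth and leaf-size limits act as regularisation against overfitting
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, min_samples_leaf=5)
tree.fit(X_train, y_train)
print("test accuracy:", tree.score(X_test, y_test))
```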

Above, blue marks the iterativity of the process and green marks the evaluation of the process.

Below is more about iteration and evaluation, I'm too lazy to highlight colours.

Is splitting into leaves an iterative process or not?

Yes, partitioning into leaves in decision trees is an iterative process. Let's take a closer look at how it happens:

### Partitioning Process

1. **Selection of a node for partitioning**:
   - The process starts at the root of the tree, where all the data resides.
   - At each node, a feature and a value are selected to partition on. This is done using criteria such as entropy reduction or the Gini criterion.

2. **Data partitioning**:
   - The data is divided into two or more groups depending on the selected feature and its value.
   - Each group becomes a new node (or subtree) for further partitioning.

3. **Recursive partitioning**:
   - The process is repeated for each new node. At each step, a new feature is selected and the data is partitioned again.
   - This continues until a stopping condition is reached, e.g.:
     - the maximum tree depth;
     - the minimum number of samples per node;
     - no meaningful improvement in the quality of the partitioning.

### Iterative nature

- **Iteration**: Each partitioning step is an iteration that results in the creation of new nodes and subsets of data.
- **Recursion**: The partitioning process can be thought of as recursive, where each node is treated separately and the same process is repeated for each node.

### Completion of the process

- When one of the stopping conditions is reached, the node becomes a leaf node and a final value is defined for it (e.g., class for classification or mean for regression).

Thus, partitioning into leaves in decision trees is indeed an iterative and recursive process that allows the partition to be refined step by step.
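
To make the recursive, one-pass nature concrete, here is a simplified, purely illustrative Python sketch (the function names and structure are assumptions, not anyone's actual code):

```python
import numpy as np

def gini(y):
    """Gini impurity of a set of class labels."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    """Greedy search: feature/threshold pair with the lowest weighted Gini impurity."""
    best = (None, None, np.inf)
    for feature in range(X.shape[1]):
        for threshold in np.unique(X[:, feature]):
            left = X[:, feature] <= threshold
            if left.all() or not left.any():
                continue  # a valid split must put samples on both sides
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if score < best[2]:
                best = (feature, threshold, score)
    return best[:2]

def build_node(X, y, depth=0, max_depth=5, min_samples=2):
    """Recursively split the data until a stopping condition turns the node into a leaf."""
    # Stopping conditions: depth limit, too few samples, or a pure node
    if depth >= max_depth or len(y) < min_samples or len(np.unique(y)) == 1:
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": True, "value": values[np.argmax(counts)]}
    feature, threshold = best_split(X, y)
    if feature is None:  # nothing left to split on
        values, counts = np.unique(y, return_counts=True)
        return {"leaf": True, "value": values[np.argmax(counts)]}
    left = X[:, feature] <= threshold
    return {
        "leaf": False, "feature": feature, "threshold": threshold,
        "left": build_node(X[left], y[left], depth + 1, max_depth, min_samples),
        "right": build_node(X[~left], y[~left], depth + 1, max_depth, min_samples),
    }
```

Each recursive call handles one node, so a single pass spreads the whole dataset over the leaves.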

What is the evaluation criterion during tree construction and what optimisation algorithms are used in this process?

When building decision trees, several evaluation criteria are used to help determine how best to partition the data at each node. Basic criteria include:

### Evaluation Criteria

1. **Entropy**:
   - Used in the ID3 algorithm.
   - Measures uncertainty or randomness in the data. The lower the entropy, the more homogeneous the data.
   - Formula:
     \[
     H(S) = -\sum_{i=1}^{C} p_i \log_2(p_i)
     \]
     where \( p_i \) is the probability of class \( i \) and \( C \) is the number of classes.

2. **Gini Criterion**:
   - Used in CART (Classification and Regression Trees).
   - Measures the probability of misclassifying a randomly selected item.
   - Formula:
     \[
     Gini(S) = 1 - \sum_{i=1}^{C} p_i^2
     \]

3. **Cross-entropy (logarithmic loss)**:
   - Used in some variations of decision trees, especially for classification tasks.
   - Estimates the difference between predicted probabilities and actual class labels.

4. **Mean Squared Error (MSE)**:
   - Used for regression problems.
   - Estimates the mean of the squared errors between predicted and actual values.
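
For the regression case in point 4, a small illustrative sketch (my addition, made-up numbers) of scoring a candidate split by the weighted MSE of the two child nodes, where lower is better:

```python
import numpy as np

def mse(y):
    """MSE of a node when it predicts its own mean value."""
    return float(np.mean((y - y.mean()) ** 2)) if len(y) else 0.0

def split_score(y_left, y_right):
    """Weighted MSE of a candidate split; the greedy algorithm picks the lowest."""
    n = len(y_left) + len(y_right)
    return (len(y_left) * mse(y_left) + len(y_right) * mse(y_right)) / n

y_left = np.array([1.0, 1.2, 0.9])
y_right = np.array([3.0, 3.1, 2.8, 3.3])
print(split_score(y_left, y_right))
```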

### Optimisation Algorithms

Various optimisation algorithms and methods are also used in the construction of decision trees:

1. **The greedy algorithm**:
   - At each node, the best feature and value for partitioning are selected based on the chosen evaluation criterion.
   - This process continues until the stopping conditions are reached.

2. **Building trees on random subsets**:
   - The Random Forest algorithm uses "bagging", where multiple trees are built on random subsets of the data and features.
   - This helps to reduce overfitting and increase the generalisability of the model.

3. **Tree pruning**:
   - After the tree is constructed, a pruning procedure can be performed to reduce the complexity of the tree and prevent overfitting.
   - Pruning can be done based on cross-validation or using a criterion such as the error on a validation set.

4. **Hyperparameter Optimisation**:
   - Methods such as Grid Search or Random Search are used to find optimal values of hyperparameters such as tree depth, minimum number of samples per node, etc.

These criteria and optimisation methods help to build efficient and robust models based on decision trees.
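
As a concrete example of the hyperparameter optimisation in point 4, a minimal sketch (my addition; scikit-learn and its iris dataset assumed) of Grid Search with cross-validation over a few tree hyperparameters:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_grid = {
    "max_depth": [2, 3, 5, None],
    "min_samples_leaf": [1, 5, 10],
    "criterion": ["gini", "entropy"],
}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```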

Everything that was denied (evaluation, iterativity, the presence of an optimisation algorithm) is present in tree construction.


P.S. Articles on the topic #trees.

 

ChatGPT will leave the world without people trying to figure things out on their own.
Figure out the code from the article. And such questions will not arise. You are a programmer and can do it. Or is that it? Has GPT chat taken the place of a source of true knowledge?

You have seen iteration in recursion. No - it's not processing all the data over and over again (like a human iterating, or a neural network recalculating all the weights at each training epoch). But each time the data set is divided, e.g. in half, each half is divided in half, and so on until it is impossible to divide further (1 example left, or all examples are the same). This is spreading the data over the leaves in 1 pass. It works very fast.

This is selecting the best split, but not evaluating the data itself. That's what I wrote about earlier. You would need the data itself to be evaluated in order to call it memorisation.

 
Forester #:

...

There is no time to type a large post manually, gpt is quite good for that.

Please look carefully, at least at the code you are offering me to look at. Figure out where the iterations are in it, where the evaluation is, and where the optimisation algorithm is. Your denial will lead nowhere.

Recursion is iteration.

 
1. You will not be able to normalise the dataset values so that they are similar in structure to the future inputs.
2. Even if you manage it, it will be white noise and no neural network will work correctly on it.
3. Only a network like GPT, with thousands of neurons and an architecture designed exclusively for trading, would be able to give more or less accurate forecasts and adapt itself to the white noise. And that means a separate server with huge capacity.
 
That's right.
 
Forester #:

...



here.
 
I continue creative experiments.

I have drawn a mental picture, I want to try to create price casts and use them in trading.

Each cast is like a certain huge pattern, which is conditionally projected into the future.

 

Since all neural networks can only recognise kittens and tumours and play Dota, they are unable to recognise a trading strategy, because this task is not for dumb networks.

As a result, "generalisation" turns into "averaging", where the result of the NN is various kinds of fitting with various perversions.

Turn a cat upside down and it is still the same cat.
Turn a price chart upside down and it is no longer a BUY, but a SELL.



In the end, I stand by my opinion: if you are going to fit, then fit deliberately.

1. Either a Q-table, where each historical pattern is assigned a buy or a sell based on statistics

2. Or we filter the input (or the output number of the NN) over its whole range: in some places a signal to open, in others we ignore it.



The second option is the easiest way to replace an MLP: instead of tens or hundreds of optimisable weight parameters, you optimise the working range of a single number.

That is what I did, turning optimisation into outright overfitting, which sometimes still results in something that works on the forward test. And even in this direction it is already possible to work, to dig, to keep searching.
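
A minimal sketch of that idea (my interpretation only; the data, names and grid are all made up): instead of optimising MLP weights, brute-force a working range [lo, hi] of a single input or output number and trade only when the value falls inside it:

```python
import numpy as np

def backtest(signal, returns, lo, hi):
    """Take the next bar's return only when the signal falls inside [lo, hi]."""
    mask = (signal >= lo) & (signal <= hi)
    return returns[mask].sum()

# Hypothetical data: 'signal' is one NN output or indicator value per bar,
# 'returns' is the following bar's price change. Both are stand-ins here.
rng = np.random.default_rng(0)
signal = rng.uniform(-1.0, 1.0, 1000)
returns = rng.normal(0.0, 1.0, 1000)

# Optimise just two numbers (the range boundaries) instead of hundreds of weights
candidates = [(lo, hi, backtest(signal, returns, lo, hi))
              for lo in np.arange(-1.0, 1.0, 0.1)
              for hi in np.arange(-1.0, 1.0, 0.1) if hi > lo]
best_lo, best_hi, best_score = max(candidates, key=lambda t: t[2])
print("best working range:", round(best_lo, 1), round(best_hi, 1), "score:", round(best_score, 2))
```

On purely random data this search will still find a "profitable" range by chance, which is exactly the overfitting described above; the forward test is what separates the accidental ranges from the workable ones.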




 

Recent observations:


There are two types of input data:

1) Time sequence - homogeneous input data in chronological order: prices, indicator readings, patterns.

2) Heterogeneous - single but most recent readings of different instruments: indicators, patterns.


The first variant is the worst. The deeper we delve into history, the worse the results. It seems to be a paradox, if we compare it with successful traders, who run their eyes deep into history.

The second variant is the best.


The first variant cannot be trained in any way. It would seem that more chart - more information - better results.
But in practice everything is exactly the opposite.

Further - my hypothetical justification of this phenomenon:

Price has one objective, technical regularity: volatility. It allows us to say with confidence that in 10 cases out of 10 the price will not fall to zero, and in 10 cases out of 10 the EUR price will not travel 5000 pips within a single 5-minute bar. Force majeure will spoil the absolute picture only a little bit.
But there is still an element of chance: within this very volatility there are ranges of free price wandering, and during the bar the price stays within this average range.

And here we can say with some certainty: the price on the next candle will stay within a certain range, slightly above the current high and slightly below the current low, because price tends to move directionally.

So, what if we move 1 bar back? Which assumption would be correct then? Yes, indeed: the price after 1 bar will already have something like 4 times the possible candlestick range.

What if we move 10 bars back? What will the price be in 10 bars? The range of possible values increases many times over. It is impossible to guess.
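
A quick way to see how the range of possible outcomes widens with the horizon (a purely illustrative sketch, assuming a simple random-walk model with made-up per-bar volatility):

```python
import numpy as np

# Assume a simple random-walk model of price: each bar adds an independent move
rng = np.random.default_rng(42)
n_paths, per_bar_vol = 100_000, 10.0      # hypothetical per-bar volatility, in points
moves = rng.normal(0.0, per_bar_vol, size=(n_paths, 10))
paths = moves.cumsum(axis=1)              # cumulative price change after 1..10 bars

for horizon in (1, 2, 10):
    lo, hi = np.percentile(paths[:, horizon - 1], [2.5, 97.5])
    print(f"95% range of outcomes after {horizon} bar(s): ~{hi - lo:.0f} points")
```

Under this assumption the band of possible prices widens steadily with the horizon, so a forecast 10 bars ahead has to cover a far wider range than a forecast for the next bar.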


And this phenomenon, I think, contributes to the frankly shitty results of the NN: the forecast's dependence on the past is superimposed on the overall performance and worsens it.
This is confirmed by a rule of practice: the more inputs, the worse.
You can also check it: feed a couple of fresh inputs separately, and 10 older ones also separately. The results of the first will be much more stable.
The most recent inputs are "dwarfed" by the outdated ones in the common pot of inputs, where they most often show overfitting and absolute randomness on the forward test.

You can parry: nobody feeds only the past ones; together they make up a whole pattern in the common pot, and the incoming array should be considered as an independent pattern.

But statistics has shown that any single pattern consisting of a chronological sequence tends to work out 50/50. That is, after it the price continues to curve as it likes.



But the second variant is a beauty. It not only has the main property of a predictor, freshness of data, but can also potentially realise the functionality of chronology and patterns:

- the whole chronology of the chart can be compressed into one number and fed as a single input

For example: the ratio of the current price to the last N candlesticks.

Or feed the same chronological sequence, but with a mandatory relation to the most recent data: if it is price, then the increments of the most recent price relative to the rest.
And then the "dead" unviable chronology starts to come to life.
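
A minimal sketch of both variants (my reading of the idea; the prices and window size are made up):

```python
import numpy as np

closes = np.array([1.14210, 1.14230, 1.14225, 1.14248, 1.14241])  # hypothetical closes, oldest first
N = 4

# Variant 1: the whole recent chronology compressed into one number,
# e.g. the current price relative to the mean of the previous N closes
single_input = closes[-1] / closes[-N - 1:-1].mean() - 1.0

# Variant 2: the same chronological window, but every value is expressed
# relative to the most recent price (increments or ratios instead of absolutes)
delta_window = closes[-1] - closes[:-1]          # deltas to the latest close
ratio_window = closes[-1] / closes[:-1] - 1.0    # or relative increments

print(single_input, delta_window, ratio_window)
```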

 
Ivan Butko #:


...

How did you do it before? Did you feed absolute price values for training? Like 1.14241, 1.14248.

What you described is relative prices. You can take the difference (delta) between the current price and other bars, or the ratio, as you described.

I have always trained on deltas. The result is the same...