
Measuring Indicator Information
Introduction
Machine learning relies on training data to learn the general behaviour of the market, with the ultimate goal of making fairly accurate predictions. The chosen learning algorithm has to wade through a carefully selected sample to extract meaningful information. Many fail to apply these sophisticated tools successfully because most of the meaningful information is hidden in noisy data. It may not be apparent to many strategy developers that the data sets we use may not be appropriate for model training.
Indicators can be thought of as purveyors of information about the underlying price series they are applied to. On this premise, entropy can be used to measure how much information an indicator communicates. Using steps and tools documented in the book Testing and Tuning Market Trading Systems (TTMTS) by Timothy Masters, we will demonstrate how entropy can be used to evaluate the structure of indicator data.
Why measure indicator information
Often, when using machine learning tools for strategy development, we resort to simply throwing all sorts of data at the algorithms in the hope that something will come of it. Ultimately, success depends on the quality of the predictor(s) used in the model, and effective predictors usually share certain characteristics, one of which is significant information content.
The amount of information in the variables used for model training is important, but it is not the only requirement for effective training. Measuring information content can therefore be used to screen indicators that would otherwise be used blindly during the training process. This is where the concept of entropy is applied.
Entropy
Entropy has been written about a number of times already on MQL5.com. I apologize to the reader, who will have to endure yet another definition, but I promise this is essential to understanding the concept's application. Previous articles have covered the history and derivation of the entropy calculation, so for the sake of brevity we will begin with the equation.
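The Shannon entropy of a discrete variable X is defined as follows (the base-10 logarithm is used here, matching the figures in the examples below):

$$H(X) = -\sum_{x \in N} p(x)\,\log_{10} p(x)$$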
H(X) denotes the entropy of X, where X is a discrete variable representing an arbitrary source of information, say a message. The contents of the message can only assume a finite number of values, represented in the equation as small x. The small x are the observed values of the message, the set N enumerates all the possible values x can take, and p(x) is the probability of observing a particular value.
Consider the example of a fair die. The die, when rolled, can be perceived as providing information that determines the outcome of a game. The die has 6 unique sides numbered 1 to 6. The probability of observing any one of the numbers facing up is 1/6.
Using this example, big X would be the die and small x could be any of the numbers painted on its sides, all of which are placed in the set N = {1,2,3,4,5,6}. Applying the formula, the entropy of this die is 0.7781.
Now consider another die with a manufacturing defect: two of its sides carry the same number. For this defective die the set of possible values is N = {1,1,3,4,5,6}. Applying the formula again, we get an entropy value of 0.6778.
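As a quick check of both figures: for the fair die every face has probability 1/6, while for the defective die the duplicated face has probability 2/6 and each of the four remaining faces has probability 1/6:

$$H_{fair} = -\sum_{k=1}^{6}\tfrac{1}{6}\log_{10}\tfrac{1}{6} = \log_{10}6 \approx 0.7781$$

$$H_{defective} = -\tfrac{1}{3}\log_{10}\tfrac{1}{3} - 4\cdot\tfrac{1}{6}\log_{10}\tfrac{1}{6} \approx 0.1590 + 0.5188 = 0.6778$$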
Comparing the values, we notice that the information content has decreased. Analyzing both dice shows that the entropy equation produces its greatest possible value when the probabilities of observing each possible value are all equal. Entropy therefore reaches its maximum when all possible values are equally probable.
If we discard the defective die in favour of an indicator that produces real numbers as output, then X becomes the indicator and small x the range of values the indicator can assume. Before proceeding further we have a problem, because the entropy equation deals strictly with discrete variables. Transforming the equation to work with continuous variables is possible, but applying it would be difficult, so it is easier to stick to the realm of discrete numbers.
Calculating the entropy of an indicator
To apply the entropy equation to continuous variables we have to discretize the values of the indicator. This is done by dividing the range of values into intervals of equal size and counting the number of values that fall into each interval. Using this method, the original set that enumerated all the possible values of the indicator is replaced by subsets, each of which is one of the selected intervals.
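To make the procedure concrete, below is a minimal sketch of a binning-based entropy estimate. This is not the function from the attached Entropy.mqh; the name and details are hypothetical, and the base-10 logarithm matches the dice examples above.

```mql5
// Sketch only: estimate the entropy of a continuous series by binning.
double EstimateEntropy(const double &data[],const int n_intervals)
  {
   int n=ArraySize(data);
   if(n<1 || n_intervals<2)
      return 0.0;
// find the range of the sample
   double lo=data[ArrayMinimum(data)];
   double hi=data[ArrayMaximum(data)];
   if(hi<=lo)
      return 0.0; // a constant series carries no information
// count how many values fall into each equal-width interval
   int counts[];
   ArrayResize(counts,n_intervals);
   ArrayInitialize(counts,0);
   double width=(hi-lo)/n_intervals;
   for(int i=0; i<n; i++)
     {
      int bin=int((data[i]-lo)/width);
      if(bin>=n_intervals)
         bin=n_intervals-1; // clamp the maximum value into the last bin
      counts[bin]++;
     }
// accumulate -sum p*log10(p) over the non-empty intervals
   double H=0.0;
   for(int k=0; k<n_intervals; k++)
      if(counts[k]>0)
        {
         double p=double(counts[k])/double(n);
         H-=p*MathLog10(p);
        }
   return H; // dividing by MathLog10(n_intervals) would give relative entropy
  }
```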
When dealing with continuous variables, the variation in the probabilities of the possible values becomes significant, as it provides an important facet of the application of entropy to indicators.
Going back to the dice example: if we divide the final entropy value of each die by log(N), the fair die produces 1 whilst the defective die yields roughly 0.87. Dividing the entropy value by the log of the number of values the variable can assume produces a measure that is relative to the variable's theoretical maximum entropy. This is referred to as proportional or relative entropy.
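With N = 6 for both dice, the figures work out as:

$$\frac{0.7781}{\log_{10}6} = 1 \qquad\qquad \frac{0.6778}{\log_{10}6} \approx \frac{0.6778}{0.7782} \approx 0.87$$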
It is this value that is useful in our assessment of indicators, as it indicates how close the entropy of the indicator is to its theoretical maximum. The closer it is to one, the maximum, the better; anything at the other end of the scale hints at an indicator that would be a poor candidate for any kind of machine learning endeavour.
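Dividing the entropy by its theoretical maximum gives the final equation that we will apply:

$$H_{rel}(X) = \frac{H(X)}{\log_{10}N}$$

where N is the number of possible values, or, for a discretized indicator, the number of intervals.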
The equation shown above is implemented in the MQL5 script below, which is available for download as an attachment at the end of the article. Using the script we will be able to analyze most indicators.
A script for calculating the entropy of an indicator
The script is invoked with the following user-adjustable parameters:
- Timeframe - the timeframe on which indicator values are analyzed.
- IndicatorType - here the user can select one of the built-in indicators for analysis. To specify a custom indicator, select the custom indicator option and input the indicator's name in the next parameter.
- CustomIndicatorName - if the custom indicator option is selected for the prior parameter, the user has to input the correct name of the indicator here.
- UseDefaults - if set to true, the default inputs hard-coded into the indicator will be used.
- IndicatorParameterTypes - a comma-delimited string listing the data types of the indicator's inputs in the right order. For example, if the indicator to be analyzed accepts 4 inputs of type double, integer, integer and string respectively, the user inputs "double, integer, integer, string". The short form "d, i, i, s" is supported as well, where d = double, i = integer and s = string. Enum values are mapped to the integer type.
- IndicatorParameterValues - like the prior input, this is a comma-delimited list of values, continuing the previous example: "0.5,4,5,string_value". Any error in the formatting of either IndicatorParameterValues or IndicatorParameterTypes will result in the indicator's default values being used for any specific value that cannot be deciphered or is missing. Check the Experts tab for error messages. Note that there is no need to include the name of a custom indicator here; it must be specified by CustomIndicatorName.
- IndicatorBuffer - the user can stipulate which of the indicator's buffers to analyze.
- HistoryStart - the start date of the history sample.
- HistorySize - the number of bars to analyze, starting from HistoryStart.
- Intervals - the number of intervals created for the discretization process. The author of TTMTS specifies 20 intervals for a sample size of several thousand, with 2 stipulated as a hard minimum. I have added my own spin on an appropriate value by implementing the possibility of varying the number of intervals relative to the sample size, specifically 51 for every 1000 samples. This option is triggered by any input value less than 2: set Intervals to a number less than 2 and the number of intervals used will scale with the number of bars being analyzed. A sketch of this selection logic follows the list.
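A minimal sketch of that selection logic, under the assumption that the 51-per-1000 ratio scales linearly with sample size (the helper name is hypothetical, not part of the attached code):

```mql5
// Hypothetical helper: pick the interval count as described above.
int ResolveIntervals(const int user_intervals,const int sample_size)
  {
   if(user_intervals>=2)
      return user_intervals;                            // honour an explicit choice
   return (int)MathMax(2.0,51.0*sample_size/1000.0);    // 51 intervals per 1000 samples
  }
```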
```mql5
#include <Entropy.mqh> // provides relativeEntroy() and other utilities used below
//--- input parameters
input ENUM_TIMEFRAMES Timeframe=0;
input ENUM_INDICATOR  IndicatorType=IND_BEARS;
input string          CustomIndicatorName="";
input bool            UseDefaults=true;
input string          IndicatorParameterTypes="";
input string          IndicatorParameterValues="";
input int             IndicatorBuffer=0;
input datetime        HistoryStart=D'2023.02.01 04:00';
input int             HistorySize=50000;
input int             Intervals=0;

int      handle=INVALID_HANDLE;
double   buffer[];
MqlParam b_params[];
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
//--- build the indicator parameter list from the user inputs
   if(!processParameters(UseDefaults,b_params))
      return;
//--- retry creation a few times in case the indicator is slow to load
   int y=10;
   while(handle==INVALID_HANDLE && y>=0)
     {
      y--;
      handle=IndicatorCreate(Symbol(),Timeframe,IndicatorType,ArraySize(b_params),b_params);
     }
   if(handle==INVALID_HANDLE)
     {
      Print("Invalid indicator handle, error code: ",GetLastError());
      return;
     }
   ResetLastError();
//--- copy the requested history into the analysis buffer
   if(CopyBuffer(handle,IndicatorBuffer,HistoryStart,HistorySize,buffer)<0)
     {
      Print("error copying to buffer, returned error is ",GetLastError());
      IndicatorRelease(handle);
      return;
     }
//--- compute and print the relative entropy of the sample
   Print("Entropy of ",(IndicatorType==IND_CUSTOM)?CustomIndicatorName:EnumToString(IndicatorType),
         " is ",relativeEntroy(Intervals,buffer));
   IndicatorRelease(handle);
  }
//+------------------------------------------------------------------+
//| Build the MqlParam array describing the indicator's inputs       |
//+------------------------------------------------------------------+
bool processParameters(bool use_defaults,MqlParam &params[])
  {
   bool custom=(IndicatorType==IND_CUSTOM);
   string ind_v[],ind_t[];
   int types,values;
   if(use_defaults)
      types=values=0;
   else
     {
      types=StringSplit(IndicatorParameterTypes,StringGetCharacter(",",0),ind_t);
      values=StringSplit(IndicatorParameterValues,StringGetCharacter(",",0),ind_v);
     }
   int p_size=MathMin(types,values);
   int values_to_input=ArrayResize(params,(custom)?p_size+1:p_size);
//--- a custom indicator's name is always passed as the first parameter
   if(custom)
     {
      params[0].type=TYPE_STRING;
      params[0].string_value=CustomIndicatorName;
     }
   if(use_defaults)
      return true;
   int i,z;
   int max=(custom)?values_to_input-1:values_to_input;
   for(i=0,z=(custom)?i+1:i; i<max; i++,z++)
     {
      if(ind_t[i]=="" || ind_v[i]=="")
        {
         Print("Warning: Encountered empty string value, avoid adding comma at end of string parameters");
         break;
        }
      params[z].type=EnumType(ind_t[i]);
      switch(params[z].type)
        {
         case TYPE_INT:
            params[z].integer_value=StringToInteger(ind_v[i]);
            break;
         case TYPE_DOUBLE:
            params[z].double_value=StringToDouble(ind_v[i]);
            break;
         case TYPE_STRING:
            params[z].string_value=ind_v[i];
            break;
         default:
            Print("Error: Unknown specified parameter type");
            break;
        }
     }
   return true;
  }
//+------------------------------------------------------------------+
//| Map a type name ("double"/"integer"/"string" or d/i/s) to a type |
//+------------------------------------------------------------------+
ENUM_DATATYPE EnumType(string type)
  {
   StringToLower(type);
   const ushort firstletter=StringGetCharacter(type,0);
   switch(firstletter)
     {
      case 105: return TYPE_INT;    // 'i'
      case 100: return TYPE_DOUBLE; // 'd'
      case 115: return TYPE_STRING; // 's'
      default:
         Print("Error: could not parse string to match data type");
         return ENUM_DATATYPE(-1);
     }
   return ENUM_DATATYPE(-1);
  }
//+------------------------------------------------------------------+
```
A note on the value selected for Intervals: changing the number of intervals used in the calculation changes the final entropy value. When conducting analysis it is wise to keep this input consistent, so as to minimize its effect on comparisons. In the script, the relative entropy calculation is encapsulated in a function defined in the Entropy.mqh file.
The script simply prints the resulting entropy value in the Experts tab. Running it for various built-in and custom indicators yields the results shown below. It is interesting to note that Williams' Percent Range has a relative entropy close to perfection. Compare it with the Market Facilitation Index indicator, which shows a disappointing result.
With these results we can take further steps to process the data and make it amenable to machine learning algorithms. This involves rigorous analysis of the statistical properties of the indicator. Studying the distribution of indicator values will reveal any problems with skew and outliers, all of which can degrade model training.
As an example, we examine some statistical properties of the two indicators analyzed above.
The distribution of Williams' Percent Range reveals values spread across almost the entire range; apart from being multimodal, the distribution is fairly uniform. Such a distribution is ideal, and this is reflected in the entropy value.
This is in contrast to the Market Facilitation Index distribution, which has a long tail. Such an indicator would be problematic for most learning algorithms and requires transformation of its values. Transforming the values should lead to an improvement in the indicator's relative entropy.
Improving the information content of an indicator
It should be stated that alterations that boost indicator entropy should not be seen as a way to improve the accuracy of the signals the indicator provides. Boosting entropy will not turn a useless indicator into the holy grail. Improving entropy is about preparing the indicator data for effective use in predictive models.
This option should be considered when the entropy value is hopelessly bad: anything well below 0.5 and closer to zero. Upper-bound thresholds are purely arbitrary; it is up to the developer to pick an acceptable minimum. The whole point is to produce a distribution of indicator values that is as close to uniform as possible. The decision to apply a transformation should be based on analysis of a sizeable and representative sample of indicator values.
The transformation applied should not alter the underlying behaviour of the indicator. The transformed indicator should have a shape similar to the raw indicator; for example, the locations of troughs and peaks should match in both series. If this is not the case, we risk losing potentially useful information.
There are numerous transformation methods that target different aspects of data imperfections. We will only consider a few simple transformations that fix obvious defects revealed through basic statistical analysis. Preprocessing is a vast branch of machine learning, and anyone who hopes to master the application of machine learning methods is advised to study this field further.
To illustrate the effect of some transformations, we present a script that can apply various transformations and also displays the distribution of the data being analyzed. The script implements 6 transformation functions, described in the list below (a sketch of these functions follows the list):
- the square root transform is appropriate for squashing occasional indicator values that deviate considerably from the majority;
- the cube root transform is another squashing function, best applied to indicators with negative values;
- the log transform compresses values to a larger extent than the previously mentioned squashing transforms;
- the hyperbolic tangent and logistic transforms should be applied to data of a suitable scale to avoid producing invalid numbers (NaN errors);
- the extreme transform induces extreme uniformity in a data set. It should be applied only to indicators that produce mostly unique values with very few repeats.
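As a rough sketch, the five squashing functions above might look like the following in MQL5. These are plausible forms rather than the exact implementations in the attached Entropy.mqh (in particular, the signed handling of negative inputs is my own assumption); the rank-based extreme transform is omitted because its exact construction is not shown here.

```mql5
// Illustrative sketches of the squashing transforms, applied element-wise.
double SquareRootTransform(const double x)
  {
   return (x>=0.0) ? MathSqrt(x) : -MathSqrt(-x);        // signed, tolerates negatives
  }
double CubeRootTransform(const double x)
  {
   return (x>=0.0) ? MathPow(x,1.0/3.0) : -MathPow(-x,1.0/3.0);
  }
double LogTransform(const double x)
  {
   return (x>=0.0) ? MathLog(1.0+x) : -MathLog(1.0-x);   // log(1+|x|), sign preserved
  }
double TanhTransform(const double x)
  {
   return MathTanh(x);                                   // saturates for large |x|
  }
double LogisticTransform(const double x)
  {
   return 1.0/(1.0+MathExp(-x));                         // maps to the interval (0,1)
  }
```

Note that all five functions are monotonically increasing, so they compress the scale of the data without reordering the points.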
A script to compare transformed indicator values
Compared to the earlier script, it shares the same user inputs for specifying the indicator to be analyzed. The new inputs are described below:
- DisplayTime - the script displays a graphic of the distribution of the indicator. DisplayTime is an integer value in seconds: the amount of time the graphic will remain visible before being removed.
- ApplyTransform - a boolean value that sets the mode of the script. When false, the script draws the distribution and displays basic statistics of the sample along with the relative entropy. When true, it applies a transformation to the raw indicator values and displays the relative entropy before and after transformation. The distribution of the transformed sample will also be drawn, as a curve in red.
- Select_transform - an enumeration of the transforms described earlier, one of which can be applied to possibly boost the entropy of the indicator.
```mql5
//+------------------------------------------------------------------+
//|                                            IndicatorAnalysis.mq5 |
//|                        Copyright 2023, MetaQuotes Software Corp. |
//|                                             https://www.mql5.com |
//+------------------------------------------------------------------+
#property copyright "Copyright 2023, MetaQuotes Software Corp."
#property link      "https://www.mql5.com"
#property version   "1.00"
#property script_show_inputs
#include <Entropy.mqh>
//--- input parameters
input ENUM_TIMEFRAMES Timeframe=0;
input ENUM_INDICATOR  IndicatorType=IND_CUSTOM;
input string          CustomIndicatorName="";
input bool            UseDefaults=false;
input string          IndicatorParameterTypes="";
input string          IndicatorParameterValues="";
input int             IndicatorBuffer=0;
input datetime        HistoryStart=D'2023.02.01 04:00';
input int             HistorySize=50000;
input int             DisplayTime=30;                   // secs to keep graphic visible
input bool            ApplyTransform=true;
input ENUM_TRANSFORM  Select_transform=TRANSFORM_LOG;   // select function transform

int      handle=INVALID_HANDLE;
double   buffer[];
MqlParam b_params[];
//+------------------------------------------------------------------+
//| Script program start function                                    |
//+------------------------------------------------------------------+
void OnStart()
  {
//--- build the indicator parameter list from the user inputs
   if(!processParameters(UseDefaults,b_params))
      return;
//--- retry creation a few times in case the indicator is slow to load
   int y=10;
   while(handle==INVALID_HANDLE && y>=0)
     {
      y--;
      handle=IndicatorCreate(_Symbol,Timeframe,IndicatorType,ArraySize(b_params),b_params);
     }
   if(handle==INVALID_HANDLE)
     {
      Print("Invalid indicator handle, error code: ",GetLastError());
      return;
     }
   ResetLastError();
//--- copy the requested history into the analysis buffer
   if(CopyBuffer(handle,IndicatorBuffer,HistoryStart,HistorySize,buffer)<0)
     {
      Print("error copying to buffer, returned error is ",GetLastError());
      IndicatorRelease(handle);
      return;
     }
//--- draw the distribution, optionally with the transformed series
   DrawIndicatorDistribution(DisplayTime,ApplyTransform,Select_transform,
                             IndicatorType==IND_CUSTOM?CustomIndicatorName:EnumToString(IndicatorType),buffer);
   IndicatorRelease(handle);
  }
//+------------------------------------------------------------------+
//| Build the MqlParam array describing the indicator's inputs       |
//+------------------------------------------------------------------+
bool processParameters(bool use_defaults,MqlParam &params[])
  {
   bool custom=(IndicatorType==IND_CUSTOM);
   string ind_v[],ind_t[];
   int types,values;
   if(use_defaults)
      types=values=0;
   else
     {
      types=StringSplit(IndicatorParameterTypes,StringGetCharacter(",",0),ind_t);
      values=StringSplit(IndicatorParameterValues,StringGetCharacter(",",0),ind_v);
     }
   int p_size=MathMin(types,values);
   int values_to_input=ArrayResize(params,(custom)?p_size+1:p_size);
//--- a custom indicator's name is always passed as the first parameter
   if(custom)
     {
      params[0].type=TYPE_STRING;
      params[0].string_value=CustomIndicatorName;
     }
   if(use_defaults)
      return true;
   int i,z;
   int max=(custom)?values_to_input-1:values_to_input;
   for(i=0,z=(custom)?i+1:i; i<max; i++,z++)
     {
      if(ind_t[i]=="" || ind_v[i]=="")
        {
         Print("Warning: Encountered empty string value, avoid adding comma at end of string parameters");
         break;
        }
      params[z].type=EnumType(ind_t[i]);
      switch(params[z].type)
        {
         case TYPE_INT:
            params[z].integer_value=StringToInteger(ind_v[i]);
            break;
         case TYPE_DOUBLE:
            params[z].double_value=StringToDouble(ind_v[i]);
            break;
         case TYPE_STRING:
            params[z].string_value=ind_v[i];
            break;
         default:
            Print("Error: Unknown specified parameter type");
            break;
        }
     }
   return true;
  }
//+------------------------------------------------------------------+
//| Map a type name ("double"/"integer"/"string" or d/i/s) to a type |
//+------------------------------------------------------------------+
ENUM_DATATYPE EnumType(string type)
  {
   StringToLower(type);
   const ushort firstletter=StringGetCharacter(type,0);
   switch(firstletter)
     {
      case 105: return TYPE_INT;    // 'i'
      case 100: return TYPE_DOUBLE; // 'd'
      case 115: return TYPE_STRING; // 's'
      default:
         Print("Error: could not parse string to match data type");
         return ENUM_DATATYPE(-1);
     }
   return ENUM_DATATYPE(-1);
  }
//+------------------------------------------------------------------+
```
Continuing with the examples, we compare the application of the square root and cube root transforms.
Both provide an improvement in entropy, but the right tail could be problematic; the two transforms applied so far have not been able to deal with it effectively.
The log transform produces an even better entropy value, but the tails are still significant. As a last resort we can apply the extreme transform.
Conclusion
We explored the concept of entropy as a way to evaluate whether indicator values need to be transformed before being used to train predictive models.
The concept was implemented in two scripts. EntropyIndicatorAnalysis prints the relative entropy of a sample in the Experts tab, while IndicatorAnalysis goes one step further by drawing the distribution of the raw and transformed indicator values and displaying the relative entropy before and after transformation.
Whilst the tools may be useful, it should be noted that they cannot be applied to all kinds of indicators. In general, arrow-based indicators that contain empty values are not appropriate for the scripts described here; in such cases other encoding techniques would be necessary.
The topic of data transformation is only a subset of the possible preprocessing steps that should be considered when building any kind of predictive model. Using such techniques helps extract the truly unique relationships that could provide the edge needed to beat the markets.
| File name | Description |
|---|---|
| Mql5/Include/Entropy.mqh | Include file containing the entropy calculation functions and the utility functions used by the attached scripts. |
| Mql5/Scripts/IndicatorAnalysis.mq5 | A script that displays a graphic showing the distribution of indicator values, along with their entropy. |
| Mql5/Scripts/EntropyIndicatorAnalysis.mq5 | A script that calculates the relative entropy of an indicator and prints it to the Experts tab. |





...
"Intervals - the number of intervals to be sampled. The author of TTMTS specifies 20 intervals for a sample of several thousand, 2 being an absolute minimum. I have added my own approach to an appropriate value by implementing the ability to vary the number of intervals in relation to the sample size, specifically 51 per 1000 samples. This option is available if the user enters a value less than 2. Thus, by setting Intervals to a number less than 2, the number of intervals used will vary according to the number of bars analysed."
Question: is this the same thing? If so, why does the number of intervals dividing the range of indicator values depend on the number of bars? What is the logic behind this? So far, I can only assume that this is useful for indicators that have an accumulative component.
If this is not the case, what does the number of intervals in the range correspond to?
The article lacks a table that would clearly show the benefit of transforming indicator values.
To estimate the entropy of a continuous variable, we divide its range of values into equal intervals and count the observations in each interval. The script lets you choose the number of intervals. If you enter a value less than 2 (or leave the default), the script triggers its own heuristic: 51 intervals per 1,000 observations, i.e. a number proportional to the sample size. If you enter 2 or more, that value is used as is. So these are not two competing methods: one describes the concept, the other explains how the code chooses the parameter when you don't set it yourself.
When you have too few intervals, you artificially glue values together and entropy is underestimated (bias). When you have too many intervals for a small sample, there are many empty bins or bins with a single observation, and the entropy estimate is very noisy (variance). Several automatic rules exist for histograms (Sturges, square root, Freedman-Diaconis, Scott, etc.). They all share the same idea: increase the resolution when you have more data, because you can then estimate finer probabilities without blowing up the variance.
For 1,000 bars, 51 intervals means about 20 points per bin if the distribution were uniform. This ratio (between 15 and 30 observations per class) is a classic compromise taken from the literature. It has nothing to do with whether or not an indicator is accumulative; the logic is purely statistical. The fineness of the grid is adapted to the amount of information available.
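For comparison, here is a rough sketch of a few of those rules next to the script's heuristic; the function names are mine, and the linear 51-per-1000 scaling is inferred from the article's description:

```mql5
// Common histogram bin-count rules versus the script's heuristic.
int BinsSturges(const int n) { return (int)MathCeil(MathLog(n)/MathLog(2.0))+1; } // 1 + log2(n)
int BinsSqrt(const int n)    { return (int)MathCeil(MathSqrt(n)); }               // square-root rule
int BinsScript(const int n)  { return (int)MathMax(2.0,51.0*n/1000.0); }          // 51 per 1000 samples
// For n = 1000: Sturges = 11, square root = 32, script = 51.
```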
The higher the number of vertical bars in the histogram:
- the more visible the fine structure of the indicator's distribution,
- the more detail (peaks and troughs) the entropy calculation can capture,
- but the more data is needed for these frequencies to be stable.
It's true that the article would have benefited from showing, for example, the entropy of the same indicator before and after the transformations. But the demonstration is easy to do yourself: just set ApplyTransform=true in the script and read the double output, before/after. This part of the code was deliberately left interactive so that everyone can test their own assets and horizons.
As I understand it, visualising the indicator values of a sample through a histogram has nothing to do with the author's data transformation methods. Am I correct?
I am more interested in the question of the training effect of these transformations. I can assume it for neural networks, but not for tree methods.
The histogram drawn by the script is simply a visual diagnostic tool. It shows how the values of an indicator are distributed in the sample, before or after transformation. The functions sqrt, log, tanh, etc. act on the data; the histogram simply displays the result. The two stages are therefore independent: first the series is transformed (or not), then its histogram is drawn to see whether the entropy has changed.
Transforming an indicator with a fairly monotonic function (log, sqrt) often does not change the score. On the other hand, non-monotonic transformations (tanh, which saturates) change the order of certain points. This means that certain transformations better prepare the ground for creating non-linear interactions.
Ok, so I've got the point now. I was originally thinking of something else.
How does this change the order of the points? Can you give an example of such a transformation? So far I have understood it to mean that there were points A, B, C with values in ascending order, and that after the transformation the ascending order became, alternatively, B, A, C.
The functions in the script (root, log, tanh, etc.) are all monotonically increasing, so they all preserve the order of the points; my previous sentence was ambiguous. The order only changes from ABC to BAC if a non-monotonic transformation is used. For example, unless I'm mistaken, the function f(x) = |x - 50| is not monotonic because it folds the axis around x = 50, so the order can become BAC. In his introduction the author of the article points us to the book Testing and Tuning Market Trading Systems (TTMTS) by Timothy Masters. As far as I'm concerned, I plan to get my hands on it because, like you, I still have a few questions; what's more, I'm not very familiar with learning models or neural networks. It is apparently fairly easy to find on online sales sites, and its content should be of great help to us, because the article, however interesting it may be, is undoubtedly incomplete, or at any rate rather condensed.
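To make the fold concrete, take three hypothetical points with values A = 45, B = 50 and C = 60, in ascending order. Applying f(x) = |x - 50| gives f(A) = 5, f(B) = 0 and f(C) = 10, so after the transformation the ascending order is B, A, C.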