Machine learning in trading: theory, models, practice and algo-trading - page 2876

 
Aleksey Nikolayev #:

Maybe ask for a simple example of using logistic regression in a classification task with a variable number of features?

If the number of features is always different, it may be difficult to apply traditional machine learning algorithms that expect a fixed-size feature vector as input. One solution to this problem is to use a method called "feature hashing," which allows you to represent a variable-length feature set as a fixed-size feature vector.

Feature hashing works by applying a hash function to the features in the feature set and using the hash values as indices in a fixed-size feature vector. This allows you to map a variable-length feature set to a fixed-size vector, which can then be used as input to a machine learning model.

For example, suppose you have a dataset with a variable number of features, and you want to use logistic regression to classify the data. You can apply feature hashing to the features in the dataset, resulting in a fixed-size feature vector that can be used as input to the logistic regression model.

Overall, feature hashing is a useful technique for handling datasets with a variable number of features, and can be used in combination with machine learning algorithms such as logistic regression to solve classification problems.
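A minimal sketch of that idea, assuming scikit-learn is available (the feature names and values below are purely illustrative, not real market data):

from sklearn.feature_extraction import FeatureHasher
from sklearn.linear_model import LogisticRegression

# Each sample has a different number of named features.
samples = [
  {"rsi": 55.0, "macd": 0.2},
  {"rsi": 48.0, "macd": -0.1, "atr": 1.3},
  {"atr": 0.9},
  {"rsi": 61.0, "macd": 0.4, "atr": 1.1, "volume": 1000.0},
]
labels = [1, 0, 0, 1]

# Hash every variable-length feature dict into a fixed-size vector of length 16.
hasher = FeatureHasher(n_features=16, input_type="dict")
X = hasher.transform(samples)

# The hashed vectors can be fed directly to logistic regression.
clf = LogisticRegression().fit(X, labels)
print(clf.predict(hasher.transform([{"rsi": 50.0, "atr": 1.0}])))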

 
Aleksey Nikolayev #:

1) Is a model with a non-fixed number of features used for itself? (Apparently yes.)

2) Ask for a general form of a recursive function that maps a number to a numeric list of arbitrary length

1) About itself it is somewhat reluctant to answer, but there is a transformer inside, so yes.

2) A general form of a recursive function that maps a number to a numeric list of arbitrary length:

A recursive function is a function that calls itself as part of its execution. Here is a general example of a recursive function that maps a number to a numeric list of arbitrary length:

from typing import List

def map_to_list(n: int, lst: List[int]) -> List[int]:
  # Base case: nothing left to add, return the accumulated list.
  if n == 0:
    return lst
  # Recursive case: append n, then recurse with n-1.
  else:
    lst.append(n)
    return map_to_list(n-1, lst)

This function takes an integer n and a list lst as input. It then checks if n is equal to 0. If it is, the function returns the list lst. If n is not equal to 0, the function appends n to the end of the list lst and then calls itself with the modified lst and the value n-1. This process continues until n is equal to 0, at which point the function returns the final version of the list.

This is just one example of how a recursive function can be used to map a number to a list. There are many other ways to write such a function, and the specific implementation will depend on the specific requirements and constraints of the problem you are trying to solve.
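For illustration, calling the function above with a small n and an empty list:

print(map_to_list(3, []))  # prints [3, 2, 1]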


- show this code in R

Here is the recursive function from the previous example, written in R:

map_to_list <- function(n, lst) {
  if (n == 0) {
    return(lst)
  } else {
    lst <- c(lst, n)
    return(map_to_list(n-1, lst))
  }
}

This function is similar to the previous example, but it uses R syntax and functions. The if statement is used to check if n is equal to 0, and the return function is used to return the final version of the list. The c function is used to append n to the end of the list lst.

As before, this is just one example of how a recursive function can be used to map a number to a list. The specific implementation will depend on the specific requirements and constraints of the problem you are trying to solve.

 
Maxim Dmitrievsky #:

Feature hashing works by applying a hash function to the features in the feature set and using the hash values as indices in a fixed-size feature vector. This allows you to map a variable-length feature set to a fixed-size vector, which can then be used as input to a machine learning model.

Let him then tell you what variants of hash functions are used for numeric vectors.

 
Maxim Dmitrievsky #:

If the number of features is always different, it may be difficult to apply traditional machine learning algorithms that expect a fixed-size feature vector as input. One solution to this problem is to use a method called "feature hashing," which allows you to represent a variable-length feature set as a fixed-size feature vector.

Feature hashing works by applying a hash function to the features in the feature set and using the hash values as indices in a fixed-size feature vector. This allows you to map a variable-length feature set to a fixed-size vector, which can then be used as input to a machine learning model.

For example, suppose you have a dataset with a variable number of features, and you want to use logistic regression to classify the data. You can apply feature hashing to the features in the dataset, resulting in a fixed-size feature vector that can be used as input to the logistic regression model.

Overall, feature hashing is a useful technique for handling datasets with a variable number of features, and can be used in combination with machine learning algorithms such as logistic regression to solve classification problems.

upd


For data with a variable number of features, some possible representations you can use include:

  1. Fixed-size feature vectors: As mentioned before, you can use feature selection techniques to select a fixed number of features from your data and represent it as a feature vector.

  2. Sparse feature vectors: Instead of using a dense feature vector, you can use a sparse feature vector, which only includes a subset of the possible features. This can be useful if you have a very large number of possible features and only a small number of them are present in any given instance (see the sketch after this list).


  3. Graph data: You can represent your data as a graph, with each instance represented as a node in the graph and the edges between nodes representing the relationships between instances. This can be useful if you are using a machine learning model that is designed to handle graph data, such as a graph convolutional network.

Again, the best representation will depend on the specific requirements and constraints of your problem. It's important to consider the characteristics of your data and the goals of your project when deciding which representation is best for your needs.
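A minimal sketch of the sparse-vector option above, assuming SciPy is available; the feature names and the tiny vocabulary are made up for illustration. Each sample with a variable number of features becomes one row of a sparse matrix with a fixed number of columns:

from scipy.sparse import csr_matrix

vocab = {"rsi": 0, "macd": 1, "atr": 2, "volume": 3}   # fixed feature space
samples = [{"rsi": 55.0}, {"macd": -0.1, "atr": 1.3}]  # variable-length inputs

rows, cols, vals = [], [], []
for r, s in enumerate(samples):
  for name, value in s.items():
    rows.append(r)
    cols.append(vocab[name])
    vals.append(value)

# Each row is a fixed-size (len(vocab)) vector, stored sparsely.
X = csr_matrix((vals, (rows, cols)), shape=(len(samples), len(vocab)))
print(X.toarray())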

 
Aleksey Nikolayev #:

Let him then tell what variants of hash functions are used for numeric vectors.

Here is a list of some common hash functions that can be used to map integers or other data types to lists or other data structures:

  1. Modular hashing: Modular hashing is a simple hash function that maps an integer or other data type to a list or other data structure by taking the value modulo different powers of a prime number.

  2. Polynomial hashing: Polynomial hashing is a hash function that maps an integer or other data type to a list or other data structure by evaluating a polynomial at a specific point. The polynomial is defined by the value and a prime number.

  3. Universal hashing: Universal hashing is a hash function that maps an integer or other data type to a list or other data structure using a randomly generated hash function. The hash function is chosen from a family of hash functions that have a low probability of collision.

  4. Hash tables: Hash tables are data structures that use a hash function to map keys to values. They can be used to map an integer or other data type to a list or other data structure.

  5. Hash maps: Hash maps are data structures that use a hash function to map keys to values. They can be used to map an integer or other data type to a list or other data structure.

  6. Hash sets: Hash sets are data structures that use a hash function to store a set of values. They can be used to map an integer or other data type to a list or other data structure.

These are just a few examples of hash functions and data structures that can be used to map integers or other data types to lists or other data structures. The specific hash function or data structure that you choose will depend on the characteristics of your data and the requirements of your application.


Here are examples of the modular hashing, polynomial hashing, universal hashing, hash table, hash map, and hash set functions in R:

# Maps a number n to a length-"length" vector of modular hash values
# (n modulo increasing powers of p).
modular_hash <- function(n, length, p) {
  lst <- vector("numeric", length)
  for (i in 1:length) {
    lst[i] <- n %% p^(i+1)
  }
  lst
}

# Maps a number n to a vector of polynomial hash values modulo p^(length+1).
polynomial_hash <- function(n, length, p) {
  lst <- vector("numeric", length)
  for (i in 1:length) {
    lst[i] <- sum(n * p^(i-1)) %% p^(length+1)
  }
  lst
}

# Maps a number n to a vector of hash values using randomly drawn coefficients.
universal_hash <- function(n, length, p) {
  a <- sample(p^length, length, replace=TRUE)
  b <- sample(p^length, length, replace=TRUE)
  lst <- vector("numeric", length)
  for (i in 1:length) {
    lst[i] <- sum(a[i] * n + b[i]) %% p^(length+1)
  }
  lst
}

# Sketch of a simple hash table: store n in the slot selected by a modular index.
hash_table <- function(n, length, p) {
  lst <- vector("list", p^length)
  lst[[(n %% p^length) + 1]] <- n
  lst
}
 
Maxim Dmitrievsky #:

2) A general form of a recursive function that maps a number to a numeric list of arbitrary length

On the contrary - the input is a list, the output is a number. Like, we make a forecast based on quotes).

Then ask what ML algorithms can be used to build such functions.

 
Aleksey Nikolayev #:

On the contrary - input is a list, output is a number. Like, we use the quotes to make a prediction)

Then ask what ML algorithms can be used to build such functions.

If you want to hash a list of different length to an integer, you can use a hash function that maps the list to an integer. One way to do this is to use a modular hash function, which maps the list to an integer by summing the elements of the list multiplied by increasing powers of a prime number and taking the result modulo a fixed value M.

from typing import List

def modular_hash(lst: List[int], p: int, M: int) -> int:
  # Weighted sum of the elements by increasing powers of p, reduced modulo M.
  h = 0
  for i, x in enumerate(lst):
    h += x * p**i
  return h % M

This function takes a list lst, a prime number p and a modulus M as input, and returns an integer hash value by summing the elements of the list multiplied by increasing powers of p and taking the result modulo M.

For example, if we call modular_hash([1, 2, 3], 2, 10), the function will return 7 (1·1 + 2·2 + 3·4 = 17, and 17 mod 10 = 7).

You can use a similar approach in R as well. Here is the code for modular, polynomial and universal hash functions in R that map a list to an integer:

# Weighted sum of the list elements by increasing powers of p, reduced modulo M.
modular_hash <- function(lst, p, M) {
  h <- 0
  for (i in 1:length(lst)) {
    h <- h + lst[i] * p^(i-1)
  }
  h %% M
}

# Polynomial hash of the list (here identical in form to modular_hash above).
polynomial_hash <- function(lst, p, M) {
  h <- 0
  for (i in 1:length(lst)) {
    h <- h + lst[i] * p^(i-1)
  }
  h %% M
}

# Universal-style hash with randomly drawn coefficients a and b.
universal_hash <- function(lst, p, M) {
  a <- sample(p^length(lst), length(lst), replace=TRUE)
  b <- sample(p^length(lst), length(lst), replace=TRUE)
  h <- 0
  for (i in 1:length(lst)) {
    h <- h + a[i] * lst[i] + b[i]
  }
  h %% M
}


Hash functions are not typically generated using machine learning algorithms. Rather, they are typically designed based on the characteristics of the data being hashed and the requirements of the application.

That being said, machine learning algorithms can be used to evaluate the performance of different hash functions, or to optimize the parameters of a hash function for a specific task. For example, you might use a machine learning algorithm to optimise the prime number used in a modular hash function, or to choose the coefficients of a polynomial hash function that performs well on a particular dataset.

Overall, while machine learning algorithms are not typically used to build hash functions, they can be used to evaluate and optimize the performance of hash functions in certain applications.
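A small sketch of that last point (a plain parameter search rather than a full machine learning model, and with synthetic data): candidate values of p and M for the modular hash above can be scored by the collision rate they produce.

import numpy as np

def modular_hash(lst, p, M):
  return sum(x * p**i for i, x in enumerate(lst)) % M

# Synthetic variable-length integer lists, for illustration only.
rng = np.random.default_rng(1)
data = [list(rng.integers(0, 100, size=rng.integers(3, 10))) for _ in range(500)]

def collision_rate(p, M):
  codes = [modular_hash(lst, p, M) for lst in data]
  return 1 - len(set(codes)) / len(codes)

# Score each candidate (p, M) pair and keep the one with the fewest collisions.
candidates = [(p, M) for p in (2, 3, 5, 7, 11) for M in (97, 251, 1021)]
best = min(candidates, key=lambda pm: collision_rate(*pm))
print("best (p, M):", best)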

 
-

and yet, how can I build a similar hash function using a machine learning algorithm?


*

It is not common to build a hash function using a machine learning algorithm, as the primary goal of a hash function is to map data to a fixed-size representation in a way that preserves the structure of the data as much as possible. Machine learning algorithms, on the other hand, are typically used to learn patterns or make predictions based on data, rather than to generate fixed-size representations of data.

However, if you still want to explore the possibility of using a machine learning algorithm to build a hash function, one approach you could try is to use an unsupervised learning algorithm to learn a compact representation of the data, and then use that representation as the hash function.

For example, you might use an autoencoder or a clustering algorithm to learn a compact representation of the data, and then use that representation as the hash function. This would involve training the machine learning model on the data, and then using the trained model to generate the hash values for new data.

Overall, while it is not common to build a hash function using a machine learning algorithm, it is possible to use machine learning to learn a compact representation of the data that could be used as a hash function in certain applications.
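A minimal sketch of that idea, assuming scikit-learn and purely synthetic data: a clustering model is trained once, and its cluster index then serves as a hash-like code for new fixed-length numeric vectors.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # training vectors

# Learn a compact representation: 16 cluster centres.
model = KMeans(n_clusters=16, n_init=10, random_state=0).fit(X)

def learned_hash(v):
  # "Hash" a vector to one of 16 buckets via the trained clustering model.
  return int(model.predict(v.reshape(1, -1))[0])

print(learned_hash(rng.normal(size=8)))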


 
Maxim Dmitrievsky #:

Here is a list of some common hash functions that can be used to map integers or other data types to lists or other data structures:

I thought it promised to map a vector of arbitrary size to a vector of fixed specified size, but here it maps a number to a vector?

Everything is complicated and incomprehensible) Life was much simpler without GPT) We will have a hard time in the age of AI)

 
Aleksey Nikolayev #:

I think it promised to map a vector of arbitrary size into a vector of fixed specified size, but here it maps a number into a vector?

Everything is complicated and incomprehensible) Life was much easier without GPT) We will have a hard time in the age of AI)

I wrote more about mapping a vector to a number.

A vector into a fixed-size vector?
