
 

Deep Learning Trading Strategy from the beginning to the production using TensorFlow 2.0 and TFX

I am delighted to introduce myself as Denis, and I warmly welcome you to ClosetoAlgotrading, your go-to channel for all things related to algorithmic trading.

It has been quite some time since the idea first sparked in my mind: creating a trading strategy solely utilizing the power of deep learning. The concept revolves around developing a neural network capable of autonomously identifying the parameters required to execute profitable trades. To embark on this exciting journey, I have decided to conduct a live experiment, which I hope will captivate your interest as well. In the upcoming episodes, I will guide you through each step involved in implementing a data science approach to crafting a robust trading strategy, right from the initial stages to the final production.

My aim is to cover all the intermediate steps comprehensively, including data preparation, rigorous testing, and beyond. Looking ahead, I will be utilizing TensorFlow 2.0 and exploring the possibilities offered by the TFX pipeline. I intend to avoid complex calculations or rules, relying instead on the power of deep learning to ascertain whether we can create a truly profitable venture. If this idea resonates with you, I encourage you to subscribe to the channel and join me on this captivating expedition. Together, we will navigate the twists and turns of algorithmic trading and strive to unlock its hidden potential.

For those who are curious to delve deeper into the data science approach, a simple Google search will yield various resources elucidating the subject. Among the results, you may come across informative articles and visual representations that outline the steps involved in following this methodology.

The first crucial block in our journey is the business understanding phase. At this juncture, we must meticulously define our goals and establish Key Performance Indicators (KPIs) that serve as measurable values to assess the effectiveness of our strategy in achieving these objectives. To embark on this phase, it is crucial to dedicate time to thoroughly comprehend the intricacies of your specific business domain. By gaining a profound understanding of your target, you can proceed with clarity and focus. It is imperative to discern whether you aim to predict specific outcomes or classify particular phenomena.

To determine the nature of our task, we must seek answers to fundamental questions. If our inquiry revolves around "How Much" or "How Many," we are dealing with a regression task. On the other hand, if we inquire about "which category," we are venturing into the realm of classification, and so forth. Once we grasp the type of task we seek to accomplish, it is essential to define the metrics that will signify success. In our case, these metrics could include Return on Investment (ROI) and accuracy. After all, our ultimate objective is to earn profits from the market, necessitating a firm grasp of future price movements.

Effectively predicting future price movement entails not only determining the direction but also identifying the precise price level and the time at which it will be reached; merely knowing the direction is insufficient. To avoid interminable waiting periods for our target price, we can define a minimum expected price target and a maximum waiting period. In other words, we seek to ascertain the direction of future price movements, resembling a classification task where the price can rise, fall, or remain at the same level.

To initially measure the performance of our model in predicting the direction, we can employ classification accuracy. This metric quantifies the number of correct predictions divided by the total number of predictions, multiplied by 100 to express it as a percentage. Now, what about our target price level? For our trading strategy, we can define the profit level as a percentage of our invested capital. Additionally, we must determine our risk tolerance by establishing a stop-loss level as the maximum acceptable risk for a single trade. Our profit and stop levels serve as ROI values for each trade. For instance, let's assume we purchase shares at $100 and sell them when they reach $105. This 5% price movement would yield a 5% return on our investment. With our exit levels defined, encompassing both take profit and stop-loss, we must address the issue of time. We do not wish to wait indefinitely for the price to reach our desired levels.
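As a quick illustration, both metrics reduce to a few lines of Python (a minimal sketch; the function names are mine, not from the video):

```python
import numpy as np

def accuracy(y_true, y_pred):
    # share of correct direction predictions, expressed in percent
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(y_true == y_pred)

def trade_roi(entry_price, exit_price):
    # return on a single long trade, in percent of invested capital
    return 100.0 * (exit_price - entry_price) / entry_price

print(accuracy([1, 0, -1, 1], [1, 0, 1, 1]))  # 75.0
print(trade_roi(100.0, 105.0))                # 5.0, the $100 -> $105 example
```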

Hence, we establish a maximum holding period, albeit using traded volume instead of a fixed time frame. The rationale behind this choice will become clearer in the upcoming episode, where I will elaborate on data preparation.

To summarize our approach thus far: we are developing an intraday strategy that employs take profit, stop loss, and a maximum holding period for closing positions. To initiate trades, we will train a classification model capable of predicting the future price's direction: up, down, or flat. Initially, we will employ accuracy as the measure of our model's efficacy. With this foundation in place, we conclude our discussion for today. We have defined our goals, targets, and performance metrics. In the subsequent episode, we will delve into our data, preparing datasets and labels for further analysis and development.

Don't miss out on the next installment of our journey. Until then, take care, and I eagerly await our next encounter.

(Video published 2019.11.20 on www.youtube.com)
 

Deep Learning Trading Strategy from the beginning to the production. Part II.

I am delighted to welcome you to the second part of our captivating journey in creating a trading system. In the previous video, we discussed our goals, and today we will focus on preparing and labeling our dataset. So let's dive right in!

To begin, I have defined a set of functions that will aid us in our dataset preparation. First and foremost, we need to load our data. For today's demonstration, I will be using a small dataset to keep the runtime of the code manageable. As you can see, the dataset comprises tick data, including information such as Time, Price, Volume, Bid, and Ask. For the purpose of this demonstration, I will be utilizing data from a one-year period. Let's start by examining some statistics to gain an understanding of the dataset's characteristics. One of the key aspects we observe is the minimum and maximum price, which hover around $100. This is advantageous as it allows for easy calculation of profit percentages.

Additionally, I have introduced a crucial parameter to the dataset: the spread size. The spread is calculated as the difference between the Ask and Bid prices. Why is the spread size important? To illustrate, let's consider an example where the Bid price is $100 and the Ask price is $101. In this case, the spread equals $1. If we were to buy and immediately sell the stock, we would always lose the spread, which, in this example, amounts to $1. To gain insight into the spread size, I have calculated the mean spread over all days for every second. As depicted in the graph, the spread typically ranges between 1 and 2 cents, with occasional occurrences of slightly larger spreads. Based on this analysis, we can decide to execute trades only if the spread is less than 3 cents.
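A rough sketch of this spread analysis with pandas; the file name and the 'Time', 'Bid', and 'Ask' column names are assumptions based on the description above:

```python
import pandas as pd

ticks = pd.read_csv('ticks.csv', parse_dates=['Time'])  # hypothetical file
ticks['Spread'] = ticks['Ask'] - ticks['Bid']

# mean spread for every second of the trading day, averaged over all days
mean_spread_per_second = (
    ticks.groupby(ticks['Time'].dt.floor('s').dt.time)['Spread'].mean())

# trade only when the spread is below 3 cents
tradeable = ticks[ticks['Spread'] < 0.03]
```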

Interestingly, the graph shows that the largest spreads tend to occur in the first few minutes after the market opens. Consequently, it is prudent to skip the initial 10-15 minutes when implementing an intraday strategy. Now that we have defined the time period and examined the spread, we can proceed to generate labels for our model, which will predict the price movement direction. How do we generate these labels? As mentioned in the previous video, since we lack triggers for opening a position, we need to generate labels for each bar based on the expected return.

To accomplish this, we will employ the window method, where labels are generated when prices cross the window barriers. Here's how it works: we define a window of length n bars and set upper and lower window barriers based on our expected return in percentage. As we slide this window over all bars, with a step size of one bar, if the price goes out of the window, the first bar in the window will receive the label. Before proceeding with label generation, let's establish the parameters for the window. While the idea behind it is simple, selecting the optimal window size and barrier levels can be challenging. Personally, I have spent considerable time grappling with this issue and have yet to find a definitive solution.
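A rough sketch of the mechanism (my simplification, with the barrier expressed as a fractional return; the actual parameters are chosen below):

```python
import numpy as np

def window_labels(close, window=50, barrier=0.003):
    # close: array of bar closing prices; barrier: expected return as a fraction
    labels = np.zeros(len(close), dtype=int)   # 0 = flat
    for i in range(len(close) - window):
        upper = close[i] * (1 + barrier)
        lower = close[i] * (1 - barrier)
        for j in range(i + 1, i + window):
            if close[j] >= upper:
                labels[i] = 1                  # up: upper barrier crossed
                break
            if close[j] <= lower:
                labels[i] = -1                 # down: lower barrier crossed
                break
    return labels
```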

To tackle this challenge, I will calculate the historical volatility over the day and the entire dataset. For instance, the graph presented illustrates the price change on each tick within a day, along with the corresponding volatility. Furthermore, we can assess the volatility across the entire dataset. As shown, the mean volatility is a mere 0.003 (0.3%), which equates to approximately 30 cents at the roughly $100 price level. However, I don't intend to use a window that spans the entire day. To determine the window length, I attempted to generate 100 windows with random sizes and evaluated the mean volatility within each window. The resulting graph portrays the mean volatility for windows of different lengths. By selecting a window size of 50 bars, for example, we can anticipate a volatility of around 0.001 (0.1%).

This volatility value becomes useful in defining our minimum expected return and calculating the size of our stop-loss price. With this information in hand, we can proceed to generate volume bars from our tick data. Utilizing bars instead of ticks enables us to calculate the window length more easily, as one bar typically contains a similar volume, ensuring stable conditions. To generate a volume bar, we iterate through the ticks and accumulate the volume until it surpasses or equals a predefined target volume (e.g., 1000). The ticks encountered during this accumulation phase represent one volume bar. Let's generate volume bars for a single day as an example. As illustrated, we obtain 179 bars for the selected day.
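A sketch of this accumulation loop, assuming the tick DataFrame from earlier with 'Price' and 'Volume' columns:

```python
import pandas as pd

def volume_bars(ticks, target_volume=1000):
    # accumulate ticks until the summed volume reaches the target,
    # then emit one OHLCV bar from the accumulated ticks
    bars, prices, volume = [], [], 0
    for price, vol in zip(ticks['Price'], ticks['Volume']):
        prices.append(price)
        volume += vol
        if volume >= target_volume:
            bars.append({'open': prices[0], 'high': max(prices),
                         'low': min(prices), 'close': prices[-1],
                         'volume': volume})
            prices, volume = [], 0
    return pd.DataFrame(bars)
```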

Consequently, the price graph now consists of these volume bars. Furthermore, we can calculate the percentage change in each bar and the daily volatility using the closing price. Repeating the earlier windowing exercise on these bars instead of ticks, using the mean volatility and randomly generated windows, yields a graph of the window volatility over the entire dataset.

Now that we have completed these preparatory steps, we are ready to generate the labels. For this demonstration, I have chosen a window size of 50 bars and an expected return of 0.003 (0.3%), which corresponds to approximately 30 cents based on the mean price. After the labeling process concludes, we may find several similar labels, known as crossing labels. To avoid having identical labels for different events, we will only retain labels with the closest distance between the first bar of the window and the bar where the price crosses the window barrier. Upon inspection, we observe that we have around 700 labels, evenly distributed among the three categories (up, down, and flat).

Now, let's save our dataset. We will create two files: one containing the volume bar dataset and another file containing tick information for each bar. The latter may prove useful in our model, so it's worth preserving. With this, I will pause our discussion for today. I believe we have covered ample ground, and for those interested in delving deeper into data labeling, I recommend exploring Chapters 3 and 4 of Marcos Lopez de Prado's book "Advances in Financial Machine Learning", which provide valuable insights.

Our next step will involve feature engineering and running everything through the TFX pipeline. I hope to create a new episode soon to share more intriguing information.

Until then, take care, and I look forward to our next video.

(Video published 2019.12.11 on www.youtube.com)
 

Deep Learning Trading Strategy from the beginning to the production. Part III. TFX Pipeline.

I am thrilled to welcome you back to another episode of "Close to AlgoTrading" with me, Denis. We have resumed our experiment after the New Year holidays, and though progress has been slow, we are still moving forward. In today's video, we will take a closer look at our labeled data and explore the TFX pipeline. So, let's dive right in!

In the previous video, we successfully created labels for our data, but I forgot to show you how they appear on a graph. As a quick refresher, we stored all the data in a new data frame. Let's go ahead and read this data frame.

Within our data frame, the 'dir' column contains the labels, while the 'cross_idx' column represents the tick number when the price crosses our defined window. To visually represent the open and close position events based on these columns, I've created a simple function. On the graph, an open position event is denoted by a filled triangle, while a close position event is represented by an unfilled triangle.

As you can see, most of the open events occur at local maximum or minimum points in the price chart. Moving forward, we will continue working with a small dataset, similar to what we've used previously. Additionally, I will split the dataset into train, evaluation, and test datasets.

Now that we have a better understanding of our labels, let's move on to the next step and start working with the TFX pipeline. For those unfamiliar with TFX, it stands for TensorFlow Extended, and it is a powerful framework designed specifically for scalable and high-performance machine learning tasks. The TFX pipeline consists of a sequence of components that perform various stages of the machine learning workflow, such as data ingestion, modeling, training, serving inference, and deployment management.

To familiarize ourselves with TFX, I recommend exploring the official TensorFlow TFX webpage, which provides detailed information about its components and how they are interconnected. You can find the relevant links in the video description.

As I am also new to TFX, we will be learning together in each episode. Today, we will focus on the first four components of the pipeline. Let's briefly introduce them, with a short code sketch after the list:

  1. ExampleGen: This initial input component of the pipeline ingests the input dataset and optionally splits it into different subsets. In our case, it doesn't directly support custom time series splits, so I manually split the data into train, evaluation, and test datasets.

  2. StatisticsGen: This component calculates statistics for the dataset, providing insights into the data distribution, standard deviation, missing values, and more. It generates statistics artifacts for further analysis.

  3. SchemaGen: After examining the statistics, the SchemaGen component creates a data schema based on the observed data characteristics. The schema describes the structure and properties of our data.

  4. ExampleValidator: This component checks for anomalies and missing values in the dataset, using the statistics and schema as references. It helps identify any unexpected or inconsistent data patterns.
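To make this concrete, here is a minimal sketch of wiring these four components together under TFX's interactive context. Exact import paths and arguments vary between TFX releases (older versions route ExampleGen input through external_input), and the data folder is hypothetical:

```python
from tfx.components import (CsvExampleGen, StatisticsGen, SchemaGen,
                            ExampleValidator)
from tfx.orchestration.experimental.interactive.interactive_context import (
    InteractiveContext)

context = InteractiveContext()

example_gen = CsvExampleGen(input_base='data/train')  # hypothetical folder
context.run(example_gen)

statistics_gen = StatisticsGen(examples=example_gen.outputs['examples'])
context.run(statistics_gen)

schema_gen = SchemaGen(statistics=statistics_gen.outputs['statistics'])
context.run(schema_gen)

example_validator = ExampleValidator(
    statistics=statistics_gen.outputs['statistics'],
    schema=schema_gen.outputs['schema'])
context.run(example_validator)
```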

To ensure we are on the right track, I will use the Chicago taxi example provided by the TensorFlow team as a template. This example demonstrates the end-to-end workflow, including data analysis, validation, transformation, model training, and serving.

Now, let's switch our focus back to our own data. After importing the required modules and setting the necessary variables for input and output data folders, we can attempt to load our data into the TFX pipeline. The input folder contains subfolders for the evaluation, training, and test datasets.

Using the ExampleGen component, we should be able to easily load our data into the pipeline. However, it seems that ExampleGen does not directly support custom time series splits. By default, it splits the data into training and evaluation sets only. Fortunately, we can manually split the data and configure our own input split configuration, ensuring a one-to-one mapping between the input and output splits.
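A hedged sketch of such a one-to-one split configuration, built with the example_gen_pb2 proto (the subfolder patterns are assumptions):

```python
from tfx.components import CsvExampleGen
from tfx.proto import example_gen_pb2

# map pre-split subfolders one-to-one onto named output splits,
# instead of letting ExampleGen produce its default train/eval split
input_config = example_gen_pb2.Input(splits=[
    example_gen_pb2.Input.Split(name='train', pattern='train/*'),
    example_gen_pb2.Input.Split(name='eval', pattern='eval/*'),
])
example_gen = CsvExampleGen(input_base='data', input_config=input_config)
```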

As a result, the ExampleGen component produces two artifacts: one for the training dataset and another for the evaluation dataset. Let's examine the first three elements of our training set to verify that it matches our original dataset. Moving on, we pass the output from the ExampleGen component to the StatisticsGen component. This component generates statistics artifacts for both the training and evaluation datasets. With just a single command, we can visually represent the dataset statistics, including data distribution, standard deviation, missing values, and more.

Here, we can observe the statistics for the training dataset, gaining valuable insights into the data characteristics. We can also examine the same set of statistics for the evaluation set. Based on the statistics, we notice that only 2% of our labels are non-zero, suggesting that our entry events for profitable trades might be outliers. This could pose a challenge in the future due to the imbalance between classes.

Next, we generate the data schema automatically using the SchemaGen component. This schema is derived from the observed statistics, but we could also define our own data description if desired. The output is a schema that provides a comprehensive description of our data's structure and properties. Finally, we reach the ExampleValidator component, which validates the data based on the generated statistics and schema. It checks for any anomalies or inconsistencies in the dataset. For instance, in the Chicago taxi example, the feature '_company' had an unexpected string value. We can use the ExampleValidator to detect such issues in our own dataset.

In our case, fortunately, we don't encounter any anomalies or inconsistencies in our dataset. This is a positive sign, indicating that our data is relatively clean and aligned with our expectations. Well, that wraps up our quick introduction to TFX. In the next episode, we will delve deeper into the remaining TFX components and explore how to transform our data and train our model.

Thank you for watching, and I look forward to seeing you in the next video!

(Video published 2020.01.19 on www.youtube.com)
 

Part IV. Deep Learning Trading Strategy from the beginning to the production. TFX Pipeline 2.

Welcome to another episode of "Close to AlgoTrading." I'm Denis, and today we're going to continue exploring the TFX pipeline as part of our trading strategy construction.

In the previous video, we covered the first four components of the TFX pipeline: ExampleGen, StatisticsGen, SchemaGen, and ExampleValidator. These components laid the foundation for our pipeline, ensuring data validation and consistency.

Now, let's dive into the remaining components: Transform, Trainer, Evaluator, ModelValidator, and Pusher. These components will enable us to transform our data, train our model, evaluate its performance, validate the model against a baseline, and finally, push the validated model to a production environment.

But before we proceed, let me address a few important points. While the TFX pipeline offers a powerful framework, it's worth noting that it may still have some bugs and certain components might be under construction. However, let's approach each step carefully and discuss any limitations along the way. In the previous video, we successfully validated our data, and now it's time to move on to the transformation step. For this purpose, we will use the Transform component provided by TFX.

The Transform component is responsible for performing data transformations and feature engineering that are consistent for both training and serving. This means we can use the same data transformation function for both training the model and using it in production. This functionality is one of the main reasons I started exploring TFX in the first place.

To begin the transformation process, we need to create a couple of Python files. The first file will contain our constants, such as a list of numerical features and labels. The second file will contain a preprocessing function (preprocessing_fn), which is a callback function used by tf.Transform to preprocess the input data. In this function, we'll define the data transformation and feature construction steps. For now, let's focus on transforming all the numerical input features to z-scores and changing the label values from -1, 0, and 1 to 0, 1, and 2, respectively. This label transformation is necessary because the TensorFlow estimator expects non-negative class indices.
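A minimal sketch of such a preprocessing_fn; the constant names are illustrative stand-ins for the constants module described above:

```python
import tensorflow_transform as tft

# illustrative stand-ins for the constants module
_DENSE_FLOAT_FEATURES = ['open', 'high', 'low', 'close', 'volume']
_LABEL_KEY = 'dir'

def preprocessing_fn(inputs):
    """Callback used by tf.Transform; a sketch, not the video's exact code."""
    outputs = {}
    for key in _DENSE_FLOAT_FEATURES:
        # scale every numerical feature to zero mean and unit variance
        outputs[key + '_xf'] = tft.scale_to_z_score(inputs[key])
    # shift labels from {-1, 0, 1} to {0, 1, 2} for the estimator
    outputs[_LABEL_KEY] = inputs[_LABEL_KEY] + 1
    return outputs
```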

It's important to note that in the current version of TFX, only TensorFlow estimators are supported. If you prefer to use a Keras model, you would need to convert it to an estimator using the model_to_estimator API. Although TFX version 0.21 claims to support Keras models for the Trainer component, I've encountered some issues with its interactive context functionality. Hopefully, the TFX team will address these problems soon and provide a stable and fully functional tool.

Now, let's proceed with the data transformation. As you can see, the Transform component expects the following parameters as input: the generated examples, data schema, and the path to the file containing the preprocessing_fn function. After the transformation is complete, we can move on to creating and training our model. The Trainer component will be responsible for training the model based on the specifications we define.

In the Python file containing our model and input functions, we define a trainer_fn function that will be called by the Trainer component. This function should return a dictionary with the following items (a skeleton sketch follows the list):

  • estimator: The TensorFlow estimator used to train the model.
  • train_spec: The configuration for the training part of the TensorFlow train_and_evaluate() call.
  • eval_spec: The configuration for the evaluation part of the TensorFlow train_and_evaluate() call.
  • eval_input_receiver_fn: The configuration used by the ModelValidator component when validating the model.
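A skeleton of such a trainer_fn might look as follows; build_estimator, _input_fn, and _eval_input_receiver_fn are the helpers discussed below, and the exact argument object differs between TFX versions:

```python
import tensorflow as tf

def trainer_fn(hparams, schema):
    # returns the four items listed above; the helper functions are
    # the ones described below and are not defined in this sketch
    train_spec = tf.estimator.TrainSpec(
        input_fn=lambda: _input_fn(hparams.train_files, schema),
        max_steps=hparams.train_steps)
    eval_spec = tf.estimator.EvalSpec(
        input_fn=lambda: _input_fn(hparams.eval_files, schema),
        steps=hparams.eval_steps)
    return {
        'estimator': build_estimator(),
        'train_spec': train_spec,
        'eval_spec': eval_spec,
        'eval_input_receiver_fn': lambda: _eval_input_receiver_fn(schema),
    }
```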

Within this file, we define the _input_fn function, which generates the input features and labels for both training and evaluation. We also have additional functions, such as _example_serving_receiver_fn, which builds the serving inputs, and _eval_input_receiver_fn, which prepares the necessary inputs for TensorFlow Model Analysis (TFMA).

To create our estimator, we define a build_estimator function. In this function, we set up the input feature set and create our estimator. It's worth mentioning that I've used DNNLinearCombinedEstimator because DNNClassifier and DNNEstimator were raising errors about features being passed from TensorFlow 1, which shouldn't be the case since we are using TensorFlow 2 methods. Unfortunately, I haven't found a solution to this issue; the combined linear estimator, however, seems to work fine.

Now that our model is defined, we can proceed with training it using the Trainer component. As you can see, the Trainer component expects a module_file that contains the trainer_fn function, along with other parameters such as the transformed examples, data schema, transform graph, and training and evaluation arguments. For now, we have only specified the number of steps for training and evaluation.

After the model training is complete, we can move on to model analysis using TensorFlow Model Analysis (TFMA). Before we proceed, make sure you have installed TFMA by following the link provided. TFMA allows us to perform analysis on our model, either on the entire dataset or on specific feature slices. In this case, we'll perform analysis on three slices: the complete dataset, the label, and two specific features.

Analyzing the complete dataset, we observe a remarkable accuracy of 98 percent. However, when examining the model's performance based on the labels, we notice that it consistently predicts label 0. This outcome was expected due to the unbalanced labels and the absence of useful features in our model. Nevertheless, TFMA provides a convenient way to assess the model's performance.

Moving forward, we have the ModelValidator component, which helps us validate our exported models. It compares new models against a baseline (such as the currently serving model) and determines if they meet predefined criteria. This validation includes evaluating the models on an eval dataset and computing metrics such as AUC and loss. If the new model's metrics satisfy the developer-specified criteria relative to the baseline, the model is considered "good enough" and marked as such.

Finally, we have the Pusher component, which checks whether the model has passed validation. If the model meets the validation criteria, it is pushed to a specified file destination. This step ensures that only validated models are deployed to the production environment.

To conclude our pipeline, we can export all the components into an Apache Beam pipeline. We can then package all the necessary files into a zip archive. By unpacking the files from the archive into our working directory on the server, we can execute the pipeline. Once the execution is complete, we will have a trained model ready for use with TensorFlow Serving. If you're interested in learning how to start and use TensorFlow Serving from a Docker container, you can find a video tutorial on my channel.

Although we've covered a lot of ground in this series, it's important to note that the current version of TFX still has certain limitations. Depending on your specific requirements, it might be necessary to implement a pipeline that solely relies on the transformation function with a Keras model. However, I hope this exploration of TFX has been informative and beneficial.

In the next video, we'll return to our main goal of developing a trading strategy. Stay tuned for that! Don't forget to comment, subscribe, and like this video. Thank you for watching, and see you in the next one!

(Video published 2020.02.13 on www.youtube.com)
 

Part V. GIGO. Deep Learning Trading Strategy from the beginning to the production.

Welcome back to another episode of "Close to AlgoTrading." My name is Denis, and I'm here to share my experiences and failures in developing a deep learning trading system.

I have been dedicating a significant amount of time to implementing the entire system using TensorFlow, but I must admit that I am quite disappointed with the quality of the library. While I appreciate the concept, the documentation provided by TensorFlow has been a source of frustration. It seems that even brilliant minds with PhDs struggle to write well-structured documentation. Although it has improved compared to the past, it remains confusing at several points.

However, the documentation is just a minor issue compared to the bigger problem I encountered. Many functions simply do not work as expected when I tried to use them, and even the official examples sometimes fail to function properly. This has left me questioning whether I may have missed some core points of this framework. But let's get back to the main topic of this video: my failures. As you may recall, I decided to generate labels for price movement based on a specified percentage. The majority of these labels turned out to be local minimums or maximums in the data.

After performing some simple data processing and preparation, I created a simple LSTM network and put it to the test. My expectations were not exceptionally high, but the results were truly disappointing. As you can see in the accompanying visual, the network failed to learn anything meaningful.

Now, it's important to remember that a neural network is not meant to perform miracles. Perhaps it's time to reevaluate our approach. The events we are trying to predict represent only 2% of the entire dataset. These events can be considered outliers, and our simple network struggles to distinguish them from the rest of the data. Furthermore, there is a high probability that these events are unrelated and have distinct causes, making it difficult to build a model that accurately predicts them.

Given these challenges, it became evident that we need to focus on understanding the data better and identifying features with predictive power. With that in mind, I decided to revisit a simple moving average strategy. In our initial approach, we lacked a clear understanding of why we should enter a position. However, now we have a strategy for opening positions. Although it is not perfect, it is a widely used strategy that we can leverage. I added two moving averages and an RSI (Relative Strength Index) indicator to our dataset and collected all the events where the moving averages crossed each other. This allowed us to estimate the price movement directions.
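For illustration, collecting such crossover events with pandas might look like this (a sketch; the window lengths are illustrative, not the ones used in the video):

```python
import pandas as pd

def ma_cross_events(close, fast=10, slow=30):
    # flag bars where the fast SMA crosses the slow SMA
    fast_ma = close.rolling(fast).mean()
    slow_ma = close.rolling(slow).mean()
    above = fast_ma > slow_ma
    cross_up = above & ~above.shift(1, fill_value=False)    # long entries
    cross_down = ~above & above.shift(1, fill_value=False)  # short entries
    return cross_up, cross_down
```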

Next, I created meta-labels using the same labeling method as before, but this time, I only labeled the bars where the moving averages crossed each other. There is an important difference here: since we already know the position direction (long or short), the meaning of our labels will differ. A label of 1 indicates a profit, -1 represents a loss, and 255 is assigned for other cases. With the updated dataset and labels, I trained a simple LSTM model and obtained a perfect ROC (Receiver Operating Characteristic) curve. The ROC curve shows the performance of a classification model at different classification thresholds. The AUC (Area Under the Curve) value of the ROC curve helps us evaluate how well the model distinguishes between classes.

However, when examining the predictions of the model, I noticed that it consistently predicted class 0. This outcome occurred because, once again, I used the entire dataset instead of only the relevant events. Our dataset remained imbalanced, and the features lacked predictive ability. ROC curves are not ideal for imbalanced datasets, as they can mask the poor performance of the model in predicting other classes.

To address the imbalance in the data, I made adjustments and focused only on the data relevant to the open events. Unfortunately, even after these modifications, my model failed to exhibit predictive power. It produced the same results as a random model. I also attempted a simple feedforward model, which showed slightly better but still unsatisfactory results. In conclusion, as you can see, there is no magic in deep learning if you feed your model with poor-quality data. The principle of "Garbage in, garbage out" holds true in this context. Moreover, when dealing with highly imbalanced datasets and lacking features with strong predictive power, a classification model will tend to predict the class that constitutes the majority of the dataset.

As I mentioned earlier, it is crucial for us to understand our data and identify processes that can help us find a way to beat the market.

That wraps up today's episode. I hope you found it insightful and informative. Take care, stay home, and stay healthy.

(Video published 2020.03.21 on www.youtube.com)
 

Part VI. Reinforcement Learning. Deep Learning Trading Strategy from the beginning to the production

I'm Denis, and welcome to another episode of "Close to AlgoTrading." In our previous episodes, we have been working towards building a profitable trading strategy using deep learning. Let's recap our progress so far.

We started by observing market conditions in search of inefficiencies. Once identified, we created simple rules and prepared the data for our deep learning models. However, upon implementing our direct approach, we discovered that it did not yield the desired results. This led us to realize that we need to rethink our approach and consider alternative strategies.

In today's episode, we will explore a new direction: reinforcement learning. Reinforcement learning has gained significant success in various domains, surpassing human performance in games like chess, Dota, and Go. It is also used in robotic control and self-driving cars. So, why not apply it to trading? While I cannot guarantee its success, it is an intriguing avenue to explore.

To begin, let's delve into a brief introduction to reinforcement learning and familiarize ourselves with the key definitions.

Reinforcement learning can be described as learning through trial and error to solve problems of optimal control. In simpler terms, it involves finding the best action to take in a given environment state to maximize a final numeric reward. To better understand this concept, let's visualize it with a simple analogy.

Imagine a trader sitting in front of a monitor, buying and selling stocks. Here, the trader represents the agent, while the chart price and broker form the environment. The agent observes the current state of the environment and takes actions, such as buying a stock. In return, the agent receives a reward, which can even be negative, such as the broker's fees. As the environment changes, the agent can buy more stocks, sell, or do nothing. At the end of the trading day, the agent receives a final reward, which can be positive or negative. The goal of the agent is to maximize this final reward.

In reinforcement learning, our objective is to design agents that can learn by interacting with an environment. Now, let's familiarize ourselves with the main definitions that we will encounter throughout our journey.

  1. Agent: The algorithm that makes decisions, performs actions, observes the environment, receives feedback, and aims to maximize the reward.

  2. Environment: The world in which the agent resides. It encompasses all the data available for our agent to make decisions.

  3. State: The configuration of the environment that the agent senses.

  4. Reward: The feedback that the agent receives after taking an action. It is a value that the agent seeks to maximize. Importantly, in reinforcement learning, the reward does not necessarily mean money or a physical trophy. It is simply a numeric value that can also be negative, and our goal is to maximize this value.

  5. Action: Anything that the agent is capable of doing in the given environment. For simplicity, let's consider three actions: buy, sell, or do nothing.

  6. Episode: A complete run of the entire task.

These are the main definitions that we will be using throughout our journey. There are additional important terms that we will cover in future videos.

With this introduction to reinforcement learning, we will proceed to learn about the environment and create our own environment using TensorFlow in the next video.

Thank you for joining me today, and I look forward to exploring reinforcement learning further with you. See you soon!

 

Part VII. Reinforcement Learning. Trading Environment.

I'm Denis, and you're watching "Close to AlgoTrading." In our last episode, we provided a brief introduction to reinforcement learning. Today, we'll dive into creating a simple trading environment for our agent. Let's explore the details.

First, we need to consider what the trading environment represents. Is it solely based on price data, or should we incorporate additional factors like the complete order book, news, or even rumors from Twitter? This decision will shape the complexity of our environment.

Next, we need to determine the actions our agent can perform. Should it be limited to "buy" and "sell" actions, or can it also skip certain steps? Defining the action space is crucial for our agent's decision-making process.

Now, let's discuss the target and how we'll define rewards. In the real world, the trading environment has a continuous state space. However, for simplicity, let's keep the environment as straightforward as possible, enabling us to understand its inner workings and implementation.

To start, instead of using real price data, we'll generate a random walk process consisting of only 20 values. These simulated daily prices will serve as our input. The episode will begin from the middle of this price sequence, meaning the initial state will include the 10 historical prices. The remaining invisible prices will be filled with zeros. With each step, a new price will be added to the array. The episode will conclude when all 20 prices become visible to the agent. Therefore, we have a total of 10 steps until the episode's completion.

In addition to the price data, the state will include information about our open position. Let's represent the position using one-hot encoding: [0,0] for no open position, [1,0] for a long position, and [0,1] for a short position.

Considering typical trading actions, we'll include "buy" and "sell" for cases when no position is open. However, if there's already an open position, the agent can only choose to skip or close the position. Thus, in this scenario, the "sell" or "buy" action is equivalent to "skip."

Our agent's goal is to maximize profit and loss (PnL) over the 10-day period. Therefore, we'll define the reward as the daily PnL.

With a clear understanding of our environment's structure and behavior, we can now proceed to the implementation stage. As mentioned earlier, we'll utilize TensorFlow's tf-agents framework. The tf-agents team has provided a comprehensive guide on developing a suitable environment for tf-agents, which I recommend checking out for a deeper understanding of the code.

To create our environment, we'll start by inheriting the PyEnvironment class, as it defines the interface that all Python environments must implement. Within the __init__ method, we'll initialize variables, set initial states, and most importantly, specify the action and observation specifications. In our case, the action space will consist of four different actions, with minimum and maximum values set to 0 and 3, respectively. The observation specification will describe the environment state, with the price starting from 0 and no maximum limit. The data type for prices will be float. Returning the action and observation specifications is crucial for seamless integration with tf-agents.

The next significant function to implement is reset. This function will reset the environment to its initial state. Ensuring the correct implementation of this function is vital for the environment's proper functioning.

Now, let's discuss the most critical function: step. The step function receives an action as an input parameter and is responsible for managing the environment's state transitions. In this function, we'll handle all possible actions and calculate the PnL (reward). The function returns a time_step, which consists of the observation (the part of the environment state the agent can observe to choose its actions at the next step), the reward (the agent's learning objective), step_type (indicating whether this time step is the first, intermediate, or last in a sequence), and discount.

In the step function, we have two different return scenarios: one for intermediate steps and another for the last step in the episode.

After creating the environment, it's essential to validate it to identify and fix any bugs. The tf_agents.environments.utils package provides a function called validate_py_environment, which can be used for this purpose. Additionally, checking the environment specifications will help ensure everything is functioning as expected.
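Putting these pieces together, here is a heavily stripped-down sketch of such an environment. It models only the long side and hard-codes the 10-price history, so treat it as a starting point under those assumptions rather than the video's exact code:

```python
import numpy as np
from tf_agents.environments import py_environment, utils
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

class TradingEnv(py_environment.PyEnvironment):
    """Sketch: 20 prices, 10 visible at the start, long side only."""

    def __init__(self, prices):
        super().__init__()
        self._prices = np.asarray(prices, dtype=np.float32)
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=3, name='action')
        # 10 price slots + 2 position-indicator slots
        self._observation_spec = array_spec.BoundedArraySpec(
            shape=(12,), dtype=np.float32, minimum=0.0, name='observation')
        self._reset_state()

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset_state(self):
        self._idx = 10                  # start in the middle of the series
        self._position = [0.0, 0.0]     # [long, short]; no open position

    def _observe(self):
        window = self._prices[self._idx - 10:self._idx]
        return np.concatenate([window, self._position]).astype(np.float32)

    def _reset(self):
        self._reset_state()
        return ts.restart(self._observe())

    def _step(self, action):
        if self.current_time_step().is_last():
            return self.reset()
        # daily PnL of an open long position is the reward
        pnl = float(self._prices[self._idx] - self._prices[self._idx - 1])
        reward = pnl if self._position[0] == 1.0 else 0.0
        if action == 0:                 # 0 = buy; other actions sketched away
            self._position = [1.0, 0.0]
        self._idx += 1
        if self._idx >= len(self._prices):
            return ts.termination(self._observe(), reward)
        return ts.transition(self._observe(), reward)

utils.validate_py_environment(TradingEnv(100 + np.random.randn(20)), episodes=3)
```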

With our environment ready for use, it's a good idea to test it with different actions to debug and validate its behavior.

I conducted some tests using a simple DQN tf-agent, and here are the results. After 20,000 steps, the agent demonstrated acceptable performance. However, keep in mind that we only had one time series for the agent to learn, making it relatively straightforward. If we introduce a second time series and run 20,000 steps, the results may not be as promising. But with around 100,000 steps, the agent's performance improved significantly.

 

Part VIII. Reinforcement Learning Trading Strategy. DQN: QNetwork, QRNNNetwork

Hello, my name is Denis, and you are watching "Close to AlgoTrading." In this video, we will provide a short update on our previous environment for the reinforcement learning agent. We'll also give a brief description of Q-Learning and the DQN agent. We'll then proceed to implement the main learning steps for a DQN agent from the tf-agents library and discuss the test results.

To begin, let's revisit the trading environment we created in the previous video. In that environment, we used synthetic generated data. However, in this updated version, our environment will receive a dataframe with historical data as input parameters. The observation_spec for the environment is a dictionary with two keys: "Price" and "Pos." The "Price" key contains 20 elements with open, close, high, low, and volume data. The "Pos" key contains information about our open position.

At the start of each episode, we randomly select a slice of 20 prices from our data. This change allows our reinforcement learning agent to learn from real historical data.

Moving on, let's discuss Q-Learning and the concept of a Q-Table. Q-Learning involves assigning a Q-value to every pair of (state, action). This table, known as the Q-Table, is used by the agent to select actions with the maximum Q-value in the current state. The agent explores the environment, receives rewards, and updates the Q-values based on the observed rewards.

To update the Q-values, we use a formula that involves the old Q-value and the future Q-value. We calculate the future Q-value by looking up the maximum Q-value for the next state in our Q-Table. With the obtained future Q-value, we update the Q-value associated with the starting pair (state, action).
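In standard notation, with learning rate $\alpha$ and discount factor $\gamma$, this update rule is:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$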

However, in our case, the financial market has a very wide state space, making it impractical to use a Q-Table. To overcome this challenge, we can use a deep neural network to predict the Q-values for a given state. This approach, known as the DQN agent (Deep Q-Network agent), uses a Q-Network, where the Q-values are updated by minimizing the loss through backpropagation. The loss function used in the DQN agent is given by a specific equation.
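The equation referenced here is the standard DQN temporal-difference loss, where $\theta^{-}$ denotes the periodically frozen target-network parameters:

$$L(\theta) = \mathbb{E}\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]$$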

Now that we have a good understanding of Q-Learning and the DQN agent, let's proceed to implement the main learning steps for the DQN agent using the tf-agents library.

The overall training algorithm follows these steps:

  1. Create an environment.
  2. Create an agent.
  3. Collect data from the environment using some policy.
  4. Train the agent using the collected data.
  5. Validate the agent's performance.
  6. Repeat from step 3.

Creating the environment is a straightforward task. We wrap our trading environment into the TFPyEnvironment, which allows seamless integration with tf-agents.

Next, we create a Q-Network for the DQN agent. The tf-agents library provides a Q-Network class that we can use to define our Q-network. We define a simple Q-network with one hidden fully connected layer consisting of 40 neurons. As our observation is a dictionary, we also define a simple preprocessing layer for it.

With the Q-Network created, we proceed to create a DQN agent. We instantiate the DqnAgent class from tf_agents.agents.dqn.dqn_agent, passing our Q-network as a parameter. We also define the optimizer and the loss function for training the model.

To train the agent, we need data from the environment. We collect this data using a policy, and for this step, we can randomly select the actions. The DQN agent has two policies: agent.policy, which is used for evaluation and deployment, and agent.collect_policy, which is used for data collection.

Data collection involves taking the current state, selecting an action, receiving the next state and reward, and storing this information in a buffer. We collect multiple steps or episodes, forming trajectories. The tf-agents library provides a driver called DynamicEpisodeDriver, which collects steps until the end of an episode. The driver updates observers, including a replay buffer.

For storing the data, we can use the commonly used TFUniformReplayBuffer from the tf-agents library. We define the specifications of the data elements the buffer will store, the batch size, and the maximum length of each batch segment.

Once the data collection step is complete, we can train our agent. The agent requires access to the replay buffer. We create a tf.data.Dataset pipeline to feed data to the agent. Each row of the replay buffer stores a single trajectory, but the DQN agent needs both the current and next observations to compute the loss. Therefore, we set the num_steps parameter to 2, which allows the dataset pipeline to sample two rows for each item in the batch.
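Putting the steps above together, a hedged sketch of the whole loop with tf-agents; it assumes the flat-observation TradingEnv sketched in Part VII (the dict observation used in this episode would additionally need preprocessing layers on the QNetwork), and `prices` stands in for the historical data:

```python
import tensorflow as tf
from tf_agents.agents.dqn import dqn_agent
from tf_agents.drivers import dynamic_episode_driver
from tf_agents.environments import tf_py_environment
from tf_agents.networks import q_network
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.utils import common

# 1. wrap the Python environment (TradingEnv is the Part VII sketch)
train_env = tf_py_environment.TFPyEnvironment(TradingEnv(prices))

# 2. Q-network with one hidden fully connected layer of 40 neurons
q_net = q_network.QNetwork(train_env.observation_spec(),
                           train_env.action_spec(),
                           fc_layer_params=(40,))

agent = dqn_agent.DqnAgent(train_env.time_step_spec(),
                           train_env.action_spec(),
                           q_network=q_net,
                           optimizer=tf.keras.optimizers.Adam(1e-3),
                           td_errors_loss_fn=common.element_wise_squared_loss)
agent.initialize()

replay_buffer = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    data_spec=agent.collect_data_spec,
    batch_size=train_env.batch_size,
    max_length=10_000)

# collect full episodes with the exploration policy
driver = dynamic_episode_driver.DynamicEpisodeDriver(
    train_env, agent.collect_policy,
    observers=[replay_buffer.add_batch], num_episodes=1)

# num_steps=2: the DQN loss needs the current and the next observation
dataset = replay_buffer.as_dataset(sample_batch_size=64,
                                   num_steps=2).prefetch(3)
iterator = iter(dataset)

for _ in range(700):            # 700 training steps, as in the video
    driver.run()                # 3. collect data
    experience, _ = next(iterator)
    agent.train(experience)     # 4. train on sampled trajectories
```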

At this point, we have everything in place to train two DQN agents on the same data and evaluate their performance. One agent uses a simple Q-Network, while the other uses a QRNNNetwork. Both agents are trained using 200 days of historical price data.

After 700 training steps, we observe that the simple Q-Network agent did not learn much and mostly shows a negative average return. However, the QRNNNetwork agent mostly shows positive average returns. This result aligns with expectations, as the RNN agent can capture some dynamics in the data and learn faster.

While this simple experiment provides some hope for using reinforcement learning in building a profitable agent, there are still other metrics to consider to evaluate the agent's performance. We'll explore those in a future video.

Thank you for watching, and I'll see you in the next episode.

(Video published 2020.10.31 on www.youtube.com)
 

Part IX Reinforcement Learning Trading Strategy. DQN: Testing Trading Strategy based on Agent Policy

I'm thrilled to announce that we have reached a significant milestone in our journey: testing our strategy. For this purpose, we will utilize a straightforward backtesting Python framework to conduct our tests. To get started, you can install the framework by executing the command "pip install" followed by the framework's name.

The framework's author has provided a simple example that demonstrates how to use it effectively. However, before we dive into that, we need to implement our own strategy. Let's begin by initializing and defining all the variables we'll need for our strategy.

One interesting aspect is that we can load our saved policy using the TensorFlow API. Since we are using a QRNNNetwork, we need to obtain the initial state of the policy. Consequently, we have implemented the initialization function. Now, it's time to implement the "next" function, which will be called for every new step. At the start, we need to collect data for the first 10 days, and afterwards, we can feed our observation dictionary. With each step, we update the observation and pass it to the policy.

Once we have created the observation dictionary, we need to create a timestep object since our policy model requires it as input. To facilitate this process, I've created a simple function that converts our observation data to the timestep object. The crucial elements here are the observation data and step_type. After obtaining our timestep object, we can retrieve an action from the policy. As you can see, the "runPolicy" function resets the policy state if the step_type is equal to 0 and returns the action and the new policy_state.
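A hedged sketch of such a conversion helper and the runPolicy wrapper described here; the names, dtypes, and dict observation layout are assumptions:

```python
import tensorflow as tf
from tf_agents.trajectories import time_step as ts

def to_time_step(observation, step_type):
    # hypothetical helper: wrap an observation dict into the TimeStep
    # structure that a saved tf-agents policy expects (batch size 1)
    return ts.TimeStep(
        step_type=tf.constant([step_type], dtype=tf.int32),
        reward=tf.constant([0.0], dtype=tf.float32),
        discount=tf.constant([1.0], dtype=tf.float32),
        observation={k: tf.expand_dims(tf.convert_to_tensor(v, tf.float32), 0)
                     for k, v in observation.items()})

def run_policy(policy, observation, step_type, policy_state):
    # reset the recurrent policy state at the start of an episode
    if step_type == 0:
        policy_state = policy.get_initial_state(batch_size=1)
    action_step = policy.action(to_time_step(observation, step_type),
                                policy_state)
    return action_step.action, action_step.state
```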

Next, we need to update our position states and execute the action. Finally, at the end of the "next" function, we increment the counter and reset everything to the initial states to simulate the start of a new episode. Great! We have successfully implemented our strategy. Now, we need some data for testing purposes. We can use the pandas_datareader library to retrieve daily data from Yahoo Finance. Let's start by testing our strategy on Intel stock, using one year of historical data.

We create a backtest object and commence testing. The test results show a return of 106%, which is impressive. However, it's important to note that the backtesting framework starts calculations from 100%, meaning our actual return is only 6%. Nonetheless, this is not a bad outcome considering our policy wasn't extensively trained. To provide a more comprehensive assessment, let's also test our strategy on AMD stock. As you can see, the result for AMD shows a decline of approximately 40%. Thus, we can compare the performance of our strategy side by side on AMD and Intel stocks.

Now you know how to use the agent policy with a backtesting framework. Similarly, if you're using Python for your real trading environment, you can employ the policy in the same manner. However, for those using other languages, you can deploy the policy using the Flask framework and access it via a REST API.

I hope you found these videos interesting and informative. If you did, please consider subscribing, and I'll see you in the next episode.

(Video published 2020.12.13 on www.youtube.com)
 

What is a Quantitative Trading System? Structure and description.

Hello, everyone! I hope you're all doing well. It has been quite some time since I last released a video, but I want to assure you that I haven't forgotten about you. Today, I'm excited to start a new and interesting topic: the software architecture of an automated trading system.

Before we delve into the details of software architecture, let's first understand what a trading system is and what it consists of. In this video, we will explore the structure and elements of an automated trading system. Now, as the saying goes, "There is nothing new under the sun." When I started my journey in this field, I was on a quest to find a well-structured description of a trading system. I wanted something that would make it easy to understand which blocks to implement and how to create a robust software solution.

I came across a book by Rishi K. Narang called "Inside the Black Box," where he describes a quantitative system composed of five common blocks: the alpha model, risk model, transaction cost model, portfolio construction model, and execution model. Additionally, there is one more essential block: data.

Let's take a closer look at this structure, starting with the data block. Although the data block is not technically a part of the trading system, it plays a vital role as the oxygen that all components of the system rely on. The data block encompasses various types of data required by the trading system. This data can come from different sources such as exchanges, regulators, news agencies, and other relevant sources like macroeconomic data, broker's fees, or portfolio information.

Now that we understand the data block, let's explore the elements of the trading system and the relationships between them. In the diagram, you can see arrows representing the flow of information. The alpha model, risk model, and transaction cost model do not make final decisions; instead, they provide information to the portfolio construction model, which, in turn, triggers the execution model. It's important to note that there are strategies where only a subset of these elements is present, and the relationships between elements may vary. However, this structure gives us a holistic view of the main elements of a trading system.

The first element in this structure is the alpha model. The alpha model represents the trading idea or strategy designed to predict future outcomes. Typically, the output from this model is a return or direction forecast. There are two well-known types of trading models: technical models based on price data and technical analysis, and fundamental models that utilize financial data and fundamental analysis. We can also have hybrid models that combine aspects of both. Regardless of the complexity, the primary purpose of the alpha model is to provide advice in the form of forecasting.

Next, we have the risk model. The risk model is designed to help reduce or minimize exposure to factors that could lead to losses. Risk models can be categorized into two types. The first type focuses on position sizing to reduce risks, employing strategies such as hard sizing or complex functions. The output from this type of risk model is the position size. The second type of risk model aims to mitigate specific types of risks, such as market direction risk. In such cases, the model may suggest a hedge position as the output.

The third element is the transaction cost model. This model provides information about the cost associated with executing a trade. There are three main costs: commissions and fees, slippage, and market impact. Transaction cost models can range from simple models that return a flat cost value to more complex ones, like quadratic cost functions, which aim to predict the actual cost as accurately as possible. The sketch below illustrates how different cost functions can behave.
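As a toy illustration of the difference (the coefficients are made up for illustration, not calibrated to any market):

```python
def flat_cost(quantity, price, fee_per_share=0.005):
    # simplest model: a flat commission per share
    return fee_per_share * abs(quantity)

def quadratic_cost(quantity, price, a=1e-4, b=1e-8):
    # illustrative quadratic model: cost (slippage plus market impact)
    # grows with the square of the traded notional
    notional = abs(quantity) * price
    return a * notional + b * notional ** 2

# trading 10,000 shares at $100
print(flat_cost(10_000, 100.0))       # 50.0
print(quadratic_cost(10_000, 100.0))  # 100.0 + 10000.0 = 10100.0
```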

Once we have all the elements that provide information, we move on to the portfolio construction model. This model takes inputs from the alpha model, risk model, and transaction cost model, and decides how to allocate funds among different assets. It aims to construct a portfolio based on some objective function. There are two primary types of portfolio construction models: rule-based models (e.g., equal weights, equal risk, decision tree methods) and portfolio optimizers. The latter involves optimizing an objective function to achieve a more optimal asset allocation in the portfolio.

Finally, we have the execution model, which receives information from the portfolio construction model and focuses on executing orders at the best possible price. There are various types of execution models, ranging from simple market or limit orders to more complex ones that analyze market micro-structure and utilize machine learning algorithms.

That concludes the brief description of the main elements of a quantitative trading system. I hope this overview has provided you with a better understanding of the trading system structure and how it operates in general.

In the next videos, I will try to create a software system architecture based on this description. If you find this topic interesting, please consider subscribing to the channel for future updates. Thank you for watching, and I'll see you in the next video.

(Video published 2019.10.14 on www.youtube.com)