### 1. The Deep Learning Revolution

This short video highlights the revolutionary role of artificial intelligence (AI) in achieving superhuman abilities, discovering new materials and conserving scarce resources.

These technologies allow visually impaired persons to recognize faces and to read text, as well as help the blind read to their children. Self-driving vehicles give us the freedom to explore remote areas without street maps.

The video emphasizes the role of AI technology in strengthening people's ability to make better decisions and solve complex problems.

- 2016.06.09
- www.youtube.com

### 2. Visualization of information processing processes in deep learning neural networks

A series of short videos: it is better to watch them than to try to describe them.

### 3. [DeepLearning | video 1] What "Is" a Neural Network?

**But what is a neural network? | Chapter 1, Deep learning**

This video introduces viewers to neural networks and explains how they work. Neural networks contain several layers of neurons, each of which is connected to the neurons of the previous and next layers through weights and biases. The activation of a neuron is determined by the weighted sum of the activations of the previous layer neurons, which is then compressed by a sigmoid function.
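
The weighted-sum-then-sigmoid rule described above can be sketched in a few lines. This is only an illustration: the function names `sigmoid` and `layer_forward` and all the numbers are assumptions chosen for the example, not values from the video.

```python
import math

def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weights, biases, prev_activations):
    """Return the activations of one layer.

    weights[j][k] connects neuron k of the previous layer to neuron j
    of this layer; biases[j] is the bias of neuron j.
    """
    activations = []
    for w_row, b in zip(weights, biases):
        weighted_sum = sum(w * a for w, a in zip(w_row, prev_activations)) + b
        activations.append(sigmoid(weighted_sum))
    return activations

# Illustrative values only: 3 activations feeding a layer of 2 neurons.
prev = [0.0, 0.5, 1.0]
W = [[0.2, -0.4, 0.1],
     [0.0, 0.0, 0.0]]
b = [0.1, 0.0]
out = layer_forward(W, b, prev)
print(out)
```

The returned list plays the role of the vector that is fed into the next layer.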

- 00:00:00 An introduction to neural networks telling how they are inspired by the brain and how they can be used to recognize handwritten digits. The video also explains the structure of the neural network, including the input layer, hidden layer, and output layer.
- 00:05:00 This part explains why the layered structure of a neural network can be expected to behave intelligently. It is stated that each neuron in the network's middle layers corresponds to one of the several sub-components that make up the overall image. For example, a neuron can be activated when a looped image is input to the input layer. This allows the network to assemble various components that make up an image and to ultimately recognize the digit represented in the image.
- 00:10:00 The weights and biases of a neural network determine its behavior, while learning is the process of adjusting these values to achieve the desired behavior. Neural networks consist of layers of neurons, each of which is connected to the neurons of the previous and next layer through weights and biases. The activation of a neuron is determined by the weighted sum of the activations of the previous layer neurons, which is then compressed by a sigmoid function. This final vector is then fed into the next layer.
- 00:15:00 In this video, the author explains what a neural network is and how it works. He also introduces the sigmoid function and explains how it is used to compress the corresponding weighted sum between zero and one.

- 2017.10.05
- www.youtube.com

### 4. [DeepLearning | video 2] Gradient Descent, How Neural Networks Learn

**Gradient descent, how neural networks learn | Chapter 2, Deep learning**

This video explains how gradient descent helps neural networks learn more effectively.

- 00:00:00 Introducing the idea of gradient descent, which is at the heart of how neural networks and many other machine learning algorithms learn. The video then shows how the handwritten digit recognition network is parameterized by weights and biases and evaluated with a cost function. The performance of the network is measured on the training dataset, and as the network gets better at recognizing digits, the value of the cost function gets smaller and smaller.
- 00:05:00 Gradient descent is a powerful tool for training neural networks and it is important to remember that the cost function must have a smooth output in order to effectively minimize it.
- 00:10:00 Explaining the gradient descent algorithm and the operation of artificial neurons. Gradient descent is used to find a local minimum in the cost function while moving in small steps down a slope. This process is repeated until the network reaches a good solution. The video then shows an example of gradient descent in action with a network of neurons trained to recognize numbers. Although the network isn't perfect, what's impressive is that it can handle images that it hasn't seen before.
- 00:15:00 Gradient descent is a technique used to train neural networks. The video closes with two research results. The first found that deep neural networks trained on randomly labeled data can reach the same training accuracy as on properly labeled data, which suggests that the network simply memorizes the examples. The second suggests that if a neural network is trained on a dataset with the correct labels, then the local minima of its cost function tend to be of roughly the same quality.
- 00:20:00 Demonstrating how gradient descent works in neural networks and how it can help the network learn more effectively.

- 2017.10.16
- www.youtube.com
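
As a rough sketch of "moving in small steps down a slope", the loop below minimizes a one-variable toy cost. The cost function, the starting point, and the `learning_rate` value are assumptions made for illustration; the network in the video has thousands of weights and biases, but the update rule has the same shape.

```python
def cost(x):
    """A smooth toy cost with its minimum at x = 3."""
    return (x - 3.0) ** 2

def cost_gradient(x):
    """Derivative of the toy cost with respect to x."""
    return 2.0 * (x - 3.0)

def gradient_descent(start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, i.e. downhill."""
    x = start
    for _ in range(steps):
        x -= learning_rate * cost_gradient(x)
    return x

x_min = gradient_descent(start=-4.0)
print(x_min, cost(x_min))
```

Because the cost is smooth, each small step reliably lowers it, which is the reason given above for wanting a cost function with a smooth output.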

### 5. [DeepLearning | video 3] What Is Backpropagation really doing?

**What is backpropagation really doing? | Chapter 3, Deep learning**

The backpropagation algorithm is used in neural networks to help them learn. The algorithm computes the gradient of the cost function, which depends on the weights and biases of the network. The gradient is then used to adjust the weights and biases of the network.

- 00:00:00 Backpropagation is at the heart of neural network training. The algorithm computes the gradient of the cost function, which depends on the weights and biases of the network. The gradient is then used to adjust the weights and biases of the network.

- 00:05:00 Backpropagation is used in supervised learning: for a labeled training example, it works out how each weight and bias should change to raise the activation of the desired output neuron and lower the others. The desired changes to the activations of the previous layer are proportional to the sizes of the respective weights. These desired changes are then propagated backwards to the weights and biases of the neurons in the earlier layers.

- 00:10:00 Backpropagation computes the gradient used to adjust the weights and biases of the neural network. In practice it is combined with stochastic gradient descent, which randomly splits the training data into mini-batches and updates the weights and biases based on one mini-batch at a time. This is computationally faster than true gradient descent over the whole dataset and can still converge to a local minimum of the cost function.

- 2017.11.03
- www.youtube.com
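
The mini-batch idea from the 00:10:00 entry can be sketched on a much smaller problem than a neural network. The example below fits a straight line with stochastic gradient descent; the model, the data, and the hyperparameters (`learning_rate`, `epochs`, `batch_size`) are all assumptions chosen to keep the sketch short.

```python
import random

def sgd_linear_fit(xs, ys, learning_rate=0.05, epochs=300, batch_size=4, seed=0):
    """Fit y = w * x + b by mini-batch stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # random split into mini-batches
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            grad_w = grad_b = 0.0
            for i in batch:
                error = (w * xs[i] + b) - ys[i]
                grad_w += 2.0 * error * xs[i]  # d(error^2)/dw
                grad_b += 2.0 * error          # d(error^2)/db
            # Update from the average gradient of this mini-batch only.
            w -= learning_rate * grad_w / len(batch)
            b -= learning_rate * grad_b / len(batch)
    return w, b

# Noise-free toy data generated from y = 2x + 1.
xs = [i / 10.0 for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(w, b)
```

Each update uses only a few examples, so it is cheaper than a full pass over the data, at the price of a noisier path towards the minimum.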

### 6. [DeepLearning | video 4] Backpropagation Calculus

**Backpropagation calculus | Chapter 4, Deep learning**

This video explains the math behind the backpropagation algorithm for deep learning using a simple network with one neuron per layer defined by weights and biases. A chain rule is introduced to understand how weight changes affect cost, and the sensitivity of cost to small weight changes is found through derivatives of cost functions, activation functions, and weighted sum. Sensitivity is taken into account by iteratively calculating the sensitivity of the previous activation in the extension of the chain rule to find the sensitivity to previous weights and biases. The approach remains similar even with multiple neurons per layer, with each weight having its own index to keep track of its position in the layer.

- 00:00:00 In the first part, we dive into the necessary mathematical apparatus for backpropagation in deep learning. The video shows the example of a simple network with one neuron per layer defined by three weights and three biases. The goal is to understand how sensitive the cost function is to these variables and what adjustments will be most effective in reducing the cost function. A chain rule is introduced to understand how changes in weight variables affect the cost function. The sensitivity of the cost function to small changes in the weight is calculated using the derivatives of the cost function, the activation function, and the weighted sum.
- 00:05:00 The second part introduces the concept of sensitivity in relation to weights and biases in a neural network. The derivative of the cost function with respect to weights and biases is found through an extension of the chain rule, requiring sensitivity to be taken into account. Although sensitivity can be thought of as the number of neurons that fire together and communicate with each other, the derivative requires that the expression be averaged over all training examples. The sensitivity of the previous activation in the chain rule extension is calculated and used to iteratively calculate the sensitivity to previous weights and biases. The approach does not change much even if the layers in the neural network have multiple neurons; however, each weight must be indexed with additional indices to keep track of its position in the layer.

- 2017.11.03
- www.youtube.com
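
The chain-rule bookkeeping for a network with one neuron per layer can be checked numerically. The sketch below assumes a sigmoid activation and a squared-error cost for a single training example; the concrete weights, biases, input, and target are made-up values, and the function names are illustrative.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, biases, x):
    """Return activations [a0, a1, ..., aL] and weighted sums [z1, ..., zL]."""
    activations, sums = [x], []
    for w, b in zip(weights, biases):
        z = w * activations[-1] + b
        sums.append(z)
        activations.append(sigmoid(z))
    return activations, sums

def cost(weights, biases, x, y):
    activations, _ = forward(weights, biases, x)
    return (activations[-1] - y) ** 2

def gradients(weights, biases, x, y):
    """Apply the chain rule from the last layer back to the first."""
    activations, sums = forward(weights, biases, x)
    grad_w = [0.0] * len(weights)
    grad_b = [0.0] * len(biases)
    dC_da = 2.0 * (activations[-1] - y)  # sensitivity of cost to the last activation
    for layer in reversed(range(len(weights))):
        s = sigmoid(sums[layer])
        dC_dz = dC_da * s * (1.0 - s)               # da/dz for the sigmoid
        grad_w[layer] = dC_dz * activations[layer]  # dz/dw is the previous activation
        grad_b[layer] = dC_dz                       # dz/db is 1
        dC_da = dC_dz * weights[layer]              # sensitivity to the previous activation
    return grad_w, grad_b

# Three weights and three biases, one neuron per layer (made-up values).
weights = [0.5, -1.2, 0.8]
biases = [0.1, 0.3, -0.2]
x, y = 1.0, 1.0
grad_w, grad_b = gradients(weights, biases, x, y)

# Finite-difference check of the sensitivity to the first weight.
eps = 1e-6
bumped = list(weights)
bumped[0] += eps
numeric = (cost(bumped, biases, x, y) - cost(weights, biases, x, y)) / eps
print(grad_w[0], numeric)
```

The two printed numbers should agree closely, which is a common sanity check for a hand-written backpropagation routine. With several training examples the gradients would be averaged over all of them.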

### Artificial Intelligence Full Course | Artificial Intelligence Tutorial for Beginners | Edureka

Above, we presented you with the best materials for an introduction to artificial neural networks. This video by Edureka will give you a comprehensive and detailed knowledge of AI concepts with practical examples.

For your convenience, we provide a general timeline and then a detailed one for each part. You can go directly to the right moment, watch in a mode convenient for you and not miss anything.

- 00:00:00 - 01:00:00 Part 1 provides an introduction to artificial intelligence, discusses its history, different areas and concepts, and how deep learning is used to solve real-world problems. It also talks about different types of artificial intelligence and popular programming languages for developing AI.

- 01:00:00 - 02:00:00 Part 2 discusses the different types of artificial intelligence and how they can be used to solve different types of problems. It explains how linear regression can be used to predict the average maximum temperature for a given temperature range, and logistic regression can be used to predict the likelihood that an outcome will be one or zero. It also discusses the decision tree algorithm and how it can be used to build a decision tree. Finally, it explains how random forest can be used to create a more accurate and stable forecast.

- 02:00:00 - 03:00:00 In Part 3, Edureka tutor Michael Kennedy explains how the K-means clustering algorithm works and how it can be used to compress huge datasets into a small number of meaningful values. He also discusses that reinforcement learning is another kind of machine learning that helps agents learn to achieve their goals in an unknown environment.

- 03:00:00 - 04:00:00 Part 4 covers reinforcement learning, including exploration and exploitation, Markov decision processes, and a Q-learning demonstration in Python that uses a reward matrix and a Q matrix. It then discusses the limitations of machine learning on high-dimensional data and introduces deep learning, perceptrons, multilayer perceptrons, and backpropagation.

- 04:00:00 - 04:50:00 In Part 5, Edureka instructor Kirill Eremenko provides a comprehensive overview of artificial intelligence, covering the basics of programming, data, and machine learning. He explains how neural networks work and how they can be used to predict stock prices. He also describes the steps needed to train a neural network, including data preparation, partitioning, and scaling. Finally, he discusses model architecture parameters for an AI system, including the number of neurons in each hidden layer, the bias dimension, and the cost function.
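
Part 2 in the timeline above relies on linear regression, so a minimal least-squares sketch may help before diving into the details. The temperature pairs below are made up, and treating a minimum temperature as the predictor of the maximum temperature is an assumption about the demo, not something the summary states.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up (minimum, maximum) temperature pairs, for illustration only.
min_temps = [10.0, 12.0, 15.0, 18.0, 20.0]
max_temps = [19.0, 22.0, 24.0, 29.0, 31.0]
slope, intercept = fit_line(min_temps, max_temps)
predicted = slope * 16.0 + intercept  # prediction for an unseen value
print(slope, intercept, predicted)
```

The slope and intercept define the best-fit line; predicting a new value is then a single multiplication and addition.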

### Detailed timeline for parts of the video course

**Part 1**

- 00:00:00 Zulaikha from Edureka talks about the history of AI, different fields and concepts related to it, how AI came into being, the limitations of machine learning, and how deep learning is needed. She also introduces the concept of deep learning and shows how it can be used to solve real-world problems. Finally, she talks about the next module, natural language processing.

- 00:05:00 Artificial intelligence is the science and engineering of creating intelligent machines that can perform tasks that would normally require human intelligence, such as visual perception, speech recognition, decision making, and translation between languages. Recent advances in computing power and algorithms have made it possible to more effectively integrate artificial intelligence into our daily lives. Universities, governments, start-ups and major technology companies are pouring their resources into AI because they believe it is the future. Artificial intelligence is rapidly developing both as a field of study and as an economy.

- 00:10:00 Artificial intelligence is used in various fields, from finance to healthcare and social media. AI has become so important that even businesses like Netflix are using it.

- 00:15:00 Artificial intelligence is divided into three stages, and we are currently in the weak AI stage. Artificial general intelligence, or strong AI, is still far from being achieved, but if this happened, it would be a milestone in human history.

- 00:20:00 This section introduces different types of artificial intelligence and then discusses different programming languages for AI. Python is considered the best language for AI development and R is also a popular choice. Other languages include Lisp, Prolog, C++, MATLAB, Julia and JavaScript.

- 00:25:00 Python is a flexible and easy to use programming language that is becoming popular in the field of artificial intelligence. Machine learning is a technique that allows machines to learn from data in order to improve their predictions.

- 00:30:00 Machine learning is a subset of artificial intelligence that uses algorithms to automatically learn and improve with experience. The main component of the machine learning process is a model that is trained using a machine learning algorithm.

- 00:35:00 The difference between an algorithm and a model is that an algorithm maps all the decisions the model needs to make based on a given input, while a model uses a machine learning algorithm to extract useful insights from the input and produce an accurate result. A predictor variable is any feature of the data that can be used to predict the output. The response variable, also known as the target variable or output variable, is the variable you are trying to predict using the predictor variables; in the video's example, height is the response variable. The terms "training data" and "testing data" come up often in machine learning. Training data is the data used to build the machine learning model, and testing data is used to check how well the model performs. When you load the data into the machine, it is divided into these two subsets, a step known as data splitting.

- 00:40:00 Data collection is one of the most time-consuming steps in machine learning, especially if you have to collect the data manually. Fortunately, many online resources provide extensive datasets, which you can obtain by web scraping or simply by downloading them. One such site is Kaggle. If you are new to machine learning, you do not need to worry about data collection: just go to a website like Kaggle and download a dataset.

- 00:45:00 Supervised learning is a technique in which a machine is trained using well-labeled data. Supervised learning is similar to how teachers help students understand mathematical concepts.

- 00:50:00 In supervised learning, the training dataset contains information about what objects look like, such as images of Tom and Jerry. A machine learning algorithm is trained on this labeled dataset to learn how to identify and classify images. In unsupervised learning, a machine learning algorithm does not receive labeled data, but instead trains on unlabeled data. In reinforcement learning, an agent is placed in an environment and learns to behave by performing actions and observing the rewards received for these actions.

- 00:55:00 Machine learning consists of three main types of learning: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning learns from labeled data, unsupervised learning learns from unlabeled data, and reinforcement learning learns from actions and rewards. There are three types of problems that can be solved with machine learning: regression, classification, and clustering. There are many algorithms that can be used to solve regression, classification and clustering problems, but the most commonly used are Linear Regression, Logistic Regression, Support Vector Machine and Naive Bayes.
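
The data splitting step mentioned at 00:35:00 can be sketched as a shuffle followed by a cut. The function name `train_test_split`, the 80/20 ratio, and the toy rows are assumptions for this example; libraries such as scikit-learn ship their own splitting helpers.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and cut it into (train, test) subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy (predictor, response) pairs, made up for illustration.
rows = [(x, 2.0 * x) for x in range(10)]
train_rows, test_rows = train_test_split(rows)
print(len(train_rows), len(test_rows))
```

The model is built on `train_rows` only, so `test_rows` can serve as data the model has never seen.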

**Part 2**

- 01:00:00 Artificial intelligence can be used to solve classification, regression and clustering problems. Supervised learning algorithms such as linear regression are used to predict target variables such as the house price index based on input data.

- 01:05:00 Linear regression is a supervised learning algorithm used to predict a continuous dependent variable, y, based on values of the independent variable, x. Linear regression starts by building a relationship between y and x using the best linear fit, and then calculates the slope and y-intercept of the regression line.

- 01:10:00 Edureka instructor Michael Kennedy demonstrates linear regression on a data set of weather conditions recorded on different days around the world. He shows how to import the required libraries and read the data, how to plot data points and find a linear relationship between variables. He also discusses the warning message and explains that the main purpose of this demonstration is the weather forecast.

- 01:15:00 This lesson explains how linear regression can be used to predict the average maximum temperature for a given temperature range. The dataset is split into training and test sets, the appropriate linear regression class is imported, and the model is trained on the training set. The instructor then shows how to calculate the slope and y-intercept of the line fitted to the data.

- 01:20:00 Explaining how to use the regression algorithm to predict the percentage score of a test dataset. The video also shows how to plot the results and compare them to the actual values.

- 01:25:00 Logistic regression is a technique used to predict the dependent variable, y, given the independent variable, x, when the dependent variable is categorical. The result of logistic regression is always categorical, and the basic technique is very similar to linear regression.

- 01:30:00 Logistic regression is used to predict the probability that an outcome will be 1 or 0. The linear expression beta0 + beta1*X is passed through the logistic function, or S-curve, giving Pr(Y = 1) = 1 / (1 + e^-(beta0 + beta1*X)), which ensures that the result stays between zero and one.

- 01:35:00 The decision tree algorithm is a supervised learning algorithm that is easy to understand. It consists of a root node (where the first split occurs), internal nodes (where decisions are made), and leaf nodes (where results are stored). Branches between nodes are represented by arrows, and the algorithm works by traversing the data through the tree until a terminal node is reached.

- 01:40:00 "ID3" is the algorithm used to generate the decision tree. The steps required to use this algorithm are: (1) select the best attribute, (2) assign that attribute as the decision variable for the root node, (3) create a child for each value of the decision variable, and (4) label the leaf nodes with a classification. If the data is correctly classified, then the algorithm stops; if not, then the algorithm continues iterating through the tree, changing the position of the predictor variables or the root node. The best attribute is the one that most effectively separates the data into different classes. Entropy and information gain are used to determine which variable separates the data best. The variable with the highest information gain is used to partition the data at the root node.

- 01:45:00 In this video tutorial on artificial intelligence for beginners, we learn how to calculate the information gain for a parent node, a child node, and a different type of road. The entropy for the right hand side is calculated and turns out to be zero, which indicates the absence of uncertainty. When the road is straight, the speed of the vehicle is high, indicating that there is no uncertainty in this information. When the road is steep, the vehicle speed may be slow or fast, indicating that the information is not specific to any particular type of road.

- 01:50:00 This part discusses how entropy is used to calculate the information gain in a decision tree: the entropy of the parent node, the weighted average entropy of the child nodes, and the information gain for each predictor variable. The entropy for the road type variable is zero, which means there is no uncertainty in the dataset. The information gain for the road type variable is 0.325, which means that the dataset contains little information about the road type variable. The information gain for the obstacle variable is zero, which means that the obstacle variable does not affect the decision tree. The information gain for the speed limit variable is one, which means that the speed limit variable has the largest impact on the decision tree.

- 01:55:00 In a random forest, several decision trees are built, which are then combined to create a more accurate and stable forecast. Bootstrapping is used to create a small dataset which is then used to train decision trees. Random Forest is more accurate than Decision Trees at predicting new data because it reduces overfitting (memorization of training data).
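
The entropy and information gain calculations from the decision tree entries can be sketched as follows. The four rows are an assumed toy dataset with three predictors named `road`, `obstruction`, and `speed_limit` (names chosen for this example); the video's actual rows may differ, so the numbers need not match the figures quoted above exactly.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Parent entropy minus the weighted average entropy of the children."""
    parent = entropy([row[target] for row in rows])
    weighted_children = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row[target] for row in rows if row[attribute] == value]
        weighted_children += len(subset) / len(rows) * entropy(subset)
    return parent - weighted_children

# Assumed toy data: predict the speed of a car from three predictors.
rows = [
    {"road": "steep", "obstruction": "yes", "speed_limit": "yes", "speed": "slow"},
    {"road": "steep", "obstruction": "no",  "speed_limit": "yes", "speed": "slow"},
    {"road": "flat",  "obstruction": "yes", "speed_limit": "no",  "speed": "fast"},
    {"road": "steep", "obstruction": "no",  "speed_limit": "no",  "speed": "fast"},
]
gains = {a: information_gain(rows, a, "speed")
         for a in ("road", "obstruction", "speed_limit")}
best = max(gains, key=gains.get)  # attribute to split on at the root
print(gains, best)
```

On this assumed data the third predictor separates the classes perfectly and would be chosen for the root node, while the second one provides no gain at all.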

**Part 3**

- 02:00:00 This video explains how to create a decision tree using a random forest. First, two or three variables are randomly selected to be used at each node of the decision tree, and then the information gain and entropy for each of them are calculated. This process is then repeated for each next branch node, creating a decision tree that predicts the output class based on the selected predictor variables. Finally, we return to the first step and create a new decision tree based on a subset of the original variables. This process is repeated until multiple decision trees have been created, each predicting the output class based on different predictor variables. Finally, the accuracy of the model is evaluated using the out-of-bag dataset.

- 02:05:00 In this video, Edureka instructor Michael Kennedy explains how Random Forest works. First, a bootstrap data set is created to ensure accurate predictions. A decision tree is then created using a random set of predictors. This process is repeated hundreds of times until the model is created. Model accuracy can be calculated using out-of-bag sampling.

- 02:10:00 The K nearest neighbor algorithm is a supervised learning algorithm that classifies a new data point into a target class or output class depending on the features of its neighboring data points.

- 02:15:00 The KNN algorithm is a supervised learning algorithm that uses data to predict the output of new input data points. It is based on the similarity of features to neighboring data points and is non-parametric. KNN is a lazy algorithm: it memorizes the training set instead of learning a discriminant function.

- 02:20:00 Edureka instructor Alan C explains the basics of KNN and SVM classification algorithms. Each algorithm has its own strengths, and SVM is a popular choice for classification due to its ability to handle non-linear data.

- 02:25:00 This part walks through the use of various classification algorithms in Python. First, the data is read and grouped into fruit types according to their labels. Then various algorithms are implemented and tested on the data. Finally, the results are shown and discussed.

- 02:30:00 This part discusses the importance of visualization in machine learning and explains the use of boxplots, histograms, and scalers. It also discusses the importance of splitting the data into training and test sets.

- 02:35:00 This video covers the use of logistic regression, decision trees, and support vector machines in a classification problem. The logistic regression classifier gave a good result on the training dataset, but was less accurate on the test dataset. The decision tree classifier was more accurate on the training dataset, but performed worse on the test dataset. The support vector machine gave good results on both training and test datasets.

- 02:40:00 The K-means clustering algorithm is an unsupervised machine learning algorithm used to group similar items or data points into clusters. It is used for targeted marketing, such as promoting a specific product to a specific audience.

- 02:45:00 The elbow method is a simple method used to find the optimal value of k for a particular problem. The elbow method starts by calculating the sum of squared errors for different values of k and then visualizing them. As k increases, the error decreases, indicating that more clusters result in less distortion. The optimal value of k is at the point on the graph where the decrease in error slows down, forming an "elbow" shape.

- 02:50:00 The elbow method is a simple way to choose the optimal K value for the K-means algorithm. This method starts by calculating the sum of squared errors for different values of K, and plotting them on a graph. As K increases, the error decreases, indicating that more clusters result in less distortion. The optimal K value for K-means is at the point after which the distortion stops decreasing sharply, the "elbow" of the curve. This method can be easily implemented using standard library functions. In this video, a sample image from the scikit-learn datasets is used to demonstrate the elbow method.

- 02:55:00 This video explains how the K-means clustering algorithm works and how it can be used to compress large datasets down to a small number of meaningful values. It also considers that reinforcement learning is a different kind of machine learning that helps agents learn how to achieve their goals in an unknown environment.
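
To make the K-means and elbow method entries concrete, here is a small sketch on one-dimensional points. The data, the number of restarts, and the helper names `kmeans_1d` and `best_sse` are assumptions for illustration; the video itself works with library functions rather than a hand-written loop.

```python
import random

def kmeans_1d(points, k, iterations=50, seed=0):
    """Plain K-means on 1-D points; returns (centroids, sum of squared errors)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: (p - centroids[i]) ** 2)
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    sse = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, sse

def best_sse(points, k, restarts=10):
    """K-means depends on its random start, so keep the best of several runs."""
    return min(kmeans_1d(points, k, seed=s)[1] for s in range(restarts))

# Made-up data with three visible groups around 1, 5 and 9.
points = [1.0, 1.2, 0.8, 5.0, 5.1, 4.9, 9.0, 9.2, 8.8]
sse_by_k = {k: best_sse(points, k) for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 3))
```

Plotting `sse_by_k` against k gives the curve used by the elbow method: the error falls quickly while k is below the natural number of groups and flattens afterwards.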

**Part 4**

- 03:00:00 A reinforcement learning agent in a video game, such as Counter Strike, tries to maximize its reward by taking the best action according to its current state and environment. For example, if an agent approaches a tiger, he may lower his expected reward to account for the possibility of being killed. These concepts, such as action, state, reward, and gamma, will be discussed in more detail in the next slides.

- 03:05:00 In this video, Edureka instructor Adriano Ferreira discusses the concepts of exploration and exploitation, a mathematical approach to solving Markov decision making, and the shortest path problem. He then goes on to show an example of how to choose a strategy to solve a problem using the greedy strategy and an example of how to choose a strategy using the exploration strategy.

- 03:10:00 The Edureka instructor explains the basics of reinforcement learning, including the three main methods: policy-based, value-based, and action-based. He then demonstrates the Q-learning algorithm, which is an important reinforcement learning algorithm. The goal of Q-learning is to find the state with the highest reward, and the terminology used in Q-learning includes state and action.

- 03:15:00 This part explains the basics of artificial intelligence, including how it works and how to create an agent that can learn from experience. The video explains how the reward matrix and the uniform Q matrix are used to determine the agent's current state and future rewards. The gamma parameter is used to control the balance between exploration and exploitation by the agent.

- 03:20:00 The video tells about the basic concepts of artificial intelligence, including how the matrix Q of an agent stores its memory and how to update it. It then moves on to how to do the same in Python using the NumPy and R libraries.

- 03:25:00 The video demonstrates how to create an artificial intelligence (AI) system by teaching beginners how to use the code to create a reward matrix and a Q matrix, and set the gamma parameter. The video then shows how to train the AI system by running it for 10,000 iterations and how to test the system by choosing a random state and trying to reach the target state, which is room number five.

- 03:30:00 Machine learning is a field of study that helps computers learn from data. However, it is unable to handle high-dimensional data. Another limitation of machine learning is its growing computational power requirements as the number of dimensions increases.

- 03:35:00 Artificial intelligence is limited in its ability to be used for image recognition because images contain many pixels and are therefore high-dimensional data. Feature extraction is an important part of the machine learning workflow because the effectiveness of an algorithm depends on how deeply the programmer has analyzed the data. Deep learning mimics how our brains work and can learn by itself to focus on the right features, requiring very little guidance from the programmer.

- 03:40:00 Deep learning is a set of machine learning techniques that allow you to efficiently learn feature hierarchies in data. Deep learning consists of a neural network of artificial neurons that work just like our brains. The number of layers and the number of perceptrons in each layer is completely dependent on the task or application.

- 03:45:00 This part explains how weights are used in the calculation that feeds the activation function. Each weight determines how much of a particular input (x1) is used to create the output.

- 03:50:00 A multilayer perceptron has the same structure as a single layer perceptron, but with one or more hidden layers. The weights are initially assigned randomly and it is necessary that the weights be correct to minimize the error. Back propagation is a way to update weights to reduce error.

- 03:55:00 Edureka instructor Emmanuel tells the audience how a model is trained using backpropagation. First, the error is calculated to show where the model is inaccurate. Error backpropagation is then used to update the weights in a way that reduces the error. While the error remains high, the weights keep being updated; training stops once the loss reaches its minimum.
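
The Q-learning demo described in this part (a reward matrix, a Q matrix, a gamma parameter, 10,000 training iterations, and room number five as the target) can be approximated with the standard library alone. The room layout encoded in `R`, the reward values, and gamma = 0.8 are assumptions for this sketch, and the update rule is the simplified form Q(s, a) = R(s, a) + gamma * max Q(s', ·).

```python
import random

GOAL = 5
GAMMA = 0.8  # assumed discount factor

# Assumed layout of six rooms: -1 = no door, 0 = door, 100 = door into the goal.
R = [
    [-1, -1, -1, -1,  0,  -1],
    [-1, -1, -1,  0, -1, 100],
    [-1, -1, -1,  0, -1,  -1],
    [-1,  0,  0, -1,  0,  -1],
    [ 0, -1, -1,  0, -1, 100],
    [-1,  0, -1, -1,  0, 100],
]

def train_q(reward, gamma=GAMMA, iterations=10_000, seed=0):
    """Fill the Q matrix by trying random valid moves from random rooms."""
    rng = random.Random(seed)
    n = len(reward)
    q = [[0.0] * n for _ in range(n)]
    for _ in range(iterations):
        state = rng.randrange(n)
        actions = [a for a in range(n) if reward[state][a] >= 0]
        action = rng.choice(actions)  # the action is the room we move into
        q[state][action] = reward[state][action] + gamma * max(q[action])
    return q

def greedy_path(q, reward, start, goal=GOAL, max_steps=20):
    """Follow the highest Q value from room to room until the goal is reached."""
    path = [start]
    state = start
    while state != goal and len(path) <= max_steps:
        actions = [a for a in range(len(q)) if reward[state][a] >= 0]
        state = max(actions, key=lambda a: q[state][a])
        path.append(state)
    return path

Q = train_q(R)
path = greedy_path(Q, R, start=2)
print(path)
```

After training, the Q matrix acts as the agent's memory: from any starting room, repeatedly picking the move with the largest Q value leads to the target room.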

**Part 5**

- 04:00:00 Back propagation is a mathematical technique that is used to adjust the network weights to reduce the error in the output layer. Gradient descent is used to optimize network forward propagation performance. Recurrent neural networks are a type of artificial neural networks that can be used to recognize patterns in a sequence of data.

- 04:05:00 This part explains how deep neural networks work and how they can be used to predict stock prices. It covers the basics of feedforward neural networks, multilayer perceptrons, and recurrent neural networks.

- 04:10:00 Describes the steps required to train a neural network, including data preparation, partitioning, and scaling. The use of placeholders and initializers is also discussed.

- 04:15:00 The model architecture parameters for an artificial intelligence system are discussed, including the number of neurons in each hidden layer, the bias dimension, and the cost function. It then explains how the activation function transforms the hidden layers and how the output is transposed and costed.

- 04:20:00 Edureka instructor Kirill Eremenko explains the basics of deep learning, including the role of neural networks, optimizers, and initializers. It also explains how mini-batch training works and how epochs are used to train a neural network.

- 04:25:00 Demonstrating deep learning by comparing the predicted values of the model with the actual observed targets, which are stored in y. TensorFlow is then used to update the weight and bias factors. The model is then evaluated on the test data and its predictions are compared to the actual values. After 10 epochs, the predictions of the model are shown to be very close to the actual values.

- 04:30:00 Text mining, or text analysis, is the process of extracting meaningful information from natural language text. It is a vast field that relies on natural language processing (NLP) to analyze text data. NLP is the part of text mining that helps machines, which represent data as zeros and ones, make sense of human language. It is what computers and smartphones use to understand our language, both spoken and written. Examples of text mining and natural language processing applications include spam detection, predictive typing, and sentiment analysis.

- 04:35:00 It discusses the importance of tokenization, stemming, and lemmatization in natural language processing. It explains that tokenization breaks a sentence into words, stemming reduces words to their base form, and lemmatization links words back to their lemma. Stop words are common words that are removed to focus on important words.

- 04:40:00 This Edureka tutorial explains how to perform natural language processing using NaiveBayesClassifier, which comes from a library that provides the functions needed for this task. The instructors then demonstrate the process by performing sentiment analysis on a movie review dataset. The classifier was able to identify which reviews were positive and which were negative.

- 04:45:00 The Edureka Machine Learning Engineer program includes nine modules with over 200 hours of interactive learning covering Python programming, machine learning, natural language processing (NLP), graphical modeling, deep learning, and Spark. The curriculum also covers supervised and unsupervised algorithms, statistics, and time series. According to the video, the average annual salary for a machine learning engineer is over $134,000.

- 04:50:00 A comprehensive introduction to artificial intelligence is presented, including the basics of programming, data processing, and machine learning. After completing this introductory module, the student will be able to proceed with additional courses aimed at deepening their understanding of these topics.

- 2019.06.02
- www.youtube.com
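
The text preprocessing steps summarized at 04:35:00 (tokenization, stop-word removal, stemming) can be sketched in a few lines of plain Python. This is only an illustration: the stop-word list and the suffix-stripping rule below are hypothetical simplifications, not what the tutorial or any NLP library actually uses.

```python
import re

# Hypothetical, deliberately tiny stop-word list; real toolkits ship much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of", "to", "in", "it", "this"}

# Hypothetical suffix list for a crude stemmer (this is NOT the Porter algorithm).
SUFFIXES = ("ing", "ed", "ly", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop common words that carry little meaning on their own."""
    return [t for t in tokens if t not in STOP_WORDS]


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, then stem what is left."""
    return [crude_stem(t) for t in remove_stop_words(tokenize(text))]


print(preprocess("The actors were acting in a surprisingly moving film"))
# -> ['actor', 'were', 'act', 'surprising', 'mov', 'film']
```

Note how the crude stemmer produces "mov" from "moving": stemming only chops suffixes, whereas lemmatization would map the word back to a dictionary form such as "move".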

### MIT 6.S191: Introduction to Deep Learning

The MIT Introduction to Deep Learning course is designed to provide a fast and intensive education in the fundamental principles of deep learning, with applications in computer vision, natural language processing, biology, and other fields. Within the course, students will gain basic knowledge of deep learning algorithms and practical experience in building neural networks in TensorFlow. The program culminates in a project proposal competition, which is evaluated by staff and industry sponsors. It is assumed that students are familiar with calculus (i.e. able to take derivatives) and linear algebra (i.e. able to multiply matrices), but everything else will be explained as the course progresses. Experience working with Python is helpful but not required.

**Lecture 1. MIT Introduction to Deep Learning | 6.S191**

In this video, MIT's Alexander Amini introduces the basics of deep learning with a discussion of the perceptron. He goes on to show how to build a neural network from scratch, simplifying the process by using a library called TensorFlow. He finishes the video by discussing how to create a single-layer and a two-layer neural network with an output layer.
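
As a rough illustration of the perceptron idea, here is a minimal sketch in plain Python rather than TensorFlow. The function names and the numeric values are assumptions made for this example and are not taken from the lecture.

```python
import math


def sigmoid(z):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def dense_layer(inputs, weight_rows, biases):
    """A layer is several perceptrons that all read the same inputs."""
    return [perceptron(inputs, w, b) for w, b in zip(weight_rows, biases)]


# Illustrative values only (not from the lecture).
x = [1.0, 2.0]
hidden = dense_layer(x, weight_rows=[[0.5, -0.25], [1.0, 1.0]], biases=[0.0, -3.0])
output = perceptron(hidden, weights=[1.0, -1.0], bias=0.0)
print(hidden, output)  # -> [0.5, 0.5] 0.5
```

Stacking `dense_layer` calls in this way gives a small multi-layer network; a library such as TensorFlow wraps the same computation in ready-made layer objects.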

- 00:00:00 In this one-week introductory course on deep learning, students learn the foundations of the field and get hands-on experience in software labs. The introduction refers to a welcome clip that is fake: its video and audio were generated using deep learning techniques. This gives the instructor a high-quality, realistic example of deep learning in action.
- 00:05:00 This video introduces the basics of deep learning, including the terminology and concepts involved. The class is split between technical lectures and software labs, with the final project focusing on a creative, innovative idea. The video finishes with a brief overview of the course's instructors and how to contact them if you have any questions.
- 00:10:00 The main goal of deep learning is to learn features from data, which is done by training deep neural networks using hierarchical layers of neurons. This allows for massive parallelization, as well as the ability to detect hierarchical features.
- 00:15:00 In this lecture, you will learn about the technical concepts behind deep learning, including the sigmoid and ReLU activation functions. You will also see how activation functions are used in modern neural networks to introduce non-linearity. Finally, you will be shown how to use the perceptron equation to compute the weighted combination of input data points.
- 00:20:00 Alexander Amini introduces the basics of deep learning with a discussion of the perceptron. He goes on to show how to build a neural network from scratch, simplifying the process by using a library called TensorFlow. He finishes the video by discussing how to create a single-layer and a two-layer neural network with an output layer.
- 00:25:00 In this section, he describes how deep learning works and how to build a neural network to predict whether a student will pass or fail a class. The network has not yet been trained, and as a result, the predicted probability of passing is incorrect.
- 00:30:00 In this part, Alexander Amini discusses how to optimize a neural network using gradient descent. He explains that deep learning involves training a network to improve its predictions based on data. The goal is to find the weights (w's) that minimize the network's error on average across the entire dataset.
- 00:35:00 In deep learning, backpropagation is the process of propagating gradients all the way back to the input of a neural network in order to determine how each weight needs to change to decrease the loss. In practice, a learning rate that is neither too small nor too large helps the optimizer avoid getting stuck in local minima while still converging towards a global optimum.
- 00:40:00 Alexander Amini discusses how to train deep neural networks using gradient descent, adaptive learning rates, and batching. He also discusses the dangers of overfitting and how to mitigate it.
- 00:45:00 The lecture closes with a recap of the main points: the fundamental building blocks of neural networks, how to complete the puzzle and train them, and how to use a loss function. In the next lecture, Ava will talk about deep sequence modeling using RNNs and a new and exciting type of model called the transformer.

- 2022.03.11
- www.youtube.com
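
The effect of the learning rate on gradient descent, mentioned in the lecture summary above, can be seen on a toy problem. The sketch below assumes a made-up one-parameter loss with its minimum at w = 3; it is not the loss used in the lecture, and because this loss is convex it only shows slow convergence and divergence, not local minima.

```python
def loss(w):
    """Toy loss with a single minimum at w = 3 (an assumption for illustration)."""
    return (w - 3.0) ** 2


def grad(w):
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0, learning_rate, steps):
    """Repeatedly step against the gradient and return the final weight."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad(w)
    return w


# Too small, reasonable, and too large learning rates (illustrative values).
for lr in (0.001, 0.1, 1.1):
    w_final = gradient_descent(w0=0.0, learning_rate=lr, steps=50)
    print(f"lr={lr}: w={w_final:.4f}, loss={loss(w_final):.4f}")
```

With the smallest rate the weight has barely moved after 50 steps, with the middle rate it ends up very close to 3, and with the largest rate every step overshoots further, so the loss grows instead of shrinking.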

### MIT 6.S191 (2022): Recurrent Neural Networks and Transformers

**Lecture 2. MIT 6.S191 (2022): Recurrent Neural Networks and Transformers**

This section of the MIT lecture provides an introduction to sequence modeling, explaining the importance of handling sequential data with examples such as predicting a ball's trajectory. Recurrent neural networks (RNNs) are introduced as a means to handle sequence modeling, with the lecture explaining how RNNs use an internal memory (or state) to capture prior history that informs present and future predictions. The lecture also discusses how RNNs can be unrolled across time to make weight matrices more explicit and introduces design problems and criteria for sequence modeling. The video also addresses common issues with RNNs, such as the vanishing gradient problem, and introduces the concept of attention as a potential solution that allows the model to attend to the most important parts of the input without recurrence. The lecture concludes with a discussion on how self-attention mechanisms can be applied to domains beyond language processing, such as in biology and computer vision.
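
To make the idea of an internal state concrete, here is a minimal sketch of the recurrence in plain Python. The weight names (`W_xh`, `W_hh`, `W_hy`), the tanh non-linearity for the state, and the linear output are assumptions chosen for this example; the lecture's own implementation uses TensorFlow and may differ in details.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rnn_step(x_t, h_prev, W_xh, W_hh, W_hy):
    """One time step: update the state, then read an output from it."""
    pre = [a + b for a, b in zip(matvec(W_xh, x_t), matvec(W_hh, h_prev))]
    h_t = [math.tanh(p) for p in pre]
    y_t = matvec(W_hy, h_t)
    return y_t, h_t


def run_rnn(sequence, W_xh, W_hh, W_hy, hidden_size):
    """Unroll the same weights over every element of the sequence."""
    h = [0.0] * hidden_size
    outputs = []
    for x_t in sequence:
        y_t, h = rnn_step(x_t, h, W_xh, W_hh, W_hy)
        outputs.append(y_t)
    return outputs, h


# Illustrative weights only; a real network would learn these during training.
W_xh = [[0.5], [1.0]]            # hidden_size x input_size
W_hh = [[0.1, 0.0], [0.0, 0.1]]  # hidden_size x hidden_size
W_hy = [[1.0, 1.0]]              # output_size x hidden_size

outputs, final_state = run_rnn([[1.0], [0.0], [-1.0]], W_xh, W_hh, W_hy, hidden_size=2)
print(outputs, final_state)
```

The second input is zero, yet the second output is not: it is produced entirely from the state carried over from the first step, which is the "memory" the lecture describes.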

- 00:00:00 In this section of the video, the lecture introduces the concept of sequence modeling and its importance in handling tasks that involve sequential data. The lecturer starts with a simple example of predicting the trajectory of a ball, where adding the ball's previous position data greatly improves the prediction. Sequential data is all around us in various forms like audio, text, EKG signals, and stock prices. The lecturer then explains the difference between feed-forward models and sequence models and provides examples of real-world applications where sequence modeling is required. To build a fundamental understanding of sequence modeling, the lecturer revisits the concept of the perceptron and demonstrates step-by-step how to modify it to handle sequential data.
- 00:05:00 In this section, the video discusses the concept of recurrent neural networks (RNNs) and how they handle sequential data. The output of an RNN at a particular time step depends not only on the input at that time step but also on the memory from the prior time step. This memory captures the prior history of what has occurred previously in the sequence, and it is passed forward from each prior time step. The video explains how RNNs use an internal memory (or state) to capture this notion of memory and how the output at a particular time step is a function of both the current input and past memory. The video also describes how these types of neurons can be defined and depicted as a recurrence relation.
- 00:10:00 In this section of the lecture, the instructor explains the concept of recurrence relation in neural networks and how it forms the key idea behind recurrent neural networks (RNNs). The RNN maintains an internal state, h of t, which is updated at each time step by applying a recurrence relation that functions as a combination of both the current input and the prior state. The parameters of this function are represented by a set of weights that are learned during training. The RNN predicts the output after having processed all the words and time points in the sequence. The output vector, y of t, is generated by passing the internal state through a weight matrix and applying a non-linearity. The lecture provides a visual representation of how the RNN loop feeds back on itself and can be unrolled across time.
- 00:15:00 In this section, the speaker explains how an RNN can be unrolled across time to make the weight matrices applied to the input more explicit. The weight matrices are reused across all individual time steps. The speaker also includes an example of how to implement an RNN from scratch and defines the call function, which defines the forward pass through the RNN model. The speaker highlights the usefulness of RNNs in a variety of applications and motivates a set of concrete design criteria to keep in mind.
- 00:20:00 In this section, the speaker discusses design problems and criteria for sequence modeling, which include handling variable-length sequences, tracking long-term dependencies, preserving and reasoning about order, and sharing parameters across sequences. The speaker explains the importance of embeddings for representing language as numerical vectors that can be fed into a neural network, with one example being a one-hot embedding where binary vectors indicate the identity of a word. The speaker also suggests using machine learning models, such as neural networks, to learn embeddings. Overall, these concepts serve as the foundation for recurrent neural networks and the emerging transformer architecture, which will be discussed later in the lecture.
- 00:25:00 In this section, the concept of learned embeddings is introduced, which is the mapping of the meaning of words to a more informative encoding that allows similar words with similar meanings to have similar embeddings. Recurrent neural networks (RNNs) are able to handle variable sequence lengths, capture and model long-term dependencies, and retain a sense of order, making them useful for sequence modeling tasks such as predicting the next word in a sentence. The backpropagation through time algorithm is introduced as the means for training RNNs, which involves backpropagating errors across each time step and performing matrix multiplications, potentially resulting in computational issues.
- 00:30:00 In this section, the problem of exploding gradients and vanishing gradients in recurrent neural models is discussed, and three solutions to mitigate the issue of vanishing gradients are presented. The vanishing gradient problem can cause a neural model to prioritize short-term dependencies over long-term ones, leading to inaccurate predictions. The three solutions discussed are to choose an appropriate activation function, to intelligently initialize weights, and to use a more complex recurrent unit such as a long short-term memory network (LSTM) that can selectively control information flow through its various gates. The LSTM network uses multiple gates that interact to maintain relevant information and eliminate irrelevant information.
- 00:35:00 In this section of the video, the lecturer discusses the fundamentals of recurrent neural networks (RNNs) and their architecture, including the importance of gated structures and of mitigating the vanishing gradient problem. They then provide concrete examples of how RNNs can be used, including predicting the next musical note in a sequence to generate new music, and classifying the sentiment of tweets. The lecturer also highlights the limitations of RNNs, such as the encoding bottleneck, inefficiency, and potential loss of information in encoding.
- 00:40:00 In this section, the limitations of recurrent neural networks (RNNs) in handling long sequences are discussed, particularly the bottleneck caused by the recurrence relation. The concept of attention is introduced as a potential solution to this problem, allowing the model to identify and attend to the most important parts of the input. Attention is explained as an emerging and powerful mechanism for modern neural networks, particularly in the context of the transformer architecture. The intuition behind self-attention is developed by considering the ability of humans to identify important parts of an image and extract features from those regions with high attention.
- 00:45:00 In this section, the concept of search and how it relates to self-attention in neural networks like transformers is explained. The idea is to attend to the most important features in the input sequence without recurrence, making use of embeddings that have some notion of position. The process involves extracting the query, the key, and the value features, which are three distinct transformations of the same positional embedding. The attention mechanism computes the overlaps between the query and key, and the extracted value is based on this computation, which allows the network to identify and attend to the most relevant parts of the input.
- 00:50:00 In this section of the video, the instructor explains how the attention mechanism works in neural networks. The attention mechanism computes the weighting of attention to be paid to different areas of the input. This can be achieved by computing the similarity between query and key vectors using a dot product and scaling it. The softmax function is then used to squash each value so that it ranges between 0 and 1. The resulting matrix is the attention weighting, which reflects the relationship between the input components. This weighting matrix is used to extract features with high attention, and multiple individual attention heads can be used to attend to different aspects of the input. This attention mechanism is a powerful tool, as exemplified by its use in transformer architectures that have a variety of applications, most notably in language processing.
- 00:55:00 In this section, the speaker discusses how self-attention mechanisms can be applied to domains beyond language processing, such as in biology with the AlphaFold 2 neural network architecture for protein structure prediction, and in computer vision with Vision Transformers. The speaker also summarizes the previous discussion on sequence modeling tasks, how RNNs are powerful for processing sequential data, and how self-attention mechanisms can effectively model sequences without the need for recurrence. The remaining hour of the lecture is dedicated to the software lab sessions, where students can download the labs from the course website and execute the code blocks to complete them, with virtual and in-person office hours available for assistance.

- 2022.03.18
- www.youtube.com
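
The attention computation described in the lecture summary (dot-product similarity between queries and keys, scaling, softmax, then a weighted combination of values) can be sketched as follows. This is a simplified single-head version in plain Python: the learned linear transformations that produce the query, key, and value from the positional embedding are skipped, and the input vectors are invented for illustration.

```python
import math


def softmax(row):
    """Turn a list of scores into positive weights that sum to 1."""
    largest = max(row)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query with every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(w_i * v[j] for w_i, v in zip(w, values)) for j in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Illustrative embeddings; here queries, keys and values are all the same vectors.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row shows how strongly one input position attends to every position; the first vector, for example, gives less weight to the second vector, with which it has no overlap.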

### MIT 6.S191: Convolutional Neural Networks

**Lecture 3. MIT 6.S191 (2022): Convolutional Neural Networks**

This video introduces convolutional neural networks (CNNs), a type of machine learning algorithm used to detect features in images. The video explains that by using a smaller number of features, the network can more accurately classify images. The video also discusses how a CNN can be used to detect and localize a number of objects in an image.
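
As a small illustration of feature detection by sliding a patch of weights over an image, the sketch below applies a hand-picked 2x2 filter to a tiny 4x4 image in plain Python. The image and the filter values are assumptions made for this example; in a real CNN the filter weights are learned from data.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    feature_map = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            # Element-wise multiply the patch by the kernel and sum the result.
            total = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Illustrative 4x4 image with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Hand-picked vertical-edge filter (hypothetical values).
kernel = [
    [-1, 1],
    [-1, 1],
]
print(convolve2d(image, kernel))  # -> [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The output is non-zero only in the middle column, where the filter sits on top of the edge, so the feature map records where in the image the feature occurs.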

- 00:00:00 In this video, Alexander Amini discusses how deep learning has revolutionized computer vision and its applications, with facial recognition as one example.
- 00:05:00 In this section he discusses how computer vision is used to recognize and classify images. He also discusses how to detect features in images, and how to classify images using features.
- 00:10:00 This part discusses how convolutional neural networks can be used to detect features in images. The video explains that by flattening an image, the spatial structure is lost, making it harder for the network to learn the features. The video also explains that by using patches of weights, the network can preserve the spatial structure and make it easier for it to learn the features.
- 00:15:00 Convolutional neural networks are a type of machine learning algorithm used to detect features in images. The algorithm works by sliding a small patch across an image and detecting features that are present in the patch. The weights for the patch are then determined by training the network on a set of examples.
- 00:20:00 Convolutional neural networks can be used to extract features from images. The convolution operation takes two inputs, an image and a filter, and outputs a third image that preserves the spatial relationship between pixels.
- 00:25:00 In this part, the presenter describes how convolutional layers are implemented in neural networks and how they are structured. He also explains how the three main operations in a convolutional neural network (convolution, nonlinearity, and pooling) work.
- 00:30:00 This part looks at convolutional layers at the level of individual nodes and how each output node is connected to nodes in its input. Convolutional layers are defined by parameters that determine the spatial arrangement of the layer's output. The objective is to learn hierarchical features from one convolutional layer to the next. This is done by stacking the feature extraction, nonlinearity, and spatial downscaling (max pooling) steps in tandem. Finally, the video shows code for a first end-to-end convolutional neural network.
- 00:35:00 Alexander Amini discusses how convolutional neural networks can be used for image classification tasks. He explains that by using a larger number of features, the downscaled image of a car can be more accurately classified as a taxi. He also discusses how a CNN can be used to detect and localize a number of objects in an image, even when they appear at different locations in the image.
- 00:40:00 This part discusses a simple heuristic for object detection that is much slower and more brittle than other methods. The Faster R-CNN method, which attempts to learn the regions instead of relying on a simple heuristic, is presented as a solution to these issues.
- 00:45:00 In this part, Alexander Amini discusses convolutional neural networks, their origins, and their applications. He also covers the impact of CNNs on a wide range of tasks and fields.

- 2022.03.25
- www.youtube.com
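
Two of the three CNN operations named in the summary above, the nonlinearity and pooling, are simple enough to sketch directly. The example below assumes ReLU as the nonlinearity and 2x2 max pooling with non-overlapping windows; the feature map values are invented for illustration.

```python
def relu(feature_map):
    """Replace negative activations with zero (the non-linearity step)."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Illustrative 4x4 feature map, as might come out of a convolution.
fm = [
    [1, -2, 0, 3],
    [-1, 5, -4, 2],
    [0, -3, -1, -2],
    [4, 1, -6, -5],
]
activated = relu(fm)
print(max_pool(activated))  # -> [[5, 3], [4, 0]]
```

Pooling halves each spatial dimension here, which is the downscaling that lets later layers look at progressively larger regions of the original image.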

MQL5 now supports matrix and vector operations which are used in various computational tasks, including machine learning. We have created this thread to select and share some materials that may be useful to you. Machine learning technology is based on neural networks.

Neural networks are mathematical models which try to emulate the activity of the human brain. They are comprised of interconnected nodes which transmit signals to each other to make decisions based on these signals.

During the machine learning process, the computer utilizes data to train models to predict results using new data. Machine learning is applied in various fields, including medicine, business, materials science, and others.

Deep learning is a subfield of machine learning, which uses multi-layer neural networks to solve data processing problems. Deep learning models can explore data with high accuracy and automatically extract features from complex hierarchical structures, which is usually a difficult task for traditional machine learning algorithms.

Deep neural networks usually consist of multiple layers which sequentially process the input data. Each layer is a set of neurons which process data and forward the results to the next layer. The idea of model training is to adjust the weights of the neural connections between the layers so as to minimize the error on the training dataset. One of the most widely used approaches to deep neural network training is backpropagation. This algorithm allows the model to determine how changes in the weights used on different layers affect the model error, and to use this information to adjust the weights by gradient descent.
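
A minimal sketch of this training idea, written in Python for a network so small that the chain rule can be written out by hand. The architecture (one sigmoid hidden neuron and one linear output), the starting weights, and the single training example are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Scalar network: hidden = sigmoid(w1*x + b1), prediction = w2*hidden + b2."""
    w1, b1, w2, b2 = params
    hidden = sigmoid(w1 * x + b1)
    return w2 * hidden + b2, hidden


def loss(params, x, target):
    """Half squared error between the prediction and the target."""
    prediction, _ = forward(params, x)
    return 0.5 * (prediction - target) ** 2


def backprop(params, x, target):
    """Apply the chain rule from the output back to every weight."""
    w1, b1, w2, b2 = params
    prediction, hidden = forward(params, x)
    d_pred = prediction - target              # dL/d(prediction)
    d_w2 = d_pred * hidden
    d_b2 = d_pred
    d_hidden = d_pred * w2
    d_z = d_hidden * hidden * (1.0 - hidden)  # sigmoid'(z) = s * (1 - s)
    return [d_z * x, d_z, d_w2, d_b2]         # gradients for w1, b1, w2, b2


def train(params, x, target, learning_rate, steps):
    """Gradient descent: move every weight against its gradient."""
    for _ in range(steps):
        grads = backprop(params, x, target)
        params = [p - learning_rate * g for p, g in zip(params, grads)]
    return params


start = [0.5, 0.0, -0.5, 0.0]  # illustrative initial weights
trained = train(start, x=1.0, target=1.0, learning_rate=0.5, steps=200)
print(loss(start, 1.0, 1.0), loss(trained, 1.0, 1.0))
```

The gradients returned by `backprop` say how the error changes when each weight changes, and `train` uses them to lower the error step by step, which is the combination of backpropagation and gradient descent described above.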

Deep learning enables the creation of more accurate models, compared to classical machine learning methods, such as logistic regression or decision trees. However, it requires a large dataset and extensive computing resources for training, which can be a problem in some fields.

Deep learning is applied in various fields, such as computer vision, natural language processing, speech processing and recommender systems, among others. In recent years, deep learning has been overwhelmingly successful in image recognition and natural language processing problems.

In this thread, we will share videos that will assist you in understanding how these technologies work.