
 

CS 198-126: Lecture 21 - Generative Audio




In this lecture on generative audio, the presenter covers quantization, aliasing, signal processing, projections, deep learning, and Transformers. The lecturer discusses how to sample and quantize continuous signals and the trade-off between bit-depth precision and computational cost. The Shannon-Nyquist sampling theorem and its consequences for reconstructing signals are explained, along with the role of projections in signal reconstruction. Deep learning is then explored for audio reconstruction, and the presenter introduces generative audio and how it can recover music from lost or damaged recordings. The use of Transformers for audio generation is discussed, and the process of representing music as a sequence of tokens is explained. The speaker also emphasizes the importance of a large and varied dataset and describes how the transformer model makes music predictions. The lecture concludes with a demo of generated music, showing the model's ability to predict plausible future notes.

  • 00:00:00 In this section of the lecture, the focus is on generative audio and how to discretize continuous signals, which is necessary for computers to process audio. A continuous signal is sampled and quantized to produce a digital signal. The lecture explains how the analog-to-digital converter uses a sample-and-hold circuit, and how the output is discretized depending on the level of precision required. The lecture also discusses the digital-to-analog converter and how a low-pass filter is used to maintain the passband of the signal, with the cutoff frequencies determining the filter's slope. These concepts are essential to generative audio and lay an important foundation for the lecture's later material.

  • 00:05:00 In this section, the lecture covers quantization levels and their relation to the dynamic range of the signal being quantized. Higher bit depth gives a more precise approximation of the signal, with errors shrinking significantly until the approximation is nearly perfect at 16-bit depth. However, there is a trade-off in computation cost, which raises the question of whether lossless audio or a much faster lossy approximation is sufficient for the listener's ear. The Shannon-Nyquist sampling theorem asserts that a signal can be reconstructed from its samples without any loss of information if and only if the original signal's frequencies lie below half of the sampling frequency. Failure to meet this criterion leads to aliasing, which produces a corrupted approximation of the signal.
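
As a rough illustration of these two ideas (not from the lecture; it assumes a simple sine-wave test signal), the sketch below quantizes a signal at a chosen bit depth and checks the Nyquist criterion before sampling:

    import numpy as np

    def quantize(signal, bit_depth):
        """Uniformly quantize a signal in [-1, 1] to 2**bit_depth levels."""
        levels = 2 ** bit_depth
        step = 2.0 / (levels - 1)
        return np.round(signal / step) * step

    def sample_tone(freq_hz, fs_hz, duration_s=1.0):
        """Sample a pure tone; warn if the Nyquist criterion is violated."""
        if freq_hz >= fs_hz / 2:
            print("Warning: aliasing expected (signal frequency >= fs/2)")
        t = np.arange(0, duration_s, 1.0 / fs_hz)
        return np.sin(2 * np.pi * freq_hz * t)

    x = sample_tone(freq_hz=440, fs_hz=8000)   # 440 Hz tone, safely below fs/2 = 4 kHz
    for bits in (4, 8, 16):
        err = np.max(np.abs(x - quantize(x, bits)))
        print(bits, "bits -> max quantization error", err)

The error roughly halves with every extra bit of depth, which is the precision-versus-cost trade-off mentioned above.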

  • 00:10:00 In this section, we learn about aliasing and its effects on signal processing, particularly in terms of poor sampling resulting in a modified output signal compared to the original input. We see examples of this through waveform visualizations and image sampling. Additionally, we hear about geometric signal theory, specifically the use of projections for signal reconstruction, and the use of deconvolutions in image segmentation. Lastly, the presenter shares a fun demo of generating 8-bit music using one line of C code.

  • 00:15:00 In this section, the lecturer discusses projections and how they can be used for reconstruction. The projection formula is the dot product of two vectors, and this similarity measure can be used to reconstruct a signal using a linear combination of projections onto another set of vectors. However, a basis is required, and the set of vectors used must be orthogonal to each other to ensure the maximum amount of information gained. By taking the projection onto different bases that are orthogonal to each other, we can gain information about the vector being projected and ultimately reconstruct the signal.
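
A minimal sketch of this idea, assuming an orthonormal basis so that the projection coefficients are simply dot products:

    import numpy as np

    # Columns of the Q factor form an orthonormal basis for R^3.
    basis = np.linalg.qr(np.random.randn(3, 3))[0]

    x = np.array([2.0, -1.0, 0.5])        # the "signal" to reconstruct

    coeffs = basis.T @ x                  # project onto each basis vector (dot products)
    x_hat = basis @ coeffs                # reconstruct as a linear combination of the basis

    print(np.allclose(x, x_hat))          # True: an orthonormal basis loses no information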

  • 00:20:00 In this section, the lecturer introduces the use of deep learning for audio reconstruction and how it can generate high-resolution audio from a low-quality waveform. The model architecture resembles a U-Net and uses a one-dimensional sub-pixel convolution for upsampling. The downsampled waveform passes through eight downsampling blocks built from convolutional layers with a stride of two, with batch normalization and a ReLU activation. At the bottleneck layer, which is constructed identically to a downsampling block, the waveform connects to eight upsampling blocks. These blocks have residual connections to the downsampling blocks and use a sub-pixel convolution that reorders information from the channel dimension into the time dimension, increasing the waveform's resolution while preserving features from the low-resolution input. The final convolutional layer includes a restacking operation that reorders the information after the sub-pixel convolution, and the model is trained with a mean-squared-error loss between the upsampled output and the high-resolution target.
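
As a rough illustration of the sub-pixel ("shuffle") idea, not taken from the lecture's code: reshaping channels into the time axis increases temporal resolution without any learned upsampling.

    import numpy as np

    def subpixel_shuffle_1d(x, upscale=2):
        """(batch, time, channels) -> (batch, time*upscale, channels/upscale)."""
        b, t, c = x.shape
        assert c % upscale == 0
        x = x.reshape(b, t, upscale, c // upscale)        # split channels into groups
        return x.reshape(b, t * upscale, c // upscale)    # interleave the groups along time

    x = np.random.randn(1, 4, 8)              # low-resolution features: 4 steps, 8 channels
    y = subpixel_shuffle_1d(x, upscale=2)     # 8 steps, 4 channels
    print(x.shape, "->", y.shape)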

  • 00:25:00 In this section, the lecturer discusses how generative audio can be used to reconstruct music from bands recorded during the mid-to-late 1900s whose recordings may not have been preserved in full quality. She talks about the downsampled spectrum and how it can be brought closer to the true waveform by restoring clarity and color. The lecturer then transitions into Transformers for audio generation and how the Transformer architecture can be used to predict the notes of a melody. This requires converting the data, which consists of music files, into a token sequence, a problem that needs careful thought because time-series structure such as the time signature, key, and beats must be captured.

  • 00:30:00 In this section, the speaker discusses the process of representing music as a series of tokens that can be fed into a transformer model for generative audio. They explain how pitch, duration, and other attributes can be used to capture information about music notes, but also note the challenge of tokenizing 2D piano-roll data into a single dimension. Different approaches, such as mapping a single note to multiple tokens or many notes to a single token, are compared, and the use of separator tokens and a reduced vocabulary size is introduced. The speaker concludes by touching on data augmentation as a way to increase the diversity of training data for generative audio models.

  • 00:35:00 In this section, the speaker discusses the importance of having a large and varied dataset when training generative audio models. They explain how a single song can be transposed into 12 songs in different keys, and that the more data and generalizability a model has, the better it performs. The speaker also discusses positional beat encoding as a way to provide metadata that gives the model a better sense of musical timing, noting that the positional-encoding machinery used in natural language processing can be applied to music as well. The section concludes with teacher forcing and the attention mask that keeps the model from seeing all of the information at once and leaking the tokens it is supposed to predict next.
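
A minimal sketch of the key-transposition augmentation described above, assuming notes are stored as MIDI pitch numbers (an assumption, not the lecture's actual data format):

    def transpose(notes, semitones):
        """Shift every MIDI pitch by some semitones, clamping to the valid 0-127 range."""
        return [min(127, max(0, p + semitones)) for p in notes]

    melody = [60, 62, 64, 65, 67]                            # C major fragment (MIDI pitches)
    augmented = [transpose(melody, s) for s in range(12)]    # the same tune in all 12 keys
    print(len(augmented), "transposed copies")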

  • 00:40:00 In this section, the speaker discusses how the transformer model used for generative audio operates. The Transformer-XL used in the implementation features relative position encoding and hidden-state memory, which enable fast and accurate inference for music prediction. Since positionality matters in music, the model uses relative rather than only absolute positions. The model also captures two attributes of each note, pitch and duration, to store in memory and predict future notes. The speaker then presents a demo of Pachelbel's Canon in D generated with the model, which shows that although the generated notes deviate from the original composition, they still sound good.
CS 198-126: Lecture 21 - Generative Audio
  • 2022.12.03
  • www.youtube.com
 

CS 198-126: Lecture 22 - Multimodal Learning




Multimodal learning involves representing objects in different ways, such as through text, images, videos, or audio, while still recognizing that they are the same object. The lecture explains the importance of multimodal learning in capturing diverse data sets and addressing the distribution-shift problem. The video focuses on CLIP (Contrastive Language-Image Pre-training), a method that uses text and image encoders to create similar embeddings for matching image-caption pairs. The embeddings can be used for classification, robotics, text-to-image generation, and 3D vision. The speaker emphasizes that the universality of CLIP latents demonstrates the value of representation learning in machine learning, and that CLIP has driven much of the subsequent evolution of multimodal learning.

  • 00:00:00 In this section of the video, the lecturer explains the concept of multimodal learning and its importance. Multimodal learning involves representing objects in many different ways, such as through text, images, videos, or audio, and capturing their nuances while still recognizing that they are the same object. Multimodal datasets can comprise all of these data types, and the goal is to keep all the information to provide more context for learning. The lecturer argues that multimodal learning is important because data can come from different sources and datasets, and throwing away all this extra information may result in less information for learning.

  • 00:05:00 In this section, the focus is on the distribution shift problem, which arises when a computer vision model is trained on photorealistic data and tested on cartoonish data. The problem is that individual data sets have small distributions compared to what is possible. Due to diversity in data, the distribution shift problem becomes a significant issue as there are diverse data sets with different objects, data formats, and relationships. Multimodal learning aims to solve this problem by using all available data and relationships between the data to train better models for more diverse data. The goal is to learn meaningful compressed representations for everything from images to text and audio.

  • 00:10:00 In this section, the video discusses the importance of multimodal learning and the challenges that come with training models on diverse data sets. The paper being examined is called CLIP, which stands for Contrastive Language Image Pre-Training, and aims to investigate the relationships between images and corresponding text. The idea behind the CLIP model is that if an image and a caption are related, then the representations learned for both the image and the caption should be similar. The model uses two different models: a vision transformer for processing images and a transformer for natural language processing, and trains them from scratch. The training procedure involves pre-training both the text and image encoders using a large amount of image-caption pairs from various sources, with the goal of generating embeddings for both that are similar for a matching image-caption pair and different for a different pair.

  • 00:15:00 In this section, the speaker explains how the image encoder and text encoder work together to create embedded versions of image and text data that are very similar for matching pairs and very dissimilar for non-matching pairs. The diagonal elements of the resulting matrix represent the dot product between the embeddings for matching pairs, which are ideally very large, while the off-diagonal elements represent the similarities between embeddings that do not match and should be very small or negative. The speaker explains that this approach is similar to a classification task, where the loss function tries to make the diagonal elements as large as possible while minimizing the off-diagonal elements. The text encoder and image encoder work together to achieve this goal and create similar embeddings for matching pairs.
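
A rough sketch of that contrastive objective (a simplification, not OpenAI's implementation): a symmetric cross-entropy over the similarity matrix, where the correct "class" for image i is caption i.

    import numpy as np

    def clip_loss(image_emb, text_emb, temperature=0.07):
        """Symmetric contrastive loss: matching (diagonal) pairs should score highest."""
        # L2-normalize so dot products become cosine similarities.
        image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
        text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

        logits = image_emb @ text_emb.T / temperature     # (N, N) similarity matrix
        labels = np.arange(len(logits))                   # correct caption for image i is i

        def cross_entropy(l, y):
            l = l - l.max(axis=1, keepdims=True)          # for numerical stability
            log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
            return -log_probs[np.arange(len(y)), y].mean()

        # Average the image-to-text and text-to-image classification losses.
        return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

    print(clip_loss(np.random.randn(8, 512), np.random.randn(8, 512)))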

  • 00:20:00 In this section, we learn about an application of multimodal learning called CLIP, or Contrastive Language-Image Pre-training. CLIP uses a text encoder and an image encoder to create embeddings of images and captions. It then performs a dot product of the embeddings to see how well the image matches the caption. By doing this, CLIP can predict what the image is of without any fine-tuning, which is called a zero-shot prediction. This method shows that the encoded representations of text and images are meaningful and can generalize well to new data.

  • 00:25:00 In this section of the lecture, the speaker discusses the applicability and importance of representation learning via CLIP latents. The latents are generalizable, meaningful outputs of encoding images through a frozen pre-trained model. They have several use cases, including object classification and robotics, where they provide an embedded representation of images. The speaker emphasizes that representation learning is applicable everywhere and is useful in text-to-image generation and 3D vision. A radiance field can be optimized with a loss that pushes the CLIP latent of the rendered image toward the latent of the input caption; the optimization is differentiable, making it an effective tool for image generation.

  • 00:30:00 In this section, the speaker discusses using image embeddings to generate radiance fields from a given prompt. Although this method is expensive, it demonstrates the universality of CLIP latents, which come from pre-training on a wide variety of data and yield usable representations for many domains. This multimodal learning technique has proved effective and is now an important subfield of machine learning. The speaker notes that while CLIP is just the beginning, it has led to further evolution in the field.
CS 198-126: Lecture 22 - Multimodal Learning
  • 2022.12.03
  • www.youtube.com
 

Tensorflow for Deep Learning Research - Lecture 1




The video "Tensorflow for Deep Learning Research - Lecture 1" introduces the tutorial on TensorFlow by covering the need for a deep-level tutorial, explaining the library's basics and practical applications. The lecture covers how to build and structure models using TensorFlow from a deep learning perspective. The tutorial also covers the tools used in TensorBoard for visualizing a computational graph model, including how to operate with nodes, edges, and sessions, which provide efficient computation options by running subgraphs. The lecturer recommends learning TensorFlow from scratch to build custom models and efficiently handle resources, with the ability to run on CPU, GPU, Android or iOS, while providing the ability to deploy models.

  • 00:00:00 In this section, the speaker introduces the tutorial on TensorFlow by explaining that there is a need for video tutorials that are not too shallow and that address the points other tutorials miss. He announces that the lecture will follow the Stanford University content of CS 20 and use the slides created by Chip Huyen, together with his own commentary. He points out that TensorFlow is a library for numerical computation, developed by Google, which expresses numerical computations as graphs, making it suitable for building machine learning models ranging from logistic regression to deep learning models. TensorFlow also allows models to be deployed on CPU, GPU, Android, or iOS.

  • 00:05:00 In this section, the speaker discusses the concept of checkpoints: saved states of a model that allow training to resume from the same point after an interruption. The importance of differentiation in deep learning frameworks is highlighted, and the speaker suggests writing backpropagation layers manually once to appreciate how much TensorFlow simplifies the process. The popularity of TensorFlow is attributed to its large community, and projects such as image style transfer, handwriting generation, and StackGANs are mentioned. The lecture aims to cover TensorFlow's computational-graph model, building functions, and structuring models from a deep learning perspective. Finally, the speaker advises using the TensorFlow website as the source for the latest APIs and libraries.

  • 00:10:00 In this section, the speaker discusses the importance of learning TensorFlow from scratch. While high-level abstractions can handle more complex tasks, understanding what happens behind the code is crucial for creating custom models. The speaker explains that TensorFlow separates the definition of computation from its execution: a computational graph is assembled first, and a session is then used to execute operations. The section shows how defining operations in TensorFlow creates a graph that can be visualized with the built-in tool TensorBoard. In the sample code, the nodes are operators and constants, and the edges between them carry tensor values.

  • 00:15:00 In this section, the lecturer discusses the nodes and edges in a TensorFlow graph, as well as the use of sessions to run computations within the graph. Nodes can be operators, variables, or constants, while edges are tensors. A session is used to instantiate the graph and run computations, with the session taking care of necessary resources. Running a specific node in the graph with a session will compute the graph and return the node's value, and the lecturer demonstrates this with an example using addition. The session object encapsulates the resource environment for operators and tensor objects to execute within. The lecturer also mentions the use of the "with" statement as an alternative to explicitly closing the session.
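
A minimal sketch in the TensorFlow 1.x style used throughout these lectures (graph assembly first, execution inside a session):

    import tensorflow as tf   # assumes TensorFlow 1.x

    a = tf.constant(3, name="a")
    b = tf.constant(5, name="b")
    c = tf.add(a, b, name="add")   # adds a node to the default graph; nothing is computed yet

    with tf.Session() as sess:     # the session owns the resources needed to run the graph
        print(sess.run(c))         # running the node computes it and its dependencies: 8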

  • 00:20:00 In this section, the video explains the concept of lazy evaluation in TensorFlow. When a graph is created, nothing is pre-calculated or cached; calculations are done lazily, as and when they are needed. The power of lazy evaluation is that not everything has to be computed, which saves a great deal of computation as models get deeper. When calling session.run, the first argument (the fetches) is a list of nodes that need to be calculated, and TensorFlow computes all of those nodes and returns the results as a list.

  • 00:25:00 In this section, the lecturer discusses the benefits of modeling computations as a graph in TensorFlow, including the ability to run subgraphs in parallel across multiple CPUs or GPUs. They explain how to pin part of a graph to a particular GPU using the tf.device context manager (e.g., tf.device('/gpu:0')), and how to create multiple graphs in TensorFlow, which can be useful in cases like ensemble learning where different models run in parallel. The lecturer also notes that sessions in TensorFlow can be greedy in terms of resource usage, so resources may need to be managed carefully when using multiple graphs.

  • 00:30:00 In this section, the lecturer discusses how to create and manipulate graphs in TensorFlow. They explain that multiple sessions can be created, but passing data between them is complex, so it is recommended to use a single session for simplicity. To add nodes to a specific graph, the tf.Graph API is used, and the instantiated graph can be set as the default. When the session is instantiated, the graph can be passed as an argument, and the entire execution environment is created around it. It is important to avoid mixing two graphs, and to obtain a handle on the default graph, tf.get_default_graph() is used.
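
A short sketch of those graph-handling calls, again in the TensorFlow 1.x style:

    import tensorflow as tf   # assumes TensorFlow 1.x

    g = tf.Graph()                        # a second graph, separate from the default one
    with g.as_default():                  # nodes defined inside this block are added to g
        x = tf.constant(10, name="x")

    default_g = tf.get_default_graph()    # handle on the default graph (outside the block)

    with tf.Session(graph=g) as sess:     # the session is built around one specific graph
        print(sess.run(x))                # 10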

  • 00:35:00 In this section, the speaker discusses the advantages of using graphs in TensorFlow. One of the benefits is the ability to run subgraphs and even a single op, rather than having to run the entire graph, making computations more efficient. TensorFlow also has a distributed mode, allowing computations to be run across different devices and machines when the computation is described using subgraphs. The takeaway from this section is the importance of setting up graphs of computation and running subgraphs within a session.
Tensorflow for Deep Learning Research - Lecture 1
  • 2017.01.29
  • www.youtube.com
 

Tensorflow for Deep Learning Research - Lecture 2




The lecture on TensorFlow for Deep Learning Research covers a wide range of topics, including basic operations, tensor types, placeholders, and lazy loading. The importance of using TensorBoard to visualize the graph being run is emphasized, and various functions from the TensorFlow API are discussed, including tf.random_shuffle, tf.random_crop, tf.multinomial, and tf.random_gamma. The video also covers how "zero-ness" is defined for different data types, how variables are initialized and assigned values, and the benefits of using a TensorFlow interactive session. Finally, the use of placeholders in TensorFlow is covered in detail, along with the potential issues when using placeholders with undefined shapes.

The speaker also discusses the use of placeholders in TensorFlow, including how to feed multiple data points and use feed dicts. The lecture then moves on to lazy loading, where computation is deferred until runtime; done carelessly, this bloats the graph with multiple nodes for the same operation inside loops. Separating the definition of operation objects from their execution, and structuring code so that variable definitions and computation functions live in separate places, helps avoid these problems. The speaker also covers how the optimizer minimizes cross-entropy and updates weights and biases, and how Python properties can be used to structure TensorFlow code efficiently.

  • 00:00:00 In this section of the lecture, the instructor covers basic operations, tensor types, placeholders, and lazy loading. They demonstrate how to visualize the TF graph using TensorBoard with a simple graph program. The process involves adding a tf.summary.FileWriter within the session context, providing a location to write the events to, specifying what to write, and then closing the writer when finished. They also show how to make the graph more readable by adding name arguments.

  • 00:05:00 In this section, the speaker emphasizes the importance of utilizing TensorBoard for visualizing the graph being run, which is an automatic feature in TensorFlow that is not readily available in other frameworks. The lecture then delves into the signature of the TF.constant function and how its value, shape, and data type can be specified or inferred. Additionally, the speaker explains what happens when verify_shape is set to true or false, and how TensorFlow handles unique names for constants. Finally, the lecture demonstrates how to create a vector and a matrix, add them together, and use TensorBoard to visualize the graph.

  • 00:10:00 In this section of the lecture, the topic of broadcasting in TensorFlow is introduced, and it is demonstrated how it is similar to NumPy. Broadcasting is shown through examples of adding and multiplying constants to a matrix. The concept of creating tensors pre-filled with certain values is also covered, whether it be with zeros or ones, and how to create tensors filled with a custom value using the TF.fill method. The importance of correctly setting the data type for these tensors is emphasized. Overall, the lecture stresses the importance of understanding the similarities and differences between TensorFlow and other numerically-based libraries, like NumPy.

  • 00:15:00 In this section, the speaker discusses some of the constants and sequences that can be created in TensorFlow. One example is the use of linspace to create a sequence of equidistant values, where the start and stop values are float32 or float64. Another example is the random normal function, which generates a tensor of a specified shape by sampling from a normal distribution. Truncated normal is similar, but only keeps samples within two standard deviations of the mean. Finally, random shuffle is discussed as a way to shuffle the values of a tensor along a specific dimension. The speaker suggests practicing these functions to develop muscle memory and avoid constantly relying on documentation.

  • 00:20:00 In this section, the lecturer discusses various functions from the TensorFlow API, starting with tf.random_shuffle, which shuffles data along the first dimension by default, making it useful for shuffling image datasets along the batch dimension. The tf.random_crop function crops contiguous blocks of data of a specified shape from a tensor. The tf.multinomial function samples from a multinomial distribution given a logit tensor and a number of samples. Finally, tf.random_gamma is discussed, which samples from the gamma distribution parameterized by shape and beta.
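
A few of these creation ops in one place (TensorFlow 1.x names; a sketch rather than the lecture's exact code):

    import tensorflow as tf   # assumes TensorFlow 1.x

    zeros = tf.zeros([2, 3], dtype=tf.float32)            # pre-filled with 0s
    filled = tf.fill([2, 3], 7)                           # pre-filled with a custom value
    ramp = tf.linspace(0.0, 1.0, 5)                       # 5 equidistant values from 0 to 1
    normal = tf.random_normal([2, 2], mean=0.0, stddev=1.0)
    trunc = tf.truncated_normal([2, 2])                   # rejects samples beyond 2 std devs
    rows = tf.constant([[1, 2], [3, 4], [5, 6]])
    shuffled = tf.random_shuffle(rows)                    # shuffles along the first dimension

    with tf.Session() as sess:
        print(sess.run([filled, ramp, shuffled]))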

  • 00:25:00 In this section, the speaker discusses TensorFlow functions for generating random constants and sequences for deep learning research. The set_random_seed function is introduced, which sets a graph-level random seed, making the random results reproducible and enabling experiments to be repeated consistently. The speaker also explains various TensorFlow functions for basic operations, such as element-wise addition, matrix multiplication, and modulo, and emphasizes that some native Python types, such as boolean and string, can be used with TensorFlow.

  • 00:30:00 In this section of the lecture, the speaker discusses how "zero-ness" is defined for different data types, such as vectors, matrices, and strings, in TensorFlow, and what the expected output is for each type. They also cover TensorFlow data types, which include float32/float64 and int8/16/32/64, and how they can be used interchangeably with NumPy types. Additionally, the speaker cautions against using constants for large amounts of data, since constants are stored in the graph definition, which can cause issues later on.

  • 00:35:00 In this section, the speaker discusses TensorFlow variables, explaining that users can define a variable using tf.Variable along with a value and an optional name. Shapes are inferred from the value being passed in, and users can initialize their variables using the tf.global_variables_initializer() function. The speaker warns that uninitialized variables lead to errors, but that users can initialize only a subset of the variables if necessary. Additionally, the speaker explains that constants differ from variables: a constant is an op, whereas a variable is a class with multiple functions and methods that users can call.

  • 00:40:00 In this section, the video explains the different ways of initializing variables in TensorFlow, one of which is by calling the "assign" op on them with a certain value. This approach can be useful when training a model using transfer learning, where some layers are assigned values from a pre-trained model while others are initialized randomly. The video also discusses how to initialize a single variable and how to get the value of a variable using the "eval" method. Additionally, the video explains that when assigning a value to a variable, it is okay if the variable has not been initialized before, and the "assign" op can initialize the variable before assigning the value.

  • 00:45:00 In this section, the video covers initializing variables and assigning values to them. The initializer op assigns a variable's initial value to the variable itself. Furthermore, the assign_add op adds a value to the variable's current value, while the assign_sub op subtracts a value from it. Multiple sessions maintain their own copies of variables and initializations, so it is important to carefully follow the execution path when dealing with multiple sessions to avoid unexpected results. Lastly, assignments and operations have no effect until they are actually run within a session, and all sessions should be closed to release resources.
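
A compact sketch of the variable lifecycle discussed above (TensorFlow 1.x):

    import tensorflow as tf   # assumes TensorFlow 1.x

    w = tf.Variable(10, name="w")
    assign_op = w.assign(100)      # assign ops return the new value when run
    add_op = w.assign_add(5)
    sub_op = w.assign_sub(3)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())   # variables must be initialized first
        print(sess.run(w))           # 10
        print(sess.run(assign_op))   # 100
        print(sess.run(add_op))      # 105
        print(sess.run(sub_op))      # 102
        print(w.eval())              # 102 (eval uses the default session in this block)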

  • 00:50:00 In this section, the speaker discusses the initialization of variables in TensorFlow. All variables used in code must be initialized before running, which can be risky when variables depend on each other. In such cases, one should use the variable's initialized_value() to make sure a variable's value is safe before using it to initialize another variable. The speaker then explains the benefits of a TensorFlow interactive session and how to use it to evaluate a series of operations. Finally, the speaker discusses control dependencies, a way of ensuring that all relevant operations are executed before the final operation is called, which can be useful in complex machine learning models containing many operations.

  • 00:55:00 In this section, the video discusses placeholders in TensorFlow, which allow a graph to be assembled without knowing the values of the data that will be used in the computation. Placeholders stand in for actual values that will be supplied later, and they are defined using the tf.placeholder op with a type and a shape. When running operations that involve placeholders, a dictionary mapping the placeholders to their values must be created and fed into the session.run() call. It is worth noting that while a placeholder's shape can be set to None, some operations do require the shape to be defined, which may cause errors.
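
A minimal placeholder example in the same TensorFlow 1.x style:

    import tensorflow as tf   # assumes TensorFlow 1.x

    a = tf.placeholder(tf.float32, shape=[3], name="a")   # value supplied at run time
    b = tf.constant([5.0, 5.0, 5.0], name="b")
    c = a + b

    with tf.Session() as sess:
        print(sess.run(c, feed_dict={a: [1.0, 2.0, 3.0]}))   # [6. 7. 8.]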

  • 01:00:00 In this section, the video discusses how placeholders are valid first-class ops in their own right and can be visualized in TensorBoard. The section also covers how to feed multiple data points and how feed_dict can be used to override the value of any feedable tensor in the graph, not just placeholders. The video then moves on to lazy loading, where objects are created only when they are needed, versus normal loading, where a node is created in the graph before it is executed. Lazy loading can help with memory management, especially when working with large datasets.

  • 01:05:00 In this section, the speaker explains lazy loading and its implications for TensorFlow. Lazy loading is a technique where the computation is deferred until runtime, rather than at graph construction - this can lead to multiple nodes of the same operation in the computation graph, especially in loops. In order to avoid bloating the graph and other associated problems, the speaker recommends separating the definition of operation objects from the computation and running of the operations. Additionally, the speaker emphasizes the importance of structuring code so that variable definitions are in one place and computing functions are in another.
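
A sketch of the pitfall being described (freely adapted, not the lecture's exact code): defining the op inside the loop adds a new node on every iteration, while defining it once keeps the graph small.

    import tensorflow as tf   # assumes TensorFlow 1.x

    x = tf.Variable(10, name="x")
    y = tf.Variable(20, name="y")

    # Lazy loading: tf.add is called inside the loop, creating ten separate add nodes.
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(10):
            sess.run(tf.add(x, y))

    # Normal loading: the op is defined once and only executed inside the loop.
    z = tf.add(x, y)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(10):
            sess.run(z)

    print(len(tf.get_default_graph().as_graph_def().node))   # node count reveals the bloat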

  • 01:10:00 In this section, the lecturer explains how to structure code to avoid problems with calling functions when using TensorFlow for deep learning research. Using Python properties, the lecturer demonstrates how to define internal attributes such as 'start_prediction', 'optimized' and 'error'. The first part of the code block calculates the data size, target size, weight, and bias before adding an operation. The resulting output of this operation is added to the graph. The 'optimize' function follows the same pattern, creating the initial nodes the first time it is called and returning handles to these nodes subsequent times it is called.

  • 01:15:00 In this section, the speaker discusses how the optimizer minimizes cross entropy and updates weights and biases in TensorFlow. When the optimizer is called for the first time, TensorFlow does the backpropagation and updates the variables that contribute to the loss. When the optimizer is called subsequently, TensorFlow already has the graph and calculates the incoming nodes to minimize and update the weights without calling any additional nodes. The use of properties helps to structure TensorFlow code more efficiently. The next lecture will provide an example to better understand this process.
Tensorflow for Deep Learning Research - Lecture 2
  • 2017.02.05
  • www.youtube.com
 

Tensorflow for Deep Learning Research - Lecture 3




The third lecture on TensorFlow for deep learning research covers linear regression and logistic regression, the latter on the MNIST dataset. The lecturer shows how to train a linear regression model in TensorFlow by creating placeholders for the input data, initializing trainable variables for the weights and bias, computing predictions, calculating the loss, and defining the optimizer as gradient descent with a specific learning rate. The lecture also explains mini-batch stochastic gradient descent and the importance of keeping track of variable shapes. The accuracy of the model is calculated by comparing the index of the maximum value obtained from tf.argmax with the target variable y, counting the correct predictions using tf.reduce_sum after a cast to tf.float32, and dividing by the total number of test examples. Finally, the lecturer notes that this model is not particularly powerful and that more capable models, such as ones with convolutional layers, yield higher accuracy.

  • 00:00:00 In this section, the speaker begins the third lecture on TensorFlow for deep learning research and starts with a review of the material from the previous lecture. They explain how TensorFlow separates the definition of the computation graph from its execution and how to assemble a graph by adding various operations to it. They then discuss TF constants, variables, and placeholders and their functionalities within a graph. The speaker emphasizes the importance of avoiding lazy loading and instead separate the assembling of the graph and executing it for optimal efficiency. They then introduce the first example of linear regression and explain how to predict the dependent variable Y based on independent variables in the data set. The speaker recommends that listeners follow along with the examples and solve the problems themselves.

  • 00:05:00 In this section, the lecturer explains the basics of linear regression and demonstrates how to train a linear regression model in TensorFlow. A simple linear model is used where the predicted value of y is W multiplied by X plus b. The loss is the squared difference between the predicted and actual value of Y, and training proceeds by minimizing the loss with respect to the trainable variables W and b. The lecturer then shares the code for a linear regression example whose input is the number of fires and whose target variable is the number of thefts in a given sample. The code demonstrates how to create placeholders for the input data, initialize the trainable variables W and b, compute predictions, calculate the loss, and define the optimizer as gradient descent with a specific learning rate.
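
The description above corresponds roughly to the following TensorFlow 1.x skeleton (a sketch with made-up data, not the lecture's actual fire/theft dataset):

    import tensorflow as tf   # assumes TensorFlow 1.x

    X = tf.placeholder(tf.float32, name="X")        # input feature (number of fires)
    Y = tf.placeholder(tf.float32, name="Y")        # target (number of thefts)

    w = tf.Variable(0.0, name="weight")             # trainable parameters
    b = tf.Variable(0.0, name="bias")

    Y_pred = w * X + b
    loss = tf.square(Y - Y_pred, name="loss")       # squared-error loss
    optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)

    data = [(6.2, 29.0), (9.5, 44.0), (10.5, 36.0)]   # made-up (fires, thefts) pairs
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):                          # epochs
            for x_val, y_val in data:
                sess.run(optimizer, feed_dict={X: x_val, Y: y_val})
        print(sess.run([w, b]))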

  • 00:10:00 In this section, the lecturer describes the process of training a model in TensorFlow for deep learning research. After creating the model and designing the loss function, the next step is to initiate a session and run the model for a defined number of epochs. Each epoch involves repeatedly running the training data through the session and updating the variables to minimize the loss. The process is visualized through TensorBoard and the resulting model can be used to predict the output for any given input value. The lecturer also notes the presence of an outlier in the data that affects the model's prediction.

  • 00:15:00 In this section, the lecturer explains the role of optimizers in training models in TensorFlow and lists several optimizers, including gradient descent, Adagrad, momentum, Adam, RMSProp, proximal gradient descent, and proximal Adagrad. The lecturer emphasizes the importance of testing the model's performance on data it has not seen before to ensure that the model generalizes. To address the sensitivity of the squared-error loss to outliers, the lecturer introduces the Huber loss, explains how it works, and provides instructions on how to code it in TensorFlow.

  • 00:20:00 In this section, the lecturer explains the implementation of the Huber loss, a loss function commonly used for regression problems. It works on the residual between the predictions and the labels: if the residual is smaller than the delta value, a quadratic term (small_res) is returned; if it is larger than delta, a linear term (large_res) is returned. The lecturer then moves on to logistic regression with the MNIST dataset. Logistic regression is used for classification and computes the logit as XW + b. The result is passed through a softmax function, producing a probability distribution, and the loss used is the cross-entropy loss, which measures the distance between two probability distributions.
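
A sketch of a Huber loss in TensorFlow 1.x, written here with tf.where (whether the lecture used tf.where or another control-flow op is an assumption):

    import tensorflow as tf   # assumes TensorFlow 1.x

    def huber_loss(labels, predictions, delta=1.0):
        residual = tf.abs(labels - predictions)
        small_res = 0.5 * tf.square(residual)                   # quadratic near zero
        large_res = delta * residual - 0.5 * tf.square(delta)   # linear for large residuals
        return tf.where(residual < delta, small_res, large_res)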

  • 00:25:00 In this section, the instructor explains mini-batch stochastic gradient descent and the use of batches while training deep learning models; batching makes good use of GPU memory and is necessary when the full training set cannot be processed at once. The tutorial includes steps to create placeholders for the model, initialize the weight and bias variables for the input and output features, and build the model, using tf.random_normal for initialization. The importance of remembering the shapes of these variables when creating them is also emphasized, specifically the last input feature dimension and the number of classes.

  • 00:30:00 In this section, the simple TensorFlow model is discussed, and the cross-entropy loss function is used to calculate the loss. An optimizer is defined, and once the model is trained it is tested to find the total number of correct predictions. The correct predictions are calculated from the logits batch and the softmax function: the resulting probability values are compared with the actual labels to count the correct predictions. It is advised not to run the optimizer on the test set, to avoid overfitting.

  • 00:35:00 In this section, the presenter explains how to calculate the accuracy of the model. The tf.argmax function returns the index of the maximum value in each row, i.e., the digit with the highest probability. This index is compared with the target variable y, the number of correct predictions is computed using tf.reduce_sum after a cast to tf.float32, and dividing by the total number of test examples gives the accuracy, which is about 90% for this linear model. The presenter also notes that this model is not considered powerful, and more robust models, such as ones with convolutional layers, yield higher accuracy.
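
That accuracy computation looks roughly like this in TensorFlow 1.x (a sketch; the dummy tensors are illustrations, not the lecture's MNIST pipeline):

    import tensorflow as tf   # assumes TensorFlow 1.x

    logits_batch = tf.constant([[2.0, 0.1, 0.3], [0.2, 3.0, 0.5]])   # dummy logits, 2 examples
    labels = tf.constant([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])         # one-hot targets

    preds = tf.nn.softmax(logits_batch)
    correct = tf.equal(tf.argmax(preds, 1), tf.argmax(labels, 1))    # predicted vs. true index
    n_correct = tf.reduce_sum(tf.cast(correct, tf.float32))          # count correct predictions

    with tf.Session() as sess:
        print(sess.run(n_correct) / 2)   # divide by the number of examples: accuracy = 0.5
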
Tensorflow for Deep Learning Research - Lecture 3
  • 2017.02.15
  • www.youtube.com
 

Tensorflow for Deep Learning Research - Lecture 4




In Lecture 4 of the TensorFlow for Deep Learning Research series, the speaker delves into word embeddings for natural language processing based on deep learning. The lecture explains the concept of learning word embeddings for NLP problems and describes how words are represented as numerical vectors in neural networks. It discusses prediction-based methods for generating word vectors (CBOW and skip-gram) and how the computational complexity of the softmax is addressed with negative sampling and NCE. Furthermore, the lecturer covers embedding variables in TensorFlow and using t-SNE to visualize high-dimensional word vectors in reduced dimensions. Finally, the lecture concludes with a summary of the concepts covered and a preview of the next lecture, which will focus on building word models.

  • 00:00:00 In this section, the lecturer discusses the concept of word embeddings in NLP based on deep learning. Word embeddings are a way to represent words as numbers in a neural network, which captures semantic relationships between words. The aim is to build a model that can cluster similar concepts, such as countries, together when projecting them onto a two-dimensional space. The lecturer also explains the process of learning word embeddings through counting and introduces the newer approach of using backpropagation with neural networks.

  • 00:05:00 In this section, the speaker discusses ways to represent words as vectors using the co-occurrence matrix. However, the co-occurrence matrix is very large and sparse, making it difficult to work with. To address this, the speaker suggests using truncated SVD to find a low-rank approximation of the matrix. This involves factorizing the matrix into three matrices and selecting only the first K of the right singular vectors to use as representations. However, SVD is a computationally expensive operation, especially for large vocabularies, making it challenging to scale. Moreover, adding a new word to the vocabulary would require redoing the entire calculation.

  • 00:10:00 In this section, the video discusses two prediction-based methods for generating word vectors: continuous bag of words (CBOW) and skip-gram. CBOW uses the context around the target word to predict the center word, while skip-gram uses the center word to predict the context words. One-hot vectors are used to represent each word, and a weight matrix learned by a simple neural network provides the word vectors. Training samples are chosen by selecting a window size and sliding the window over the text. The neural network is trained to estimate the probability of any other word appearing in the context given a specific word in the middle of the sentence.

  • 00:15:00 In this section of the lecture, the speaker explains the word-embedding technique, which represents words as vectors so the neural network can work with them. A word is selected from a context, a weight matrix is used to learn the word vectors, and these are paired with an output matrix to produce a distribution over the words in the vocabulary. The softmax function is used to normalize the output scores into a probability distribution, so words that appear in similar contexts are assigned similar probabilities. Using this approach, an example in the lecture shows how the words "intelligent" and "smart" occur in similar contexts, so the word vectors for these two words end up very similar.
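
In symbols (standard skip-gram notation, a reconstruction rather than a quote from the slides), the probability of a context word o given a center word c is a softmax over the vocabulary:

    p(o \mid c) = \frac{\exp\left(u_o^{\top} v_c\right)}{\sum_{w=1}^{V} \exp\left(u_w^{\top} v_c\right)}

where v_c is the input (center-word) vector, u_o the output (context-word) vector, and V the vocabulary size; the sum over all V words in the denominator is what makes the full softmax expensive and motivates the sampling-based losses discussed next.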

  • 00:20:00 In this section, the lecturer discusses potential solutions to the computational complexity issue that arises with the softmax. The two methods commonly used to address this problem are negative sampling and noise contrastive estimation (NCE). Although only NCE approximates the softmax with a theoretical guarantee, both methods produce similar results in practice. The lecture then defines the empirical and noise distributions, where the goal is to approximate the empirical distribution using the model's parameters, and introduces NCE as a way to reduce the language-model estimation problem to a binary classification problem.

  • 00:25:00 In this section, the speaker explains how data is generated for the two-class training problem via a proxy binary classification. A context c is sampled from the empirical distribution \tilde{p}(c), one true sample is drawn from \tilde{p}(w \mid c), and K noise samples are generated from the noise distribution q(w) and assigned the label d = 0 to mark them as noise. The joint probability of (d, w) in the two-class data is then a mixture of the two distributions, and using the definition of conditional probability this can be turned into the conditional probability of d given w and c for the two cases d = 0 and d = 1.

  • 00:30:00 In this section, the speaker discusses the problem of the expensive computation of the partition function and how Noise Contrastive Estimation (NCE) addresses it. NCE proposes treating the partition function as a parameter Z_c for every empirical context and learning it through backpropagation. By replacing \tilde{p}(w \mid c) with the model score u_\theta(w, c) / Z_c and fixing Z_c to one, a binary classification problem is obtained. The objective is to maximize the conditional likelihood of d given the K negative samples, which can be written as a sum of log probabilities, and NCE replaces the expectation in this objective with a Monte Carlo approximation, making the computation much cheaper.
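
Written out (standard NCE notation; a reconstruction, not a quote from the slides), the two-class conditional probabilities with one true sample and K noise samples are:

    p(d = 1 \mid w, c) = \frac{\tilde{p}(w \mid c)}{\tilde{p}(w \mid c) + K\, q(w)}, \qquad
    p(d = 0 \mid w, c) = \frac{K\, q(w)}{\tilde{p}(w \mid c) + K\, q(w)}

and NCE substitutes the model score u_\theta(w, c) / Z_c (with Z_c fixed to 1) for \tilde{p}(w \mid c) before maximizing the log-likelihood of the labels d.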

  • 00:35:00 In this section, the use of Noise Contrastive Estimation (NCE) instead of Negative Sampling (NS) is discussed. NCE is able to reduce the language modeling objective to a binary classification problem, and it is shown that the objective will be realized at the same point for the model parameters. The TensorFlow API for the NCE loss is presented as well as the use of name scope, which allows nodes to be grouped together for better visualization in TensorBoard. The scoping of variable names is also explained in the context of the name scope feature.

  • 00:40:00 In this section, the lecturer discusses embedding variables in TensorFlow for deep learning research. By defining the embedding variables within a particular scope, they become part of a nicely grouped visualization in TensorBoard. The lecturer also explains how to visualize word vectors using t-SNE and provides a recipe for producing the plot: the high-dimensional vectors learned in the embedding matrix are reduced to 2D or 3D with t-SNE, showing the nearest-neighbor relationships between words. Finally, the lecturer gives an overview of the code for the word2vec model.

  • 00:45:00 In this section, the lecturer discusses the process of defining placeholders for input and output, constructing a variable for NCE loss, and defining embeddings and the embedding matrix. Using a vocabulary size of 50,000 and batch size of 128, the lecture defines an embedding matrix that learns a 128-dimensional word vector for each word. The process of inference in skip-gram is also explained, with the lecture focusing on one word at a time for ease of explanation. The loss function is then defined using the TensorFlow API, with the lecture providing a breakdown of key variables such as the negative samples and the number of training steps.
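
A condensed TensorFlow 1.x sketch of that setup (the vocabulary, batch, and embedding sizes follow the summary above; everything else is illustrative):

    import tensorflow as tf   # assumes TensorFlow 1.x

    VOCAB_SIZE, BATCH_SIZE, EMBED_SIZE, NUM_SAMPLED = 50000, 128, 128, 64

    center_words = tf.placeholder(tf.int32, [BATCH_SIZE], name="center_words")
    target_words = tf.placeholder(tf.int32, [BATCH_SIZE, 1], name="target_words")

    embed_matrix = tf.Variable(tf.random_uniform([VOCAB_SIZE, EMBED_SIZE], -1.0, 1.0))
    embed = tf.nn.embedding_lookup(embed_matrix, center_words)   # center-word vectors

    nce_weight = tf.Variable(tf.truncated_normal([VOCAB_SIZE, EMBED_SIZE], stddev=0.1))
    nce_bias = tf.Variable(tf.zeros([VOCAB_SIZE]))

    loss = tf.reduce_mean(tf.nn.nce_loss(weights=nce_weight, biases=nce_bias,
                                         labels=target_words, inputs=embed,
                                         num_sampled=NUM_SAMPLED, num_classes=VOCAB_SIZE))
    optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)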

  • 00:50:00 In this section, the speaker discusses how NCE (noise contrastive estimation) is used to create a binary classification problem that helps solve the complexity of softmax. Each of 64 negative and one positive examples created for the proxy classification corresponds to a word vector. TensorFlow auto-differentiation is used to update the weights of a binary classifier to separate the one positive example from the 64 negative examples. The lecture ends with a summary of concepts covered so far, including word vectors, NCE, embedding matrices, and Principal Component Analysis (PCA). The next lecture will focus on managing different programs and experiments while building word models.
Tensorflow for Deep Learning Research - Lecture 4
  • 2017.03.04
  • www.youtube.com
 

Tensorflow for Deep Learning Research - Lecture 5_1




The fifth lecture in the TensorFlow for Deep Learning Research series covers several topics, including how to manage deep learning experiments effectively, the importance of automatic differentiation in TensorFlow, and the process of training models and saving variables. The speaker explains that automatic differentiation is provided by deep learning frameworks like TensorFlow, making it easier for users to write models without deriving gradients by hand. While it is not essential to calculate gradients manually, it is still helpful to work them out for simple functions and networks. The creation of a named entity recognition model with subclasses, along with the necessary placeholders and feed dictionaries, is also covered, as well as saving and restoring variables in TensorFlow and saving models across different sessions and machines.

  • 00:00:00 In this section, the speaker discusses how to manage deep learning experiments and the importance of automatic differentiation in TensorFlow. When writing a model you will try various things and start and restart training, so proper management is essential. Automatic differentiation is provided by deep learning frameworks like TensorFlow, making it easy to code up models without dealing with the gradients by hand. The speaker gives an example where TensorFlow's tf.gradients operation finds the gradients of Y with respect to each tensor in the list provided as the second argument, and points out that essentially everything in TensorFlow and deep learning rests on this automatic differentiation machinery.
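
A small example of that call (TensorFlow 1.x; the function and values are an illustration, not the lecture's example):

    import tensorflow as tf   # assumes TensorFlow 1.x

    x = tf.Variable(2.0)
    z = tf.Variable(3.0)
    y = x * x + x * z                 # y = x^2 + x*z

    grads = tf.gradients(y, [x, z])   # [dy/dx, dy/dz]

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(grads))        # [7.0, 2.0]: dy/dx = 2x + z, dy/dz = x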

  • 00:05:00 In this section, the speaker talks about whether it is necessary to learn how to calculate gradients by hand when TensorFlow already has the tf.gradients feature. He suggests that while it is not essential, it is still helpful to work out gradients for simple functions or networks, especially when writing custom layers or dealing with gradient issues like exploding or vanishing gradients. He also suggests structuring the model in a more object-oriented way to make it easier to use, moving the model out of a single function entirely. The speaker mentions the Stanford CS224d assignment structure as an example of how to encapsulate the components of a deep learning model.

  • 00:10:00 In this section, the speaker discusses the Model part of a deep learning codebase. The model is where the full inference is written: it takes an input and performs a forward pass to produce the output, then adds the loss operation, which creates a loss scalar comparing the predicted output to the true labels from the placeholders. The speaker suggests abstracting everything into one base class and creating subclasses for specific types of models, such as a language model, and explains how a named entity recognition model can be built as a subclass with the needed pieces such as placeholders and feed dictionaries.

  • 00:15:00 In this section, we learn about the process of training deep learning models in TensorFlow and how to manage experiments effectively. The process involves creating variable matrices and getting embeddings for the training data, then running epochs of training data in a loop, which trains the embedding matrices. To save progress, tf.train.Saver writes the graph's variables to binary files, which can be loaded during future runs to start from where one stopped. The code example shows how to instantiate the saver object and, inside the training loop, run the optimizer and save the session to a checkpoint path.

  • 00:20:00 In this section, the speaker delves into the details of TensorFlow's Saver class and the variables associated with it. The global step, defined as a non-trainable variable starting at 0, can be incremented each time the training operation is called by passing it to the minimize function. Additionally, the max_to_keep argument limits the number of checkpoints kept to the most recent ones, and the keep_checkpoint_every_n_hours argument controls how often a checkpoint is retained permanently, which is useful for long training cycles.

  • 00:25:00 In this section, the instructor discusses how to save and restore variables in TensorFlow. Users can specify a list or a dictionary of variables to save only a subset instead of everything, which saves space and increases throughput, especially when doing transfer learning. To restore variables, users can call tf.train.latest_checkpoint, which finds the latest checkpoint in a given directory, and then restore from it using saver.restore(). The instructor also mentions that from TensorFlow 0.11 onwards the graph itself can be saved by creating collections with keys and values corresponding to the variables and then instantiating the saver object with default values.
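
A sketch of that save/restore flow (TensorFlow 1.x; the checkpoint directory name is an assumption):

    import tensorflow as tf   # assumes TensorFlow 1.x

    global_step = tf.Variable(0, trainable=False, name="global_step")
    # ... model and train_op defined here; passing global_step to minimize() increments it ...

    saver = tf.train.Saver(max_to_keep=5, keep_checkpoint_every_n_hours=2)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        ckpt = tf.train.latest_checkpoint("checkpoints/")   # resume if a checkpoint exists
        if ckpt:
            saver.restore(sess, ckpt)
        # ... training loop ...
        saver.save(sess, "checkpoints/model", global_step=global_step)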

  • 00:30:00 In this section of the lecture, the instructor explains how to save and restore TensorFlow models across different sessions and even different machines. The steps involve running the global variables initializer, saving the session, and utilizing the "export meta graph" method to create a saved graph file. This saved graph can be restored and the variables can be re-initialized in a completely different process or machine if the original graph is not available. The instructor also mentions TF.summary, which will be covered in the next video.
Tensorflow for Deep Learning Research - Lecture 5_1
  • 2017.03.18
  • www.youtube.com
 

Tensorflow for Deep Learning Research - Lecture 5_2




The video tutorial discusses the implementation of TF summary ops, which allow for visualization of data in TensorBoard. The tutorial covers three types of summary ops - TF.summary.scalar, TF.summary.histogram, and TF.summary.image - and explains how to merge them into one and write them to an event file using the FileWriter class. The lecturer demonstrates how to use name scopes to visualize the graph in TensorBoard and defines a test writer and a train writer that write summaries to separate files. They emphasize taking advantage of TensorBoard's visualization capabilities to better understand a model's performance. Overall, TensorBoard is a crucial tool for tracking training progress, and the API for adding ops and merging them is straightforward.

  • 00:00:00 In this section, the video tutorial discusses how to use TF summary ops, which are functions that attach information to nodes in the computational graph to generate summary data for visualization in TensorBoard. The tutorial covers three types of summary ops: TF.summary.scalar, for attaching to scalar value nodes in the graph such as loss, cross-entropy, and learning rates; TF.summary.histogram, for visualizing the distribution of a tensor, such as the weights of a particular layer; and TF.summary.image, for visualizing images, inputs, or even some intermediate layers. The tutorial explains how to merge all summary ops into one and write them to an event file using the FileWriter class.

  • 00:05:00 In this section, the speaker explains how to set up a file writer and a summary in TensorFlow in order to visualize data on TensorBoard. They recommend setting up the writer to run at specific intervals, as running it at every step might produce too much data. By providing the file writer with the graph object, the computational graph can be saved as a graph def and visualized on TensorBoard. The speaker demonstrates how to visualize scalars and histograms on TensorBoard by selecting specific steps on the x-axis to see the corresponding values on the y-axis. They encourage users to take advantage of TensorBoard's visualization capabilities to better understand their model's performance.
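
A compact sketch of the summary workflow described in these sections (TensorFlow 1.x; names such as "graphs/train" are assumptions):

    import tensorflow as tf   # assumes TensorFlow 1.x

    loss = tf.Variable(1.0, name="loss")                  # stand-in for a real loss tensor
    weights = tf.Variable(tf.random_normal([10]), name="weights")

    tf.summary.scalar("loss", loss)                       # scalar summaries: loss, accuracy, ...
    tf.summary.histogram("weights", weights)              # distribution of a tensor
    merged = tf.summary.merge_all()                       # merge every summary op into one

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        writer = tf.summary.FileWriter("graphs/train", sess.graph)   # also saves the graph def
        for step in range(100):
            summary = sess.run(merged)
            writer.add_summary(summary, global_step=step)
        writer.close()

    # Then run `tensorboard --logdir=graphs` and open localhost:6006 in a browser.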

  • 00:10:00 In this section, the lecturer discusses defining a function called variable_summary, which takes a tensor and adds summary nodes for its mean, standard deviation, max, min, and histogram, with each variable scoped under its own name. They also note how disordered the distributions look when the model is not yet trained and the losses are at their highest, and how the losses decrease as the training steps accumulate. Different file writers can be used for different experiments and saved under the log directory, and TensorBoard lets you select and toggle between these runs. A sketch of such a helper follows below.
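A sketch of such a helper, in the spirit of the standard TensorFlow tutorials (the exact function shown in the lecture may differ slightly):

```python
import tensorflow as tf

def variable_summary(var, name):
    """Attach mean, stddev, max, min, and histogram summaries to a tensor."""
    with tf.name_scope(name):
        mean = tf.reduce_mean(var)
        tf.summary.scalar("mean", mean)
        stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
        tf.summary.scalar("stddev", stddev)
        tf.summary.scalar("max", tf.reduce_max(var))
        tf.summary.scalar("min", tf.reduce_min(var))
        tf.summary.histogram("histogram", var)

# Example: summarize the weights of one layer.
w = tf.Variable(tf.truncated_normal([784, 256], stddev=0.1), name="layer1_weights")
variable_summary(w, "layer1_weights")
```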

  • 00:15:00 In this section, the video focuses on implementing summary ops and using name scopes in TensorFlow for visualization. The code defines name scopes for cross entropy and accuracy to help with visualizing the graph in TensorBoard. The merged op is obtained by calling TF.summary.merge_all, and this op is used when running the session. The code also defines a test writer and a train writer for writing the resulting summaries to separate files. For each step, the code trains the model and writes the summaries: if the step number mod 10 equals zero, the summary is written to the test writer, and for all other steps it is written to the train writer.
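A compact end-to-end sketch of the train/test writer pattern (TF 1.x-style; the tiny linear model, the random data, and the squared-error loss are stand-ins for the real graph):

```python
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 10], name="x")
y = tf.placeholder(tf.float32, [None, 1], name="y")

with tf.name_scope("model"):
    w = tf.Variable(tf.zeros([10, 1]), name="w")
    pred = tf.matmul(x, w)

with tf.name_scope("loss"):
    loss = tf.reduce_mean(tf.square(pred - y))
    tf.summary.scalar("loss", loss)

train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    train_writer = tf.summary.FileWriter("logs/train", sess.graph)  # also saves the graph def
    test_writer = tf.summary.FileWriter("logs/test")

    data_x, data_y = np.random.randn(100, 10), np.random.randn(100, 1)
    for step in range(100):
        feed = {x: data_x, y: data_y}
        if step % 10 == 0:                      # every 10th step goes to the test writer
            summary = sess.run(merged, feed_dict=feed)
            test_writer.add_summary(summary, step)
        else:                                   # all other steps go to the train writer
            summary, _ = sess.run([merged, train_op], feed_dict=feed)
            train_writer.add_summary(summary, step)
```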

  • 00:20:00 In this section, the speaker discusses how to run TensorBoard so that the events from both the test writer and the train writer can be visualized at the same time. The command "tensorboard --logdir=PATH" (with PATH pointing at the log directory) starts the tool, which is served at localhost:6006 and can be opened in the browser. Additionally, TF.summary.image takes a name, a tensor, and a max_outputs argument; it interprets a four-dimensional tensor with shape [batch size, height, width, channels], and max_outputs determines how many images from the batch are rendered in TensorBoard. Overall, TensorBoard is a crucial tool for tracking training progress, and the API for adding various ops and merging them is straightforward.
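A small sketch of an image summary, with the TensorBoard launch command shown as a comment (the shapes, paths, and max_outputs value are illustrative):

```python
# Launch TensorBoard against the log directory, then open localhost:6006 in a browser:
#   tensorboard --logdir=logs
import numpy as np
import tensorflow as tf

# tf.summary.image expects a 4-D tensor [batch_size, height, width, channels];
# max_outputs controls how many images from the batch are rendered.
images = tf.placeholder(tf.float32, shape=[None, 28, 28, 1], name="images")
img_summary = tf.summary.image("input_images", images, max_outputs=3)

with tf.Session() as sess:
    writer = tf.summary.FileWriter("logs")
    summary = sess.run(img_summary, feed_dict={images: np.random.rand(8, 28, 28, 1)})
    writer.add_summary(summary, global_step=0)
    writer.close()
```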
Tensorflow for Deep Learning Research - Lecture 5_2
  • 2017.04.03
  • www.youtube.com
This is the second part of the fifth lecture in the series of tutorials on tensorflow and is based on the publicly available slides from the Stanford Univers...
 

Intuition Behind Backpropagation as a Computational Graph

The intuition behind backpropagation as a computational graph is explained in this video. The speaker discusses how a surrogate function is used to estimate the empirical function that maps inputs to outputs, and that the goal is to find the parameters that minimize the loss function. Backpropagation allows for the computation of the gradient of the loss function with respect to each parameter through a backward pass of the graph. The local gradients for each gate in the graph are calculated, and they can be used to calculate the gradient of the final output with respect to each input. The speaker also explains how to handle gradients for branching and vectorized operations and how to ensure that dimensionality works out when calculating derivatives.

  • 00:00:00 In this section, the lecture explains that backpropagation enables the training of neural networks to converge quickly, and that understanding it in depth can be beneficial for researchers. The lecture begins by explaining that the goal of a neural network is to learn the function that maps inputs to outputs, denoted by F(x), which is an empirical function that might not be discoverable directly. A surrogate function G(x, θ) is used to estimate F, where θ denotes the neural network parameters, and the objective is to find the parameters that minimize a loss function J. The lecture then discusses how backpropagation finds the optimal parameters by computing the gradient of the loss function with respect to each parameter through a backward pass of the graph.

  • 00:05:00 In this section, the speaker discusses finding the derivative of G with respect to theta, which is crucial in training a neural network. The example used is a simple function of x, y, and z, where the partial derivatives of F with respect to x, y, and z must be calculated. The concept of local gradients is introduced: the derivatives of a node's output with respect to its inputs for a given operation. However, the chain rule is needed to compute the derivative of the final output with respect to the more distant variables, which in this case are x, y, and z.

  • 00:10:00 In this section, we see how to find the gradients in the computational graph using the backward flow of the gradient. The output node's gradient is trivially 1, and to calculate the gradient at each node we multiply its local gradient with the gradient received from the node above it. The program can then calculate the final gradients ∂F/∂x, ∂F/∂y, and ∂F/∂z. It's crucial to keep in mind that we are always calculating the gradient with respect to the final loss function, which in the case of a neural network is the loss J. Finally, we learn how a neuron in a neural network performs a simple function during the forward pass and how, during backpropagation, the gradients can be calculated and used locally; a tiny worked example follows below.
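A tiny worked example of this backward flow, using the common illustration f(x, y, z) = (x + y) * z (the specific numbers are illustrative, not necessarily those from the video):

```python
x, y, z = -2.0, 5.0, -4.0

# forward pass through the graph
q = x + y          # add gate:      q = 3
f = q * z          # multiply gate: f = -12

# backward pass: the gradient at the output is 1, and each node multiplies
# its local gradient with the gradient it receives from the node above it
df_df = 1.0
df_dq = z * df_df          # multiply gate's local gradient w.r.t. q is z -> -4
df_dz = q * df_df          # multiply gate's local gradient w.r.t. z is q ->  3
df_dx = 1.0 * df_dq        # add gate's local gradient w.r.t. x is 1      -> -4
df_dy = 1.0 * df_dq        # add gate's local gradient w.r.t. y is 1      -> -4

print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0
```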

  • 00:15:00 In this section, the speaker explains the intuition behind backpropagation as a computational graph for a single neuron. The gradients passed back to the inputs are obtained by multiplying the gradient arriving at the neuron's output with the local gradients ∂z/∂x and ∂z/∂y. The backward flow runs from the neuron's output back toward its inputs, so the neurons that send data into it receive ∂J/∂x. The same computational-graph technique works for more complex functions such as the sigmoid, where the backward gradient is evaluated at the values computed in the forward pass: the local gradient of each node is the derivative of its function, which is then multiplied with the incoming gradient flow.
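A sketch of one sigmoid neuron's forward pass and local gradient, assuming illustrative weights and inputs:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

w = np.array([2.0, -3.0])
x = np.array([-1.0, -2.0])
b = -3.0

# forward pass: pre-activation z, then activation a
z = np.dot(w, x) + b
a = sigmoid(z)

# backward pass: the sigmoid's local gradient is a * (1 - a); multiply it
# by the gradient flowing in from the loss (taken here as 1.0 for simplicity)
upstream = 1.0
dz = a * (1.0 - a) * upstream
dw = x * dz        # gradient passed to the weights
dx = w * dz        # gradient passed back to the neurons feeding this one
db = 1.0 * dz
```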

  • 00:20:00 In this section, the speaker explains how to calculate the local gradients of each gate in a computational graph and how they can be used to obtain the gradient of the final output with respect to each input. The example used in the video involves a small computational graph consisting of multiple gates, such as add and multiply gates, and the speaker explains the intuition behind the backpropagation algorithm by using this example. The speaker details how to calculate the local gradients for each gate in the graph and shows how they can be combined to calculate the gradient of the final output with respect to each input.

  • 00:25:00 In this section, the speaker discusses the intuition behind backpropagation as a computational graph. They explain how gradients are calculated for different types of gates, such as add gates and multiplicative gates, and the pattern that emerges when calculating backward flow. The speaker also explains how to define forward and backward functions for each gate, and once defined, these functions can be used for any computational graph, making the process more efficient.
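A minimal sketch of reusable forward/backward functions per gate, reusing the (x + y) * z example; the class names and structure are illustrative rather than the lecture's own code:

```python
class AddGate:
    def forward(self, a, b):
        return a + b
    def backward(self, dout):
        # an add gate routes the incoming gradient unchanged to both inputs
        return dout, dout

class MultiplyGate:
    def forward(self, a, b):
        self.a, self.b = a, b
        return a * b
    def backward(self, dout):
        # each input receives the *other* input times the incoming gradient
        return self.b * dout, self.a * dout

add, mul = AddGate(), MultiplyGate()
q = add.forward(-2.0, 5.0)      # forward through the graph
f = mul.forward(q, -4.0)
dq, dz = mul.backward(1.0)      # backward, starting from df/df = 1
dx, dy = add.backward(dq)
```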

  • 00:30:00 In this section, the forward and backward passes of backpropagation are discussed as a computational graph. The forward pass calculates each node's output, which becomes the input for the next node until the loss is reached. In the backward pass, the gradient at the output is initially set to one and the calculations are made in the reverse direction. When dealing with vectors, the local gradient is a Jacobian matrix and the chain rule becomes a matrix multiplication: ∂L/∂x or ∂L/∂y is obtained by combining the Jacobian ∂z/∂x or ∂z/∂y with the upstream gradient ∂L/∂z.
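A small numpy sketch of the vector case, using an element-wise sigmoid whose Jacobian is diagonal (the input values and upstream gradient are illustrative):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

x = np.array([0.5, -1.0, 2.0])
y = sigmoid(x)                         # forward pass

J = np.diag(y * (1.0 - y))             # Jacobian dy/dx, shape (3, 3)
dL_dy = np.array([0.1, -0.2, 0.3])     # gradient arriving from the loss

dL_dx = J.T @ dL_dy                    # chain rule with the Jacobian
# For element-wise ops this collapses to an element-wise product:
assert np.allclose(dL_dx, y * (1.0 - y) * dL_dy)
```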

  • 00:35:00 In this section, the speaker explains how to calculate the forward pass and backpropagation of a given function using the computational graph. Using an example function, the speaker walks through the process of creating intermediate variables and calculating the forward pass. Then, the speaker shows how to calculate the backward pass by calculating the gradients with respect to each gate's input and output. The speaker emphasizes the usefulness of keeping in mind the derivatives of the sigmoid function.

  • 00:40:00 In this section, the speaker explains how to handle the gradients for branching and vectorized operations in backpropagation. When a value branches to several consumers, the gradients arriving from those branches are added, and previously computed intermediate gradients can be reused. Vectorized operations, such as matrix multiplication, are handled similarly by computing the gradients with respect to each element and combining them with matrix operations. The gradient flowing into an operation has the same shape as that operation's output, while the gradient with respect to each input has the same shape as that input.

  • 00:45:00 In this section, the speaker explains how to make the dimensionality work out when calculating the gradients with respect to W and X for a matrix multiplication, using the example of matmul in TensorFlow. The shapes of W, X, and the incoming gradient are known, and since there is only one way to order and transpose those matrices so that the multiplication yields a gradient of the correct shape, that shape-matching rule is worth memorizing. The speaker reminds viewers that understanding the computational graph operations is important even though frameworks like TensorFlow abstract away such complexity; this knowledge proves useful when dealing with custom layers, which require the user to write their own forward and backward passes. A small shape-checking sketch follows below.
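A sketch of the dimension-matching argument for a matrix multiply D = X @ W (the shapes are chosen only for illustration):

```python
import numpy as np

N, d, k = 4, 3, 2
X = np.random.randn(N, d)
W = np.random.randn(d, k)
D = X @ W                      # forward: D has shape (N, k)

dD = np.random.randn(N, k)     # gradient of the loss w.r.t. D, same shape as D

# The only arrangements whose shapes work out are:
dW = X.T @ dD                  # (d, N) @ (N, k) -> (d, k), same shape as W
dX = dD @ W.T                  # (N, k) @ (k, d) -> (N, d), same shape as X
```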
Intuition Behind Backpropagation as a Computational Graph
  • 2017.03.13
  • www.youtube.com
Here I go into details on how to visualize backpropagation as a computational graph. As part of my other tutorials on tensorflow, we have discussed as to how...
 

Productionalizing deep learning for Computer Vision

The CTO and Chief Scientist at Jumio, Lavash Patel, discusses how their company uses a mix of AI and ID experts to establish trust online and verify the authenticity of ID documents. The process of identity verification is challenging due to the variety of IDs and subtypes, as well as the need for rectification and rejection of non-readable images. To maintain accuracy, a human-in-the-loop approach is taken, where AI models detect issues and a human does a sanity check on the results. Patel also discusses how Jumio productionalizes deep learning using a hybrid active learning algorithm, which adapts to new subtypes and improves by retraining itself. Additionally, he emphasizes the importance of clean data in face recognition and maintaining PCI compliance when dealing with sensitive data for machine learning purposes.

  • 00:00:00 In this section, Lavash Patel, the CTO and Chief Scientist at Jumio, describes the company's business of establishing trust online through a mix of AI and ID experts who verify the authenticity of an ID document. This kind of verification is made challenging by the fact that they accept traffic from all kinds of channels, including mobile phones, webcams, and simple API calls. To tackle these problems, Jumio uses a mix of classification and face-matching models to help verify the authenticity of an ID and the identity of the person in front of the camera.

  • 00:05:00 In this section, the speaker discusses the challenges of using AI for identity verification, given the variety of IDs and subtypes that exist. The importance of rectification, or aligning images, is emphasized, as well as the need to reject non-readable images due to blur or glare. Approving ID images is a more stringent process than rejection, as every component must pass to be considered valid. In order to maintain strict accuracy and rejection of fraudulent users, a human-in-the-loop approach is taken where AI models are used to detect issues and a human is used to do a sanity check on the results. This approach allows for industry-leading accuracy in terms of conversion and fraud detection.

  • 00:10:00 In this section, the speaker discusses the classification aspect of productionalizing deep learning for computer vision. While image classification has largely been solved in recent years using pre-trained networks such as Inception v4 or VGG-16, Jumio's problem is slightly different because the tags are aggregate rather than granular. With over 100 million tagged images, the task morphs into a class-discovery problem, which active learning can solve. Hybrid active learning includes a built-in unsupervised learning or clustering step, starting with the most popular classes with labeled examples to get a handful of classes.

  • 00:15:00 In this section, the speaker explains how to productionalize deep learning for computer vision using a hybrid active learning algorithm. The algorithm collects a large number of samples per class and trains a classifier to confidently classify images into 50 classes, with output confidence levels. The algorithm then automatically clusters the images that the classifier is not confident in, and a human agent reviews them to add new classes as necessary. Once the model is created, it is deployed as an API, and if there are compliance regulations, logging, monitoring and health checks may be added. Additionally, the speaker notes that the algorithm adapts to new subtypes and continues to improve by retraining itself, as it did when a new ID with incorrect spelling was detected, and the human agent rejected it.

  • 00:20:00 In this section, the speaker discusses how they productionize new models by always having a champion model in place and evaluating new models based on their performance compared to the champion. This also allows continuous improvement using a hybrid active learning infrastructure that continuously harvests and cleans data sets. The production pipeline is sensitive to the cleanliness of the data sets, with a target error rate in the training data of no more than 20%. The entire process must also be PCI and GDPR compliant, meaning that everything must be encrypted and the training must be brought to the data rather than the data to the training. The speaker then presents a second case study on face matching, where they use data biases on age, gender, and ethnicity to their advantage in matching selfies to IDs.

  • 00:25:00 In this section, the speaker discusses a possible approach to tackling the problem of detecting fraudulent users through the use of face embeddings and the triplet loss function in a deep convolution network. The process involves cropping out faces using a face detector and using public datasets to train the model, followed by fine-tuning on production data. The model is fine-tuned through the process of active learning, which involves using a supervised infrastructure to harvest informative samples and continually augment the data. The speaker emphasizes the effectiveness of active learning across a wide range of use cases.
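A minimal sketch of the triplet loss mentioned above, on toy embedding vectors (the margin value and the vectors are illustrative, not Jumio's actual setup):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    pos_dist = np.sum((anchor - positive) ** 2)   # same identity: want this small
    neg_dist = np.sum((anchor - negative) ** 2)   # different identity: want this large
    return max(0.0, pos_dist - neg_dist + margin)

anchor   = np.array([0.1, 0.9, 0.3])   # embedding of the face on the ID
positive = np.array([0.2, 0.8, 0.3])   # embedding of the matching selfie
negative = np.array([0.9, 0.1, 0.5])   # embedding of a different person

print(triplet_loss(anchor, positive, negative))
```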

  • 00:30:00 In this section, the speaker emphasizes the importance of clean data in computer vision, especially in face recognition. He stresses that productionizing deep learning involves much more than just modeling and even simple problems like classification can end up having numerous complexities in the production stage. Collaborative intelligence or human-in-the-loop workflows can also be helpful in improving algorithms and are especially important in face recognition since it requires clean data sets. The speaker also mentioned that their company is hiring for their R&D teams based in Vienna and Montreal.

  • 00:35:00 In this section, the speaker discusses the challenges of maintaining PCI compliance when dealing with sensitive data for machine learning and AI purposes. The speaker explains that the data must only be used for its intended purpose, and that highly secure locations and procedures must be in place to prevent any unauthorized access. The speaker also explains that the data is put in Amazon S3 buckets within the PCI DMZ, and that obfuscated images are created for machine learning purposes, which are then remotely supervised to ensure that there is no leakage of personal data.
Productionalizing deep learning for Computer Vision
  • 2018.10.16
  • www.youtube.com
In this conference talk, I explore how we use deep learning algorithms for smarter data extraction, fraud detection, and risk scoring to continuously improve...
Reason: