
 

6. Regression Analysis

In this comprehensive video, we delve into the topic of regression analysis, exploring its significance in statistical modeling. Linear regression takes center stage as we discuss its goals, the setup of the linear model, and the process of fitting a regression model. To ensure a solid foundation, we begin by explaining the assumptions underlying the distribution of residuals, including the renowned Gauss-Markov assumptions. Moreover, we introduce the generalized Gauss-Markov theorem, which extends these results to regression models whose residuals have a general covariance matrix.
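
Written out, the Gauss-Markov assumptions on the residuals referred to here take their standard form, with the generalized version relaxing the last two conditions:

    E[\varepsilon_i] = 0, \qquad \operatorname{Var}(\varepsilon_i) = \sigma^2, \qquad \operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0 \ \ (i \neq j);
    \text{generalized: } \operatorname{Cov}(\varepsilon) = \sigma^2 \Sigma \ \text{ for a known positive-definite matrix } \Sigma.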

We emphasize the importance of incorporating subjective information in statistical modeling and accommodating incomplete or missing data. Statistical modeling should be tailored to the specific process being analyzed, and we caution against blindly applying simple linear regression to all problems. The ordinary least squares estimate for beta is explained, along with the normal equations, the hat matrix, and the Gauss-Markov theorem for estimating regression parameters. We also cover regression models with nonzero covariances between components, allowing for a more flexible and realistic approach.
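
As a concrete illustration of this machinery, here is a minimal numerical sketch (simulated data, not code from the video) that solves the normal equations for beta hat and forms the hat matrix:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # design matrix with intercept
    beta_true = np.array([1.0, 2.0, -0.5])
    y = X @ beta_true + rng.normal(scale=0.3, size=n)               # response with Gaussian residuals

    # Normal equations: (X'X) beta_hat = X'y
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

    # Hat matrix H = X (X'X)^{-1} X' projects y onto the column space of X
    H = X @ np.linalg.solve(X.T @ X, X.T)
    y_fit = H @ y
    residuals = y - y_fit

    # Residuals are orthogonal to every column of X (up to rounding error)
    print(beta_hat)
    print(np.allclose(X.T @ residuals, 0.0, atol=1e-8))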

To further expand our understanding, we explore the concept of multivariate normal distributions and their role in solving for the distribution of the least squares estimator, assuming normally distributed residuals. Topics such as the moment generating function, QR decomposition, and maximum likelihood estimation are covered. We explain how the QR decomposition simplifies the least squares estimate and present a fundamental result about normal linear regression models. We define the likelihood function and maximum likelihood estimates, highlighting the consistency between least squares and maximum likelihood principles in normal linear regression models.
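
The QR simplification mentioned above can be sketched as follows (again with simulated data): with X = QR, the normal equations reduce to R beta hat = Q'y, and the hat matrix becomes Q Q'.

    import numpy as np

    rng = np.random.default_rng(1)
    n, p = 200, 4
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = X @ np.array([0.5, 1.0, -1.0, 2.0]) + rng.normal(scale=0.5, size=n)

    Q, R = np.linalg.qr(X)                       # reduced QR: Q is n x p with orthonormal columns
    beta_qr = np.linalg.solve(R, Q.T @ y)        # R beta_hat = Q'y (R is upper triangular)

    beta_ne = np.linalg.solve(X.T @ X, X.T @ y)  # same answer via the normal equations
    H = Q @ Q.T                                  # hat matrix expressed through Q

    print(np.allclose(beta_qr, beta_ne))
    print(np.allclose(H @ y, X @ beta_qr))       # fitted values agree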

Throughout the video, we emphasize the iterative steps involved in regression analysis. These steps include identifying the response and explanatory variables, specifying assumptions, defining estimation criteria, applying the chosen estimator to the data, and validating the assumptions. We also discuss the importance of checking assumptions, conducting influence diagnostics, and detecting outliers.

In summary, this video provides a comprehensive overview of regression analysis, covering topics such as linear regression, Gauss-Markov assumptions, generalized Gauss-Markov theorem, subjective information in modeling, ordinary least squares estimate, hat matrix, multivariate normal distributions, moment generating function, QR decomposition, and maximum likelihood estimation. By understanding these concepts and techniques, you'll be well-equipped to tackle regression analysis and utilize it effectively in your statistical modeling endeavors.

  • 00:00:00 In this section, the professor introduces the topic of regression analysis, which is covered today, and its importance in statistical modeling. The methodology, particularly linear regression, is powerful and widely used in finance and other disciplines that do applied statistics. The professor discusses the various goals of regression analysis, including extracting/exploiting the relationship between independent and dependent variables, prediction, causal inference, approximation, and uncovering functional relationships/validating functional relationships among variables. Furthermore, the linear model is set up from a mathematical standpoint, and the lecture covers ordinary least squares, the Gauss-Markov theorem, and formal models with normal linear regression models, followed by extensions to broader classes.

  • 00:05:00 In this section, the concept of linear regression analysis is explored, where a linear function models the conditional distribution of a response variable given independent variables. The regression parameters are used to define the relationship, and residuals describe the uncertainty or error in the data. Furthermore, polynomial approximation and Fourier series can be applied to provide a complete description, especially for cyclical behavior. The key steps for fitting a regression model involve proposing a model based on the scale of the response variable and identifying key independent variables. It's worth noting that these independent variables can include different functional forms and lag values of the response variable, making the setup relatively general.

  • 00:10:00 In this section, the speaker discusses the steps involved in regression analysis. Firstly, one needs to identify the response and the explanatory variables and specify the assumptions underlying the distribution of the residuals. Secondly, one needs to define a criterion for how to judge different estimators of the regression parameters, with several options available. Thirdly, the best estimator needs to be characterized and applied to the given data. Fourthly, one must check their assumptions, which can lead to modifications to the model and assumptions, if necessary. Finally, the speaker emphasizes the importance of tailoring the model to the process being modeled and not applying simple linear regression to all problems. The section ends with a discussion of the assumptions that can be made for the residual distribution in a linear regression model, with the normal distribution being a common and familiar starting point.

  • 00:15:00 In this section, the speaker explains the Gauss-Markov assumptions used in regression analysis, which focus on the means and variances of the residuals. The assumptions include a zero mean, constant variance, and uncorrelated residuals. The speaker also discusses generalized Gauss-Markov assumptions that involve matrix-valued or vector-valued random variables. The speaker demonstrates how the covariance matrix characterizes the variance of the n-vector, and provides examples using mu and y values.

  • 00:20:00 In this section, the generalized Gauss-Markov theorem is introduced, which handles regression when the residuals have a general covariance matrix with nonzero covariances between components, so that the error terms (and hence the response values) can be correlated. Examples of why residuals might be correlated in regression models are discussed, as well as the use of various distribution types beyond the Gaussian distribution in fitting regression models to extend applicability. The lecture then covers the estimation criterion for regression parameters and various methods used to judge what qualifies as a good estimate, including least squares, maximum likelihood, robust methods, Bayes methods, and accommodation for incomplete or missing data.

  • 00:25:00 In this section, the speaker discusses the importance of incorporating subjective information in statistical modeling and the usefulness of Bayes methodologies in appropriate modeling. He also emphasizes the need to accommodate incomplete or missing data by using statistical models. Additionally, the speaker explains how to check assumptions in regression models by analyzing the residuals to determine if the Gauss-Markov assumptions apply. He also mentions the importance of influence diagnostics and outlier detection in identifying cases that might be highly influential or unusual, respectively. Finally, he introduces the concept of ordinary least squares and the least squares criterion to calculate the sum of square deviations from the actual value of the response variable.

  • 00:30:00 In this section, we learn about regression analysis and how to solve for the ordinary least squares estimate for beta. Working in matrix form, y is the n-vector of values of the dependent (response) variable and X is the matrix of values of the independent variables, and the fitted value, y hat, is defined as the matrix X times beta. The least squares criterion Q of beta is the cross product of the n-vector y minus X beta with itself, and minimizing it yields the ordinary least squares estimate for beta. The derivative of Q with respect to the j-th regression parameter is minus twice the j-th column of X transposed times (y minus X beta), and the second derivative of Q with respect to beta is twice X transpose X, a positive definite or semi-definite matrix.

  • 00:35:00 In this section, the concept of normal equations in regression modeling is introduced. The set of equations must be satisfied by the ordinary least squares estimate, beta hat. With the assistance of matrix algebra, the equations can be solved, and the solution for beta hat assumes that X transpose X inverse exists. For X transpose X to be invertible, X must have full rank; having independent variables that are explained by other independent variables would result in reduced rank. If X doesn't have full rank, our least squares estimate of beta may not be unique.

  • 00:40:00 In this section on regression analysis, the hat matrix is introduced as a projection matrix which maps the vector of response values into the fitted values. Specifically, it's an orthogonal projection matrix that projects onto the column space of X. The residuals are the difference between the response value and the fitted value, and can be expressed as y minus y hat, or I_n minus H times y. It turns out that I_n minus H is also a projection matrix, one which projects the data onto the space orthogonal to the column space of X. This is important to keep in mind because it helps to represent the n-dimensional vector y by projection onto the column space, and to understand that the residuals are orthogonal to each of the columns of X.

  • 00:45:00 In this section, the Gauss-Markov theorem is introduced as a powerful result in linear models theory that is useful for estimating a function of regression parameters by considering a general target of interest, which is a linear combination of the betas. The theorem states that the least squares estimates are unbiased estimators of the parameter theta and provides a way to show that these estimates have the smallest variance among all linear unbiased estimators, assuming certain conditions are met. The concept of unbiased estimators is also briefly explained.

  • 00:50:00 In this section, the speaker discusses the Gauss-Markov theorem, which states that if the Gauss-Markov assumptions apply, then the estimator theta hat has the smallest variance amongst all linear unbiased estimators of theta. This means that the least squares estimator is the optimal estimator for theta as long as this is the criterion. The proof for this theorem is based on considering another linear estimate which is also an unbiased estimate and evaluating the difference between the two estimators, which must have an expectation of 0. The mathematical argument for the proof includes a decomposition of the variance and keeping track of the covariance terms. This result is where the term BLUE estimates, or the BLUE property of the least squares estimates, comes from in econometrics classes.

  • 00:55:00 In this section, the video discusses the regression model with nonzero covariances between the components and how the data Y, X can be transformed to Y star and X star to satisfy the original Gauss-Markov assumptions, making the response variables have constant variance and be uncorrelated. The video explains that generalized least squares down-weights response values that have very large variances through the sigma inverse factor. The video then delves into distribution theory for normal regression models, assuming residuals are normal with mean 0 and variance sigma squared, so that the response variables have constant variance but are not identically distributed, because their means differ.

  • 01:00:00 In this section, the concept of multivariate normal distribution is discussed with respect to the mean vector and covariance matrix. The goal is to solve for the distribution of the least squares estimator assuming normally distributed residuals. The moment generating function is introduced as a way to derive the joint distribution of Y and beta hat. For multivariate normal distributions, the moment generating function for Y is the product of the individual moment generating functions, with the distribution of Y being a normal with mean mu and covariance matrix sigma. The moment generating function for beta hat is solved for in order to determine its distribution, which is a multivariate normal.

  • 01:05:00 In this section, the speaker discusses the moment generating function of beta hat and how it is equivalent to a multivariate normal distribution with mean the true beta and covariance matrix given by a certain object. The marginal distribution of each of the beta hats is given by a univariate normal distribution with mean beta_j and variance equal to the diagonal, which can be proven from the Gaussian moment generating function. The speaker then moves on to discuss the QR decomposition of X, which can be achieved through a Gram-Schmidt orthonormalization of the independent variables matrix. By defining the upper triangular matrix R and solving for Q and R through the Gram-Schmidt process, we can express any n by p matrix as a product of an orthonormal matrix Q and an upper triangular matrix R.

  • 01:10:00 In this section, the QR decomposition and its application in simplifying the least squares estimate is discussed. By using the Gram-Schmidt process to orthogonalize columns of X, the QR decomposition can be calculated to obtain a simple linear algebra operation to solve for least squares estimates. The covariance matrix of beta hat equals sigma squared X transpose X inverse, and the hat matrix is simply Q times Q transpose. The distribution theory is further explored to provide a fundamental result about normal linear regression models.

  • 01:15:00 In this section, the professor discusses an important theorem stating that, for any m by n matrix A, the linear transformation A y of a multivariate normal random vector y is again multivariate normal. The theorem is used to show that the least squares estimate beta hat and the residual vector epsilon hat are independent random variables. The distribution of beta hat is multivariate normal, while the sum of squared residuals is a multiple of a chi-squared random variable. The regression parameter estimates and t statistics are also discussed. Maximum likelihood estimation is also explained in the context of normal linear regression models. It turns out that the ordinary least squares estimates are maximum likelihood estimates.

  • 01:20:00 In this section, the likelihood function and maximum likelihood estimates are defined. The likelihood function is the density function for the data given the unknown parameters of a multivariate normal random variable, and the maximum likelihood estimates determine the values of these parameters that make the observed data most likely. It is noted that using least squares to fit models is consistent with applying the maximum likelihood principle to a normal linear regression model. Additionally, generalized M estimators are briefly mentioned as a class of estimators used for finding robust and quantile estimates of regression parameters.
6. Regression Analysis
  • 2015.01.06
  • www.youtube.com
  • MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013. View the complete course: http://ocw.mit.edu/18-S096F13. Instructor: Peter Kempthorne.
 

7. Value At Risk (VAR) Models

The video provides an in-depth discussion on the concept of value at risk (VAR) models, which are widely used in the financial industry. These models employ probability-based calculations to measure potential losses that a company or individual may face. By using a simple example, the video effectively illustrates the fundamental concepts behind VAR models.

VAR models serve as valuable tools for individuals to assess the probability of losing money through investment decisions on any given day. To understand the risk associated with investments, investors can analyze the standard deviation of a time series. This metric reveals how much the average return has deviated from the mean over time. By valuing a security at the mean plus or minus one standard deviation, investors can gain insights into the security's risk-adjusted potential return.
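
As a rough numerical sketch of this idea (the returns are simulated, and the position size, data window, and 99% confidence level are illustrative assumptions, not figures from the video), a parametric value-at-risk number can be read off the mean and standard deviation of daily returns:

    import numpy as np

    rng = np.random.default_rng(2)
    returns = rng.normal(loc=0.0005, scale=0.012, size=1250)  # ~5 years of simulated daily returns

    mu = returns.mean()
    sigma = returns.std(ddof=1)              # sample standard deviation of daily returns

    z_99 = 2.326                             # approximate 99% one-sided normal quantile
    position = 1_000_000                     # hypothetical position size in dollars
    var_99 = position * (z_99 * sigma - mu)  # parametric 1-day 99% VaR, quoted as a positive loss

    print(f"daily sigma = {sigma:.4%}, 1-day 99% VaR = {var_99:,.0f} USD")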

The video highlights that VAR models can be constructed using different approaches. While the video primarily focuses on the parametric approach, it acknowledges the alternative method of employing Monte Carlo simulation. The latter approach offers increased flexibility and customization options, allowing for more accurate risk assessments.
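
A hedged sketch of the Monte Carlo alternative (the covariance matrix and positions below are invented for illustration; a Cholesky factor is one common choice of transformation matrix for correlating random normals):

    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical daily covariance matrix for three risk factors (illustrative numbers)
    cov = np.array([[ 2.5e-4,  1.0e-4, -0.5e-4],
                    [ 1.0e-4,  1.6e-4,  0.2e-4],
                    [-0.5e-4,  0.2e-4,  4.0e-4]])
    positions = np.array([500_000.0, 300_000.0, 200_000.0])  # dollar exposure to each factor

    L = np.linalg.cholesky(cov)              # transformation matrix that correlates independent normals
    z = rng.standard_normal((100_000, 3))    # independent standard normal draws
    scenarios = z @ L.T                      # correlated factor returns, one row per scenario

    pnl = scenarios @ positions              # simulated one-day P&L of the portfolio
    var_99 = -np.percentile(pnl, 1)          # 99% VaR read off the order statistics of the P&L

    print(f"Monte Carlo 1-day 99% VaR = {var_99:,.0f} USD")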

Furthermore, the video explores the creation of synthetic data sets that mirror the properties of historical data sets. By employing this technique, analysts can generate realistic scenarios to evaluate potential risks accurately. The video also demonstrates the application of trigonometry in describing seasonal patterns observed in temperature data, showcasing the diverse methods employed in risk analysis.

In addition to discussing VAR models, the video delves into risk management approaches employed by banks and investment firms. It emphasizes the significance of understanding the risk profile of a company and safeguarding against excessive concentrations of risk.

Overall, the video offers valuable insights into the utilization of VAR models as risk assessment tools in the finance industry. By quantifying risks associated with investments and employing statistical analysis, these models assist in making informed decisions and mitigating potential financial losses.

  • 00:00:00 In this video, Ken Abbott discusses risk management approaches used by banks and investment firms. He first discusses risk, and goes on to discuss how risk management involves understanding the risk profile of the company, and protecting against concentrations of risk that are too large.

  • 00:05:00 Value at risk models are a way to estimate risk associated with specific investments, and can be used to help make informed decisions about which ones to own. These models are based on a statistical understanding of how stocks, bonds, and derivatives behave, and can be used to quantify how sensitive an investor is to changes in interest rates, equity prices, and commodity prices.

  • 00:10:00 The video explains that VAR models are used to measure risk and to determine how much money an investor needs to hold to support a position in a given market. The video also provides an overview of time series analysis, which is used to understand the behavior of markets over time.

  • 00:15:00 The video discusses the concept of value at risk (VAR), which is a financial model that uses probability to measure the potential losses a company may experience. The video uses a simple example to illustrate the concepts.

  • 00:20:00 Value at risk (VAR) models help individuals assess the probability of losing money on any given day through investment decisions. The standard deviation of a time series tells investors how much the average return has deviated from the mean over time. Valuing a security at the mean plus or minus one standard deviation gives an idea of the security's risk-adjusted potential return.

  • 00:25:00 Value at Risk (VAR) models allow for the identification of scenarios in which an investment could lose more than 4.2% of its value over a five year period. This information can be helpful in determining whether an investment is likely to be profitable or not.

  • 00:30:00 This video explains how value at risk (VAR) models work and how they help to mitigate risk. The concepts introduced include percentage changes and log changes, and the use of PV01 (the present value of a basis point) and durations to measure risk. The video also covers the use of VAR models in the finance industry.

  • 00:35:00 This video discusses the concept of value at risk (VAR), which is a risk management tool that calculates the potential financial loss that a company or individual may experience due to volatility in its assets. Yields are also discussed, and it is explained that they are composed of risk-free rates and credit spreads. The presenter provides an example of how VAR can be used to estimate the potential financial loss that a company may experience due to changes in its asset prices.

  • 00:40:00 This video discusses value at risk models, which measure risk in financial markets. Covariance and correlation are two measures of co-movement used in risk calculations; covariance matrices are symmetric, with the variances on the diagonal and covariances off the diagonal. Correlation matrices are also symmetric, and correlations can be calculated as the covariance divided by the product of the standard deviations.

  • 00:45:00 The video discusses the concept of value at risk (VAR), which is used to measure the risk of financial losses associated with a portfolio of assets. The video explains that VAR can be calculated using a covariance matrix or, equivalently, a correlation matrix together with the standard deviations. The covariance matrix contains the covariances between asset returns, while the correlation matrix rescales those covariances by the products of the standard deviations (a small sketch of this calculation appears after this list). The video then presents an example of how VAR can be calculated using a covariance matrix and a correlation matrix.

  • 00:50:00 Value at risk (VAR) models are a way to measure the risk associated with a financial investment. The model combines a position vector with return and covariance data to compute the portfolio's profit-and-loss outcomes, and the order statistics of those outcomes are used to determine the risk level of the investment.

  • 00:55:00 This video provides the key points of a 7-slide presentation on value-at-risk models. These models are used to calculate the probability of a financial loss, given that certain conditions are met. Missing data can be a problem, and various methods are available to fill in the gaps. The presentation also discusses how the choice of assumptions can have a material impact on the results of a model.

  • 01:00:00 The video discusses the value at risk (VAR) models. The model uses a parametric approach, but there is another method using Monte Carlo simulation. This method is more flexible and allows for more customization.

  • 01:05:00 Value at Risk (VAR) models are used to estimate the potential for a financial loss due to fluctuations in asset prices. These models can be used to quantify the risk associated with a particular investment or portfolio.

  • 01:10:00 In this video, the presenter discusses the importance of value at risk (VAR) models, explaining that the covariance matrices used in these models must not have negative eigenvalues (they must be positive semi-definite). He goes on to say that, with a thousand observations, missing values need to be filled in using a process called "missing data imputation." Finally, he demonstrates how to create a transformation matrix that will correlate random normals.

  • 01:15:00 In this video, the presenter explains how to create models that simulate the outcomes of investments, using Monte Carlo simulation. He also discusses how to use a Gaussian copula to generate more accurate models.

  • 01:20:00 The video explains how synthetic data sets can be created to have the same properties as historical data sets. It also demonstrates how trigonometry can be used to describe seasonal patterns in temperature data.
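
Tying together the covariance and correlation discussion from the 00:40:00-00:45:00 entries above (see the note there), here is a small sketch showing how a correlation matrix is obtained from the covariance matrix of simulated returns:

    import numpy as np

    rng = np.random.default_rng(13)
    returns = rng.multivariate_normal(
        mean=[0.0, 0.0, 0.0],
        cov=[[2.5e-4, 1.0e-4, 0.5e-4],
             [1.0e-4, 1.6e-4, 0.2e-4],
             [0.5e-4, 0.2e-4, 4.0e-4]],
        size=1000,
    )

    cov = np.cov(returns, rowvar=False)      # variances on the diagonal, covariances off-diagonal
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)            # correlation = covariance / (sd_i * sd_j)

    print(np.round(corr, 2))                 # symmetric, with ones on the diagonal
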
7. Value At Risk (VAR) Models
  • 2015.01.06
  • www.youtube.com
  • MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013. View the complete course: http://ocw.mit.edu/18-S096F13. Instructor: Kenneth Abbott.
 

8. Time Series Analysis I

In this video, the professor begins by revisiting the maximum likelihood estimation method as the primary approach in statistical modeling. They explain the concept of likelihood function and its connection to normal linear regression models. Maximum likelihood estimates are defined as values that maximize the likelihood function, indicating how probable the observed data is given these parameter values.

The professor delves into solving estimation problems for normal linear regression models. They highlight that the maximum likelihood estimate of the error variance is Q of beta hat over n, but caution that this estimate is biased; the correction is to divide Q of beta hat by n minus the rank of the X matrix rather than by n. As more parameters are added to the model, the fitted values become more precise, but there is also a risk of overfitting. The theorem states that the least squares estimates, which are also the maximum likelihood estimates, of regression models follow a normal distribution, and the sum of squares of residuals follows a chi-squared distribution with degrees of freedom equal to n minus p. The t-statistic is emphasized as a crucial tool for assessing the significance of explanatory variables in the model.
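
A minimal sketch of these quantities on simulated data (the design matrix and coefficients are illustrative, not from the lecture): divide the residual sum of squares by n - p for the unbiased variance estimate, then divide each coefficient by its standard error to get its t-statistic.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 120, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
    y = X @ np.array([0.2, 1.5, 0.0]) + rng.normal(scale=1.0, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ beta_hat

    s2 = resid @ resid / (n - p)             # unbiased estimate of sigma^2 (divide by n - p, not n)
    cov_beta = s2 * np.linalg.inv(X.T @ X)   # estimated covariance of beta_hat
    se = np.sqrt(np.diag(cov_beta))
    t_stats = beta_hat / se                  # compare with t_{n-p} quantiles to judge significance

    print(np.round(t_stats, 2))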

Generalized M estimation is introduced as a method for estimating unknown parameters by minimizing the function Q of beta. Different estimators can be defined by choosing different forms for the function h, which involves the evaluation of another function. The video also covers robust M estimators, which utilize the function chi to ensure good properties with respect to estimates, as well as quantile estimators. Robust estimators help mitigate the influence of outliers or large residuals in least squares estimation.

The topic then shifts to M-estimators and their wide applicability in fitting models. A case study on linear regression models applied to asset pricing is presented, focusing on the capital asset pricing model. The professor explains how stock returns are influenced by the overall market return, scaled by the stock's risk. The case study provides data and details on how to collect it using the statistical software R. Regression diagnostics are mentioned, highlighting their role in assessing the influence of individual observations on regression parameters. Leverage is introduced as a measure to identify influential data points, and its definition and explanation are provided.
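
A bare-bones version of the CAPM regression described here (simulated return series; in the case study the data would come from the R workflow mentioned above): regress the stock's excess returns on the market's excess returns to estimate alpha and beta.

    import numpy as np

    rng = np.random.default_rng(12)
    n = 500
    rf = 0.0001                                          # assumed constant daily risk-free rate
    r_mkt = rng.normal(loc=0.0004, scale=0.01, size=n)   # simulated market returns
    beta_true, alpha_true = 1.2, 0.0002
    r_stock = rf + alpha_true + beta_true * (r_mkt - rf) + rng.normal(scale=0.008, size=n)

    # CAPM regression: (r_stock - rf) = alpha + beta * (r_mkt - rf) + residual
    X = np.column_stack([np.ones(n), r_mkt - rf])
    y = r_stock - rf
    alpha_hat, beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]

    print(round(alpha_hat, 5), round(beta_hat, 3))       # beta_hat should come out near 1.2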

The concept of incorporating additional factors, such as crude oil returns, into equity return models is introduced. The analysis demonstrates that the market alone does not efficiently explain the returns of certain stocks, while crude oil acts as an independent factor that helps elucidate returns. An example is given with Exxon Mobil, an oil company, showing how its returns correlate with oil prices. The section concludes with a scatter plot indicating influential observations based on the Mahalanobis distance of cases from the centroid of independent variables.

The lecturer proceeds to discuss univariate time series analysis, which involves observing a random variable over time as a discrete process. They explain the definitions of strict and covariance stationarity, with covariance stationarity requiring the mean and covariance of the process to remain constant over time. Autoregressive moving average (ARMA) models, along with their extension to non-stationarity through integrated autoregressive moving average (ARIMA) models, are introduced. Estimation of stationary models and tests for stationarity are also covered.

The Wold representation theorem for covariance stationary time series is discussed, stating that such a time series can be decomposed into a linearly deterministic process plus a moving average of white noise terms with coefficients given by psi_i. The white noise component, eta_t, has constant variance, is uncorrelated across time, and is uncorrelated with the deterministic process. The Wold decomposition theorem provides a useful framework for modeling such processes.
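
In symbols, the decomposition reads:

    X_t = V_t + \sum_{i=0}^{\infty} \psi_i \, \eta_{t-i}, \qquad \psi_0 = 1, \quad \sum_{i=0}^{\infty} \psi_i^2 < \infty, \qquad \eta_t \sim \mathrm{WN}(0, \sigma^2),

with V_t the linearly deterministic component and eta_t white noise that is uncorrelated across time and with V_t.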

The lecturer explains the Wold decomposition method of time series analysis, which involves initializing the parameter p (representing the number of past observations) and estimating the linear projection of X_t based on the last p lag values. By examining the residuals using time series methods, such as assessing orthogonality to longer lags and consistency with white noise, one can determine an appropriate moving average model. The Wold decomposition method can be implemented by taking the limit of the projections as p approaches infinity, converging to the projection of the data on its history and corresponding to the coefficients of the projection definition. However, it is crucial for the ratio of p to the sample size n to approach zero to ensure an adequate number of degrees of freedom for model estimation.

The importance of having a finite number of parameters in time series models is emphasized to avoid overfitting. The lag operator, denoted as L, is introduced as a fundamental tool in time series models, enabling the shifting of a time series by one time increment. The lag operator is utilized to represent any stochastic process using the polynomial psi(L), which is an infinite-order polynomial involving lags. The impulse response function is discussed as a measure of the impact of an innovation at a certain time point on the process, affecting it at that point and beyond. The speaker provides an example using the interest rate change by the Federal Reserve chairman to illustrate the temporal impact of innovations.

The concept of the long-run cumulative response is explained in relation to time series analysis. This response represents the accumulated effect of one innovation in the process over time and signifies the value towards which the process is converging. It is calculated as the sum of individual responses captured by the polynomial psi(L). The Wold representation, which is an infinite-order moving average, can be transformed into an autoregressive representation using the inverse of the polynomial psi(L). The class of autoregressive moving average (ARMA) processes is introduced with its mathematical definition.

The focus then turns to autoregressive models within the context of ARMA models. The lecture begins with simpler cases, specifically autoregressive models, before addressing moving average processes. Stationarity conditions are explored, and the characteristic equation associated with the autoregressive model is introduced by replacing the lag operator in the polynomial phi with the complex variable z. The process X_t is covariance stationary if and only if all the roots of the characteristic equation lie outside the unit circle, that is, every root has modulus greater than 1.
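
A quick numerical check of this condition (the AR(2) coefficients below are assumed for illustration): form the characteristic polynomial 1 - phi_1 z - phi_2 z^2 and inspect the moduli of its roots.

    import numpy as np

    phi = np.array([0.6, 0.3])      # assumed AR(2): X_t = 0.6 X_{t-1} + 0.3 X_{t-2} + eta_t

    # Characteristic polynomial 1 - 0.6 z - 0.3 z^2; np.roots expects highest degree first
    roots = np.roots([-phi[1], -phi[0], 1.0])
    moduli = np.abs(roots)

    print(moduli)                   # both moduli exceed 1, so the process is covariance stationary
    print(np.all(moduli > 1.0))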

In the subsequent section of the video, the concept of stationarity and unit roots in an autoregressive process of order one (AR(1)) is discussed. The characteristic equation of the model is presented, and it is explained that covariance stationarity requires the magnitude of phi to be less than 1. The variance of X in the autoregressive process is shown to be greater than the variance of the innovations when phi is positive and smaller when phi is negative. Additionally, it is demonstrated that an autoregressive process with phi between 0 and 1 corresponds to an exponential mean-reverting process, which has been employed in interest rate models in finance.

The video progresses to focus specifically on autoregressive processes, particularly AR(1) models. These models involve variables that tend to revert to some mean over short periods, with the mean reversion point potentially changing over long periods. The lecture introduces the Yule-Walker equations, which are employed to estimate the parameters of ARMA models. These equations rely on the covariance between observations at different lags, and the resulting system of equations can be solved to obtain the autoregressive parameters. The Yule-Walker equations are frequently utilized to specify ARMA models in statistical packages.
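
A minimal sketch of the Yule-Walker idea (the AR(2) simulation and the lag order are assumptions for illustration): plug sample autocovariances into the Toeplitz system and solve for the autoregressive coefficients.

    import numpy as np

    rng = np.random.default_rng(5)

    # Simulate an AR(2) process with known coefficients
    phi_true = np.array([0.6, 0.3])
    n = 5000
    x = np.zeros(n)
    eps = rng.normal(size=n)
    for t in range(2, n):
        x[t] = phi_true[0] * x[t - 1] + phi_true[1] * x[t - 2] + eps[t]

    def autocov(series, lag):
        centered = series - series.mean()
        return np.dot(centered[: len(series) - lag], centered[lag:]) / len(series)

    p = 2
    gamma = np.array([autocov(x, k) for k in range(p + 1)])

    # Yule-Walker system: Gamma_ij = gamma(|i - j|), right-hand side gamma(1..p)
    Gamma = np.array([[gamma[abs(i - j)] for j in range(p)] for i in range(p)])
    phi_hat = np.linalg.solve(Gamma, gamma[1 : p + 1])

    print(np.round(phi_hat, 3))     # should be close to [0.6, 0.3]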

The method of moments principle for statistical estimation is explained, particularly in the context of complex models where specifying and computing likelihood functions become challenging. The lecture proceeds to discuss moving average models and presents formulas for the expectations of X_t, including mu and gamma 0. Non-stationary behavior in time series is addressed through various approaches. The lecturer highlights the importance of accommodating non-stationary behavior to achieve accurate modeling. One approach is transforming the data to make it stationary, such as through differencing or applying Box-Jenkins' approach of using the first difference. Additionally, examples of linear trend reversion models are provided as a means of handling non-stationary time series.

The speaker further explores non-stationary processes and their incorporation into ARMA models. If differencing, either first or second, yields covariance stationarity, it can be integrated into the model specification to create ARIMA models (Autoregressive Integrated Moving Average Processes). The parameters of these models can be estimated using maximum likelihood estimation. To evaluate different sets of models and determine the orders of autoregressive and moving average parameters, information criteria such as the Akaike or Bayes information criterion are suggested.
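
One way this workflow might look, assuming the statsmodels package is available (the simulated series and the candidate orders are illustrative, not from the lecture):

    import numpy as np
    from statsmodels.tsa.arima.model import ARIMA

    rng = np.random.default_rng(6)

    # Simulate a non-stationary series whose first difference is AR(1)
    n = 400
    dx = np.zeros(n)
    eps = rng.normal(size=n)
    for t in range(1, n):
        dx[t] = 0.5 * dx[t - 1] + eps[t]
    y = np.cumsum(dx)                          # integrated once, i.e. an ARIMA(1,1,0)-type series

    # Compare a few candidate (p, d, q) orders by information criteria
    for order in [(1, 1, 0), (0, 1, 1), (2, 1, 1)]:
        res = ARIMA(y, order=order).fit()
        print(order, round(res.aic, 1), round(res.bic, 1))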

The issue of adding additional variables to the model is discussed, along with the consideration of penalties. The lecturer emphasizes the need to establish evidence for incorporating extra parameters, such as evaluating t-statistics that exceed a certain threshold or employing other criteria. The Bayes information criterion assumes the model contains a finite number of variables and that these are known, while the Hannan-Quinn criterion allows an infinite number of variables but ensures their identifiability. Model selection is a challenging task, but these criteria provide useful tools for decision-making.

In conclusion, the video covers various aspects of statistical modeling and time series analysis. It begins by explaining maximum likelihood estimation and its relation to normal linear regression models. The concepts of generalized M estimation and robust M estimation are introduced. A case study applying linear regression models to asset pricing is presented, followed by an explanation of univariate time series analysis. The Wold representation theorem and the Wold decomposition method are discussed in the context of covariance stationary time series. The importance of a finite number of parameters in time series models is emphasized, along with autoregressive models and stationarity conditions. The video concludes by addressing autoregressive processes, the Yule-Walker equations, the method of moments principle, non-stationary behavior, and model selection using information criteria.

  • 00:00:00 In this section, the professor reviews the maximum likelihood estimation method as the primary estimation method in statistical modeling while discussing the likelihood function and its relation to normal linear regression models. The professor explains that maximum likelihood estimates are the parameter values that maximize the likelihood function, i.e., the values under which the observed data would have been most likely to be generated.

  • 00:05:00 In this section, the professor discusses how to solve the estimation problems for normal linear regression models. The maximum likelihood estimate of the error variance is Q of beta hat over n, but this estimate is biased and is corrected by dividing Q of beta hat by n minus the rank of the X matrix instead of by n. The more parameters added to the model, the more precise the fitted values are, but it also increases the danger of curve-fitting. The theorem states that the least squares estimates, now the maximum likelihood estimates, of regression models are normally distributed, and the residuals' sum of squares has a chi-squared distribution with degrees of freedom given by n minus p. The t-statistic is a critical way to assess the relevance of different explanatory variables in the model.

  • 00:10:00 In this section, the video explains the concept of generalized M estimation, which involves estimating unknown parameters by minimizing the function Q of beta. By choosing different functional forms for h, which is a sum of evaluations of another function, different kinds of estimators can be defined such as least squares and maximum likelihood estimation. The video also discusses robust M estimators, which involve defining the function chi to have good properties with estimates, and quantile estimators. Robust estimators help to control the undue influence of very large values or residuals under least squares estimation.

  • 00:15:00 In this section, the professor discusses M-estimators and how they encompass most estimators encountered in fitting models. The class is introduced to a case study that applies linear regression models to asset pricing. The capital asset pricing model is explained to suggest that stock returns depend on the return of the overall market, scaled by how risky the stock is. The case study provides the necessary data and details to collect it using R. The professor mentions regression diagnostics and how they determine the influence of individual observations on regression parameters. Finally, influential data points are identified using leverage, and the definition and explanation are given.

  • 00:20:00 In this section, the professor introduces the concept of adding another factor, such as the return on crude oil, in modeling equity returns to help explain returns. The analysis shows that the market, in this case study, was not efficient in explaining the return of GE; crude oil is another independent factor that helps explain returns. On the other hand, Exxon Mobil, an oil company, has a regression parameter that shows how crude oil definitely has an impact on its return since it goes up and down with oil prices. The section ends with a scatter plot that indicates influential observations associated with the Mahalanobis distance of cases from the centroid of the independent variables.

  • 00:25:00 In this section, the lecturer introduces the topic of univariate time series analysis, which involves observing a random variable over time and is a discrete time process. The definition of strict and covariance stationarity is explained, with covariance stationarity being weaker and requiring that only the mean and covariance of the process remain constant over time. Classic models of autoregressive moving average models and their extensions to non-stationarity with integrated autoregressive moving average models are also discussed, along with how to estimate stationary models and test for stationarity.

  • 00:30:00 In this section of the video, the speaker discusses the Wold representation theorem for covariance stationary time series. The theorem states that a zero-mean covariance stationary time series can be decomposed into two components: a linearly deterministic process and a weighted average of white noise with coefficients given by psi_i. The speaker also explains that eta_t, the white noise element, has constant variance and is uncorrelated with itself and the deterministic process. The Wold decomposition theorem provides a compelling structure for modeling such processes.

  • 00:35:00 In this section, the Wold decomposition method of time series analysis is discussed. This method involves initializing the parameter p, which represents the number of past observations in the linearly deterministic term, and estimating the linear projection of X_t on the last p lag values. By conducting time series methods to analyze the residuals, such as evaluating whether the residuals are orthogonal to longer lags and consistent with white noise, one can specify a moving average model and evaluate its appropriateness. The Wold decomposition method can be implemented as the limit of the projections as p gets large, converging to the projection of the data on its history and corresponding to the coefficients of the projection definition. However, the p/n ratio needs to approach 0 to avoid running out of degrees of freedom when estimating models.

  • 00:40:00 In this section, the speaker emphasizes the importance of having a finite number of parameters while estimating time series models because it helps to avoid overfitting. The lag operator is a crucial tool in time series models where a time series is shifted back by one time increment using the operator L. Any stochastic process can be represented using the lag operator with psi of L, which is an infinite-order polynomial of the lags. The impulse response function relates to the impact of the innovation at a certain point in time that affects the process at that point and beyond. The speaker uses an example of the Federal Reserve chairman's interest rate change to help explain the impact of innovation over time.

  • 00:45:00 In this section, the concept of long-run cumulative response is discussed in relation to time series analysis. The long-run cumulative response is the impact of one innovation in a process over time, and the value to which the process is moving. This response is given by the sum of individual responses, represented by the polynomial of psi with a lag operator. The Wold representation is an infinite-order moving average that can have an autoregressive representation using an inverse of the psi of L polynomial. The class of autoregressive moving average processes, with mathematical definition, is also introduced to the viewer.

  • 00:50:00 In this section, the focus is on autoregressive models in ARMA models. To better understand these models, simpler cases will be looked at, starting with autoregressive models and moving on to moving average processes. Stationarity conditions will also be explored: replacing the lag operator in the polynomial function phi with a complex variable z gives the characteristic equation associated with the autoregressive model. The process X_t is covariance stationary if and only if all the roots of this characteristic equation lie outside the unit circle, meaning that each root has modulus greater than 1.

  • 00:55:00 In this section of the video, the concept of stationarity and unit roots in an autoregressive process of order one is discussed. The characteristic equation of the model is presented, and it is determined that covariance stationarity requires phi to be less than 1 in magnitude. The variance of X in the autoregressive process is shown to be greater than the variance of the innovations when phi is positive and smaller when phi is less than 0. Furthermore, it is demonstrated that an autoregressive process with phi between 0 and 1 corresponds to an exponential mean-reverting process that has been used theoretically for interest rate models in finance.

  • 01:00:00 In this section, the focus is on autoregressive processes, specifically AR(1) models. These models involve variables that typically return to some mean over short periods of time, but the mean reversion point can change over long periods of time. The lecture explains the Yule-Walker equations, which are used to estimate the parameters of the ARMA models. These equations involve the covariance between observations at different lags, and the resulting system of equations can be solved for the autoregressive parameters. Finally, it is noted that the Yule-Walker equations are frequently used to specify ARMA models in statistics packages.

  • 01:05:00 In this section, the method of moments principle for statistical estimation is explained; it is particularly useful in complex models where likelihood functions are difficult to specify and compute, since it equates sample moments (unbiased estimates of the population moments) to their model-implied counterparts. The moving average model is then discussed, with formulas for the expectations of X_t, including mu and gamma 0, calculated. Accommodations for non-stationary behavior in time series are also discussed, particularly transforming the data to make it stationary, the Box and Jenkins approach of using the first difference, and examples of linear trend reversion models.

  • 01:10:00 In this section, the speaker discusses non-stationary processes and how to incorporate them into ARMA models. He explains that if first or second differencing results in covariance stationarity, it can be incorporated into the model specification to create ARIMA models, or Autoregressive Integrated Moving Average Processes. The parameters for these models can be specified using maximum likelihood, and different sets of models and orders of autoregressive and moving average parameters can be evaluated using information criteria such as the Akaike or Bayes information criterion.

  • 01:15:00 In this section, the speaker discusses the issue of adding extra variables in the model and what penalty should be given. He suggests that it is necessary to consider what evidence should be required to incorporate extra parameters such as t statistics that exceed some threshold or other criteria. The Bayes information criterion assumes there is a finite number of variables in the model and that we know them, while the Hannan-Quinn criterion assumes an infinite number of variables in the model but ensures they are identifiable. The problem of model selection is challenging but can be solved using these criteria.
8. Time Series Analysis I
  • 2015.01.06
  • www.youtube.com
  • MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013. View the complete course: http://ocw.mit.edu/18-S096F13. Instructor: Peter Kempthorne.
 

9. Volatility Modeling

This video provides an extensive overview of volatility modeling, exploring various concepts and techniques in the field. The lecturer begins by revisiting autoregressive moving average (ARMA) models and model selection before turning to volatility modeling proper. Later in the lecture, jump models are used to capture the random arrival of shocks on top of a Brownian motion process. These models assume the existence of a process, pi of t, a Poisson process that counts the number of jumps that occur; the jumps themselves are driven by the random variables gamma sigma Z_1 and Z_2, while the number of jumps follows the Poisson distribution. The estimation of these parameters is carried out using maximum likelihood estimation through the EM algorithm.

The video then delves into the topic of model selection and criteria. Different model selection criteria are discussed to determine the most suitable model for a given dataset. The Akaike information criterion (AIC) is presented as a measure of how well a model fits the data, penalizing models based on the number of parameters. The Bayes information criterion (BIC) is similar but introduces a logarithmic penalty for added parameters. The Hannan-Quinn criterion provides an intermediate penalty between the logarithmic and linear terms. These criteria aid in selecting the optimal model for volatility modeling.
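
These penalties can be written down directly; in the sketch below, loglik, k, and n are placeholders for a fitted model's log-likelihood, parameter count, and sample size (the printed numbers are made up):

    import math

    def information_criteria(loglik, k, n):
        """Return (AIC, BIC, Hannan-Quinn) for a model with k parameters fit to n observations."""
        aic = -2.0 * loglik + 2.0 * k                          # constant penalty of 2 per parameter
        bic = -2.0 * loglik + k * math.log(n)                  # log(n) penalty per parameter
        hq = -2.0 * loglik + 2.0 * k * math.log(math.log(n))   # penalty between AIC's 2 and BIC's log(n)
        return aic, bic, hq

    print(information_criteria(loglik=-512.3, k=4, n=250))     # illustrative numbers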

Next, the video addresses the Dickey-Fuller test, which is a valuable tool to assess whether a time series is consistent with a simple random walk or exhibits a unit root. The lecturer explains the importance of this test in detecting non-stationary processes, which can pose challenges when using ARMA models. The problems associated with modeling non-stationary processes using ARMA models are highlighted, and strategies to address these issues are discussed.
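
A hedged sketch of running the test, assuming the statsmodels package is available (the two simulated series are illustrative):

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    rng = np.random.default_rng(7)
    random_walk = np.cumsum(rng.normal(size=500))    # unit-root process
    ar1 = np.zeros(500)
    for t in range(1, 500):
        ar1[t] = 0.5 * ar1[t - 1] + rng.normal()     # stationary AR(1)

    for name, series in [("random walk", random_walk), ("AR(1)", ar1)]:
        stat, pvalue, *_ = adfuller(series)
        print(f"{name}: ADF statistic = {stat:.2f}, p-value = {pvalue:.3f}")
        # A large p-value means we cannot reject a unit root (consistent with a simple random walk)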

The video concludes by presenting an application of these models to a real-world example. The lecturer demonstrates how volatility modeling can be applied in practice and how ARCH and GARCH models, which give the squared returns an ARMA-like structure, capture time-dependent volatility. The example serves to illustrate the practical relevance and effectiveness of volatility modeling techniques.
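
The ARCH/GARCH models summarized in the timestamps below make this time dependence explicit; a minimal sketch of the GARCH(1,1) variance recursion, with assumed parameter values, looks like this:

    import numpy as np

    def garch11_variance(returns, omega=1e-6, alpha=0.08, beta=0.90):
        """Filter a return series through the GARCH(1,1) recursion
           sigma2_t = omega + alpha * r_{t-1}^2 + beta * sigma2_{t-1}."""
        sigma2 = np.empty_like(returns)
        sigma2[0] = returns.var()                 # initialize at the unconditional sample variance
        for t in range(1, len(returns)):
            sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
        return sigma2

    rng = np.random.default_rng(8)
    r = rng.normal(scale=0.01, size=1000)         # placeholder returns; real data would come from prices
    sigma2 = garch11_variance(r)
    print(np.sqrt(sigma2[-5:]) * np.sqrt(252))    # last few annualized conditional volatilities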

In summary, this video provides a comprehensive overview of volatility modeling, covering the concepts of ARMA models, the Dickey-Fuller test, model selection criteria, and practical applications. By exploring these topics, the video offers insights into the complexities and strategies involved in modeling and predicting volatility in various domains, such as financial markets.

  • 00:00:00 The professor opens by discussing how model selection fits into estimating a statistical model, noting that there are various model selection criteria that can be used to determine which model is best suited for a given set of data.

  • 00:05:00 The Akaike information criterion is a measure of how well a model fits data that penalizes models by a factor depending on the number of parameters. The Bayes information criterion is similar, but applies a log n penalty per added parameter. The Hannan-Quinn criterion has a per-parameter penalty midway between log n and 2. The Dickey-Fuller test is a test to see if a time series is consistent with a simple random walk.

  • 00:10:00 This video provides an overview of volatility modeling, including the concepts of autoregressive moving average (ARMA) models and the Dickey-Fuller test. The video then goes on to discuss the problems that can occur when a non-stationary process is modeled using ARMA models and how to deal with these issues. Finally, the video provides an application of ARMA models to a real-world example.

  • 00:15:00 This video provides a brief introduction to volatility modeling, including a discussion of the ACF and PACF functions, the Dickey-Fuller test for unit roots, and regression diagnostics.

  • 00:20:00 Volatility is a measure of the variability of prices or returns in financial markets. Historical volatility is computed by taking the difference in the logs of prices over a given period of time. Volatility models are designed to capture time-dependent volatility.

  • 00:25:00 Volatility is a measure of how much a security's price changes over time. Volatility can be measured by the square root of the sample variance, and can be converted to annualized values. Historical volatility can be estimated using risk metrics approaches.

  • 00:30:00 Volatility models can be used to predict future stock prices, and the geometric Brownian motion is a common model used. Choongbum will go into more detail about stochastic differential equations and stochastic calculus in later lectures.

  • 00:35:00 The geometric Brownian motion model describes the evolution of a security's price over time, with returns over a given period following a Gaussian distribution. When the time scale is changed, the model's parameters need to be adjusted accordingly.

  • 00:40:00 Volatility modeling can produce different results based on how time is measured. For example, under a Geometric Brownian Motion model, daily returns are sampled from a Gaussian distribution, while under a normal model, the percentiles of the fitted Gaussian distribution are plotted. In either case, the cumulative distribution function of the fitted model should be centered around the actual percentile.

  • 00:45:00 The Garman-Klass estimator is a model for estimating volatility that takes into account more information than just closing prices, using the open, high, low, and close. It works with daily time increments and accounts for the fraction of the day, represented by little f, at which the market opens (see the sketch after this list).

  • 00:50:00 This volatility model calculates the variance of open-to-close returns and the efficiency of this estimate relative to the close-to-close estimate.

  • 00:55:00 The volatility model is a stochastic differential equation that models the volatility of a financial asset. The paper by Garman and Klass found that the best scale-invariant estimator is an estimate that changes only by a scale factor, and that this estimator has an efficiency of 8.4.

  • 01:00:00 This video covers volatility modeling, which is a way to deal with the random arrival of shocks to a Brownian motion process. The model assumes that there is a process pi of t, which is a Poisson process that counts the number of jumps that have occurred. These jumps are represented by gamma sigma Z_1 and Z_2, which are random variables with a Poisson distribution. The maximum likelihood estimation of these parameters is done using the EM algorithm.

  • 01:05:00 The "9. Volatility Modeling" video covers the EM algorithm and ARCH models, which are used to model time-dependent volatility. ARCH models allow for time dependence in volatility, while still maintaining parameter constraints. The model is fit to euro/dollar exchange-rate data.

  • 01:10:00 Volatility modeling here means estimating the volatility process underlying asset returns. This involves fitting an autoregressive model to the squared residuals and testing for ARCH structure. If there is no ARCH structure, then the regression on the squared residuals will have no predictability.

  • 01:15:00 The GARCH model is a simplified representation of the volatility of a given asset's squared returns. The model is able to fit data quite well, and has properties that suggest a time dependence in the volatility.

  • 01:20:00 This video discusses the benefits of using volatility models compared to other models in forecasting. GARCH models are shown to be particularly effective in capturing time-varying volatility. The last day to sign up for a field trip is next Tuesday.
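
Following up on the Garman-Klass discussion above (the 00:45:00-00:55:00 entries), here is a minimal sketch of the standard Garman-Klass daily variance estimator built from open/high/low/close prices; the price rows are made-up numbers, not market data:

    import numpy as np

    def garman_klass_variance(o, h, l, c):
        """Garman-Klass daily variance estimate from open, high, low, close prices:
           0.5 * ln(H/L)^2 - (2 ln 2 - 1) * ln(C/O)^2, averaged over days."""
        hl = np.log(h / l)
        co = np.log(c / o)
        daily = 0.5 * hl**2 - (2.0 * np.log(2.0) - 1.0) * co**2
        return daily.mean()

    # Illustrative OHLC data for a handful of days (not real market quotes)
    o = np.array([100.0, 101.2, 100.8, 102.0, 101.5])
    h = np.array([101.5, 102.0, 102.3, 102.8, 102.2])
    l = np.array([ 99.4, 100.5,  99.9, 101.1, 100.7])
    c = np.array([101.2, 100.8, 102.0, 101.5, 101.9])

    var_gk = garman_klass_variance(o, h, l, c)
    print(f"annualized Garman-Klass volatility = {np.sqrt(252 * var_gk):.2%}")
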
9. Volatility Modeling
  • 2015.01.06
  • www.youtube.com
  • MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013. View the complete course: http://ocw.mit.edu/18-S096F13. Instructor: Peter Kempthorne.
 

10. Regularized Pricing and Risk Models

In this comprehensive video, the topic of regularized pricing and risk models for interest rate products, specifically bonds and swaps, is extensively covered. The speaker begins by addressing the challenge of ill-posedness in these models, where even slight changes in the inputs can produce large changes in the outputs. To overcome this challenge, they propose the use of smooth basis functions and penalty functions to control the smoothness of the volatility surface. Tikhonov regularization is introduced as a technique that adds a penalty on the amplitude of the solution, reducing the impact of noise and improving the meaningfulness of the models.
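
As a small numerical illustration of the Tikhonov idea (the matrices below are random stand-ins for a calibration problem, not data from the lecture), the regularized solution minimizes ||Ax - b||^2 + lambda^2 ||x||^2, which damps the unstable components an ill-posed system would otherwise produce:

    import numpy as np

    rng = np.random.default_rng(9)

    # Ill-conditioned system: nearly dependent columns stand in for an ill-posed calibration
    A = rng.normal(size=(40, 10))
    A[:, 9] = A[:, 8] + 1e-6 * rng.normal(size=40)       # column 9 is almost a copy of column 8
    x_true = rng.normal(size=10)
    b = A @ x_true + 1e-3 * rng.normal(size=40)          # small noise in the observed prices

    x_plain = np.linalg.lstsq(A, b, rcond=None)[0]       # unregularized solution amplifies the noise

    lam = 0.1                                            # Tikhonov penalty weight (an assumed choice)
    x_tik = np.linalg.solve(A.T @ A + lam**2 * np.eye(10), A.T @ b)

    print(np.linalg.norm(x_plain), np.linalg.norm(x_tik))  # regularized solution has much smaller amplitude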

The speaker delves into various techniques employed by traders in this field. They discuss spline techniques and principal component analysis (PCA), which are used to identify discrepancies in the market and make informed trading decisions. The concept of bonds is explained, covering aspects such as periodic payments, maturity, face value, zero-coupon bonds, and perpetual bonds. The importance of constructing a yield curve to price a portfolio of swaps with different maturities is emphasized.
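
A compact sketch of the bond arithmetic referred to here and in the timestamped notes below (the coupon, maturity, and observed price are made-up inputs): price a bond off a single yield, solve for the yield that matches a quoted price, and compute duration as the PV-weighted average time of the cash flows.

    import numpy as np
    from scipy.optimize import brentq

    def bond_price(y, coupon, face, n_years, freq=2):
        """Price from yield-to-maturity y with `freq` coupon payments per year."""
        times = np.arange(1, n_years * freq + 1) / freq
        cfs = np.full(times.shape, coupon * face / freq)
        cfs[-1] += face                                # final coupon plus face value
        dfs = (1.0 + y / freq) ** (-freq * times)      # discount factors from a single yield
        return float(np.sum(cfs * dfs))

    def macaulay_duration(y, coupon, face, n_years, freq=2):
        times = np.arange(1, n_years * freq + 1) / freq
        cfs = np.full(times.shape, coupon * face / freq)
        cfs[-1] += face
        pvs = cfs * (1.0 + y / freq) ** (-freq * times)
        return float(np.sum(times * pvs) / np.sum(pvs))  # PV-weighted average time

    # Solve for the yield that reprices the bond at an observed market price of 98.50
    ytm = brentq(lambda y: bond_price(y, 0.04, 100.0, 5) - 98.50, 1e-6, 1.0)
    print(round(ytm, 4), round(macaulay_duration(ytm, 0.04, 100.0, 5), 3))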

Interest rates and pricing models for bonds and swaps are discussed in detail. The speaker acknowledges the limitations of single-number models for predicting price changes and introduces the concept of swaps and how traders quote bid and offer levels for the swap rate. The construction of a yield curve for pricing swaps is explained, along with the selection of input instruments for calibration and spline types. The process of calibrating swaps using a cubic spline and ensuring they reprice at par is demonstrated using practical examples.

The video further explores the curve of three-month forward rates and the need for a fair price that matches market observables. The focus then shifts to trading spreads and determining the most liquid instruments. The challenges of hedging a portfolio so that it is insensitive to market changes are discussed, highlighting the significant costs associated with such strategies. The need for improved hedging models is addressed, with a new general formulation for portfolio risk presented. Principal component analysis is utilized to analyze market modes and scenarios, enabling traders to hedge using liquid and cost-effective swaps.
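
A stripped-down sketch of the PCA step (daily curve moves are simulated with an invented factor structure): the eigenvectors of the covariance matrix of rate changes give the level/slope-type market modes used for scenarios and hedging.

    import numpy as np

    rng = np.random.default_rng(10)

    # Simulate daily rate changes at five curve points, driven by a "level" and a smaller "slope" factor
    level = rng.normal(scale=0.05, size=(1000, 1)) * np.ones((1, 5))
    slope = rng.normal(scale=0.02, size=(1000, 1)) * np.linspace(-1.0, 1.0, 5)
    noise = rng.normal(scale=0.005, size=(1000, 5))
    d_rates = level + slope + noise                      # 1000 days x 5 curve points

    cov = np.cov(d_rates, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]
    explained = eigvals[order] / eigvals.sum()

    print(np.round(explained, 3))                        # the first one or two modes dominate
    print(np.round(eigvecs[:, order[0]], 2))             # leading mode: roughly a parallel "level" shift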

Regularized pricing and risk models are explored in-depth, emphasizing the disadvantages of the PCA model, such as instability and sensitivity to outliers. The benefits of translating risk into more manageable and liquid numbers are highlighted. The video explains how additional constraints and thoughts about the behavior of risk matrices can enhance these models. The use of B-splines, penalty functions, L1 and L2 matrices, and Tikhonov regularization is discussed as means to improve stability and reduce pricing errors.

The speaker addresses the challenges of calibrating a volatility surface, providing insights into underdetermined problems and unstable solutions. The representation of the surface as a vector and the use of linear combinations of basis functions are explained. The concept of ill-posedness is revisited, and the importance of constraining outputs using smooth basis functions is emphasized.

Various techniques and approaches are covered, including truncated singular value decomposition (SVD) and fitting functions using spline techniques. The interpretation of interpolation graphs and their application in calibrating and arbitraging market discrepancies are explained. Swaptions and their role in volatility modeling are discussed, along with the opportunities they present for traders.

The video concludes by highlighting the relevance of regularized pricing and risk models in identifying market anomalies and facilitating informed trading decisions. It emphasizes the liquidity of bonds and the use of swaps for building curves, while also acknowledging the reliance on PCA models in the absence of a stable curve. Overall, the video provides a comprehensive understanding of regularized pricing and risk models for interest rate products, equipping viewers with valuable knowledge in this domain.

  • 00:00:00 In this section, Dr. Ivan Masyukov, a guest speaker from Morgan Stanley, discusses regularized pricing and risk models for interest rate products, which involve adding additional constraints, also known as regularizers, to the model. The lecture starts with an explanation of bonds, one of the simplest interest rate products on the market, and covers their periodic payments, maturity, and face value. Zero-coupon bonds, which pay nothing until maturity, and perpetual bonds, which pay coupons indefinitely with no maturity date, are also discussed. The section concludes with an explanation of the cash flow diagram used for analysis, with green arrows indicating amounts received and red arrows indicating amounts paid.

  • 00:05:00 In this section, the concept of the time value of money is introduced: the further in the future a cash flow occurs, the smaller its discount factor, so it is worth less today. A fair value of future cash flows can be computed if we have discount factors, which can be represented using a model for discounting. A simple model using one parameter, the yield to maturity, is discussed. The price of a bond can be represented as a linear combination of its discounted future cash flows, and the bond yield can be found by solving for it if the bond price is known, or vice versa.
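
To make the one-parameter discounting model concrete, here is a minimal sketch (not from the lecture; the bond terms are illustrative assumptions) that prices a fixed-coupon bond as the sum of its discounted cash flows and then backs the yield out of an observed price by root finding.

```python
import numpy as np
from scipy.optimize import brentq

def bond_price(yield_rate, coupon, face, maturity, freq=2):
    """Price a fixed-coupon bond by discounting every cash flow at a single yield."""
    times = np.arange(1, maturity * freq + 1) / freq          # payment times in years
    cash_flows = np.full(times.shape, coupon * face / freq)   # periodic coupons
    cash_flows[-1] += face                                     # face value at maturity
    discount = (1 + yield_rate / freq) ** (-freq * times)      # one-parameter discounting
    return float(np.sum(cash_flows * discount))

def yield_from_price(price, coupon, face, maturity, freq=2):
    """Solve for the yield that reproduces an observed bond price."""
    return brentq(lambda y: bond_price(y, coupon, face, maturity, freq) - price,
                  1e-9, 1.0)

# Illustrative 5-year, 4% semiannual-coupon bond trading at 97
print(bond_price(0.045, 0.04, 100, 5))
print(yield_from_price(97.0, 0.04, 100, 5))
```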

  • 00:10:00 In this section, the relationship between bond prices and yields is discussed. The economic content of a bond lies in its price and cash flows; the yield links future cash flows to the bond price, but it assumes the same constant discounting at every time point, which is not always adequate. The sensitivity of the bond price to changes in yield, and how it moves with the market, is what determines a bond's duration. Duration is a weighted average of the cash-flow times, with weights proportional to the present values of the future cash flows. The relationship between yield and bond price carries a negative sign; the duration of a zero-coupon bond equals its maturity, while a regular coupon bond's duration is less than its maturity. The duration model assumes all rates move in a parallel way.
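
The duration formula just described can be checked numerically. The sketch below (an illustrative example, not lecture code; the bond parameters are assumptions) computes Macaulay duration as the present-value-weighted average of cash-flow times, and it confirms that a zero-coupon bond's duration equals its maturity while a coupon bond's duration is shorter.

```python
import numpy as np

def durations(yield_rate, coupon, face, maturity, freq=2):
    """Macaulay duration: PV-weighted average of times; modified duration: price sensitivity."""
    times = np.arange(1, maturity * freq + 1) / freq
    cash_flows = np.full(times.shape, coupon * face / freq)
    cash_flows[-1] += face
    pv = cash_flows * (1 + yield_rate / freq) ** (-freq * times)
    macaulay = float(np.sum(times * pv) / np.sum(pv))
    modified = macaulay / (1 + yield_rate / freq)
    return macaulay, modified

# A zero-coupon bond's duration equals its maturity; a coupon bond's is shorter.
print(durations(0.05, 0.00, 100, 10))   # ~ (10.0, ...)
print(durations(0.05, 0.06, 100, 10))   # Macaulay duration below 10
```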

  • 00:15:00 In this section, the speaker discusses interest rates and pricing models for bonds and swaps. They acknowledge that a single number model might not be adequate for predicting price changes, and suggest using second derivatives to account for unexplained losses. With regards to swaps, the speaker explains how traders quote bid and offer levels for the most important quantity of a swap, the swap rate, using the present value of fixed and floating cash flows. They also note that entering a swap does not require any exchange of money, and that the fixed rate is set so that the present value of fixed minus floating cash flows is net to zero.

  • 00:20:00 In this section, the concept of swap rates as a weighted sum of forward rates is explained, with the weights determined by discounting factors. The video explains the need for constructing a yield curve to price an entire portfolio of swaps with various maturities, as well as the process of selecting input instruments for calibration and spline type. The final step is adjusting control points to ensure that when the instruments are repriced using the mathematical object, the results match the market prices.
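
The identity that a par swap rate is a discount-factor-weighted average of forward rates can be verified with a small numerical check; the discount factors below are made-up illustrative values, not market data.

```python
import numpy as np

# Hypothetical annual discount factors D(0, t) for t = 1..5 (illustrative numbers)
D = np.array([0.995, 0.985, 0.970, 0.952, 0.932])
delta = 1.0                                   # accrual fraction (annual payments)

# Simple forward rates implied by the discount curve: f_i = (D_{i-1}/D_i - 1) / delta
D_prev = np.concatenate(([1.0], D[:-1]))
forwards = (D_prev / D - 1.0) / delta

# Par swap rate two ways: PV-of-floating over the annuity, and a weighted average of forwards
annuity = np.sum(delta * D)
swap_rate_direct = (1.0 - D[-1]) / annuity
weights = delta * D / annuity                 # discount-factor weights
swap_rate_weighted = np.sum(weights * forwards)

print(swap_rate_direct, swap_rate_weighted)   # the two agree
```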

  • 00:25:00 In this section, Ivan Masyukov explains how a cubic spline is used to build a smooth curve: the shape of the curve is a piecewise cubic polynomial, with as many derivatives as possible kept continuous at every node point. B-splines are introduced as a type of spline basis, so that any curve with those node points can be represented as a linear combination of basis functions. Masyukov then explains how to calibrate the curve using a solver so that the input swaps reprice at par. This is demonstrated using yield curve instruments, namely interest rate swaps with maturities from one to 30 years and quotes from 0.33% up to 2.67%.
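
As a rough illustration of the spline step (a sketch under assumed inputs, not the speaker's calibration code), the snippet below interpolates a smooth curve through a handful of node points with a cubic spline; a real build would instead let a solver adjust the control points until every input swap reprices at par.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Illustrative node points: maturities (years) and already-calibrated zero rates (assumptions).
maturities = np.array([1, 2, 3, 5, 7, 10, 20, 30], dtype=float)
zero_rates = np.array([0.0033, 0.0050, 0.0080, 0.0140, 0.0185, 0.0225, 0.0262, 0.0267])

curve = CubicSpline(maturities, zero_rates)       # smooth piecewise-cubic curve through the nodes

def discount_factor(t):
    """Continuously compounded discounting off the interpolated zero curve."""
    return np.exp(-curve(t) * t)

grid = np.linspace(1, 30, 8)
print(np.round(curve(grid), 4))                   # interpolated zero rates
print(np.round(discount_factor(grid), 4))         # implied discount factors
```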

  • 00:30:00 In this section, Ivan Masyukov discusses the curve of three-month forward rates, which is mostly driven by the LIBOR rate underlying the three-month payment frequency on the floating leg of a standard USD interest rate swap. The curve is not flat: it is steep over the first five years, reaches a plateau later, and shows a distinct feature in the 20-year region. Because such a curve cannot be obtained under the assumption of a single yield parameter for everything, an extra term is needed to get a fair price and match the market observables. This extra term is a small correction to the yield curve rather than a rough assumption that the curve is flat. The approach is better for having a consistent model for bonds and swaps in a portfolio and for understanding bond liquidity and credit spreads.

  • 00:35:00 In this section, the focus shifts to how spreads are traded and which instruments are the most liquid. The bond turns out to be the most liquid instrument, while the spread between the ten-year swap and the bond is the second most liquid. This raises a reliability concern when building a curve, because a small change in inputs can cause large variations in outputs, which worries traders. In a typical situation, a trader wants the value of their portfolio to be insensitive to changes in the market; achieving this would require hedge amounts such as +200 of the one-year swap, -1.3 of the two-year swap, and so on. However, executing such a hedge could be expensive, costing around 3.6 million dollars, with the cost proportional to the bid-offer spreads of the particular instruments.

  • 00:40:00 In this section, the need for a better hedging model is discussed, as the current method of hedging is not effective for traders. A new general formulation for portfolio risk is presented, characterized by the vector of portfolio risk, the hedging portfolio, and the weights of that portfolio. Principal component analysis is used to approach the problem and analyze the market's typical modes and scenarios, whereby traders pick liquid and cheap swaps to hedge against them. A graph of the typical principal components is presented; the dominant market behavior is that rates barely move at present but are expected to move in the future, owing to the Federal Reserve's stimulus.
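
The PCA step can be sketched as follows. This example runs on simulated curve moves with an artificial level and slope factor (an assumption standing in for historical data) and extracts the market modes from the SVD of the centered matrix of daily changes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated daily changes of an 8-point yield curve (rows = days, columns = maturities);
# a common "level" factor and a "slope" factor stand in for real historical moves.
n_days, n_tenors = 500, 8
level = rng.normal(scale=0.05, size=(n_days, 1)) * np.ones((1, n_tenors))
slope = rng.normal(scale=0.02, size=(n_days, 1)) * np.linspace(-1, 1, n_tenors)
noise = rng.normal(scale=0.005, size=(n_days, n_tenors))
moves = level + slope + noise

centered = moves - moves.mean(axis=0)
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

explained = s**2 / np.sum(s**2)      # fraction of variance per principal component
print(np.round(explained[:3], 3))    # the first two modes dominate by construction
print(np.round(Vt[0], 2))            # first mode: roughly parallel ("level") shift
print(np.round(Vt[1], 2))            # second mode: a "slope" (steepening) move
```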

  • 00:45:00 In this section, the speaker discusses regularized pricing and risk models, specifically the disadvantages of the PCA model. The PCA model is formulated using hedging instruments to eliminate the need for minimizing, but the coefficients are not very stable, especially for recent modes in the market. Additionally, the model is sensitive to outliers, and can result in overfitting to historical data, making it risky to assume they will work for the future. The advantages of the model include being able to translate risk into fewer, more liquid numbers that are orders of magnitude smaller than before, allowing traders to make informed decisions.

  • 00:50:00 In this section, the video discusses regularized pricing and risk models and how adding constraints, or prior views about the behavior of the risk matrices, can improve the situation. The speaker explains the PCA interpretation of the risk matrix as a linear combination of principal components, producing a shift on one hedging instrument at a time. They also discuss an approach that goes beyond historical data and builds yield curves in terms of forward rates, minimizing non-smoothness through penalty terms, where the Jacobian is the matrix translating shifts of the yield curve inputs into changes in the outputs. The video also highlights how the pricing engine and calibration process work, using the HJM model to price volatility.

  • 00:55:00 In this section, the speaker explains the equations for the evolution of forward rates needed for Monte Carlo simulation, where the forward rates are the quantities being simulated. The speaker discusses the drift of the forward rates, which has some dependence on the forward rates raised to the power of beta. The volatility surface is introduced, which gives the volatility number to use for each calendar time and forward time, and the correlation and factor structure are briefly mentioned. The speaker explains that a triangular surface supplies the volatility for each transition (each arrow in the diagram) and shows an example of such a surface. The problem lies in computing this triangular matrix, which has dimension 240 by 240 to cover up to 60 years, making it a challenging task.

  • 01:00:00 In this section of the video, the speaker explains how to approach the issue of calibrating a volatility surface. Since the number of elements to be calibrated is large, a formal solution storing a matrix of 28K by 28K is not practical. Additionally, since there are fewer calibration instruments than elements to be calibrated, it is an underdetermined problem that produces unstable solutions. To address this, they represent the surface as a vector and use a linear combination of reasonable basis functions, with the same number of basis functions as input instruments. While this calibrates perfectly, the resulting surface looks less like a volatility surface and more like the Manhattan skyline along the Hudson River, with building-shaped spikes. This approach is commonly used but produces unstable results.

  • 01:05:00 In this section of the video, the speaker discusses the issue of ill-posedness in pricing and risk models, which means that small changes in inputs can lead to drastic changes in outputs. To address this, they suggest putting constraints on outputs using basis functions that are smooth to begin with, such as B-splines, and using penalty functions to control the change and smoothness of the volatility surface. By doing so, they can produce meaningful results without having to calibrate exactly to every input instrument. The speaker demonstrates how basis functions can be constructed in two dimensions and combined using linear combinations.

  • 01:10:00 In this section, the speaker discusses the concept of regularized pricing and risk models. The speaker explains that L1 and L2 matrices consisting of values of 1 and -1 can be used to penalize the gradient of a vector if a smoothness approach is desired. To solve an ill-posed problem where small noise and insignificant modes can cause substantial changes in the output, the Tikhonov regularization technique can be employed. The technique involves adding a penalty to the amplitude to reduce the impact of noise. The speaker highlights that because there is always uncertainty in the numbers being calibrated and the model is not always perfect, regularization is necessary to minimize pricing errors.
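
A minimal Tikhonov sketch, under assumed data: the matrix L below is a first-difference (gradient) penalty of the kind described, and the regularized problem is solved by stacking it under the original least-squares system. The classic Tikhonov form penalizes the amplitude of the solution instead; swapping L for the identity matrix gives that variant.

```python
import numpy as np

def tikhonov_solve(A, b, lam):
    """Solve min ||A x - b||^2 + lam^2 ||L x||^2, with L a first-difference (gradient) penalty."""
    n = A.shape[1]
    L = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)     # rows of (-1, +1): discrete gradient
    A_aug = np.vstack([A, lam * L])
    b_aug = np.concatenate([b, np.zeros(n - 1)])
    x, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
    return x

# Ill-posed toy problem: fewer noisy observations than unknowns (underdetermined).
rng = np.random.default_rng(1)
n, m = 40, 15
A = rng.normal(size=(m, n))
x_true = np.sin(np.linspace(0, np.pi, n))            # a smooth "surface" to recover
b = A @ x_true + rng.normal(scale=0.01, size=m)

x_plain = np.linalg.lstsq(A, b, rcond=None)[0]        # minimum-norm fit, no smoothness imposed
x_reg = tikhonov_solve(A, b, lam=1.0)                 # smoothness-regularized fit
print(np.round(np.linalg.norm(x_plain - x_true), 2),
      np.round(np.linalg.norm(x_reg - x_true), 2))
```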

  • 01:15:00 In this section, the concept of regularized pricing and risk models is discussed. Tikhonov regularization is introduced as a method for improving stability in ill-conditioned problems. By penalizing the amplitude or a linear combination of the solution, regularization can provide a more meaningful and realistic result, albeit possibly with a biased solution. Truncated SVD is another approach that can be used to select only the significant singular values, resulting in a more robust model. The key is to identify and penalize the specific quantity that needs regularization, rather than blindly applying a textbook approach.
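
For comparison, here is a truncated-SVD sketch (illustrative, with synthetic data): the pseudoinverse is formed from only the significant singular values, discarding the noise-dominated modes that make the full solution unstable.

```python
import numpy as np

def truncated_svd_solve(A, b, k):
    """Least-squares solution keeping only the k largest singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.zeros_like(s)
    s_inv[:k] = 1.0 / s[:k]                        # discard small, noise-dominated modes
    return Vt.T @ (s_inv * (U.T @ b))

rng = np.random.default_rng(2)
A = rng.normal(size=(30, 30))
A[:, -1] = A[:, 0] + 1e-8 * rng.normal(size=30)    # nearly dependent column -> ill-conditioned
b = rng.normal(size=30)

print(np.linalg.cond(A))                           # huge condition number
x_robust = truncated_svd_solve(A, b, k=29)         # drop the one insignificant singular value
print(np.round(np.linalg.norm(x_robust), 3))
```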

  • 01:20:00 In this section, Ivan Masyukov answers questions from the audience about the techniques used for fitting functions, particularly spline techniques. He explains that a spline or interpolation is used when there are a limited number of inputs and you want to draw in between. He also discusses the interpretation of the interpolation graph and how traders use it to calibrate and arbitrage any discrepancies they see. Additionally, he explains how swaptions are used in modeling for volatility and how traders make trades out of any discrepancies they see.

  • 01:25:00 In this section, the speaker discusses the regularized pricing and risk models utilized by market traders to find anomalies in the market and take advantage of them through trades. These models can incorporate inputs such as smoothness assumptions about the forward rates or combinations of principal component analysis (PCA). While bonds are the most liquid instrument in the market, they are not continuously traded, making swaps more suitable for building a curve. Once the swap curve is built, bond traders use it for hedging because bonds are more liquid than swaps. However, traders who only trade bonds often rely on PCA models or other methods due to the lack of a stable curve.
10. Regularized Pricing and Risk Models — MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013 (Instructor: Ivan Masyukov). Complete course: http://ocw.mit.edu/18-S096F13
 

11. Time Series Analysis II


11. Time Series Analysis II

This video delves into various aspects of time series analysis, building upon the previous lecture's discussion on volatility modeling. The professor begins by introducing GARCH models, which offer a flexible approach for measuring volatility in financial time series. The utilization of maximum likelihood estimation in conjunction with GARCH models is explored, along with the use of t distributions as an alternative for modeling time series data. The approximation of t-distributions with normal distributions is also discussed. Moving on to multivariate time series, the lecture covers cross-covariance and Wold decomposition theorems. The speaker elucidates how vector autoregressive processes simplify higher order time series models into first-order models. Furthermore, the computation of the mean for stationary VAR processes and their representation as a system of regression equations is discussed.

The lecture then delves deeper into the multivariate regression model for time series analysis, emphasizing its specification through separate univariate regression models for each component series. The concept of the vectorizing operator is introduced, demonstrating its utility in transforming the multivariate regression model into a linear regression form. The estimation process, including maximum likelihood estimation and model selection criteria, is also explained. The lecture concludes by showcasing the application of vector autoregression models in analyzing time series data related to growth, inflation, unemployment, and the impact of interest rate policies. Impulse response functions are employed to comprehend the effects of innovations in one component of the time series on other variables.

Additionally, the continuation of volatility modeling from the previous lecture is addressed. ARCH models, which allow for time-varying volatility in financial time series, are defined. The GARCH model, an extension of the ARCH model with additional parameters, is highlighted for its advantages over the ARCH model, offering greater flexibility in modeling volatility. The lecturer emphasizes that GARCH models assume Gaussian distributions for the innovations in the return series.

Furthermore, the implementation of GARCH models using maximum likelihood estimation is explored. The ARMA model for squared residuals can be expressed as a polynomial lag of innovations to measure conditional variance. The long-run variance is well defined provided the roots of the lag-polynomial operator lie outside the unit circle. Maximum likelihood estimation involves establishing the likelihood function based on the data and unknown parameters, with the joint density function represented as the product of successive conditional densities of the time series. These conditional densities follow normal distributions.

The challenges associated with estimating GARCH models, primarily due to constraints on the underlying parameters, are discussed. To optimize a convex function and find its minimum, it is necessary to transform the parameters to a range without limitations. After fitting the model, the residuals are evaluated using various tests to assess normality and analyze irregularities. An R package called rugarch is used to fit the GARCH model for the euro-dollar exchange rate, employing a normal GARCH term after fitting the mean process for exchange rate returns. The order of the autoregressive process is determined using the Akaike information criterion, and a normal quantile-quantile plot of autoregressive residuals is produced to evaluate the model.

The lecturer also highlights the use of t distributions, which offer a heavier-tailed distribution compared to Gaussian distributions, for modeling time series data. GARCH models with t distributions can effectively estimate volatility and compute value-at-risk limits. The t distribution serves as a good approximation to a normal distribution, and the lecturer encourages exploring different distributions to enhance time series modeling. In addition, the approximation of t-distributions with normal distributions is discussed. The t-distribution can be considered a reasonable approximation of a normal distribution when it has 25-40 degrees of freedom. The lecturer presents a graph comparing the probability density functions of a standard normal distribution and a standard t-distribution with 30 degrees of freedom, demonstrating that the two distributions are similar but differ in the tails.

In the lecture, the professor continues to explain the analysis of time series data using vector autoregression (VAR) models. The focus is on understanding the relationship between variables and the impact of innovations on the variables of interest. To analyze the relationships between variables in a VAR model, the multivariate autocorrelation function (ACF) and partial autocorrelation function (PACF) are used. These functions capture the cross-lags between the variables and provide insights into the dynamic interactions among them. By examining the ACF and PACF, one can identify the significant lags and their effects on the variables. Furthermore, the impulse response functions (IRFs) are employed to understand the effects of innovations on the variables over time. An innovation refers to a shock or unexpected change in one of the variables. The IRFs illustrate how the variables respond to an innovation in one component of the multivariate time series. This analysis helps in understanding the propagation and magnitude of shocks throughout the system.

For example, if an innovation in the unemployment rate occurs, the IRFs can show how this shock affects other variables such as the federal funds rate and the consumer price index (CPI). The magnitude and duration of the response can be observed, providing insights into the interdependencies and spillover effects within the system. In addition to the IRFs, other statistical measures such as forecast error variance decomposition (FEVD) can be utilized. FEVD decomposes the forecast error variance of each variable into the contributions from its own shocks and the shocks of other variables. This analysis allows for the quantification of the relative importance of different shocks in driving the variability of each variable. By employing VAR models and analyzing the ACF, PACF, IRFs, and FEVD, researchers can gain a comprehensive understanding of the relationships and dynamics within a multivariate time series. These insights are valuable for forecasting, policy analysis, and understanding the complex interactions among economic variables.

In summary, the lecture emphasizes the application of VAR models to analyze time series data. It highlights the use of ACF and PACF to capture cross-lags, IRFs to examine the impact of innovations, and FEVD to quantify the contributions of different shocks. These techniques enable a deeper understanding of the relationships and dynamics within multivariate time series, facilitating accurate forecasting and policy decision-making.

  • 00:00:00 In this section, the professor continues the volatility modeling of the previous lecture, starting with the definition of ARCH models, which admit time-varying volatility in financial time series. The GARCH model, which extends the ARCH model by adding lagged-volatility terms, has many advantages over the ARCH model and typically needs fewer parameters to capture the same persistence. By adding the extra parameter that relates current volatility to its past, or lagged, value, the GARCH model is more flexible in modeling volatility. The ARCH model imposes a hard lower bound on volatility, whereas GARCH models are much more flexible in predicting volatility levels. It should be noted that in these fits, Gaussian distributions are assumed for the innovations in the return series.

  • 00:05:00 In this section, the topic is GARCH models and their implementation using maximum likelihood estimation. With GARCH models, we can measure volatility and express the ARMA model for the squared residuals as a polynomial lag of innovations. The long-run variance of the conditional-variance process is well defined provided the roots of the lag-polynomial operator lie outside the unit circle, and its square root gives the long-run volatility. Maximum likelihood estimation requires writing down the likelihood function of the data given the unknown parameters, and the joint density function can be expressed as the product of successive conditional densities of the time series. These conditional densities are normal.
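
The recursion and likelihood just described can be written down directly. The sketch below (illustrative code, not the lecturer's; it assumes a zero conditional mean and uses a simulated return series) builds the GARCH(1,1) conditional-variance recursion, evaluates the Gaussian log-likelihood as a product of conditional densities, and maximizes it numerically, handling the parameter constraints by transforming to an unrestricted scale, as discussed next.

```python
import numpy as np
from scipy.optimize import minimize

def garch11_neg_loglik(theta, r):
    """Negative Gaussian log-likelihood of a GARCH(1,1) model for zero-mean returns r.
    theta holds unconstrained parameters; exp/logistic maps enforce the constraints."""
    omega = np.exp(theta[0])                            # omega > 0
    alpha = 0.999 / (1.0 + np.exp(-theta[1]))           # 0 < alpha < 1
    beta = (0.999 - alpha) / (1.0 + np.exp(-theta[2]))  # keep alpha + beta < 1
    n = len(r)
    sigma2 = np.empty(n)
    sigma2[0] = np.var(r)                               # initialize at the sample variance
    for t in range(1, n):
        sigma2[t] = omega + alpha * r[t - 1] ** 2 + beta * sigma2[t - 1]
    # Joint density = product of successive conditional normal densities
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + r ** 2 / sigma2)

# Simulate a GARCH(1,1) path to stand in for exchange-rate returns (illustrative parameters).
rng = np.random.default_rng(3)
n = 2000
omega_true, alpha_true, beta_true = 1e-6, 0.08, 0.90
r = np.empty(n)
s2 = omega_true / (1 - alpha_true - beta_true)          # start at the long-run variance
for t in range(n):
    r[t] = np.sqrt(s2) * rng.standard_normal()
    s2 = omega_true + alpha_true * r[t] ** 2 + beta_true * s2

fit = minimize(garch11_neg_loglik, x0=np.array([np.log(1e-5), 0.0, 0.0]), args=(r,))
theta = fit.x
print(np.exp(theta[0]), 0.999 / (1 + np.exp(-theta[1])))  # recovered omega and alpha (roughly)
```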

  • 00:10:00 In this section, the speaker discusses the challenge of estimating GARCH models due to constraints on the underlying parameters, which must be enforced. Standard methods for minimizing a convex function work best when the problem is unconstrained, so the parameters are transformed to a scale on which their range is unlimited. After fitting the model, the residuals are evaluated with various tests of normality and by analyzing the magnitude of irregularities. Using the R package rugarch, a GARCH model with a normal GARCH term is fit to the euro-dollar exchange rate after first fitting the mean process for the exchange rate returns. To evaluate the model, the order of the autoregressive process is chosen using the Akaike information criterion, and a normal q-q plot of the autoregressive residuals is produced.

  • 00:15:00 In this section, the presenter discusses the use of a heavier-tailed distribution, specifically the t distribution, for modeling time series data. When compared to a Gaussian distribution, the t distribution better accommodates the high and low values of residuals. The presenter shows how GARCH models with t distributions can estimate volatility similarly to GARCH models with Gaussian distributions, and they can be used to compute value at risk limits. Overall, the t distribution can be a good approximation to a normal distribution, and the presenter encourages exploring different distributions to better model time series data.

  • 00:20:00 In this section, the professor discusses the approximation of the t-distribution by a normal distribution. Typically, a t-distribution with 25-40 degrees of freedom can be considered a good approximation of a normal distribution. The professor shows a graph comparing the probability density functions of a standard normal distribution and a standard t-distribution with 30 degrees of freedom. The graph demonstrates that the two distributions are very close but differ in the tails, with the t-distribution having heavier tails than the normal. The professor also discusses volatility clustering and the GARCH model's ability to handle it. In addition, the professor notes that returns have heavier tails than Gaussian distributions, and the homework covers how the GARCH model can handle this.
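
A quick numerical comparison of the two densities (an illustrative check, not from the lecture) shows how close the t-distribution with 30 degrees of freedom is to the standard normal, and where the extra tail mass sits.

```python
import numpy as np
from scipy import stats

x = np.linspace(-5, 5, 11)
df = 30

print(np.round(stats.norm.pdf(x), 4))      # standard normal density
print(np.round(stats.t.pdf(x, df), 4))     # t density with 30 degrees of freedom

# Tail probabilities: the t distribution puts more mass beyond 3 standard deviations.
print(stats.norm.sf(3.0), stats.t.sf(3.0, df))
```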

  • 00:25:00 In this section, the GARCH model and its usefulness for modeling financial time series are discussed. The GARCH model is appropriate for modeling covariance stationary time series, where the volatility measure is a measure of the squared excess return and is essentially a covariance stationary process with a long-term mean. GARCH models are great at describing volatility relative to the long-term average, and in terms of their usefulness for prediction, they predict that volatility will revert back to the mean at some rate. The rate at which volatility reverts back is given by the persistence parameter, which can be measured by alpha_1 plus beta_1. The larger alpha_1 plus beta_1 is, the more persistent volatility is. There are many extensions of the GARCH models, and in the next topic, multivariate time series, the multivariate Wold representation theorem will be discussed.
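
The mean-reversion statement translates into a simple forecast formula: under one common timing convention, the k-step-ahead conditional variance is the long-run variance plus (alpha_1 + beta_1)^k times the current deviation from it. The parameters below are illustrative assumptions, not estimates from the lecture.

```python
import numpy as np

# Illustrative GARCH(1,1) parameters (assumptions).
omega, alpha1, beta1 = 1e-6, 0.08, 0.90
persistence = alpha1 + beta1
long_run_var = omega / (1.0 - persistence)            # unconditional (long-run) variance

sigma2_now = 4.0 * long_run_var                        # start from an elevated volatility state
horizons = np.array([1, 5, 20, 60])
forecasts = long_run_var + persistence**horizons * (sigma2_now - long_run_var)

print(np.sqrt(long_run_var), np.sqrt(forecasts))       # volatility decays toward the long-run level
```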

  • 00:30:00 In this section, we learn about multivariate time series, which extend univariate time series to model multiple variables that change over time. Covariance stationarity is extended by requiring finite and bounded first- and second-order moments, where an m-dimensional random variable is treated as m different time series. For the variance-covariance matrix of the t-th observation of the multivariate process, we define gamma_0 as the expected value of (X_t minus mu) times (X_t minus mu) prime. The correlation matrix, r_0, is then obtained by pre- and post-multiplying the covariance matrix gamma_0 by a diagonal matrix holding the reciprocals of the square roots of the diagonal of gamma_0.

  • 00:35:00 In this section, the concept of cross-covariance matrices is introduced, which describes how the current values of a multivariate time series covary with the k-th lag of those values: gamma_k is the covariance between the current-period vector values and their k-th lag. The properties of these matrices are explained, with gamma_0 being the contemporaneous covariance matrix whose diagonal entries are the variances. The Wold decomposition theorem, an advanced result that extends the univariate Wold decomposition theorem, is also mentioned. This theorem is useful when forming judgments about causality between variables in economic time series.

  • 00:40:00 In this section, the concept of Wold decomposition representation for a covariance stationary process is introduced. The process is represented as the sum of a deterministic process and a moving average process of a white noise. In a multivariate case, the deterministic process could be a linear or exponential trend, and the white noise process is an m-dimensional vector with mean 0 and a positive semi-definite variance/covariance matrix. The innovation is the disturbance about the modeled process that cannot be predicted by previous information. The sum of the terms in the covariance matrix must converge for the process to be covariance stationary.

  • 00:45:00 In this section, the Wold decomposition is discussed as a way to represent the bits of information that affect the process and were not available before. The section then moves on to discuss vector autoregressive processes, which model how a given component of the multivariate series depends on other variables, or components of the multivariate series. The concept of re-expressing a p-th order process as a first-order process with vector autoregressions is then explained, which is a powerful technique used in time series methods to simplify the analysis of complicated models.

  • 00:50:00 In this section, the speaker discusses the representation of a multivariate stochastic process using the stacked vectors Z_t and Z_(t-1), and how a p-th order model can be transformed into a first-order time series model on a larger multivariate series. The process is stationary if all eigenvalues of the companion matrix A have modulus less than 1, which ensures that the process will not behave explosively as it evolves over time. This requirement is equivalent to all roots of the characteristic polynomial equation lying outside the unit circle.

  • 00:55:00 In this section, the focus is on computing the mean of a stationary VAR process by taking expectations on both sides of the equation; the unconditional mean is obtained by solving the resulting equation for mu. The vector autoregression model is then expressed as a system of regression equations, consisting of m regression models, one for each component of the multivariate series. The j-th regression model expresses the j-th column of the response matrix as Z beta_j plus epsilon_j, where Z is a matrix of lagged values of the multivariate process. The computation assumes that p pre-sample observations are available.
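
The stationarity check via the companion matrix and the unconditional-mean calculation described in the last two subsections can be sketched as follows, using an illustrative two-dimensional VAR(2) with made-up coefficients.

```python
import numpy as np

# Illustrative 2-dimensional VAR(2): X_t = c + Phi1 X_{t-1} + Phi2 X_{t-2} + eps_t
c = np.array([0.1, 0.2])
Phi1 = np.array([[0.5, 0.1],
                 [0.0, 0.3]])
Phi2 = np.array([[0.2, 0.0],
                 [0.1, 0.2]])
m = 2

# Companion matrix of the stacked first-order representation Z_t = A Z_{t-1} + ...
A = np.block([[Phi1, Phi2],
              [np.eye(m), np.zeros((m, m))]])

eigvals = np.linalg.eigvals(A)
print(np.abs(eigvals))                       # stationary if every modulus is below 1
print(np.all(np.abs(eigvals) < 1))

# Unconditional mean of a stationary VAR: mu = (I - Phi1 - Phi2)^{-1} c
mu = np.linalg.solve(np.eye(m) - Phi1 - Phi2, c)
print(mu)
```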

  • 01:00:00 In this section, the speaker explains the multivariate regression model for time series analysis. The model consists of a linear regression model on the lags of the entire multivariate series up to p lags with their regression parameter given by βj, which corresponds to the various elements of the phi matrices. The speaker defines the multivariate regression model and explains how to specify it by considering the univariate regression model for each component series separately. This is related to seemingly unrelated regressions in econometrics.

  • 01:05:00 In this section of the lecture, the professor discusses estimation: straightforward linear regression methods are applied to estimate the regression parameters, and the variances and covariances of the innovation terms are then estimated. A significant result is that these component-wise regressions are also the optimal estimates for the multivariate regression as a whole. The theory uses Kronecker product operators together with the vec operator, which takes a matrix and stacks its columns into a single vector.

  • 01:10:00 In this section, the vectorizing operator is introduced and its use in manipulating terms into a more convenient form is explained. The multivariate regression model is set up using a matrix structure and expressed in linear regression form. By vectorizing the beta matrix, epsilon, and y, one can define the likelihood function for maximum likelihood estimation with these models. The likelihood of the unknown parameters beta_star and sigma is given by the joint density of this normal linear regression model, which corresponds to what was previously used in regression analysis, but with a more complicated definition of the independent-variables matrix X_star and the variance/covariance matrix sigma_star.
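
The vec/Kronecker manipulation behind this rewriting rests on the identity vec(A X B) = (B' kron A) vec(X), which the snippet below verifies numerically on random matrices (purely an illustrative check).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(3, 2))
X = rng.normal(size=(2, 4))
B = rng.normal(size=(4, 5))

def vec(M):
    """Stack the columns of M into a single vector (column-major order)."""
    return M.reshape(-1, order="F")

lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))          # True: vec(AXB) = (B' kron A) vec(X)
```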

  • 01:15:00 In this section, the concept of the concentrated log-likelihood is discussed, and it is shown that the estimation of the regression parameter beta does not depend on the covariance matrix sigma. This allows the likelihood function to be concentrated, so that it can be maximized while estimating the covariance matrix; the resulting objective involves minus n over 2 times the log of the determinant of the covariance matrix, minus n over 2 times the trace of its inverse multiplied by an estimate of it. Additionally, model selection criteria such as the Akaike Information Criterion, the Bayes Information Criterion, and the Hannan-Quinn Criterion can be applied. Lastly, an example of fitting vector autoregressions with macroeconomic variables is shown, demonstrating the importance of understanding what factors affect the economy in terms of growth, inflation, unemployment, and the impact of interest rate policies.

  • 01:20:00 In this section, the speaker discusses the use of vector autoregression models to analyze time series data. The specific variables being studied are the unemployment rate, federal funds, and the CPI (a measure of inflation). The multivariate versions of the autocorrelation function and the partial autocorrelation function are used to capture the cross lags between variables in these models. The impulse response functions are then used to understand the impact of an innovation in one of the components of the multivariate time series on the other variables. This is important in understanding the connection between the moving average representation and these time series models.
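
A sketch of this kind of VAR and impulse-response analysis is shown below. It uses simulated stand-ins for the three macro series and the statsmodels package; treat the exact calls as an assumption about a commonly used library rather than code from the lecture.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR

# Simulated stand-ins for differenced macro series (unemployment, fed funds, CPI).
rng = np.random.default_rng(5)
n = 300
eps = rng.normal(scale=0.1, size=(n, 3))
data = np.zeros((n, 3))
A1 = np.array([[0.5, 0.1, 0.0],
               [-0.2, 0.6, 0.1],
               [0.0, 0.1, 0.4]])
for t in range(1, n):
    data[t] = A1 @ data[t - 1] + eps[t]

df = pd.DataFrame(data, columns=["unemp", "fedfunds", "cpi"])

model = VAR(df)
res = model.fit(maxlags=4, ic="aic")       # lag order chosen by AIC
print(res.k_ar)                             # selected order

irf = res.irf(10)                           # impulse responses out to 10 periods
print(irf.irfs.shape)                       # (periods+1, n_vars, n_vars)
```
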
11. Time Series Analysis II — MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013 (Instructor: Peter Kempthorne). Complete course: http://ocw.mit.edu/18-S096F13
 

12. Time Series Analysis III



12. Time Series Analysis III

In this YouTube video on time series analysis, the professor covers a range of models and their applications to different scenarios. The video delves into topics such as vector autoregression (VAR) models, cointegration, and linear state-space models. These models are crucial for forecasting variables like unemployment, inflation, and economic growth by examining autocorrelation and partial autocorrelation coefficients.

The video starts by introducing linear state-space modeling and the Kalman filter, which are used to estimate and forecast time series models. Linear state-space modeling involves setting up observation and state equations to facilitate the model estimation process. The Kalman filter, a powerful tool, computes the likelihood function and provides essential terms for estimation and forecasting.

The lecturer then explains how to derive state-space representations for autoregressive moving average (ARMA) processes. This approach allows for a flexible representation of relationships between variables in a time series. The video highlights the significance of Harvey's work in 1993, which defined a particular state-space representation for ARMA processes.

Moving on, the video explores the application of VAR models to macroeconomic variables for forecasting growth, inflation, and unemployment. By analyzing autocorrelation and partial autocorrelation coefficients, researchers can determine the relationships between variables and identify patterns and correlations. The video provides a regression model example, illustrating how the Fed funds rate can be modeled as a function of lagged unemployment rate, Fed funds rate, and CPI. This example reveals that an increase in the unemployment rate tends to lead to a decrease in the Fed funds rate the following month.

The concept of cointegration is then introduced, addressing non-stationary time series and their linear combinations. Cointegration involves finding a vector beta that produces a stationary process when combined with the variables of interest. The video discusses examples such as the term structure of interest rates, purchasing power parity, and spot and futures relationships. An illustration using energy futures, specifically crude oil, gasoline, and heating oil contracts, demonstrates the concept of cointegration.

The video further explores the estimation of VAR models and the analysis of cointegrated vector autoregression processes. Sims, Stock, and Watson's work is referenced, which shows how the least squares estimator can be applied to these models. Maximum likelihood estimation and rank tests for cointegrating relationships are also mentioned. A case study on crack spread data is presented, including testing for non-stationarity using an augmented Dickey-Fuller test. Next, the video focuses on crude oil futures data and the determination of non-stationarity and integration orders. The Johansen procedure is employed to test the rank of the cointegrated process. The eigenvectors corresponding to the stationary relationship provide insights into the relationships between crude oil futures, gasoline (RBOB), and heating oil.

The lecture then introduces linear state-space models as a way to express various time series models used in economics and finance. The state equation and observation equation are explained, demonstrating the flexibility of this modeling framework. The video illustrates the representation of a capital asset pricing model with time-varying betas as a linear state-space model. By incorporating time dependence in the regression parameters, the model captures dynamic changes. Furthermore, the lecturer discusses the concept of changing regression parameters over time, assuming they follow independent random walks. The joint state-space equation and its implementation for recursively updating regressions as new data is added are explained. Autoregressive models of order P and moving average models of order Q are expressed as linear state-space models.

The lecture then delves into the state equation and observation equation, emphasizing their role in transitioning between underlying states. The derivation of the state-space representation for ARMA processes is explored, highlighting the flexibility in defining states and the underlying transformation matrix.

The lecture provides an overview of the application of linear state-space models to time series analysis. The speaker explains that these models can be used to estimate and forecast variables of interest by incorporating both observed data and underlying states. By utilizing the Kalman filter, which is a recursive algorithm, the models can compute the conditional distribution of the states given the observed data, as well as predict future states and observations.

The lecture emphasizes the importance of understanding the key components of linear state-space models. The state equation represents the transition dynamics of the underlying states over time, while the observation equation relates the observed data to the underlying states. These equations, along with the initial state distribution, define the model structure.

The lecturer proceeds to discuss the estimation process for linear state-space models. Maximum likelihood estimation is commonly used to estimate the unknown parameters of the model based on the observed data. The Kalman filter plays a crucial role in this process by computing the likelihood function, which measures the goodness of fit between the model and the data.

Moreover, the lecture highlights that linear state-space models provide a flexible framework for modeling various economic and financial phenomena. They can be used to express autoregressive models, moving average models, and even more complex models such as the capital asset pricing model with time-varying betas. This versatility makes linear state-space models a valuable tool for researchers and practitioners in economics and finance. To further illustrate the practical applications of linear state-space models, the lecture introduces a case study on crude oil futures contracts. By analyzing the relationship between the prices of different futures contracts, such as crude oil, gasoline, and heating oil, the speaker demonstrates how linear state-space models can be utilized to identify patterns, forecast prices, and assess risk in the energy market.

In summary, the video provides a comprehensive overview of linear state-space models and their applications in time series analysis. By leveraging the Kalman filter, these models enable researchers to estimate and forecast variables of interest, understand the dynamics of underlying states, and capture the complex relationships between variables. The lecture emphasizes the flexibility and usefulness of linear state-space models in various economic and financial contexts, making them a valuable tool for empirical analysis and decision-making.

  • 00:00:00 In this section, the professor introduces macroeconomic variables that can be used for forecasting growth, inflation, and unemployment in the economy and summarizes the fitted vector autoregression model. The roots of the characteristic polynomial indicated non-stationarity, suggesting that a transformed series should be modeled instead. To eliminate this non-stationarity, the professor suggests modeling first differences, obtained by differencing all the series and removing missing values. The graph displays the time series properties of the differenced series, including the autocorrelation functions on the diagonal and the cross-correlations off the diagonal, which are shown to be statistically significant. The partial autocorrelation function is also discussed, which measures the correlation between a variable and a lag of another after accounting for all lower-order lags.

  • 00:05:00 In this section, the video discusses the use of vector autoregressive models, which allow researchers to model the structural relationships between multiple macroeconomic variables. The example focuses on three variables: the unemployment rate, the Fed funds rate, and the CPI. By examining the autocorrelation and partial autocorrelation coefficients, researchers can determine the relationships between these variables and identify patterns and correlations. The video also provides a regression model for the Fed funds rate as a function of lagged unemployment rate, Fed funds rate, and CPI. This model indicates that if the unemployment rate goes up, the Fed rate is likely to go down the next month. The video emphasizes the importance of understanding the signal-to-noise ratio when estimating the autoregressive parameters and interpreting the coefficients.

  • 00:10:00 In this section of the video, the speaker introduces the concept of cointegration, which is a major topic in time series analysis that deals with non-stationary time series. The discussion starts with the context in which cointegration is relevant and focuses on stochastic processes that are integrated of some order d, meaning that the d-th difference is stationary. While taking the first differences results in stationarity, the process loses some information, and cointegration provides a framework to characterize all the available information for statistical modeling systematically. A non-stationary process can still have a vector autoregressive representation, which can be expressed as a polynomial lag of the x's equal to white noise epsilon, and reducing it to stationarity requires taking the d-th order difference.

  • 00:15:00 In this section of the video, cointegration is introduced as a way to handle situations where linear combinations of a non-stationary multivariate time series are stationary, so that those combinations capture the stationary features of the process. Cointegration involves finding a vector beta such that beta prime X_t is a stationary process. The cointegration vector can be scaled arbitrarily, and it is common practice to set its first component equal to 1. This type of relationship arises in many places in economics and finance, including the term structure of interest rates, purchasing power parity, money demand, covered interest rate parity, the law of one price, and spot and futures prices. An example using energy futures is given to illustrate the concept.

  • 00:20:00 In this section, the professor discusses a time series of crude oil, gasoline, and heating oil futures contracts traded at the CME. He explains how futures prices for gasoline and heating oil should depend on the cost of the input, which is crude oil. The professor shows a plot of the prices of the futures, which represent the same units of output relative to input. He notes that while the futures for gasoline and heating oil are consistently above the crude oil input futures, they vary depending on which is greater. The difference between the price of the heating oil future and the crude oil future represents the spread in value of the output minus the input, which includes the cost of refining, supply and demand, seasonal effects, and the refinery's profit.

  • 00:25:00 In this section, the lecture discusses the vector autoregressive model of order p, which extends the univariate model: each series depends autoregressively on all the other series, with a multi-dimensional white noise term having mean 0 and some covariance structure. The case of a process integrated of order one is also discussed, along with the derivation that rewrites the model in terms of differences plus some extra terms. At the end, the lecture gives the equation in which the difference of the series equals a constant, plus a matrix times the first lagged difference of the multivariate series, plus another matrix times the second lagged difference, and so on down to the p-th lag.

  • 00:30:00 In this section, the video discusses the process of eliminating non-stationarity in the time series by using lagged and differenced series. The model expresses the stochastic process for the differenced series, which is stationary. While the terms that are matrix multiples of lagged differences are stationary, the pi X_t term contains the cointegrating relations, and the task is to identify the matrix pi. Since the original series had unit roots, the matrix pi is of reduced rank, and it defines the cointegrating relationships. The columns of beta define linearly independent vectors that cointegrate x. The decomposition of pi is not unique; by choosing a coordinate system for the r-dimensional space in which the process is stationary, the matrix pi can be expressed as alpha beta prime.

  • 00:35:00 In this section, the speaker discusses the estimation of vector autoregression models and the work of Sims, Stock, and Watson that shows how the least squares estimator of the original model can be used for an analysis of cointegrated vector autoregression processes. The speaker also mentions the advanced literature on estimation methods for these models, including maximum likelihood estimation, which yields tests for the rank of the cointegrating relationship. A case study on the crack spread data is also discussed, which involves testing for non-stationarity in the underlying series using an augmented Dickey-Fuller test that yields a p-value of 0.164 for CLC1, the first nearest contract.

  • 00:40:00 In this section, the presenter discusses the non-stationarity and integration order of crude oil futures data, suggesting that accommodating non-stationarity is necessary when specifying models. The results of conducting a Johansen procedure for testing the rank of the cointegrated process suggest that there is no strong non-stationarity, and the eigenvector corresponding to the stationary relationship is given by the coefficients of 1 on crude oil futures, 1.3 on RBOB, and -1.7 on heating oil. The combination of crude plus gasoline minus heating oil appears to be stationary over time, which could be useful for refiners wanting to hedge their production risks.
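
The ADF and Johansen steps from the last two subsections can be sketched as follows, on simulated cointegrated series rather than the actual futures data; the statsmodels usage shown is an assumption about a commonly used library, not the lecture's code.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Simulated prices: a common random-walk "crude" driver plus two cointegrated products.
rng = np.random.default_rng(6)
n = 1000
crude = np.cumsum(rng.normal(size=n))
gasoline = 1.3 * crude + rng.normal(scale=0.5, size=n)   # stationary spread to crude
heating = 0.8 * crude + rng.normal(scale=0.5, size=n)
data = np.column_stack([crude, gasoline, heating])

# Augmented Dickey-Fuller on one series: a large p-value means we cannot reject a unit root.
stat, pvalue, *_ = adfuller(crude)
print(round(pvalue, 3))

# Johansen procedure: trace statistics vs. critical values indicate the cointegration rank;
# columns of .evec are candidate cointegrating vectors beta.
res = coint_johansen(data, det_order=0, k_ar_diff=1)
print(res.lr1)                     # trace statistics
print(res.cvt)                     # critical values (90/95/99%)
print(np.round(res.evec[:, 0], 2))
```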

  • 00:45:00 In this section, the speaker introduces the topic of linear state-space models, which can be used to express many time series models used in economics and finance. The model involves an observation vector at time t, an underlying state vector, an observation error vector at time t, and a state transition innovation error vector. The speaker explains the state equation and observation equation in the model, which are linear transformations of the states and observations plus noise, and how they can be written together in a joint equation. The notation may seem complicated, but it provides a lot of flexibility in specifying the relationships between variables.

  • 00:50:00 In this section, the speaker discusses representing a capital asset pricing model with time-varying betas as a linear state-space model. The model extends the previous one by adding time dependence to the regression parameters. The alpha and beta now vary over time, with alpha following a Gaussian random walk and beta also following a Gaussian random walk. The state equation is adjusted by adding the random walk terms, making s_(t+1) equal to T_t s_t plus R_t eta_t, with a complex representation in the linear state-space framework. The observation equation is defined by a Z_t matrix, a row matrix containing a unit element and the market return r_(m,t). The covariance matrix has a block diagonal structure, with the covariance of the epsilons as H, and the covariance of R_t eta_t as R_t Q_t R_t transpose. Finally, the speaker considers a second case of linear regression models where p independent variables could be time-varying.

  • 00:55:00 In this section, the concept of changing regression parameters over time in a time series is introduced, assuming that they follow independent random walks. The joint state-space equation is explained as well as the linear state-space implementation for recursively updating regressions as new data is added. Autoregressive models of order P are also discussed, outlining the structure for how the linear state-space model evolves. Finally, the moving average model of order Q is expressed as a linear state-space model.

  • 01:00:00 In this section, the lecturer discusses the state equation and observation equation, which are used to give a transition between underlying states. They use an example of an autoregressive moving average model to demonstrate how the setup for linear state-space models facilitates the process of estimating the model. The lecture goes on to explain how Harvey's work in '93 defined a particular state-space representation for the ARMA process and how there are many different equivalent linear state-space models for a given process depending on how one defines the states and the underlying transformation matrix T. Finally, the lecture goes on to derive the state-space representation for the ARMA process.

  • 01:05:00 In this section, the speaker explains how to come up with a simple model for the transition matrix T in linear state-space models by iteratively solving for the second state using the observation value and rewriting the model equation. This process replaces the underlying states with observations and leads to a transition matrix T that has autoregressive components as the first column and a vector of moving average components in the R matrix. The effectiveness of linear state-space modeling lies in the full specification with the Kalman filter, which recursively computes the probability density functions for the underlying states at t+1 given information up to time t, as well as the joint density of the future state and observation at t+1, given information up to time t, and the marginal distribution of the next observation given information up to time t. The implementation of the Kalman filter requires notation involving conditional means, covariances, and mean squared errors that are determined by omegas.

  • 01:10:00 In this section, the transcript discusses the Kalman filter, which has four steps that help predict the state vector and observation in a time series. The filter gain matrix is used to adjust the prediction of the underlying state depending on what happened and characterizes how much information we get from each observation. The uncertainty in the state at time t is reduced by minimizing the difference between what we observed and what we predicted. There is also a forecasting step, which predicts the state one period forward and updates the covariance matrix for future states given the previous state. Lastly, the smoothing step characterizes the conditional expectation of underlying states given information in the whole time series.
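
The update and forecast steps just described fit in a few lines. The sketch below is a minimal Kalman filter for an assumed local-level model (a random-walk state observed with noise), illustrating the innovation, the filter gain, and the covariance recursions; it is an illustration, not the lecturer's implementation, and it omits the smoothing step.

```python
import numpy as np

def kalman_filter(y, T, Z, Q, H, x0, P0):
    """Minimal Kalman filter: returns the filtered state means for each time step."""
    n = len(y)
    x_pred, P_pred = x0, P0
    filtered = np.zeros((n, len(x0)))
    for t in range(n):
        # Update step: the filter gain decides how much each observation corrects the state
        v = y[t] - Z @ x_pred                       # innovation (prediction error)
        F = Z @ P_pred @ Z.T + H                    # innovation covariance
        K = P_pred @ Z.T @ np.linalg.inv(F)         # filter (Kalman) gain
        x_filt = x_pred + K @ v
        P_filt = P_pred - K @ Z @ P_pred
        filtered[t] = x_filt
        # Forecast step: predict the state and its covariance one period forward
        x_pred = T @ x_filt
        P_pred = T @ P_filt @ T.T + Q
    return filtered

# Assumed local-level model: a random-walk state observed with noise (illustrative).
rng = np.random.default_rng(7)
n = 200
state = np.cumsum(rng.normal(scale=0.1, size=n))
y = (state + rng.normal(scale=0.5, size=n)).reshape(-1, 1)

T = np.array([[1.0]])
Z = np.array([[1.0]])
Q = np.array([[0.01]])
H = np.array([[0.25]])
filtered = kalman_filter(y, T, Z, Q, H, x0=np.array([0.0]), P0=np.array([[1.0]]))
print(np.round(filtered[-3:, 0], 2), np.round(state[-3:], 2))   # filtered estimate tracks the state
```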

  • 01:15:00 In this section, the speaker introduces the Kalman filter as a tool for computing the likelihood function for linear state-space models and for successive forecasting of a process. They explain that the likelihood function is the product of the conditional distributions of each successive observation given the history of the data. The Kalman filter provides all the necessary terms for this estimation, and if the error terms are normally distributed, the means and variances of these estimates characterize the exact distributions of the process. Additionally, the Kalman filter updates the means and covariance matrices for the underlying states and the distributions of the observations.
12. Time Series Analysis III — MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013 (Instructor: Peter Kempthorne). Complete course: http://ocw.mit.edu/18-S096F13
 

13. Commodity Models



13. Commodity Models

In this video, the speaker delves into the intricate world of commodity models, highlighting the challenges faced by quantitative analysts in this domain. They provide insightful examples, such as Trafigura's record profit in 2009, achieved through strategic crude oil purchasing and storage. The speaker discusses various strategies for bidding on storage, optimization problems, and the significance of stability and robustness in commodity models. Moreover, they explore the complexities of modeling commodity prices, focusing on the unique considerations required for power prices. The speaker suggests an alternative methodology tailored to the commodity landscape, distinguishing it from approaches used in fixed-income, foreign exchange, and equity markets.

The video commences by shedding light on the specific problems tackled by quantitative analysts in the commodity realm. An illustrative example is presented, featuring Trafigura, a company that profited immensely from the dramatic drop in oil prices in 2009. The speaker explains how futures contracts function in commodity markets, emphasizing the concepts of contango and backwardation. Contango refers to a scenario where futures prices exceed the current spot price, which allows traders with access to storage to lock in profits even during periods of depressed spot prices.

Next, the speaker delves into Trafigura's profit-making strategy between February 2009 and 2010 when crude oil prices surged from $35 to $60 per barrel. By borrowing at $35, purchasing and storing crude oil, and subsequently selling it at the higher price of $60, Trafigura achieved a remarkable profit of $25 per barrel. This strategy was employed on a massive scale, involving millions of barrels of storage, resulting in significant gains. The speaker emphasizes the need for careful strategizing in storage auctions to recover costs and generate additional profits effectively.

The video proceeds to discuss two distinct strategies for bidding on storage in commodity models. The first strategy involves traders bidding on futures contracts for August and selling them in December without the need for borrowing. The second strategy, employed by quants, entails selling the spread option between August and December contracts. This option's value is determined by the price difference between the two contracts, with positive differences yielding profits to the option owner and negative differences yielding no profit. While the second strategy is more intricate, it offers additional value to the company.

The advantages of selling a production on August 1st using a commodity model are discussed in the subsequent section. By selling the option on that specific date, the seller receives a formula-determined option value, typically higher than the current market value. This gives the seller an advantageous position during bidding, enabling them to earn a profit margin of their choice. The speaker also elucidates the calculation of option risk and how real or physical assets can be leveraged to mitigate that risk.

The video then delves into the complexity of spread options within commodity models, emphasizing the need to determine the most valuable portfolios of options while accounting for technical, contractual, legal, and environmental constraints. The speaker stresses the importance of selling option portfolios in a manner that guarantees the extraction of value upon option expiration, considering limitations on injection and withdrawal rates.

An optimization problem involving commodity models and storage is discussed in another section. The key constraints are that value can no longer be extracted from an option to buy and inject once storage capacity is full, and nothing can be sold out of storage once it is empty. The speaker explains the variables and constraints involved in the problem and demonstrates how optimizing the portfolio through a series of options can lead to profit maximization. The problem's complexity requires the use of boolean variables, with the objective of maximizing profit.
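
A deterministic toy version of this storage problem (a deliberate simplification: known prices and no boolean option features) can be posed as a linear program, choosing per-period injection and withdrawal amounts subject to rate limits and the full/empty capacity constraints; the prices and limits below are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

# Maximize sum_t price[t] * (withdraw[t] - inject[t]) subject to rate and capacity limits.
prices = np.array([35, 38, 42, 50, 55, 60, 58, 52], dtype=float)   # illustrative forward curve
T = len(prices)
capacity, inj_rate, wdr_rate = 100.0, 25.0, 25.0

# Decision vector x = [inject_0..inject_{T-1}, withdraw_0..withdraw_{T-1}]
c = np.concatenate([prices, -prices])            # linprog minimizes, so flip the profit's sign
bounds = [(0, inj_rate)] * T + [(0, wdr_rate)] * T

# Inventory after period t is cumsum(inject - withdraw); keep it within [0, capacity].
lower_tri = np.tril(np.ones((T, T)))
A_ub = np.vstack([np.hstack([lower_tri, -lower_tri]),      # inventory <= capacity (never overfull)
                  np.hstack([-lower_tri, lower_tri])])     # -inventory <= 0 (never below empty)
b_ub = np.concatenate([np.full(T, capacity), np.zeros(T)])

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
inject, withdraw = res.x[:T], res.x[T:]
print(round(-res.fun, 1))                        # optimal profit on this toy curve
print(np.round(inject, 1))
print(np.round(withdraw, 1))
```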

The video further delves into the challenges of commodity models, particularly those related to injection and withdrawal rates, capacity constraints, and unknown variables such as volumes and prices. These factors contribute to the non-linear nature of the problem, making it exceedingly difficult to solve when dealing with numerous variables and constraints. Several approaches, including approximation, Monte Carlo simulations, and stochastic control, can be employed to address commodity models' complexity. However, the accuracy of the results heavily relies on the precision of the parameters utilized. Even the most meticulous methodology can lead to erroneous outcomes if the parameters are incorrect.

The speaker then turns to their chosen methodology for commodity modeling, which prioritizes robustness and stability over capturing the full richness of price behaviors. They caution against over-parameterizing a model, since even slight changes in the inputs of an over-fitted model can change its value substantially. The chosen approach sacrifices some value in exchange for stability and robustness, and it allows outside regulators to verify the model. Moreover, each component of the model can be traded in the market, which holds substantial importance in the current market environment. The concept of dynamic hedging is also explained, showing how it can be used to replicate the value of an option and meet its payouts even without an active option market, using a simple payoff function.

The speaker delves deeper into the concept of replicating the payout of an option through dynamic hedging. This strategy empowers traders to sell portfolios even when there are no buyers. They emphasize the importance of developing a strategy to extract value and collaborating with storage facility operators to execute the plan successfully. The speaker explains how this approach can be extended to model physical assets, such as tankers and power plants, to maximize profits by making informed decisions based on electricity and fuel prices. While the nature of each asset may vary, the conceptual approach remains the same, necessitating a comprehensive understanding of the unique intricacies and constraints associated with each asset.
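
As a toy illustration of the replication idea, the sketch below delta-hedges a plain European call under Black-Scholes dynamics and shows that the hedge portfolio, started from the model premium, ends up close to the option payoff. This is a standard textbook stand-in with illustrative parameters, not the speaker's commodity-specific hedge.

```python
import numpy as np
from scipy.stats import norm

# Toy dynamic-hedging illustration: replicate a European call's payoff by trading the
# underlying according to the Black-Scholes delta, starting from the model premium.
rng = np.random.default_rng(1)
s0, strike, r, sigma, t_expiry, n_steps = 100.0, 100.0, 0.03, 0.30, 0.5, 126

def bs_d1(s, k, tau):
    return (np.log(s / k) + (r + 0.5 * sigma**2) * tau) / (sigma * np.sqrt(tau))

def bs_price(s, k, tau):
    d1 = bs_d1(s, k, tau)
    d2 = d1 - sigma * np.sqrt(tau)
    return s * norm.cdf(d1) - k * np.exp(-r * tau) * norm.cdf(d2)

dt = t_expiry / n_steps
s, pos = s0, 0.0
cash = bs_price(s0, strike, t_expiry)          # start from the option premium
for step in range(n_steps):
    tau = t_expiry - step * dt
    delta = norm.cdf(bs_d1(s, strike, tau))    # target hedge ratio
    cash -= (delta - pos) * s                  # rebalance the stock position
    pos = delta
    cash *= np.exp(r * dt)                     # accrue interest over the step
    s *= np.exp((r - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * rng.standard_normal())

print(f"Hedge portfolio at expiry: {pos * s + cash:8.2f}")
print(f"Option payoff:             {max(s - strike, 0.0):8.2f}")
```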

In a subsequent section, the video explains how to calculate the cost of producing one megawatt-hour of power from a plant's efficiency. The efficiency is quantified by the heat rate, measured in MMBtu of natural gas per megawatt-hour, which indicates how much gas must be burned to generate one megawatt-hour of power. For a natural gas plant the heat rate typically falls between 7 and 20, with lower values indicating higher efficiency. Additional costs of producing a megawatt-hour, such as air conditioning and labor, are also considered. The video then discusses how to determine the value of a power plant by constructing distributions of power prices and fuel costs, which in turn indicate an appropriate price to pay when acquiring a plant.
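
The generation-cost arithmetic translates directly into a short function; the gas price, power price, and variable O&M figure below are assumptions for illustration.

```python
# Cost of generating one MWh from a gas plant: heat rate (MMBtu of gas per MWh)
# times the gas price, plus other variable costs (labor, cooling, etc.).
# Gas price, power price, and variable cost figures are illustrative assumptions.

def generation_cost_per_mwh(heat_rate, gas_price, variable_om=3.0):
    """heat_rate in MMBtu/MWh, gas_price in $/MMBtu, variable_om in $/MWh."""
    return heat_rate * gas_price + variable_om

def spark_spread(power_price, heat_rate, gas_price, variable_om=3.0):
    """Margin per MWh from running the plant; run only when this is positive."""
    return power_price - generation_cost_per_mwh(heat_rate, gas_price, variable_om)

for hr in (7, 10, 15, 20):                      # 7 = efficient, 20 = inefficient
    cost = generation_cost_per_mwh(heat_rate=hr, gas_price=4.0)
    margin = spark_spread(power_price=55.0, heat_rate=hr, gas_price=4.0)
    print(f"heat rate {hr:>2}: cost ${cost:5.1f}/MWh, margin ${margin:6.1f}/MWh")
```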

The challenges of modeling commodity prices, particularly power prices, are discussed in the subsequent section. The distribution of power prices cannot be accurately modeled using Brownian motion due to the presence of fat tails and spikes in the data. Additionally, the volatility in power prices is significantly higher compared to equity markets. The lecturer emphasizes that these challenges are common across all regions and underscores the necessity of capturing mean reversion in spikes to accurately represent power price behavior. Other phenomena such as high kurtosis, regime switching, and non-stationarity also need to be incorporated into the models.
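
To illustrate the kind of behavior described here, the sketch below simulates a mean-reverting log-price with occasional upward jumps, which produces spikes and fat tails. The process and every parameter are illustrative assumptions, not the speaker's model (which, as the next paragraph notes, deliberately avoids this style of reduced-form price model).

```python
import numpy as np

# Mean-reverting jump simulation of a log power price: an Ornstein-Uhlenbeck core
# plus rare upward jumps that decay quickly, producing spikes and fat tails.
rng = np.random.default_rng(2)
n_days = 1000
kappa, mu, sigma = 8.0, np.log(50.0), 0.8     # fast mean reversion, base level ~$50/MWh
jump_prob, jump_size = 0.02, 1.2              # ~2% daily chance of a large upward jump
dt = 1.0 / 365.0

log_p = np.empty(n_days)
log_p[0] = mu
for t in range(1, n_days):
    drift = kappa * (mu - log_p[t - 1]) * dt
    diffusion = sigma * np.sqrt(dt) * rng.standard_normal()
    jump = jump_size * rng.random() if rng.random() < jump_prob else 0.0
    log_p[t] = log_p[t - 1] + drift + diffusion + jump

prices = np.exp(log_p)
changes = np.diff(log_p)
excess_kurtosis = ((changes - changes.mean())**4).mean() / changes.var()**2 - 3
print(f"Max spike: ${prices.max():.0f}/MWh vs base ${np.exp(mu):.0f}/MWh")
print(f"Excess kurtosis of daily log changes: {excess_kurtosis:.1f}")
```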

The video explores the challenges associated with modeling commodity prices, highlighting various approaches including mean reversion, jumps, and regime switching. However, these models tend to be complex and challenging to manage. Instead, the speaker proposes a unique methodology specifically tailored to the commodity domain, distinct from methodologies employed in fixed-income, foreign exchange, and equity markets. This approach is better aligned with the characteristics and intricacies of commodity markets.

The speaker emphasizes that commodity prices are primarily driven by supply and demand. However, traditional methodologies based solely on prices have proven inadequate for capturing the complexities of commodity price behavior. To address this, the speaker suggests incorporating fundamental modeling while ensuring the model remains consistent with all available market data. They explain how power prices are formed through the auctioning of bids from power plants with varying efficiencies, with the final price determined by demand. The resulting scatter plot of demand versus price shows a wide band rather than a single curve, because fuel prices add a random component to generation costs.

Furthermore, the speaker explains that the price of power is determined by both demand and fuel prices, as the cost of generation depends on the prices of fuel. Additionally, the occurrence of outages needs to be modeled, as the market is finite and the price of power can be affected if a few power plants experience downtime. To incorporate these factors, the speaker suggests constructing a generation stack, which represents the cost of generation for each participant in the market. By considering fuel prices and outages, the generation stack can be adjusted to accurately match market prices and option prices.

The video progresses to discuss how different commodities can be modeled to understand the evolution of power prices. The speaker explains the process of modeling the behavior of fuel prices, outages, and demand. Subsequently, a generation stack is constructed, representing a curve determined by factors such as demand, outages, variable costs, and fuel prices. Parameters are carefully selected to match the forward curve for power prices and other relevant market parameters. This approach enables the capture of price spikes in power markets with relative ease. The speaker notes that natural gas, heating oil, and fuel oil are storable commodities, making their behavior more regular and easier to model.
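
A toy version of the stack-based approach might look like the following: take a set of plants, remove units on outage, sort the rest by marginal cost, and set the price at the marginal unit needed to meet demand. The plant data, outage probability, fuel prices, and demand process are all made-up illustrations, not the lecture's calibration.

```python
import numpy as np

# Miniature generation-stack (merit order) model: the price is the marginal cost of
# the most expensive unit needed to meet demand, after random outages.
rng = np.random.default_rng(3)

# (capacity in MW, heat rate in MMBtu/MWh, variable O&M in $/MWh) - illustrative plants
plants = [(500, 7.0, 2.0), (400, 9.0, 2.5), (400, 11.0, 3.0),
          (300, 14.0, 3.5), (200, 18.0, 4.0)]

def stack_price(demand_mw, gas_price, outage_prob=0.05):
    units = []
    for cap, heat_rate, vom in plants:
        if rng.random() < outage_prob:          # unit unavailable this day
            continue
        units.append((heat_rate * gas_price + vom, cap))
    units.sort()                                # merit order: cheapest first
    served = 0.0
    for marginal_cost, cap in units:
        served += cap
        if served >= demand_mw:
            return marginal_cost                # price set by the marginal unit
    return 3000.0                               # scarcity price when demand is unmet

# Simulate a month of daily prices with random demand and gas prices.
prices = [stack_price(demand_mw=rng.normal(1400, 150), gas_price=rng.normal(4.0, 0.5))
          for _ in range(30)]
print([round(p, 1) for p in prices])            # occasional spikes appear naturally
```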

Moving forward, the speaker highlights how commodity models can be leveraged to predict the price of electricity in the market, taking into account factors such as temperature, supply, and demand. Through the utilization of Monte Carlo simulations and a comprehensive understanding of the distribution of fuel prices, accurate simulations of price spikes caused by temperature fluctuations can be achieved. The model also accurately captures the correlation structure of the market without requiring it as an input. However, it is emphasized that maintaining such a model necessitates a significant amount of information and organization, as every power plant and market change must be tracked.

In the final section of the video, the speaker acknowledges the challenges associated with building commodity models for different markets. The process is a massive undertaking that requires years of development, making it an expensive endeavor. Despite the complexities involved, the speaker believes that the covered topics are a good point to conclude the discussion and invites viewers to ask any remaining questions they may have.

Overall, the video provides valuable insights into the challenges faced by quantitative analysts when building commodity models. It highlights the importance of prioritizing stability and robustness in modeling approaches, the complexities of modeling commodity prices, and the role of fundamental factors such as supply, demand, and fuel prices in shaping power prices. The speaker also emphasizes the significance of collaboration with industry stakeholders and the continuous effort required to maintain and update commodity models for different markets.

  • 00:00:00 In this section, the speaker discusses the problems that quantitative analysts solve in the commodity world, in comparison to those in other markets. He provides the example of Trafigura, which made a record profit in 2009, the year oil prices dropped to historically low levels. He also talks about futures contracts and how they work in commodity markets, specifically discussing the concepts of contango and backwardation. Contango means that futures prices are higher than the current spot price, which allows traders to make a profit even in times when prices are low.

  • 00:05:00 In this section, the speaker explains how Trafigura made money between February 2009 and 2010, when crude oil prices increased from $35 to $60. The company borrowed $35 per barrel, bought crude oil, and stored it until it could be sold at the much higher price of $60. This produced a profit of about $25 per barrel, which, multiplied over 50-60 million barrels of storage, adds up to a massive sum. The speaker emphasizes that to bid for storage in an auction, one must carefully strategize how to recover the money paid for storage and gain some additional profit.

  • 00:10:00 In this section, the video discusses two strategies for bidding on storage in commodity models. The first is a standard strategy where a trader bids on futures contracts for August and sells in December, without having to borrow money. The second strategy is one used by quants, where they sell the August-December spread option, determined by the difference between the prices of December and August contracts, with positive differences paying the option owner and negative ones paying zero. The latter strategy is more complicated but offers added value to the company.

  • 00:15:00 In this section, the speaker discusses the advantages of selling such a spread option on August 1st using a commodity model. He explains that by selling the option on the given date, the seller gets a formula-determined value of the option, which is typically higher than the current market value. This gives the seller an edge during bidding, and they can earn a profit margin of their choice. The speaker also explains how to calculate the risk of the option and how real or physical assets can be used to mitigate the risk.

  • 00:20:00 In this section, the speaker discusses the concept of a spread option and sheds more light on their complexity in reality. He explains that optimizing the value of a portfolio of options that can be sold against the storage requires determining the most valuable portfolios of options while considering technical, contractual, legal, and environmental constraints. The speaker further notes that the option portfolios should be sold in a way that guarantees that the value can be extracted whenever the option expires, and there are constraints on the injection and withdrawal rate.

  • 00:25:00 In this section, the speaker discusses an optimization problem involving commodity models and storage. The difficulty is that value cannot be extracted by injecting when there is no space left in storage, and conversely one cannot sell from storage when it is empty, so injections and withdrawals must be scheduled consistently with the options sold. The speaker explains the variables and constraints of the problem and shows how it is possible to optimize the portfolio through a series of options. Overall, the optimization problem is complex but can be solved with the help of boolean variables and a focus on maximizing profits.

  • 00:30:00 In this section, the speaker discusses the complex nature of commodity models that involve injection and withdrawal rates, maximum and minimum capacity constraints, and unknown variables like volumes and prices. The problem becomes non-linear and very difficult to solve with a large number of variables and constraints. Several approaches including approximation, Monte Carlo simulations, and stochastic control can be used to solve commodity models, but the accuracy of the results depends on the accuracy of the parameters used. Even the most precise methodology can be wrong if the parameters are incorrect.

  • 00:35:00 In this section, the speaker discusses their chosen methodology of commodity modeling, which is designed to prioritize robustness and stability over capturing the richness of price behaviors. They explain that over-parameterizing a model can lead to instability, where small changes in inputs can change the value substantially. To prioritize stability and robustness, they sacrifice some of the value by using a different approach. Furthermore, the model they use can be verified by outside regulators, and every component of the model can be traded in the market, which is crucial in the current day and age. Additionally, they explain the concept of dynamic hedging and how it can be used to replicate the value of an option and meet payouts without an active option market by using a simple payoff function.

  • 00:40:00 In this section, the speaker discusses the concept of replicating the payout of an option by using a dynamic hedging strategy, allowing traders to sell portfolios even if there are no buyers. He emphasizes the importance of producing a strategy to extract value, as well as working with those who operate storage facilities to execute the plan successfully. The speaker then explains how this approach can be used to model physical assets, such as tankers and power plants, to maximize profits by making informed decisions based on the price of electricity and fuel. While the nature of each asset differs, the conceptual approach remains the same, requiring an understanding of the nuances and constraints of each asset.

  • 00:45:00 In this section, the video discusses the process of calculating the cost of producing one megawatt-hour of power based on the efficiency of the power plant. The efficiency, known as the heat rate, is measured in MMBtu per megawatt-hour and tells us how much natural gas must be burned to produce one megawatt-hour of power. For a natural gas power plant this constant is typically between 7 and 20, with 7 being the most efficient. Other costs associated with producing one megawatt-hour, such as air conditioning and labor, are also considered. The video then goes on to discuss the process of determining the value of a power plant and constructing a distribution of prices and fuel costs in order to calculate how much to pay for a power plant.

  • 00:50:00 In this section, the lecturer discusses the challenges of commodity models, specifically in the case of power prices. The distribution of power prices cannot be modeled using Brownian motion due to the presence of fat tails and spikes in the data. The volatility is also much higher than in equity markets. The lecturer notes that these challenges are common in all regions and that mean reversion in spikes is necessary to capture the behavior of power prices. Other phenomena that need to be captured include high kurtosis, regime switching, and non-stationarity.

  • 00:55:00 In this section, the speaker discusses the challenges of modeling commodity prices and how different models have been used, including mean reversion, jumps, and regime switching. However, these models are too complex and difficult to manage. The speaker suggests a completely different methodology from the fixed-income world, foreign exchange, and equities, which is more suitable and understandable from the commodity point of view.

  • 01:00:00 In this section, the speaker discusses how commodity prices are primarily driven by supply and demand. However, standard methodologies for modeling commodity prices based solely on the prices themselves have proven difficult. The speaker suggests introducing some fundamental modeling to address this issue, while also ensuring that the model matches all available market data. The speaker goes on to explain how power prices are formed through the auctioning of bids from power plants with different levels of efficiency, and how the final price is determined based on demand. The resulting scatter plot of demand versus price shows a wide band rather than a thin curve because of the random factor of fuel prices.

  • 01:05:00 In this section, the speaker explains that the power price is determined by both demand and fuel prices, as the cost of generation depends on the fuel prices. Outages also need to be modeled because the market is finite, and the price of power can be affected if a few power plants go down. To model these factors, the speaker suggests constructing a generation stack, which is the cost of generation for each participant in the market. By knowing the fuel prices and outages, one can generate the bid stack, which follows the generation stack and can be adjusted to match market prices and option prices.

  • 01:10:00 In this section, the speaker explains how different commodities can be modeled and used to determine the evolution of power prices. They start by modeling the evolution of fuel prices, outages, and demand, and then construct the generation stack, which is a curve determined by demand, outages, variable costs, and fuel. They choose parameters to match the forward curve for power prices and other market parameters. This approach allows for the capture of spikes in power prices without much effort, and natural gas, heating oil, and fuel oil are storable commodities, making their behavior more regular and easier to model.

  • 01:15:00 In this section of the video, the speaker explains how commodity models can be used to predict the price of electricity in the market based on temperature and supply and demand factors. By using Monte Carlo simulations and understanding the distribution of fuel prices, they are able to accurately capture and simulate the spikes in prices caused by changes in temperature. Additionally, the model accurately captures the correlation structure of the market without needing it as an input. However, the negative side of this approach is that it requires a lot of information and organization to maintain due to the need to keep track of every power plant and any changes that may occur in the market.

  • 01:20:00 In this section, the speaker talks about the challenges of building commodity models for different markets. It requires a massive undertaking and takes years to develop, making it an expensive process. The speaker believes that this is a good point to stop but invites questions from viewers.
13. Commodity Models
13. Commodity Models
  • 2015.01.06
  • www.youtube.com
MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013. View the complete course: http://ocw.mit.edu/18-S096F13 Instructor: Alexander Eydelan...
 

14. Portfolio Theory



14. Portfolio Theory

Portfolio Theory is a fundamental concept in finance that focuses on the performance and optimal construction of investment portfolios. It involves analyzing the expected returns, volatilities, and correlations of multiple assets to determine the most efficient portfolio allocation. The efficient frontier is the set of feasible portfolios that achieve the highest expected return for each level of volatility. By introducing a risk-free asset, the feasible set expands to include combinations of the risk-free asset with risky portfolios, and the new efficient set becomes a straight line.
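
The effect of correlation on the two-asset feasible set can be reproduced in a few lines; the expected returns and volatilities below are illustrative assumptions, not figures from the lecture.

```python
import numpy as np

# Two-asset feasible set: portfolio mean and volatility as the weight on asset 2
# varies, for several correlations. Returns and volatilities are illustrative.
mu = np.array([0.08, 0.15])        # expected returns
vol = np.array([0.15, 0.25])       # volatilities
weights = np.linspace(0.0, 1.0, 11)

for rho in (-1.0, 0.0, 0.5, 1.0):
    cov = rho * vol[0] * vol[1]
    port_vol = []
    for w2 in weights:
        w = np.array([1 - w2, w2])
        var = w[0]**2 * vol[0]**2 + w[1]**2 * vol[1]**2 + 2 * w[0] * w[1] * cov
        port_vol.append(np.sqrt(var))
    # The minimum-variance point shows how much diversification the correlation allows.
    i = int(np.argmin(port_vol))
    mean_i = (1 - weights[i]) * mu[0] + weights[i] * mu[1]
    print(f"rho={rho:+.1f}: min vol {port_vol[i]:.3f} at w2={weights[i]:.1f}, mean {mean_i:.3f}")
```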

Accurate estimation of parameters is crucial for evaluating portfolios and solving the quadratic programming problem for portfolio optimization. Formulas are used to calculate optimal weights based on various constraints, such as long-only portfolios, holding constraints, and benchmark exposure constraints. Utility functions are employed to define preferences for wealth and maximize expected utility while considering risk aversion.
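
The Lagrangian solution for the equality-constrained problem can be written out with numpy; the three-asset expected returns and covariance matrix below are illustrative assumptions. Inequality constraints such as long-only or holding limits would instead require a quadratic programming solver.

```python
import numpy as np

# Closed-form mean-variance weights: minimize w' Sigma w subject to
# w' mu = target and w' 1 = 1, via the Lagrangian first-order conditions.
mu = np.array([0.06, 0.10, 0.14])
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

def optimal_weights(mu, sigma, target_return):
    ones = np.ones_like(mu)
    A = np.column_stack([mu, ones])                 # constraint matrix [mu, 1]
    sigma_inv_A = np.linalg.solve(sigma, A)         # Sigma^{-1} [mu, 1]
    M = A.T @ sigma_inv_A                           # 2x2 system for the multipliers
    lam = np.linalg.solve(M, np.array([target_return, 1.0]))
    return sigma_inv_A @ lam

for target in (0.08, 0.10, 0.12):
    w = optimal_weights(mu, sigma, target)
    volatility = np.sqrt(w @ sigma @ w)
    print(f"target {target:.0%}: weights {np.round(w, 3)}, volatility {volatility:.3f}")
```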

The video delves into the application of portfolio theory using exchange-traded funds (ETFs) and market-neutral strategies. Different constraints can be implemented to control risks and variations in a portfolio, including exposure limits to market factors and minimum transaction sizes. The speaker explores the optimal allocation of nine ETFs invested in various industrial sectors in the US market, considering portfolio analysis tools and the impact of capital constraints on optimal portfolios. Market-neutral strategies employed by hedge funds are also discussed, highlighting their potential for diversification and reduced correlation.

The selection of appropriate risk measures is crucial when evaluating portfolios. Mean-variance analysis is commonly used, but alternative risk measures such as mean absolute deviation, semi-variance, value-at-risk, and conditional value-at-risk can provide additional insights. The use of factor models aids in estimating the variance-covariance matrix, enhancing the accuracy of portfolio optimization.
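
These alternative risk measures can be estimated directly from a return sample; the simulated fat-tailed returns below stand in for real portfolio data.

```python
import numpy as np

# Alternative risk measures from a return sample: mean absolute deviation,
# semi-variance, historical value-at-risk, and conditional value-at-risk.
rng = np.random.default_rng(4)
returns = rng.standard_t(df=4, size=5000) * 0.01      # fat-tailed daily returns (simulated)

mad = np.mean(np.abs(returns - returns.mean()))
semi_var = np.mean(np.minimum(returns - returns.mean(), 0.0) ** 2)

losses = -returns
var_95 = np.quantile(losses, 0.95)                    # loss exceeded on ~5% of days
cvar_95 = losses[losses >= var_95].mean()             # average loss beyond that threshold

print(f"Mean absolute deviation: {mad:.4f}")
print(f"Semi-variance:           {semi_var:.6f}")
print(f"95% VaR:                 {var_95:.4f}")
print(f"95% CVaR:                {cvar_95:.4f}")
```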

Throughout the video, the speaker emphasizes the importance of accurate parameter estimation, the impact of constraints on portfolio construction, and the significance of risk measures in portfolio evaluation. Portfolio theory provides a framework for making rational investment decisions under uncertainty, considering preferences for higher returns, lower volatility, and risk aversion. By applying these concepts, investors can construct well-balanced portfolios tailored to their risk tolerance and investment objectives.

In the subsequent sections of the video, the speaker further explores the intricacies of portfolio theory and its practical implications. Here is a summary of the key points covered:

  1. Historical Theory of Portfolio Optimization: The speaker begins by discussing the historical foundation of portfolio optimization, focusing on the Markowitz Mean-Variance Optimization. This approach analyzes portfolios based on their mean return and volatility. It provides a framework for understanding the trade-off between risk and return and serves as the basis for modern portfolio theory.

  2. Utility Theory and Decision-Making under Uncertainty: Utility theory, specifically von Neumann-Morgenstern utility theory, is introduced to guide rational decision-making under uncertainty. Utility functions are used to represent an investor's preferences for wealth, considering factors such as higher returns and lower volatility. The speaker explains various utility functions commonly employed in portfolio theory, including linear, quadratic, exponential, power, and logarithmic functions.

  3. Constraints and Alternative Risk Measures: The video explores the inclusion of constraints in portfolio optimization. These constraints can be implemented to ensure specific investment criteria, such as long-only portfolios, turnover constraints, and exposure limits to certain market factors. Additionally, the speaker discusses alternative risk measures beyond the traditional mean-variance analysis, such as measures accounting for skewness, kurtosis, and coherent risk measures.

  4. Solving the Portfolio Optimization Problem: The speaker provides mathematical insights into solving the portfolio optimization problem. By formulating it as a quadratic programming problem, optimal weights for the portfolio can be determined. The Lagrangian and first-order conditions are utilized to solve for these weights, with the second-order derivative representing the covariance matrix. The solution allows for maximizing returns while minimizing volatility, subject to specified constraints.

  5. Efficient Frontier and Capital Market Line: The concept of the efficient frontier is introduced, representing the set of optimal portfolios that achieve the highest return for a given level of risk. The speaker explains how the efficient frontier takes shape based on the risk-return profiles of various portfolios. Furthermore, the capital market line is discussed, illustrating the relationship between risk and return when combining the risk-free asset with the market portfolio. It enables investors to determine the expected return for any desired level of risk (a numerical sketch follows this list).

  6. Estimation of Parameters and Risk Measures: The importance of accurate parameter estimation is highlighted, as it significantly influences portfolio analysis. The speaker emphasizes the use of factor models to estimate the variance-covariance matrix, providing more precise inputs for optimization. Additionally, different risk measures such as mean absolute deviation, semi-variance, value-at-risk, and conditional value-at-risk are explained, with their suitability depending on the specific characteristics of the assets being invested.
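
To make item 5 concrete, here is a sketch of the tangency (market) portfolio and the capital market line it implies; the risk-free rate, expected returns, and covariance matrix are illustrative assumptions.

```python
import numpy as np

# Tangency (market) portfolio and the capital market line: expected return of an
# optimal portfolio as a function of its volatility. Inputs are illustrative.
mu = np.array([0.06, 0.10, 0.14])
sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])
r_f = 0.02

excess = mu - r_f
w_unnorm = np.linalg.solve(sigma, excess)     # Sigma^{-1} (mu - r_f 1)
w_mkt = w_unnorm / w_unnorm.sum()             # fully invested market portfolio

mu_mkt = w_mkt @ mu
vol_mkt = np.sqrt(w_mkt @ sigma @ w_mkt)
sharpe = (mu_mkt - r_f) / vol_mkt
print(f"Market portfolio weights: {np.round(w_mkt, 3)}")
print(f"Capital market line: E[R] = {r_f:.2%} + {sharpe:.2f} * sigma")

for target_vol in (0.05, 0.10, 0.20):         # points on the CML
    print(f"  sigma={target_vol:.0%}: expected return {r_f + sharpe * target_vol:.2%}")
```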

Throughout the video, the speaker emphasizes the practical application of portfolio theory using exchange-traded funds (ETFs) and market-neutral strategies. The use of constraints to manage risks and variations in a portfolio, the impact of capital constraints on optimal portfolios, and the benefits of market-neutral strategies for diversification are discussed in detail.

Overall, the video provides a comprehensive overview of portfolio theory, covering various aspects from historical foundations to practical implementation. It emphasizes the importance of accurate estimation, the incorporation of constraints, the choice of risk measures, and the potential benefits of different investment strategies. By understanding these concepts, investors can make informed decisions to construct portfolios that align with their risk preferences and investment goals.

  • 00:00:00 In this section of the video, Peter Kempthorne covers the topic of portfolio theory, which is one of the most important topics in finance. He begins by discussing the historical theory of portfolio optimization, which involves Markowitz Mean-Variance Optimization for analyzing the performance characteristics of portfolios in terms of their mean return and the volatility of their returns. The analysis is then extended to include investing with a risk-free asset, and the topic of utility theory, von Neumann-Morgenstern utility theory, is introduced to make decisions under uncertainty in a rational way. Additionally, Kempthorne covers portfolio optimization constraints and alternative risk measures for extending the simple mean-variance analysis. Finally, he explains single-period analysis, how to represent a portfolio, and how to calculate the expected return and variance of a portfolio.

  • 00:05:00 In this section, the speaker introduces the portfolio analysis problem and considers a simplified setting with two assets. The goal is to find optimal portfolios investing in these two assets, considering their expected return and volatility, and the possible correlation between them. The mean-variance analysis is used to analyze the feasible portfolio set and determine optimal and sub-optimal portfolios. The speaker then highlights the importance of Markowitz theory and its extensions in providing elegant answers to these questions. Finally, a simulation is conducted to examine the cumulative returns of each asset in different portfolios.

  • 00:10:00 In this section, a simulated asset with a mean return of 15% and 25% volatility is discussed. The scatter plot of weekly returns shows no apparent correlation, although there is a sample correlation. The feasible set of portfolios is shown on the right-hand graph, and allocating towards asset 2 improves the return of the portfolio without compromising on volatility. The minimum variance portfolio is also discussed, with a weighting on the different assets being inversely proportional to their squared volatility. The blue graph is slightly closer to asset 1, indicating a slightly higher weight for asset 1.

  • 00:15:00 In this section, the concept of sub-optimal portfolios is examined, with the conclusion that all the points on the scatter plot are sub-optimal portfolios and a trade-off must be made between return and volatility. The benefit of diversification when two fully uncorrelated assets are pooled together is discussed, and the effect of negative correlations on feasible sets and reducing volatility is examined. A correlation of -1 between two assets can lead to a zero volatility portfolio, which is rare in markets, but in pricing theory, this portfolio's return should be equal to the risk-free rate.

  • 00:20:00 In this section of the video, the speaker discusses the relationship between correlation and diversification in portfolio theory. The simulation shows that increasing the correlation between assets results in less benefit from diversification, meaning that the variance of the portfolio cannot be lowered as much. The speaker highlights the importance of using accurate estimates for mean returns, volatilities, and correlations when evaluating portfolios, as sample estimates can differ from population parameters and have a certain amount of variability. The quadratic programming problem for portfolio optimization involves minimizing the squared volatility of the portfolio subject to constraints on the mean of the portfolio and full investment, which can be solved using a Lagrangian and first-order conditions.

  • 00:25:00 In this section, the speaker explains how to solve for the weights of the minimum-variance portfolio. The first-order conditions yield the solution because the second-order derivative of the Lagrangian is the covariance matrix, which is positive semi-definite. By substituting a given target return alpha into the solution, the variance of the optimal portfolio can also be obtained. The problem can be viewed in two other equivalent ways: maximizing the return subject to a constraint on volatility, or maximizing the return minus a multiple of the variance; all three are solved by the same Lagrangian.

  • 00:30:00 In this section, we learn about the efficient frontier, which is the collection of all possible solutions given a range of feasible target returns and volatility values. In a two-asset case, the efficient frontier is a parabola, and adding another asset creates several parabolas, which define the feasible set. The efficient frontier is the top side of the curve. The addition of a risk-free asset expands the feasible set into a straight line between the risk-free asset point and any point on the efficient frontier, allowing for investments in a combination of the risk-free asset and other assets.

  • 00:35:00 In this section, the lecturer discusses the mathematics for solving a problem where the goal is to minimize volatility while ensuring the return is equal to a specific value. By investing in a risk-free asset, investors can achieve a higher return with a lower variance and expand their investment opportunities. The lecturer provides formulas for determining an optimal portfolio, which invests proportionally in risky assets but differs in weight allocation, depending on the target return. These formulas also provide closed-form expressions for the portfolio variance, which increases as the target return increases due to the trade-off when using optimal portfolios. The fully invested optimal portfolio is called the market portfolio.

  • 00:40:00 In this section, the speaker explains the concept of the market portfolio, the optimal portfolio that maximizes the return per unit of risk across all fully invested portfolios. They mention that every optimal portfolio invests in a combination of the risk-free asset and the market portfolio, regardless of how much risk an investor wants to take. The speaker presents the expressions for the expected return and variance of the market portfolio, and shows the formula for the weights of the optimal portfolio. This leads to the definition of the capital market line, which allows investors to determine the expected return for any given level of risk.

  • 00:45:00 In this section, the capital market line for portfolio optimization is discussed. The line represents the expected return of any optimal portfolio, which is equal to the risk-free rate plus a multiple of the return per risk of the market portfolio. By allocating additional weights to the market portfolio and borrowing money at the risk-free rate, one can achieve higher returns and volatility beyond the market portfolio, leading to an extended efficient frontier. The section ends with a discussion on the von Neumann-Morgenstern utility theory, which considers the decision-making process for portfolio optimization based on expected return and volatility.

  • 00:50:00 In this section, the concept of portfolio theory is introduced. Portfolio theory involves making investment decisions under uncertainty based on a specified utility function for wealth, with the aim of maximizing the expected utility of wealth. The theory is powerful in providing rational decisions under uncertainty that factor in preferences as to higher returns, lower volatility, and other factors defined by the utility function used. The basic properties of utility functions are discussed, including the concepts of risk aversion and absolute and relative risk aversion. The utility functions used in portfolio theory include linear, quadratic, exponential, power, and log functions.

  • 00:55:00 In this section, the speaker discusses portfolio theory under the quadratic utility function and the assumptions of Gaussian distributed returns. Under these assumptions, mean-variance analysis is the optimal approach to portfolio optimization. However, with different utility functions, such as those that consider penalties for skewness or kurtosis, extensions to the basic model may be needed. The speaker also notes that practical portfolio optimization problems involve constraints such as long-only portfolios, holding constraints, simple linear constraints, turnover constraints, and benchmark exposure constraints. These constraints are necessary to consider when adjusting portfolios from one period to the next.

  • 01:00:00 In this section, the speaker discusses different types of constraints that can be applied in portfolio optimization to control risks and variations in a portfolio. These include controlling the tracking error between a portfolio and its benchmark, limiting the exposure to different market factors, and applying minimum transaction and holding sizes, and integer constraints. These constraints can be expressed as linear and quadratic constraints on the weights and can be implemented alongside the portfolio optimization problem. The example given is on US sector exchange-traded funds.

  • 01:05:00 In this section, the speaker discusses the potential of exchange-traded funds as a means of investing in equity markets. They analyze nine ETFs invested in various industrial sectors of the US market. These ETFs performed quite differently between 2009 and the time of the lecture, which highlights their value for a diversified portfolio. The speaker examines the optimal allocation of these ETFs over this period using portfolio analysis tools. The results reveal that the yellow ETF, representing consumer staples, receives a high weight, followed by green (energy) and orange (health care), suggesting these sectors were attractive over the period. A mean-variance optimization is then run with a 30% maximum investment per asset. The graph shows that this constraint becomes active once target returns rise above the risk-free rate, forcing weight to be reallocated to other ETFs once the most heavily weighted position reaches its cap.

  • 01:10:00 In this section, the lecturer discusses how capital constraints impact optimal portfolios. They present a graph of the efficient frontier and demonstrate how portfolios change as constraints are hit. When a target return of 10% is considered with a 30% capital constraint, the optimal portfolio with a 10% volatility is shown. However, when the capital constraint is reduced to 15%, the efficient frontier decreases and portfolios must allocate to other exchange-traded funds as constraints hit sooner. The lecture highlights that capital constraints are realistic in certain circumstances and how they impact investment policies.

  • 01:15:00 In this section, the speaker discusses portfolio optimization using exchange-traded funds (ETFs) and market-neutral strategies. The example of ETFs shows how past performance can define portfolios, but it is not realistically reliable. The speaker then explains how hedge funds can invest in sector-based models using market-neutral strategies, which tend to be less correlated and offer dramatic diversification benefits. The graph shows that optimal allocations across these sector market-neutral models can help achieve a target volatility of 10%, and combining different models has beneficial portfolio optimization due to their lower correlation.

  • 01:20:00 In this section, the speaker highlights that the results of estimated returns, estimated volatilities, and correlations can be impacted by choices of estimation period, estimation error, and different techniques that can modulate these issues. The use of factor models to estimate the variance-covariance matrix results in more precise inputs to the optimization. The speaker also discusses different risk measures such as mean absolute deviation, semi-variance, and value-at-risk measures, which are now standard in portfolio management and the management of risky assets. There is also an extension of value at risk called conditional value at risk. The appropriate risk measures depend on the assets being invested, and there is a whole discussion on coherent risk measures for risk analysis.
14. Portfolio Theory
14. Portfolio Theory
  • 2015.01.06
  • www.youtube.com
MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013. View the complete course: http://ocw.mit.edu/18-S096F13 Instructor: Peter KempthorneT...
 

15. Factor Modeling



15. Factor Modeling

In this section, the video delves into the practical aspects of factor modeling, including the estimation of underlying parameters and the interpretation of factor models. The speaker emphasizes the importance of fitting the models to specific data periods and acknowledges that modeling the dynamics and relationships among factors is crucial.

The video explains that maximum likelihood estimation methods can be employed to estimate the parameters of factor models, including the factor loadings and alpha. With those estimates in hand, the factor realizations are then obtained from regression formulas that take the estimated loadings and alpha as inputs. The EM (Expectation-Maximization) algorithm is highlighted as a powerful estimation methodology for complex likelihood functions: it alternates between estimating the hidden variables given the current parameter values and re-estimating the parameters as if the hidden variables were known, iterating until convergence.
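
A compact way to see these pieces together is scikit-learn's FactorAnalysis, which fits the loadings and specific variances by an iterative maximum-likelihood procedure and recovers factor realizations with regression-type scores. The simulated return data, true loadings, and factor count below are purely illustrative; this is a generic sketch, not the lecture's estimation code.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Maximum-likelihood factor model fit on simulated asset returns.
rng = np.random.default_rng(5)
n_obs, n_assets, k = 1000, 10, 2

true_B = rng.normal(size=(n_assets, k)) * 0.02        # true factor loadings (assumed)
factors = rng.normal(size=(n_obs, k))                 # hidden factor realizations
specific = rng.normal(size=(n_obs, n_assets)) * 0.01  # idiosyncratic noise
returns = factors @ true_B.T + specific               # m-variate return series

fa = FactorAnalysis(n_components=k)
fa.fit(returns)

B_hat = fa.components_.T            # estimated loadings, shape (n_assets, k)
psi_hat = fa.noise_variance_        # estimated specific variances (diagonal of Psi)
f_hat = fa.transform(returns)       # estimated factor realizations (regression-type scores)

implied_cov = B_hat @ B_hat.T + np.diag(psi_hat)      # B B' + Psi
print("Loadings shape:", B_hat.shape)
print("Max error vs sample covariance:",
      np.abs(implied_cov - np.cov(returns, rowvar=False)).max())
```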

The application of factor modeling in commodities markets is discussed, emphasizing the identification of underlying factors that drive returns and covariances. These estimated factors can serve as inputs for other models, enabling a better understanding of the past and variations in the market. The speaker also mentions the flexibility of considering different transformations of estimated factors using the transformation matrix H.

Likelihood ratio tests are introduced as a means of testing the dimensionality of the factor model. By comparing the likelihood of the estimated factor model with the likelihood of a reduced model, the significance and relevance of additional factors can be assessed. This testing approach helps determine the appropriate number of factors to include in the model.
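
The dimensionality test can be sketched by comparing the Gaussian log-likelihood of the fitted k-factor covariance against that of the unrestricted sample covariance. The simulated data, the use of scikit-learn's FactorAnalysis, and the omission of the Bartlett small-sample correction are choices made for illustration; the degrees of freedom follow the standard count ((m-k)^2 - (m+k))/2.

```python
import numpy as np
from scipy.stats import chi2
from sklearn.decomposition import FactorAnalysis

# Likelihood ratio test of "k factors suffice" against an unrestricted covariance.
rng = np.random.default_rng(6)
n_obs, n_assets = 1000, 10
B = rng.normal(size=(n_assets, 3)) * 0.02             # 3 true factors (assumed)
X = rng.normal(size=(n_obs, 3)) @ B.T + rng.normal(size=(n_obs, n_assets)) * 0.01

Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / n_obs                                 # MLE sample covariance

def gaussian_loglik(cov, S, n):
    m = S.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * n * (m * np.log(2 * np.pi) + logdet + np.trace(np.linalg.solve(cov, S)))

ll_saturated = gaussian_loglik(S, S, n_obs)           # unrestricted covariance model
for k in (1, 2, 3, 4):
    fa = FactorAnalysis(n_components=k).fit(X)
    ll_k = gaussian_loglik(fa.get_covariance(), S, n_obs)   # B B' + Psi from the fit
    lr = 2 * (ll_saturated - ll_k)
    df = ((n_assets - k) ** 2 - (n_assets + k)) // 2
    print(f"k={k}: LR={lr:7.1f}, df={df}, p-value={chi2.sf(lr, df):.3f}")
```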

The section concludes by highlighting the importance of modeling the dynamics of factors and their structural relationships. Factor models provide a framework for understanding the interplay between factors and their impact on asset returns and covariances. By considering the dynamics and structural relationships, investors and analysts can gain valuable insights into the underlying drivers of financial markets.

Overall, this section expands on the topic of factor modeling, exploring the estimation of parameters, the interpretation of factor models, and the application of factor modeling in commodities markets. The section emphasizes the need for proper modeling techniques and understanding the dynamics and relationships among factors to gain meaningful insights into financial markets.

  • 00:00:00 In this section, the topic discussed is factor modeling, which aims to use multivariate analysis to model financial markets by using factors to explain returns and covariances. There are two types of factor models, in which the factors are either observable or hidden; statistical factor models are used to specify the latter. The linear factor model expresses the value of the stochastic process in terms of factors f_1 through f_k with coefficients beta_1 through beta_k, in a state-space-style setup. It looks like a standard regression model: the vectors beta_i are called the factor loadings, and the residual epsilon of asset i in period t is called the specific factor. The goal is to characterize returns and covariances using a modest number of underlying factors compared to the large number of securities, simplifying the problem greatly.

  • 00:05:00 In this section, the video discusses a factor model for explaining the returns of assets based on underlying factors. The residual term is considered random and assumed to be white noise with mean 0. The model assumes that the returns on assets depend on the underlying factors, which have a mean, mu_f, and a covariance matrix, omega_f. The psi matrix is a diagonal matrix containing the specific variances of the individual assets. The covariance matrix of the m-variate stochastic process then follows from conditional and unconditional expectations and covariances: the unconditional covariance of x equals the covariance of its conditional mean, B omega_f B', plus the expected conditional covariance, psi. Without the factor structure, the covariance matrix has m(m+1)/2 free parameters.

  • 00:10:00 In this section, the concept of a factor model is introduced as a means of reducing the number of parameters involved in a multivariate regression, with specific attention given to interpreting the factor model as a series of time series regressions. The focus is on grouping everything together for all assets at once, which is computationally efficient in fitting these. The simplest factor model, the single-factor model of Sharpe, is presented where the excess return on stock can be modeled as a linear regression on the excess return of the market, scaling risk by the beta_i of different assets.

  • 00:15:00 In this section, the video discusses the covariance matrix of assets in factor modeling and how it can be simplified by using a model for modeling the covariance, which can be useful in portfolio management and risk management. The estimation process for Sharpe's single index model is also explained, along with the concept of common factor variables that can be observed as potential candidates for being a relevant factor in a linear factor model. The effectiveness of a potential factor is determined by fitting the model and seeing how much it contributes to the overall covariance matrix.

  • 00:20:00 In this section, the video describes factor modeling and the approach of transforming factors into surprise factors to model macroeconomic variables. The power of incorporating unanticipated changes in these factors is discussed, and this approach is applied widely now. The video also explains how to estimate the underlying parameters using simple regression methods and the Gauss-Markov assumptions. An example of the BARRA Approach, which uses common factor variables based on fundamental or asset-specific attributes, is also provided.

  • 00:25:00 In this section, the Fama-French approach to factor modeling and risk analysis is discussed, which involves ranking stocks based on common factors such as market cap and value versus growth, and dividing them into quintiles for equal-weighted averages. The BARRA industry factor model, which divides stocks into different industry groups, is also mentioned as a simple case of factor modeling. The factor realizations are unobserved but estimated in the application of these models, allowing correlation with individual asset returns to be calculated. Overall, these approaches continue to be used extensively in factor modeling today.

  • 00:30:00 In this section, the concept of industry factor models is introduced. Specifically, industry factor models associate factor loadings with each asset according to the industry group it belongs to. The challenge with industry factor models is specifying the realizations of the underlying factors, which can be estimated with a regression model. The simple estimation of the factor realizations assumes that the components of x all have the same variance, but in reality these models exhibit heteroscedasticity. Overall, this section provides an overview of the estimation of covariance matrices and regression estimates for industry factor models.

  • 00:35:00 In this section of the video, the focus is on heteroscedasticity in estimating the regression parameters and its impact on portfolio optimization, where assets are weighted by their expected returns and penalized for high variance. Factor-mimicking portfolios are used to determine the real value of trading on factors such as those in the Fama-French model; the realization of each factor is a weighted sum of the returns on the underlying assets. By normalizing the rows of the weight matrix for the k factor realizations, factor-mimicking portfolios can be defined that interpret the factors as potential investments for asset allocation.

  • 00:40:00 In this section, the speaker discusses statistical factor models for analyzing time series of asset returns for m assets over T time units, where the underlying factors are unknown. The speaker explains factor analysis and principal components analysis as methods for uncovering those underlying factors, which can be defined in terms of the data themselves. The speaker notes that there is flexibility in defining the factor model and that any given specification of the matrix B or factors f can be transformed by a k by k invertible matrix H.

  • 00:45:00 In this section, the concept of factor modeling and transformations are discussed, highlighting how the linear function remains the same in terms of the covariance matrix of the underlying factors. The discussion moves to defining a matrix H that diagonalizes the factors, which allows for the consideration of factor models with uncorrelated factor components. Making certain assumptions such as orthonormal and zero-mean factors simplifies the model to the covariance matrix sigma_x as the factor loadings B times its transpose, plus a diagonal matrix. Maximum likelihood estimation is also discussed in the context of normal linear factor models with normally distributed underlying random variables, leading to the joint density function of the data.

  • 00:50:00 In this section, the video discusses factor modeling and how maximum likelihood estimation methods can be applied to specify all the parameters of the B and psi matrices using the EM algorithm. Factor realizations can then be estimated by using the regression formula with the estimates of the factor loadings and alpha. The EM algorithm is a powerful estimation methodology that simplifies complex likelihood functions by alternating between estimating the hidden variables and re-estimating the parameters as if the hidden variables were known, iterating until convergence. The factor realizations can be used for risk modeling.

  • 00:55:00 In this section, the speaker discusses the use of statistical factor analysis in commodities markets and identifying underlying factors that drive returns and covariances. The estimated underlying factors can also be used as inputs to other models, which is useful in understanding the past and how they vary. The speaker also mentions the flexibility of considering different transformations of any given set of estimated factors by the H matrix for transformation. Additionally, the use of statistical factor analysis for interpreting the underlying factors is mentioned, with applications in measuring IQ and finding rotations of the factor loadings that make the factors more interpretable. Finally, the section covers likelihood ratio tests and testing for the dimensionality of the factor model.

  • 01:00:00 In this section, the concept of principal components analysis (PCA) is introduced, which is a theoretical framework that uses eigenvalues and eigenvectors of the covariance matrix to reduce the multivariate structure into a smaller dimensional space. PCA creates a new coordinate system that doesn't change the relative position of the data, but only rotates the coordinate axes; it is simply an affine transformation of the original variable x. The principal component variables have a mean of 0 and a covariance matrix given by the diagonal matrix of eigenvalues, and they represent a linear factor model with factor loadings given by gamma_1 and a residual term given by gamma_2 p_2. However, the gamma_2 p_2 vector may not have a diagonal covariance matrix.

  • 01:05:00 In this section, the video explains the differences between linear factor models and principal components analysis. A linear factor model assumes that the residual vector has a diagonal covariance matrix, whereas in principal components analysis this may or may not hold. The video then goes on to discuss empirical principal components analysis, where sample data is used to obtain estimates of the means and covariance matrices. The concept of variability is also introduced: the first principal component variable is the direction along which the data have maximum variability, the second principal component variable is the direction orthogonal to the first with the maximum remaining variance, and this process continues until all m principal component variables are defined.

  • 01:10:00 In this section, the speaker explains how principal component analysis can be used to decompose the variability of different principal component variables of a covariance matrix σ, which represents the total variance of a multivariate data set. The off-diagonal entries of the matrix are zero, indicating that the principal component variables are uncorrelated and have their own level of variability, as represented by the eigenvalues. As a case study, the speaker uses the example of U.S. Treasury yields between 2000 and 2013, looking specifically at the yields' changes. The focus is on a five-year period between 2001 and 2005, and the analysis consists of the yields' daily volatility and negative levels over that period.

  • 01:15:00 In this section, the presenter discusses factor modeling of yield changes using principal components analysis. The correlation matrix of yield changes shows high correlations for shorter tenors, with correlations decreasing as you move away from the diagonal. The presenter uses graphs to visually represent the correlations and shows that the first principal component variable explains 85% of the total variability. A scree plot confirms that the first few principal components explain a significant amount of variability. Finally, the presenter compares the standard deviations of the original yield changes to those of the principal component variables (a minimal numerical sketch of this kind of analysis follows this list).

  • 01:20:00 In this section, a plot of the loadings on the different yield changes for the first few principal component variables was presented, which gives an idea about the interpretation of the principal component variables. The first principal component variable measures the average yield change across the whole range and gives greater weight to the five-year, which captures a measure of the level shift in the yield curve, while the second principal component variable looks at the difference between the yield changes on the long tenors versus the short tenors. Moreover, the third principal component variable provides a measure of the curvature of the term structure and how that's changing over time. The principal component variables have zero correlations with each other, and the cumulative principal component variables over time indicate how these underlying factors have evolved over the time period.

  • 01:25:00 In this section, the speaker discusses fitting a statistical factor analysis model to the data and comparing the results over a five-year period. The speaker emphasizes the importance of specifying the models over a specific period and notes that fitting the models is only a starting point. Ultimately, modeling the dynamics in these factors and their structural relationships is necessary.
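
As referenced in the 01:15:00 segment above, here is a minimal sketch of principal components analysis on simulated yield-curve changes, showing the variance explained and the level/slope pattern of the leading loadings. The tenor correlation structure used to simulate the data is an illustrative assumption, not the Treasury data from the lecture.

```python
import numpy as np

# Principal components of simulated yield-curve changes: eigen-decomposition of the
# covariance matrix, variance explained, and the first loading vectors.
rng = np.random.default_rng(7)
tenors = np.array([0.25, 0.5, 1, 2, 3, 5, 7, 10, 20, 30])    # years
m = len(tenors)

# Nearby tenors move together: correlation decays with distance in log-maturity.
dist = np.abs(np.log(tenors)[:, None] - np.log(tenors)[None, :])
corr = np.exp(-0.5 * dist)
vols = np.full(m, 0.05)                                       # ~5 bp daily vol per tenor
cov_true = corr * np.outer(vols, vols)
changes = rng.multivariate_normal(np.zeros(m), cov_true, size=1250)  # ~5 years of days

cov = np.cov(changes, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]                             # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print("Variance explained by first three PCs:", np.round(explained[:3], 3))
print("PC1 loadings (level, sign arbitrary):", np.round(eigvecs[:, 0], 2))
print("PC2 loadings (slope, sign arbitrary):", np.round(eigvecs[:, 1], 2))
```
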
15. Factor Modeling
15. Factor Modeling
  • 2015.01.06
  • www.youtube.com
MIT 18.S096 Topics in Mathematics with Applications in Finance, Fall 2013. View the complete course: http://ocw.mit.edu/18-S096F13 Instructor: Peter KempthorneT...