
Lecture 12: Blob Analysis, Binary Image Processing, Green's Theorem, Derivative and Integral




In this lecture, the professor covers a range of topics including intellectual property, patents, trademarks, and image processing techniques for edge detection. The lecture emphasizes the importance of accuracy in 2D machine vision and the challenges of detecting fuzzy or defocused edges. Methods are presented for finding mixed partial derivatives and Laplacians, for edge detection using sub-pixel interpolation, and for bias compensation and calibration-based correction in peak-finding. Overall, the lecture provides a comprehensive overview of these topics and their practical applications.

In this lecture on image processing, the speaker discusses various methods to avoid quantizing gradient directions and to improve accuracy in determining edge position. Interpolation is suggested as preferable to lookup tables and quantization for more precise gradient-direction determination. Fixing the step size with a circle and using multiscale analysis are discussed as alternative ways of computing the gradient. The speaker also explains an iterative approach to rotating an image until the y-component of the gradient is reduced to zero, and introduces CORDIC, which rotates through special angles whose tangents are inverse powers of two. Students are reminded to start early on the quiz, as it is more work than the typical homework problem.

  • 00:00:00 In this section, the professor discusses the upcoming quiz which is longer and counts twice as much as a homework problem. The quiz covers the course content up to this point, with more emphasis on recent materials. The professor then provides a brief discussion on intellectual property and patents, mentioning the different types of patents such as utility and design patents. The social contract between patent holders and the government is also discussed, where patent holders receive a limited monopoly for a certain number of years in exchange for explaining exactly how to do something. The discussion concludes by touching on the legal concept of best mode in patent litigation.

  • 00:05:00 In this section, the speaker explains that a brand or logo can be protected with a trademark. Exceptions exist for using small portions of copyrighted material, such as for educational purposes, and for reverse engineering software without violating copyright laws. Copyright used to last for the author's lifetime plus a certain number of years, but has since been extended to the author's lifetime plus 75 years or more. Trademark law protects brands and logos and is more restrictive than copyright.

  • 00:10:00 In this section, the speaker discusses the rules around trademarking a company name and logo, emphasizing that it must be unique in the field and cannot be a common word. The trademark may also cover shapes, markings, and color, which can serve to protect the company. The speaker also touches on the concept of trade secrets, where the company keeps the details of its product secret, though this carries no legal protection. The speaker then introduces a low-level patent related to edge-finding and mentions that once edges are found, more complex image processing tasks can be performed for object recognition and determining position and attitude. The speaker notes that in the 2D machine vision world, accuracy is incredibly important and systems must work almost perfectly.

  • 00:15:00 In this section, the lecturer reviews the basics of blob analysis and binary image processing and discusses various methods for estimating derivatives. The first idea discussed was locating the edge at an inflection point of brightness, equivalently a peak of the first derivative of brightness. Various estimates of derivatives, such as different approximations for E_x, were examined, and the lowest-order error term of each was found using Taylor series expansion. Finally, the lecture touches on muscle electrical signal analysis and how complex the process can become when looking for high-precision first derivatives in the presence of noise and signal distortion.
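
The finite-difference comparison just described can be made concrete. Below is a minimal sketch (using a synthetic 1D brightness profile, not the lecture's example) showing that the forward difference carries an O(ε) error while the central difference cancels that term, leaving O(ε²) — exactly the kind of lowest-order error term a Taylor expansion reveals:

```python
import numpy as np

# Hypothetical 1D brightness profile sampled at spacing eps = 0.1.
x = np.linspace(0.0, 4.0, 41)
eps = x[1] - x[0]
E = np.tanh(3.0 * (x - 2.0))           # a smooth synthetic "edge"

# Forward difference: error is O(eps) by Taylor expansion.
Ex_forward = (E[1:] - E[:-1]) / eps

# Central difference: the O(eps) terms cancel, leaving an O(eps^2) error.
Ex_central = (E[2:] - E[:-2]) / (2.0 * eps)

true_Ex = 3.0 / np.cosh(3.0 * (x - 2.0)) ** 2
print("forward max error:", np.abs(Ex_forward - true_Ex[:-1]).max())
print("central max error:", np.abs(Ex_central - true_Ex[1:-1]).max())
```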

  • 00:20:00 In this section, the lecturer discusses the trade-offs involved in choosing the length of the edge operator used to detect edges. He explains that an operator that is too long can cause different features to interact with each other, making edges harder to detect; this trade-off shows up when detecting edges in an image of a cube, where the edges get quite close to each other. The lecturer then explains how second-order derivatives can be computed by convolving with a first-derivative operator twice, and shows how this can be used to check results for consistency. Finally, he explains the importance of checking the different ways of designing the computational molecules used to estimate derivatives.

  • 00:25:00 In this section of the lecture, the professor explains the process of finding mixed partial derivatives using a 2D stencil. The stencil involves flipping one of the functions and superimposing it on top of the other to identify areas of overlap, resulting in a 2x2 stencil. The professor notes that it is important to watch out for sign reversals when using computational stencils that are not flipped. They also point out that the mixed partial derivative can be thought of as a second derivative in a rotated coordinate system. Overall, the section provides a clear and detailed explanation of finding mixed partial derivatives in 2D.
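
As an illustration of such a 2x2 computational stencil, here is a hedged sketch for estimating the mixed partial E_xy (the axis conventions and scale factor are assumptions, and the comment flags the kernel-flipping caveat the professor warns about):

```python
import numpy as np
from scipy.signal import convolve2d

# 2x2 computational stencil for the mixed partial d2E/(dx dy).
# Note: convolve2d flips the kernel -- the sign-reversal caveat from
# the lecture; this particular stencil happens to be flip-symmetric.
stencil = np.array([[ 1.0, -1.0],
                    [-1.0,  1.0]])   # divide by eps^2 for spacing eps

y, x = np.mgrid[0:16, 0:16].astype(float)
E = x * y                            # d2E/(dx dy) = 1 everywhere

Exy = convolve2d(E, stencil, mode='valid')
print(Exy[:2, :2])                   # all interior estimates ~ 1.0
```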

  • 00:30:00 In this section, the Laplacian is re-introduced as a second-derivative operator, where two operators in orthogonal directions are added to obtain an approximation of this rotationally symmetric differential operator. A weighted sum of two such approximations is then introduced to create a smoother version of the Laplacian that is also more computationally efficient when applied to an image. Additionally, techniques for determining the weighting coefficients are discussed, such as minimizing the lowest-order error term or requiring certain sums to equal zero.
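
To make the weighted-sum idea concrete, the sketch below combines a direct-neighbor and a diagonal-neighbor Laplacian stencil; the 2/3 and 1/3 weights are the common textbook choice for improved rotational symmetry, assumed here rather than quoted from the lecture:

```python
import numpy as np

# Two discrete Laplacians: one using edge-adjacent neighbours, one using
# diagonal neighbours (diagonal spacing is sqrt(2), hence the factor 1/2).
direct   = np.array([[0, 1, 0],
                     [1, -4, 1],
                     [0, 1, 0]], float)
diagonal = np.array([[1, 0, 1],
                     [0, -4, 0],
                     [1, 0, 1]], float) / 2.0

# Weighted sum: a smoother, more rotationally symmetric operator.
laplacian = (2.0 / 3.0) * direct + (1.0 / 3.0) * diagonal
print(laplacian * 6)   # integer form: [[1,4,1],[4,-20,4],[1,4,1]]
```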

  • 00:35:00 In this section, the speaker discusses the issue of using rectangular pixels instead of hexagonal ones. He describes situations where people are concerned about efficiency, such as imaging the black hole at the center of our galaxy at radio frequencies. The speaker also differentiates between linear and non-linear operators and discusses Roberts' use of stencils for computing derivatives in a rotated coordinate system. Additionally, he explains non-maximum suppression: applying edge operators everywhere gives a weak response almost everywhere but a strong response on the edges.

  • 00:40:00 In this section, the speaker discusses the concept of edge detection and emphasizes the drawbacks of applying a threshold for edge detection. Instead, the speaker proposes removing everything except the maximum value in the gradient direction to identify the edge point. The speaker also talks about non-maximum suppression and the issues of asymmetry in tie-breaking. Finally, the speaker explains how to fit a parabola to the edge response profile to determine the sub-pixel edge position. The speaker acknowledges that the choice of the shape of the curve is arbitrary but explains how fitting a second-order polynomial could work as a good guess in most cases.
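
A minimal sketch of the parabola fit just described (the function name and three-sample interface are illustrative assumptions, not the lecture's notation):

```python
def subpixel_peak(g_minus, g0, g_plus):
    """Fit a parabola through three gradient-magnitude samples taken
    along the gradient direction and return the sub-pixel offset of
    the peak relative to the centre sample.  Valid only when g0 is a
    local maximum; a sketch of the idea, not the patented procedure."""
    denom = g_minus - 2.0 * g0 + g_plus
    if denom >= 0.0:          # not a strict maximum
        return 0.0
    return 0.5 * (g_minus - g_plus) / denom

# Samples 1.0, 3.0, 2.0: peak sits ~0.167 pixels toward the larger neighbour.
print(subpixel_peak(1.0, 3.0, 2.0))
```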

  • 00:45:00 In this section, we learn about edge detection using sub-pixel interpolation. The gradient direction tells us the orientation of the edge, which we then quantize to assist in projecting the potential edge point onto the actual edge location. We can then perform bias compensation to more accurately estimate the edge position using a parabolic or triangular method. By doing this, we can find the peak of the edge and improve accuracy by taking the closest point to the origin.

  • 00:50:00 In this section of the lecture, the speaker discusses a method for calibrating a correction to peak finding for subpixel edge detection. Essentially, the method involves moving the edge experimentally and measuring the peak-finding estimate against the actual edge position, in order to build a correction lookup table for the method. The speaker also notes that edge shapes can differ and demonstrates how to approximate the shape using a one-parameter fit. Despite these differences, only a small correction is necessary for subpixel edge detection accuracy.

  • 00:55:00 In this section of the lecture, the professor discusses the concept of fuzzy edges and why they are important for sub-pixel recovery and avoiding aliasing problems. The professor explains that one cause of fuzzy edges is defocus. Using the example of a camera lens, the professor shows that a point object in focus is captured as a point, whereas the same object slightly out of focus is captured as a circle of uniform brightness. To describe this, the professor introduces the unit step function and the point spread function, and explains how they can be used to express the circle of uniform brightness as a function of x and y.

  • 01:00:00 In this section, the speaker explains the effect of being out of focus and how to calculate the response geometrically by superimposing the edge and the circle. The area of the sector of the circle and the area of the triangle are used to find the difference between the two shapes. Theta is used to calculate the area, and the details are explained to demonstrate the response between zero and one.
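

The sector-minus-triangle geometry can be packaged as a closed-form edge response. The sketch below assumes a unit step edge and a pillbox of radius r; the sign convention for the distance d is an illustrative choice:

```python
import numpy as np

def defocus_step_response(d, r):
    """Fraction of a uniform blur circle (pillbox, radius r) lying on
    the bright side of a straight step edge at signed distance d from
    the circle's centre.  Sector-minus-triangle geometry, as in the
    lecture; the exact parameterization here is an assumption."""
    d = np.clip(d, -r, r)
    # Area of the circular cap on the dark side of the edge.
    cap = r**2 * np.arccos(d / r) - d * np.sqrt(r**2 - d**2)
    return 1.0 - cap / (np.pi * r**2)

for d in (-1.0, 0.0, 0.5, 1.0):
    print(d, defocus_step_response(d, r=1.0))   # 0.0, 0.5, ~0.804, 1.0
```

The response rises smoothly from 0 to 1 as the edge sweeps across the blur circle, which is why the edge profile is fuzzy rather than a step.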

  • 01:05:00 In this section, the speaker discusses plotting a diagram to calculate the error in determining edge position accurately using an algorithm. They mention that this error could be small but non-zero and is essential to take into account for high accuracy. The speaker then talks about ways to avoid quantization of gradient directions, which can introduce awkwardness due to the spacing that comes in two sizes. They discuss that this may cause slightly different error contributions and suggest a couple of ways to avoid it. The section ends with a discussion on patent infringement and ways to avoid it, where the focus is on making the invention different rather than better.

  • 01:10:00 In this section of the video, the lecturer discusses a preferred method to avoid the quantizing of gradient directions that appears in certain patents. Instead of that method, he suggests interpolating to avoid quantizing the gradient directions. By interpolating, the values can be approximated smoothly and the gradient direction determined precisely. The lecturer believes this improves accuracy, eliminating the need to build a lookup table or to quantize and then correct using the bias graph. The downside is that interpolation only approximates the exact measured value, so some accuracy is lost, but the loss is negligible in many cases.

  • 01:15:00 In this section of the lecture, the speaker discusses an alternative method for gradient calculation that involves fixing the step size instead of changing it. This method uses a circle to determine the pixel spacing and provides a more continuous gradient direction with less quantization. However, this approach requires interpolation, either bilinear or bicubic, and can be extra work due to the need to account for more pixels. Additionally, the speaker talks about the usefulness of multiscale analysis for finding sharp edges and blurry edges in images. Finally, the speaker briefly touches upon the preferred implementation for the cartesian to polar coordinate transform, which involves rotating the coordinate system.

  • 01:20:00 In this section, the speaker discusses a method for rotating an image to reduce the y-component of the gradient to zero using an iterative approach. The angle of rotation is manipulated iteratively until the magnitude of the y-component is reduced to zero. The speaker suggests a strategy of using a sequence of test angles and reducing the magnitude of the y-component with each iteration. The angles are chosen so that their tangents are inverse powers of 2, which reduces the number of multiplications from four to two. The iterative approach is repeated until the angle of rotation is small enough.

  • 01:25:00 In this section, the speaker explains CORDIC, which involves rotating through special angles with the property that the tangent of theta_i is one over two to the i. The iterative process rotates through each such angle and keeps track of whether the y-component went negative. The first step is to get the vector into the first octant, which is trivial by looking at the signs of x and y and whether y is greater than x. The next lecture will cover multi-scale and sampling, and the speaker reminds viewers to start early on the quiz, as it is more work than the typical homework problem.
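
A rough sketch of CORDIC in vectoring mode, under the assumptions that the input is already reduced to the first octant and that floating point stands in for the fixed-point shifts used in hardware:

```python
import math

def cordic_vector_angle(x, y, iters=16):
    """Vectoring-mode CORDIC sketch: rotate (x, y) through angles with
    tan(theta_i) = 2**-i, driving y toward zero while accumulating the
    total rotation.  The magnitude grows by a known constant (the
    CORDIC gain), ignored here since only the angle is needed."""
    angle = 0.0
    for i in range(iters):
        t = 2.0 ** -i
        if y > 0:               # rotate clockwise
            x, y = x + y * t, y - x * t
            angle += math.atan(t)
        else:                   # overshot: rotate back counter-clockwise
            x, y = x - y * t, y + x * t
            angle -= math.atan(t)
    return angle                # ~ angle of the original vector

print(cordic_vector_angle(math.sqrt(3), 1.0))  # ~ pi/6 = 0.5236
```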
  • 2022.06.08
  • www.youtube.com
MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. View the complete course: https://ocw.mit.edu/6-801F20. YouTube Playlist: https://www.youtube.com/p...
 

Lecture 13: Object Detection, Recognition and Pose Determination, PatQuick (US Patent 7016539)




The lecture focuses on object detection, recognition, and pose determination, with an emphasis on the PatQuick patent (US 7,016,539). The patent aims to detect and determine the pose of objects in space and offers an improvement over previous methods, using an abstract representation called a model that is compared to a runtime image at different poses and rotations. The patent also incorporates a list of generalized degrees of freedom to increase accuracy and uses low-pass filtering and edge detection to obtain boundary points, postponing thresholding until the final stages. Additionally, the lecture discusses the process of creating models using edge detection and probes with desired spacing and contrast to represent these models, explaining the importance of considering degrees of freedom such as translation, rotation, scaling, and aspect ratio, which allow for variations in object dimensions and perspectives.

The video discusses the hexagonal search patterns utilized for efficient and scalable translational search in object detection, including peak detection and a solution for detecting adjacent objects. The video also discusses PatQuick, a patent for determining the presence of predetermined patterns in runtime images and their multi-dimensional location. The method uses probes and a pre-computed gradient to match an object's pose, and the integration of the scoring function removes errors from the result. The video explores an alternative method for determining angle differences using dot products and emphasizes the intricacies of multi-scale operations and probe selection for different granularities. The accuracy of the method is limited by the quantization of search space.

  • 00:00:00 In this section, we are introduced to patent 7016539, which aims to detect, recognize, and determine the pose of objects in space, as well as inspect objects. The problem it works to solve is the need to manipulate objects using machinery without accurate edge information about them. The prior art had four different components, one of which was binary image processing: distinguishing objects from the background to create binary images, allowing for easier processing and less memory. Certain low-level binary image operations, such as finding the area, perimeter, and centroid of binary images, and even computing Euler numbers, can be performed with local computations in parallel, which can be achieved with parallel hardware.

  • 00:05:00 In this section, the lecturer discusses various methods for object detection, recognition, and pose determination. The method of thresholding is introduced, which involves distinguishing foreground from background in an image based on some parameter; this method is limited because there may be no clear distinction between foreground and background. Binary template methods use a master image or golden template to define the object and compute a template through thresholding. Normalized correlation tries all possible positions for the match to find a suitable match between two images. This was the claim to fame for Cognex, an early machine vision company.

  • 00:10:00 In this section, the speaker discusses the process of alignment using correlation, a method related to object detection and recognition, which involves moving an image around to find the alignment where the difference between the shifted image and the other image is as small as possible. At present, only translation is considered because of computational expense, as the method requires analyzing every pixel for every possible position. In addition, the speaker relates correlation to gradient-based methods, which involve computing an offset, and discusses how the correlation can be maximized by minimizing the change in brightness over time.

  • 00:15:00 In this section, the lecture focuses on recognizing an object and determining its pose, particularly in the context of aligning an integrated circuit for the next step in the manufacturing process. The speaker discusses various methods for determining alignment and notes that the sum of squared differences and correlation are commonly used, but have some drawbacks. Correlation, in particular, can give a high match even if the contrast between the images is different, and there is no clear threshold for what constitutes a match. Despite these issues, correlation remains popular due to its computational efficiency. Furthermore, the speaker notes that these methods can be improved through the incorporation of gradient-based methods, which have been utilized in optical mice.

  • 00:20:00 In this section, the lecture discusses normalized correlation and its role in image recognition. Normalized correlation is used to eliminate any offset in image brightness and make the process less sensitive to changes in the optical setup. The method computes the correlation of two images and normalizes it to remove shifts in contrast, and the height of the correlation peak then measures the success of the match: a high correlation score indicates a good match, whereas a low score signifies a poor one. Although the method can be costly, it was a claim to fame for Cognex in their early days.
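
As a sketch of the normalized-correlation idea (not Cognex's exact formulation), subtracting the means handles brightness offset, and dividing by the norms handles contrast scale:

```python
import numpy as np

def normalized_correlation(patch, template):
    """Offset- and contrast-invariant match score in [-1, 1]."""
    a = patch.astype(float).ravel()
    b = template.astype(float).ravel()
    a -= a.mean()                     # remove brightness offset
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)   # remove contrast scale
    return float(a @ b / denom) if denom > 0 else 0.0

t = np.array([[0, 1], [1, 0]])
print(normalized_correlation(3 * t + 7, t))   # 1.0: same pattern, new gain/offset
```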

  • 00:25:00 In this section, the video discusses a patent related to object detection and recognition, specifically for determining the presence of predetermined patterns in an image and determining their locations within a multi-dimensional space. The patent, which is an improvement over previous methods, includes using an abstract representation of the pattern called a model, which is compared to a runtime image at different poses, rotations, etc. The comparison produces a match score, which is compared to an accept threshold to delay decision making until more information is available. The patent also provides a list of generalized degrees of freedom instead of just translation and rotation to increase its accuracy for partial or missing parts of an object.

  • 00:30:00 In this section, the patent for object detection, recognition and pose determination known as PatQuick, which focuses on obtaining potential matches, is discussed. The section dives into how the patent uses low-pass filtering and edge detection to obtain boundary points at different resolutions. The process then continues by connecting neighboring boundary points that have consistent directions, organizing the points into chains. The patent differs from other methods in that it chains together edges even if they are weak, and postpones thresholding until the very end.

  • 00:35:00 In this section, the speaker discusses the creation of models for object recognition using edge detection, and the process of creating probes with desired spacing and contrast to represent these models. The models are fitted to the edges, and these probes are used to detect whether there is a match between the model and the image being analyzed. The probes are used as points of evidence to identify areas of high contrast, and this method helps reduce the number of pixels that need to be analyzed. Tie breaking is also discussed in the context of determining the order of the neighbors of the probes.

  • 00:40:00 In this section, the speaker discusses different examples of how to compare the gradients observed in the runtime image with those of the model. He explains that the direction of the gradient is much more likely to be maintained even in the case of changes in illumination or material. The speaker also introduces the concept of weight, which helps to determine the importance of each probe. While manual weight assignment can be useful in accounting for object symmetries, it requires human intervention and is not commonly used. Finally, the speaker defines the different objects in the model, including the probes, their positions, directions, and weights, as well as the compiled probe object used to increase computational efficiency.

  • 00:45:00 In this section, the speaker explains how to map the compiled probe object onto the image and how to use the model. The compiled probe is a set of probes specialized to image coordinates; the main difference from a probe is that an offset in the compiled probe is an integer number of pixels rather than a real value. The speaker also discusses the map, the transformation with many degrees of freedom that must be found, which includes all transformations except translation. To score the match, a grading function is used that considers the difference between the two gradient directions, with or without regard to contrast polarity.

  • 00:50:00 In this section, the speaker explains how to rate how well a probe matches a corresponding point in a runtime image using a function that considers the direction and magnitude of the gradient. He notes that contrast reversals can make the direction-based metric less robust against noise, while using a wider tolerance ("slop") increases the chances of accepting random alignments. To deal with degrees of freedom, the speaker provides examples of parameters and functions used for rotation, scale, and shear adjustments. Overall, object detection requires various considerations, as different situations may call for different approaches.
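
One plausible shape for such a direction-based score, offered purely as an illustration (the tolerance value and piecewise-linear falloff are assumptions, not the patent's functions):

```python
import math

def direction_score(probe_dir, image_dir, slop=math.radians(10)):
    """1 when the runtime gradient direction matches the probe direction
    within `slop`, falling linearly to 0 beyond twice that tolerance.
    Widening `slop` tolerates noise but admits more random alignments."""
    d = abs((probe_dir - image_dir + math.pi) % (2 * math.pi) - math.pi)
    if d <= slop:
        return 1.0
    if d >= 2 * slop:
        return 0.0
    return (2 * slop - d) / slop

print(direction_score(0.0, math.radians(5)))    # 1.0
print(direction_score(0.0, math.radians(15)))   # 0.5
```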

  • 00:55:00 In this section, we learn about generalized degrees of freedom in object detection, recognition, and pose determination. These degrees of freedom - such as translation, rotation, scaling, and aspect ratio - allow for variations in object dimensions and perspectives. It is important to include such degrees of freedom when the viewing geometry isn't exactly two-dimensional: skew, for example, makes the image of a rectangle appear as a rhombus. It is essential to be cautious of computational costs when considering scaling, and a more reasonable approach is to work on a logarithmic scale. Additionally, the probe minimum enclosing rectangle can cut down computation in some operations. Searching the multi-dimensional space of poses also requires a notion of proximity, that is, how close together two poses are in that space.

  • 01:00:00 In this section of the video, the speaker explains the search patterns used for efficient and scalable translational search in object detection. These patterns are organized around hexagons to provide a four over pi advantage in terms of work done versus resolution. The speaker also discusses how peak detection works on a hexagonal grid and offers a solution to avoid detecting adjacent objects. Additionally, the video defines terms commonly used in patent law, such as object, image, brightness, granularity, and boundary, and their applications beyond visible light images, such as graphics and x-ray images. The generalization of these terms aims to widen the scope of the patent and its potential applications.

  • 01:05:00 In this section, the video discusses a patent on PatQuick, a method for determining the presence or absence of at least one instance of a predetermined pattern in a runtime image and for determining the multi-dimensional location of each present instance. The patent incorporates the possibility of inspection and recognition, wherein the process is run for each object and most won't be a good match, but one will be for recognition. The video also mentions the use of a gradient, which is a vector that gives the direction and magnitude of greatest change in brightness at a specified granularity, and a model, a set of data encoding characteristics of a pattern to be found, which could be created from a real image or a CAD drawing.

  • 01:10:00 In this section, the speaker explains how PatQuick's method works even if parts of an object are obscured or missing, making it useful for inspection purposes. The method uses probes to match the object's pose, and although in principle the gradient could be computed at each match, it is advantageous to pre-compute it for efficiency. The integral of the scoring function is used to calculate how much random matches offset the score; despite being a nuisance to compute, it is necessary for removing error from the result and reducing noise. The patent contains primarily method claims, the legal situation having changed so that only method claims were pursued.

  • 01:15:00 In this section, the speaker discusses an alternative method for determining angle differences between unit vectors using dot products instead of using a tangent function. However, this method produces a large absolute value and is not as good as the original method. The speaker also discusses the disadvantage of the method being quantized and the need to search the whole pose space to find potential matches before using a finer quantization for more accurate results. The section ends with a mention of the need to discuss different scoring functions.

  • 01:20:00 In this section, the speaker discusses the different computations involved in finding a match when the result must be either precise or fast. They delve into the intricacies of multi-scale operation, which uses different probes and models at different granularities. The probes are not restricted to the pixel grid but are derived from edge points, which gives more reliable results than using brightness contrast. Additionally, the accuracy of this method is limited by the quantization of the search space, a limitation surpassed by another patent covered in future lectures.
  • 2022.06.08
  • www.youtube.com
MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. View the complete course: https://ocw.mit.edu/6-801F20. YouTube Playlist: https://www.youtube.com/p...
 

Lecture 14: Inspection in PatQuick, Hough Transform, Homography, Position Determination, Multi-Scale




In this lecture, the PatQuick algorithm is discussed, with a focus on the use of probes to produce a scoring function in a multi-dimensional space, which determines the pose of an object in runtime images. The matching function used to grade the quality of the match in terms of the direction and magnitude of the gradient is also examined, with different scoring functions discussed for trade-offs between accuracy and speed. The lecture also delves into methods for making pattern matching more efficient, including adjusting the granularity of the computation and addressing the challenge of getting the directions right, especially when performing transformations that change the aspect ratio of an image. The lecture also touches on homography and the Hough transform for detecting lines in photographs.

The lecture covers a range of topics related to computer vision, including the Hough transform, the generalized Hough transform, position determination, multi-scale sub-sampling, and SIFT. The Hough transform is used for line and edge detection, while the generalized Hough transform is a more sophisticated version of it. The lecture also explains how to use the Hough transform to detect circles, for example to locate a cell tower. In addition, the speaker discusses sub-sampling images to decrease the workload without sacrificing quality, and introduces SIFT, a method for finding corresponding points in different images of a scene that is widely used in producing 3D information from multiple pictures. Finally, the speaker briefly discusses the musical scale and ends with a reminder to submit proposals and a quote about not delaying.

  • 00:00:00 In this section, the speaker discusses the PatQuick algorithm and the use of probes to produce a scoring function in a multi-dimensional space. The algorithm looks at a small number of points in the image and can handle a large number of degrees of freedom. The patents discussed are related and are part of a physics-based approach to machine vision. The algorithms described are mostly restricted to situations involving two-dimensional surfaces, such as integrated circuits and printed circuit boards.

  • 00:05:00 In this section, the speaker discusses the training step in the PatQuick technique, where an image is shown to the system and it automatically computes a model. This is a crucial step because it saves resources and time compared with handcrafting code for each visual task. The models are then mapped onto runtime images, and the pose is determined through translation, rotation, scaling, skew, and aspect ratio. The evidence collected for the object is cumulative, and the final result is the sum of the local operations. However, the limitation of this method is the quantization of the pose space, which can affect accuracy.

  • 00:10:00 In this section, the speaker discusses the potentially six-dimensional pose space that arises when dealing with patterns of different sizes and shapes. Translation has two degrees of freedom and rotation has one, while scaling, skew, and aspect ratio each have one degree of freedom, bringing the total to six. Dealing with all six parameters becomes impractical, since quantizing each dimension to a reasonable number of levels, such as 100, results in 100^6 = 10^12 cells. The speaker also explains the matching function used to grade the quality of the match in terms of the direction and magnitude of the gradient, highlighting some disadvantages of the function, including the possibility of matching to background noise.

  • 00:15:00 In this section, the lecturer discusses various scoring functions used in the PatQuick algorithm for trade-offs between accuracy and speed. Different scoring functions have different features such as normalized values, meaningful scores, or just the value being bigger with a better match. The lecturer explains that they discard negative weights and use the direction of the gradient to calculate the score. The focus is on compiled probes and varying translation. The lecture also highlights a second version of the scoring function called s1b, which removes the need for multiplication and only processes probes with positive weights.

  • 00:20:00 In this section, the speaker discusses different functions used for the preferred embodiment in PatQuick. One function takes into account the gradient direction and subtracts a term based on random matching to improve the result. Another function uses the gradient magnitude directly and is not normalized, meaning its absolute value will not be significant. These functions are used in the candidate solution and fine scanning steps in PatQuick. The speaker notes that while the preferred embodiment has different functions, other alternatives are also given for implementation.

  • 00:25:00 In this section of the lecture, the speaker discusses some of the details involved in making the process of pattern matching more efficient. One important consideration is the granularity of the computation, which may be adjusted by decreasing resolution until a satisfactory result is achieved. The speaker also touches on the issue of normalizing, explaining that for some tasks, it is not necessary to normalize because it is a computational issue. Additionally, the speaker addresses the challenge of getting the directions right since the process relies heavily on the gradient direction, especially when performing transformations that change the aspect ratio of an image.

  • 00:30:00 In this section of the lecture, the speaker discusses how to deal with the issue of gradient direction when transforming x and y in ways that do not preserve right angles. The solution is to compute the isophote from the gradient direction, transform it, and construct something at right angles to the isophote. The speaker also touches on the additional topic of inspection, which involves using probes in the model to determine whether a certain area is a reasonable match or not, and calculating a percentage based on how many edges in the runtime image match something in the model.
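
A small sketch of the isophote trick under a linear map A that does not preserve angles (the function and example matrix are illustrative, not from the lecture):

```python
import numpy as np

def transform_gradient(A, g):
    """Transform a gradient direction under a linear map A that does
    not preserve right angles: rotate the gradient 90 degrees to get
    the isophote (edge tangent), map the tangent by A, then rotate
    back to get the transformed gradient direction."""
    g = np.asarray(g, float)
    tangent = np.array([-g[1], g[0]])     # isophote direction
    t2 = A @ tangent                      # tangents transform directly
    g2 = np.array([t2[1], -t2[0]])        # perpendicular to new tangent
    return g2 / np.linalg.norm(g2)

A = np.diag([2.0, 1.0])                   # aspect-ratio change
print(transform_gradient(A, [1.0, 1.0])) # note: not along A @ g
```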

  • 00:35:00 In this section, the lecturer discusses projection of a flat surface in a 3D world using perspective projection and a camera coordinate system. He elaborates on the translation and rotation relationships between the camera and world coordinate systems through an orthonormal matrix. The lecturer then explores the transformation from world object coordinates to image coordinates and notes the nonlinear and messy nature of perspective projection when involving division. However, he focuses on the particular case of planar surfaces and details how the system can be erected in the object, allowing for simpler transformation.

  • 00:40:00 In this section, the speaker talks about using a coordinate system where z is zero, turning the 3D surface into a 2D surface. They demonstrate how one can ignore the third column in this case and conveniently fold in translation to rotations to get a single matrix. They then introduce matrix T, which is not orthonormal as opposed to matrix R. Finally, they discuss the degrees of freedom for translation and rotation in 3D and the different ways to think about rotation.

  • 00:45:00 In this section of the video, the speaker discusses rotation, translation, and constraints in matrices, specifically in the case of perspective projection onto a planar surface. The matrix for transformation has nine independent elements but only six degrees of freedom due to constraints such as orthonormality and orthogonality. Although calibration data can be fit using linear least squares, the constraints must also be enforced, which is often overlooked in published works. These concepts will be important for later discussions on 3D transformations.

  • 00:50:00 In this section of the video, the lecturer discusses the scale-factor ambiguity and the homography, a funny kind of matrix. The homography is used in photogrammetry and applies when confining attention to a plane. The lecturer also talks about the Hough transform and its generalization, which is used when mapping points on a road from camera footage. Finally, the lecturer describes the Wilson cloud chamber and how people studied elementary particles by shooting them into the chamber and taking pictures of the ionized tracks in that space.

  • 00:55:00 In this section, the lecturer discusses the history of automating image analysis, specifically detecting lines or arcs in bubble chamber photographs. The Hough transform was developed to handle the challenge of detecting lines whose evidence was unevenly spaced and varied in size, by mapping points from image space into a parameter space for lines. The lecturer explains the concept of an accumulator array that counts evidence for each possible parameter combination; peaks in the array correspond to lines in the image. Mapping back from parameter space to image space gives a good estimate of the line, even if each piece of evidence is just a bubble.

  • 01:00:00 In this section, the lecturer explains the concept of the Hough transform, a technique to detect simple objects such as lines, circles, or ellipses within an image. The Hough transform works by mapping from image space to a parameter space in which each point represents a line in the original space. The mapping is symmetric in the sense that points in image space correspond to lines in parameter space, and collinear points map to lines that intersect at a unique point. The lecturer uses an example to explain how bubbles in an image give evidence about possible lines: by taking their transforms into parameter space, one can accumulate evidence and find the peaks that correspond to lines.
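
A minimal (rho, theta) accumulator illustrating the idea (grid resolutions and the test line are arbitrary choices, not the lecture's example):

```python
import numpy as np

def hough_lines(points, rho_step=1.0, theta_steps=180, rho_max=200.0):
    """Minimal Hough accumulator: each point votes for every line
    x*cos(theta) + y*sin(theta) = rho passing through it, so collinear
    points pile their votes into one (rho, theta) cell."""
    thetas = np.linspace(0.0, np.pi, theta_steps, endpoint=False)
    acc = np.zeros((int(2 * rho_max / rho_step) + 1, theta_steps), int)
    for x, y in points:
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rho + rho_max) / rho_step).astype(int)
        acc[idx, np.arange(theta_steps)] += 1
    return acc, thetas

pts = [(t, 2.0 * t + 1.0) for t in range(20)]       # points on y = 2x + 1
acc, thetas = hough_lines(pts)
r, c = np.unravel_index(acc.argmax(), acc.shape)
print(acc[r, c], np.degrees(thetas[c]))             # 20 votes in one cell
```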

  • 01:05:00 In this section, the lecturer explains the Hough transform, which is used for line and edge detection in images. The Hough transform creates a space of the possible line parameters, with each point corresponding to a particular line, which helps gather up the evidence even if the line is ratty and distributed at uneven intervals. However, the Hough transform may not be used for edge detection anymore, as there are better methods in place. The lecture also briefly mentions the generalized Hough transform, a more sophisticated version with trade-offs and tricky details that need to be dealt with. Additionally, the lecture talks about circles, and how the Hough transform can be applied to cell phone signals by using the timing advance in the signal.

  • 01:10:00 In this section, the speaker discusses how to use the extension of the Hough transform to solve problems involving circles, such as determining a position from distance measurements. By taking measurements of timing advances and constructing circles of possible positions based on the given radius, it becomes possible to use an accumulator array to gradually accumulate evidence identifying the circle's location. This method can be generalized to a larger parameter space, including cones of varying radii, where each point in the space corresponds to a different circle at a particular position in the plane. The final result should contain many circle intersections, indicating the true location of the cell tower.
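
A sketch of that accumulation for the cell-tower example, with a hypothetical grid and synthetic distance measurements:

```python
import numpy as np

def locate_tower(measurements, grid=200, cell=1.0):
    """Hough-style evidence accumulation: each (x, y, r) measurement
    says the tower lies on a circle of radius r about (x, y); votes
    pile up where the circles intersect.  Illustrative sketch only."""
    acc = np.zeros((grid, grid), int)
    ii, jj = np.mgrid[0:grid, 0:grid]
    for x, y, r in measurements:
        dist = np.hypot(ii * cell - x, jj * cell - y)
        acc[np.abs(dist - r) < cell] += 1     # vote along a thin annulus
    return np.unravel_index(acc.argmax(), acc.shape)

true_tower = (120.0, 80.0)
obs = [(40.0, 40.0), (160.0, 30.0), (100.0, 170.0)]
meas = [(x, y, np.hypot(true_tower[0] - x, true_tower[1] - y)) for x, y in obs]
print(locate_tower(meas))    # ~ (120, 80)
```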

  • 01:15:00 In this section, the lecture discusses the idea of the generalized Hough transform, which involves a parameter space and evidence accumulation to create a score surface; this is useful when detecting features such as edges or textures that may only be apparent at a particular scale or noise level. By working at lower resolutions or reducing the dimensions, we can reduce computation costs and improve the ability to detect features accurately. However, this method can become expensive for higher-dimensional problems and high levels of noise.

  • 01:20:00 In this section, the speaker discusses different methods of sub-sampling images in order to reduce the number of cells and decrease the workload without sacrificing the quality of the image. They explore different values of “r” and how they affect the level of sub-sampling, with “r” equal to one over square root of two being a commonly used value because it reduces the number of cells by two and increases the spacing by square root of two. The speaker also introduces SIFT, a method for finding corresponding points in different images of a scene that is widely used in producing 3D information from multiple pictures. SIFT uses a much less aggressive sub-sampling method, with multiple steps per octave, to create unique descriptors for each point in the image.

  • 01:25:00 In this section, the speaker briefly discusses the musical scale where an octave is divided into eight notes, and mentions that although they are not equally spaced, there are good reasons not to always use a factor of two. The speaker also reminds the audience to submit their proposals and shares a quote from a fortune cookie about not delaying.
  • 2022.06.08
  • www.youtube.com
MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. View the complete course: https://ocw.mit.edu/6-801F20. YouTube Playlist: https://www.youtube.com/p...
 

Lecture 15: Alignment, PatMax, Distance Field, Filtering and Sub-Sampling (US Patent 7065262)




The video discusses several techniques and patents related to pattern recognition and object detection. One such technique is PatMax, which iteratively improves the pose of a runtime image using an attractive force-based system. Another technique involves generating a vector field on a pixel grid to improve runtime image alignment. The lecture also covers the use of distance fields for edge detection and expanding seeded edges by looking at force vectors in the vector field. The speaker also discusses the use of multi-scale pattern matching and the mathematical steps involved in fitting lines to sets of image coordinates. Finally, a patent for efficiently computing multiple scales is introduced.

In Lecture 15, the lecturer covers various techniques and shortcuts for efficient convolution, filtering, and sub-sampling of images. These include approximating filter kernels with piecewise-polynomial splines, treating derivatives as convolutions, compressing images by repeatedly taking the third difference, and combining x- and y-direction convolutions. The speaker also stresses the importance of low-pass filtering before image sampling to avoid interference and aliasing in images.

  • 00:00:00 In this section, the video discusses another patent for finding objects in two-dimensional images, called PatMax. It differs from the previous patent, PatQuick, by assuming that one already has a rough idea of where things are, and instead aims to improve that pose incrementally with an iterative least-squares approach. The patent's motivation is framed in terms of an energy of attraction, inspired by the forces between magnetic dipoles; however, that intuition was all wrong, and a much better analogy is connecting things with springs. The patent is also partially about alignment and references other patents and publications from the old AI lab.

  • 00:05:00 In this section, the video explains the training process of a pattern recognition system using edge detection that produces edge dipoles and creates a two-dimensional vector field. The system then uses an attraction process to iteratively find a good pose for a runtime image assuming a starting pose has already been obtained. The client map is used to map pixel positions that are not on a square grid to a square pixel array, and there are measures like RMS error and inspection evaluations used to determine whether an object is in good shape or not. Finally, the video describes how the field dipole list produces the probes that are used for alignment with the runtime image.

  • 00:10:00 In this section, the lecturer talks about improving alignment using a field generated on the pixel grid. The approach is the reverse of the previous patent's: feature detection is done on the runtime image instead of the model. The purpose of the field is to let discrete results from the runtime image be mapped back into the field, which is cheaper than transforming the whole image, as the previous patent did. The field draws one towards the alignment where objects in the runtime image match objects in the training image. The lecture investigates how the field is generated and highlights the different steps involved in computing it.

  • 00:15:00 In this section, the video discusses the process of initializing and filling in a distance field for edge detection, which is a common technique used in machine vision called the distance map. The initialization involves giving field dipoles a value corresponding to the distance from the edge along with its direction. The process of filling in the rest of the squares near the edge is an iterative process where the value of nearby squares is determined and adjusted according to the computed geometry. The distance field is essentially a groove along each edge that tells how far it is from the edge. The ultimate goal is for each edge to be connected so that the system settles into a lower energy state.

  • 00:20:00 In this section of the lecture, the speaker discusses the process of extending the seeded edges by looking at neighbouring pixels and calculating the force and direction to the edge using a vector field. They explain that sometimes the angles between forces become too large, indicating a corner, and that in such cases, the vectors will no longer point at the original edge pixels. Additional information, such as contrast direction and vector directions, can help in the matching process of extending the edges. The goal is to minimize the energy in the system, similar to modeling with a mechanical system of springs. The speaker notes that with an edge, it is often difficult to say with certainty how well we're matching a particular point on the edge, which will require a more sophisticated model to track.

  • 00:25:00 In this section, the speaker discusses the mechanical analog that represents the algorithm for feature detection using runtime images. The system adjusts itself using a set of forces from the many detected features on the image, and the mechanical springs are stretched outwards and adjusted using a scale transformation. The system then computes clutter and coverage to evaluate how well the runtime image matches the model. The ultimate goal for the system is to reduce energy by moving all the runtime dipoles in a systematic way, and it involves a large least square system with a natural computation method using a set of accumulators.

  • 00:30:00 In this section, the lecturer discusses various aspects of pattern matching, including translation-only and translation-and-rotation cases. The lecturer explains that the tensor used in pattern matching is a multi-dimensional array that accommodates the degrees of freedom in alignment. The lecturer also talks about multi-scale pattern matching, which involves working at low resolution to get a starting pose and then using it for high-resolution pattern matching. The pattern matching method can be applied to a range of practical imaging devices, from TV cameras to electron microscopes. Finally, the lecturer discusses the claims made in the patent, noting that claim one is very broad and likely to be challenged by prior art, while the dependent claims provide more specific details.

  • 00:35:00 In this section of the lecture, the speaker discusses the patent's alignment process, which depends on multiple components, including low-resolution error values and initial guesses. Unlike PatQuick, which searches the complete pose space at a low resolution without needing a first guess, PatMax requires a first guess and has a capture range; its pose space is also organized the other way around for computational reasons. The alignment process avoids thresholding and quantization at the pixel level, focusing on sub-pixel accuracy instead. The speaker also touches on a physical analog involving mechanical springs.

  • 00:40:00 In this section, the speaker discusses the process of object inspection and how it involves matching and determining the transformation between trained and runtime images. The inspection is based on missing and extra features in the runtime image compared to the trained image, and clutter in the image due to the background texture. The generation of the distance field is also explained, with a focus on how it changes when there are edges and corners present in the image. The process of computing the distance transform is discussed, including the challenges of working in a discrete world and the ways of approximating the euclidean distance in a fast and efficient manner.
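
One standard fast approximation of the Euclidean distance transform is the two-pass chamfer method sketched below; it is a generic illustration of the idea, not the patent's exact field computation:

```python
import numpy as np

def chamfer_distance(edges, a=1.0, b=1.4):
    """Two-pass chamfer approximation to the distance-from-edge field:
    a forward raster sweep then a backward sweep, with costs a for
    edge-adjacent steps and b (~sqrt(2)) for diagonal steps."""
    INF = 1e9
    h, w = edges.shape
    d = np.where(edges, 0.0, INF)
    for i in range(h):                     # forward pass
        for j in range(w):
            if i > 0:
                d[i, j] = min(d[i, j], d[i-1, j] + a)
                if j > 0:
                    d[i, j] = min(d[i, j], d[i-1, j-1] + b)
                if j < w - 1:
                    d[i, j] = min(d[i, j], d[i-1, j+1] + b)
            if j > 0:
                d[i, j] = min(d[i, j], d[i, j-1] + a)
    for i in range(h - 1, -1, -1):         # backward pass
        for j in range(w - 1, -1, -1):
            if i < h - 1:
                d[i, j] = min(d[i, j], d[i+1, j] + a)
                if j > 0:
                    d[i, j] = min(d[i, j], d[i+1, j-1] + b)
                if j < w - 1:
                    d[i, j] = min(d[i, j], d[i+1, j+1] + b)
            if j < w - 1:
                d[i, j] = min(d[i, j], d[i, j+1] + a)
    return d

edges = np.zeros((7, 7), bool)
edges[3, :] = True                          # a horizontal edge
print(chamfer_distance(edges)[:, 0])        # 3, 2, 1, 0, 1, 2, 3
```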

  • 00:45:00 In this section of the lecture, the concept of adding up local forces to provide translation or rotation alignment is discussed. The weights can be predefined or depend on gradient magnitude or field dipole, among other variations. Torque around a center is used to provide rotation, and taking the z-component of the cross product of two vectors in a plane can be used to provide a scalar for the torque. The lecture then describes distance to a line and explains the rotation into a coordinate system that is aligned with a line for calculating the x and y primes.

  • 00:50:00 In this section, the speaker discusses the use of two parameters rho and theta in parameterizing the family of lines in the plane, which is a two-parameter family. This parameterization is useful in line fitting, where the objective is to find a line that fits the edge points with high accuracy. The speaker explains how to use calculus to minimize the distance squared and shows how to relate x bar and y bar, the average centroids of the points on the line, to rho and theta. Additionally, the lecture touches on moving coordinates to the centroid and finding strong relationships between theta and rho to determine the line's parameters.

  • 00:55:00 In this section, the lecturer explains the mathematical steps to finding the least square solution for fitting a line to a set of image coordinates using the Hesse normal form equation. By taking the derivative with respect to theta and setting it to zero, a solution involving sine and cosine of twice the angle is obtained, which can be simplified using trigonometric identities. This method is preferred over fitting y equals mx plus c, as it is independent of coordinate system choice and can be used for combining short edge fragments into longer edge fragments. The lecturer then introduces a patent for efficiently computing multiple scales by avoiding expensive convolution.
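
Putting the derivation together, here is a sketch of the perpendicular least-squares line fit in Hesse normal form (the tie between the two stationary angles is resolved by evaluating both; function names are illustrative):

```python
import math

def fit_line(points):
    """Least-squares line x*cos(t) + y*sin(t) = rho, minimizing
    perpendicular distances.  Setting the derivative with respect to
    theta to zero gives tan(2t) = 2*Sxy / (Sxx - Syy) about the
    centroid; the two roots are a minimum and a maximum of the error,
    so both are checked."""
    n = len(points)
    xb = sum(x for x, _ in points) / n
    yb = sum(y for _, y in points) / n
    Sxx = sum((x - xb) ** 2 for x, _ in points)
    Syy = sum((y - yb) ** 2 for _, y in points)
    Sxy = sum((x - xb) * (y - yb) for x, y in points)

    def err(t):   # sum of squared perpendicular distances
        return (Sxx * math.cos(t) ** 2 + 2 * Sxy * math.sin(t) * math.cos(t)
                + Syy * math.sin(t) ** 2)

    t = 0.5 * math.atan2(2 * Sxy, Sxx - Syy)
    theta = min((t, t + math.pi / 2), key=err)
    rho = xb * math.cos(theta) + yb * math.sin(theta)
    return theta, rho

theta, rho = fit_line([(x, 2.0 * x + 1.0) for x in range(10)])
print(math.degrees(theta), rho)   # ~153.4 degrees, rho ~0.447 for y = 2x + 1
```

Because the fit is expressed in (rho, theta) rather than slope and intercept, it behaves the same for vertical, horizontal, and oblique edge fragments, which is why it suits combining short fragments into longer ones.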

  • 01:00:00 In this section, the lecturer talks about efficient ways of computing filters for multi-scale purposes. The trick is to approximate a kernel with a piecewise-polynomial spline and take the (n+1)-th difference, which is zero almost everywhere, resulting in a sparse kernel with small support that is cheap to convolve with. The lecture also covers the (n+1)-th sum, which is the inverse of the (n+1)-th difference, and the properties of convolutions and differentiation. Overall, the lecture provides shortcuts and tricks that make convolving large images with large kernels easier and more efficient.
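
A sketch of the trick for a degree-1 (triangular) kernel, where the second difference has only three non-zero entries; the signal and kernel here are synthetic stand-ins:

```python
import numpy as np

# Sparse convolution via the (n+1)-th difference of a piecewise-polynomial
# kernel: convolve with the sparse difference, then undo it with repeated
# running sums.
f = np.random.default_rng(0).standard_normal(64)
k = np.array([1.0, 2.0, 3.0, 2.0, 1.0])           # piecewise linear

d = np.array([1.0, -1.0])                         # difference operator
k2 = np.convolve(np.convolve(k, d), d)            # [1, 0, 0, -2, 0, 0, 1]

slow = np.convolve(f, k)                          # direct convolution
fast = np.cumsum(np.cumsum(np.convolve(f, k2)))[:len(slow)]
print(np.allclose(slow, fast))                    # True
```

Only three multiplications per output sample are needed instead of five, and the gap widens rapidly for longer kernels, since the sparse support does not grow with the kernel's width.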

  • 01:05:00 In this section, the lecturer discusses the properties and benefits of convolution, specifically how derivatives can be treated as convolutions if distributions instead of functions are allowed. This permits the use of convolution properties such as commutativity and associativity, which can be very powerful in signal processing. The lecturer also describes an example of using convolution to make a pattern sparse and cheap to convolve with, which involves computing derivatives and finding the places where there are non-zero values. Only two values need to be convolved with, which is a significant advantage.

  • 01:10:00 In this section, the lecturer explains the technique of taking the third difference of an image in order to compress it. By repeatedly taking the third difference, a small and sparse set of values is produced, reducing the computation compared to using the full original image. This can be used to control the bandwidth and scale of the filter without altering the amount of computation required. The lecturer demonstrates this technique using a one-dimensional function and then shows an example with a parabola where the ends are more complicated due to a discontinuity.

  • 01:15:00 In this section of the lecture, different filtering techniques are discussed to improve the efficiency of computations when sub-sampling images while avoiding aliasing artifacts. The use of a spline to approximate filters such as the Gaussian and sinc functions is explored, with a focus on reducing computation time and the number of non-zero values. Additionally, a technique of combining convolution operations in the x and y directions is presented, which requires less intermediate memory and allows for a more efficient cascade of 1D convolutions. The relevance of these topics for edge detection and multi-scale image processing is highlighted.
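
A small demonstration of the cascade of 1D convolutions for a separable kernel (the binomial filter here is a stand-in for whatever spline approximation is used):

```python
import numpy as np
from scipy.signal import convolve2d

# A 2D kernel that factors as an outer product costs two cheap 1D passes
# instead of one expensive 2D convolution.
g = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # 1D binomial (Gaussian-like)

img = np.random.default_rng(1).random((32, 32))

full_2d = convolve2d(img, np.outer(g, g), mode='same')
cascade = convolve2d(img, g[None, :], mode='same')       # x direction
cascade = convolve2d(cascade, g[:, None], mode='same')   # then y direction
print(np.allclose(full_2d, cascade))              # True
```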

  • 01:20:00 In this section, the speaker discusses a calcite crystal that is birefringent and has two refractive indices depending on polarization, which causes two copies of an image to appear very close together. This is used in cameras to suppress higher frequency content and improve sampling. However, removing this filter can lead to interference and aliasing in images, as well as changes in color and shape of objects being filmed. The speaker notes that improvements in low-pass filtering before image sampling have reduced these issues, but it is still important to consider the effects of aliasing in imaging.
  • 2022.06.08
  • www.youtube.com
MIT 6.801 Machine Vision, Fall 2020. Instructor: Berthold Horn. View the complete course: https://ocw.mit.edu/6-801F20. YouTube Playlist: https://www.youtube.com/p...
 

Lecture 16: Fast Convolution, Low Pass Filter Approximations, Integral Images (US Patent 6457032)




The lecture covers various topics related to signal processing, including band-limiting, aliasing, low-pass filter approximations, blurring, the integral image, Fourier analysis, and convolution. The speaker emphasizes the importance of low-pass filtering the signals before sampling to avoid aliasing artifacts. The lecture also introduces the idea of the integral image, which efficiently computes the sum of pixels within a block, and various techniques to reduce computation when approximating low-pass filters. Lastly, the lecture discusses bicubic interpolation, which is used to approximate the sinc function, and its computational costs.

In this lecture, the speaker discusses various topics related to convolution, low-pass filter approximations, and integral images. They explain different implementations of convolution, including a method that saves computing time by adding values from left to right and subtracting to get the average. The limitations of linear interpolation for low-pass filter approximations and its inferiority compared to more advanced methods like cubic interpolation are also discussed. The concept of a pillbox and its value in limiting frequency ranges is introduced, and the speaker talks about the ideal low-pass filter and how defocusing affects the Bessel function. The lecture also touches on the use of low-pass filter approximations for DSLR camera lenses and the concept of photogrammetry.

  • 00:00:00 In this section, the speaker discusses sampling waveforms and the importance of band limiting them. When sampling a waveform, it's surprising that we can capture something about it, given that the waveform has infinite support and we only get discrete samples. However, if the frequency content is limited, the Nyquist theorem states that we can completely reconstruct it by sampling at a high enough frequency. The criterion is that we sample fast enough, so that the highest frequency component of the signal is less than fs over two. Ultimately, band limiting is significant because it allows us to capture the essence of a waveform without getting aliasing artifacts.

  • 00:05:00 In this section, the concept of aliasing in signal processing is explained. Aliasing occurs when frequency content above a certain threshold is sampled and indistinguishable from lower frequency content. This cannot be fixed after sampling, so it must be done beforehand by suppressing higher frequency content. To do so, it is important to low-pass filter the signal before sampling. However, true low-pass filtering is difficult to achieve, so approximations must be made.

  • 00:10:00 In this section of the lecture, the speaker discusses the concept of blurring through methods such as pre-sampling filtering and introduces the idea of the integral image. He explains that a boxcar filter can be used to perform block averaging, where the sum of the pixels within a block is calculated, but this method can be computationally expensive. To address this, an integral image can be used in both the 1D and 2D cases to compute the sum more efficiently. The integral image is not restricted to images; it works for other arrays as well, such as gradient data.

  • 00:15:00 In this section, the lecturer explains how to compute the total of a rectangle using the integral image. With four memory accesses and three arithmetic operations, we can get the total for any block, independent of its size. This technique can be used for recognition and block averaging. The lecturer also talks about Fourier analysis and how to average a block using a moving average.
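
A minimal sketch of this rectangle-sum trick (helper names are mine): the integral image is built once with cumulative sums, after which any block total costs four memory accesses and three additions or subtractions, regardless of block size.

```python
import numpy as np

def integral_image(img):
    """Cumulative sums over rows and columns, padded with a zero
    row and column so that S[i, j] = sum of img[:i, :j]."""
    s = img.cumsum(axis=0).cumsum(axis=1)
    return np.pad(s, ((1, 0), (1, 0)))

def block_sum(S, top, left, h, w):
    """Sum over an h x w block: four accesses, three operations,
    independent of the block size."""
    return (S[top + h, left + w] - S[top, left + w]
            - S[top + h, left] + S[top, left])

img = np.arange(25.0).reshape(5, 5)
S = integral_image(img)
assert block_sum(S, 1, 1, 3, 2) == img[1:4, 1:3].sum()
```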

  • 00:20:00 In this section of the lecture, the speaker discusses the drawbacks of block averaging as a low-pass filter approximation. Its frequency response is a sinc function, which does not attenuate high frequencies aggressively enough and does not reach its first zero quickly enough, making block averaging a poor low-pass filter on its own. This discussion is particularly relevant to cameras, which perform a filtering operation before sampling. Block averaging is nonetheless cheap to compute and can be performed twice in the hope of getting a better approximation of a low-pass filter.

  • 00:25:00 In this section, the lecturer discusses the properties of filters in the transform domain and how they relate to step discontinuities in images. The lecturer explains that the transform of a step function drops off as one over frequency, which means that images with step discontinuities will produce high-frequency content that does not drop off quickly. The lecturer notes that this is a problem with the discrete Fourier transform because it assumes that data is periodic, so it introduces step edge discontinuities as the data wraps around. To deal with this, the lecturer suggests apodizing, which involves multiplying the image by a waveform to make the ends match up. One common apodizing filter is an inverted cosine waveform.
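
One standard inverted-cosine apodizing window is the Hann window; the sketch below (the window length and signal are illustrative choices of mine) tapers a signal to zero at both ends before transforming, so the periodic extension implicitly assumed by the DFT has no step edge.

```python
import numpy as np

n = np.arange(256)
# Inverted-cosine (Hann) apodizing window: tapers both ends to zero.
hann = 0.5 * (1 - np.cos(2 * np.pi * n / (len(n) - 1)))

signal = np.random.rand(256)
spectrum = np.fft.rfft(signal * hann)   # apodize before transforming
```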

  • 00:30:00 In this section, the video covers different approaches to dealing with the DFT applied to images. One is to assume that the region outside the image repeats periodically or is a mirror image, though this isn't a perfect solution because mirroring still leaves a derivative discontinuity. Another approach discussed is low-pass filtering with an approximate filter. The video then touches on certain properties needed for approximate low-pass filtering, such as the sifting property of the unit impulse and distributions.

  • 00:35:00 In this section of the lecture, the speaker discusses the unit impulse and its relationship to convolution. While defining the unit impulse as the limit of a sequence of functions is not mathematically rigorous, the effect of convolution with the unit impulse can be determined by computing the convolution and taking the limit as epsilon tends to zero. The speaker notes that convolution can be connected to derivatives, and that linear shift-invariant operators and derivative operators are closely related. They explain that derivatives can essentially be treated as convolutions, with one of the two functions being flipped.

  • 00:40:00 In this section, the lecturer discusses low-pass filter approximations and how they can improve on the pixel averaging performed by cameras. He explains that additional low-pass filtering needs to be done before sampling, in the analog domain, and suggests using birefringent materials to create a special filter. This filter produces two slightly shifted copies of the original image, modeled as convolution with a pair of impulses. Analyzed with the Fourier transform, the filter's response does not decay with frequency, but it has a zero at a frequency determined by the shift epsilon, allowing an appropriate epsilon to be selected.

  • 00:45:00 In this section, the lecturer discusses the concept of low-pass filters and introduces a technique to cut high frequencies using a plate that is thicker than the pixel spacing. This plate cuts high frequencies but leaves other frequencies uncut. The lecturer explains that using this extremely simple anti-aliasing filter alongside the block averaging filter can reduce moiré effects caused by high-frequency content in images. The lecturer then introduces the idea of the patent and the integral image, which aims to cut down computation for good low-pass filtering while minimizing the size of support. The lecturer demonstrates how to represent integration using convolution and provides the Fourier transform of the unit impulse.

  • 00:50:00 In this section, the video focuses on the concept of convolutions and their relation to differentiation and integration in the Fourier transform domain. It is explained that a second derivative can be obtained by convolving two first-derivative operators. This concept is applied to filtering: a filter can be split into stages to reduce computation if its derivative is sparse, which happens for piecewise-constant functions or polynomial approximations. By integrating or summing the result of convolving with the sparse derivative, the desired answer can be obtained efficiently with far fewer computations.

  • 00:55:00 In this section, the lecturer discusses approximating the sinc function, which is the ideal low-pass filter but has infinite support, making it impossible to represent fully. The lecture introduces bicubic interpolation for 2D images, needed when pixels are rotated onto positions between grid points. The interpolating curve is built from four piecewise-cubic segments, and because the fourth derivative of a piecewise cubic is a sparse set of impulses, it can be used for fast filtering. The result is far better than nearest-neighbor or linear interpolation, while a full approximation of the sinc function remains impractical because of its computational cost.
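
A widely used cubic of this kind is Keys' cubic convolution kernel, a four-tap approximation to the sinc interpolator; the sketch below (with the conventional parameter a = -0.5, and helper names of my own) shows the kernel and a 1D interpolation step. It is offered as a standard example of cubic interpolation, not necessarily the exact cubic used in the lecture.

```python
import numpy as np

def cubic_kernel(x, a=-0.5):
    """Keys' cubic convolution kernel, a common four-tap
    approximation to the sinc interpolator (a = -0.5 is typical)."""
    x = np.abs(x)
    out = np.zeros_like(x, dtype=float)
    near = x <= 1
    far = (x > 1) & (x < 2)
    out[near] = (a + 2) * x[near]**3 - (a + 3) * x[near]**2 + 1
    out[far] = a * (x[far]**3 - 5 * x[far]**2 + 8 * x[far] - 4)
    return out

def interp_cubic(samples, t):
    """Interpolate at fractional position t using the four
    neighboring samples (assumes 1 <= t <= len(samples) - 3)."""
    i = int(np.floor(t))
    idx = np.arange(i - 1, i + 3)
    return samples[idx] @ cubic_kernel(idx - t)
```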

  • 01:00:00 In this section, a block averaging example is used to illustrate a naive implementation of convolution: shifting a block along and adding up whatever is underneath it. Another implementation is shown to save significantly on computing time for larger blocks, by adding the sample entering the window and subtracting the one leaving it. Linear interpolation is also discussed, which can be thought of in terms of convolution: it creates a function that connects the points on a discrete grid with straight lines.
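
A minimal sketch contrasting the two implementations described above (function names are mine): the naive version re-sums the window at every position, while the running-sum version adds the entering sample and subtracts the leaving one.

```python
import numpy as np

def box_filter_naive(x, w):
    """O(n*w): re-sum the window at every position."""
    return np.array([x[i:i + w].sum() for i in range(len(x) - w + 1)]) / w

def box_filter_running(x, w):
    """O(n): slide the window by adding the entering sample and
    subtracting the leaving one."""
    out = np.empty(len(x) - w + 1)
    s = x[:w].sum()
    out[0] = s
    for i in range(1, len(out)):
        s += x[i + w - 1] - x[i - 1]
        out[i] = s
    return out / w

x = np.random.rand(1000)
assert np.allclose(box_filter_naive(x, 7), box_filter_running(x, 7))
```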

  • 01:05:00 In this section, the speaker discusses the linear interpolation method for low-pass filter approximations and its limitations, particularly its effect on noise and on image measurements. He explains that linear interpolation corresponds to convolution with the triangle function obtained by convolving two boxcars, a crude stand-in for the sinc function. This method is inferior to more advanced methods such as cubic interpolation, and the nearest neighbor approximation, a piecewise-constant function, is even less precise than the linear one.

  • 01:10:00 In this section of the lecture, the speaker discusses the concept of low pass filter approximations and integral images in the context of convolution. They explain how nearest neighbor interpolation corresponds to convolution with a boxcar and the benefits of using a rotationally symmetric coordinate system for natural images. They then introduce the concept of a pillbox and its value in limiting frequency ranges. The inverse transform of a pillbox is shown to be rotationally symmetric as well, varying according to the Bessel function, which is commonly used in optics.

  • 01:15:00 In this section, the lecturer discusses the in-focus point spread function, the system's response to an impulse. The first zero of this function, which differs from that of the sinc function, sets resolution via the Airy-pattern (Rayleigh) criterion. When the system is out of focus, the lecturer shows that the point spread function becomes a pillbox, whose counterpart in the spatial frequency domain is a Bessel function. He concludes that defocusing changes the imaging behavior by changing where the zeros of this Bessel function fall.

  • 01:20:00 In this section of the lecture, the speaker discusses low-pass filter approximations and the resulting attenuation of high-frequency content, noting that the zeros of the filter response kill some frequencies completely. The speaker also talks about how the step size of a DSLR camera's filtering can be read off from the frequency domain, as well as why two perspective projections in sequence are not the same as a single perspective projection. Finally, the concept of taking slightly out-of-focus images and convolving them, as a way of determining whether an image has been modified, is introduced.

  • 01:25:00 In this section, the lecturer discusses the concept of convolution and how it relates to multiplication in the frequency domain. They explain how using a pillbox function allows for convolving an image, but caution that multiplying defocused pictures will not yield accurate results. The lecture then transitions into the topic of photogrammetry, which uses images to create 3D information about objects and their location by matching up features such as edges between images to pinpoint the camera's location.

Lecture 17: Photogrammetry, Orientation, Axes of Inertia, Symmetry, Orientation

This lecture covers various topics related to photogrammetry, including depth cues, camera calibration, and establishing the transformation between two coordinate systems. The speaker explains how to approach the problem of finding the coordinate transformation between two systems using corresponding measurements and highlights the importance of checking for the exact inverse of the transformation. The lecture also discusses finding the axes of inertia in 2D and 3D space and determining the distance between two points projected onto an axis. Overall, the section provides a comprehensive overview of photogrammetry and its applications.

Photogrammetry requires building a coordinate system on a point cloud in left-hand and right-hand coordinate systems and relating the two. The lecturer explains how to determine the inertia matrix or the axes of inertia and establish the basis vectors. They also discuss the challenges posed by symmetrical objects and the properties of rotation, such as the preservation of dot products, lengths, and angles. Additionally, the lecture covers how to simplify the problem of finding rotation by eliminating translation and minimizing the error term. Finally, the lecturer explains how to align two objects with similar shapes using vector calculus and suggests exploring other representations for rotation.

  • 00:00:00 In this section, the speaker introduces photogrammetry, which involves using images to measure and reconstruct three-dimensional surfaces. The field has its roots in map-making and was popularized after the invention of photography. The speaker discusses four classic problems from photogrammetry, including finding the relationship between two disparate coordinate systems, as well as finding the relationship between a single coordinate system and objects that may move or change. The speaker notes that while machine vision is often more concerned with the second problem, which involves recovering the third dimension from two-dimensional images, it may be advantageous to tackle the 3D problem first due to its closed-form solution.

  • 00:05:00 In this section, the lecturer explains the two types of applications for photogrammetry: 2D to 3D and 3D to 2D. The former involves recovering three-dimensional information from images and determining the relationship between two cameras in space to align them. The latter involves camera calibration, which is necessary for precise measurements using cameras, and creating topographic maps through capturing regular image intervals from a plane. The lecturer also discusses several depth cues, including binocular stereo, which is the ability to perceive depth through two eyes.

  • 00:10:00 In this section, the lecturer explains how two cameras can be used to establish depth cues using similar triangles. By imaging an object in both cameras and comparing the resulting images, the difference between the positions can be used to calculate the depth of the object. The lecture also notes that disparities in the image can be used to calculate depth as the distance is inversely proportional to the disparity. Finally, the section touches on the topic of sensitivity to error and how large errors could result from small discrepancies in measuring the disparity.

  • 00:15:00 In this section of the video, the lecturer discusses photogrammetry and the measurement of 3D positions using two cameras. They explain that increasing the baseline or focal length can improve measurement accuracy, but there are constraints on these quantities, such as ensuring the cameras are not too far apart. They also mention the challenge of calibrating the cameras if they are not perfectly aligned in a specific geometry. The lecturer then moves on to the topic of absolute orientations and how to compensate for the orientation of devices such as lidars or aerial cameras, which may not maintain a constant attitude. Lastly, they note that the discussion assumes the presence of interesting points in the images, leaving aside the matching problem.

  • 00:20:00 In this section, the lecturer explains how to find the rotation and translation of two coordinate systems in order to project rays in 3D and find the point of intersection between them. He uses the example of points measured in both a left and right coordinate system, noting that this could apply to any two coordinate systems regardless of their labels. The lecturer highlights the need for six numbers to fully specify the transformation, three for rotation and three for translation, and explains that there are three degrees of freedom for each. He writes the transformation formula, emphasizing that the rotation does not have to be represented as an orthonormal matrix.

  • 00:25:00 The lecture discusses the properties of rotation and the orthonormal matrix, which is essential in understanding how to compute the rotation and translation of objects. The lecture also talks about how enforcing the orthonormality constraint eliminates reflections and how the inverse of a rotation matrix can be easily obtained. A physical model is also presented for better visualization of how the points from the left and right coordinate systems can be superimposed and lined up.

  • 00:30:00 In this section, the speaker discusses how to approach the problem of finding the coordinate transformation between two systems using corresponding measurements. This problem can be approached in a least squares way, where the objective is to minimize the distance between the transformed vector in the left coordinate system and the right coordinate system. This can be thought of as an energy minimizing problem, where the system tries to adjust itself to minimize energy. The speaker emphasizes the importance of checking that the transformation from the right system to the left is the exact inverse of the transformation from the left system to the right. Separating the translation and rotation problems simplifies the problem to only three degrees of freedom at a time.

  • 00:35:00 In this section, the speaker explains how to construct a coordinate system using measurements of points on an object. The first step is to pick a point as the origin and connect it to a second point to create one axis. The separation between the first two points is normalized to create the x-axis, and a third point is used to define the x-y plane. The y-axis is created by removing the component of the vector from the first point to the third point that's in the x-axis direction, and making the resulting vector perpendicular to the original. The z-axis is defined as the cross product of x and y, as it is perpendicular to both vectors. This process allows for the creation of a coordinate system and the measurement of points in both coordinate systems for an object.
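
A minimal sketch of this triad construction (function name is mine): given three measured points, it returns an orthonormal basis; building the same basis from the corresponding points in the other system then gives the rotation between the two.

```python
import numpy as np

def triad(p1, p2, p3):
    """Orthonormal coordinate system from three points: p1 is the
    origin, p1->p2 fixes x, and p3 fixes the x-y plane."""
    x = p2 - p1
    x = x / np.linalg.norm(x)
    v = p3 - p1
    y = v - (v @ x) * x            # remove the x component
    y = y / np.linalg.norm(y)
    z = np.cross(x, y)             # perpendicular to both
    return np.column_stack([x, y, z])   # columns are the basis vectors

# Bases built from the same three points measured in two systems give
# the rotation between them as R = triad_right @ triad_left.T.
```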

  • 00:40:00 In this section, the speaker explains how to build a coordinate system and solve for rotation. To do this, they use a triad of unit vectors to define a coordinate system for the left and for the right. Then they take both point clouds, build axes, and map the unit vectors onto each other to find a transformation that puts them together. They use a 3x3 matrix to combine the separate equations and solve for rotation. They mention that, with translation removed, only three degrees of freedom remain to be found.

  • 00:45:00 In this section, the speaker discusses the constraints involved in mapping points between coordinate systems in photogrammetry. Three correspondences might seem barely enough for a solution, but each correspondence is a vector equality worth three scalar constraints, so three correspondences give nine constraints. Rotation has only three degrees of freedom, leading to excess information. The speaker then discusses an ad hoc solution involving selectively picking points for the transformation, which is imprecise. A better solution uses singular value decomposition (SVD) to find the optimal transformation matrix, evenly weighting the information from all the correspondences.

  • 00:50:00 In this section, the lecturer discusses the concept of finding the axes of inertia in 2D and 3D space. He explains that the axes of minimum inertia can be found by calculating the integral of the distance squared times the mass, whereas the perpendicular axis has a maximum inertia, and in 3D, there is a third axis that is a saddle point. He states that if these axes are identified, a coordinate system can be established for the object in question. The formula for finding the distance from the axis to the origin is also discussed, along with picking the centroid as the origin to separate the problem of finding the translation from the problem of finding the rotation.

  • 00:55:00 In this section, the speaker explains how to determine the distance between two points, r and r prime, projected onto an axis omega. The formula for inertia is derived from this distance and is shown to vary as the axis changes direction. The speaker then simplifies the formula using dot products, the associativity of multiplication, and the identity matrix. The resulting inertia matrix involves the dot product of r with itself times the identity matrix, minus the dyadic product of r with itself, integrated over the volume of the object.

  • 01:00:00 In this section, the lecturer explains how to build a coordinate system on a point cloud in the left-hand and right-hand coordinate systems, and then relate the two. This is done by computing the inertia matrix, whose axes of inertia come from a simple eigenvalue-eigenvector problem for a three by three matrix. Three mutually perpendicular axes are found, the maximum, minimum, and saddle axes, and these are used to establish the basis vectors; the same is done in the right-hand coordinate system. Unlike the three-point triad construction, this method treats all points equally, in the spirit of least squares.
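
A minimal sketch of the inertia-axes computation (names are mine, using the point-cloud version of the integral): the eigenvectors of the inertia matrix, taken about the centroid, give the minimum, saddle, and maximum axes.

```python
import numpy as np

def inertia_axes(points):
    """Axes of inertia of a point cloud about its centroid.

    I = sum over points of (r . r) * Identity - r r^T; the
    eigenvectors give the minimum, saddle, and maximum inertia axes.
    """
    r = points - points.mean(axis=0)
    I = (r * r).sum() * np.eye(3) - r.T @ r
    eigvals, eigvecs = np.linalg.eigh(I)    # ascending eigenvalues
    return eigvals, eigvecs                 # columns are the axes

pts = np.random.rand(100, 3)
vals, axes = inertia_axes(pts)
```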

  • 01:05:00 In this section of the lecture, the speaker discusses the limitations of ad hoc methods in photogrammetry when dealing with symmetrical objects. The speaker explains that some objects, such as a sphere, tetrahedron, and octahedron, have the same inertia in all directions, making it difficult to determine their orientation using an ad hoc method that relies on elongation. Additionally, the speaker notes that using correspondences to determine orientation is a more accurate but challenging approach since it requires knowing the alignment of each point. The speaker also explains the properties of rotation, including the preservation of dot products, lengths, and angles.

  • 01:10:00 In this section, the professor discusses the triple product of vectors, which is the volume of a parallelepiped formed by those vectors. If those vectors are rotated, then their volume will be preserved if the rotation is not a reflection. A reflection would change the sign of the triple product and, therefore, the volume, resulting in a left-hand rule instead of a right-hand rule. This principle is important when setting up a least squares problem to find the transformation between two coordinate systems, where the offset and rotation need to be chosen to minimize the error between the two systems.

  • 01:15:00 In this section, the lecturer explains how to simplify the problem of finding the translation from finding the rotation. They do this by moving the coordinates to the centroid and subtracting them from the original coordinates to get rid of the translation, making the rotation problem much easier to solve. The lecturer then plugs in the new coordinates to the error formula and groups the terms, eventually arriving at a simpler problem to work with. The lecture ends with a question of what offset to choose for the translation.

  • 01:20:00 In this section, the lecture focuses on separating the problem of finding translation from the problem of finding rotation. The formula for the translation is the difference between where the centroid is in the right coordinate system and where the left coordinate system's centroid lands after rotation. The next objective is to minimize the remaining error term, which involves finding the correct rotation. By maximizing the remaining term that depends on the rotation, the lecture aims to find the correct rotation, which makes intuitive sense when imagining a cloud of points connected to the centroid with a spiky, sea-urchin-like appearance.

  • 01:25:00 In this section, the lecturer explains how to align two objects that have a similar shape using vector calculus. By taking corresponding spines of the objects and using the dot product between them to determine the angle, the objects can be aligned. However, this poses the problem of how to solve the rotation problem using calculus without having to deal with matrices complicated by added constraints. The lecturer suggests looking at other representations for rotation that make the alignment problem easier.

Lecture 18: Rotation and How to Represent It, Unit Quaternions, the Space of Rotations

This lecture discusses the challenges of representing rotations and introduces the usefulness of Hamilton's quaternions. Unit quaternions are particularly useful as they directly map onto rotations in three space, allowing for a discussion of a space of rotation and optimization in that space. Quaternions have properties similar to complex numbers and are particularly useful for representing rotations as they preserve dot products, triple products, length, angles, and handedness. The lecture also discusses different methods of representing rotation, the importance of being able to rotate vectors and compose rotations, and the limitations of conventional methods such as matrices, Euler angles, and gimbal lock. Finally, the lecture presents ongoing research in the field, including optimizing and fitting rotations to models, and developing new methods for analyzing and visualizing rotation spaces.

In this lecture, the professor discusses the problem of finding the coordinate transformation between two coordinate systems or the best fit rotation and translation between two objects with corresponding points measured in the two coordinate systems. The lecture explores the use of quaternions to align spacecraft cameras with catalog directions and solve the problem of relative orientation. The efficiency of quaternions in representing rotations is discussed, as well as different methods for approaching the representation of rotations in four-dimensional space. Additionally, the lecture explores various rotation groups for different polyhedra, emphasizing the importance of selecting the correct coordinate system for achieving a regular space sampling.

  • 00:00:00 In this section, the speaker discusses the challenges of dealing with rotations, as they are not commutative like translations. The goal is to develop a useful and general method to deal with rotations in photogrammetry and robotics. Hamilton's quaternions provide a more general way to represent rotations, particularly when restricted to unit quaternions, which can map directly onto rotations in three space. This allows for the discussion of a space of rotation and optimization in that space. The applications are vast, from robotics to biomedical science, and the speaker aims to develop a closed-form solution for problems involving the measurement of two objects in different coordinate systems or one object that moved.

  • 00:05:00 In this section, the topic of rotation is introduced and explained. Euler's theorem states that any rotation of a rigid object leaves some line unchanged: the axis. A rotation about an arbitrary axis is equivalent to a rotation about a parallel axis through the origin, plus a translation. To simplify things, it is convenient to separate translation and rotation. Rotational velocity is much easier to handle than finite rotation, since angular velocity requires only an axis vector and a rate. Finally, finite rotations don't commute, and in three dimensions a rotation has three degrees of freedom.

  • 00:10:00 In this section, the lecturer explains that it is best to think of rotations as preserving certain planes. For instance, the xy plane can be preserved while the things in it are moved to a different location. The lecturer also notes that cross products have three degrees of freedom and are represented as vectors because they are perpendicular to the two vectors being multiplied. Representations for rotation exist, and one useful method is the axis and angle notation where the axis is a unit vector and the number of degrees turned is represented by an angle. The Gibbs vector is another notation that combines the axis and angle into a single vector, although it is no longer a unit vector and blows up at theta equals pi.

  • 00:15:00 In this section, the lecturer explains the various ways to represent rotation, including Euler angles, orthonormal matrices, exponential form, stereography, and complex matrices. Each method has its own constraints, and there are 24 different definitions for Euler angles, making it confusing. However, unit quaternions are the most popular and useful method for representing rotations because they have many advantages, such as being compact, easy to interpolate, and not affected by Gimbal lock. It's also essential to be able to convert between different rotation representations.

  • 00:20:00 In this section, the speaker discusses the problem of rotating a vector and finding its position in a rotated coordinate system, as well as composing rotations. The speaker introduces Rodrigues' formula, which addresses the first problem by taking a vector and rotating it through an angle about a given axis. By breaking the problem down into a 2D one, the speaker shows how the rotation formula is simple in the plane but more complex in 3D. The speaker explains that axis-angle notation is useful for visualizing rotations, but composition is difficult to carry out in it.
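
Rodrigues' formula itself is compact enough to state as code; a minimal sketch (function name is mine), assuming a unit axis w and angle theta: v' = v cos(theta) + (w x v) sin(theta) + w (w . v)(1 - cos(theta)).

```python
import numpy as np

def rodrigues(v, axis, theta):
    """Rotate v by angle theta about the given axis using
    Rodrigues' formula."""
    w = axis / np.linalg.norm(axis)    # ensure a unit axis
    return (v * np.cos(theta)
            + np.cross(w, v) * np.sin(theta)
            + w * (w @ v) * (1 - np.cos(theta)))

# 90-degree rotation about z takes the x axis to the y axis.
print(rodrigues(np.array([1.0, 0, 0]), np.array([0.0, 0, 1]), np.pi / 2))
# ~ [0, 1, 0]
```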

  • 00:25:00 In this section, the lecturer discusses different representations of rotation, including mapping a sphere onto a plane using a projection technique, which preserves angles and shapes. He also mentions the importance of being able to rotate vectors and compose rotations, as well as having an intuitive representation like axis and angle. However, he notes that some representations like rotational matrices and axis-angle can be redundant or not very intuitive. The lecturer also highlights the importance of avoiding singularities and ensuring computational efficiency while being able to interpolate orientation in graphics.

  • 00:30:00 In this section, the lecturer discusses the challenges of representing and interpolating rotations in computer graphics, as well as the need for a space of rotations that can be efficiently sampled and averaged. He points out the limitations of conventional methods such as matrices and Euler angles (including gimbal lock) and introduces quaternions as a more practical solution. He explains how quaternions avoid redundancies and singularities, and how they can be composed, interpolated, and sampled in a way that is mathematically elegant and computationally efficient. He also highlights some open problems and ongoing research in this field, including optimizing and fitting rotations to models, and developing new methods for analyzing and visualizing rotation spaces.

  • 00:35:00 In this section, the speaker explains the history behind the creation of quaternions and their significance in mathematics, particularly in rotation. He explains that William Hamilton, a mathematician from Dublin, was trying to find a way to represent triplets of numbers in a way that allowed for division, so he looked to complex numbers for inspiration. Hamilton eventually discovered that quaternions, or numbers with a real part and three imaginary parts, could solve the problem. The speaker then goes on to explain the different ways of representing quaternions, including as a vector in space or a four-by-four matrix.

  • 00:40:00 In this section, the lecturer discusses different ways to represent quaternion multiplication, including using matrices and using a scalar part and three imaginary parts. The lecturer emphasizes that multiplication is non-commutative and shows how it can be represented as a product of a matrix and a vector. The lecture also highlights some basic results, including the fact that quaternion multiplication is not commutative but is associative.

  • 00:45:00 In this section, the speaker explains the properties of quaternions that make them a useful way to represent rotations. Quaternions have properties similar to complex numbers, including a conjugate that involves negating the imaginary part. The dot product can be expressed as a norm, and multiplying a quaternion by its conjugate results in a real quantity with no imaginary part, which can be used for division. In the case of unit quaternions, the inverse is just the conjugate. Quaternions can also be used to represent vectors by leaving out the scalar part, and there are many interesting properties in this space.

  • 00:50:00 In this section, the lecturer explains how to represent rotation using quaternions. Representing a vector as a purely imaginary quaternion, pre-multiplying it by the rotation quaternion, and post-multiplying by the conjugate yields another purely imaginary quaternion whose vector part is the rotated vector in 3D. By representing quaternion multiplication using four by four matrices, the lecturer then shows that this operation preserves the dot products of the original vectors. Ultimately, the resulting three by three orthonormal rotation matrix can be used to rotate vectors without manipulating quaternions directly.
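
A minimal sketch of this q v q* recipe (function names are mine; quaternions are stored scalar-first): the Hamilton product is implemented directly, and rotation embeds the vector as a pure quaternion.

```python
import numpy as np

def quat_mult(p, q):
    """Hamilton product of quaternions (scalar, x, y, z)."""
    ps, pv = p[0], p[1:]
    qs, qv = q[0], q[1:]
    return np.concatenate((
        [ps * qs - pv @ qv],
        ps * qv + qs * pv + np.cross(pv, qv)))

def rotate(q, v):
    """Rotate vector v by unit quaternion q via q v q*."""
    vq = np.concatenate(([0.0], v))          # vector as a pure quaternion
    q_conj = q * np.array([1, -1, -1, -1])   # conjugate of a unit quaternion
    return quat_mult(quat_mult(q, vq), q_conj)[1:]

# A 90-degree rotation about z takes x to y.
theta = np.pi / 2
q = np.array([np.cos(theta / 2), 0, 0, np.sin(theta / 2)])
print(rotate(q, np.array([1.0, 0.0, 0.0])))  # ~ [0, 1, 0]
```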

  • 00:55:00 In this section, the lecturer discusses the properties that define a rotation and how to represent it using a quaternion. A quaternion is a four-dimensional representation of a rotation that preserves dot products, triple products, lengths, angles, and handedness, which makes it an appropriate representation of rotation. The composition of rotations is straightforward in quaternion notation, whereas it is difficult in both axis-angle form and Euler angles. The vector part of the quaternion is parallel to the axis of rotation, making the axis easy to read off. The lecturer explains how to convert between axis-angle and quaternion representations and points out that antipodal points on the unit quaternion sphere (q and -q) represent the same rotation, essential knowledge in photogrammetry when computing averages.

  • 01:00:00 In this section of the lecture, the speaker discusses the problem of finding the coordinate transformation between two coordinate systems or the best fit rotation and translation between two objects with corresponding points measured in the two coordinate systems. Using a physical analog with springs, the system wants to minimize the sum of squares of errors to find the rotation and translation. The first step in finding the translation is to take the centroid in the left system after rotation into the centroid of the right system, which is intuitive and doesn't require correspondences. The formula for the translation is then used to simplify the expression for minimizing the error term. The middle term is the only one that can be altered, and by maximizing it, the system can maximize the dot product of corresponding points.

  • 01:05:00 In this section, the lecturer discusses how to align spacecraft cameras with catalog directions using quaternion notation. They use quaternions to map the direction to stars in the camera with catalog directions, where the goal is to maximize the dot product of these two quaternions. However, since this can result in large values for the quaternion, there is an extra constraint that needs to be imposed. The lecturer explains two methods to differentiate with respect to the quaternion, which is used to minimize the difference between the two quaternion directions.

  • 01:10:00 In this section of the lecture, the professor discusses the eigenvectors and eigenvalues of a four-by-four real symmetric matrix constructed from the data. Unlike in earlier problems, where the smallest eigenvalue was desired, because of the sign flip we need to pick the eigenvector that corresponds to the largest eigenvalue. The matrix is symmetric, so of its 16 entries only 10 are independent, and its trace is zero, which makes the cubic term of the characteristic equation vanish. This makes the quartic characteristic equation easier to solve; the professor also notes that cubic and quartic equations, unlike fifth-order equations, can be solved in closed form.
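
A sketch of the resulting closed-form procedure (the function name is mine; the matrix layout is the standard one from Horn's published method): form the 3x3 matrix of correlation sums, assemble the symmetric, trace-free 4x4 N, and take the eigenvector of its largest eigenvalue as the best-fit unit quaternion.

```python
import numpy as np

def best_fit_quaternion(left, right):
    """Closed-form absolute orientation: best-fit rotation taking the
    left point set into the right one (rows are corresponding points).
    Returns the unit quaternion, scalar first."""
    l = left - left.mean(axis=0)         # remove translation first
    r = right - right.mean(axis=0)
    M = l.T @ r                          # 3x3 matrix of correlation sums
    (Sxx, Sxy, Sxz), (Syx, Syy, Syz), (Szx, Szy, Szz) = M
    # Symmetric 4x4 N matrix; note that its trace is zero.
    N = np.array([
        [Sxx + Syy + Szz, Syz - Szy,       Szx - Sxz,       Sxy - Syx      ],
        [Syz - Szy,       Sxx - Syy - Szz, Sxy + Syx,       Szx + Sxz      ],
        [Szx - Sxz,       Sxy + Syx,      -Sxx + Syy - Szz, Syz + Szy      ],
        [Sxy - Syx,       Szx + Sxz,       Syz + Szy,      -Sxx - Syy + Szz]])
    vals, vecs = np.linalg.eigh(N)       # ascending eigenvalues
    return vecs[:, -1]                   # eigenvector of the largest one
```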

  • 01:15:00 In this section, the lecturer discusses the desirable properties of quaternions as a means of representing rotations. These properties include the ability to rotate vectors and compose rotations easily, an intuitive non-redundant representation, computational efficiency, and the ability to interpolate orientations and take averages of a range of rotations. The lecturer then introduces relative orientation as a problem of finding the baseline and relative orientation of two coordinate systems using direction data from two points in the world. Quaternions are also useful for describing the kinematics of a robot manipulator and can help avoid problems with coordinate systems lining up, particularly in the wrist.

  • 01:20:00 In this section, the speaker discusses the efficiency of quaternions in representing rotations compared to orthonormal matrices, demonstrating that quaternion multiplications are faster for composition but slower for rotating vectors. He notes that quaternions are also easier to re-normalize than matrices. The speaker then discusses how to sample the space of rotations in four dimensions by projecting polyhedra onto the sphere of rotations, resulting in a regular and uniform sampling of the space.

  • 01:25:00 In this section, the lecture discusses different methods for representing rotations in four-dimensional space, such as using coordinate systems to simplify expressions for rotation groups. The lecture also explores various rotation groups for different polyhedra, using these groups to provide a regular space sampling of the space, so that users can try different orientations for their searches or averaging. However, it is noted that these methods may require tricks to achieve finer sampling, and that choosing the right coordinate system is crucial.

Lecture 19: Absolute Orientation in Closed Form, Outliers and Robustness, RANSAC

The lecture covers various aspects of absolute orientation, including using unit quaternions to represent rotations in photogrammetry, converting between quaternion and orthonormal matrix representations, dealing with rotation symmetry, and coordinating translation, scaling, and rotation in a correspondence-free way. The lecture also discusses the problem of outliers and robustness in line fitting and measurement processes and introduces the RANSAC (Random Sample Consensus) method as a way to improve the reliability of measurements when outliers are present. The lecture concludes with a discussion on solving the problem of absolute orientation in closed form using two planes in a coplanar scenario, including challenges related to outliers and optimization.

In this video on absolute orientation, the lecturer discusses the issue of outliers in real data and proposes the use of RANSAC, a consensus method involving random subset fits, to deal with them. The lecturer also discusses methods for achieving a uniform distribution of points on a sphere, including inscribing a sphere in a cube and projecting random points, tessellating the surface of the sphere, and generating points on regular polyhedra. Additionally, the lecturer covers ways to sample the space of rotations for efficient recognition of multiple objects in a library, finding the number of rotations needed to align an object with itself, and approaching the problem of finding rotations through examples or quaternion multiplication.

  • 00:00:00 In this section of the lecture, the speaker discusses the use of unit quaternions to represent rotations in photogrammetry. Unit quaternions allow a closed-form solution to the least-squares problem, providing an objective way to obtain the best-fit answer, which is more difficult with other notations. The two operations that are particularly important are the composition of rotations and the rotation of a vector, both of which can be represented using the formula discussed. The speaker also relates this notation to axis-angle notation using Rodrigues' formula. Overall, unit quaternions give a more efficient way to represent rotations in photogrammetry.

  • 00:05:00 In this section of the video, the speaker discusses converting between quaternion and orthonormal matrix representations. The formula for converting quaternions to matrices involves a four by four matrix with both skew-symmetric and symmetric parts. The speaker explains that the first row and column are irrelevant when rotating vectors, since a vector is a special quaternion with a zero scalar part. To convert an orthonormal matrix back to a quaternion, the speaker recommends starting from the trace of the three by three matrix, which yields an equation in cosine terms that can be solved for the cosine of the rotation angle.

  • 00:10:00 In this section, the lecturer discusses different ways of computing the quaternion from the elements of the rotation matrix R. An approach centered on the trace of the rotation matrix suffers from problems near theta equals zero; it is better to use the off-diagonal elements, which all depend on the sine of theta over two. The lecture then gives a full inversion formula that computes various sums and differences and takes square roots. The remaining problem with this approach is the sign ambiguity; the lecture suggests computing the largest component first for numerical accuracy and solving for the others from it.
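
A minimal sketch of the stable inversion described above (function name is mine; quaternion is scalar-first): component magnitudes come from square roots of sums and differences of diagonal terms, the largest component is computed first to fix the sign ambiguity, and the rest follow from off-diagonal sums and differences.

```python
import numpy as np

def quat_from_matrix(R):
    """Unit quaternion (w, x, y, z) from a rotation matrix."""
    t = np.trace(R)
    # Squared magnitudes of the four components, from diagonal terms.
    mags2 = np.array([
        1 + t,
        1 + R[0, 0] - R[1, 1] - R[2, 2],
        1 - R[0, 0] + R[1, 1] - R[2, 2],
        1 - R[0, 0] - R[1, 1] + R[2, 2]]) / 4
    i = int(np.argmax(mags2))        # largest component: best accuracy
    q = np.empty(4)
    q[i] = np.sqrt(mags2[i])         # fixes the overall sign choice
    d = 4 * q[i]
    # Remaining components from off-diagonal sums and differences.
    if i == 0:
        q[1] = (R[2, 1] - R[1, 2]) / d
        q[2] = (R[0, 2] - R[2, 0]) / d
        q[3] = (R[1, 0] - R[0, 1]) / d
    elif i == 1:
        q[0] = (R[2, 1] - R[1, 2]) / d
        q[2] = (R[0, 1] + R[1, 0]) / d
        q[3] = (R[0, 2] + R[2, 0]) / d
    elif i == 2:
        q[0] = (R[0, 2] - R[2, 0]) / d
        q[1] = (R[0, 1] + R[1, 0]) / d
        q[3] = (R[1, 2] + R[2, 1]) / d
    else:
        q[0] = (R[1, 0] - R[0, 1]) / d
        q[1] = (R[0, 2] + R[2, 0]) / d
        q[2] = (R[1, 2] + R[2, 1]) / d
    return q
```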

  • 00:15:00 In this section, the speaker discusses the process of converting between quaternion and rotation matrix, both directly and indirectly, and how to account for scale in coordinate transformations. They explain the process of solving for the rotation and scaling factors using a least squares problem and minimizing the sum of four sums. The speaker highlights the importance of accounting for scale when patching together pieces of terrain obtained from successive camera positions and explains how to find the optimum in these situations.

  • 00:20:00 In this section, the speaker discusses the issue of symmetry in rotation, where the method used to compute rotation should be able to be inverted to get the inverse of the rotational matrix. The speaker also explores another error term that is preferred over previous methods because it doesn't require correspondences and can map centroid to centroid. This method involves finding the scale factor by setting the derivative of the error term with respect to the scale factor equal to zero and solving for the scale factor, which avoids cheating by making the scale factor a little smaller than it should be.

  • 00:25:00 In this section, the lecturer explains how to deal with translation, scaling, and rotation in a correspondence-free way. Using a centroid method, the scale factor can be calculated as the ratio of the sizes of the two point clouds. For the rotation part, the lecturer briefly touches on the calculus problem of maximizing the quadratic form q-transpose N q with respect to the quaternion q. The solution can be found using Lagrange multipliers, but a simpler route is the Rayleigh quotient, which divides by the squared length of q to prevent it from becoming infinitely large. The resulting function is constant along any ray through the origin, so what matters is the direction of the ray that makes it as extreme as possible.

  • 00:30:00 In this section, the speaker explains how to find the q that maximizes the quotient by differentiating the expression and setting it to zero. Using the quotient rule for differentiation, the speaker shows that q must be an eigenvector of N, and that the quotient is maximized by picking the eigenvector corresponding to the largest eigenvalue. The only constraint on this eigenvector is that it be a unit quaternion, which, unlike the orthonormality constraint on matrices, is much easier to handle.

  • 00:35:00 In this section, the lecturer discusses the number of correspondences needed for this photogrammetric problem. There are seven unknowns: three for translation, three for rotation, and one for scale. Each correspondence is a vector equality worth three constraints, so two correspondences might seem nearly enough; however, two correspondences supply only five useful constraints, since rotation about the line joining the two points is left free, and so it takes three correspondences. Additionally, the lecturer mentions the possibility of generalizing the transformation to match the nine constraints obtained from three points, while noting that these constraints are highly redundant.

  • 00:40:00 In this section, the video discusses the concept of general linear transformation in 3D, which involves 12 elements, not six like in 2D, making it difficult to determine with three correspondences. Additionally, the video explains that there are two ways for the linear transformation to fail. Firstly, if there are not enough correspondences, and secondly, if the matrix N has more than one eigenvalue of zero. The video further explains how to solve the characteristic equation to find the eigenvalues of the matrix.

  • 00:45:00 In this section of the video, the lecturer explains how to compute the matrix M using dyadic products; this three by three matrix is then used to build the four by four matrix N, and is the most efficient way of getting N. It is noted that if the determinant of M is zero, the problem becomes particularly easy, because the coefficient c1 vanishes and the characteristic equation can be solved without resorting to the general quartic formula. This special case depends on the distribution of the points and occurs when they are coplanar; the lecturer shows that the method applies equally well when all the points lie in a plane, which makes the problem easy to solve.

  • 00:50:00 In this section of the video, the speaker explains how to solve the problem of absolute orientation in closed form using two planes in a coplanar scenario. The full 3D rotation can be decomposed into two simple rotations, first rotating one plane so that it lies on top of the other plane, and then an in-plane rotation. The speaker explains how to find the axis and angle required to construct the quaternion and rotate all points in one of the coordinate systems to align them to the other coordinate system. Additionally, the speaker discusses the challenges of dealing with outliers in the optimization problem and how using something other than the square of error, such as absolute value of error, can lead to more work to be computed and difficulties in generalizing the results.

  • 00:55:00 In this section, the lecturer discusses the problem of outliers and robustness in line fitting and other measurement processes. He introduces the RANSAC (Random Sample Consensus) method, which involves taking a random sample of points and using least squares to find the best fit, then checking the number of points that fall inside a band and adjusting the threshold based on the noise and the ratio of inliers to outliers. The process is repeated until a good fit is obtained. The lecturer notes that the use of RANSAC can improve the reliability of measurements in situations where outliers are present.
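
A minimal RANSAC sketch for 2D line fitting (parameter values and names are my own): fit to random two-point samples, count the points inside a band whose width reflects the assumed noise, keep the largest consensus set, and refit with least squares.

```python
import numpy as np

def ransac_line(points, n_iters=200, threshold=0.05, rng=None):
    """Robust 2D line fit: random minimal samples, inlier counting
    within a band, final least-squares refit on the consensus set."""
    rng = np.random.default_rng(0) if rng is None else rng
    best = np.zeros(len(points), dtype=bool)
    for _ in range(n_iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        d = points[j] - points[i]
        n = np.array([-d[1], d[0]]) / np.linalg.norm(d)   # unit normal
        dist = np.abs((points - points[i]) @ n)   # point-to-line distances
        inliers = dist < threshold                # band width ~ noise level
        if inliers.sum() > best.sum():
            best = inliers
    x, y = points[best].T                         # refit on the winners
    slope, intercept = np.polyfit(x, y, 1)
    return slope, intercept, best
```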

  • 01:00:00 In this section of the video, the lecturer discusses the issue of outliers in the presence of real data and how to deal with it using the consensus method, also known as RANSAC. The method involves taking random subsets, performing fits, and looking for cells that have the most hits, which give a measure of the orientation of objects that may not have a closed-form solution. The lecturer emphasizes that this approach is useful in many applications and not just limited to absolute orientation. Additionally, the lecturer mentions that representations for complicated objects near convex can also be useful for detecting things and finding their orientation.

  • 01:05:00 In this section, the lecturer discusses the difficulty of sampling points on a sphere uniformly. A uniform distribution cannot be achieved by drawing theta and phi from uniform generators, because points then concentrate near the poles. One proposed solution is to inscribe a sphere in a cube and project random points from the cube onto the sphere, but this still yields a higher density of points in the directions of the cube's corners. To fix this, the lecturer suggests tessellating the surface of the sphere using regular solids, or weighting the points near the corners to compensate for their higher density.

  • 01:10:00 In this section of the video, the lecturer discusses ways to obtain a uniform distribution of points on the surface of a sphere. One way is to generate points uniformly in a cube and project them onto the surface of the sphere, discarding points too close to the origin or too far from the sphere. Another method is to divide the sphere using regular polyhedra and generate points uniformly on their faces. However, the polyhedral method requires subdivision to obtain finer divisions, unlike the first method, which generates a practically uniform distribution directly.
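
A minimal sketch of the cube-based method just described (names and the shell radii are my own choices): candidates are drawn uniformly in a cube, points too close to the origin or outside the unit ball are discarded, and the survivors are projected onto the sphere.

```python
import numpy as np

def sphere_points(n, rng=None):
    """Nearly uniform directions on the unit sphere via rejection:
    keep only candidates inside a spherical shell, so that no
    direction is favored, then normalize."""
    rng = np.random.default_rng(0) if rng is None else rng
    pts = np.empty((0, 3))
    while len(pts) < n:
        cand = rng.uniform(-1.0, 1.0, size=(4 * n, 3))
        r = np.linalg.norm(cand, axis=1)
        keep = (r > 0.5) & (r <= 1.0)   # discard corners and near-origin points
        pts = np.vstack([pts, cand[keep] / r[keep, None]])
    return pts[:n]
```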

  • 01:15:00 In this section, the lecturer discusses how to find uniform ways of sampling the space of rotations for various objects, which is part of the recognition process for multiple objects in a library. The lecturer explains that to be efficient, they do not want to sample parts of the rotation space more densely than other parts, and they aim to find a uniform way of sampling space. They start by discussing the hexahedron, which has rotational symmetry, and its rotations. The lecturer explains that the aim is to find rotation methods that make it easy to find point correspondences across different models.

  • 01:20:00 In this section, the speaker discusses how to find the number of rotations needed to align an object with itself, and then generates a group of rotations using two methods: geometrically, and through quaternion multiplication. An interesting new rotation, the axis of which is (1, 1, 1) and the angle is 2π/3, is found and is shown to align the corner of a cube with itself.

  • 01:25:00 In this section, the speaker offers two ways to approach the problem of finding the rotation group. The first way is to enumerate examples and add them up, for a total of 24 rotations. The second way is to implement quaternion multiplication and build a table by taking pairwise products, checking whether each product yields something new. The speaker then mentions that the next discussion will involve relative orientation, which is more relevant to binocular vision.

Lecture 20: Space of Rotations, Regular Tessellations, Critical Surfaces, Binocular Stereo

This section of the lecture covers topics including regular tessellations, critical surfaces, binocular stereo, and finding the parameters of a transformation in three-dimensional space. The lecturer explains that the best way to tessellate a sphere is to use the dual of a triangular tessellation, creating approximately hexagonal shapes with a few pentagons. They also discuss critical surfaces, which are difficult for machine vision but can be used to make furniture out of straight sticks. In the discussion of binocular stereo, the lecturer explains the relationship between two cameras, the concept of epipolar lines, and how to intersect rays from the two cameras to determine a point in the world. They also explain how to calculate the error between two rays to determine their near-intersection, and how to minimize the image error while taking into account the conversion factor between error in the world and error in the image. Finally, they discuss how to find the baseline and rotation to recover the position and orientation of a rigid object in space, using a quaternion to represent the baseline.

The lecture covers various topics, including the space of rotations, regular tessellations, critical surfaces, and binocular stereo. For rotations, the instructor discusses the use of numerical approaches, the problem of singularities, and the benefits of using unit quaternions. With regular tessellations, they show how certain surfaces can cause problems with binocular stereo and suggest using error measures and weights to mitigate issues. The speaker also touches on quadric surfaces and introduces a new homework problem that involves "fearless reflection".

  • 00:00:00 In this section of the video, the speaker discusses tessellating the surface of a sphere based on the Platonic and Archimedean solids, whose facets have equal-area projections onto the sphere. The tessellation can use regular polygons as facets, with triangles, squares, and pentagons being the common choices. The areas of the projected polygons are not all equal, and the tessellated surface ends up with many divisions. This method of tessellation is relevant when discussing rotations, and the speaker explains the rotation groups of these solids. The video also mentions the geodesic dome, which is based on tessellating an icosahedron into many triangular areas to create a regular structure.

  • 00:05:00 In this section, the lecturer discussed various regular tessellations, which are ways to divide a surface into equal-sized shapes. While square tessellations are commonly used in planes, they are not ideal for spheres, and triangular tessellations are also problematic. The lecturer highlighted a better option: the dual of a triangular tessellation, which features approximately hexagonal and a few pentagonal shapes. Additionally, the lecturer explained critical surfaces, which are hyperboloids of one sheet. These surfaces are difficult for machine vision problems, but they have the distinct feature of being ruled and can be used to make furniture out of straight sticks. Finally, the lecturer discussed hyperboloids of two sheets that have two negative signs in their equation.

  • 00:10:00 In this section, the lecturer discusses the different types of surfaces that can be created with two sheets or three negative signs. He also explains the various special cases that exist, such as the hyperboloid, cone, paraboloid, and planar surfaces. Moving on, the lecturer explains the problem of computing 3D from 2D using two cameras and how relative orientation is necessary to understand the geometry of the two cameras. The lecture concludes by mentioning how binocular stereo is applicable in autonomous vehicles, and calibration may need to be performed again if the baseline isn't rigid, but the same process also works for structure-from-motion with images before and after.

  • 00:15:00 In this section, the lecturer explains how to intersect rays from the two cameras to determine a point in the world, how the coordinate system is picked, and the geometry associated with this construction. The lecturer highlights that the baseline is measured in the right coordinate system, with the prime indicating how it is converted from the left coordinate system. When the point is connected to the baseline, it defines a plane, and the image of this plane in each camera system projects into a straight line along which the point must be imaged. The lecture also introduces the concept of epipolar lines and how they help in finding the disparities that lead to a distance measurement.

  • 00:20:00 In this section, the lecturer discusses the relationship between the two cameras in a binocular stereo setup, which involves the baseline and the rotation of one camera relative to the other. The rotation has three degrees of freedom, but due to the scale factor ambiguity, the problem is reduced to five degrees of freedom instead of six, as with absolute orientation. The baseline is treated as a unit vector, giving only two degrees of freedom for that component. The lecturer explains that additional information, such as knowledge of the size of imaged objects, would be necessary to determine the absolute length of the baseline.

  • 00:25:00 In this section, the lecturer discusses how many correspondences are necessary to pin down the relative orientation. He uses a mechanical analogy in which a wire built from the image rays is passed through a collar that constrains it. With a single correspondence there are still degrees of freedom left, so the camera rotation can change; adding a second correspondence reduces the degrees of freedom but is still insufficient. The answer is five, since each correspondence contributes one constraint, allowing the vertical disparities to be zeroed out by adjusting the camera orientation. Depth is then inversely proportional to the horizontal disparity (see the sketch below). The instrument can be set up by tuning out vertical disparities, which is how optical photogrammetric equipment was adjusted for decades.
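
The inverse relationship between depth and horizontal disparity is the standard rectified-stereo formula. A minimal sketch, assuming a focal length in pixels, a baseline in metric units, and disparity defined as x_left - x_right (all names here are illustrative, not from the lecture):

```python
# Depth from horizontal disparity in a rectified stereo pair.
def depth_from_disparity(f_pixels: float, baseline: float, disparity: float) -> float:
    """Z = f * b / d: depth is inversely proportional to disparity."""
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of both cameras")
    return f_pixels * baseline / disparity

# Example: f = 800 px, baseline = 0.1 m, disparity = 20 px  ->  Z = 4 m.
print(depth_from_disparity(800.0, 0.1, 20.0))
```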

  • 00:30:00 In this section of the lecture, the speaker discusses finding the parameters of the transformation in three-dimensional space through a sequence of moves and adjustments until convergence, which can be a painful and complicated process. In practice, it is important to have more than five points to ensure accuracy and to minimize the effect of errors in measured image positions. The non-linear problem yields seven second-order equations, which in principle admit up to two to the seventh (128) solutions; this is a curiosity for most, though people interested in the theory find it fun to work out. Finally, the lecture discusses the coplanarity of the three vectors involved when finding the baseline and rotation parameters from correspondences.

  • 00:35:00 In this section, the speaker explains constructing a parallelepiped with the three vectors as edges and computing its volume via the triple product. When the three vectors are coplanar, the parallelepiped is flat and has zero volume, so the triple product should be zero; this is the coplanarity condition (sketched in code below). A tempting method is to estimate the baseline and rotation by minimizing the sum of squares of the triple product over all correspondences, but this is not reliable: it has high noise gain and can yield incorrect answers, and it leaves the proportionality factor undetermined. Instead, the focus shifts to minimizing the minimum separation between the two rays, which is nonzero when the measurements, baseline, or rotation are not perfect.
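
A minimal sketch of the coplanarity condition and the naive sum-of-squares objective the lecture warns against. Variable names are hypothetical; R is a 3x3 rotation matrix taking left-camera rays into the right coordinate system:

```python
import numpy as np

# Coplanarity: the baseline b, the rotated left ray R @ r_l, and the
# right ray r_r should be coplanar, i.e. their triple product is zero.
def triple_product(b, r_l_rotated, r_r):
    return float(np.dot(b, np.cross(r_l_rotated, r_r)))

def coplanarity_error(b, R, left_rays, right_rays):
    # Sum of squared triple products over all correspondences -- the
    # naive objective that the lecture notes has high noise gain.
    return sum(triple_product(b, R @ rl, rr) ** 2
               for rl, rr in zip(left_rays, right_rays))
```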

  • 00:40:00 In this section, the lecturer discusses how to quantify the error between two rays and locate their near-intersection. The segment of closest approach between two rays must be perpendicular to both of them, which means it is parallel to their cross product. By writing the vector loop and setting it to zero, the vector equation can be converted into scalar equations using dot products, giving three constraints. The lecturer then shows how to simplify the equations by making certain terms drop out and how to calculate gamma, beta, and alpha, which determine how far out along the rays the intersection, or near-intersection, lies (see the sketch below).
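
A sketch of that construction with hypothetical names, assuming the left camera at the origin and the right camera at b. Points on the left ray are alpha * r_l, points on the right ray are b + beta * r_r, and the connecting segment is parallel to c = r_l x r_r, giving three linear equations:

```python
import numpy as np

# Solve  alpha * r_l - beta * r_r - gamma * c = b  for (alpha, beta, gamma).
def ray_closest_approach(b, r_l, r_r):
    c = np.cross(r_l, r_r)
    A = np.column_stack((r_l, -r_r, -c))
    alpha, beta, gamma = np.linalg.solve(A, b)
    # gamma scales the gap between the rays; negative alpha or beta means
    # the near-intersection lies behind a camera and is usually discarded.
    return alpha, beta, gamma
```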

  • 00:45:00 In this section, the speaker discusses the significance of the three quantities alpha, beta, and gamma in recovering three-dimensional position. While gamma is essentially the distance error, alpha and beta can be negative, which indicates that the near-intersection lies behind the viewer, something that is typically not physically reasonable. The speaker notes that a closed-form solution is not currently available because a fifth-order equation is involved, but minimizing the image error is still achievable: solutions with negative alpha or beta are discarded, and a quintic solver is used to minimize the error in the image.

  • 00:50:00 In this section, the speaker discusses minimizing the sum-of-squares error in binocular stereo while accounting for the conversion factor between error in the world and error in the image; this factor depends on the solution, so the problem is solved iteratively. The triple product, in which the left ray is rotated into the right coordinate system, is used to introduce quaternions. The speaker explains that quaternions with zero scalar part represent vectors, and that the product of two such quaternions simplifies to just the dot product and the cross product (see the sketch below). A lemma is stated without proof that allows moving one of the multipliers to the other side of a dot product.
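
A tiny sketch of that simplification, representing a quaternion as a (scalar, 3-vector) pair; names are illustrative:

```python
import numpy as np

# Product of two quaternions with zero scalar part ("vector" quaternions):
# the scalar part is -(p . q) and the vector part is p x q.
def pure_quaternion_product(p, q):
    return (-float(np.dot(p, q)), np.cross(p, q))
```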

  • 00:55:00 In this section, the speaker explains how to find the baseline and recover the relative position and orientation, given two images of a rigid scene taken from different viewpoints. The speaker defines a new quantity, the product of the baseline quaternion and the rotation quaternion, which simplifies the problem to finding the baseline b and this product d. While that representation has eight unknowns, there are only five degrees of freedom, so various constraints are imposed. He also points out interesting symmetries that allow the left and right coordinate systems to be interchanged. The weight, which relates error in 3D space to error in image position, is difficult to compute exactly but can be adjusted iteratively.

  • 01:00:00 In this section, the speaker discusses an optimization scheme in which weights are computed from a good first guess, the problem is solved, and the weights are then recalculated and the problem solved again. They also touch on the symmetry between the left and right rays and how it can be exploited in the numerical calculation, along with the symmetry between the rotation and the translation in the triple product. This symmetry means that from one approximate solution, other approximate solutions can be generated. Furthermore, the search may turn up multiple versions of what is essentially the same solution, and recognizing this helps accelerate the search.

  • 01:05:00 In this section, the instructor discusses searching the space of rotations numerically: if the rotation is assumed, the remaining unknowns follow from a simple closed-form least-squares solution. Another approach is to use a nonlinear optimization package, such as the Levenberg-Marquardt method, which tunes parameters until the equations are as close to zero as possible. Neither approach constitutes a closed-form solution to the problem. The instructor also explains that representing rotations is itself an issue: an orthonormal matrix uses nine numbers for only three degrees of freedom, and alternatives such as the Gibbs vector suffer a singularity at theta equal to pi.

  • 01:10:00 In this section, the speaker discusses using unit quaternions to represent rotations, noting that they use four numbers for three degrees of freedom, so the unit-norm constraint must be enforced; the optimization package allows such constraints to be added. He also presents the formula for composing two rotations and for transforming a vector, which is slightly more complicated (see the sketch below). The speaker points out that a four-page handout summarizes everything needed about quaternions. Finally, he discusses error measures and why weighting is necessary: errors at larger z-values would otherwise dominate.
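
A minimal sketch of the standard formulas, assuming quaternions stored as (w, xyz) pairs; the rotation of a vector uses v' = q v q*, with v embedded as a quaternion with zero scalar part:

```python
import numpy as np

def qmul(a, b):
    # Quaternion product of (w, v) pairs.
    wa, va = a
    wb, vb = b
    return (wa * wb - np.dot(va, vb),
            wa * vb + wb * va + np.cross(va, vb))

def qconj(a):
    w, v = a
    return (w, -v)

def rotate(q, v):
    # Embed v as a pure quaternion, then compute q v q*.
    _, vec = qmul(qmul(q, (0.0, np.asarray(v, float))), qconj(q))
    return vec  # the scalar part is zero up to rounding

# Composing rotations is just quaternion multiplication:
# qmul(q2, q1) applies q1 first, then q2.
```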

  • 01:15:00 In this section, the speaker explains that certain kinds of surfaces cause trouble for recovering relative orientation in binocular stereo. These "critical surfaces" were discovered over a century ago and lead to ambiguity and high sensitivity to error. The speaker gives the example of a U-shaped valley in which the angles between different images of surface features do not change as an airplane flies along it, making different positions indistinguishable. The canonical critical surface is the hyperboloid of one sheet, a quadric with the right number of minus signs for one sheet, whose special cases can closely resemble other, more ordinary surfaces.

  • 01:20:00 In this section, the speaker talks about quadric surfaces, specifically the degenerate case consisting of two intersecting planes. Each plane has a linear equation, and multiplying the two linear equations together gives the quadratic equation of the combined pair of planes. One of the planes passes through the center of projection, which means it projects into a line. This makes things even stranger: an ordinary planar surface, which is common in man-made environments, can itself be a critical surface. The speaker mentions having to talk about "fearless reflection" next time, and a new homework problem is introduced.

Lecture 21: Relative Orientation, Binocular Stereo, Structure, Quadrics, Calibration, Reprojection

This lecture covers topics in photogrammetry, including relative orientation, quadric surfaces, camera calibration, and correspondences between image points and known 3D objects. The lecturer explains methods for handling distortion and for obtaining parameters such as f and tz, stresses the importance of orthogonal unit vectors when assembling the full rotation matrix, and presents a more stable formula for finding k. The lecturer emphasizes the importance of understanding homogeneous equations, which are critical in machine vision.

This lecture covers various topics related to computer vision and calibration, including using a planar target for calibration, the ambiguity of calibrating the exterior orientation, redundancy in representing rotation parameters, and determining the statistical properties of given parameters through the noise gain ratio. The lecture explains the formula for solving a quadratic equation and introduces an approximation method involving iteration. The planar target case is discussed as a commonly used method for calibration and machine vision applications. The lecture also touches on the representation of shape and recognition, and attitude determination in 3D space.

  • 00:00:00 In this section, the speaker discusses relative orientation, the second of four classic problems in photogrammetry, and its relevance to binocular stereo, motion vision, and structure from motion. The speaker develops a solution but notes that there are surfaces on which the relative orientation cannot be determined, specifically certain quadric surfaces. The lecture then examines the types of quadric surfaces, such as ellipsoids, hyperboloids of one or two sheets, and quadrics with no real points. The speaker explains that the absence of a constant term means the origin of the right-hand system, or the camera position at time two in motion vision, lies on the surface. Additionally, plugging in minus b for r, where b is the baseline vector between the two cameras, also satisfies the equation, which means the surface passes through both eyes.

  • 00:05:00 In this section of the lecture, the speaker discusses the properties and implications of the critical quadric surface equation, which is symmetric between the left and right camera positions of a stereo pair. The equation has no constant term, meaning there is no fixed scale and the entire baseline lies on the surface. This implies the surface is ruled, and in fact has two rulings, which makes it interesting for manufacture from straight elements. The equation covers a variety of special cases, including pairs of planes, where one plane passes through the origins of both coordinate systems and is therefore an epipolar plane. The image of that plane is a straight line, which is not particularly interesting, but the other plane is arbitrary and can be anything.

  • 00:10:00 In this section, the lecturer discusses the ambiguity that arises when reconstructing topographic maps or recovering structure from motion, since the two problems are mathematically the same. The problem is more likely to occur with a narrow field of view, where the noise gain is high. To combat this, a large field of view is recommended, which is why "spider heads", sets of cameras mounted together to obtain a wide field of view, were created for aerial photography. The lecturer then moves on to interior orientation, which is essentially camera calibration. The earlier calibration method using vanishing points worked but was not very accurate and could not account for radial distortion, so a more general method is needed.

  • 00:15:00 In this section, the lecturer discusses the trade-offs involved in designing a lens, including radial distortion, which shifts image points radially: in polar coordinates a point appears at the wrong distance along the line from the principal point, though the angle is preserved. This distortion is commonly approximated by a polynomial, with the quadratic term usually sufficient for decent results. The lecture goes on to describe the plumb-line method used in the past to measure the distortion of a lens.

  • 00:20:00 In this section, the speaker discusses the types of distortion that occur in images, including barrel distortion and pincushion distortion, which correspond to the sign of k1 (a minimal sketch follows). The speaker also mentions using a polynomial approximation to convert between distorted and undistorted coordinates, and how that choice affects the final optimization and the coordinate system used. The speaker notes the near absence of tangential distortion in modern imaging systems, which are typically rotationally symmetric and therefore exhibit only radial distortion.
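
A minimal sketch of single-coefficient radial distortion, the quadratic term the lecture says is usually sufficient. Coordinates are assumed measured from the principal point; which sign of k1 gives barrel versus pincushion depends on the convention used:

```python
# Apply radial distortion with one coefficient k1.
def distort(x, y, k1):
    r2 = x * x + y * y
    factor = 1.0 + k1 * r2      # the radius changes, the angle is preserved
    return factor * x, factor * y
```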

  • 00:25:00 In this section, the lecturer discusses complications in camera calibration such as decentering distortion and a tilted image plane. For high-quality work like aerial photography, these factors must be taken into account. The lecturer explains that small errors can arise from the mechanical nature of camera manufacturing, affecting magnification and image distortion; these can be compensated for with a more complex distortion model. Tsai's calibration method uses a calibration object that can be planar or three-dimensional. In the past, distortion was dealt with by fine-tuning the camera during manufacturing, but nowadays a software solution with model extensions is used instead.

  • 00:30:00 In this section, the speaker discusses determining correspondences between image points and known points on a 3D calibration object. Unlike the vanishing-point method, the relationship between the calibration object and the camera cannot simply be measured with a tape measure, so exterior orientation must be solved as well: figuring out where the calibration object is in space and how it is rotated, in addition to finding the camera parameters. Although exterior orientation adds more unknowns, it produces more accurate results. The interior orientation involves the perspective projection equation, the principal point, and the principal distance. The strategy is to eliminate difficult parameters, modify the measurements to reduce dependence on radial distortion, find a closed-form solution for some parameters, and only then resort to numerical methods.

  • 00:35:00 In this section of the video, the speaker explains how to get a good initial guess for the iterative solution. While the established principles should be kept in mind, some violations of them are allowed at this stage, since the initial guess is not the final answer. Using the row and column numbers as the xi and yi coordinates, and expressing f in units of pixel size, is convenient when forming the initial guess. The exterior orientation is also discussed, including the rotation and translation of the calibration object, whose geometry is known accurately. The equations that transform a position on the calibration object into a position in the camera coordinate system are inverted to recover the unknown rotation and translation parameters.

  • 00:40:00 In this section of the video, the speaker discusses the challenges of dealing with radial distortion and of obtaining f and tz. The suggested solution is to work in polar coordinates, where radial distortion alters only the length, not the angle, and to use an equation with fewer unknowns. The equation involves the coordinate components of the calibration object and the image coordinates, which are known, and the unknown components of the rotation r and of tx and ty. A linear equation can then be formed, given an assumed position of the principal point, which the solution requires.

  • 00:45:00 In this section, the speaker discusses determining the principal point of the image sensor and the use of homogeneous equations in machine vision. To proceed, a center point is assumed, and correspondences too close to the center are thrown away, since small errors there strongly affect their direction. With the center point assumed, each correspondence contributes an equation, and enough equations are needed to find the eight unknowns; these equations are homogeneous, with zero on the right-hand side. While homogeneous equations are often overlooked in traditional education, they are critical in machine vision, and it is essential to know how to work with them.

  • 00:50:00 In this section, the speaker shows how to solve the homogeneous equations by fixing one of the unknowns to a value of choice, reducing the number of unknowns to seven. At least seven correspondences are therefore required, and more are desirable to estimate the error. The over-determined system of linear equations can then be solved using the pseudo-inverse (a sketch follows below). Finally, a scale factor is computed to make the calculated vectors unit vectors, which also acts as a sanity check on the correspondences. The method provides a first estimate of all the unknowns except f, tz, and the radial distortion.
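
A sketch of that recipe for a homogeneous system M u = 0, with hypothetical shapes: M is (n_correspondences x 8) with n >= 7. One unknown is fixed to 1, its column moves to the right-hand side, and the rest is ordinary least squares:

```python
import numpy as np

def solve_homogeneous(M, fixed_index=-1):
    A = np.delete(M, fixed_index, axis=1)   # columns of the free unknowns
    rhs = -M[:, fixed_index]                # fixed unknown set to 1
    u_free, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    u = np.insert(u_free, fixed_index % M.shape[1], 1.0)
    # Afterwards, rescale u so the recovered rotation rows are unit vectors;
    # consistency of that scale is the sanity check mentioned above.
    return u
```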

  • 00:55:00 In this section, the lecture explains the process of finding f and tz. The lecture stresses the importance of orthogonal unit vectors when assembling the full rotation matrix: when the two recovered vectors are not quite orthogonal, a small adjustment is made that yields a pair of vectors that are orthogonal. The lecture then explains that the usual quadratic formula for finding k can be numerically problematic, so another, more stable formula is used.

  • 01:00:00 In this section, the lecturer discusses the formula for solving a quadratic equation and the loss of precision that can occur when nearly equal quantities are subtracted; a numerically stable variant is sketched below. The lecturer also introduces an iterative approximation method that provides a simple solution. The discussion then turns to the planar-target case, which, owing to its high accuracy and ease of use, is commonly employed in calibration and machine vision applications. A pattern with accurately determined feature corners is mounted on the target; in one application, the rotation of components about two different axes is measured to perform high-accuracy wheel alignment.
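
The standard numerically stable form computes the larger-magnitude root first and derives the other from the product of the roots, avoiding the cancellation the lecture warns about. A minimal sketch for a*k^2 + b*k + c = 0:

```python
import math

def stable_quadratic_roots(a, b, c):
    disc = b * b - 4.0 * a * c
    if disc < 0:
        raise ValueError("no real roots")
    # q has the same sign as b, so the addition never cancels.
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    return q / a, c / q   # neither root is computed by subtraction of near-equals

# Example where the naive formula loses digits: a=1, b=1e8, c=1.
print(stable_quadratic_roots(1.0, 1e8, 1.0))
```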

  • 01:05:00 In this section, the lecturer discusses using a planar target for calibration, which allows a coordinate system with known x, y, and z values to be constructed on the target. The resulting equation has fewer unknowns and requires only five correspondences instead of seven, making the method more efficient. However, if the y component of the translation is zero, the usual normalization becomes degenerate, and setting tx equal to one instead is recommended for accurate solutions. The lecture also touches on recovering the top two-by-two block of the rotation matrix in the planar case.

  • 01:10:00 In this section, the lecturer explains the difficulty, in the old days, of knowing the aspect ratio of the pixel stepping in the x and y directions: different hardware controlled horizontal and vertical spacing, so another parameter was needed to scale x relative to y. Working the algebra out makes a mess, so manufacturers' spec sheets can be used to find the aspect ratio precisely instead. The lecturer also explains that, with the perspective projection equation and the other parameters known, the remaining unknowns f and tz can be computed from correspondences (a sketch follows below). However, there is an issue with the lack of depth variation when the calibration target is planar.
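
A hedged sketch of that step, with hypothetical names. Once the rotation rows r1, r3 and the translation component tx are known, the projection x_i = f * X_c / Z_c with Z_c = r3 . P_i + tz rearranges into a linear equation in f and tz for each correspondence; stacking several (the analogous y equations work the same way) gives a least-squares problem:

```python
import numpy as np

# For each correspondence:  f * (r1 . P + tx) - x * tz = x * (r3 . P)
def solve_f_tz(points, xs, r1, r3, tx):
    A, rhs = [], []
    for P, x in zip(points, xs):
        A.append([np.dot(r1, P) + tx, -x])
        rhs.append(x * np.dot(r3, P))
    (f, tz), *_ = np.linalg.lstsq(np.asarray(A), np.asarray(rhs), rcond=None)
    return f, tz
```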

  • 01:15:00 In this section, the lecturer discusses the ambiguity in calibrating the exterior orientation. Focal length and translation along the optical axis cannot be determined separately without variation in depth, because of the scale-factor ambiguity; the exterior orientation is therefore ambiguous unless the planar calibration target is tilted, for example mounted at roughly a 45-degree angle. Finally, the principal point and radial distortion are discussed, and a non-linear optimization is required to minimize the error between predicted and actual image coordinates; the Levenberg-Marquardt routine lmdif, available in MATLAB, is recommended for this purpose.

  • 01:20:00 In this section of the lecture, the presenter discusses the redundancy involved in representing rotation parameters and mentions options such as Euler angles, the Gibbs vector, and unit quaternions. Unit quaternions are redundant, using four numbers for three degrees of freedom, so the presenter proposes adding another equation: an error term proportional to the difference between the squared magnitude of the quaternion and one, to enforce the unit constraint (a sketch follows below). The lecture also mentions the noise-gain issue and the use of Monte Carlo methods to study it in the absence of an analytic method.
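
A minimal sketch of that penalty idea inside a least-squares optimizer. All names here are hypothetical: residuals(q) stands for the reprojection errors, and lam weights the extra constraint term:

```python
import numpy as np

# Append a penalty residual proportional to (|q|^2 - 1) so the optimizer
# is pushed toward unit quaternions while fitting the other residuals.
def augmented_residuals(q, residuals, lam=1.0):
    return np.append(residuals(q), lam * (np.dot(q, q) - 1.0))
```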

  • 01:25:00 In this section, the speaker explains how to determine the statistical properties of a computed answer via the noise gain: perturb the inputs many times and examine the distribution of answers in parameter space. This reveals, for example, that the higher-order coefficients of radial distortion are poorly determined because of their sensitivity to measurement noise. The next topic will be the representation of shape and recognition, and attitude determination in 3D, building on what has been developed for 2D recognition and attitude determination.