Computer vision

By: Prachurya Priyadarshini & Ritocheta Roy

Vision is the highest-bandwidth sense, providing a firehose of information about the state of the world and how to act on it. That is a strong reason to give computers the ability to extract high-level understanding from digital images and videos. As a refresher, images on computers are most often stored as big grids of pixels. Each pixel is defined by a colour, stored as a combination of three additive primary colours: red, green and blue. One of the simplest computer vision algorithms tracks a coloured object, such as a ball. The program records the ball's RGB value as a target colour, then checks every pixel in an image and keeps the one whose colour is the closest match. This algorithm can be run not just on a single photo but on every frame of a video, so the ball can be tracked over time. Due to variations in lighting, shadows and other effects, the ball on the field is almost certainly not going to have exactly the same RGB value as the target colour, which is why the algorithm looks for the closest match rather than an exact one.
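A minimal Python sketch of this colour-tracking idea follows. It assumes (these details are not from the article) that an image is a list of rows of (R, G, B) tuples and that "closest match" means the smallest Euclidean distance in RGB space. The function names and the sample colours are hypothetical.

```python
import math

def colour_distance(a, b):
    """Euclidean distance between two (R, G, B) colours (an assumed metric)."""
    return math.dist(a, b)

def find_closest_pixel(image, target):
    """Check every pixel and return the (row, col) whose colour best matches target."""
    best_pos, best_dist = None, float("inf")
    for r, row in enumerate(image):
        for c, pixel in enumerate(row):
            d = colour_distance(pixel, target)
            if d < best_dist:  # keep the closest match seen so far
                best_pos, best_dist = (r, c), d
    return best_pos

def track(frames, target):
    """Run the same search on every video frame to follow the object over time."""
    return [find_closest_pixel(frame, target) for frame in frames]

# Usage: a green "field" with a pinkish ball that never exactly matches the target.
GRASS = (40, 160, 60)
TARGET_PINK = (255, 105, 180)
frame1 = [[GRASS] * 4 for _ in range(3)]
frame1[1][2] = (240, 110, 170)   # ball in frame 1, slightly off the target colour
frame2 = [[GRASS] * 4 for _ in range(3)]
frame2[2][3] = (230, 95, 175)    # ball has moved in frame 2
print(track([frame1, frame2], TARGET_PINK))  # [(1, 2), (2, 3)]
```

Because the search keeps the nearest colour instead of demanding equality, the ball is still found even though its pixels differ from the stored target.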

Colour matching alone cannot find more complex features, such as the edges of objects. To make such features recognizable in images, computer vision algorithms have to consider small regions of pixels, called patches. For example, suppose the image has been converted to grayscale. Zooming into a vertical pole in such an image makes it easy to see where the pole's left edge starts, because there is a change in colour that persists vertically across many pixels. This suggests a rule: how likely a pixel is to lie on a vertical edge depends on the magnitude of the difference in colour between some pixels to its left and some pixels to its right. The bigger the colour difference between these two sets of pixels, the more likely the pixel is on an edge; if the difference is small, it is probably not an edge. This operation can be written down as a small grid of weights called a kernel or filter.
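Written as a formula, and assuming a 3x3 neighbourhood (three pixels on each side of the pixel of interest) and a grayscale image I where I(x, y) is the value at column x and row y, the left-versus-right difference turns out to be a weighted sum whose weights form a 3x3 kernel. This particular choice of weights is the vertical Prewitt kernel:

```latex
E_v(x, y) = \sum_{d=-1}^{1} \bigl[ I(x+1,\, y+d) - I(x-1,\, y+d) \bigr]
          = \sum_{i=-1}^{1} \sum_{j=-1}^{1} K_v(i, j)\, I(x+j,\, y+i),
\qquad
K_v = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix}
```

Here i is the row offset and j the column offset inside the kernel, so K_v(i, j) = j. A large |E_v(x, y)| means the pixel is likely to be on a vertical edge.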

A kernel holds the weights for a pixel-wise multiplication. To apply it, the kernel is centred over the pixel of interest, and each grayscale value underneath is multiplied by the corresponding kernel value. All those products are then added up, and the sum becomes the new value of the centre pixel. This operation of applying a kernel to a patch of pixels is called a convolution. If the kernel is applied to every pixel in the photo, the strong vertical edges end up with the largest-magnitude values, while the horizontal edges are almost invisible. To highlight those features, a different kernel that is sensitive to horizontal edges is used. Both of these edge-enhancing kernels are called Prewitt operators.
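The procedure above can be sketched in plain Python as follows, assuming a grayscale image stored as a list of rows of integers and computing outputs only where the 3x3 kernel fits entirely inside the image (so the result is two pixels smaller in each dimension). As in most computer-vision code, the kernel is not flipped, so strictly this is cross-correlation; for the Prewitt kernels, flipping would only change the sign of the result.

```python
PREWITT_VERTICAL = [[-1, 0, 1],
                    [-1, 0, 1],
                    [-1, 0, 1]]
PREWITT_HORIZONTAL = [[-1, -1, -1],
                      [0, 0, 0],
                      [1, 1, 1]]

def convolve(image, kernel):
    """Slide kernel over image and return the map of weighted sums ("valid" mode)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    result = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Multiply each pixel under the kernel by its weight and add them up.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        result.append(row)
    return result

# Usage: a dark background (0) meeting the bright left edge of a pole (255).
pole = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
for row in convolve(pole, PREWITT_VERTICAL):
    print(row)  # [0, 765, 765, 0] on every row: the vertical edge lights up
print(convolve(pole, PREWITT_HORIZONTAL))  # all zeros: no horizontal edges here
```

A light-to-dark edge would give large negative values instead, which is why the magnitude of the response is what indicates an edge.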

Convolutional Neural Networks (CNNs) are currently among the most popular computer vision algorithms. An artificial neuron, the building block of a neural network, takes a series of inputs, multiplies each by a specified weight, and then sums those values together. The input weights are equivalent to kernel values, but unlike a predefined kernel, a neural network can learn its own kernels, which are able to recognize useful structures in images. Convolutional Neural Networks use banks of these neurons to process image data, each neuron outputting a new image, which in turn is processed by subsequent layers of neurons.
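To show how a neuron's weights play the role of a kernel, here is a small Python sketch. The neuron follows the description above (weighted sum only); real networks usually also add a bias and a non-linear activation, which are omitted here. The weights are hand-set to the flattened Prewitt kernels purely for illustration, whereas a trained CNN would learn its own; all function names are hypothetical.

```python
def neuron(inputs, weights):
    """One artificial neuron: multiply each input by its weight and sum."""
    if len(inputs) != len(weights):
        raise ValueError("need exactly one weight per input")
    return sum(x * w for x, w in zip(inputs, weights))

def apply_neuron(image, weights, size=3):
    """Feed every size x size patch (flattened row by row) to the same neuron.

    The outputs form a new, slightly smaller image (a feature map).
    """
    out = []
    for r in range(len(image) - size + 1):
        row = []
        for c in range(len(image[0]) - size + 1):
            patch = [image[r + i][c + j] for i in range(size) for j in range(size)]
            row.append(neuron(patch, weights))
        out.append(row)
    return out

def conv_layer(image, bank):
    """A bank of neurons: one output image per weight vector."""
    return [apply_neuron(image, weights) for weights in bank]

# Hand-set weights equal to the flattened Prewitt kernels (illustrative only).
bank = [
    [-1, 0, 1, -1, 0, 1, -1, 0, 1],  # responds to vertical edges
    [-1, -1, -1, 0, 0, 0, 1, 1, 1],  # responds to horizontal edges
]
pole = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
vertical_map, horizontal_map = conv_layer(pole, bank)
print(vertical_map[0], horizontal_map[0])  # [0, 765, 765, 0] [0, 0, 0, 0]
```

Each neuron in the bank produces its own output image, and those images are what the next layer receives as input.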

The very first convolutional layer finds edges, which can be identified by convolution. The next layer has neurons that convolve on those edge features to detect simple shapes composed of edges, like corners. A layer beyond that might convolve on those corner features and contain neurons that can recognize simple objects, like mouths and eyebrows. This keeps going, building up in complexity, until there is a layer whose convolution puts it all together (eyes, ears, mouth, nose, the whole nine yards) and recognizes a face.
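To make the layering idea concrete, here is a toy two-layer sketch in Python with hand-picked rather than learned weights. Layer 1 produces vertical and horizontal edge maps with the Prewitt kernels; layer 2 is a single neuron that looks at both edge maps at each position (a 1x1 convolution across the two maps) and fires only where both kinds of edge are strong, i.e. at a corner. The absolute value, the bias of -3 and the ReLU-style max(0, ...) are assumptions chosen for this 0/1 test image, not details from the article.

```python
PREWITT_V = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]
PREWITT_H = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]

def convolve(image, kernel):
    """Valid-mode 2D convolution (kernel not flipped, as is usual in vision code)."""
    k = len(kernel)
    return [[sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
             for c in range(len(image[0]) - k + 1)]
            for r in range(len(image) - k + 1)]

def layer1(image):
    """Edge layer: one map per kernel; abs() counts dark-to-light and light-to-dark alike."""
    return [[[abs(v) for v in row] for row in convolve(image, kernel)]
            for kernel in (PREWITT_V, PREWITT_H)]

def layer2(edge_maps, weights=(1, 1), bias=-3):
    """Corner layer: a neuron over both edge maps at each position, then max(0, .)."""
    v_map, h_map = edge_maps
    return [[max(0, weights[0] * v + weights[1] * h + bias) for v, h in zip(v_row, h_row)]
            for v_row, h_row in zip(v_map, h_map)]

# Usage: a 7x7 image whose bottom-right quadrant is a bright square (1s on 0s).
image = [[1 if r >= 3 and c >= 3 else 0 for c in range(7)] for r in range(7)]
corner_map = layer2(layer1(image))
hits = [(r, c) for r, row in enumerate(corner_map) for c, v in enumerate(row) if v > 0]
print(hits)  # [(2, 2)]: output (2, 2) is image pixel (3, 3), the square's corner
```

Along the straight sides of the square only one edge map is active, so the weighted sum never exceeds the bias; only at the corner do both maps respond. In a real CNN these weights and biases would be learned from data rather than set by hand.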
Applications:

Applications of computer vision range from industrial machine vision systems for automatic inspection, through research in artificial intelligence and robotics, to apps used in everyday life. Although in most computer vision applications the computer is pre-programmed to solve a particular task, methods such as neural networks and fuzzy logic, many of them based on learning, have recently become increasingly common and now reach many aspects of human life.

For example, in the medical field, computer vision is applied to medical image processing, which involves extracting features from images (usually CT scans and X-rays) to diagnose a patient. It has been used to detect tumours and identify their type, and to measure blood flow, organ dimensions and so on. During the ongoing COVID-19 pandemic, there has been extensive research into using image processing to identify COVID-19 pneumonia from chest X-rays more accurately, more easily and more quickly. This would be a major milestone, as it would allow COVID-19 patients to be detected faster and therefore isolated sooner. It could also reduce how often medical staff need to examine patients in person to confirm COVID-19, lowering their own risk of infection.

Computer vision applied in industry is commonly known as machine vision. Here, information is extracted from images to support the production process. It is used for quality control, where products are inspected automatically to find defects; for measuring the position and orientation of parts to be picked up by a robot; and for optical sorting, where undesirable products are removed from bulk material.

The military is probably one of the largest application domains of computer vision. Uses range from simple tasks, like detecting enemy soldiers or vehicles, to more advanced systems such as missile guidance, where the missile is sent to an area rather than a specific target, and the target is selected once the missile reaches the area, based on locally acquired image data. Another is battlefield awareness, where various sensors provide a rich set of information about a combat scene that is used to support strategic decisions.

Autonomous vehicles are one of the more recent applications of computer vision. Their level of autonomy ranges from fully autonomous vehicles to vehicles in which computer vision-based systems assist the driver in various situations.

Other application areas of computer vision include tactile feedback, visual effects (for cinema, broadcast, gaming, animation, etc.), surveillance, driver drowsiness detection, image transformation using GANs, meteorology and robotics.
