What is computer vision?

Feb 13, 2023

In artificial intelligence (AI), computer vision is how computers acquire and derive meaning from visual input, whether that’s digital imagery or video footage. Computer systems cannot literally see, but they can use machine learning to recognise patterns and so identify objects.

Using cameras, data, and algorithms combined with deep learning, computers can develop capabilities such as object recognition and image processing. Machines have an advantage in that they can process vast amounts of data in a short time. However, they lack the contextual experience that human beings have. It has also become evident that the complexity of human vision, which involves nuanced interaction between the eyes and the brain, is difficult to replicate in neural networks. For example, inferring human emotions from facial expressions remains difficult for computers.

How is computer vision developed?

This subfield of artificial intelligence is honed by having the computer detect and label objects within images over and over again. By training on vast labelled datasets, computers become more familiar with particular objects and no longer need hand-coded rules to recognise them. To get to this point, computer vision draws on many disciplines, such as pattern recognition, digital image processing, scientific computing, and mathematics, to carry out deep learning.

Image classification is a supervised learning problem in which the computer interprets an image in its entirety rather than by its individual parts. The image is assigned a label, or class, which helps the computer recognise similar images when it sees them later. Labels also make it easier for the computer to retrieve all digital images of the same class when you do a search. The computer system can learn as it goes along, for example by noticing that you like taking photos of sunsets and grouping them into an album.
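As a minimal sketch of supervised classification, assume tiny 2x2 greyscale images stored as flat lists and two hypothetical labels, "sunset" and "night". Neither the toy data nor the nearest-centroid method comes from any particular product; real systems typically use deep neural networks. The idea is the same, though: learn from labelled examples, then assign a label to a new image.

```python
from collections import defaultdict
from math import dist


def train_centroids(images, labels):
    """Average the pixel vectors of each class (the 'learning' step)."""
    grouped = defaultdict(list)
    for pixels, label in zip(images, labels):
        grouped[label].append(pixels)
    return {
        label: [sum(channel) / len(group) for channel in zip(*group)]
        for label, group in grouped.items()
    }


def classify(centroids, pixels):
    """Return the label whose average image is closest to the new image."""
    return min(centroids, key=lambda label: dist(centroids[label], pixels))


# Hypothetical toy data: flattened 2x2 greyscale images (0 = black, 1 = white).
train_images = [
    [0.9, 0.8, 0.7, 0.9],  # bright image
    [0.8, 0.9, 0.9, 0.7],  # bright image
    [0.1, 0.0, 0.2, 0.1],  # dark image
    [0.2, 0.1, 0.0, 0.1],  # dark image
]
train_labels = ["sunset", "sunset", "night", "night"]

centroids = train_centroids(train_images, train_labels)
prediction = classify(centroids, [0.85, 0.75, 0.8, 0.9])
print(prediction)
```

The classifier never sees a hand-written rule such as "bright means sunset"; it derives the distinction from the labelled examples, which is what makes the problem supervised.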

Even before an image reaches the labelling stage, the objects within it are detected and outlined. Beyond basic object detection there is also:

  • Discriminative object tracking – this separates the target object from the background as it moves through a video by learning a decision boundary between the two.
  • Semantic segmentation – this assigns every pixel in an image to a class, so that pixels belonging to the same kind of object are grouped together. This gives the computer a deeper comprehension of each part of the image.
  • Instance segmentation – this goes a step further and separates individual instances of the same class, identifying their differences, boundaries, and defining characteristics.
  • Image reconstruction – this restores degraded or incomplete images, for example by removing noise or filling in missing regions.
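The difference between semantic and instance segmentation can be sketched in a few lines. This is an illustration under stated assumptions, not how production systems work: a simple brightness threshold stands in for a learned per-pixel classifier, and the toy image and function names are hypothetical. The semantic step labels each pixel as "object" or "background"; the instance step then gives each connected group of object pixels its own id.

```python
from collections import deque


def semantic_mask(image, threshold=0.5):
    """Label each pixel 1 (object) or 0 (background) by brightness."""
    return [[1 if value > threshold else 0 for value in row] for row in image]


def instance_labels(mask):
    """Give each 4-connected group of object pixels its own instance id."""
    height, width = len(mask), len(mask[0])
    labels = [[0] * width for _ in range(height)]
    next_id = 0
    for r in range(height):
        for c in range(width):
            if mask[r][c] == 1 and labels[r][c] == 0:
                next_id += 1
                labels[r][c] = next_id
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one object
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = next_id
                            queue.append((ny, nx))
    return labels, next_id


# Hypothetical 4x5 greyscale image containing two bright objects.
image = [
    [0.9, 0.8, 0.1, 0.1, 0.0],
    [0.9, 0.7, 0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1, 0.8, 0.9],
    [0.0, 0.1, 0.0, 0.9, 0.7],
]
mask = semantic_mask(image)
labels, instance_count = instance_labels(mask)
print(instance_count)
```

In the semantic mask both objects share the class 1; only the instance labels tell them apart.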

Within deep learning, the main architecture used for computer vision is the convolutional neural network (CNN). A CNN is a type of artificial neural network (ANN) widely used in object recognition and classification. It uses multiple hidden layers between the input layer and the output layer, which lets it recognise more complex imagery that can’t be identified through outline alone. The mechanism roughly mimics human vision: convolution and pooling layers reduce images into a form that is easier to process without losing the features that are critical to getting a good prediction.
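To make the idea of reducing an image while keeping its important features concrete, here is a rough sketch of the three operations found in a typical CNN layer: a convolution, a ReLU activation, and max pooling. The image and the vertical-edge kernel are hand-picked for illustration; in a trained CNN the kernel weights are learned from data, and a library such as PyTorch or TensorFlow would normally do this work.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    Like most deep learning libraries, this does not flip the kernel,
    so strictly speaking it computes a cross-correlation.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero."""
    return [[max(0, value) for value in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]


# Hypothetical 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [
    [-1, 1],
    [-1, 1],
]

features = relu(convolve2d(image, kernel))  # 4x4 feature map
pooled = max_pool(features)                 # 2x2 summary
print(pooled)  # [[2, 0], [2, 0]]
```

The 25 input pixels shrink to four numbers, yet the pooled output still records that an edge exists on the left-hand side of the image, which is the feature a later layer would need.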

What are some examples of computer vision applications?

Facial recognition is perhaps one of the best-known ways that computer vision has become integrated into everyday life. It is used at airports to support automated border control posts. In this example, a computer algorithm compares a snapshot of the traveller’s face taken at the checkpoint with the person’s passport picture. Other biometric measurements that are often used to verify someone’s identity include iris and fingerprint scanning. Devices such as smartphones, laptops and tablets can now be unlocked with face recognition or fingerprint identification.
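One common way to compare two face images, sketched here under assumptions, is to turn each face into a list of numbers (an embedding) and measure how similar the two lists are. The embeddings, the 0.9 threshold, and the function names below are all hypothetical; in practice a trained neural network produces the embeddings, and the source does not say which method any particular border control system uses.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Similarity of two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def same_person(embedding_a, embedding_b, threshold=0.9):
    """Accept the match only if the faces are similar enough.

    The threshold is a hypothetical value chosen for illustration.
    """
    return cosine_similarity(embedding_a, embedding_b) >= threshold


# Made-up 4-number embeddings; real ones have hundreds of dimensions.
passport_photo = [0.12, 0.80, 0.35, 0.45]
checkpoint_snapshot = [0.10, 0.82, 0.33, 0.47]
different_traveller = [0.90, 0.10, 0.40, 0.05]

match = same_person(passport_photo, checkpoint_snapshot)
mismatch = same_person(passport_photo, different_traveller)
print(match, mismatch)
```

Choosing the threshold is a trade-off: set it too low and strangers are accepted, too high and genuine travellers are rejected because of a new haircut or different lighting.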

Self-driving cars rely heavily on computer vision to help the vehicle’s internal computer systems interpret the mass of visual input from the car’s cameras and other sensors. Some autonomous vehicles use lidar to support their cameras. Lidar is like radar, but it uses light instead of radio waves: it builds a 3D picture of its surroundings by emitting laser pulses and timing how long each one takes to return.
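The time-of-flight arithmetic behind lidar is short enough to sketch. The pulse travels to the object and back, so the distance is half the round-trip time multiplied by the speed of light. The function names and the axis convention used to turn a distance and a beam direction into a 3D point are assumptions for illustration; real sensors define their own coordinate frames.

```python
from math import cos, radians, sin

SPEED_OF_LIGHT_M_PER_S = 299_792_458


def lidar_distance_m(round_trip_seconds):
    """Distance to the reflecting surface from a pulse's round-trip time."""
    return SPEED_OF_LIGHT_M_PER_S * round_trip_seconds / 2


def to_point(distance_m, azimuth_deg, elevation_deg):
    """Convert a range and beam direction into an (x, y, z) point.

    Assumed convention: x forward, y left, z up.
    """
    azimuth = radians(azimuth_deg)
    elevation = radians(elevation_deg)
    x = distance_m * cos(elevation) * cos(azimuth)
    y = distance_m * cos(elevation) * sin(azimuth)
    z = distance_m * sin(elevation)
    return x, y, z


# A pulse that returns after 200 nanoseconds hit something about 30 m away.
distance = lidar_distance_m(200e-9)
point = to_point(distance, azimuth_deg=0, elevation_deg=0)
print(round(distance, 2))  # 29.98
```

Repeating this for many beam directions per second yields the cloud of 3D points that the vehicle combines with its camera images.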

Optical character recognition (OCR) converts printed paper documents into machine-readable text documents. OCR is a mature field of research in pattern recognition that has its foundations in the creation of reading devices for the blind. Early versions of this machine vision required training with images of each character, one font at a time. OCR technology has advanced beyond those early stages and is now used in data mining for real-time information extraction. It is also part of how Google Translate works when, for instance, you point your phone’s camera at a sign in a foreign language: the text is first recognised in the image and then translated.
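The early, one-font-at-a-time approach can be sketched as template matching: store a reference bitmap for each character and pick the template that agrees with the scanned glyph on the most pixels. The 3x3 bitmaps and function names here are hypothetical, and modern OCR engines use far more robust learned models.

```python
# Hypothetical 3x3 bitmaps for a single "font" ("1" = ink, "0" = paper).
TEMPLATES = {
    "I": ["010", "010", "010"],
    "L": ["100", "100", "111"],
    "T": ["111", "010", "010"],
}


def matching_pixels(glyph, template):
    """Count the pixels on which the glyph and the template agree."""
    return sum(
        g == t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognise(glyph):
    """Return the character whose template best matches the glyph."""
    return max(TEMPLATES, key=lambda ch: matching_pixels(glyph, TEMPLATES[ch]))


# A scanned "T" with one pixel flipped by noise.
noisy_t = ["111", "010", "011"]

scanned = [TEMPLATES["L"], TEMPLATES["I"], noisy_t]
text = "".join(recognise(glyph) for glyph in scanned)
print(text)  # LIT
```

The noisy glyph is still read correctly because it agrees with the "T" template on eight of nine pixels. A glyph from a different font would match poorly, which is why these early systems needed a separate set of templates per font.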

What are the advantages and disadvantages of computer vision?

Applications of computer vision are diverse and wide-ranging, from medical imaging that detects cancerous tumours at an early stage to helping farmers spot the signs of plant disease before it progresses. According to SAS Institute, the accuracy rates for object identification and classification have leapt from 50 percent to 99 percent in less than a decade, and today’s computer vision systems can be more accurate and faster than humans at detecting and reacting to visual inputs.

However, computer vision is not without its faults, and decision-making should not be left entirely to a computer based on visual information alone. Computer vision algorithms are like any other algorithm – they are only as good as the training material they are given. There have been many cases of bias being encoded into a system through the choices of the people who labelled the original training material.

Studies have shown that CNNs trained on ImageNet and other popular datasets sometimes fail to detect objects when they are seen under different lighting conditions or from new angles. Facebook also famously censored the 30,000-year-old Venus of Willendorf after a CNN deemed it pornographic.

Enhance your knowledge of AI with a master’s

The field of artificial intelligence continues to make great strides with deep learning models that support machines in completing computer vision tasks. These ultimately help to solve real-world problems in industries such as automotive and healthcare. The field of computer vision is exciting and challenging, with many opportunities to make current technologies even better.

Find out more about how an MSc Computer Science with Artificial Intelligence from Abertay University could equip you with all the skills you need to embark upon a new career path or consolidate your knowledge and experience to expand your job options.