What Is Computer Vision
Let’s revisit our example from the previous chapter: kicking a ball. As we have seen, this involves multiple tasks that our brains perform in a split second. Extracting meaningful information from an image input is at the core of computer vision. But what is computer vision?
Definition
Computer vision is the science and technology of making machines see. It involves the development of theoretical and algorithmic methods to acquire, process, analyze, and understand visual data, and to use this information to produce meaningful representations, descriptions, and interpretations of the world (Forsyth & Ponce, Computer Vision: A Modern Approach).
Deep Learning and the Computer Vision Renaissance
The evolution of computer vision has been marked by a series of incremental advances across its interdisciplinary fields, with each step forward bringing breakthrough algorithms, hardware, and data that gave the field more power and flexibility. One such leap was the widespread adoption of deep learning methods.
Initially, to extract and learn information from an image, you extract features through image-preprocessing techniques (see Pre-processing for Computer Vision Tasks). Once you have a group of features describing your image, you apply a classical machine learning algorithm to your dataset of features. This strategy already simplifies things compared to hard-coded rules, but it still relies on domain knowledge and exhaustive feature engineering. The state-of-the-art approach arises when deep learning methods and large datasets meet. Deep learning (DL) allows machines to learn complex features automatically from raw data. This paradigm shift allowed us to build more adaptive and sophisticated models, causing a renaissance in the field.
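To make the contrast concrete, here is a minimal sketch of both strategies. It assumes scikit-image, scikit-learn, and PyTorch are installed; `images` and `labels` are placeholders for a dataset of same-sized grayscale images that you would supply yourself, and the hyperparameters are illustrative.

```python
import numpy as np
from skimage.feature import hog      # hand-crafted feature descriptor
from sklearn.svm import LinearSVC    # classical ML classifier
import torch.nn as nn


def classical_pipeline(images, labels):
    """Feature engineering: HOG descriptors + a linear SVM."""
    features = np.stack([
        hog(img, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
        for img in images             # each img: a same-sized 2D grayscale array
    ])
    return LinearSVC().fit(features, labels)


class TinyCNN(nn.Module):
    """Deep learning: the convolutional layers learn features from raw pixels."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):             # x: (batch, 1, height, width) raw images
        return self.classifier(self.features(x).flatten(1))
```

The first pipeline works only as well as the HOG descriptor captures what matters in your images; the second learns its own feature hierarchy during training, at the cost of needing more data and compute.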
The seeds of computer vision were sown long before the rise of deep learning models. As early as the 1960s, pioneers like David Marr and Hans Moravec wrestled with the fundamental question: can we get machines to see? Early breakthroughs, such as edge detection algorithms and object recognition, were achieved with a mix of cleverness and brute force, laying the groundwork for developing computer vision systems. Over time, as research and development advanced and hardware capabilities improved, the computer vision community expanded exponentially. This vibrant community is composed of researchers, engineers, data scientists, and passionate hobbyists across the globe, coming from a vast array of disciplines. With open-source and community-driven projects, we are witnessing democratized access to cutting-edge tools and technologies, helping to create a renaissance in this field.
Interdisciplinarity with Other Fields and Image Understanding
Just as it is hard to draw a line separating artificial intelligence and computer vision, it is also hard to separate computer vision from its neighboring fields. Take image preprocessing and analysis as an example. A tentative separation is that the input and output of image analysis are always images. However, this is a shortsighted take: by that criterion, even simple tasks, such as calculating the median value of an image, would be classified under computer vision. To clarify their differences, we must introduce a new concept: image understanding.
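As a tiny illustration of that borderline case, here is a sketch assuming NumPy and scikit-image are available; `image.png` is a placeholder path.

```python
import numpy as np
from skimage import io

image = io.imread("image.png")                  # placeholder input image
print("Median pixel value:", np.median(image))  # image in, a single number out
```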
Image understanding is the process of making sense of the content of an image. It can be described at three different levels:
Low-level processes are primitive operations on images (e.g., image sharpening, changing the contrast). Both the input and the output are images.
Mid-level processes include segmentation, description of objects, and object classification. The input is an image, but the output consists of attributes associated with that image. These processes can be achieved with a combination of image preprocessing and machine learning algorithms.
High-level processes involve making sense of an image as a whole, e.g., recognizing a given object, scene reconstruction, and image-to-text. These are tasks typically associated with human cognition.
Image analysis is mainly concerned with low- and mid-level processes, while computer vision focuses on mid- and high-level processes. Thus, the two fields overlap in the mid-level processes. The sketch below shows a low-level and a mid-level operation side by side.
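It is a minimal example assuming scikit-image is installed, with `image.png` as a placeholder for an RGB input image; the parameter values are illustrative.

```python
from skimage import filters, io
from skimage.color import rgb2gray
from skimage.measure import label, regionprops

image = rgb2gray(io.imread("image.png"))        # assumes an RGB input image

# Low-level: image in, image out (sharpening via unsharp masking).
sharpened = filters.unsharp_mask(image, radius=2, amount=1.5)

# Mid-level: image in, attributes out (Otsu thresholding + region statistics).
mask = image > filters.threshold_otsu(image)
regions = regionprops(label(mask))
print(f"{len(regions)} regions found; largest covers "
      f"{max((r.area for r in regions), default=0)} pixels")
```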
It is essential to keep this in mind: allocating resources to develop a sophisticated model, such as a neural network, might not be appropriate for data-poor scenarios or simple images. From a business point of view, model development costs time and money, so it is necessary to know when to use the right tools.
It is common to combine a preprocessing step with a more robust model downstream. Conversely, the layers of a neural network sometimes perform such tasks automatically, eliminating the need for explicit preprocessing. For those familiar with data science, image analysis can act as a first exploratory data analysis step. Lastly, classical image analysis methods can also be used for data augmentation to improve the quality and diversity of the training data for computer vision models.
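For instance, a few classical operations strung together already make a useful augmentation pipeline. The sketch below assumes torchvision is installed; the chosen transforms and their parameters are illustrative, not a recommendation.

```python
from torchvision import transforms

# Classical image operations repurposed as random training-time augmentation.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                 # geometric: mirror
    transforms.RandomRotation(degrees=10),                  # geometric: small tilt
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # photometric change
    transforms.GaussianBlur(kernel_size=3),                 # classical filtering
    transforms.ToTensor(),
])
```

Such a pipeline is typically passed as the `transform` argument of a dataset so that each training image is perturbed slightly differently on every epoch.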
Computer Vision Tasks Overview
We have seen before that computer vision is really hard for computers because they have no prior knowledge of the world. In our example, we start out knowing what a ball is, how to track its movement, how objects usually move in space, how to estimate when the ball will reach us, where our foot is, how a foot moves, and how to estimate how much force is needed to hit the ball. If we were to break this down into specific computer vision tasks (one of which is sketched in code after the list), we would have:
- Scene Recognition
- Object Recognition
- Object Detection
- Segmentation (instance, semantic)
- Tracking
- Dynamic Environment Adaptation
- Path Planning
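To make one of these tasks concrete, here is a minimal sketch of object detection using the Hugging Face `transformers` pipeline. It assumes `transformers` (plus `timm` and `Pillow` for the default detection model) is installed, and `ball_image.jpg` is a placeholder for an image you supply; the pipeline downloads whatever default checkpoint the library currently ships.

```python
from transformers import pipeline

detector = pipeline("object-detection")     # loads a default pretrained detector
detections = detector("ball_image.jpg")     # a local path, URL, or PIL image

for det in detections:
    # Each detection carries a class label, a confidence score, and a bounding box.
    print(f"{det['label']:>12}  score={det['score']:.2f}  box={det['box']}")
```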
You will read more about the core tasks of computer vision in the Computer Vision Tasks chapter. But there are many more tasks that computer vision can do! Here is a non-exhaustive list:
- Image Captioning
- Image Classification
- Image Description
- Anomaly Detection
- Image Generation
- Image Restoration
- Autonomous Exploration
- Localization
Task Complexity
The complexity of a given task in the realm of image analysis and computer vision is not solely determined by how noble or difficult a question or task may seem to an informed audience. Instead, it primarily hinges on the properties of the image or data being analyzed. Take, for example, the task of identifying a pedestrian in an image. To a human observer, this might appear straightforward and relatively simple, as we are adept at recognizing people. However, from a computational perspective, the complexity of this task can vary significantly based on factors such as lighting conditions, the presence of occlusions, the resolution of the image, and the quality of the camera. In low-light conditions or with pixelated images, even the seemingly basic task of pedestrian detection can become exceedingly complex for computer vision algorithms, requiring advanced image enhancement and machine learning techniques. Therefore, the challenge in image analysis and computer vision often lies not in the inherent nobility of a task, but in the intricacies of the visual data and the computational methods required to extract meaningful insights from it.
Link to Computer Vision Applications
As a field, computer vision has a growing importance in society. There are many ethical considerations regarding its applications. For example, a model that is deployed to detect cancer can have terrible consequences if it classifies a cancer sample as healthy. Surveillance technology, such as models that are capable of tracking people, also raises a lot of privacy concerns. This will be discussed in detail in “Unit 12 - Ethics and Biases”. We will give you a taste of some of its cool applications in “Applications of Computer Vision”.