A stereo camera is a type of camera with two or more image sensors. This allows the camera to simulate human binocular vision and therefore gives it the ability to perceive depth.
Human binocular vision
Human binocular vision perceives depth through stereo disparity: the difference in the image location of an object as seen by the left and right eyes, which results from the eyes' horizontal separation.
The brain uses this binocular disparity to extract depth information from the two-dimensional retinal images, a process known as stereopsis.
Similarly, some stereo cameras, such as Tara and TaraXL, mimic stereopsis to perceive depth. The depth is computed with a geometric approach called triangulation.
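The triangulation step can be written down compactly. As a sketch, assume two identical, rectified cameras with focal length f, separated by a baseline B, and a point at lateral position X and depth Z (measured from the left camera) whose image lands at x_L in the left view and x_R in the right view. These symbols are our own notation for illustration. Similar triangles then give:

```latex
\begin{align}
x_L &= \frac{f X}{Z}, \qquad x_R = \frac{f (X - B)}{Z} \\
d &= x_L - x_R = \frac{f B}{Z}
\quad\Longrightarrow\quad
Z = \frac{f B}{d}
\end{align}
```

Here d is the disparity. A large disparity means a nearby point, and a disparity close to zero means a point that is very far away.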
Stereo Disparity in Cameras
Stereo disparity in a camera is found by taking two 2D images from different positions and correlating them; the resulting correspondences are used to create a depth image. To find correlations, however, both images need sufficient detail and texture (non-uniformity).
Stereo vision is therefore well suited to applications with a large field of view and to outdoor use.
When a scene lacks such detail, it can be added by illuminating the scene with structured lighting.
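To make the correlation step concrete, here is a minimal sketch of block matching on a single rectified scanline, using the sum of absolute differences (SAD) as the matching cost. It is only an illustration under simplifying assumptions, not how any particular camera or SDK does it; the function name `match_scanline` and the synthetic data are hypothetical.

```python
import random

def match_scanline(left, right, max_disp, half_win=2):
    """Per-pixel disparity for one rectified scanline via SAD block matching.

    Assumes a point at column x in the left view appears at column x - d
    in the right view, where d is the disparity.
    """
    n = len(left)
    disparities = [0] * n
    for x in range(half_win + max_disp, n - half_win):
        best_d, best_cost = 0, float("inf")
        for d in range(max_disp + 1):
            # Compare the window around x (left) with the window around x - d (right).
            cost = sum(abs(left[x + k] - right[x - d + k])
                       for k in range(-half_win, half_win + 1))
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities[x] = best_d
    return disparities

# Textured scanline: random intensities, right view shifted by 5 pixels.
rng = random.Random(0)
left = [rng.random() for _ in range(80)]
true_disp = 5
right = left[true_disp:] + [0.0] * true_disp  # right[j] == left[j + 5]
textured = match_scanline(left, right, max_disp=10)

# Untextured scanline: every candidate window looks identical, so all
# costs tie and the matcher simply keeps the first candidate (0).
flat = [0.5] * 80
untextured = match_scanline(flat, flat, max_disp=10)

print(set(textured[12:78]))   # disparities found in the valid region
print(set(untextured))        # ambiguous result on the flat scanline
```

On the textured scanline the matcher recovers the shift, while on the flat one every candidate disparity has the same cost, which is why texture (natural or projected) matters.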
Depth Perception Technologies
Capturing the third dimension can be done in many different ways, and each of the machine vision technologies available has its own pros and cons. Three-dimensional imaging can be broken into two main categories: passive and active, which can be further broken down into specific techniques.
The main Passive techniques are:
- Depth from focus
- Light field
The main Active techniques are:
- Structured light
Classification of Stereo Depth Perception
1. Passive Stereo
A passive stereo system depends on the light available in the environment and does not employ any external light source.
Passive stereo is suitable for well-lit, textured regions and works well in sunlight.
Advantages:
- Performs well in sunlight
- Cost-effective
Limitations:
- Mediocre performance in low light
- Mediocre performance in non-textured scenes
2. Active Stereo
Active stereo vision is a form of stereo vision that actively projects light, such as a laser or structured light, to simplify the stereo matching problem.
Active stereo is useful in regions that lack light and/or texture. An infrared projector or another light source floods the scene with texture, removing the dependency on an external light source. It has drawbacks as well: active stereo loses its effectiveness in direct sunlight and in regions with heavy interference from other devices that use the same light source technology.
Advantages:
- Performs well in low light
- Performs well in non-textured indoor scenes
- Can be used as a hybrid time-of-flight and stereo triangulation depth-perceiving technology
Limitations:
- Under sunlight, it performs no better than passive stereo
- Over long range, it performs no better than passive stereo
- The IR projector adds to the cost
What determines the depth range in Stereo vision?
- Baseline
- Resolution
- Focal length
The distance between the two cameras is called the baseline. For human eyes, it is about 50–75 mm (the interpupillary distance), depending on the individual.
The baseline of Tara and TaraXL is 60 mm, which is similar to the average human baseline. The depth range is directly proportional to the baseline: the longer the baseline, the greater the depth we can cover with good accuracy.
The resolution of the two cameras is also directly proportional to the depth range.
The higher the number of pixels to search, the higher the number of disparity levels. So, at higher resolutions there are more disparity levels, but at the cost of a higher computational load.
The focal length of the lens is directly proportional to the depth range as well.
The longer the focal length, the farther we can see, but with a reduced field of view. With a shorter focal length, we see nearer depths with a wider field of view.
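The effect of these three properties can be sketched with the standard triangulation relation Z = f × B / d, where f is the focal length expressed in pixels, B is the baseline, and d is the disparity in pixels. The numbers below are hypothetical and chosen only for illustration (the 60 mm baseline is the one quoted above; the 700 px focal length is an assumption). Note that, for the same lens and field of view, doubling the sensor resolution doubles the focal length in pixels, so resolution and focal length enter the formula in the same place.

```python
def depth_mm(disparity_px, focal_px, baseline_mm):
    """Triangulated depth Z = f * B / d for a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero means a point at infinity")
    return focal_px * baseline_mm / disparity_px

# Depth that corresponds to the smallest measurable disparity (1 pixel here),
# i.e. a rough proxy for the farthest distinguishable depth.
base = depth_mm(1, focal_px=700, baseline_mm=60)             # hypothetical rig
wider_baseline = depth_mm(1, focal_px=700, baseline_mm=120)  # baseline doubled
longer_focal = depth_mm(1, focal_px=1400, baseline_mm=60)    # focal length (px) doubled

print(base / 1000, wider_baseline / 1000, longer_focal / 1000)  # metres
```

Doubling either the baseline or the focal length in pixels doubles the depth associated with a one-pixel disparity, which is the proportionality described above.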
To learn more about these properties and how to select your Stereo Camera, take a look at the following blogs.
Long Range Depth Sensing
Theoretically, a stereo camera can cover infinite depth even with a 60 mm baseline, but the depth error increases quadratically with distance.
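The quadratic growth follows from differentiating Z = f × B / d with respect to the disparity d: a small disparity error Δd produces a depth error of roughly ΔZ ≈ Z² × Δd / (f × B). The sketch below evaluates this approximation for a hypothetical rig (60 mm baseline, an assumed 700 px focal length, and an assumed matching error of one pixel); the function name and values are illustrative only.

```python
def depth_error_mm(depth_mm, focal_px, baseline_mm, disparity_error_px=1.0):
    """First-order depth uncertainty: dZ ~= Z^2 * dd / (f * B)."""
    return depth_mm ** 2 * disparity_error_px / (focal_px * baseline_mm)

# Hypothetical rig: 700 px focal length, 60 mm baseline, 1 px disparity error.
errors = {z: depth_error_mm(z, focal_px=700, baseline_mm=60)
          for z in (1000, 2000, 4000, 8000)}

for z, err in errors.items():
    print(f"depth {z / 1000:.0f} m -> error of about {err:.0f} mm")
```

Each doubling of the distance multiplies the error by four, so an error that is negligible at 1 m becomes significant at 8 m.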
As stated above, a longer baseline improves depth accuracy over distance, so you might ask:
“How can the human eye perceive so much distance with just a 50–75 mm baseline?”
The answer is that the resolution of the human eye is very high (an often-cited estimate is ~576 megapixels), which enables it to perceive greater depths.
Such a resolution is not achievable with today’s sensor technology, and even if we had a camera able to deliver 576 MP images, we would not have the processing capability to handle images of that size.
So the resolution bottleneck restricts our depth range. We can compensate by increasing the baseline, but that in turn increases the minimum distance at which depth can be perceived.
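The trade-off with near-range depth can be sketched the same way. A stereo matcher searches only up to some maximum disparity, and the nearest measurable depth is Z_min = f × B / d_max. The values below (700 px focal length, 128 px search range) are assumptions for illustration, not figures for any specific camera.

```python
def min_depth_mm(focal_px, baseline_mm, max_disparity_px):
    """Nearest measurable depth for a given disparity search range."""
    return focal_px * baseline_mm / max_disparity_px

# Hypothetical matcher that searches disparities up to 128 px.
near_60 = min_depth_mm(focal_px=700, baseline_mm=60, max_disparity_px=128)
near_120 = min_depth_mm(focal_px=700, baseline_mm=120, max_disparity_px=128)

print(near_60, near_120)  # nearest depth in mm for each baseline
```

Doubling the baseline doubles the nearest measurable depth unless the disparity search range is widened too, which adds to the computational load.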
At higher resolutions and longer baselines, the stereo correspondence problem is also amplified and the computational load increases. GPUs can address this to some extent.
To learn about the need for GPUs, see The Rise of GPU.