30 November, 2018

Continuous Video Stream Processing Integration

With the abundance of cheap processing power today (especially on phones), I'm wondering what information we can build from the continuous video stream of a camera - basically what the visual processing part of the brain of all sighted living things does. Even flying insects have to build a map of their surroundings quickly, accurately, and without many resources. And a lot of this stuff our brains do is behind the scenes to us: we only get the final picture, not the raw stream or how it was processed. It seems obvious to me that visual cortexes don't process individual snapshots; they look at the continuous "video stream" and use context to determine size, distance, true colour, and so on.

Don't believe me? Look far away through a mesh like a screen door or lace. Keep your head still. Now move your head around. Notice how the "picture" is much clearer? Your mind is using the different perspectives to fill in the blocked areas.

I mean, we're already doing this for panoramic photos - taking each new frame, determining the overlap and adding the new region to the compilation.
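To make that concrete, here's a minimal sketch of that stitching loop, assuming OpenCV and roughly overlapping frames. The function name, canvas size, and match count are my own illustration, and OpenCV's stitching module does all of this far more robustly:

```python
# Rough panorama-style stitching: match features between the running panorama
# and a new frame, estimate the overlap as a homography, warp the new frame in.
import cv2
import numpy as np

def add_frame(panorama, new_frame):
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(panorama, None)
    kp2, des2 = orb.detectAndCompute(new_frame, None)

    # Match descriptors and keep the strongest correspondences
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:50]

    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

    # The homography describes how the new frame overlaps the existing panorama
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

    # Warp the new frame onto a canvas big enough to hold both, keep old pixels on top
    h, w = panorama.shape[:2]
    canvas = cv2.warpPerspective(new_frame, H, (w * 2, h))
    canvas[0:h, 0:w] = panorama
    return canvas
```

In practice cv2.Stitcher_create() wraps all of this (plus blending) in one call; the point is just that the overlap between consecutive frames is something you can estimate and reuse.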

What's the point? Well, we could do things like build a 3D model from just a video feed, something that usually needs a laser grid. Kind of like a CAT scan, come to think of it. Other things like determining an object's true colour, especially for shiny or iridescent surfaces, or compensating for damaged optics like scratched lenses. Actually, software-based CCD dust removal sort of does this already.
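The 3D-from-video part is essentially what structure-from-motion pipelines do. Here's a rough two-frame sketch of the idea, assuming OpenCV and a known camera matrix K (the intrinsics would come from calibration); a real pipeline chains many frames, keeps only inlier matches, and refines everything with bundle adjustment:

```python
# Two-view structure: estimate how the camera moved between two frames,
# then triangulate matched points into a sparse 3D point cloud.
import cv2
import numpy as np

def two_view_points(frame1, frame2, K):
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(frame1, None)
    kp2, des2 = orb.detectAndCompute(frame2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)

    pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

    # The essential matrix encodes the camera's relative rotation and translation
    E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
    _, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

    # Triangulate matched points into 3D (only up to an unknown overall scale)
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
    return (pts4d[:3] / pts4d[3]).T   # N x 3 point cloud
```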

iOS 12's new Augmented Reality measurement tool is a good example. I believe it uses the gyro and accelerometers, and the camera. You let it look around to calibrate, then it can measure lines (e.g. furniture and room sizes) from a distance. Like your brain, it can compensate for further objects appearing smaller.
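That "further objects appear smaller" compensation is basically the pinhole-camera relation: apparent size shrinks in proportion to distance, so once you know the distance you can recover the real size. A toy example (the focal length is in pixel units, and all the numbers here are made up):

```python
# Under a pinhole model: real size = pixel size * distance / focal length (px)
def real_size(pixel_size_px, distance_m, focal_length_px):
    return pixel_size_px * distance_m / focal_length_px

# A 300-pixel-wide sofa seen 3 m away by a camera with a ~1500 px focal length
print(real_size(300, 3.0, 1500))   # -> 0.6 metres wide
```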

The next step I'm thinking of is what hardware can improve this. Or even better, which additional sensors can improve the quality of what we can process, at the least cost. Things like (roughly increasing cost/complexity):
  • Using the focus distance from an autofocus system to get rough distance data
  • Motion data. Our visual and balance systems are tightly connected, hence motion sickness and vertigo.
  • Dual cameras for parallax - just like nature! (see the sketch after this list)
  • Laser grid for accurate distance measurements - like the Kinect and Apple's Face ID
  • Bracketing in focus, exposure (HDR), or white balance - which calls for faster image capture, i.e. faster processing and shorter shutter times - so multiple exposures can be processed into one picture
  • Now, the most complicated upgrade would be to build an OBJECT DATABASE. Think about it - when you look at something new, you try to identify familiar elements and match them to things you already know: size, shape, colour, surface details, movement.
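For the dual-camera item above, here's a minimal parallax-to-depth sketch, assuming already-rectified left/right images and made-up focal length and baseline values (a real phone camera pair would need calibration first):

```python
# Stereo block matching: find how far each pixel shifts between the left and
# right images (disparity), then convert disparity to depth.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)     # hypothetical files
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point output

focal_px = 700.0      # focal length in pixels (assumed)
baseline_m = 0.06     # distance between the two cameras (assumed)

valid = disparity > 0
depth_m = np.zeros_like(disparity)
depth_m[valid] = focal_px * baseline_m / disparity[valid]
```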

They're already working on things like machine vision and captchas and recognising cats, but I think the key difference here is CONTINUOUS VIDEO. It's what nature relies on, and it can give so much more information.
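As a small taste of what continuous video buys over single frames, here's a hedged sketch: track the same corner points from frame to frame with Lucas-Kanade optical flow, so their motion accumulates over time instead of starting from scratch with each snapshot. The camera index is an assumption; any webcam or video file would do:

```python
# Track corner points across a live video stream with Lucas-Kanade optical flow.
import cv2

cap = cv2.VideoCapture(0)                     # hypothetical camera index
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
points = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                                 qualityLevel=0.01, minDistance=7)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Follow the same points into the new frame; drop the ones that were lost
    new_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
    points = new_points[status.flatten() == 1].reshape(-1, 1, 2)

    prev_gray = gray
    # A real version would re-detect features once too few points remain.
```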
