Research
My research explores how changing common assumptions about visual
algorithms leads to new problems and new capabilities. Over the last
several years, I have posed three questions whose answers suggest
paradigms for collecting and analyzing imagery, with applications to
surveillance, robotics, and environmental and medical imaging.
- Passive Vision: "What can one learn about a scene from a static camera?"
- Manifold Learning: "What can one learn from thousands of crummy images of a moving object?"
- Generalized Cameras: "How can one use a camera that looks at the world through a fun-house mirror?"
Questions that have recently kept me awake at night include:
- Re-purposing sensors: "What if we could use all the world's webcams as a coherent imaging sensor?"
- Collaborative Imaging: "What if everyone's camera phone served as an environmental imaging resource?"
- Health@Home: "What if we can measure (indicators of) health from sensors already in the home?"
Passive Vision is the analysis of video taken by cameras that are not
moving. Many cameras do not move, and continually watch a specific
scene - an airport security desk, a beach, a volcano - for months or
years. Much as Active Vision (the ability to intentionally control
camera motion) simplifies problems in structure from motion, Passive
Vision simplifies statistical image analysis by observing the same
scene for very long time periods. These statistics support algorithms
for more robust video surveillance, the ability to geo-locate any
webcam feed, and the potential to re-purpose webcams for environmental
monitoring.
One basic question is: "where is the camera?" There are many live
webcams broadcasting online from unknown locations; these cameras can
be geo-located because the lighting and weather changes they observe
depend on where the camera is. Our paper on webcam geolocation
[jacobs2007b] offered the first algorithms to geolocate a time series.
The algorithm relied on a tensor factorization of the imagery whose
components we found to be consistent across nearly all outdoor camera
scenes [jacobs2007a]. Related cues helped to geo-calibrate cameras
(i.e., find their orientation and zoom level) [jacobs08].
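As a rough illustration of why observed lighting constrains location,
the sketch below estimates longitude from the time of day at which a
camera is brightest. It is not the algorithm of [jacobs2007b]; it is a
minimal stand-in that assumes 24 hourly mean brightness values indexed
by UTC hour, ignores the equation of time, and uses a hypothetical
function name (estimate_longitude).

```python
import math

def estimate_longitude(brightness_by_utc_hour):
    """Estimate longitude in degrees (east positive) from 24 hourly
    mean brightness values indexed by UTC hour (an assumed input format)."""
    darkest = min(brightness_by_utc_hour)
    weights = [b - darkest for b in brightness_by_utc_hour]
    # Treat each hour as an angle on a circle so that daylight that
    # wraps around midnight UTC is still averaged correctly.
    x = sum(w * math.cos(2 * math.pi * h / 24) for h, w in enumerate(weights))
    y = sum(w * math.sin(2 * math.pi * h / 24) for h, w in enumerate(weights))
    noon_utc = (math.atan2(y, x) * 24 / (2 * math.pi)) % 24
    # Solar noon falls near 12:00 UTC at longitude 0 and arrives one
    # hour later for every 15 degrees of longitude to the west.
    longitude = (12 - noon_utc) * 15
    # Wrap the result into the range [-180, 180).
    return (longitude + 180) % 360 - 180
```

Under these assumptions, a camera whose daylight is centered on 18:00
UTC would be placed near 90 degrees west. Latitude needs a different
cue, such as how day length changes over the year.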
Another natural question is "what is in the scene?" Classical
approaches attempt to recognize objects by their appearance in one
image, but we have explored what can be learned by measuring the time
scale over which things change. Tensor factorization of long-term
time-lapses gives an approach to automatically labeling scene
locations (like trees) that vary over annual time scales and locations
(like eastward-facing walls) that are consistently brighter in the
morning [jacobs2007a], or to segmenting objects in a scene based on
very small motions [dixon2011].
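To make the idea of labeling by time scale concrete, the sketch below
compares how much of a single pixel's variation occurs at a daily
period versus an annual period. It swaps in a simple single-frequency
Fourier projection in place of the tensor factorization described
above, and the function names, the two labels, and the regularly
sampled input are all assumptions made for illustration.

```python
import math

def power_at_period(series, period):
    """Squared amplitude-like measure of the sinusoid with the given
    period (in samples) in a mean-removed intensity series."""
    n = len(series)
    mean = sum(series) / n
    c = sum((v - mean) * math.cos(2 * math.pi * t / period)
            for t, v in enumerate(series))
    s = sum((v - mean) * math.sin(2 * math.pi * t / period)
            for t, v in enumerate(series))
    return (c * c + s * s) / (n * n)

def label_time_scale(series, samples_per_day, days_per_year=365):
    """Label one pixel 'daily' or 'annual' according to which period
    carries more power in its intensity series."""
    daily = power_at_period(series, samples_per_day)
    annual = power_at_period(series, samples_per_day * days_per_year)
    return "annual" if annual > daily else "daily"
```

Applying label_time_scale to every pixel of a year-long time-lapse
would give a coarse label map; separating, say, morning-lit from
afternoon-lit surfaces would additionally require the phase of the
daily component, which this sketch discards.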
At shorter time scales, we have begun to explore variations in lighting
due to clouds as a form of “stochastically structured light”, and
recently derived constraints for building the 3D model of a scene from
a time-lapse of clouds passing overhead [jacobs10integral].
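The constraints derived in [jacobs10integral] are not reproduced here.
The sketch below only illustrates one measurable cue such an analysis
could start from, under the assumption that a cloud shadow sweeping
across the scene darkens different pixels at different times: the
delay, in frames, between two pixels' intensity series. The function
name best_lag and the brute-force search are illustrative choices.

```python
def best_lag(a, b, max_lag):
    """Return the shift of series b relative to series a (in frames)
    that maximizes their mean-removed correlation. A positive result
    means b sees the same change later than a. Requires max_lag < len(a)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n

    def score(lag):
        # Average product over the frames where both series overlap.
        products = [(a[t] - mean_a) * (b[t + lag] - mean_b)
                    for t in range(n) if 0 <= t + lag < n]
        return sum(products) / len(products)

    return max(range(-max_lag, max_lag + 1), key=score)
```

Delays measured between many pixel pairs depend on both the cloud
motion and the scene geometry, which is what makes them a plausible
input to a reconstruction.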
To support our research, and the larger community, we have built and
actively share the Archive of Many Outdoor Scenes
(AMOS). AMOS provides a
variety of tools for large scale data visualization and integration
with Google Earth [1], and is widely used as an experimental
platform to ground research in webcam geo-location and calibration
(e.g. [cites]).
Pages my former student, Nathan Jacobs, maintains about parts of this project:
- Shape from Clouds
- Geo-location
Acknowledgements
This project is supported under
NSF IIS 0546383: "CAREER: Passive Vision, What Can Be Learned by a
Stationary Observer". Any opinions, findings, and conclusions or
recommendations expressed in this material are those of the author(s)
and do not necessarily reflect the views of the National Science
Foundation.