Interview: Noah Snavely on 3D Rome
A team of researchers in the United States has come up with an algorithm that can construct virtual 3D models of cities, monuments and the interiors of buildings from the photos available on public photo-sharing Web sites such as Flickr. The first city they chose to model was Rome.
The software is still in development but, as an alternative to cumbersome and expensive laser-scanning techniques, it has potential uses in archaeology and heritage preservation. The team's members were Sameer Agarwal, Yasutaka Furukawa, Ian Simon and Steve Seitz from the University of Washington; Noah Snavely from Cornell University; and Richard Szeliski from Microsoft Research. I spoke to Noah Snavely, who has never visited Rome in person but is optimistic about the team's plans to improve their software, and who aims to create a model using all of the roughly four billion photos currently on Flickr.
HK: So where did the idea for this project come from?
NS: This grew out of a project called Photo Tourism that I worked on as a graduate student at the University of Washington. The original goal was to create a more intuitive way to browse your own personal photo collection. We started using computer vision to reassemble photos into a 3D reconstruction, with the computer working out the position from which each photo was taken. This lets you browse your photos in 3D. We found this worked surprisingly well with Internet photo collections such as Flickr, as well as with personal collections.
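For readers curious about the geometry: once the computer has worked out where two photos were taken from, a feature matched in both can be placed in 3D by intersecting the two viewing rays. Because of noise the rays rarely meet exactly, so a simple textbook estimate is the midpoint of their closest approach. The Python sketch below shows that midpoint method. It is an illustration only, not necessarily what Photo Tourism uses, and the function names and example coordinates are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def dot(u: Vec3, v: Vec3) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Tuple[float, ...]:
    """Estimate a 3D point seen from two cameras (hypothetical helper).

    c1, c2 are camera centres; d1, d2 are viewing-ray directions toward the
    same image feature. Returns the midpoint of the shortest segment joining
    the two rays, since noisy rays rarely intersect exactly.
    """
    w0 = (c1[0] - c2[0], c1[1] - c2[1], c1[2] - c2[2])
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; point cannot be triangulated")
    s = (b * e - c * d) / denom  # position of the closest point along ray 1
    t = (a * e - b * d) / denom  # position of the closest point along ray 2
    p1 = tuple(c1[i] + s * d1[i] for i in range(3))
    p2 = tuple(c2[i] + t * d2[i] for i in range(3))
    return tuple((p1[i] + p2[i]) / 2 for i in range(3))


# Two hypothetical cameras 4 units apart, both looking at the point (1, 2, 5).
print(triangulate_midpoint((0, 0, 0), (1, 2, 5), (4, 0, 0), (-3, 2, 5)))  # prints (1.0, 2.0, 5.0)
```

Doing this for every matched feature across many photos is what produces the "point clouds" discussed later in the interview.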
The problem with that system was that it was very computationally expensive. Back in 2006, when I was working on that project, it took two weeks to construct the model of Notre Dame cathedral in Paris from 2,500 images, and the processing time grows far faster than the number of photos does. So our inspiration for the current project was that we wanted to be able to reconstruct models from much larger photo collections, far more quickly.
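To give a feel for why processing time climbs so steeply, consider just one stage, assuming (for illustration only) that every photo is compared with every other photo. The number of pairs is n(n - 1)/2, which grows quadratically with the number of photos n. The interview doesn't spell out the pipeline's actual costs, and later stages, such as jointly refining all the camera positions, add more on top.

```python
def candidate_pairs(n_photos: int) -> int:
    """Distinct image pairs if every photo is matched against every other (assumed all-pairs)."""
    return n_photos * (n_photos - 1) // 2


for n in (2_500, 150_000, 250_000):
    print(f"{n:>9,} photos -> {candidate_pairs(n):>17,} pairs")
```

Under this assumption, 2,500 photos give 3,123,750 pairs, while 150,000 photos give 11,249,925,000: 60 times as many photos, but roughly 3,600 times as many pairs.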
We've shown that we can now handle up to 250,000 photos (used for reconstructing Venice). For Rome we worked with 150,000 (although there are more than two million if you search for Rome on Flickr). Our eventual goal is to handle millions of photos.
HK: How can you make the process faster?
NS: One option would be to add more machines, but we'd rather keep developing better algorithms. We used 500 machines for the latest models, and we'd like to get that down to a far smaller cluster, so we're going to continue working on the basic algorithms. Personally, I think there's still a lot of room for improvement.
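The interview doesn't say which algorithmic changes the team made. One common way to cut the cost of the matching stage, offered here purely as an assumption, is to compare a cheap summary vector for each photo and run expensive feature matching only on each photo's k most similar neighbours. The descriptors, photo names and k below are hypothetical, and for brevity the similarity loop is still brute force; a large-scale system would replace it with an index such as a vocabulary tree or approximate nearest-neighbour search.

```python
import math
from typing import Dict, Sequence, Set, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two summary vectors (0.0 if either is all zeros)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def shortlist_pairs(descriptors: Dict[str, Sequence[float]], k: int) -> Set[Tuple[str, str]]:
    """Keep only each photo's k most similar photos as candidates for detailed matching."""
    ids = sorted(descriptors)
    pairs: Set[Tuple[str, str]] = set()
    for a in ids:
        scores = [(cosine(descriptors[a], descriptors[b]), b) for b in ids if b != a]
        scores.sort(reverse=True)  # most similar first
        for _, b in scores[:k]:
            pairs.add((min(a, b), max(a, b)))  # store each unordered pair once
    return pairs


# Hypothetical 3-number summaries for four photos of two landmarks.
descs = {
    "colosseum_1": [0.9, 0.1, 0.0],
    "colosseum_2": [0.8, 0.2, 0.0],
    "trevi_1": [0.0, 0.1, 0.9],
    "trevi_2": [0.1, 0.0, 0.8],
}
print(sorted(shortlist_pairs(descs, k=1)))
```

Each photo contributes at most k pairs, so detailed matching scales roughly with n·k rather than n².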
HK: What made you pick Rome, Venice and Dubrovnik for this project?
NS: We picked Rome because it's so full of interesting sites and buildings – both from an aesthetic and a historical point of view. For a large city it has a very high density of interesting things. We chose Venice because, again, it's a beautiful city.
As for Dubrovnik, when we chose that city we also had Google's Street View in mind. Dubrovnik is an interesting city, but it's smaller, less popular and is a city that big companies may not devote attention to. So we wanted to show that we could reconstruct a model of it just by using photos that already exist on the Web. If a town's authorities, for example, wanted to create a model of their town, they could do so by buying time on a cluster (e.g., Amazon's cloud computing services), which is cost-effective, and using our software. They could use photos on Flickr, or they could send photographers to take photos around town, without the need for cars or GPS equipment.
HK: Is there any possibility of using your software in conjunction with Street View then?
NS: The photos on Flickr complement those on Street View. Street View is comprehensive when it comes to views of streets, but what's missing is the insides of buildings and places you can walk to but can't reach with the Street View car. Photos on Flickr are also much richer in terms of viewpoint, close-ups, wide-angle shots, time of day, weather and so on. It would definitely be interesting to combine the two.
HK: The clips you have on your Web site show 3D models made of point clouds – are you going to superimpose photographic images onto these to create a realistic 3D model?
NS: The user interface we have at the moment does this to a certain extent, but what we really want is to create a 3D model similar to the ones you can get from a laser scanner – something you can take measurements on and use for rendering in a virtual environment. This would be similar to Rome Reborn, a project to recreate a virtual model of ancient Rome based on historical evidence of how the city is thought to have looked. Google Earth also has 3D models of Rome's landmarks, but they're not very high quality. So one ongoing part of our project is to take the point clouds and turn them into dense 3D models.
HK: So how accurate are your models – are they scale models?
NS: The reconstruction is unitless – it's not in metres, because from photos alone the computer can't tell absolute size. However, we have taken metrically accurate laser scans and compared our digital models with them. We still have a lot of work to do to measure the accuracy of our technique, but at one site where we compared our model with a laser scan – a 50m-high building in Pisa – the error margin was 10cm. So we're getting fairly high accuracy, although it's not quite as good as a laser scanner yet.
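Because photos alone don't reveal absolute size, a reconstruction is only defined up to an overall scale. A minimal sketch, assuming one distance in the model is known in metres (for example, a surveyed height), shows how that single measurement fixes the scale, and how a 10cm discrepancy on a 50m building translates into a relative error. The point coordinates and function names are hypothetical; the interview doesn't say how the team aligned its models with the laser scans, and a full comparison would normally fit a rotation, translation and scale over many corresponding points.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def scale_to_metres(points: List[Point], a: int, b: int, known_distance_m: float) -> List[Point]:
    """Rescale a unitless reconstruction so that points[a]-points[b] spans known_distance_m."""
    model_distance = math.dist(points[a], points[b])
    if model_distance == 0:
        raise ValueError("reference points coincide; scale is undefined")
    s = known_distance_m / model_distance  # metres per model unit
    return [(x * s, y * s, z * s) for x, y, z in points]


def relative_error(error_m: float, size_m: float) -> float:
    """Absolute error expressed as a fraction of the object's size."""
    return error_m / size_m


# Hypothetical model coordinates: base and top of a 50m building, plus one other point.
tower = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.5), (1.0, 0.0, 0.0)]
metric = scale_to_metres(tower, 0, 1, known_distance_m=50.0)
print(metric[1])  # (0.0, 0.0, 50.0)
print(f"{relative_error(0.10, 50.0):.1%}")  # 0.2%
```

On these figures, the 10cm error margin quoted for the Pisa building works out to about 0.2 per cent of its height.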
HK: You've already been collaborating with an archaeologist on this project?
NS: Yes – at Cornell, an archaeologist has shown interest in this project as a low-cost replacement for a laser scanner. Archaeologists do a lot of scanning to document sites, but buying laser scanners is still relatively costly. If you can just take several hundred photos of the site, then reconstruct the model on a computer cluster at home, this would be much cheaper.
HK: How could the software be applied to heritage, archaeology or conservation of sites?
NS: It can be used as a tool in the field as a low-cost replacement for a laser scanner. Also, some objects have been preserved only in photos, for instance the giant Buddha statues at Bamiyan in Afghanistan, which were destroyed in 2001. We have photos of them but no other way of measuring them. So with a system that can use vision to do reconstructions, you can recreate models of objects that survive only in photographs, which also lets you take measurements and preserve those objects as 3D geometry.
I've also been talking to people in the field of ecology who are interested in using this as a tool for measuring foliage from the air, for example. Using a plane and a laser scanner isn't always possible – sometimes because permission is hard to get. So if you can use far simpler techniques, for example a camera on a kite, you can do it without necessarily getting permission. I think there are a lot of possibilities for how this technology can be used.
There may also be circumstances where it is difficult to take a laser scanner into a site – either because getting permission is an issue or because it is a confined space.
HK: What next for the project?
NS: This could easily occupy my time for the next five to 10 years. My goal is to process all of Flickr, which is probably almost four billion photos by now. Another thing I'm interested in is developing techniques that adapt to changes in the world. You can never really say you've finished a 3D model of the world, because the world is constantly changing, so when a new photo is added to Flickr, we need to be able to update the 3D model accordingly. That's a whole new research area for us.
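Updating the model as new photos arrive could, in principle, build on incremental registration: instead of rebuilding everything, a new photo is attached if its features match enough 3D points already in the model. The sketch below covers only that bookkeeping decision and is entirely hypothetical (the `Model` class, `try_add_photo` and the threshold are illustrative names, not the team's code); estimating the new camera's pose is left as a comment. The threshold of six reflects the minimum number of 2D-3D matches needed for linear camera resection.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

# Hypothetical threshold: linear (DLT) camera resection needs at least 6 2D-3D matches.
MIN_CORRESPONDENCES = 6


@dataclass
class Model:
    point_ids: Set[int] = field(default_factory=set)  # 3D points already reconstructed
    cameras: Dict[str, Set[int]] = field(default_factory=dict)  # photo id -> points it sees

    def try_add_photo(self, photo_id: str, matched_point_ids: Set[int]) -> bool:
        """Attach a new photo if it sees enough existing 3D points; otherwise defer it."""
        known = matched_point_ids & self.point_ids
        if len(known) < MIN_CORRESPONDENCES:
            return False  # too few links to the current model; try again later
        # A real system would now estimate the camera pose from these 2D-3D
        # matches and triangulate any new points; here we only record them.
        self.cameras[photo_id] = known
        return True


m = Model(point_ids=set(range(100)))
print(m.try_add_photo("new_1", set(range(10))))    # True: 10 shared points
print(m.try_add_photo("new_2", {1, 2, 200, 201}))  # False: only 2 shared points
```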