SKYSCENES Dataset Could Lead to Safe, Reliable Autonomous Flying Vehicles

Is it a building or a street? How tall is the building? Are there powerlines nearby?

These are details autonomous flying vehicles would need to know to function safely. However, few aerial image datasets exist that can adequately train the computer vision algorithms that would pilot these vehicles.

That's why Georgia Tech researchers created a new benchmark dataset of computer-generated aerial images.

Judy Hoffman, an assistant professor in Georgia Tech's School of Interactive Computing, worked with students in her lab to create SKYSCENES. The dataset contains over 33,000 aerial images of cities curated from a computer simulation program.

Hoffman said sufficient training datasets could unlock the potential of autonomous flying vehicles. Constructing those datasets is a challenge the computer vision research community has been working for years to overcome.

"You can't crowdsource it the same way you would standard internet images," Hoffman said. "Trying to collect it manually would be very slow and expensive, akin to what the self-driving industry is doing by driving vehicles around, but now you're talking about drones flying around.

"We must fix those problems to have models that work reliably and safely for flying vehicles."

Many existing datasets aren't annotated well enough for algorithms to distinguish objects in an image. For example, an algorithm may not be able to tell the surface of a building from the surface of a street.

Working with Hoffman, Ph.D. student Sahil Khose tried a new approach: constructing a synthetic image dataset from a ground-view, open-source simulator known as CARLA.

CARLA was originally designed to provide ground-view simulation for self-driving vehicles. It creates an open-world virtual reality that allows users to drive around in computer-generated cities.

Khose and his collaborators adjusted CARLA鈥檚 interface to support aerial views that mimic views one might get from unmanned aerial vehicles (UAVs). 

What's the Forecast?

The team also created new virtual scenarios that mimic the real world by accounting for changes in weather, time of day, altitude, and city population. Unless those details are incorporated into the training data, algorithms will struggle to recognize objects in the frame consistently.

"CARLA's flexibility offers a wide range of environmental configurations, and we take several important considerations into account while curating SKYSCENES images from CARLA," Khose said. "Those include strategies for obtaining diverse synthetic data, embedding real-world irregularities, avoiding correlated images, addressing skewed class representations, and reproducing precise viewpoints."
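The curation strategy Khose describes can be illustrated with a small sketch. The axis values and function names below are hypothetical, not the actual SKYSCENES pipeline: the idea is simply to enumerate weather, time-of-day, and altitude combinations, and to space out captures so consecutive frames are not near-duplicates.

```python
from itertools import product

# Hypothetical configuration axes, loosely mirroring the variations
# described for SKYSCENES (not the project's actual curation code).
WEATHERS = ["clear", "rain", "fog"]
TIMES_OF_DAY = ["noon", "sunset", "night"]
ALTITUDES_M = [15, 35, 60, 100]  # flight heights in meters

def enumerate_configs():
    """Yield every (weather, time of day, altitude) combination once,
    so no environmental condition is over- or under-represented."""
    for weather, tod, alt in product(WEATHERS, TIMES_OF_DAY, ALTITUDES_M):
        yield {"weather": weather, "time_of_day": tod, "altitude_m": alt}

def decorrelate(frames, min_gap=10):
    """Keep only every min_gap-th frame of a flight sequence so that
    retained images are not correlated views of the same spot."""
    return frames[::min_gap]

configs = list(enumerate_configs())
print(len(configs))  # 3 weathers x 3 times x 4 altitudes = 36
```

Sampling the full grid rather than random conditions is one way to address the skewed class and condition representations the quote mentions.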

SKYSCENES is not the largest aerial image dataset to be released, but a paper co-authored by Khose shows that models trained on it outperform models trained on existing datasets.

Khose said models trained on this dataset generalize strongly to real-world scenarios, and integrating real-world data enhances their performance further. The dataset also offers controlled variability, which is essential for training models to perform a range of tasks.

"This dataset drives advancements in multi-view learning, domain adaptation, and multimodal approaches, with major implications for applications like urban planning, disaster response, and autonomous drone navigation," Khose said. "We hope to bridge the gap for synthetic-to-real adaptation and generalization for aerial images."

Seeing the Whole Picture

For algorithms, generalization is the ability to perform tasks based on new data that expands beyond the specific examples on which they were trained.

"If you have 200 images and you train a model on those images, it'll do well at recognizing what you want it to recognize in that closed-world initial setting," Hoffman said. "But if we were to take aerial vehicles and fly them around cities at various times of the day or in other weather conditions, they would start to fail."

That's why Khose designed algorithms to enhance the quality of the curated images.

"These images are captured from 100 meters above ground, which means the objects appear small and are challenging to recognize," he said. "We focused on developing algorithms specifically designed to address this."

Those algorithms elevate the ability of machine learning models to recognize small objects, improving their performance in navigating new environments.
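A back-of-the-envelope ground-sampling-distance calculation shows why objects at 100 meters are so small in the frame. The camera field of view and resolution below are assumed values for illustration, not SKYSCENES' actual capture settings.

```python
import math

def pixels_for_object(object_size_m, altitude_m, fov_deg=90.0,
                      image_width_px=1024):
    """Approximate how many pixels an object spans in a nadir
    (straight-down) view. The ground footprint across the image is
    2 * altitude * tan(fov / 2); each pixel then covers
    footprint / image_width meters (the ground sampling distance)."""
    footprint_m = 2.0 * altitude_m * math.tan(math.radians(fov_deg) / 2.0)
    gsd_m_per_px = footprint_m / image_width_px
    return object_size_m / gsd_m_per_px

# A ~4.5 m car seen from 100 m up with these assumed settings spans
# only about 23 pixels across a 1024-pixel-wide image:
print(round(pixels_for_object(4.5, 100.0)))
```

Dropping the altitude to 15 meters makes the same car several times larger in pixels, which is why altitude variation matters so much for training small-object recognition.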

"Our annotations help the models capture a more comprehensive understanding of the entire scene: where the roads are, where the buildings are, and that they are buildings and not just obstacles in the way," Hoffman said. "It gives a richer set of information when planning a flight.

"To work safely, many autonomous flight plans might require a map given to them beforehand. If you have successful vision systems that understand exactly what the obstacles in the real world are, you could navigate in previously unseen environments."
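The richer information Hoffman describes can be sketched as the difference between a binary obstacle mask and a per-pixel class map. The toy grid and class IDs below are purely illustrative, not the SKYSCENES label set.

```python
# Toy 4x4 per-pixel annotation: 0 = road, 1 = building, 2 = powerline.
# Class names and IDs are illustrative, not SKYSCENES' actual labels.
SEG = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 0],
    [0, 0, 0, 0],
]
CLASSES = {0: "road", 1: "building", 2: "powerline"}

# A binary obstacle mask only says "blocked or not blocked" ...
obstacle_mask = [[1 if c != 0 else 0 for c in row] for row in SEG]

# ... while the class map supports questions a flight planner needs,
# e.g. what fraction of the scene is safe road surface.
flat = [c for row in SEG for c in row]
road_fraction = flat.count(0) / len(flat)
print(road_fraction)
```

In the mask, buildings and powerlines are indistinguishable obstacles; in the class map, a planner can treat a powerline corridor very differently from a building wall.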

SKYSCENES was presented as part of Georgia Tech research at ECCV 2024.