Uber has announced the launch of a new division, Uber AV Labs, dedicated to collecting extensive real-world driving data for its more than 20 autonomous vehicle (AV) partners. This strategic move underscores the critical demand for diverse data in the rapidly evolving self-driving industry.
Despite the "AV Labs" moniker, Uber is emphatically not returning to the development of its own robotaxis. The company famously exited this venture after a fatal incident involving one of its test vehicles in 2018, eventually selling off its self-driving unit in a complex deal with Aurora in 2020. Instead, Uber will deploy its own sensor-equipped cars into urban environments to gather data for partners such as Waymo, Waabi, and Lucid Motors, though specific contracts are still pending.
The autonomous vehicle industry is shifting from rigid, rules-based operational systems toward more flexible models trained with reinforcement learning. This transition has made real-world driving data immensely valuable for training and refining these systems.
According to Uber, the autonomous vehicle companies most eager for this data are often those already collecting substantial amounts themselves. This points to a growing industry consensus, similar to the one among leading AI labs, that resolving the most challenging "edge cases" in self-driving technology is fundamentally a volume game that requires vast quantities of diverse data.
Overcoming Data Collection Limitations
Currently, the sheer size of an autonomous vehicle company's fleet imposes a physical constraint on the volume of data it can gather. While simulations are widely used to prepare for various scenarios, there's no substitute for extensive real-world driving. Operating on actual roads is crucial for uncovering the myriad strange, difficult, and utterly unexpected situations that autonomous vehicles encounter.
The challenges are evident, even for industry leaders. For instance, Waymo, with a decade of autonomous vehicle operation and testing, recently faced scrutiny after its robotaxis were observed illegally passing stopped school buses.
Praveen Neppalli Naga, Uber’s Chief Technology Officer, stated in an exclusive interview that a larger pool of diverse driving data could significantly help robotaxi companies proactively address such issues as they emerge.
Notably, Uber will not be charging for this data, at least initially.
“Our goal, primarily, is to democratize this data,” said Praveen Neppalli Naga. “The value of this data and having partners’ AV tech advancing is far bigger than the money we can make from this.”
Danny Guo, Uber’s VP of Engineering, emphasized the necessity of building a fundamental data foundation before determining product-market fit.
“Because if we don’t do this, we really don’t believe anybody else can,” Guo explained. “As someone who can potentially unlock the whole industry and accelerate the whole ecosystem, we believe we have to take on this responsibility right now.”
The AV Labs Operation: Starting Small
The new AV Labs division is starting on a modest scale. It currently has a single Hyundai Ioniq 5, though Uber says it is not committed to one specific model. Guo said his team was still manually installing the sensors, including lidars, radars, and cameras, on the vehicle.
“We don’t know if the sensor kit will fall off, but that’s the scrappiness we have,” Guo quipped. “I think it will take a while for us to deploy 100 cars to the road to start collecting data. But the prototype is there.”
Partners will not receive raw data. Once the Uber AV Labs fleet is fully operational, Naga explained, the division will “massage and work on the data to help fit to the partners.” This processed “semantic understanding” layer is what the driving software at companies like Waymo would draw on to improve a robotaxi’s real-time path planning.
Furthermore, Guo outlined an intermediate step involving "shadow mode" testing. Uber plans to integrate a partner’s driving software into AV Labs cars, running it in the background. Any discrepancy between the AV Labs human driver’s actions and the autonomous software’s decisions in shadow mode will be flagged and reported to the partner company.
According to Guo, this method is expected not only to identify shortcomings in the driving software but also to help train models that drive more like a human, rather than in a rigid, robotic way.
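The shadow-mode comparison described above can be illustrated with a minimal sketch. Everything in it is a hypothetical illustration, not Uber's or any partner's actual system: the `Action` and `Discrepancy` types, the `flag_discrepancies` function, and the tolerance values are all assumptions. It assumes the human driver's actions and the background software's proposed actions are logged at the same timestamps.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Action:
    """One logged control decision (hypothetical schema)."""
    t: float             # timestamp in seconds
    steering_deg: float  # steering angle in degrees
    accel_mps2: float    # acceleration in m/s^2; negative means braking


@dataclass
class Discrepancy:
    """A moment where the human and the shadow software disagreed."""
    t: float
    steering_gap_deg: float
    accel_gap_mps2: float


def flag_discrepancies(
    human: List[Action],
    software: List[Action],
    steering_tol_deg: float = 5.0,  # assumed tolerance, for illustration only
    accel_tol_mps2: float = 1.0,    # assumed tolerance, for illustration only
) -> List[Discrepancy]:
    """Return the timestamps where the two action logs differ beyond tolerance."""
    if len(human) != len(software):
        raise ValueError("logs must have the same number of samples")

    flagged = []
    for h, s in zip(human, software):
        if h.t != s.t:
            raise ValueError("logs must be aligned on the same timestamps")
        steering_gap = abs(h.steering_deg - s.steering_deg)
        accel_gap = abs(h.accel_mps2 - s.accel_mps2)
        # Flag the sample if either control differs by more than its tolerance.
        if steering_gap > steering_tol_deg or accel_gap > accel_tol_mps2:
            flagged.append(Discrepancy(h.t, steering_gap, accel_gap))
    return flagged
```

In this sketch, a sample where the human brakes hard while the software proposes to hold speed would be flagged for the partner, while small differences inside the tolerances would not. A real system would also need to handle logs sampled at different rates and decide which disagreements actually matter.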
Echoing Tesla's Strategy
This data collection strategy resembles Tesla’s long-standing approach to training its own autonomous vehicle software. However, Uber’s initiative currently lacks Tesla’s scale: Tesla draws on millions of customer vehicles around the world that collect data daily.
Despite the difference in scale, Uber remains unfazed. Guo anticipates a more targeted data collection strategy, tailored to the specific needs of individual autonomous vehicle companies.
“We have 600 cities that we can pick and choose from,” Guo stated. “If a partner tells us a particular city they’re interested in, we can just deploy our cars there.”
Naga indicated that Uber plans to rapidly expand this new division, aiming for a workforce of several hundred people within a year. While he envisions a future where Uber’s entire ride-hail fleet could be leveraged for even broader data collection, he acknowledged the necessity of starting with the new division’s focused efforts.
“From our conversations with our partners, they’re just saying: ‘give us anything that will be helpful,’” Guo concluded. “Because the amount of data Uber can collect just outweighs everything that they can possibly do with their own data collection.”








