A Data Bottleneck is Holding AI Science Back
October 15, 2024
“If there were many databases as good as the PDB, I would say, yes, this [prize] probably is just the first of many, but it is kind of a unique database in biology.” - David Baker (2024 Nobel Prize in Chemistry)
Agreed. That’s why, for most other problems in science and engineering, we will need to generate high-quality synthetic data using numerical simulators in which the laws of physics are programmed explicitly. These simulators can produce large datasets with perfect annotations for supervised learning (think of fluid dynamics, where you know the velocity at every point in space and time).
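To make the idea concrete, here is a minimal sketch of what "simulator as dataset generator" can look like. It is a toy, not how any production solver works: a 1D heat-diffusion model stands in for a real simulator, and the names `simulate_diffusion` and `make_dataset`, the grid size, and the parameter range are all assumptions chosen for illustration.

```python
import random


def simulate_diffusion(alpha, n_cells=32, n_steps=50, dx=1.0, dt=0.2):
    """Toy simulator: explicit finite differences for u_t = alpha * u_xx
    on a periodic 1D grid. Returns the field at every time step."""
    r = alpha * dt / dx**2
    if r > 0.5:
        raise ValueError("explicit scheme is unstable for alpha*dt/dx^2 > 0.5")
    # Initial condition (an assumption): one unit of heat in the middle cell.
    u = [0.0] * n_cells
    u[n_cells // 2] = 1.0
    frames = [u[:]]
    for _ in range(n_steps):
        # u[i - 1] wraps around at i == 0, which gives the periodic boundary.
        u = [
            u[i] + r * (u[i - 1] - 2.0 * u[i] + u[(i + 1) % n_cells])
            for i in range(n_cells)
        ]
        frames.append(u[:])
    return frames


def make_dataset(n_samples, seed=0):
    """Sample a physical parameter, run the simulator, and keep the
    inputs together with the exactly known outputs as labels."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_samples):
        alpha = rng.uniform(0.1, 2.0)  # hypothetical diffusivity range
        frames = simulate_diffusion(alpha)
        dataset.append(
            {
                "alpha": alpha,           # simulation parameter
                "initial": frames[0],     # model input
                "final": frames[-1],      # supervised target
                "trajectory": frames,     # the field at every point and time
            }
        )
    return dataset


dataset = make_dataset(100)
print(len(dataset), "labeled samples generated")
```

Every sample comes with the full field at every grid point and time step, which is the "perfect annotation" property: no manual labeling, and no measurement noise unless you choose to add it.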
That’s what we are building at Inductiva.AI: a platform that makes it much easier to generate physics, chemistry, and biology datasets. If you are an engineer, for example in coastal dynamics, you can also use it to explore many parameter combinations by running many simulations in parallel (no need to run ML on top afterwards, but you can!).
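The parameter-exploration pattern can be sketched with nothing but the Python standard library. This is not the Inductiva API: the "simulation" below is a stand-in that solves the linear wave dispersion relation for a given wave period and water depth, and `wavelength`, `run_case`, and the parameter grid are hypothetical names and values chosen for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product

G = 9.81  # gravitational acceleration, m/s^2


def wavelength(period, depth):
    """Solve omega^2 = g * k * tanh(k * h) for the wavenumber k by bisection
    and return the wavelength 2*pi/k (linear wave theory)."""
    omega = 2.0 * math.pi / period
    k_deep = omega**2 / G                    # deep-water wavenumber
    k_shallow = omega / math.sqrt(G * depth) # shallow-water wavenumber
    # The root lies between k_deep and k_deep + k_shallow.
    lo, hi = k_deep, k_deep + k_shallow
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if G * mid * math.tanh(mid * depth) < omega**2:
            lo = mid
        else:
            hi = mid
    return 2.0 * math.pi / (0.5 * (lo + hi))


def run_case(params):
    """One 'simulation': a placeholder for launching a real solver."""
    period, depth = params
    length = wavelength(period, depth)
    return {
        "period_s": period,
        "depth_m": depth,
        "wavelength_m": length,
        "speed_m_s": length / period,
    }


# Hypothetical parameter grid: every combination of period and depth.
periods = [4.0, 8.0, 12.0]
depths = [1.0, 5.0, 20.0, 1000.0]
cases = list(product(periods, depths))

# Dispatch all cases concurrently; results come back in the order of `cases`.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_case, cases))

for row in results:
    print(row)
```

Threads will not speed up a pure-Python calculation like this one. The pattern pays off when each case launches an external solver or a remote job, which is the usual situation with real simulators.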
See the MIT Technology Review article: “A data bottleneck is holding AI science back, says new Nobel winner”.