
NVIDIA is speeding up drug discovery—and it’s not in a lab.
A new AI-powered breakthrough could flip the pharma game. SandboxAQ, a startup backed by NVIDIA and spun out of Google, just dropped a massive dataset designed to supercharge how we find new drugs. It’s called SAIR, the Structurally Augmented IC50 Repository, and it packs over 5.2 million synthetic protein-drug structures, each linked to real potency data. (IC50 is the concentration of a compound needed to cut its target’s activity in half, a standard measure of potency.)

This might sound like science fiction, but it’s not. It’s science done smarter.
Why This Matters
Before any drug hits the market, it needs to prove one thing: Does it actually stick to the target protein? That “binding” step is make-or-break. If it doesn’t bind well, it doesn’t work.
But figuring that out has always been slow and expensive. Labs spend months testing molecules, one by one, hoping to find the right match.
With SAIR, researchers can now let AI do the heavy lifting—virtually.
NVIDIA-Powered Predictions
To build SAIR, SandboxAQ used NVIDIA GPUs to computationally generate the dataset’s millions of 3D molecular structures. These are not just wild guesses. They’re grounded in real-world data from public sources like ChEMBL and BindingDB.
For each protein-drug pair, they generated five different binding poses, then picked only the most accurate ones based on their predicted potency. The result? A high-quality, synthetic dataset that’s ready to train next-gen AI models.
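To make the "generate five poses, keep the best" step concrete, here is a minimal Python sketch. It is not SandboxAQ's actual pipeline. The `Pose` record, the `select_poses` function, the pIC50 scale (the negative base-10 log of IC50 in molar units), and the rule "keep the pose whose predicted potency lands closest to the measured value" are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """One candidate binding pose (hypothetical record layout)."""
    protein: str
    ligand: str
    pose_id: int
    predicted_pic50: float  # assumed model output for this pose


def select_poses(poses, measured_pic50, keep=1):
    """For each (protein, ligand) pair, keep the `keep` poses whose
    predicted potency is closest to the experimentally measured value."""
    by_pair = defaultdict(list)
    for pose in poses:
        by_pair[(pose.protein, pose.ligand)].append(pose)

    selected = {}
    for pair, group in by_pair.items():
        target = measured_pic50[pair]
        # Rank this pair's poses by absolute error against the lab value.
        ranked = sorted(group, key=lambda p: abs(p.predicted_pic50 - target))
        selected[pair] = ranked[:keep]
    return selected


# Made-up numbers: five poses for one protein-drug pair.
poses = [
    Pose("P1", "L1", i, score)
    for i, score in enumerate([5.0, 6.8, 7.4, 8.9, 6.1])
]
measured = {("P1", "L1"): 7.0}  # illustrative measured pIC50

best = select_poses(poses, measured)
print(best[("P1", "L1")][0].pose_id)
```

In this toy example the pose with a predicted pIC50 of 6.8 wins, because it sits closest to the measured 7.0. A real pipeline would likely weigh other quality signals too, such as structural confidence scores.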
Real Data, Synthetic Speed
“This is a long-standing problem in biology,” said Nadia Harhen, head of AI simulation at SandboxAQ. “Now we’re solving it with data that’s synthetic—but anchored in experimental truth.”
Here’s why that matters. Most AI models struggle with new molecules or unknown proteins. That’s because they haven’t seen enough training data.
SAIR fixes that by offering an open, massive new training set—freely available to researchers. No more waiting for lab results. No more paywalls.
A Public Dataset with Private Muscle
Here’s the twist: while the SAIR dataset is free, SandboxAQ’s AI models trained on this data will be paid tools. These models aim to predict protein binding—and do it faster than traditional lab methods.
They could eventually replace some early-stage drug experiments, cutting costs and saving months of work.
And yes, NVIDIA’s chips are powering it all.
What’s Next?
AI is no longer a sidekick in biotech. It’s becoming the main driver. Datasets like SAIR mean startups and research labs can now train smarter, build faster, and test virtually.
This isn’t just data—it’s a shortcut to the future of medicine.