Protege and OmicsData Inc. Partner to Offer Global-Scale Multi-Omics and Longitudinal Clinical Data from Over 6 Million Patients

Aug 28, 2025

OmicsData Inc., a leader in data structuring across industries with a special focus on multi-omics and clinical data across Asia and the Middle East, has partnered with Protege to make its longitudinal dataset available through the Protege AI Training Data Platform. With more than 6 million patient records and 100+ petabytes of standardized EMRs, diagnostics, DICOM imaging, and multi-omics files (BAM, FASTQ), the collection is among the world’s most diverse and advanced resources for AI development. It also includes 550,000+ biospecimens and complete records for 4 million+ patients, with particular strength in oncology, neurology, cardiometabolic disease, gastroenterology, immunology, and rare diseases.

This partnership directly addresses the rapidly growing demand for international longitudinal datasets and advanced file formats that enable robust, modality-rich AI model development. It also significantly increases the diversity of available training data, providing much-needed representation from Asia and the Middle East, regions historically underrepresented in medical AI development.

“AI developers have consistently asked for datasets that combine deep clinical, omics, and imaging data across long time horizons, and now, they have it,” said Bobby Samuels, CEO and Co-Founder of Protege. “OmicsData Inc.’s multi-modal, longitudinal data opens new frontiers in precision medicine, biomarker discovery, and clinical trial prediction.”

“We’re proud to partner with Protege to make our dataset available to the global AI community,” said Sumit Sinha, CEO and Founder of OmicsData Inc. “Our healthcare division’s mission has always been to accelerate scientific discovery and clinical insight by unlocking the full potential of complex biomedical data. Through Protege, we’re making that data accessible to the innovators building the next generation of tools for global health.”

About OmicsData Inc.

OmicsData Inc. is a leading data structuring platform. Its healthcare division focuses on biomedical data, with the aim of unlocking the power of multi-omics and clinical data across Asia and the Middle East. With a secure and scalable infrastructure, OmicsData Inc. provides researchers and developers with access to over 6 million de-identified patient records spanning EMRs, diagnostics, imaging, genomics, and biospecimens. Learn more at https://omicsbank.com/.

About Protege

Protege is the trusted source for finding and sharing AI training data, enabling seamless and compliant data exchange. By empowering data holders and connecting them with AI developers, Protege supports the creation of thoughtful AI solutions. Protege’s scientific and strategic approach allows AI teams to quickly discover and license a wide array of curated datasets across industries, reducing the time needed to obtain AI-ready data for model development. Learn more at www.withprotege.ai.