Nov 17, 2025
Summary
Data Licensing Revenue Unlocked: Gradient Health, a leading provider of medical imaging studies, partnered with Protege to license de-identified, HIPAA-compliant data at scale, unlocking seven figures in net-new revenue in under a year, with more opportunities in the pipeline.
Protege Partnership: Protege aggregates multi-modal healthcare datasets at scale across dozens of partners, opening new revenue opportunities through data breadth (access to the largest buyers via scale and modality coverage) and depth (deal‑specific, cross-modality data curation that requires multiple partners).
Combined Value: The scale, diversity, and cloud-native storage of Gradient Health’s de-identified imaging datasets, combined with Protege’s technical expertise in AI data, have shortened data delivery timelines from months to a week or less for healthcare AI model builders.
Ongoing Opportunities: That speed, together with Protege’s relationships with leading AI research teams, has led to repeat demand and follow-up revenue discussions that continue to involve Gradient Health and other Protege partners across modalities.
“Working with Gradient has unlocked new possibilities for cutting edge healthcare AI development,” shared Protege Healthcare Partnerships Lead Kaleb Dubin. “Their large datasets and quick turnaround times pair perfectly with Protege’s AI training data expertise to deliver effective and efficient results.”
Gradient’s Chief Data Officer, Benji Meltzer, shared: “Our view has always been that a world without data is a world without progress. Protege shares that belief, and this partnership has proven both useful and profitable for everyone involved. Gradient specializes in the uniquely complex space of ingesting and deidentifying medical imaging data, and by working with Protege we’re able to help our customers connect imaging with other modalities of healthcare data, unlocking the scale and diversity needed to build sophisticated medical technologies.”
Introduction: Readying Health Data for Licensing
As one of the healthcare market’s largest imaging dataset providers, Gradient Health was a natural fit for the Protege healthcare data platform. Gradient Health provides large-scale imaging datasets, with the data de-identified and stored in the cloud for streamlined access and delivery.
The biggest players in healthcare AI come to Protege searching for large-scale, multimodal datasets for training and evaluating their AI models. Developing these models requires specialized data that fulfills the following criteria:
Scale and Speed: The largest AI companies continue to seek millions of imaging studies on aggressive timelines.
Multi-modal Needs: Often these imaging studies must be paired with other healthcare data modalities such as clinical notes, pathology slides, claims, and EHR. These modalities must be connected at the patient level, while also maintaining the timeline of care across all the formats.
Privacy and HIPAA Compliance: Working with patient healthcare data requires maintaining the highest levels of privacy standards and HIPAA compliance throughout all stages of data transfer.
As the single platform for multi-modal AI training data and healthcare data expertise, Protege was able to work closely with Gradient Health to solve for all three of these core needs for multiple healthcare AI companies.
“We’re seeing more and more that no single source can meet the demands of leading AI developers in healthcare,” explained Kaleb Dubin, Protege’s Healthcare Data Partnerships Lead. “The sheer volume of data combined with very specific data requirements across modalities for each use case makes it nearly impossible for a single healthcare data provider to deliver on their own.”
The Data Difference: Breadth, Curation, Multi-Modality, Privacy, and Speed
On the surface, it’s easy to think that licensing healthcare data for AI training and evaluation is similar to any other healthcare data transfer. However, ongoing data requests have revealed five key differences:
1) Breadth: Volume of requests requires aggregation
As the leading AI companies train their models, they increasingly need sheer volume. Think: 15 million x-ray studies or 10 million multi-modal patient records with unstructured clinical notes connected to radiology reports and DICOMs.
That’s where Gradient Health’s massive imaging datasets, Protege’s multi-modal data sources across dozens of other healthcare data partners, and Protege’s AI data expertise come together to meet AI developer needs.
Protege packaged Gradient Health’s datasets with collections from Protege’s other partners to create ready-to-use bundles that met the requested volumes. This helped Gradient gain access to a wider range of revenue opportunities and data requests, while Gradient’s dataset volumes helped meet the growing high-volume healthcare data requests in the market.
2) Depth: Health data often requires custom curation
In addition to volume requirements, leading AI developers and labs are also looking for curated healthcare datasets that go beyond the requests individual healthcare data sources typically receive.
AI model builders often request data that meets only specific criteria, whether that’s a particular patient cohort or a combination of modalities that requires overlapping datasets spanning multiple partners. This is where Gradient Health’s breadth of data coverage helps ensure that robust patient counts and cohorts remain available even when the AI training or evaluation data must satisfy multiple modalities or specific inclusion or exclusion criteria.
In many cases where there are specific data requirements, the Protege team uses custom NLP and model-assisted classification techniques on healthcare data to narrow down to the right patient cohorts and curate the right dataset, regardless of the data criteria. Thanks to the cloud-native datasets provided by Gradient Health, Protege is also able to meet accelerated data delivery timelines involving these imaging studies.
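To make the idea of inclusion and exclusion criteria concrete, here is a minimal Python sketch of rule-based cohort selection. It is an illustration only, not Protege’s pipeline: the `PatientRecord` fields, the `select_cohort` function, and the sample values are all hypothetical, and the finding labels are assumed to come from some upstream NLP or classification step that is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatientRecord:
    """Hypothetical summary of one de-identified patient."""
    patient_token: str     # de-identified identifier (assumed)
    age: int
    modalities: frozenset  # e.g. {"xray", "clinical_notes"}
    findings: frozenset    # labels assumed to come from an upstream classifier


def select_cohort(records, required_modalities, include_findings,
                  exclude_findings, min_age=0):
    """Keep patients that satisfy every inclusion rule and no exclusion rule."""
    required = frozenset(required_modalities)
    include = frozenset(include_findings)
    exclude = frozenset(exclude_findings)
    cohort = []
    for record in records:
        if record.age < min_age:
            continue  # below the minimum age
        if not required <= record.modalities:
            continue  # missing at least one required modality
        if include and not (include & record.findings):
            continue  # has none of the findings of interest
        if exclude & record.findings:
            continue  # has an excluded finding
        cohort.append(record)
    return cohort


# Made-up sample data for illustration only.
records = [
    PatientRecord("p1", 54, frozenset({"xray", "clinical_notes"}), frozenset({"pneumonia"})),
    PatientRecord("p2", 61, frozenset({"xray"}), frozenset({"pneumonia"})),
    PatientRecord("p3", 47, frozenset({"xray", "clinical_notes"}),
                  frozenset({"pneumonia", "prior_surgery"})),
    PatientRecord("p4", 16, frozenset({"xray", "clinical_notes"}), frozenset({"pneumonia"})),
]

cohort = select_cohort(
    records,
    required_modalities={"xray", "clinical_notes"},
    include_findings={"pneumonia"},
    exclude_findings={"prior_surgery"},
    min_age=18,
)
```

In this sketch only the first patient passes every rule. Real requests would likely layer many more criteria on top, which is one reason large starting volumes matter.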
3) Multi-Modality: Aggregating Patient Data Across Sources
Because the subjects in the Gradient Health datasets are already de-identified, their imaging studies could be safely connected to other medical records for a single patient across modalities, such as EHR, pathology, and claims.
However, combining those data sources required a single platform, and that’s where Protege came into play. By being the single source of truth for de-identified patient data across modalities, Protege unlocked new deal opportunities with several of the largest foundation model developers, who continue to seek that level of single-patient knowledge across what have traditionally been disjointed sources.
Understanding a single patient’s history across different providers and formats is key for AI companies in the healthcare space. Even the best AI models to date struggle with specific use cases that require a longitudinal, comprehensive view of the patient’s healthcare history across providers and forms of care.
Early results suggest that training the next generation of healthcare AI models on multi-modal, longitudinal records leads to materially better performance. The demand for that data continues to rise.
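As a rough illustration of what a patient-level, longitudinal view across modalities means in practice, the Python sketch below merges events from several sources into one timeline per patient. It assumes every source already carries the same de-identified `patient_token` and an `event_date`. Both field names, the `build_timelines` function, and the sample events are hypothetical, and how such a shared token is produced is outside the scope of this sketch.

```python
from collections import defaultdict
from datetime import date


def build_timelines(*sources):
    """Merge events from several sources into per-patient timelines sorted by date."""
    timelines = defaultdict(list)
    for source in sources:
        for event in source:
            timelines[event["patient_token"]].append(event)
    for events in timelines.values():
        events.sort(key=lambda e: e["event_date"])
    return dict(timelines)


# Made-up events from three hypothetical sources.
imaging = [
    {"patient_token": "p1", "event_date": date(2021, 3, 15), "kind": "xray"},
    {"patient_token": "p2", "event_date": date(2020, 7, 2), "kind": "ct"},
]
ehr = [
    {"patient_token": "p1", "event_date": date(2021, 3, 1), "kind": "ehr_visit"},
]
claims = [
    {"patient_token": "p1", "event_date": date(2021, 4, 10), "kind": "claim"},
]

timelines = build_timelines(imaging, ehr, claims)
p1_kinds = [event["kind"] for event in timelines["p1"]]
```

Here `p1_kinds` lists the first patient’s visit, imaging study, and claim in date order, even though each event came from a different source.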
4) Privacy: De-Identification and Maintaining Privacy
With deep expertise in both AI development and healthcare data governance, Protege holds an important place between health data providers like Gradient Health and large foundation models.
Protege works closely with specialized privacy partners to ensure all healthcare data meets the highest de-identification standards while preserving clinical utility throughout the data transfer and delivery process. This ensures that health data providers are safely sending de-identified patient information to Protege, and that healthcare AI model builders are training, fine-tuning, and evaluating their models on secure and HIPAA-compliant data sources.
The Protege Data Lab has established comprehensive processes in close coordination with leading privacy technology partners to maintain privacy across multimodal datasets. For example, for cases where imaging data partners apply date-shifting for privacy protection, Protege has developed a proprietary methodology to maintain the patient journey chronology across EHR and imaging data sources. This ensures that the datasets remain clinically meaningful for AI training and evaluation, while fully protecting patient privacy.
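Protege’s methodology is proprietary and is not described here. For readers unfamiliar with date-shifting, the Python sketch below shows one generic, commonly used approach: derive a single pseudo-random offset per patient from a secret key, then apply that same offset to every event for that patient, so the gaps between events are unchanged. The function names, the `secret` value, and the 365-day window are assumptions for illustration.

```python
import hashlib
import hmac
from datetime import date, timedelta


def patient_offset_days(patient_token: str, secret: bytes, max_shift_days: int = 365) -> int:
    """Derive a stable per-patient offset in [-max_shift_days, +max_shift_days]."""
    digest = hmac.new(secret, patient_token.encode("utf-8"), hashlib.sha256).digest()
    span = 2 * max_shift_days + 1
    return int.from_bytes(digest[:8], "big") % span - max_shift_days


def shift_date(d: date, patient_token: str, secret: bytes, max_shift_days: int = 365) -> date:
    """Shift a date by the patient's offset; the same patient always gets the same shift."""
    return d + timedelta(days=patient_offset_days(patient_token, secret, max_shift_days))


# Illustrative values only; a real key would be managed securely.
secret = b"demo-secret"
ehr_visit = date(2021, 3, 1)
imaging_study = date(2021, 3, 15)

shifted_visit = shift_date(ehr_visit, "p1", secret)
shifted_study = shift_date(imaging_study, "p1", secret)

# The interval between the two events is the same before and after shifting.
interval_preserved = (shifted_study - shifted_visit) == (imaging_study - ehr_visit)
```

Because the offset depends only on the patient token and the key, events from EHR and imaging sources stay in the same order and the same distance apart, provided both sources are shifted with the same key. Reconciling sources that were shifted independently is a harder problem that this sketch does not address.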
5) Speed: Data needed this month, not this year
Leading AI developers now release their latest updates on a quarterly, if not monthly, basis, and their data needs match that pace. Unlike existing healthcare systems, networks, and providers, AI companies continue to demand accelerated data delivery timelines and quick results.
The healthcare data industry as a whole is used to 6-12 month delivery windows. However, the combination of Gradient Health’s cloud-based datasets, speedy contracting thanks to pre-vetted agreements, Protege’s other multimodal data sources, and the Protege Data Lab’s technical input for data curation and delivery has sped up large-scale transfers by orders of magnitude. The multi-source breadth of data on the Protege platform, one of the largest in the world, also enables the Protege team to check for data integrity issues for each data partner involved in a deal.
Protege’s technical guidance, both for Gradient Health and for the AI company buyers, has been instrumental to both the speed and accuracy of the data processing and transfer. One example was a specific imaging request that involved a massive volume of studies, limited to studies with positive findings.
“We approached the problem from an applied research perspective—treating it as both a scientific and operational challenge,” explained Protege’s Chief Science Officer and Tulane healthcare economist Engy Ziedan, Ph.D. “We spent about two weeks analyzing tens of millions of studies, focusing on reliably identifying positive findings and detecting any biases that could distort the data. That required combining the Protege Data Lab’s expertise across computer science, statistics, healthcare informatics, and radiology to address the data from multiple angles.”
“The result was a highly refined dataset: roughly 0.5% of all images in that modality performed over the past decade, each with positive findings. That precision was a major win for both data quality and downstream AI performance.”
Relationship Roadmap: Meeting AI Developer Needs
Together, Gradient Health and Protege generated millions in new revenue through their initial partnership. That first opportunity also opened the door to future data deals with other partners, drawing on the data Gradient Health had uploaded to the Protege platform. In its first year on the Protege platform, Gradient Health has worked with Protege to fulfill multiple enterprise AI deals, with additional follow-up requests actively being discussed.
Looking ahead, it has become increasingly clear that combining Gradient Health’s existing imaging datasets with other sources on the aggregated Protege platform unlocks additional opportunities for both Gradient Health and Protege.
“Partners worry about commoditization,” said Protege’s Kaleb Dubin. “But the reality is the opposite. Aggregation gets them into more deals, and Protege’s curation commands a price premium. In a few cases, we’ve actually negotiated higher prices than what partners originally asked for. We made sure they realized the full value of their data assets.”
Unlock New Revenue with Protege
To learn more about how your healthcare organization can unlock new revenue by licensing data for AI, as this imaging provider did in less than a year, contact the Protege healthcare team at contact@withprotege.ai.
