Jan 20, 2026
Summary
Healthcare Data for Real-World Use Cases: A leading healthcare AI company partnered with Protege to connect its patient-level data to Loopback Health’s EHR dataset and other healthcare provider data, which unlocked richer cohorts for training the next generation of AI models.
Multimodal Healthcare Data for AI Training: Protege coordinated multiple healthcare data partners to deliver both structured and unstructured EHR data, connecting modalities into a single, de-identified dataset ready for AI development.
Quick Delivery Timelines: The initial cohort of thousands of matched patients was delivered in less than 90 days from initial discovery, with ongoing refreshes expanding both the matched and EHR-only cohorts and opening up follow-on data opportunities.
Multimodal Healthcare Data To Unlock Expanded Use Cases
A leading healthcare AI company wanted to build next-generation AI models trained on multiple data modalities, building on its existing data. The company already had one modality, but it wanted to improve its models’ performance in care recommendation scenarios by layering on EHR data as well.
Protege was uniquely positioned to fulfill these highly targeted overlap needs thanks to its access to clinical data across its healthcare data partner network. The end result was a multimodal training dataset that connected the existing data with the new modalities, giving the AI models a more thorough understanding of the patient population.
Protege’s healthcare data partners benefited immensely from this request as well, as it demonstrated the growing need for multimodal healthcare data with broader, more diverse coverage than an individual provider tends to have. This unlocked a new, incremental deal opportunity that partners could access with existing data assets previously used in other Protege deal opportunities, but that no single partner could fulfill on its own given the client’s extensive patient coverage needs.
In addition to the initial overlap, the healthcare AI company also wanted a fast, compliant pathway for ongoing cohort expansion as new data became available via partners. As a result, the final solution needed to:
Reliably connect the existing modality data with newly layered-on EHR data in a single dataset
Preserve privacy and de-identification throughout the data pipeline
Support ongoing cohort growth in the future, whether through existing data partner coverage expansion or net new partners added to the Protege platform
Meet a quick turnaround timeline suitable for active AI development cycles
As the go-to, trusted data network for AI training data, Protege delivered on all of the key criteria, unifying disparate data sources and applying AI data expertise to deliver the required data on schedule and at scale.
The Protege Solution: A Single Data Provider Network
Protege served as the single access point to a distributed network of healthcare data partners, delivering a unified, model-ready cohort for the healthcare AI company’s development work.
Speed and reliability
Protege ran a clear scoping and feasibility process to translate the company’s requirements into a concrete cohort definition and overlap strategy. Once data samples had been reviewed and confirmed to match what the client was looking for, fast contracting and tightly managed coordination with partners supported timely data delivery. This also reduced operational friction for the end buyer.
Breadth and depth of data
Along with a few other providers, Protege partnered with Loopback Health, a healthcare data company, to make structured and unstructured EHR data from a nationwide network of academic medical centers and integrated delivery networks available for AI training use. For the end AI research team, this ensured that the data would be directly useful for their upcoming training runs and would support the full model-building cycle.
Scalable overlap strategy
Rather than making a one-time delivery of a static patient cohort, Protege designed the data pipeline so that continuous dataset refreshes could increase matched patient counts over time. As new data applicable to the client was added to the Protege data partner network, the patient cohort overlap expanded without the need to re-architect the original approach or pipeline design.
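To make the refresh idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not Protege’s actual pipeline: it assumes every source identifies patients by a shared de-identified token (the linkage mechanism is not described here), and the names split_cohorts, client_tokens, initial_ehr, and refreshed_ehr are hypothetical. The point is only that re-running the same overlap step on a refreshed set of EHR tokens can grow both the matched and EHR-only cohorts without changing the logic.

```python
from typing import NamedTuple


class Cohorts(NamedTuple):
    """Result of one overlap run (hypothetical structure)."""
    matched: frozenset[str]   # patients present in both client and EHR data
    ehr_only: frozenset[str]  # patients present only in the EHR data


def split_cohorts(client_tokens: set[str], ehr_tokens: set[str]) -> Cohorts:
    """Split EHR patients into matched and EHR-only cohorts.

    Assumption: both inputs hold de-identified patient tokens that are
    comparable across sources; no raw identifiers are involved.
    """
    return Cohorts(
        matched=frozenset(client_tokens & ehr_tokens),
        ehr_only=frozenset(ehr_tokens - client_tokens),
    )


# Made-up tokens for illustration only.
client_tokens = {"tok_a", "tok_b", "tok_c"}
initial_ehr = {"tok_a", "tok_x"}
# A later refresh adds partner data; the overlap step itself is unchanged.
refreshed_ehr = initial_ehr | {"tok_b", "tok_y"}

initial = split_cohorts(client_tokens, initial_ehr)
refreshed = split_cohorts(client_tokens, refreshed_ehr)

print(len(initial.matched), len(initial.ehr_only))      # 1 1
print(len(refreshed.matched), len(refreshed.ehr_only))  # 2 2
```

Because the overlap is recomputed from the current token sets on each run, adding a new partner or a larger refresh changes only the inputs, which mirrors the goal of expanding the cohort without re-architecting the pipeline.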
Flexible commercialization model
To align with the AI company’s roadmap, Protege implemented multi-year licenses with specified options for re-licensing. This structure allowed the company to plan long-term model development and evaluation on a stable data foundation, while preserving appropriate flexibility over time.
Partnership operating model
As part of every healthcare data scope of work, Protege program-manages feasibility, contracting, and delivery across the healthcare data partner network. This operating model unlocks broader reach to additional partners for AI-specific use cases, extending what any single provider could offer on its own.
Together, this “single data provider network” approach gave the healthcare company a clear path from the initial overlapping patient dataset to continued cohort enrichment in support of ongoing training data requirements.
Timeline & Data Delivery
From initial data discovery to final delivery of the first linked cohort, Protege executed on a 90-day turnaround timeline. This process included:
Upfront feasibility and scoping across Protege’s healthcare data partner network
Contract execution with the AI company and involved partners
Cohort definition and overlap implementation
Final data preparation and delivery
Within that overall window, final data delivery was completed within 30 days of contract signature. This timeliness was important to support the AI company’s active development cycle.
For the patient cohort, Protege delivered patient-level data that connected the existing modality with net new EHR data for those same patients. This also opened the possibility that the number of matched patients with EHR data layered on could grow over time via ongoing data refreshes and licensing renewals.
New Healthcare Data Opportunities with Protege
By working with Protege, the healthcare AI company was able to access a coordinated healthcare data network rather than a series of one-off data relationships. Protege’s role as a single data network for healthcare AI development unlocks several ongoing opportunities for both buyers and data partners.
For data buyers, that includes:
Access to one of the world’s largest data catalogs from dozens of partners, which enables overlapping patient cohorts across modalities
Both structured and unstructured EHR data that can be used to enrich existing patient cohorts and support new AI use cases
Ongoing follow-on options to layer on additional modalities and/or expanded patient coverage as model needs evolve and the Protege data partner network continues to expand
Commercial flexibility through multi-year licenses and re-licensing options, aligned with the realities of healthcare AI model development cycles
A partnership model in which Protege manages feasibility, contracting, and delivery across partners, so healthcare AI teams can stay focused on research and product rather than data orchestration
For data partners, that includes:
Access to additional scaled healthcare data opportunities from some of the world’s largest AI organizations
Automatic inclusion in ongoing deal discussions for AI-specific training and evaluation data use cases, without the need for a dedicated account or commercial team
Access to Protege’s AI-specific data expertise for data curation and delivery
To learn more about how your healthcare organization can license its data assets to AI companies through Protege, or how your healthcare AI organization can get access to curated AI training and evaluation datasets, fill out our partnerships form or contact our healthcare team at contact@withprotege.ai.
