Jan 27, 2026
We’re excited to announce our newest partnership with Tough Data, a company building the human skill data infrastructure for Physical AI. We believe the kind of data Tough Data specializes in, real-world expertise translated into production-grade training data, will be crucial for robots and embodied intelligence.
“Physical AI needs training data that reflects how manual work is done in real-world settings,” said Grant Murphy-Herndon, Protege’s Head of Motion Capture Data. “Tough Data is capturing the combination of motion, decision-making, and safety behaviors that embodied systems need to learn. We’re excited to partner with them to help AI teams access training-ready human skill data with the protections and licensing clarity they expect.”
As the robotics and embodied AI ecosystem accelerates, many teams face the same bottleneck: models need more than clean motion clips or simulation traces. They need real demonstrations of competent work, performed under real-world constraints.
Vishaal Bhuyan, CEO and Founder at Tough Data, added: “We built Tough Data to capture how experts really work—the judgment, adaptation, and recovery behaviors that are missing from most datasets. Partnering with Protege gives AI builders a streamlined way to access our production-grade human skill data in formats that work for training pipelines, so embodied intelligence can move from demos to real deployment.”
Tough Data’s data, now available for licensing via Protege, captures how expert humans operate in complex, real-world environments rather than in simulation or controlled settings. This includes how people move, make decisions, adapt to unexpected conditions, recover from errors, and work safely when the stakes are real.
This approach goes beyond both traditional motion capture and synthetic simulation. The result is structured, labeled data that can plug directly into AI development pipelines.
Training-ready data for embodied intelligence
We see this data as immediately applicable for AI labs and builders looking for high-fidelity human skill data, including:
Imitation learning and behavior cloning from skilled workers
Policy learning and evaluation across complex, real-world task variants
Robustness and safety testing, including recovery behaviors and edge-case handling
Embodied intelligence development for humanoids, industrial automation, and intelligent machines
From “movement data” to “work data”
Most existing robotics datasets skew toward narrowly scoped tasks—warehouse pick-and-pack, repetitive factory motions, or controlled lab interactions. Tough Data is purpose-built for skilled trades and industrial work, where judgment and problem-solving matter as much as movement.
This applies to human expertise across domains such as agriculture, construction, electrical work, plumbing, equipment operation, maintenance, and other high-value physical professions. These are some of the key environments where Physical AI must ultimately perform to create real economic value.
The approach turns skilled labor into a scalable data asset, built specifically to help learning systems perform real work with greater competence and reliability than simple object manipulation allows. That work may include executing multi-step tasks, responding to changing conditions, and prioritizing human safety when constraints collide.
Protege + Tough Data
Protege enables trusted, compliant access to specialized training datasets—so AI builders can move faster without compromising rights or governance. Through this partnership, Protege will help make Tough Data’s human skill datasets licensable to AI developers, with clear terms and a secure path from data source to model development.
This partnership reflects a shared focus on accelerating Physical AI with real-world, training-ready data, enabling intelligent machines to learn the skills that matter most: working safely, adapting intelligently, and performing reliably in the environments where value is created.
About Tough Data
Tough Data is building the world’s first human skill data infrastructure for Physical AI, transforming real-world expertise into production-grade training data for robots and embodied intelligence. The company captures how expert humans actually work in complex environments: how they move, make decisions, adapt to unexpected conditions, recover from errors, and operate safely under real constraints.
About Protege
Protege is the trusted source for finding and sharing AI training data, enabling seamless and compliant data exchange. By empowering data holders and connecting them with AI developers, Protege supports the creation of thoughtful AI solutions. Protege's scientific and strategic approach allows AI teams to quickly discover and license a wide array of curated datasets across industries, shortening the time it takes to obtain AI-ready data for model development.
The Protege Motion Capture data team licenses, aggregates, and prepares diverse data sources for all stages of the AI development lifecycle across industries ranging from audiovisual applications to multimodal models to world building and more. Protege’s team of in-house Data Lab experts provides the expertise needed to curate and evaluate datasets that are purpose-built for AI. More broadly, the company’s vision is for the right training and evaluation data to reflect the entirety of the human experience, reducing bias and increasing representation.
To learn more about how your organization can unlock new revenue by ethically licensing your content for AI, or access datasets purpose-built for AI, fill out our partner information form or contact the Protege team at contact@withprotege.ai.

