The Problem Nobody Talks About
Synthetic data generation pipelines for robot imitation learning have matured fast. Tools like IsaacMimic and SkillGen can take a handful of human demonstrations and generate thousands of training episodes at scale. When the pipeline works, the results are compelling. When it doesn’t, the failure is rarely where people expect.
The physics simulation is mature. The policy architectures are well-validated. Demonstration collection, assuming reasonable teleoperation quality, is not usually the problem. The problem is frequently the step sitting between demonstration collection and data generation: subtask annotation. The field has not treated it with the seriousness it deserves.
Subtask annotation is the process of dividing teleoperated demonstrations into labelled segments that the data generator uses to transform and recombine trajectories across new scene configurations. It is manual and judgment-intensive, with no standardised methodology, limited tooling, and almost no community guidance beyond brief documentation notes. It is also the step whose quality most directly determines whether your generated dataset is genuinely useful for policy training or subtly, silently broken.
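To make the idea concrete, here is a minimal sketch of what a subtask annotation amounts to in practice: a list of labelled frame boundaries over one demonstration, plus a basic consistency check. The schema and names (`SubtaskAnnotation`, `validate_annotations`, the segment names) are hypothetical illustrations, not the actual format used by IsaacMimic or SkillGen, though real pipelines store similar per-demo boundary signals.

```python
from dataclasses import dataclass


@dataclass
class SubtaskAnnotation:
    """One labelled segment of a teleoperated demonstration.

    Hypothetical schema for illustration only; real tools encode
    comparable information (a name and a boundary per subtask).
    """
    name: str        # e.g. "grasp_mug"
    end_frame: int   # exclusive frame index where this subtask ends


def validate_annotations(n_frames: int,
                         subtasks: list[SubtaskAnnotation]) -> list[str]:
    """Return a list of problems; an empty list means the
    boundaries partition the demonstration consistently."""
    problems = []
    prev_end = 0
    for st in subtasks:
        # Boundaries must be strictly increasing: each subtask
        # must cover at least one frame.
        if st.end_frame <= prev_end:
            problems.append(
                f"{st.name}: non-increasing boundary at frame {st.end_frame}")
        prev_end = st.end_frame
    # The final subtask should end exactly at the last frame,
    # so no part of the demonstration is left unlabelled.
    if subtasks and subtasks[-1].end_frame != n_frames:
        problems.append("last subtask does not end at the final frame")
    return problems
```

Even a check this simple catches the most common mechanical mistakes (overlapping, empty, or dangling segments) before any generation run starts; the harder, judgment-driven failures are the subject of the rest of this post.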
This post is about that step: how it goes wrong, and how to validate it systematically before committing to expensive generation runs.