Why Are We Hiring for This Role
- Develop and optimize vision-language-action models, including transformers, diffusion models, and multimodal encoders/decoders.
- Build representations for 2D/3D perception, affordances, scene understanding, and spatial reasoning.
- Integrate LLM-based reasoning with action planning and control policies.
- Design datasets for multimodal learning: video-action trajectories, instruction following, teleoperation data, and synthetic data.
- Interface vision-language-action model (VLA) outputs with real-time robot control stacks (navigation, manipulation, locomotion).
- Implement grounding layers that convert natural language instructions into symbolic, geometric, or skill-level action plans.
- Deploy models on on-board or edge compute platforms, optimizing for latency, safety, and reliability.
- Build scalable pipelines for ingesting, labeling, and generating multimodal training data.
- Create simulation-to-real (Sim2Real) training workflows using synthetic environments and teleoperated demonstration data.
- Optimize training pipelines, model parallelism, and evaluation frameworks.
- Work closely with robotics, hardware, controls, and safety teams to ensure model outputs are executable, safe, and predictable.
- Collaborate with product teams to define robot capabilities and user-facing behaviors.
- Participate in user and field testing to iterate on real-world performance.
What Kind of Person Are We Looking For
- Strong experience training multimodal models, including vision-language-action models (VLAs), vision-language models (VLMs), vision transformers, and LLMs.
- Ability to build and iterate on large-scale training pipelines.
- Deep proficiency in PyTorch or JAX, distributed training, and GPU acceleration.
- Strong software engineering skills in Python and modern ML tooling.
- Experience with (synthetic) dataset creation and curation.
- Understanding of real-time deployment constraints on embedded hardware.
- Preferably, familiarity with robotics simulation environments (Isaac Lab, MuJoCo, or similar).
- Ideally, hands-on experience with robotics, embodied AI, or reinforcement/imitation learning.
- MSc or PhD in Computer Science, Robotics, Machine Learning, or related field—or equivalent industry experience.
Benefits
We provide market standard benefits (health, vision, dental, 401k, etc.). Join us for the culture and the mission, not for the benefits.
Salary
The annual compensation is expected to be between $150,000 and $300,000. Exact compensation may vary based on skills, experience, and location.