Sean Kirmani

Work Experience

Google DeepMind — Senior Research Scientist [Apr 2023 – Present]

Artificial intelligence research in vision, language, and robotics. Mountain View, California.

Everyday Robots — Research Lead, Semantic Perception [Nov 2021 – Apr 2023]

Everyday Robots spun out of Google[x] in November 2021. Mountain View, California.

  • Vision-language model lead at Everyday Robots. Introduced and deployed the first vision-language model (CLIP) in production for robot visual question answering (VQA). Scaled diffusion models to create synthetic data for CLIP. Landed an on-robot open-vocabulary object detector to detect novel objects.
  • Designed and built a multi-sensor (camera and lidar), open-vocabulary panoptic segmentation model.
  • Full-stack ML engineer: end-to-end ownership of the entire ML flywheel, from data collection to inference. Built the model automation pipeline for data collection, training, evaluation, and on-robot deployment for all perception models.

Google[x] — Senior Research Engineer, The Everyday Robot Project [Jul 2018 – Nov 2021]

Early computer vision engineer at The Everyday Robot Project. Mountain View, California.

  • Early engineer on the perception team. Expert in bringing research to production in real-world systems.
  • Created the lidar panoptic segmentation model and RGB-D camera panoptic segmentation model (with associated automation flywheel) and deployed to robot fleet.
  • Trained multimodal vision and action models, resulting in publication at ICRA.
  • Filed 5 patents and published 1 paper.
  • Built the first 3D object tracker.

Google[x] — Research Engineering Intern, The Everyday Robot Project [May 2017 – Aug 2017]

Worked on perception for human-robot interaction. Mountain View, California.

Google — Software Engineering Intern, Project Tango [May 2016 – Aug 2016]

Worked on experimental augmented reality. Created environmental lighting system allowing more photorealistic lighting and reflections in augmented reality for Tango SDK. Published Google Developer Blog post with tutorial for usage. Also experimented with video stabilization. Experience in computer vision, computer graphics, and computational photography. Worked with C++, Unity, and Java. Mountain View, California.

Google — Software Engineering Intern, Chrome for Android [May 2015 – Aug 2015]

Served as an intern on tools and infrastructure for Chrome for Android. Wrote test infrastructure for sign-in authentication tests and created a parametrizable testing framework. All code is open source as part of Chromium. Worked with Java, Python, and C++. Mountain View, California.

Accordion Health — Software Engineer [Aug 2014 – Jan 2015]

Used machine learning for health care data analytics. Clustered co-morbidity for several sets of patients. Experience in data visualization. Worked with R, Python, and D3.js. Austin, Texas.

Internet Marketing Inc. — Web Developer Intern [Jun 2013 – Aug 2013]

Set up Unix servers and configured SQL databases. Developed over 20 websites in the summer. Managed and maintained cloud servers. Worked with HTML, CSS, PHP, JavaScript, and jQuery. Las Vegas, Nevada.

Publications

Vision Language Models are In-Context Value Learners
Jason Ma, Joey Hejna, Ayzaan Wahid, Chuyuan Fu, Dhruv Shah, Jacky Liang, Zhuo Xu, Sean Kirmani, Peng Xu, Danny Driess, Ted Xiao, Jonathan Tompson, Osbert Bastani, Dinesh Jayaraman, Wenhao Yu, Tingnan Zhang, Dorsa Sadigh, Fei Xia

STEER: Flexible Robotic Manipulation via Dense Language Grounding
Laura Smith, Alex Irpan, Montserrat Gonzalez Arenas, Sean Kirmani, Dmitry Kalashnikov, Dhruv Shah, Ted Xiao

RT-Affordance: Affordances are Versatile Intermediate Representations for Robot Manipulation
Soroush Nasiriany, Sean Kirmani, Tianli Ding, Laura Smith, Yuke Zhu, Danny Driess, Dorsa Sadigh, Ted Xiao

Gen2Act: Human Video Generation in Novel Scenarios enables Generalizable Robot Manipulation
Homanga Bharadhwaj, Debidatta Dwibedi, Abhinav Gupta, Shubham Tulsiani, Carl Doersch, Ted Xiao, Dhruv Shah, Fei Xia, Dorsa Sadigh, Sean Kirmani

Mobility VLA: Multimodal Instruction Navigation with Long-Context VLMs and Topological Graphs
Hao-Tien Lewis Chiang, Zhuo Xu, Zipeng Fu, Mithun George Jacob, Tingnan Zhang, Tsang-Wei Edward Lee, Wenhao Yu, Connor Schenck, David Rendleman, Dhruv Shah, Fei Xia, Jasmine Hsu, Jonathan Hoech, Pete Florence, Sean Kirmani, Sumeet Singh, Vikas Sindhwani, Carolina Parada, Chelsea Finn, Peng Xu, Sergey Levine, Jie Tan

Conference on Robot Learning (CoRL), 2024.

Evaluating Real-World Robot Manipulation Policies in Simulation
Xuanlin Li*, Kyle Hsu*, Jiayuan Gu*, Karl Pertsch, Oier Mees, Homer Rich Walke, Chuyuan Fu, Ishikaa Lunawat, Isabel Sieh, Sean Kirmani, Sergey Levine, Jiajun Wu, Chelsea Finn, Hao Su, Quan Vuong, Ted Xiao

Conference on Robot Learning (CoRL), 2024.

Learning to Learn Faster from Human Feedback with Language Model Predictive Control
Jacky Liang*, Fei Xia*, Wenhao Yu*, Andy Zeng*, Montserrat Gonzalez Arenas, Maria Attarian, Maria Bauza, Matthew Bennice, Alex Bewley, Adil Dostmohamed, Chuyuan Kelly Fu, Nimrod Gileadi, Marissa Giustina, Keerthana Gopalakrishnan, Leonard Hasenclever, Jan Humplik, Jasmine Hsu, Nikhil Joshi, Ben Jyenis, Chase Kew, Sean Kirmani, Tsang-Wei Edward Lee, Kuang-Huei Lee, Assaf Hurwitz Michaely, Joss Moore, Ken Oslund, Dushyant Rao, Allen Ren, Baruch Tabanpour, Quan Vuong, Ayzaan Wahid, Ted Xiao, Ying Xu, Vincent Zhuang, Peng Xu†, Erik Frey†, Ken Caluwaerts, Tingnan Zhang, Brian Ichter, Jonathan Tompson, Leila Takayama, Vincent Vanhoucke, Izhak Shafran, Maja Mataric, Dorsa Sadigh, Nicolas Heess, Kanishka Rao, Nik Stewart, Jie Tan, Carolina Parada

Robotics: Science and Systems (RSS), 2024.

PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany*, Fei Xia*, Wenhao Yu*, Ted Xiao*, Jacky Liang, Ishita Dasgupta, Annie Xie, Danny Driess, Ayzaan Wahid, Zhuo Xu, Quan Vuong, Tingnan Zhang, Tsang-Wei Edward Lee, Kuang-Huei Lee, Peng Xu, Sean Kirmani, Yuke Zhu, Andy Zeng, Karol Hausman, Nicolas Heess, Chelsea Finn, Sergey Levine, Brian Ichter*

International Conference on Machine Learning (ICML), 2024.

Spatial VLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen*, Zhuo Xu*, Sean Kirmani, Brian Ichter, Danny Driess, Pete Florence, Dorsa Sadigh, Leonidas Guibas, Fei Xia

Computer Vision and Pattern Recognition (CVPR), 2024.

AutoRT: Embodied Foundation Models for Large Scale Orchestration of Robotic Agents
Michael Ahn, Debidatta Dwibedi, Chelsea Finn, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Karol Hausman, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Sean Kirmani, Isabel Leal, Edward Lee, Sergey Levine, Yao Lu, Sharath Maddineni, Kanishka Rao, Dorsa Sadigh, Pannag Sanketi, Pierre Sermanet, Quan Vuong, Stefan Welker, Fei Xia, Ted Xiao, Peng Xu, Steve Xu, Zhuo Xu

RT-Trajectory: Robotic Task Generalization via Hindsight Trajectory Sketches
Jiayuan Gu, Sean Kirmani, Paul Wohlhart, Yao Lu, Montserrat Gonzalez Arenas, Kanishka Rao, Wenhao Yu, Chuyuan Fu, Keerthana Gopalakrishnan, Zhuo Xu, Priya Sundaresan, Peng Xu, Hao Su, Karol Hausman, Chelsea Finn, Quan Vuong, Ted Xiao

International Conference on Robotics and Automation (ICRA), 2024.

RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches
Priya Sundaresan, Quan Vuong, Jiayuan Gu, Peng Xu, Ted Xiao, Sean Kirmani, Tianhe Yu, Michael Stark, Ajinkya Jain, Karol Hausman, Dorsa Sadigh, Jeannette Bohg, Stefan Schaal

Conference on Robot Learning (CoRL), 2024.

How to Prompt Your Robot: A PromptBook for Manipulation Skills with Code as Policies
Montserrat Gonzalez Arenas, Ted Xiao, Sumeet Singh, Vidhi Jain, Allen Z. Ren, Quan Vuong, Jake Varley, Alexander Herzog, Isabel Leal, Sean Kirmani, Dorsa Sadigh, Vikas Sindhwani, Kanishka Rao, Jacky Liang, Andy Zeng

Conference on Robot Learning (CoRL), 2023.

Open X-Embodiment: Robotic Learning Datasets and RT-X Models
Abhishek Padalkar, Acorn Pooley, Ajinkya Jain, Alex Bewley, Alex Herzog, ..., Ryan Julian, Samuel Bustamante, Sean Kirmani, Sergey Levine, ..., Zhuo Xu, Zichen Jeff Cui

International Conference on Robotics and Automation (ICRA), 2024.

Language to Rewards for Robotic Skill Synthesis
Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia

Conference on Robot Learning (CoRL), 2023.

Open-World Object Manipulation using Pre-Trained Vision-Language Models
Austin Stone, Ted Xiao, Yao Lu, Keerthana Gopalakrishnan, Kuang-Huei Lee, Quan Vuong, Paul Wohlhart, Sean Kirmani, Brianna Zitkovich, Fei Xia, Chelsea Finn, Karol Hausman

Conference on Robot Learning (CoRL), 2023.

Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators
Alexander Herzog, Kanishka Rao, Karol Hausman, Yao Lu, Paul Wohlhart, Mengyuan Yan, Jessica Lin, Montserrat Gonzalez Arenas, Ted Xiao, Daniel Kappler, Daniel Ho, Jarek Rettinghouse, Yevgen Chebotar, Kuang-Huei Lee, Keerthana Gopalakrishnan, Ryan Julian, Adrian Li, Chuyuan Kelly Fu, Bob Wei, Sangeetha Ramesh, Khem Holden, Kim Kleiven, David Rendleman, Sean Kirmani, Jeff Bingham, Jon Weisz, Ying Xu, Wenlong Lu, Matthew Bennice, Cody Fong, David Do, Jessica Lam, Yunfei Bai, Benjie Holson, Michael Quinlan, Noah Brown, Mrinal Kalakrishnan, Julian Ibarz, Peter Pastor, Sergey Levine

Robotics: Science and Systems (RSS), 2023.

Practical Imitation Learning in the Real World via Task Consistency Loss
Mohi Khansari, Daniel Ho, Yuqing Du, Armando Fuentes, Matthew Bennice, Nicolas Sievers, Sean Kirmani, Yunfei Bai, Eric Jang

International Conference on Robotics and Automation (ICRA), 2023.

PRISM: Pose Registration for Integrated Semantic Mapping
Justin Hart, Rishi Shah, Sean Kirmani, Nick Walker, Kathryn Baldauf, Nathan John, Peter Stone

IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.

Passive Demonstrations of Light-Based Robot Signals for Improved Human Interpretability
Rolando Fernandez, Nathan John, Sean Kirmani, Justin Hart, Jivko Sinapov, Peter Stone

IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2018.

Education

The University of Texas at Austin
Bachelor of Science, Computer Science
Bachelor of Science, Electrical Engineering

Honors Thesis: Deep Reinforcement Learning for Aerial Obstacle Avoidance using Monocular RGB Images

Selected Coursework:
  • Robot Learning (CS 395T)
  • Human Robot Interaction (EE 382V)
  • Artificial Intelligence (CS 343)
  • Neural Networks (CS 342)
  • Computer Vision (CS 378H)
  • Computer Graphics (CS 354)
  • Physical Simulation (CS 395T)
  • Operating Systems (CS 439)
  • Signal Processing (EE 313)
  • Computer Architecture (EE 460N)
  • Embedded Systems (EE 445L)

University Research

Robotics Lab at UT Austin, Dr. Peter Stone/Dr. Justin Hart [Dec 2017 – Apr 2018]

Worked on semantic mapping and social navigation for non-anthropomorphic robots with the Building-Wide Intelligence (BWI) project. Austin, Texas.

Robotics Lab at UT Austin, Dr. Andrea Thomaz/Dr. Scott Niekum [Jan 2016 – May 2017]

Research in human-robot interaction in the Personal Autonomous Robotics Lab (PeARL) and Socially Intelligent Machines (SiM) Lab. Experience in behavior architectures, perception, manipulation, and machine learning. Austin, Texas.

Wireless Networking & Communication Group, Dr. Joydeep Ghosh [Aug 2014 – Jan 2016]

Selected by Professor Joydeep Ghosh of the University of Texas Electrical and Computer Engineering department to join the Intelligent Data Exploration and Analysis Laboratory (IDEAL), which focuses on machine learning and data mining. Researched making self-driving cars a safe reality using distributed machine learning over wireless mmWave communication, in collaboration with Dr. Robert Heath. Austin, Texas.

Research Interests

  • Computer Vision
  • Natural Language Processing
  • Robot Learning
  • Reinforcement Learning
  • Deep Learning
  • Artificial Intelligence

Volunteering

Gatorbotics [Nov 2018 – Jan 2021]

Mentor for FIRST Robotics Competition Team 1700. Palo Alto, California.
