Hello, I'm Wei Yang

I am a Senior Research Scientist at the NVIDIA Seattle Robotics Lab. I received my Ph.D. in Electronic Engineering from the Chinese University of Hong Kong in 2018, under the supervision of Prof. Xiaogang Wang and co-supervision of Prof. Wanli Ouyang. During my doctoral studies, I was a visiting student at the Robotics Institute, Carnegie Mellon University, where I worked with Prof. Abhinav Gupta.

Research interests: Robotic Manipulation, Computer Vision, and Embodied Intelligence.


Publications

Conferences & Preprints

Dexplore: Scalable Neural Control for Dexterous Manipulation from Reference-Scoped Exploration

CoRL, 2025

A scalable neural control framework for dexterous manipulation using reference-scoped exploration.

VT-Refine: Learning Bimanual Assembly with Visuo-Tactile Feedback via Simulation Fine-Tuning

CoRL, 2025 (Best Paper Award at the IROS 2025 AHFHR Workshop)

Learning bimanual assembly tasks using visuo-tactile feedback through simulation fine-tuning.

Cosmos World Foundation Model Platform for Physical AI

Whitepaper, 2025

A world foundation model platform that helps developers build customized world models for Physical AI applications, with a video curation pipeline, pre-trained models, and post-training examples.

FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

CVPR, 2024 (Highlight, AC 2.8%)

A foundation model for 6D pose estimation and tracking of novel objects.

Learning Human-to-Robot Handovers from Point Clouds

CVPR, 2023 (Highlight, AC 2.5%)

Learning human-to-robot handovers directly from point cloud observations.

Reactive Human-to-Robot Handovers of Arbitrary Objects

ICRA, 2021 (Best Paper Award on Human-Robot Interaction)

A reactive approach for human-to-robot handovers.

Human Grasp Classification for Reactive Human-to-Robot Handovers

IROS, 2020

A grasp classification approach for reactive human-to-robot handovers.

Collaborative Interaction Models for Optimized Human-Robot Teamwork

IROS, 2020

Optimizing human-robot teamwork through collaborative interaction models.

Visual Semantic Navigation using Scene Priors

ICLR, 2019

Using scene priors for visual semantic navigation.

3D Human Pose Estimation in the Wild by Adversarial Learning

CVPR, 2018

Using adversarial learning for 3D human pose estimation in unconstrained environments.

Learning Feature Pyramids for Human Pose Estimation

ICCV, 2017

A pyramid network architecture for human pose estimation.

Identity-Aware Textual-Visual Matching with Latent Co-attention

ICCV, 2017

Using latent co-attention for identity-aware textual-visual matching.

Towards Multi-Person Pose Tracking: Bottom-up and Top-down Methods

ICCV PoseTrack Workshop, 2017

Bottom-up and top-down methods for multi-person pose tracking.

Multi-Context Attention for Human Pose Estimation

CVPR, 2017

Using multi-context attention mechanisms for human pose estimation.

End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

CVPR, 2016 (Oral, AC 3.9%)

Learning a deformable mixture of parts and deep CNNs end-to-end for human pose estimation.

Multi-task Recurrent Neural Network for Immediacy Prediction

ICCV, 2015 (Oral, AC 3.3%)

Using multi-task RNNs for immediacy prediction.

Clothing Co-Parsing by Joint Image Segmentation and Labeling

Wei Yang, Ping Luo, Liang Lin
CVPR, 2014

Joint image segmentation and labeling for clothing co-parsing.

Data-Driven Scene Understanding by Adaptive Exemplar Retrieval

ICME, 2014

Using adaptive exemplar retrieval for scene understanding.

Learning Contour-Fragment-based Shape Model with And-Or Tree Representation

CVPR, 2012

Using And-Or tree representation for shape modeling.

Interactive CT Image Segmentation with Online Discriminative Learning

ICIP, 2011

Interactive medical image segmentation with online learning.

Journals

Scene Synthesizer: A Python Library for Procedural Scene Generation in Robot Manipulation

Journal of Open Source Software (JOSS), January 2025

Progressively Diffused Networks for Semantic Visual Parsing

Pattern Recognition (PR), 2019

Clothes Co-Parsing via Joint Image Segmentation and Labeling with Application to Clothing Retrieval

IEEE Transactions on Multimedia (T-MM), 2016

Inference With Collaborative Model for Interactive Tumor Segmentation in Medical Image Sequences

IEEE Transactions on Cybernetics (T-Cybernetics), 2015

Data-Driven Scene Understanding with Adaptively Retrieved Exemplars

IEEE Multimedia, 2015

Discriminatively Trained And-Or Graph Models for Object Shape Detection

IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI), 37(5): 959-972, 2015