Visual Representations in Robot Learning
We discuss the role of visual representations in robot learning. Focusing in particular on action policy learning with robot arms, we describe how computer vision models such as Vision Transformers and Vision-Language Models enable robots to learn complicated real-world tasks by providing better visual representations. Several examples will be discussed, including Visionary (with neural architecture search), Robotics Transformer (RT-1), and Token Turing Machines.
Dr. Michael S. Ryoo
SUNY Empire Innovation Associate Professor
Stony Brook University, USA. Staff Research Scientist, Google DeepMind, USA.
Michael S. Ryoo is a SUNY Empire Innovation Associate Professor in the Department of Computer Science at Stony Brook University, and is also a staff research scientist at Google DeepMind. His research focuses on video representations, self-supervision, neural architectures, egocentric perception, and robot action policy learning. He was previously an assistant professor at Indiana University Bloomington and a staff researcher in the Robotics Section of NASA's Jet Propulsion Laboratory (JPL). He received his Ph.D. from the University of Texas at Austin in 2008 and his B.S. from the Korea Advanced Institute of Science and Technology (KAIST) in 2004. His paper on robot-centric activity recognition received the Best Paper Award in Robot Vision at ICRA 2016. He has given a number of tutorials at past CVPR conferences (including CVPR 2022, 2019, 2018, 2014, and 2011) and has organized workshops on Egocentric (First-Person) Vision.
Neural Fields in Visual Computing: Foundations and Applications
Neural fields are emerging as a new scene representation for computer vision and computer graphics. This technology has been applied to various problems, including novel-view synthesis and digital humans, demonstrating its efficacy over other data representations. In this talk, we will take a deep dive into the foundations of neural fields and how they advance several computer vision applications. We will also discuss remaining challenges and open questions to facilitate future research.
Dr. Shunsuke Saito
Reality Labs Research (Pittsburgh), USA
Shunsuke Saito is a Research Scientist at Meta Reality Labs Research in Pittsburgh. He obtained his PhD at the University of Southern California. Prior to USC, he was a Visiting Researcher at the University of Pennsylvania in 2014. He obtained his BE (2013) and ME (2014) in Applied Physics at Waseda University. His research lies at the intersection of computer graphics, computer vision, and machine learning, centered around digital humans, 3D reconstruction, and performance capture. His work has been published at SIGGRAPH, SIGGRAPH Asia, NeurIPS, ECCV, ICCV, and CVPR, with two papers nominated for the CVPR Best Paper Award (2019 and 2021). His real-time volumetric teleportation work also won the Best in Show award at SIGGRAPH 2020 Real-Time Live!