Tutorials (July 28)
Tutorial 1: Learning and Improving Multimodal Commonsense Reasoning
Abstract
This tutorial traces the journey from basic perceptual processes to complex, real-world reasoning tasks. We begin by examining how the human brain and modern artificial systems perceive and understand visual information. We then turn to visual cognition, the mental processes that enable recognition, attention, and categorization. From there, the tutorial progresses to visual reasoning, where we discuss recent breakthroughs, including advances in several mainstream methods. Applications of these advances are wide-ranging, and we demonstrate real use cases in which systems make complex decisions based on real-time visual data, highlighting their impact across application fields.
Tutor
Bo Wu
MIT-IBM Watson AI Lab
Biography
Bo Wu is currently a Researcher with the MIT-IBM Watson AI Lab, Cambridge, MA. He received his Ph.D. in Computer Science from the Institute of Computing Technology, Chinese Academy of Sciences, and was previously a Research Scientist at Columbia University. His research interests encompass deep learning, multimodal learning, computer vision, and natural language understanding, with a focus on visual, linguistic, and user behavior analysis, forecasting, and reasoning. He has won research awards including the IBM Master Inventor Award, the IBM Level-A Accomplishment Award, the ACL Best Demo Paper Award (2020), the ACM Turing 50th Student Scholarship, and the Schlumberger Ph.D. Award. His team has excelled in global competitions, securing top positions in the NIST TAC SM-KBP (1st), the ICIP Prediction Challenge (1st), and the Alibaba Global Vision AI Challenge (3rd, top 0.1%), among others. He has organized events such as the ACM MM SMP Challenge and the CVPR workshops and challenges MVCS and MMFM. He has also served as Area Chair, Track Chair, Senior Program Committee Member, and Program Committee Board Member for conferences including ACM Multimedia, AAAI, and IJCAI.
Tutorial 2: Developing and Evaluating Interactive Lifelog Retrieval Systems
Abstract
Lifelogging is the process of gathering rich multimodal archives of life experience data over extended time periods, often months or years. Over the past decade, lifelogging has evolved from a niche research topic into a vibrant interdisciplinary field at the intersection of computer vision, information retrieval, and human-computer interaction. In this tutorial, I will reflect on ten years of research into interactive lifelog retrieval, drawing insights from the ACM Lifelog Search Challenge (LSC) and other major initiatives that have shaped the community. I will highlight the key milestones in the development of multimodal lifelog datasets, advances in visual/multimodal semantic indexing and search, and the emergence of novel access methodologies and user interfaces for multimodal interactive systems. By examining how the field has addressed the challenge of turning vast multimodal personal archives into searchable, meaningful content, I will offer a forward-looking perspective on the opportunities and open questions that lie ahead in building intelligent, user-centric lifelog systems.
Tutor
Cathal Gurrin
Dublin City University
Biography
Professor Cathal Gurrin is a researcher and academic at Dublin City University (DCU) in Ireland. He is also the Deputy Director of the national ADAPT Centre for digital content technologies. Gurrin’s research interests focus primarily on personal analytics and lifelogging, which involves creating extensive personal databases of lifelog images and other sensor data to capture and analyse daily activities and experiences. He has been a pioneer in this field, amassing a continuous personal archive since 2006 that includes over 15 million wearable camera images and hundreds of millions of other sensor readings. His work aims to develop assistive technologies that use wearable sensors and data analytics to infer knowledge about real-world activities and enhance individual performance and health. Gurrin is heavily involved in organising conferences for the community. He has been general co-chair of many leading conferences in his field, such as ECIR'11, MM'14, ICMR'20, MM'22/23, and ICMR'24, and will be general co-chair of ACM MM'25 and ACM Web'27 in Dublin. He is also an organiser of the annual VBS Challenge at MMM and a co-founder of the ACM LSC Challenge at ICMR.