IAPR Invited Talks

July 23rd

Opportunities in Egocentric Video Understanding

IAPR Invited Speaker


Prof. Dima Damen
University of Bristol

Biography

Dima Damen is a Professor of Computer Vision at the University of Bristol. Dima is currently an EPSRC Fellow (2020-2025), focusing her research on the automatic understanding of object interactions, actions and activities using wearable visual (and depth) sensors. She has contributed to novel research questions including assessing action completion, skill/expertise determination from video sequences, discovering task-relevant objects, dual-domain and dual-time learning, as well as multi-modal fusion using vision, audio and language. She is the project lead for EPIC-KITCHENS, the largest dataset in egocentric vision, with accompanying open challenges. She also leads the EPIC annual workshop series alongside major conferences (CVPR/ICCV/ECCV). Dima is an associate editor of IJCV, IEEE TPAMI and Pattern Recognition, and was a program chair for ICCV 2021. She was selected as a Nokia Research collaborator in 2016, and as an Outstanding Reviewer at CVPR 2021, CVPR 2020, ICCV 2017, CVPR 2013 and CVPR 2012. Dima received her PhD from the University of Leeds (2009) and joined the University of Bristol as a Postdoctoral Researcher (2010-2012), where she became Assistant Professor (2013-2018) and Associate Professor (2018-2021) and was appointed as chair in August 2021. She supervises 8 PhD students and 3 postdoctoral researchers.

Abstract

Forecasting the rise of wearable computing with video feeds, this talk will present opportunities for research in video understanding when footage is captured from such devices. The unscripted, unedited footage from a camera that travels with the wearer around their daily activities poses challenges to current video understanding models, but more importantly offers opportunities in multi-modal fusion (video, audio and language) for the tasks of recognition and retrieval. In this talk, I will tackle new tasks, new approaches for supervision, and new models in egocentric vision.
All project details are at: https://dimadamen.github.io/index.html#Projects

July 24th

On the 2nd AI Wave: Toward Interpretable, Reliable, and Sustainable AI

IAPR Invited Speaker


Prof. CC Jay Kuo
University of Southern California

Biography

Dr. Kuo has received numerous awards for his research contributions, including the 2010 Electronic Imaging Scientist of the Year Award, the 2010-11 Fulbright-Nokia Distinguished Chair in Information and Communications Technologies, the 2019 IEEE Computer Society Edward J. McCluskey Technical Achievement Award, the 2019 IEEE Signal Processing Society Claude Shannon-Harry Nyquist Technical Achievement Award, the 72nd annual Technology and Engineering Emmy Award (2020), and the 2021 IEEE Circuits and Systems Society Charles A. Desoer Technical Achievement Award. Dr. Kuo was Editor-in-Chief for the IEEE Transactions on Information Forensics and Security (2012-2014) and the Journal of Visual Communication and Image Representation (1997-2011). He is currently the Editor-in-Chief for the APSIPA Trans. on Signal and Information Processing (2022-2023). He has guided 164 students to their PhD degrees and supervised 31 postdoctoral research fellows.

Abstract

Rapid advances in artificial intelligence (AI) in the last decade have been primarily attributed to the wide application of deep learning (DL) technologies. I view these advances as the first AI wave. There are concerns with the first AI wave. DL solutions are black boxes (i.e., not interpretable) and vulnerable to adversarial attacks (i.e., unreliable). Besides, the high carbon footprint of large DL networks is a threat to our environment (i.e., not sustainable). Many researchers are looking for an alternative solution that is interpretable, reliable, and sustainable. This is expected to be the second AI wave. To this end, I have conducted research on green learning (GL) since 2015. GL was inspired by DL. It is characterized by low carbon footprints, small model sizes, low computational complexity, and mathematical transparency. It offers energy-efficient solutions in cloud centers and mobile/edge devices. It has three main modules: 1) unsupervised representation learning, 2) supervised feature learning, and 3) decision learning. GL has been successfully applied to a few applications. My talk will present the fundamental ideas of the GL solution and highlight a couple of demonstrated examples.

July 25th

How AI Transforms the Medical Field: A Focus on Medical Imaging

IAPR Invited Speaker


Prof. Kensaku Mori
Nagoya University

Biography

Graduate School of Informatics, Nagoya University
Information Technology Center, Nagoya University
Research Center for Medical Big Data, National Institute of Informatics

Abstract

In this presentation, we explore the ways in which AI is transforming the medical field, with a particular focus on medical imaging. AI technology has rapidly permeated various aspects of our daily lives, with developments like generative AI and ChatGPT regularly making headlines. The medical field is no exception, as diagnostic procedures employing AI are becoming increasingly popular. One notable example is the automated diagnosis of colonoscopic videos. Real-time detection and classification of colonic polyps are now achievable through AI-based algorithms, and systems built on them have been commercialized as government-certified medical devices. This growing utilization of AI in the medical field has the potential to significantly change the landscape of healthcare. In fact, many young doctors are now adept at creating their own AI systems. In this talk, we will present the current state of medical AI and discuss the future direction of AI in the medical field, drawing upon our experiences in developing AI systems for medical imaging. Additionally, we will address the challenges and regulatory considerations surrounding the development of medical AI.