International Challenge on Compositional and Multimodal Perception

August 23, held in conjunction with ECCV 2020


Overview

People understand the world as a sum of its parts: when presented with new concepts, we break them down into familiar components, so our knowledge representation is naturally compositional. However, many of the underlying architectures used in visual tasks generate non-compositional representations. Moreover, we do not understand the world with our eyes alone; we use our other senses as well, and multimodal sensing provides information that may not be available from a single modality. We therefore present the 1st annual installment of the "Compositionality and Multimodal Perception" (CAMP) Challenge.


Links to Datasets

Home Action Genome

Action Genome


Program Schedule

Time (UTC+1) | Event | Title / Presenter | Links
pre-recorded | Opening Remarks | Kazuki Kozuka, Panasonic | [video]
pre-recorded | Intro of Dataset | Action Genome (Jingwei Ji, Stanford University) | [video]
pre-recorded | Intro of Dataset | Home Action Genome (Shun Ishizaka, Panasonic) | [video]
pre-recorded | Invited Talk | Learning Visual Representations from Textual Annotations (Justin Johnson, University of Michigan, FAIR) | [video]
pre-recorded | Invited Talk | Learning to Understand and Generate Actions (Ivan Laptev, INRIA Paris, VisionLabs) | [video]
pre-recorded | Invited Talk | Multimodal Intelligence (Chuang Gan, MIT) | [video]
20:00 - 21:00 | Invited Talk | Audio-visual Learning in Video and 3D Environments (Kristen Grauman, University of Texas at Austin, FAIR) | [video]
21:00 - 22:00 | Invited Talk | Compositional and Multimodal Perception of Object Interactions (Dima Damen, University of Bristol) | [video]
22:00 - 22:15 | Closing Remarks | Juan Carlos Niebles, Stanford University | [video]

Invited Speakers

Dima Damen is a Reader (Associate Professor) in Computer Vision at the University of Bristol, United Kingdom. She received her PhD from the University of Leeds, UK (2009). Dima is currently an EPSRC Fellow (2020-2025), focusing on her research interests in the automatic understanding of object interactions, actions and activities using static and wearable visual (and depth) sensors. Dima co-chaired BMVC 2013, served as an area chair for BMVC (2014-2018), and is an associate editor of IEEE TPAMI (2019-) and of Pattern Recognition (2017-). She was selected as a Nokia Research collaborator in 2016, and as an Outstanding Reviewer at ICCV17, CVPR13 and CVPR12. She currently supervises 6 PhD students and 4 postdoctoral researchers.

Justin Johnson is an Assistant Professor at the University of Michigan and a Visiting Scientist at Facebook AI Research. He is broadly interested in computer vision and machine learning. His research involves visual reasoning, vision and language, image generation, and 3D reasoning using deep neural networks. He received his PhD from Stanford University, advised by Fei-Fei Li.

Kristen Grauman is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research Scientist at Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on visual recognition and search. Before joining UT-Austin in 2007, she received her Ph.D. from MIT. She is an AAAI Fellow, a Sloan Fellow, a Microsoft Research New Faculty Fellow, and a recipient of NSF CAREER and ONR Young Investigator awards, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International Joint Conference on Artificial Intelligence (IJCAI), the Presidential Early Career Award for Scientists and Engineers (PECASE) in 2013, and the Helmholtz Prize (computer vision test of time award) in 2017. She was inducted into the UT Academy of Distinguished Teachers in 2017. She and her collaborators were recognized with the CVPR Best Student Paper Award in 2008 for their work on hashing algorithms for large-scale image retrieval, the Marr Prize at ICCV in 2011 for their work on modeling relative visual attributes, the ACCV Best Application Paper Award in 2016 for their work on automatic cinematography for 360 degree video, a Best Paper Honorable Mention at CHI in 2017 for work on crowds and visual question answering, and a Best Paper Finalist award at CVPR 2019 for their work on 2.5D visual sound. She currently serves as an Associate Editor-in-Chief for the Transactions on Pattern Analysis and Machine Intelligence (PAMI) and as an Editorial Board member for the International Journal of Computer Vision (IJCV). She previously served as a Program Chair of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2015 and a Program Chair of Neural Information Processing Systems (NeurIPS) 2018.

Ivan Laptev is a senior researcher at INRIA Paris and the head of the scientific board at VisionLabs. He received a PhD degree in Computer Science from the Royal Institute of Technology in 2004 and a Habilitation degree from École Normale Supérieure in 2013. Ivan's main research interests include visual recognition of human actions, objects and interactions, and, more recently, robotics. He has published over 70 papers in international conferences and journals on computer vision and machine learning. He serves as an associate editor of the IJCV and TPAMI journals, served as a program chair for CVPR'18, and is a regular area chair for CVPR, ICCV and ECCV. He has co-organized several tutorials, workshops and challenges at major computer vision conferences, as well as a series of INRIA summer schools on computer vision and machine learning (2010-2013) and the Machines Can See summits (2017-2019). He received an ERC Starting Grant in 2012 and was awarded the Helmholtz Prize in 2017.

Chuang Gan is a principal research staff member at the MIT-IBM Watson AI Lab and an affiliated researcher at MIT, where he works closely with Prof. Antonio Torralba and Prof. Josh Tenenbaum. Before that, he completed his Ph.D. with the highest honor at Tsinghua University, supervised by Prof. Andrew Chi-Chih Yao. His research focuses on video understanding, including representation learning, neural-symbolic visual reasoning, audio-visual scene analysis, and model-based embodied intelligence. His work has been recognized by a Microsoft Fellowship, a Baidu Fellowship, and media coverage from CNN, BBC, The New York Times, WIRED, Forbes, and MIT Tech Review.


Organizers

Kazuki Kozuka
Panasonic
Ranjay Krishna
Stanford University
Jingwei Ji
Stanford University
Olga Russakovsky
Princeton University
Juan Carlos Niebles
Stanford University
Fei-Fei Li
Stanford University

Please contact Kazuki Kozuka with any questions: kozuka.kazuki [at] jp [dot] panasonic [dot] com