International Challenge on Compositional and Multimodal Perception
August 23, 2020, held in conjunction with ECCV 2020
Overview
People understand the world as the sum of its parts. When presented with new concepts, they break them down into familiar components; our knowledge representation is thus naturally compositional. However, many of the architectures underlying visual tasks produce non-compositional representations. Moreover, we do not understand the world with our eyes alone; we use our other senses as well, and multimodal sensing provides information that no single modality can capture. We propose the 1st annual installment of the "Compositionality and Multimodal Perception" (CAMP) Challenge.
Program Schedule
Time (UTC+1)  | Event        | Title/Presenter                                                                                         | Links
pre-recorded  | Invited Talk | Learning Visual Representations from Textual Annotations (Justin Johnson, University of Michigan, FAIR) | [video]
20:00 - 21:00 | Invited Talk | Audio-visual Learning in Video and 3D Environments (Kristen Grauman, University of Texas at Austin, FAIR) | [video]
Invited Speakers
Dima Damen
is a Reader (Associate Professor) in Computer Vision at the University of Bristol, United Kingdom. She received her
PhD from the University of Leeds, UK (2009). Dima is currently an EPSRC Fellow (2020-2025), pursuing her research
interests in the automatic understanding of object interactions, actions, and activities using static and wearable
visual (and depth) sensors. Dima co-chaired BMVC 2013, served as an area chair for BMVC (2014-2018), and is an
associate editor of IEEE TPAMI (2019-) and of Pattern Recognition (2017-). She was selected as a Nokia Research
collaborator in 2016 and recognized as an Outstanding Reviewer at ICCV17, CVPR13, and CVPR12. She currently
supervises 6 PhD students and 4 postdoctoral researchers.
Justin Johnson
is an Assistant Professor at the University of Michigan and a Visiting Scientist at Facebook AI Research. He is
broadly interested in computer vision and machine learning. His research involves visual reasoning, vision and
language, image generation, and 3D reasoning using deep neural networks. He received his PhD from Stanford
University, advised by Fei-Fei Li.
Kristen Grauman
is a Professor in the Department of Computer Science at the University of Texas at Austin and a Research
Scientist in Facebook AI Research (FAIR). Her research in computer vision and machine learning focuses on visual
recognition and search. Before joining UT-Austin in 2007, she received her Ph.D. at MIT. She is an AAAI Fellow, a
Sloan Fellow, a Microsoft Research New Faculty Fellow, and a recipient of the NSF CAREER and ONR Young Investigator
awards, the PAMI Young Researcher Award in 2013, the 2013 Computers and Thought Award from the International
Joint Conference on Artificial Intelligence (IJCAI), the Presidential Early Career Award for Scientists and
Engineers (PECASE) in 2013, and the Helmholtz Prize (computer vision test of time award) in 2017. She was
inducted into the UT Academy of Distinguished Teachers in 2017. She and her collaborators were recognized with
the CVPR Best Student Paper Award in 2008 for their work on hashing algorithms for large-scale image retrieval,
the Marr Prize at ICCV in 2011 for their work on modeling relative visual attributes, the ACCV Best Application
Paper Award in 2016 for their work on automatic cinematography for 360 degree video, a Best Paper Honorable
Mention at CHI in 2017 for work on crowds and visual question answering, and a Best Paper Finalist at CVPR 2019
for their work on 2.5D visual sound. She currently serves as an Associate Editor-in-Chief for the Transactions
on Pattern Analysis and Machine Intelligence (PAMI) and as an Editorial Board member for the International
Journal of Computer Vision (IJCV). She previously served as a Program Chair of the IEEE Conference on Computer
Vision and Pattern Recognition (CVPR) 2015 and a Program Chair of Neural Information Processing Systems
(NeurIPS) 2018.
Ivan Laptev
is a senior researcher at INRIA Paris and head of the scientific board at VisionLabs. He received a PhD degree in
Computer Science from the Royal Institute of Technology in 2004 and a Habilitation degree from École Normale
Supérieure in 2013. Ivan's main research interests include visual recognition of human actions, objects and
interactions, and more recently robotics. He has published over 70 papers at international conferences and
journals of computer vision and machine learning. He serves as an associate editor of the IJCV and TPAMI journals,
served as a program chair for CVPR'18, and is a regular area chair for CVPR, ICCV, and ECCV. He has
co-organized several tutorials, workshops and challenges at major computer vision conferences. He has also
co-organized a series of INRIA summer schools on computer vision and machine learning (2010-2013) and Machines
Can See summits (2017-2019). He received an ERC Starting Grant in 2012 and was awarded the Helmholtz Prize in 2017.
Chuang Gan
is a principal research staff member at MIT-IBM Watson AI Lab. He is also an affiliated researcher at MIT,
working closely with Prof. Antonio Torralba and Prof. Josh Tenenbaum. Before that, he completed his Ph.D. with
highest honors at Tsinghua University, supervised by Prof. Andrew Chi-Chih Yao. His research focuses on video
understanding, including representation learning, neural-symbolic visual reasoning, audio-visual scene analysis,
and model-based embodied intelligence. His research has been recognized with a Microsoft Fellowship and a Baidu
Fellowship, and has received media coverage from CNN, BBC, The New York Times, WIRED, Forbes, and MIT Tech Review.
Organizers
Kazuki Kozuka
Panasonic
Ranjay Krishna
Stanford University
Jingwei Ji
Stanford University
Alec Hodgkinson
Panasonic
Yusuke Urakami
Panasonic
Olga Russakovsky
Princeton University
Juan Carlos Niebles
Stanford University
Fei-Fei Li
Stanford University
Please contact Kazuki Kozuka with any questions: kozuka.kazuki [at] jp [dot] panasonic [dot] com