Yanbang Wang

I am a second-year Ph.D. student in Computer Science at Cornell University, where I am very fortunate to be advised by Prof. Jon Kleinberg.

Before joining Cornell, I received my M.S. in Computer Science from Stanford University, where I worked with Prof. Jure Leskovec and Prof. Pan Li in the Stanford Network Analysis Project (SNAP) group. I received my B.S. in Computer Science summa cum laude (top 1%) from the Hong Kong University of Science and Technology, with an additional major in Mathematics. I am also honored to have worked with other great scholars, including Prof. Jiawei Han (UIUC), Prof. Una-May O'Reilly (MIT CSAIL), and Prof. Huamin Qu (HKUST), on multiple research projects.

Email  /  Google Scholar  /  GitHub  /  LinkedIn

Research Area

My research focuses on data mining and machine learning algorithms for understanding graph-structured data. I work with richly labeled relational structures (graphs and networks) that model real-world complex interconnected systems, in both static and dynamic settings. My work typically involves interdisciplinary topics such as computational social science, contextualized natural language understanding, and e-commerce and recommender systems. I also work on theoretical foundations that characterize and enhance the structural expressiveness of graph neural algorithms. I am broadly interested in data science and general AI research.

News
  • [01/2021] Two first-authored papers on dynamic network modeling accepted to ICLR'21 and WebConf'21, respectively. Check them out below!
  • [09/2020] Our Distance Encoding for GNN Design paper is accepted to NeurIPS'20.
  • [07/2020] Invited to give a talk on collective text classification using semantic links at UIUC Data Mining Group. slides
  • [07/2020] Excited to start my summer research at UIUC Data Mining Group with Prof. Jiawei Han.
  • [07/2020] Ever thought of detecting lies and nervousness with graph algorithms? Check out our ongoing work on graph neural modeling of dynamic social interactions, presented as a spotlight at a KDD'20 Workshop
  • [06/2020] Invited to give a talk at the Social Science History Association Annual Meeting in Washington, D.C., US, on our novel graph algorithm for large-scale modeling of career movement across 300 years of China's Qing Dynasty.
  • [10/2019] Our AI-powered video analysis system EmotionCues receives media coverage from IEEE Spectrum and NHK (Japan's national broadcaster).
  • [08/2019] Invited to give a spotlight talk at Central China Normal University (Wuhan, China) on graphical modeling of Chinese history data. Updated 01/2020: my dearest hope and support to all my friends there!
  • [03/2019] Two papers completed during my internship at MIT CSAIL are presented as orals at the International Conference on Learning Analytics and Knowledge (LAK'19) in Phoenix, US.

Academic Service
PC member/regular reviewer: NeurIPS, ICLR, ICML, KDD, WebConf

Selected Publications

Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks
Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, Pan Li
International Conference on Learning Representations (ICLR), 2021
[preprint] [project page] [code]   

We propose Causal Anonymous Walks (CAWs) for inductive representation learning in temporal networks. CAWs are extracted by temporal random walks and serve as an automatic retrieval of temporal network motifs to represent network dynamics, while avoiding the time-consuming selection and counting of those motifs. CAWs adopt a novel anonymization strategy that replaces node identities with the hitting counts of the nodes based on a set of sampled walks, which keeps the method inductive and simultaneously establishes correlations between motifs. CAWs achieve current state-of-the-art results on both transductive and inductive link prediction tasks in temporal networks.

TEDIC: Neural Modeling of Behavioral Patterns in Dynamic Social Interaction Networks
Yanbang Wang, Pan Li, Chongyang Bai, Jure Leskovec
The World Wide Web Conference (WWW), 2021
[preprint]   [project page]   [KDD'20 Workshop spotlight video]  

We propose a novel framework, termed Temporal Network-diffusion Convolutional Networks (TEDIC), for generic representation learning on dynamic social interaction networks. TEDIC adopts diffusion of node attributes over a combination of the original network and its complement to capture long-hop interactive patterns embedded in the behaviors of people making or avoiding contact. It also leverages temporal convolution networks with a hierarchical set-pooling operation to flexibly extract patterns from different-length interactions scattered over a long time span. TEDIC is evaluated on four social character prediction tasks: detecting deception, dominance, nervousness, and community membership. It consistently outperforms previous SOTAs and provides interesting social insights.

Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning
Pan Li, Yanbang Wang, Jure Leskovec
Neural Information Processing Systems (NeurIPS), 2020
[paper]   [project page]   [code]  

We propose and mathematically analyze a general class of structure-related features, termed Distance Encoding (DE). DE assists GNNs in representing any set of nodes, while providing strictly more expressive power than the 1-Weisfeiler-Lehman test. We also prove that DE can distinguish node sets embedded in almost all regular graphs where traditional GNNs always fail. DE is the current SOTA on link prediction and triplet prediction tasks over static graphs.

Using Detailed Access Trajectories for Learning Behavior Analysis
Yanbang Wang, Nancy Law, Erik Hemberg, Una-May O'Reilly
International Conference on Learning Analytics and Knowledge (LAK), 2019
[paper]   [project page]  

We present a new organization of MOOC learner activity data at a resolution in between the fine granularity of the clickstream and coarse organizations that count activities, aggregate students, or use long-duration time units. A detailed access trajectory (DAT) consists of binary values and is two-dimensional: one axis is a time series, and the other is a chronologically ordered list of a MOOC component type's instances (for example, videos in instructional order). Four empirical mini-studies suggest that DATs contain rich information about students' learning behaviors and facilitate MOOC learning analyses.

Transfer Learning using Representation Learning in Massive Open Online Courses
Mucong Ding, Yanbang Wang, Erik Hemberg, Una-May O'Reilly
International Conference on Learning Analytics and Knowledge (LAK), 2019
[paper]   [project page]  

We present an automated transductive transfer learning approach that addresses students' dropout behavior in Massive Open Online Courses. It consists of two alternative transfer methods based on representation learning with auto-encoders: a passive approach using transductive PCA and an active approach that uses a correlation alignment loss term. Experiments show improved model transferability and suggest that the methods are capable of automatically learning feature representations that express common predictive characteristics of MOOCs.

Revisit Graph Neural Networks and Distance Encoding in a Practical View
Haoteng Yin, Yanbang Wang, Pan Li
AAAI Deep Learning on Graph Workshop, 2021
[paper]   [code]  

A recently proposed technique, distance encoding (DE) (Li et al. 2020), markedly enhances GNNs' power in many applications. The theory in (Li et al. 2020) supports DE by proving that it improves the representation power of GNNs; however, it is not obvious how that theory carries over to applications. Here, we revisit GNNs and DE from a more practical point of view to explain how DE makes GNNs fit for node classification and link prediction. We focus on node classification scenarios, categorize node labels into two types, community type and structure type, and then analyze the different mechanisms that GNNs adopt to predict each type. We also run extensive experiments comparing eight different configurations of GNNs paired with DE to predict node labels over eight real-world graphs. The results demonstrate the uniform effectiveness of DE in predicting structure-type labels. Finally, we draw three conclusions on how to use GNNs and DE properly in node classification tasks.

EmotionCues: Emotion-Oriented Visual Summarization of Classroom Videos
Haipeng Zeng, Xinhuan Shu, Yanbang Wang, Yong Wang, Ting-Chuen Pong, Huamin Qu
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2019, and jointly in
IEEE Visualization Conference (VIS), 2020
[paper]   [project page]  

We present EmotionCues, a video analytics system that integrates multiple computer vision algorithms with visualizations to analyze classroom videos via emotion summarization and detailed position tracking. It consists of three views: a summary view depicting the overall emotions and their dynamic evolution, a character view presenting the detailed emotion status of an individual, and a video view enhancing the video analysis with further details. Considering the possible inaccuracy of emotion recognition, we also explore several factors affecting the emotion analysis, such as face size and occlusion. These factors provide hints for inferring possible inaccuracies and their causes.

Honors and Awards
  • HKUST Academic Achievement Medal (2019)
  • All-semester Dean’s List (2015-2019)
  • HKMA IT Management Club Scholarship (2019)
  • University Scholarship for Undergraduates, top tier, twice (2017, 2018)
  • Hong Kong SAR Government Scholarship, Reaching Out Award (2018)
  • Overseas Learning Experience Scholarship (2018)

This guy makes a nice webpage.