Yanbang Wang

I am a second-year Ph.D. student in Computer Science at Cornell University, where I am very fortunate to be advised by Prof. Jon Kleinberg.

Before joining Cornell, I received my M.S. in Computer Science from Stanford University, where I worked with Prof. Jure Leskovec and Prof. Pan Li in the Stanford Network Analysis Project (SNAP) group. I received my B.S. in Computer Science summa cum laude (top 1%) from the Hong Kong University of Science and Technology, with an additional major in Mathematics. I am also honored to have worked with other great scholars, including Prof. Jiawei Han (UIUC), Prof. Una-May O'Reilly (MIT CSAIL), and Prof. Huamin Qu (HKUST), on multiple research projects.

Email  /  Google Scholar  /  GitHub  /  LinkedIn

Research Area

My research focuses on data mining and machine learning algorithms for understanding graph-structured data. I work with richly labeled relational structures (graphs and networks) that model real-world complex interconnected systems, in both static and dynamic settings. My work typically involves interdisciplinary topics such as computational social science, contextualized natural language understanding, and e-commerce and recommender systems. I also work on theoretical foundations that characterize and enhance the structural expressiveness of graph neural algorithms. I am broadly interested in data science and general AI research.

News
  • [01/2021] Two first-authored papers on dynamic network modeling accepted to ICLR'21 and WebConf'21, respectively. Check them out below!
  • [09/2020] Our Distance Encoding for GNN Design paper is accepted to NeurIPS'20.
  • [07/2020] Invited to give a talk on collective text classification using semantic links at UIUC Data Mining Group. slides
  • [07/2020] Excited to start my summer research at UIUC Data Mining Group with Prof. Jiawei Han.
  • [07/2020] Ever thought of detecting lies and nervousness with graph algorithms? Check out our ongoing work on graph neural modeling of dynamic social interactions, presented as a spotlight at a KDD'20 Workshop
  • [06/2020] Invited to give a talk at the Social Science History Association Annual Meeting in Washington, D.C., US, on our novel graph algorithm for large-scale modeling of career movement across 300 years of China's Qing Dynasty.
  • [10/2019] Our AI-powered video analysis system EmotionCues receives media coverage from IEEE Spectrum and NHK (Japan's national broadcaster).
  • [08/2019] Invited to give a spotlight talk at Central China Normal University (Wuhan, China) on graphical modeling of Chinese history data. Updated 01/2020: my dearest hope and support to all my friends there!
  • [03/2019] Two papers completed during my internship at MIT CSAIL are presented as orals at the International Conference on Learning Analytics and Knowledge (LAK'19) in Phoenix, US.

Academic Service
PC member/regular reviewer: NeurIPS, ICLR, ICML, KDD, WebConf

Selected Publications

Inductive Representation Learning in Temporal Networks via Causal Anonymous Walks
Yanbang Wang, Yen-Yu Chang, Yunyu Liu, Jure Leskovec, Pan Li
International Conference on Learning Representations (ICLR), 2021
[preprint] [project page] [code]   

We propose Causal Anonymous Walks (CAWs) for inductive representation learning in temporal networks. CAWs are extracted by temporal random walks and serve as an automatic retrieval of temporal network motifs to represent network dynamics, while avoiding the time-consuming selection and counting of those motifs. CAWs adopt a novel anonymization strategy that replaces node identities with the hitting counts of the nodes based on a set of sampled walks, which keeps the method inductive and simultaneously establishes correlations between motifs. CAWs achieve current state-of-the-art results on both transductive and inductive link prediction tasks in temporal networks.

TEDIC: Neural Modeling of Behavioral Patterns in Dynamic Social Interaction Networks
Yanbang Wang, Pan Li, Chongyang Bai, Jure Leskovec
The World Wide Web Conference (WWW), 2021
[preprint]   [project page]   [KDD'20 Workshop spotlight video]  

We propose a novel framework, termed Temporal Network-diffusion Convolutional Networks (TEDIC), for generic representation learning on dynamic social interaction networks. TEDIC adopts diffusion of node attributes over a combination of the original network and its complement to capture long-hop interactive patterns embedded in the behaviors of people making or avoiding contact. It also leverages temporal convolution networks with a hierarchical set-pooling operation to flexibly extract patterns from different-length interactions scattered over a long time span. TEDIC is evaluated on four social character prediction tasks: detecting deception, dominance, nervousness, and community membership. It consistently outperforms previous SOTAs and provides interesting social insights.

Distance Encoding: Design Provably More Powerful Neural Networks for Graph Representation Learning
Pan Li, Yanbang Wang, Jure Leskovec
Neural Information Processing Systems (NeurIPS), 2020
[paper]   [project page]   [code]  

We propose and mathematically analyze a general class of structure-related features, termed Distance Encoding (DE). DE assists GNNs in representing any set of nodes, while providing strictly more expressive power than the 1-Weisfeiler-Lehman test. We also prove that DE can distinguish node sets embedded in almost all regular graphs where traditional GNNs always fail. DE is the current SOTA on link prediction and triplet prediction tasks over static graphs.

Using Detailed Access Trajectories for Learning Behavior Analysis
Yanbang Wang, Nancy Law, Erik Hemberg, Una-May O'Reilly
International Conference on Learning Analytics and Knowledge (LAK), 2019
[paper]   [project page]  

We present a new organization of MOOC learner activity data at a resolution in between the fine granularity of the clickstream and coarse organizations that count activities, aggregate students, or use long-duration time units. A detailed access trajectory (DAT) consists of binary values and is two-dimensional: one axis is a time series, and the other is a chronologically ordered list of a MOOC component type's instances (for example, videos in instructional order). Four empirical mini-studies suggest that DATs contain rich information about students' learning behaviors and facilitate MOOC learning analyses.

Transfer Learning using Representation Learning in Massive Open Online Courses
Mucong Ding, Yanbang Wang, Erik Hemberg, Una-May O'Reilly
International Conference on Learning Analytics and Knowledge (LAK), 2019
[paper]   [project page]  

We present an automated transductive transfer learning approach that addresses students' dropout behavior in Massive Open Online Courses. It consists of two alternative transfer methods based on representation learning with auto-encoders: a passive approach using transductive PCA and an active approach that uses a correlation alignment loss term. Experiments show improved model transferability and suggest that the methods are capable of automatically learning feature representations that express common predictive characteristics of MOOCs.

Revisit Graph Neural Networks and Distance Encoding in a Practical View
Haoteng Yin, Yanbang Wang, Pan Li
AAAI Deep Learning on Graph Workshop, 2021
[paper]   [code]  

A recently proposed technique, distance encoding (DE) (Li et al. 2020), markedly enhances GNNs' power in many applications. The theory in (Li et al. 2020) supports DE by proving that it improves the representation power of GNNs; however, it is not obvious how that theory carries over to applications. Here, we revisit GNNs and DE from a more practical point of view to explain how DE makes GNNs fit for node classification and link prediction. We focus on node classification scenarios, categorize node labels into two types, community type and structure type, and then analyze the different mechanisms that GNNs adopt to predict each type. We also run extensive experiments comparing eight different configurations of GNNs paired with DE to predict node labels over eight real-world graphs. The results demonstrate the uniform effectiveness of DE in predicting structure-type labels. Finally, we draw three conclusions on how to use GNNs and DE properly in node classification tasks.

EmotionCues: Emotion-Oriented Visual Summarization of Classroom Videos
Haipeng Zeng, Xinhuan Shu, Yanbang Wang, Yong Wang, Ting-Chuen Pong, Huamin Qu
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2019, and jointly in
IEEE Visualization Conference (VIS), 2020
[paper]   [project page]  

We present EmotionCues, a video analytics system that integrates multiple computer vision algorithms with visualizations to analyze classroom videos via emotion summarization and detailed position tracking. It consists of three views: a summary view depicting the overall emotions and their dynamic evolution, a character view presenting the detailed emotion status of an individual, and a video view enhancing the video analysis with further details. Considering the possible inaccuracy of emotion recognition, we also explore several factors affecting the emotion analysis, such as face size and occlusion. These factors provide hints for inferring possible inaccuracies and their causes.

Honors and Awards
  • HKUST Academic Achievement Medal (2019)
  • All-semester Dean’s List (2015-2019)
  • HKMA IT Management Club Scholarship (2019)
  • University Scholarship for Undergraduates, top tier, twice (2017, 2018)
  • Hong Kong SAR Government Scholarship, Reaching Out Award (2018)
  • Overseas Learning Experience Scholarship (2018)

This guy makes a nice webpage.