I am a Computer Science Ph.D. candidate at Stanford University, advised by Professor Christos Kozyrakis. I have broad interests in computer systems and architecture. What appeals to me most is how hardware, software, and data interact with each other. My recent research focuses on efficient management of emerging workloads in heterogeneous datacenters. I am a member of the MAST research group and the Platform Lab at Stanford.
I earned my M.S. in Computer Science at Stanford University in 2019. Prior to joining Stanford, I received my Bachelor of Science from the School of EECS at Peking University in 2017. I graduated summa cum laude in Computer Science and Technology. I was a member of the Center for Energy-efficient Computing and Applications (CECA).
I am currently working on DBOS, where we propose a radically new cluster-OS design based on a data-centric architecture: all operating system state should be represented uniformly as database tables, and operations on this state should be made via queries from otherwise stateless tasks. This design makes it easy to scale and evolve the OS without whole-system refactoring.
I have built INFaaS, an INFerence-as-a-Service platform that makes ML inference accessible and easy to use by abstracting resource management and model selection. By leveraging heterogeneous compute resources and efficient sharing, INFaaS achieves high throughput and few SLO violations while minimizing cost.
[HotOS'19] [arXiv] [USENIX ATC'21 (to appear)]
Thanks to the First-Year Rotation Program, I had the honor of working with two amazing groups during 2017-2018.
Winter Rotation (Advisor: Prof. John Ousterhout)
Arachne: Towards Core-Aware Scheduling. We are building a low-latency user-level thread library.
Memcached-A: We used Arachne to restructure Memcached, reducing performance interference and providing finer-grained load balancing; the restructured version achieved lower tail latency and higher SLO-compliant throughput.
[code] [benchmark code]
Autumn Rotation (Advisor: Prof. Christos Kozyrakis)
Thousand Island Scanner: Scaling Video Analysis on AWS Lambda. Presented a scalable video analysis framework that uses AWS Lambda to efficiently meet computational needs while minimizing unused resources by quickly scaling up and down.
Hosts: Ramesh Illikkal, Bin Li; Collaborators: Pietro Mercati, Charlie Tai, and Michael Kishinevsky.
Designed and implemented an SLA-aware framework for microservices that leverages multi-objective Bayesian Optimization to allocate resources and meet performance/cost goals.
Advisor: Prof. Yun Liang
Explored sparsity in deep neural networks, including network design and acceleration on embedded platforms.
*My senior thesis was based on this work.
Advisor: Prof. Todd C. Mowry; Collaborators: Vivek Seshadri, Joy Arulraj, and Andy Pavlo
Proposed novel partition algorithms, query execution strategies, and a new SIMD approach to using the mechanism.
Advisor: Prof. Xuanzhe Liu
Investigated cross-country differences in emoji usage by mining a large-scale production dataset.
INFaaS: Automated Model-less Inference Serving
Francisco Romero*, Qian Li*, Neeraja J. Yadwadkar, Christos Kozyrakis
To appear in the 2021 USENIX Annual Technical Conference (ATC '21). (* indicates equal contribution)
From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers
Sadjad Fouladi, Francisco Romero, Dan Iter, Qian Li, Shuvo Chatterjee, Christos Kozyrakis, Matei Zaharia, and Keith Winstein
In Proceedings of the 2019 USENIX Annual Technical Conference (ATC ’19), Renton, WA, USA, July 2019.
Arachne: Core-Aware Thread Management
Henry Qin, Qian Li, Jacqueline Speiser, Peter Kraft, John Ousterhout
In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’18), Carlsbad, CA, USA, October 2018.
Enabling High Performance Deep Learning Networks on Embedded Systems
Qian Li, Qingcheng Xiao, Yun Liang
The 43rd Annual Conference of the IEEE Industrial Electronics Society (IECON ’17), Beijing, China, November 2017. (invited paper)
Learning from the Ubiquitous Language: An Empirical Analysis of Emoji Usage of Smartphone Users
Xuan Lu, Wei Ai, Xuanzhe Liu, Qian Li, Ning Wang, Gang Huang, Qiaozhu Mei
In Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing (UbiComp '16), Heidelberg, Germany, September 2016.
Interference-Aware Scheduling for Inference Serving
Daniel Mendoza, Francisco Romero, Qian Li, Neeraja J. Yadwadkar, Christos Kozyrakis
In Proceedings of the 1st Workshop on Machine Learning and Systems (EuroMLSys '21), held virtually (Edinburgh, Scotland, UK), April 2021.
A Case for Managed and Model-less Inference Serving
Neeraja J. Yadwadkar, Francisco Romero, Qian Li, Christos Kozyrakis
In Proceedings of the 17th Workshop on Hot Topics in Operating Systems (HotOS '19), Bertinoro, Italy, May 2019.
[the morning paper blog]
RAMBO: Resource Allocation for Microservices Using Bayesian Optimization
Qian Li, Bin Li, Pietro Mercati, Ramesh Illikkal, Charlie Tai, Michael Kishinevsky, Christos Kozyrakis
IEEE Computer Architecture Letters (CAL), Volume 20, Issue 1, Jan.-June 2021.
Outsourcing Everyday Jobs to Thousands of Cloud Functions with gg
Sadjad Fouladi, Francisco Romero, Dan Iter, Qian Li, Alex Ozdemir, Shuvo Chatterjee, Matei Zaharia, Christos Kozyrakis, Keith Winstein
USENIX ;login:, Volume 44, Article No. 3, Fall 2019.
Exploiting Sparsity to Accelerate Fully Connected Layers of CNN-based Applications on Mobile SoCs
Xinfeng Xie, Dayou Du, Qian Li, Yun Liang, Wai Teng Tang, Zhong Liang Ong, Mian Lu, Huynh Phung Huynh, Siow Mong Rick
ACM Transactions on Embedded Computing Systems (TECS), Volume 17, Issue 2, Article No. 37, December 2017.
DBOS: Data-Centric Operating System
INFaaS: A Model-less Inference Serving System