Keith Winstein’s homepage

Keith Winstein
Associate Professor of Computer Science
Associate Professor of Electrical Engineering (by courtesy)

Stanford University
Gates Computer Science, room 410
353 Jane Stanford Way
Stanford, CA 94305-9025

N 37.42994°, W 122.17342°
third-person pronouns: he, him, etc.

Bio: Keith Winstein is an associate professor of computer science and, by courtesy, of electrical engineering at Stanford University. His research group creates new kinds of networked systems by rethinking abstractions around communication, compression, and computing. Some of his group's research has found broader use, including the Mosh tool, the Puffer video-streaming site, the Lepton compression tool, the Mahimahi network emulators, and the gg lambda-computing framework. He has received the SIGCOMM Rising Star Award, the Sloan Research Fellowship, the NSF CAREER Award, the Usenix NSDI Community Award (2024, 2020, 2017), the Usenix NSDI Outstanding Paper Award, the Usenix ATC Best Paper Award, the Applied Networking Research Prize, the SIGCOMM Doctoral Dissertation Award, and a Sprowls award for best doctoral thesis in computer science at MIT. Winstein previously served as a staff reporter at The Wall Street Journal and worked at Ksplice, a startup company (now part of Oracle) where he was the vice president of product management and business development and also cleaned the bathroom. He did his undergraduate and graduate work at MIT.

CV: Here is my CV.

Students

Current

Doctoral
- Yuhan Deng
- Akshay Srivatsan (with Dawson Engler)
- Colin Drewes (with Caroline Trippel)
- Gina Yuan (with Matei Zaharia and David Mazières)
Master's and undergraduate

Former

Doctoral
- Francis Yan (2020, with Philip Levis), Microsoft Research & Azure for Operators
- Colleen Josephson (2020, with Sachin Katti), assistant professor, UC Santa Cruz
- Luke Hsiao (2021, with Philip Levis), Numbers Station
- Riad S. Wahby (2021, with Dan Boneh), assistant professor, Carnegie Mellon University
- Sadjad Fouladi (2021), Microsoft Research
- John Emmons (with Silvio Savarese), Tesla
- Greg Hill, Bot Lab
Master's, undergraduate, and HS
- Angela Montemayor (now Ph.D. student, U. Washington)
- Pratyush Muthukumar
- Sneha Shah
- Neha Kunjal
- Emily Marx (now Berkeley Ph.D. student)
- Dan Iter (now ~~Stanford Ph.D. student~~ Microsoft Research)
- Jestin Ma (now IXL Learning)
- Henri Stern (now Privy)
- Emre Orbay (now Nvidia)
- Brennan Shacklett (now Stanford Ph.D. student)
- William Zeng (now Google)
- Katie Wu (now Scale AI)
- Charles Sheelam
- Kevin Song

Teaching

In the Winter 2023 term, I taught a section of Stanford's new first-year class on "Citizenship in the 21st Century." I have also taught CS 144: Introduction to Computer Networking, CS 181/181W: Computers, Ethics, and Public Policy, a first-year seminar (CS 81N: Hackers and Heroes), a graduate networking seminar (CS 344G: Network Application Studio), CS 244: Advanced Topics in Networking, and CS 349T / EE 192T: Video and Audio Technology for Live Theater in the Age of COVID.

Writing

“What advice would you give to a young woman in high school considering computer science?”
Advice on the CS faculty interview, February 2017. Things other people told me that proved helpful.
Stock advice for undergraduates interested in doing a Ph.D. in computer science
The N=2 Interview about Ph.D.s in Computer Science, August 2015. Discussion with Phil Guo (UC San Diego) and Eugene Wu (Columbia).
Introducing the ‘right to eavesdrop on your things’, Politico The Agenda, June 29, 2015. Proposing that owners of electronic devices be able to listen in on what their own things are saying about them. (Later an academic paper.)
How does an investigative reporter get started?, Oct. 16, 2011.
What is the difference between Bayesian and frequentist statistics?, June 13, 2010.
...

Research

April 2024:

Gina Yuan, Matthew Sotoudeh, David K. Zhang, Michael Welzl, David Mazières, and KW, Sidekick: In-Network Assistance for Secure End-to-End Transport Protocols, NSDI 2024 (won Community Award and Outstanding Paper Award).
The challenge: old transport protocols (like TCP) benefit from performance-enhancing proxies (PEPs) that transparently split TCP connections in the network. But these proxies have prevented TCP from adding new features. To prevent this, new transport protocols (like QUIC) are fully encrypted—no proxy can transparently meddle with them. This guarantees evolvability at the cost of performance.
Gina's paper shows a way to get the best of both worlds:

It may be possible to create a “universal PEP” that works with arbitrary, pre-existing transport protocols (even fully encrypted ones). The PEP assists endpoints over an adjacent “sidekick” connection.
The core technical challenge: how can the PEP efficiently refer to ranges of packets of a fully encrypted connection? Some linear algebra makes this possible, compromising between size on the wire, computation for the proxy and endpoint, and ability to recover from loss.
Once endpoints are getting information from the PEP, there are some fun questions about how that should influence their behavior. We created a "path-aware" congestion-control scheme whose response to packet loss considers where loss occurred along the path. When assisted by a universal PEP, it's possible for an endpoint to emulate the performance of "split" TCP pretty closely.
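
As a toy illustration of how a proxy might refer to packets it cannot decrypt (a sketch under stated assumptions, not the construction from the paper): suppose each packet is reduced to a pseudo-random identifier derived from its opaque bytes. The proxy can then summarize everything it has seen as a count plus a sum modulo a prime, and an endpoint that remembers what it sent can decode a single missing packet from the difference. The names `packet_id`, `Summary`, and `find_single_missing`, and the modulus `P`, are all hypothetical.

```python
import hashlib

P = 2**61 - 1  # prime modulus (an assumption chosen for this sketch)

def packet_id(payload: bytes) -> int:
    """Derive an identifier from opaque (e.g. encrypted) packet bytes."""
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") % P

class Summary:
    """Constant-size digest of a set of packets: a count and a sum mod P."""
    def __init__(self):
        self.count = 0
        self.total = 0

    def add(self, payload: bytes) -> None:
        self.count += 1
        self.total = (self.total + packet_id(payload)) % P

def find_single_missing(sent, received: Summary):
    """Return the one sent packet absent from `received`, or None otherwise."""
    mine = Summary()
    for payload in sent:
        mine.add(payload)
    if mine.count - received.count != 1:
        return None  # zero or several packets missing: one sum is not enough
    missing_id = (mine.total - received.total) % P
    for payload in sent:
        if packet_id(payload) == missing_id:
            return payload
    return None
```

Decoding several missing packets at once would need more than one such equation, which is where the tradeoff among wire size, computation, and loss recovery comes from.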

November 2022:

Yuhan Deng, Angela Montemayor, Amit Levy, and KW, Computation-Centric Networking, HotNets 2022.
Yuhan is leading a project to create a new kind of OS and an environment for computation that we call "computation-centric networking." The core principles are a separation between I/O and compute (with delineated nondeterminism), a common notion of correctness, and fine-grained visibility into application dataflow. This suggests an end-to-end argument for serverless computing, shifting the service model from “renting CPUs by the second” to “providing the unambiguously correct result of a computation.” Accountability to these higher-level abstractions could permit agility and innovation on other axes.

November 2022:

Gina Yuan, David K. Zhang, Matthew Sotoudeh, Michael Welzl, and KW, Sidecar: In-Network Performance Enhancements in the Age of Paranoid Transport Protocols, HotNets 2022.
Is it possible to get the anti-ossification benefits of a fully encrypted transport protocol (like QUIC), and without changing anything about the protocol's wire format, also get the benefits of in-network acceleration? Probably!

August 2022:

Sadjad Fouladi, Brennan Shacklett, Fait Poms, Arjun Arora, Alex Ozdemir, Deepti Raghavan, Pat Hanrahan, Kayvon Fatahalian, and KW, R2E2: Low-latency Path Tracing of Terabyte-scale Scenes Using Thousands of Cloud CPUs, SIGGRAPH 2022.
Sadjad, Brennan, and co-authors created a gigantic ray-tracing engine using swarms of thousands of cloud CPUs fired up in a few seconds, in order to render truly gigantic film-scale scenes (with a terabyte of geometry and texture data) interactively on demand.

January 2022:

Luke Hsiao, Brooke Krajancich, Philip Levis, Gordon Wetzstein, and KW, Towards Retina-Quality VR Video Streaming: 15 ms Could Save You 80% of Your Bandwidth, Computer Communication Review 52, 1 (January 2022). (Stanford press release here.)
Luke and Brooke built a "gaze-contingent" foveated UHD video compression/streaming system with a total latency (from eye motion to responsive photons) of about 14 ms. This includes the latencies from eye-tracking (~1.5 ms), video encoding (~3 ms), video decoding (~2 ms), HDMI video scanout (~3.5 ms), and physical transition of the LCD (~4 ms), but doesn't include network latency. Luke then ran a user study (under strict COVID protocols) to evaluate the available benefit, in terms of compressed video bitrate, if the server only sends high-quality video where the user's gaze is pointing and low-quality video elsewhere.
The bottom line is that if you can get the latencies down to this level, you can achieve a roughly 5× reduction in bitrate over the current state-of-the-art before the foveated compression becomes noticeable. But the user-study results suggest there is an unfortunate "cliff": at the eye-motion-to-photon latency of current VR HMDs (~80 ms), or even a hypothetical improved HMD with 45 ms, the available bitrate savings were much more limited before users became annoyed by the sharp region noticeably snapping into place. Only at the lowest achievable latency (14 ms) did we observe a substantial savings. The shape of the curve between 14 ms and 45 ms, and what the highest tolerable latency is that still permits a substantial bitrate savings from foveated compression, will have to wait for future studies.

June 2021:

Colleen Josephson, Manikanta Kotaru, KW, Sachin Katti and Ranveer Chandra, Low-cost In-ground Soil Moisture Sensing with Radar Backscatter Tags, ACM SIGCAS Conference on Computing and Sustainable Societies (COMPASS ‘21).
Colleen created a new way of measuring soil moisture (important for agriculture, especially re: irrigation) efficiently, by putting the expensive part in a handheld reader that measures the time-of-flight to “dumb” backscatter tags embedded in the soil. The work involved a lot of building electronics, digging up different kinds of soils across the Bay Area and Central Valley to test with, and persuading Stanford's environmental scientists that her technique wasn't crazy. The predicted-vs.-actual link budget turned out really close—check out Figure 6!

March 2021:

Working with my student Sadjad Fouladi, and our collaborators Michael Rau (Theater and Performance Studies), Tsachy Weissman (EE), and Dustin Schroeder (Geophysics), we operationalized Sadjad's earlier “Salsify” research into "Stagecast," an effort to produce the best possible videoconferencing system for live theater and music during the pandemic.
In the fall of 2020, we co-taught a class for undergraduates and master's students: CS 349T / EE 192T: Video and Audio Technology for Live Theater in the Age of COVID, where our students built the system out into a usable tool. Then in the winter of 2021, we rehearsed and eventually held three live performances in March 2021, including four new plays performed by a cast of five student actors, plus three musical numbers performed by members of the Wet Ink Ensemble from New York City. The performances were enabled by a backstage crew of seven student technicians who used our software to adjust camera focus/framing/brightness, audio levels, and shot selection in real time, advised by the instructional staff from the TAPS department.
One innovation in the Stagecast system is a new media transport protocol that allows multiple reconstructions of the same incoming stream of packets. Each musician sends out their audio (ideally) once over a reliable transport protocol. The priority of original vs. retransmitted packets is sensitive to the low-latency needs of the stream. At the receiver side, the stream is reconstructed multiple times with different latency thresholds: once to produce a low-latency (but low-tolerance-for-jitter) feed, and again 100 ms later to produce a higher-latency, higher-quality version of the same incoming packets. The first feed goes to the other musicians; the second feed is used by the backstage mixers and eventually for the audience.
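
The "multiple reconstructions" idea can be sketched roughly as follows. This is a minimal illustration and not the Stagecast implementation: the `Packet` type, the 10 ms frame duration, and the specific thresholds are assumptions. The same list of received packets is played out twice, once with a tight deadline and once with a deadline 100 ms later, so a late retransmission leaves a gap in the first feed but is present in the second.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    seq: int           # position of this frame in the audio stream
    arrival_ms: float  # when it reached the receiver (original or retransmission)
    samples: str       # stand-in for the audio payload

FRAME_MS = 10.0  # assumed duration of audio carried by each packet

def reconstruct(packets, n_frames, latency_ms):
    """Build one playout feed: frame i plays at i*FRAME_MS + latency_ms and
    uses a packet only if it arrived by that deadline (otherwise a gap)."""
    earliest = {}
    for p in packets:
        if p.seq not in earliest or p.arrival_ms < earliest[p.seq].arrival_ms:
            earliest[p.seq] = p
    feed = []
    for i in range(n_frames):
        deadline = i * FRAME_MS + latency_ms
        p = earliest.get(i)
        on_time = p is not None and p.arrival_ms <= deadline
        feed.append(p.samples if on_time else None)
    return feed

# Frame 2 was lost and only arrived as a retransmission at t = 95 ms.
packets = [Packet(0, 5, "a"), Packet(1, 18, "b"), Packet(3, 36, "d"), Packet(2, 95, "c")]
low_latency = reconstruct(packets, 4, latency_ms=20)     # ['a', 'b', None, 'd']
high_quality = reconstruct(packets, 4, latency_ms=120)   # ['a', 'b', 'c', 'd']
```

In this sketch the low-latency feed would go to the other musicians and the later feed to the mixers, mirroring the split described above.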

November 2020:

Emily Marx, Francis Y. Yan, and KW, Implementing BOLA-BASIC on Puffer: Lessons for the use of SSIM in ABR logic.
As part of her master's work, Emily implemented the BOLA-BASIC ABR scheme on our Puffer site and evaluated it for 17.7 “stream-years” (about three real-world months) in a randomized trial with other ABR schemes. She found some curious behavior emerging from BOLA's objective function and worked with the BOLA authors to tune the algorithm better. Emily's results suggest that the tuning constants inside these ABR schemes may often drive their real-world performance, more than the smarts or sophistication of the algorithms themselves.

August 2020:

Tong Li, Kai Zheng, Ke Xu, Rahul Arvind Jadhav, Tao Xiong, KW, and Kun Tan, TACK: Improving Wireless Transport Performance by Taming Acknowledgments, SIGCOMM 2020.
Tong Li and I met and had dinner together when I visited Huawei in 2018, and we talked about Tong's efforts with his Huawei and Tsinghua colleagues to design a new acknowledgment scheme for TCP. Tong ended up developing TACK in part by iterating its design on our Pantheon of Congestion Control, a testbed that included nodes with cellular connectivity around the world and, until 2020, ran automated bake-offs every week and published the resulting packet traces. Tong deployed 16 different versions of the TACK scheme on the Pantheon as the system collected many gigabytes of packet traces throughout 2018 and 2019. The TACK scheme is now deployed for real on Huawei's smart TVs and Mate 20 smartphones.

July 2020:

Zhixiong Niu, Hong Xu, Peng Cheng, Qiang Su, Yongqiang Xiong, Tao Wang, Dongsu Han, and KW, NetKernel: Making Network Stack Part of the Virtualized Infrastructure, USENIX ATC, July 2020.

Earlier: Zhixiong Niu, Hong Xu, Dongsu Han, Peng Cheng, Yongqiang Xiong, Guo Chen, and KW, Network Stack as a Service in the Cloud, ACM HotNets, Palo Alto, Calif., November 2017.
What if VMs interacted with the outside world not via a virtual NIC, but through stream sockets, with the TCP implementation provided by the host?

February 2020:

Francis Y. Yan, Hudson Ayers, Chenzhi Zhu, Sadjad Fouladi, James Hong, Keyi Zhang, Philip Levis, and KW, Learning in situ: a randomized experiment in video streaming, Usenix NSDI 2020, Santa Clara, Calif., 2020 (won Community Award and Applied Networking Research Prize).
Francis and our co-authors set up a website that streamed 38+ years of video to 63,500+ users over the course of a year; we found that variability, and therefore statistical uncertainty, is bigger than had been documented (calling into question the reliability of some past research in this area), and that "dumb" ABR schemes may be able to outperform some more-sophisticated schemes. We describe a way of learning in place (in situ) that appears to alleviate, at least partly, some of the pitfalls that have afflicted other ML approaches. The website is still live, and we welcome contributions from researchers who would like to evaluate or develop new algorithms for ABR selection, throughput prediction, or congestion control. We also welcome contributions from viewers who would like to contribute to the experiments by watching TV.

October 2019:

John Emmons, Sadjad Fouladi, Ganesh Ananthanarayanan, Shivaram Venkataraman, Silvio Savarese, and KW, Cracking open the DNN black-box: Video Analytics with DNNs across the Camera-Cloud Boundary, Workshop on Hot Topics in Video Analytics and Intelligent Edges (HotEdgeVideo 2019), Los Cabos, Mexico.
With our collaborators at Microsoft, we explored outsourcing computer-vision computations given three constraints: limited compute power (or actual power!) on a device, limited communications capacity off the device, and limited tolerance for end-to-end loss (e.g., classification mistakes). To me, one of the biggest contributions our paper made may have been in the axes: the idea that (given equal end-to-end accuracy, or at least an equal tolerance for loss) there is a tradeoff between the need for local compute (on the x-axis, above) and communication (y-axis), and that schemes should be evaluated in this way and viewed as part of a tradeoff space.

July 2019:

Sadjad Fouladi, Francisco Romero, Dan Iter, Qian Li, Shuvo Chatterjee, Christos Kozyrakis, Matei Zaharia, and KW, From Laptop to Lambda: Outsourcing Everyday Jobs to Thousands of Transient Functional Containers, USENIX ATC, Renton, Wash., 2019.
Framework to "functionalize" everyday tasks (e.g., software compilation, unit tests, video encoding, object recognition, ...) and, with their data-dependencies well-described, offload them efficiently to quick bursts of 5,000+ lambda functions executing in parallel. Imagine expressing workflows in, or auto-converting existing workflows into, a sort of Parallel-Haskell-like representation. This paper tries to bust the myths that S3 is slow for small files (it depends on the client implementation), that lambdas are slow to start en masse (it depends on the client implementation), and that AWS Lambda forbids direct network communication between workers (NAT-traversal techniques can connect them). More practical info is in the followup paper in Usenix ;login: magazine.
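
A rough sketch of what "functionalizing" a task with well-described data-dependencies can look like is below. This is an illustration only and not gg's actual format or API: the in-memory `store`, the `thunk` and `force` helpers, and the toy `FUNCTIONS` table are all hypothetical. A task is described purely by the name of a function and the content hashes of its inputs, so an identical task always has the same name and its result can be reused.

```python
import hashlib
import json

store = {}  # content hash -> bytes (stand-in for object storage)
memo = {}   # thunk hash -> hash of its output

def put(data: bytes) -> str:
    """Store a blob under the hash of its contents and return that hash."""
    h = hashlib.sha256(data).hexdigest()
    store[h] = data
    return h

def thunk(func_name: str, input_hashes) -> str:
    """Describe a computation purely by its function and its input hashes."""
    desc = json.dumps({"f": func_name, "in": list(input_hashes)}, sort_keys=True)
    return put(desc.encode())

# Toy deterministic functions standing in for real tasks such as compilation.
FUNCTIONS = {
    "concat": lambda parts: b"".join(parts),
    "upper": lambda parts: parts[0].upper(),
}

def force(thunk_hash: str) -> str:
    """Evaluate a thunk, reusing the memoized output when one exists."""
    if thunk_hash in memo:
        return memo[thunk_hash]
    desc = json.loads(store[thunk_hash])
    inputs = [store[h] for h in desc["in"]]
    out = put(FUNCTIONS[desc["f"]](inputs))
    memo[thunk_hash] = out
    return out
```

Because a thunk names everything it depends on, any worker holding those blobs could evaluate it; a chain is built by passing `force(t1)` as an input hash to the next thunk.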

November 2018:

Kalev Alpernas, Cormac Flanagan, Sadjad Fouladi, Leonid Ryzhyk, Mooly Sagiv, Thomas Schmitz, and KW, Secure serverless computing using dynamic information flow control, Proc. ACM Program. Lang. 2, OOPSLA, Article 118 (November 2018).
How do you reason about information flow in a complex application of interconnected “serverless” functions? One answer is that if the application expresses its jobs in a functional content-addressed way (as in our gg system), enforcing and reasoning about information flow control becomes almost trivial.

July 2018:

Francis Y. Yan, Jestin Ma, Greg Hill, Deepti Raghavan, Riad S. Wahby, Philip Levis, and KW, Pantheon: the training ground for Internet congestion-control research, USENIX ATC ’18, Boston, Mass., July 2018 (won Best Paper award).

The Pantheon is a community evaluation platform for academic research on congestion control. It includes a curated collection of working implementations of congestion-control schemes, a testbed of measurement nodes on wired and cellular networks, a collection of network emulators (each calibrated to match the performance of a real network path or to capture some form of pathological network behavior), and a continuous-testing system that evaluates the Pantheon protocols over the real Internet between pairs of testbed nodes and publicly archives the resulting packet traces and analyses. Pantheon evaluations have assisted congestion-control research that appeared at NSDI 2018 (Copa and Vivace), ICML 2019 (Aurora), and SIGCOMM 2020 (TACK).

April 2018:

Sadjad Fouladi, John Emmons, Emre Orbay, Catherine Wu, Riad S. Wahby, and KW, Salsify: low-latency network video through tighter integration between a video codec and a transport protocol, in USENIX NSDI ’18, Renton, Wash., April 2018 (won Applied Networking Research Prize).

Salsify is a new design for real-time Internet video that jointly controls a video codec and a network transport protocol. Current systems (Skype, Facetime, WebRTC) run these components independently, which produces more glitches and stalls when the network is unpredictable. In testing, Salsify consistently outperformed today’s real-time video systems in both quality and delay.

November 2017:

Dmitry Kogan, Henri Stern, Ashley Tolbert, David Mazières, and KW, The Case For Secure Delegation, ACM HotNets, Palo Alto, Calif., November 2017.

Dima and Henri developed and released an open-source tool, called Guardian Agent, that performs secure ssh-agent forwarding for SSH and Mosh in a backwards-compatible way.

November 2017:

Michael Schapira and KW, Congestion-Control Throwdown, ACM HotNets, Palo Alto, Calif., November 2017.
Michael (as Hamilton) and I (Burr) were stuck on an airplane together and found ourselves at loggerheads about Internet congestion control. We put our disagreement to good use by collaborating on a throwdown-in-the-form-of-a-paper (and later an actual throwdown, at HotNets 2017).

June 2017:

Judson Wilson, Riad S. Wahby, Henry Corrigan-Gibbs, Dan Boneh, Philip Levis, and KW, Trust but Verify: Auditing the Secure Internet of Things, MobiSys 2017.

A way of using TLS that can allow the owners of IoT devices to learn what their own devices are saying about them to the cloud, without compromising the integrity of encrypted communications.

March 2017:

Sadjad Fouladi, Riad S. Wahby, Brennan Shacklett, Karthikeyan Vasuki Balasubramaniam, William Zeng, Rahul Bhalerao, Anirudh Sivaraman, George Porter, and KW, with demo by John Emmons, Encoding, Fast and Slow: Low-Latency Video Processing using Thousands of Tiny Threads, NSDI 2017, Boston, Mass., March 2017.
We think ExCamera started the movement to (mis-)use cloud-functions services for massively “burst-parallel” data processing. The system achieves low-latency video processing by combining a purely functional implementation of a VP8 codec (to allow parallelization at granularities smaller than the interval between key frames) with a framework that starts thousands of tiny jobs on AWS Lambda at once, each processing a small segment of the video.

March 2017:

Daniel Reiter Horn, Ken Elkabany, Chris Lesniewski-Laas, and KW, The Design, Implementation, and Deployment of a System to Transparently Compress Hundreds of Petabytes of Image Files For a File-Storage Service, NSDI 2017, Boston, Mass., March 2017 (won Community Award).

We added transparent recompression of JPEGs to the Dropbox back-end fileservers, compressing more than 200 petabytes of user data by about 23 percent. To achieve this, we created a purely functional implementation of the JPEG DC-predicted Huffman coder and adapted the VP8 format, to be able to resume compression and decompression at the arbitrary boundaries between prespecified filesystem blocks. The system is about 9x faster, and within 1 percentage point of the compression efficiency, of the best prior work. It is available as free software.

March 2017:

In case professoring doesn't work out, I have developed a side hustle as critic of the Unicode Technical Committee's current approach to standardizing emoji. I was interviewed for The Nib's 2017 "Who Makes Emoji" comic and for episodes 1–4 of Mark Bramhill's "Welcome to Macintosh" podcast (also condensed into an episode of "99% Invisible"). I also spoke at length in a 2019 Dutch documentary.

October 2016:

In 2007, an academic cardiologist downloaded 42 medical studies from the Web site of drug giant GlaxoSmithKline, combined them in a meta-analysis, and found that Avandia, the world's best-selling diabetes drug, caused heart attacks. GSK lost about $12 billion in sales and market value. But a different way to analyze the same data—a “Bayesian” way—finds that the drug actually reduces heart attacks. Or does it?
We often hear of this conflict, between Bayesian and “frequentist” statistics. But much of the conflict is misguided. Viewed formally, on the same axes, the two schools of statistics turn out to share a tight symmetry. Criticisms of each can be transformed into a corresponding criticism of the other.
Slides from talk given at University of Chicago (January 2009), U.T. Austin (April 2011), MIT CSAIL (October 2013), Boston Children's Hospital (October 2013), Harvard Medical School (February 2014), MongoDB Inc. (October 2016). Also written version of the main section of the talk.

June 2016:

Amit Levy, James Hong, Laurynas Riliskis, Philip Levis, and KW, Beetle: Flexible Communication for Bluetooth Low Energy, MobiSys 2016, Singapore, June 2016.

Amit figured out and implemented a cool way to interpose on Bluetooth Low Energy to allow multiplexing device services to multiple applications at the same time, with fine-grained access control.

July 2015:

Ravi Netravali, Anirudh Sivaraman, Somak Das, Ameesh Goyal, KW, James Mickens, and Hari Balakrishnan, Mahimahi: Accurate Record-and-Replay for HTTP, in USENIX ATC 2015, Santa Clara, Calif., July 2015.

Mahimahi is a series of cascading network emulators, each one modeling one aspect of a network path (delay, independent per-packet loss, autocorrelated loss or intermittency, varying bottleneck link capacity with a specified queue discipline, etc.). Each one opens a container and affects processes launched within that container, and the emulators can be nested arbitrarily inside each other to build up a chain of emulated effects. Mahimahi is included in Debian and Ubuntu and has been used in a number of network research studies.

August 2014:

Anirudh Sivaraman, KW, Pratiksha Thaker, and Hari Balakrishnan, An Experimental Study of the Learnability of Congestion Control, in SIGCOMM 2014, Chicago, Ill., August 2014.

Working with my colleagues Anirudh Sivaraman and Pratiksha Thaker, we used the Remy automatic protocol-design program as a tool to investigate the “learnability” of the Internet congestion-control problem: how easy is it to “learn” a network protocol to achieve desired goals, given a necessarily imperfect model of the networks where it will ultimately be deployed?

July 2014:

Anirudh Sivaraman, KW, Pauline Varley, Somak Das, Joshua Ma, Ameesh Goyal, João Batalha, and Hari Balakrishnan, Protocol Design Contests, SIGCOMM Computer Communication Review, July 2014.

We ran an in-class contest to develop a congestion-control algorithm, asking 40 students in a graduate networking class to develop protocols that would outperform Sprout. Spurred on by a live “leaderboard,” the students submitted 3,000 candidate algorithms that mapped a region of realizable throughput-vs.-delay tradeoffs. The winners became co-authors on an article describing the contest and their winning entries.

May 2014:
My doctoral dissertation: Transport Architectures for an Evolving Internet, advised by Hari Balakrishnan at the Massachusetts Institute of Technology, 2014.

November 2013:

Anirudh Sivaraman, KW, Suvinay Subramanian, and Hari Balakrishnan, No Silver Bullet: Extending SDN to the Data Plane, in HotNets 2013, College Park, Md., November 2013.

Working with my colleagues Anirudh Sivaraman and Suvinay Subramanian, we demonstrated bidirectional cyclic preference loops among three popular algorithms that control queueing and scheduling behavior within a packet-switched network. Our thesis: no such scheme can remain dominant as application objectives evolve, so routers and switches should be programmable in this respect.

August 2013:

TCP ex Machina: Computer-Generated Congestion Control, in SIGCOMM 2013, Hong Kong, China, August 2013.

Remy is a computer program that creates TCP congestion-control algorithms from first principles, given uncertain prior knowledge about the network and an objective to achieve. I used to say that these computer-generated schemes can outperform their human-generated forebears, even ones that benefit from running code inside the network; I should have been equally emphatic that it matters a lot how the designer specifies their assumptions and how closely those assumptions are met by the real network. The real contribution here, I think, was in the idea that the design of a CC algorithm can be the product of reinforcement learning: a process where the emphasis is appropriately on the designer's assumptions and goals, and less so on the mechanism. (Joint work with my advisor, Hari Balakrishnan.)

April 2013:

Sprout: Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks, in USENIX NSDI 2013, Lombard, Ill., April 2013 (won 2014 Applied Networking Research Prize).

We showed that on today's cellular networks, with some simple inferential techniques it is possible to achieve 7–9× less delay than Skype, Facetime, and Google Hangout, while achieving 2–4× the throughput of these applications at the same time. We packaged the evaluation into one VM and held a contest for 40 students to try to find a better algorithm under the same conditions. We were able to match the performance of the in-network CoDel algorithm, while operating purely end-to-end. (Joint work with my colleague Anirudh Sivaraman and Hari Balakrishnan.)

January 2013:

On the divergence of Google Flu Trends from the target U.S., French, and Japanese indexes in 2012–2013. Presentation slides (March 14, 2013), delivered at Children's Hospital Informatics Program | Interview 1 | Interview 2 | Radio interview

June 2012:

Mosh: An Interactive Remote Shell for Mobile Clients, in USENIX ATC 2012, Boston, Mass., June 2012.

We built a novel datagram protocol that synchronizes the state of abstract objects over a challenged, mobile network. We used this to implement a replacement for the venerable SSH application that tolerates intermittent connectivity and roaming, and has a predictive local user interface. The program is in wide release with hundreds of thousands of downloads. Joint work with Hari Balakrishnan (research) and with Keegan McAllister, Anders Kaseorg, Quentin Smith, and Richard Tibbetts (software).

November 2011:

End-to-End Transmission Control by Modeling Uncertainty about the Network State, in HotNets 2011, Cambridge, Mass., November 2011.

We show it is possible to produce reasonable transmission control from first principles and Bayesian inference, when contending only with nonresponsive cross traffic. The workshop paper that eventually became Remy. (Joint work with Hari Balakrishnan.)

October 2009:

False positive rate for Barnard's test for superiority at nominal 0.05 alpha (red), vs. modified test (blue) informed by prior that true value lies within shaded region.

Developed an exchange algorithm to calculate the coverage probability and false-positive rate of “exact” 2x2 confidence intervals and hypothesis tests. Typically these tests (e.g., Barnard's test for superiority, the Chan test for non-inferiority) take 5-10 minutes to run at sample sizes of 1000x1000 in software like StatXAct. The exchange algorithm calculates the whole tableau (all 1001x1001 outcomes) with the same difficulty as the “hardest” single p-value or confidence interval. Much similar work had been done for one-dimensional tests and intervals, but the two-dimensional case had previously been intractable.
This technique allows us to empirically test traditional statistical rules of thumb, like the appropriateness of the chi-square test when E[ n p ] > 5, or the notion that exact tests are unnecessarily conservative. It also allows us to design new tests and intervals that minimize conservatism and ripple. The above graph shows the benefit of applying a “prior” to classical (frequentist) inference. Barnard's test for superiority controls false positives unconditionally (the red line is always below 0.05), but at a cost of conservatism in the region of p=0.35. We find that if we are able to state a region where the parameter is assumed to lie a priori, we can produce a modified hypothesis test with better performance inside that region.
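
To make concrete what "false-positive rate" means for a 2x2 test, here is a brute-force sketch. It is not the exchange algorithm described above, and it uses a one-sided pooled z-test as a simple stand-in rather than Barnard's test; the function names are hypothetical. It enumerates every possible outcome of two binomial arms that share the same true success rate and adds up the probability of the outcomes in which the test rejects.

```python
from math import comb, sqrt

Z_CRIT = 1.6448536269514722  # one-sided 5% critical value of the standard normal

def binom_pmf(n, p):
    """Probabilities of 0..n successes in n trials with success rate p."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def rejects(x1, n1, x2, n2):
    """One-sided pooled z-test of H0: p1 <= p2 against H1: p1 > p2."""
    pooled = (x1 + x2) / (n1 + n2)
    if pooled in (0.0, 1.0):
        return False  # no variation observed, so the statistic is undefined
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se > Z_CRIT

def false_positive_rate(n1, n2, p):
    """Probability of rejecting when both arms truly share success rate p."""
    pmf1, pmf2 = binom_pmf(n1, p), binom_pmf(n2, p)
    return sum(pmf1[a] * pmf2[b]
               for a in range(n1 + 1)
               for b in range(n2 + 1)
               if rejects(a, n1, b, n2))
```

Sweeping `p` from 0 to 1 with this function traces out a curve of the kind shown in the graph. The nested loop visits all (n1+1)(n2+1) outcomes for every value of `p`, which is the cost that a smarter algorithm needs to avoid at sample sizes like 1000x1000.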

May 2006:

English Text Classification by Authorship and Date (class project). The n-grams used by the U.S. Supreme Court evolve quickly enough that it's possible to build a pretty good classifier to identify the year of authorship of an opinion, based only on its four-letter-grams. Other corpora, like the titles used by high-school students in their winning entries to the Westinghouse/Intel science contest, can display some amusing long-term trends. (Joint work with Adam Belay, Mujde Pamuk, and Tucker Sylvestro.)
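
A classifier of this general kind can be sketched as follows. This is a minimal illustration and not the project's actual method: it assumes that a "four-letter-gram" is a window of four consecutive characters over lowercased text with punctuation collapsed to spaces, and it uses cosine similarity against one hypothetical profile per year.

```python
from collections import Counter
from math import sqrt

def four_grams(text: str) -> Counter:
    """Count overlapping 4-character windows of the normalized text."""
    letters = "".join(c.lower() if c.isalpha() else " " for c in text)
    letters = " ".join(letters.split())  # collapse runs of whitespace
    return Counter(letters[i:i + 4] for i in range(len(letters) - 3))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def guess_year(text: str, profiles: dict) -> int:
    """Pick the year whose 4-gram profile is most similar to the text."""
    grams = four_grams(text)
    return max(profiles, key=lambda year: cosine(grams, profiles[year]))
```

Here `profiles` would map each year to the summed 4-gram counts of that year's opinions.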

January 2006:

MIT OpenCourseWare taped my 8-hour Introduction to Copyright Law course, which I taught for the EECS department in MIT's Independent Activities Period of 2006.

2005–6:

We were involved in some of the first “amateur” high-definition broadcasting, which required implementing an ATSC-conforming scheduler for MPEG-2 transport streams. The project is probably most notable for producing this three-hour video of the 2006 MIT Integration Bee. Other videos of 2005–6 era MIT sports are also available.

October 2004:

Created the Library Access to Music Project, which served as MIT's open-access electronic music library from 2004 to 2016. (Joint work with Josh Mandel.) Engineering a Campus-Wide Accessible Music Library (MIT master's thesis, 2005). Coverage in NYT | USA Today | Boston Globe | NPR Morning Edition | Fark.

March 2004:

Broke the encryption on the Motorola (now Indala) FlexSecur system of RFID cards in use at MIT. (Joint work with Austin Roach and Josh Mandel.)

December 2003:

Improving 802.11 Range with Forward Error Correction, CSAIL AI Memo 2005-004, February 2005. Added forward error correction to Wi-Fi, extending range by 70 percent. (Joint work with Reina Riemann.)

May 2002:

Analysis of Boston Local Television News Story Selection Bias, 1993–2002. Local news programs prefer to cover entertainment news relating to prime-time TV shows from their own national network. Shifts in the affiliation of a TV station can produce a dramatic change in the news judgment of its local news program, e.g. when WHDH-TV switched from CBS to NBC in 1995. WHDH’s news director: "Why would you want to give publicity to a competitor?"

March 2001:

qrpff DVD descrambler, written for an IAP seminar at MIT on DVD and the Digital Millennium Copyright Act, joined by DVD-CCA representative David Barr, Harvard Law School's Jonathan Zittrain, and MIT's Hal Abelson. (Joint work with Marc Horowitz.) Eventually, in 2015, the “algorithm” was sold at a charity art auction to benefit the Cooper Hewitt Smithsonian Design Museum. Coverage by CNET | IDG | Wired | The Tech | New Yorker | Wall Street Journal.

January 2000:

In 2000, I took over the job of MIT Infinite Corridor Astronomer from Ken Olum. We later captured the “MIThenge” phenomenon on video and improved the accuracy of the predictions. It turns out most models of atmospheric refraction don't work well within 0.8 degrees of the horizon. Strangely, real astronomers rarely find this to be a big problem...

August 1999:

New frontiers in optical character recognition, recognized by the prestigious Obfuscated Perl Contest.

December 1998:

Richard Bergmair, Towards Linguistic Steganography, 2004, illustrating Winstein 1998

The first automated linguistic steganography.