eric schkufza

I am a member of the VMware research group (link). I graduated from stanford
university with a PhD in computer science in 2015 (thesis). My advisor was professor alex
aiken (genealogy). I am interested
in applying applying the tools of large-scale data analysis and machine
learning to the design of optimizing compilers. A list of my recent
publications is below. Click here to download my cv (pdf).

Stratified Synthesis: Automatically Learning the x86-64 Instruction Set (PLDI 2016)

The x86-64 ISA sits at the bottom of the software stack of most desktop
and server software. Because of its importance, many software analysis
and verification tools depend, either explicitly or implicitly, on
correct modeling of the semantics of x86-64 instructions. However, formal
semantics for the x86-64 ISA are difficult to obtain and often written
manually through great effort. We describe a mechanically synthesized
formal semantics for a large fraction of the x86-64 Haswell ISA’s many
thousands of instruction variants. The key to our results is stratified
synthesis, where we use a set of instructions whose semantics are known
to synthesize the semantics of additional instructions whose semantics
are unknown. As the set of formally described instructions increases, the
synthesis vocabulary expands, making it possible to synthesize the
semantics of increasingly complex instructions.
We describe an automatically synthesized formal semantics for 1,620
instruction variants of the x86-64 Haswell ISA. We evaluate the learned
semantics against manually written semantics (where available) and find
that they are formally equivalent with the exception of 58
instructions, where the manually written semantics contain an error. We
further find the learned formulas to be largely as precise as manually
written ones and of similar size.
(pdf)
(bib)

Conditionally Correct Superoptimization (OOPSLA 2015)

The aggressive optimization of heavily used kernels is an important
problem in high-performance computing. However, both general purpose
compilers and highly specialized tools such as superoptimizers often do
not have sufficient static knowledge of restrictions on program inputs
that could be exploited to produce the very best code. For many
applications, the best possible code is conditionally correct: the
optimized kernel is equal to the code that it replaces only under certain
preconditions on the kernel’s inputs. The main technical challenge in
producing conditionally correct optimizations is in obtaining non-trivial
and useful conditions and proving conditional equivalence formally in the
presence of loops. We combine abstract interpretation, decision
procedures, and testing to yield a verification strategy that can address
both of these problems. This approach yields a superoptimizer for x86
that in our experiments produces binaries that are often multiple times
faster than those produced by production compilers.
(pdf)
(bib)

Stochastic Optimization of Floating-Point Programs with Tunable Precision (PLDI 2014)

The aggressive optimization of floating-point computations is an
important problem in high-performance computing. Unfortunately,
floating-point instruction sets have complicated semantics that often
force compilers to preserve semantics as written. We present a method
that treats floating-point optimization as a stochastic search problem.
We demonstrate the ability to generate reduced precision
implementations of Intel's handwritten C numeric library which are up
to 6 times faster than the original code, and achieve end-to-end
speedups of over 30% on a direct numeric simulation and a ray tracer by
optimizing kernels that can tolerate a loss of precision while still
remaining correct. Because these optimizations are mostly not amenable
to formal verification using the current state of the art, we present a
stochastic search technique for characterizing maximum error. The
technique comes with an asymptotic guarantee and provides strong
evidence of correctness.
(pdf)
(bib)
(slides)

Data-Driven Equivalence Checking (OOPSLA 2013)

We present a data driven algorithm for checking the equivalence of two
loops: the algorithm infers simulation relations using data from test
runs. Once a candidate simulation relation has been obtained, an
off-the-shelf SMT solver is used to check whether it actually holds.
The algorithm is sound: insufficient data will cause the proof to fail.
We demonstrate a prototype implementation of our algorithm, called
DDEC, which is the first sound equivalence checker for loops written in
x86.
(pdf)
(bib)

Stochastic Superoptimization (ASPLOS 2013)

We formulate the loop-free binary superoptimization task as a
stochastic search problem. The competing constraints of transformation
correctness and performance improvement are encoded as terms in a cost
function, and a Markov Chain Monte Carlo sampler is used to rapidly
explore the space of all possible programs to find one that is an
optimization of a given target program. Although our method sacrifices
completeness, the scope of programs we are able to consider, and the
resulting quality of the programs that we produce, far exceed those of
existing superoptimizers. Beginning from binaries compiled by llvm -O0
for 64-bit x86, our prototype implementation, STOKE, is able to produce
programs which either match or outperform the code produced by gcc -O3,
icc -O3, and in some cases, expert handwritten assembly.
(pdf)
(bib)
(slides)
(errata)

Interactive Furniture Layout Using Interior Design Guidelines (SIGGRAPH 2011)

We present an interactive furniture layout system that assists users by
suggesting furniture arrangements that are based on interior design
guidelines. Our system incorporates the layout guidelines as terms in
a density function and generates layout suggestions by rapidly sampling
the density function using a hardware-accelerated Monte Carlo sampler.
Our results demonstrate that the suggestion generation functionality
measurably increases the quality of furniture arrangements produced by
participants with no prior training in interior design.
(pdf)
(bib)

Computer Generated Residential Building Layouts (SIGGRAPH Asia 2010)

We present a method for automated generation of building layouts for
computer graphics applications. Our approach is motivated by the layout
design process developed in architecture. Given a set of high-level
requirements, an architectural program is synthesized using a Bayesian
network trained on real-world data. The architectural program is
realized in a set of floor plans, obtained through stochastic
optimization. The floor plans are used to construct a complete
three-dimensional building with internal structure. We demonstrate a
variety of computer-generated buildings produced by the presented
approach.
(pdf)
(bib)

Programming the Memory Hierarchy Revisited: Supporting Irregular Parallelism in Sequoia (PPOPP 2011)

We describe two novel constructs for programming parallel machines with
multi-level memory hierarchies: call-up, which allows a child task to
invoke computation on its parent, and spawn, which spawns a dynamically
determined number of parallel children until some termination condition
in the parent is met. Together we show that these constructs allow
applications with irregular parallelism to be programmed in a
straightforward manner, and furthermore these constructs complement and
can be combined with constructs for expressing regular parallelism. We
have implemented spawn and call-up in Sequoia and we present an
experimental evaluation on a number of irregular applications.
(pdf)
(bib)