
Jongmin Baek
Staff Machine Learning Engineer, Dropbox
Ph.D., Stanford University
[first initial][last name]@[cs].[school name].[school domain suffix]
If you need to reach me on company business, try:
[first name]@[company name].[company domain suffix]
Previously I was a doctoral student at Stanford University, advised by Marc Levoy, doing various things related to computational photography. Even before that, I completed undergraduate degrees at MIT in theoretical mathematics and in computer science (advised by Tomasz Mrowka and Scott Cyphers, respectively), as well as an M.Eng. degree in computer science (advised by Fredo Durand, with a thesis on multi-channel coded apertures).
Institution I am currently associated with:
Dropbox, Inc. (2014-)
Institutions I was previously affiliated with:
Stanford University (2008-2013)
Department of Computer Science, Stanford University
Stanford Computer Graphics Laboratory
MIT (2004-2008)
Department of Mathematics, MIT
Department of EECS, MIT
Computer Science & Artificial Intelligence Laboratory (CSAIL)
Computer Graphics Group (under Fredo Durand)
Software Design Group (under Daniel Jackson)
Places where I have worked before:
NVIDIA (2012, 2013)
Google (2010)
Palo Alto Research Center (2007)
The Media Lab, MIT (2007-2008)
Fujitsu-Siemens Computers (2005)
Even before:
Cupertino High School (2000-2004)
I lived in Seoul, Korea from birth until 1999.
News
Present: I serve as the overall technical lead (TL) for Core Intelligence Area at Dropbox. I am also one of 40-ish staff+ engineers at Dropbox.
I also informally advise Fairytrail, a good friend's venture focused on video dating and travel. Check it out.
For the work I do at Dropbox, refer to my curriculum vitae linked in the left sidebar.
Here are some counting stats from my 11 years at Dropbox (as of 2025):
- # of commits authored: 4500+ and counting.
- # of unique reviewers on commits I've authored: 413 (as of 2025-02) and counting.
- # of commits reviewed: TBD, expected to roughly match commits authored.
- # of unique authors whose commits I have reviewed: TBD.
- # of programming languages in commits: 9+ (Python, Go, Rust, C/C++, C#, Objective-C, Swift, Java, TypeScript, ...)
- # of interviews conducted: 400+ and counting.
- # of hiring committees moderated: 200+ and counting.
- # of managers I've reported to: 9 and counting.
- # of patent applications filed: ????
- # of patent applications granted: 11 (via USPTO)
Here are some technical blog posts I've authored at Dropbox:
Here are some other blog posts authored by colleagues describing work to which I've contributed.
Jan 28, 2014: I am now working at Dropbox. I work in image processing, computer vision and machine learning.
Dec 5, 2013: My dissertation is approved! As of December 5, I have concluded my Ph.D. program!
Below are some cute visualizations of my progress on the dissertation. I used version control for the source LaTeX files, which allowed me to create interesting time-lapse renders.
Research
This section is somewhat dated now, but my research interest during graduate school lay in computational photography, computer vision, and graphics. In particular, I love the mathematical aspects of imaging and image processing, and I was interested in mobile and real-time applications of computational photography. I have worked on image editing, especially on mobile platforms; the theory and applications of high-dimensional filtering; analysis of computational cameras; high-frame-rate imaging; general reconstruction problems for images (denoising, deblurring, etc.); and programmable computational cameras, to name a few. Below is a list of published manuscripts:
Baek, J., Pająk, D., Kim, K., Pulli, K. and Levoy, M.
WYSIWYG Computational Photography via Viewfinder Editing.
Proc. ACM SIGGRAPH Asia 2013. You can find my slides (90MB) here. Also shown at Eurographics 2014 Industrial Presentations.
Abstract: Digital cameras with electronic viewfinders provide a relatively faithful depiction of the final image, providing a WYSIWYG experience. If, however, the image is created from a burst of differently captured images, or non-linear interactive edits significantly alter the final outcome, then the photographer cannot directly see the results, but instead must imagine the post-processing effects. This paper explores the notion of viewfinder editing, which makes the viewfinder more accurately reflect the final image the user intends to create. We allow the user to alter the local or global appearance (tone, color, saturation, or focus) via stroke-based input, and propagate the edits spatiotemporally. The system then delivers a real-time visualization of these modifications to the user, and drives the camera control routines to select better capture parameters.
Check out the paper video (watch in 720p):
Alternatively, try http://vimeo.com/71116200. (Follow the link and click on "HD".)
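The enabling machinery here is spatiotemporal edit propagation: a stroke painted on a few pixels must spread to every pixel that looks similar, and keep sticking to those pixels as the camera and scene move. Below is a toy, brute-force Python sketch of that idea; all names and parameters are mine for illustration, and the paper instead reaches real-time rates with accelerated high-dimensional filtering.

import numpy as np

def propagate_edit(img, stroke_xy, stroke_gain, sigma_s=30.0, sigma_r=0.15):
    """Spread sparse tonal edits to similar pixels.

    img: (h, w) luminance in [0, 1]; stroke_xy: (k, 2) pixel coords of the
    user's strokes; stroke_gain: (k,) tonal adjustment painted at each stroke.
    """
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    num = np.zeros((h, w))
    den = np.full((h, w), 1e-8)
    for (sy, sx), g in zip(stroke_xy, stroke_gain):
        # Affinity in a joint position + appearance space: pixels close to
        # the stroke spatially AND similar in luminance receive the edit.
        aff = np.exp(-((ys - sy) ** 2 + (xs - sx) ** 2) / (2 * sigma_s ** 2)
                     - (img - img[sy, sx]) ** 2 / (2 * sigma_r ** 2))
        num += aff * g
        den += aff
    return img * (1.0 + num / den)  # apply the propagated gain map

Because the affinity depends on appearance rather than absolute position, the same edit re-attaches to the subject in subsequent viewfinder frames, which is what lets the edits survive camera and scene motion.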
Baek, J., Adams, A. B., Dolson, J.
Lattice-Based High-Dimensional Gaussian Filtering and the Permutohedral Lattice.
Journal of Mathematical Imaging and Vision 2013. For a locally stored copy, click here.
Abstract: High-dimensional Gaussian filtering is a popular technique in image processing, geometry processing and computer graphics for smoothing data while preserving important features. For instance, the bilateral filter, cross bilateral filter and non-local means filter fall under the broad umbrella of high-dimensional Gaussian filters. Recent algorithmic advances therein have demonstrated that by relying on a sampled representation of the underlying space, one can obtain speed-ups of orders of magnitude over the naive approach. The simplest such sampled representation is a lattice, and it has been used successfully in the bilateral grid and the permutohedral lattice algorithms. In this paper, we analyze these lattice-based algorithms, developing a general theory of lattice-based high-dimensional Gaussian filtering. We consider the set of criteria for an optimal lattice for filtering, as it offers a good tradeoff of quality for computational efficiency, and evaluate the existing lattices under the criteria. In particular, we give a rigorous exposition of the properties of the permutohedral lattice and argue that it is the optimal lattice for Gaussian filtering. Lastly, we explore further uses of the permutohedral-lattice-based Gaussian filtering framework, showing that it can be easily adapted to perform mean shift filtering and yield improvement over the traditional approach based on a Cartesian grid.
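To make the splat/blur/slice structure of these algorithms concrete, here is a minimal Python sketch on a regular Cartesian grid, i.e., a bare-bones bilateral grid; the permutohedral lattice swaps in a better sampling lattice and barycentric splatting, but the pipeline is identical. Nearest-neighbor splatting and the parameter choices are simplifications of mine.

import numpy as np
from scipy.ndimage import gaussian_filter

def bilateral_grid_filter(img, sigma_s=8.0, sigma_r=0.1):
    """Bilateral-filter a grayscale float image in [0, 1] via a 3D grid."""
    h, w = img.shape
    # Grid resolution: roughly one cell per sigma along each dimension.
    gh, gw, gr = int(h / sigma_s) + 2, int(w / sigma_s) + 2, int(1.0 / sigma_r) + 2
    data = np.zeros((gh, gw, gr))     # accumulated pixel values
    weights = np.zeros((gh, gw, gr))  # homogeneous weights

    ys, xs = np.mgrid[0:h, 0:w]
    gy = (ys / sigma_s).astype(int)
    gx = (xs / sigma_s).astype(int)
    gz = (img / sigma_r).astype(int)

    # Splat: accumulate each pixel into its enclosing cell (the real
    # algorithms distribute it over cell vertices with multilinear or
    # barycentric weights).
    np.add.at(data, (gy, gx, gz), img)
    np.add.at(weights, (gy, gx, gz), 1.0)

    # Blur: a small separable Gaussian over the grid.
    data = gaussian_filter(data, sigma=1.0)
    weights = gaussian_filter(weights, sigma=1.0)

    # Slice: read the filtered value back out at each pixel's grid position.
    return data[gy, gx, gz] / np.maximum(weights[gy, gx, gz], 1e-8)

The paper's question is which lattice should replace this Cartesian grid as the dimensionality grows, and it argues that the permutohedral lattice is the optimal choice under its criteria.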
Cho, S. I., Gao, S. S., Xia, A., Wang, R., Salles, F. T., Raphael, P. D., Abaya, H., Wachtel, J., Baek, J., Jacobs, D. E., Rasband, M. N., Oghalai, J. S.
Mechanisms of hearing loss after blast injury to the ear.
PLoS One, 2013. Vol.8, No.7.
Abstract: Given the frequent use of improvised explosive devices (IEDs) around the world, the study of traumatic blast injuries is of increasing interest. The ear is the most common organ affected by blast injury because it is the body's most sensitive pressure transducer. We fabricated a blast chamber to re-create blast profiles similar to that of IEDs and used it to develop a reproducible mouse model to study blast-induced hearing loss. The tympanic membrane was perforated in all mice after blast exposure and found to heal spontaneously. Micro-computed tomography demonstrated no evidence for middle ear or otic capsule injuries; however, the healed tympanic membrane was thickened. Auditory brainstem response and distortion product otoacoustic emission threshold shifts were found to be correlated with blast intensity. As well, these threshold shifts were larger than those found in control mice that underwent surgical perforation of their tympanic membranes, indicating cochlear trauma. Histological studies one week and three months after the blast demonstrated no disruption or damage to the intra-cochlear membranes. However, there was loss of outer hair cells (OHCs) within the basal turn of the cochlea and decreased spiral ganglion neurons (SGNs) and afferent nerve synapses. Using our mouse model that recapitulates human IED exposure, our results identify that the mechanisms underlying blast-induced hearing loss does not include gross membranous rupture as is commonly believed. Instead, there is both OHC and SGN loss that produce auditory dysfunction.
Jacobs, D. E., Baek, J., Levoy, M.
Focal Stack Compositing for Depth of Field Control.
Stanford Computer Science Tech Report CSTR 2012-01.
Abstract: Many cameras provide insufficient control over depth of field. Some have a fixed aperture; others have a variable aperture that is either too small or too large to produce the desired amount of blur. To overcome this limitation, one can capture a focal stack, which is a collection of images each focused at a different depth, then combine these slices to form a single composite that exhibits the desired depth of field. In this paper, we present a theory of focal stack compositing, and algorithms for computing images with extended depth of field, shallower depth of field than the lens aperture naturally provides, or even freeform (non-physical) depth of field. We show that while these composites are subject to halo artifacts, there is a principled methodology for avoiding these artifacts---by feathering a slice selection map according to certain rules before computing the composite image.
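As a flavor of the approach, here is a toy Python composite following the abstract's recipe: score per-pixel sharpness in each slice, select the sharpest slice, and feather the selection map before compositing so halos are suppressed. The sharpness measure and feathering radius are illustrative stand-ins, not the paper's rules.

import numpy as np
from scipy.ndimage import laplace, gaussian_filter

def composite_focal_stack(stack, feather_sigma=3.0):
    """stack: (n_slices, h, w) grayscale focal stack -> (h, w) composite."""
    # Local contrast (smoothed absolute Laplacian) as a crude in-focus measure.
    sharpness = np.stack([gaussian_filter(np.abs(laplace(s)), 2.0) for s in stack])
    best = sharpness.argmax(axis=0)
    # Feather the hard selection map into soft per-slice weights.
    weights = np.stack([gaussian_filter((best == i).astype(float), feather_sigma)
                        for i in range(len(stack))])
    weights /= np.maximum(weights.sum(axis=0), 1e-8)
    return (weights * stack).sum(axis=0)

The paper's contribution is the principled version of that last step: rules for how the selection map must be feathered so that the composite provably avoids halo artifacts.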
Karpenko, A., Jacobs, D. E., Baek, J. and Levoy, M.
Digital Video Stabilization and Rolling Shutter Correction using Gyroscopes.
Stanford Computer Science Tech Report CSTR 2011-03.
Abstract: In this paper we present a robust, real-time video stabilization and rolling shutter correction technique based on commodity gyroscopes. First, we develop a unified algorithm for modeling camera motion and rolling shutter warping. We then present a novel framework for automatically calibrating the gyroscope and camera outputs from a single video capture. This calibration allows us to use only gyroscope data to effectively correct rolling shutter warping and to stabilize the video. Using our algorithm, we show results for videos featuring large moving foreground objects, parallax, and low-illumination. We also compare our method with commercial image-based stabilization algorithms. We find that our solution is more robust and computationally inexpensive. Finally, we implement our algorithm directly on a mobile phone. We demonstrate that by using the phone's inbuilt gyroscope and GPU, we can remove camera shake and rolling shutter artifacts in real-time.
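The geometric core is compact enough to sketch: integrate the gyroscope's angular rates into a camera orientation over time, smooth that orientation, and rewarp each frame by a pure-rotation homography (rolling shutter correction applies the same warp per scanline, at the scanline's own timestamp). The Python below is a hedged, first-order version with illustrative names.

import numpy as np

def rotation_from_gyro(omega, dt):
    """Integrate one gyro sample omega (rad/s, shape (3,)) over dt seconds."""
    wx, wy, wz = omega * dt
    # First-order (small-angle) update: R ~= I + [w]_x.
    W = np.array([[0.0, -wz, wy],
                  [wz, 0.0, -wx],
                  [-wy, wx, 0.0]])
    return np.eye(3) + W

def stabilizing_homography(K, R_actual, R_smoothed):
    """Homography that rewarps a frame from its measured orientation to a
    smoothed (stabilized) one; K is the 3x3 camera intrinsics matrix."""
    R_comp = R_smoothed @ R_actual.T
    return K @ R_comp @ np.linalg.inv(K)

The paper's actual contributions sit around this core: a unified model of camera motion and rolling-shutter warping, plus automatic calibration of the gyro-to-camera alignment and delay from a single capture.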
Baek, J., Jacobs, D. E.
Accelerating Spatially Varying Gaussian Filters.
Proc. ACM SIGGRAPH Asia 2010.
Abstract: High-dimensional Gaussian filters, most notably the bilateral filter, are important tools for many computer graphics and vision tasks. In recent years, a number of techniques for accelerating their evaluation have been developed by exploiting the separability of these Gaussians. However, these techniques do not apply to the more general class of spatially varying Gaussian filters, as they cannot be expressed as convolutions. These filters are useful because the underlying data (e.g. images, range data, meshes or light fields) often exhibit strong local anisotropy and scale. We propose an acceleration method for approximating spatially varying Gaussian filters using a set of spatially invariant Gaussian filters each of which is applied to a segment of some non-disjoint partitioning of the dataset. We then demonstrate that the resulting ability to locally tilt, rotate or scale the kernel improves filtering performance in various applications over traditional spatially invariant Gaussian filters, without incurring a significant penalty in computational expense.
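A toy version of the acceleration idea, restricted to kernels that vary only in scale: partition the pixels into soft, overlapping segments on which the kernel is roughly constant, run one spatially invariant Gaussian per segment, and blend the results. The paper's method also handles locally tilted and rotated kernels; everything below is an illustrative simplification in Python.

import numpy as np
from scipy.ndimage import gaussian_filter

def spatially_varying_blur(img, sigma_map, levels=(1.0, 2.0, 4.0)):
    """Approximate a per-pixel-sigma blur using a few invariant blurs.

    img: (h, w) image; sigma_map: (h, w) desired blur scale per pixel.
    """
    out = np.zeros(img.shape, dtype=float)
    total = np.zeros(img.shape, dtype=float)
    for s in levels:
        # Soft membership of each pixel in the segment "sigma is about s".
        mask = np.exp(-0.5 * ((sigma_map - s) / 0.5) ** 2)
        out += mask * gaussian_filter(img, sigma=s)  # one invariant filter per segment
        total += mask
    return out / np.maximum(total, 1e-8)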
Adams, A. B., Talvala, E., Park, S. H., Jacobs, D. E., Ajdin, B., Gelfand, N., Dolson, J., Vaquero, D., Baek, J., Tico, M., Lensch, H. P. A., Matusik, W., Pulli, K., Horowitz, M. and Levoy, M.
The Frankencamera: an Experimental Platform for Computational Photography.
Proc. ACM SIGGRAPH 2010.
Abstract: Although there has been much interest in computational photography within the research and photography communities, progress has been hampered by the lack of a portable, programmable camera with sufficient image quality and computing power. To address this problem, we have designed and implemented an open architecture and API for such cameras: the Frankencamera. It consists of a base hardware specification, a software stack based on Linux, and an API for C++. Our architecture permits control and synchronization of the sensor and image processing pipeline at the microsecond time scale, as well as the ability to incorporate and synchronize external hardware like lenses and flashes. This paper specifies our architecture and API, and it describes two reference implementations we have built. Using these implementations we demonstrate six computational photography applications: HDR viewfinding and capture, low-light viewfinding and capture, automated acquisition of extended dynamic range panoramas, foveal imaging, IMU-based hand shake detection, and rephotography. Our goal is to standardize the architecture and distribute Frankencameras to researchers and students, as a step towards creating a community of photographer-programmers who develop algorithms, applications, and hardware for computational cameras.
Dolson, J., Baek, J., Plagemann, C. and Thrun, S.
Upsampling Range Data in Dynamic Environments.
Proc. IEEE CVPR 2010.
Abstract: We present a flexible method for fusing information from optical and range sensors based on an accelerated high-dimensional filtering approach. Our system takes as input a sequence of monocular camera images as well as a stream of sparse range measurements as obtained from a laser or other sensor system. In contrast with existing approaches, we do not assume that the depth and color data streams have the same data rates or that the observed scene is fully static. Our method produces a dense, high-resolution depth map of the scene, automatically generating confidence values for every interpolated depth point. We describe how to integrate priors on object shape, motion and appearance and how to achieve an efficient implementation using parallel processing hardware such as GPUs.
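The fusion step can be pictured as a joint (cross) bilateral interpolation of the sparse depth samples, guided by the high-resolution image: each missing depth is a weighted average of nearby samples, where the weights combine spatial distance with color similarity in the guide image. Below is a brute-force Python sketch of that baseline; the paper runs it efficiently via accelerated high-dimensional filtering, extends it across time, and adds shape, motion and appearance priors, none of which appear here.

import numpy as np

def upsample_depth(image, depth, valid, sigma_s=5.0, sigma_r=0.1, radius=8):
    """image: (h, w) grayscale guide; depth: (h, w) sparse depth samples;
    valid: (h, w) bool mask marking where depth was actually measured."""
    h, w = image.shape
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            ys, xs = np.mgrid[y0:y1, x0:x1]
            # Spatial closeness times guide-image similarity, on valid samples.
            wgt = valid[y0:y1, x0:x1] \
                * np.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma_s ** 2)) \
                * np.exp(-(image[y0:y1, x0:x1] - image[y, x]) ** 2 / (2 * sigma_r ** 2))
            out[y, x] = (wgt * depth[y0:y1, x0:x1]).sum() / max(wgt.sum(), 1e-8)
    return out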
Baek, J.
Transfer Efficiency and Depth Invariance in Computational Cameras.
Proc. IEEE ICCP 2010.
Abstract: Recent advances in computational cameras achieve extension of depth of field by modulating the aperture of an imaging system, either spatially or temporally. They are, however, accompanied by loss of image detail, the chief cause of which is low and/or depth-varying frequency response of such systems. In this paper, we examine the tradeoff between achieving depth invariance and maintaining high transfer efficiency by providing a mathematical framework for analyzing the transfer function of these computational cameras. Using this framework, we prove mathematical bounds on the efficacy of the tradeoff. These bounds lead to observations on the fundamental limitations of computational cameras. In particular, we show that some existing designs are already near-optimal in our metrics.
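For intuition, the framework revolves around a depth-indexed transfer function; the notation below is my hedged summary of the standard setup, not the paper's exact statement. Writing the camera's point spread function at depth d as k(x; d), its transfer function is

\hat{k}(\omega; d) = \int k(x; d)\, e^{-i \omega x}\, \mathrm{d}x.

Depth invariance asks that |\hat{k}(\omega; d)| vary little with d, while transfer efficiency asks that it stay large. Since the total response available to the system is bounded, flattening |\hat{k}| across a wide depth range forces it lower at each individual depth; the paper's bounds make precise how favorable this tradeoff can be.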
Adams, A. B., Baek, J. and Davis, M. A.
Fast High-Dimensional Filtering using the Permutohedral Lattice.
Proc. Eurographics 2010. Best Paper Runner-Up!
Abstract: Many useful algorithms for processing images and geometry fall under the general framework of high-dimensional Gaussian filtering. This family of algorithms includes bilateral filtering and non-local means. We propose a new way to perform such filters using the permutohedral lattice, which tessellates high-dimensional space with uniform simplices. Our algorithm is the first implementation of a high-dimensional Gaussian filter that is both linear in input size and polynomial in dimensionality. Furthermore it is parameter-free, apart from the filter size, and achieves a consistently high accuracy relative to ground truth (> 45 dB). We use this to demonstrate a number of interactive-rate applications of filters in as high as eight dimensions.
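For reference, the lattice itself admits a compact description (stated from memory; see the paper for the precise construction):

A_d^* = \{ \mathbf{x} \in \mathbb{Z}^{d+1} : \textstyle\sum_i x_i = 0,\ x_i \equiv x_j \ (\mathrm{mod}\ d+1) \text{ for all } i, j \}

that is, the integer points of the hyperplane whose coordinates sum to zero, with all coordinates congruent modulo d+1. Its cells are congruent simplices with d+1 vertices, so each input is splatted onto only d+1 lattice points, which is what keeps the filter linear in input size and polynomial in dimensionality.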
Dissertations
Baek, J.
WYSIWYG Computational Photography via Viewfinder Editing.
Doctor of Philosophy Thesis, Stanford University, 2013. PDF.
Abstract: The past decade witnessed a rise in the ubiquity and capability of digital photography, paced by the advances in embedded devices, image processing and social media. Along with it, the popularity of computational photography also grew. Many computational photography techniques work by first capturing a coded representation of the scene---a stack of photographs with different settings, an image obtained via a modified optical path, et cetera---and then computationally decoding it later as a post-process according to the user's specification. However, the coded representation, available to the user at the time of capture, is often not sufficiently indicative of the decoded output that will be produced later. Depending on the type of the computational photography technique involved, the coded representation may appear to be a distorted image, or may not even be an image at all. Consequently, these techniques discard one of the most significant attractions of digital photography: the what-you-see-is-what-you-get (WYSIWYG) experience. In response, this dissertation explores a new kind of interface for manipulating images in computational photography applications, called viewfinder editing. With viewfinder editing, the viewfinder more accurately reflects the final image the user intends to create, by allowing the user to alter the local or global appearance of the photograph via stroke-based input on a touch-enabled digital viewfinder, and propagating the edits spatiotemporally. Furthermore, the user specifies via the interface how the coded representation should be decoded in computational photography applications, guiding the acquisition and composition of photographs and giving immediate visual feedback to the user. Thus, the WYSIWYG aspect is reclaimed, enriching the user's photographing experience and helping him make artistic decisions before or during capture, instead of after capture. This dissertation realizes and presents a real-time implementation of viewfinder editing on a mobile platform, constituting the first of its kind. This implementation is enabled by a new spatiotemporal edit propagation method that meaningfully combines and improves existing algorithms, achieving an order-of-magnitude speed-up over existing methods. The new method trades away spatial locality for efficiency and robustness against camera or scene motion. Finally, several applications of the framework are demonstrated, such as high-dynamic-range (HDR) multi-exposure photography, focal stack composition, selective colorization, and general tonal editing. In particular, new camera control algorithms for stack metering and focusing are presented, which take advantage of the knowledge of the user's intent indicated via the viewfinder editing interface and optimize the camera parameters accordingly.
Baek, J.
Multi-channel coded-aperture photography.
Master of Engineering Thesis, MIT, 2008. PDF.
Abstract: This thesis describes multi-channel coded-aperture photography, a modified camera system that can extract an all-focus image of the scene along with a depth estimate over the scene. The modification consists of inserting a set of patterned color filters into the aperture of the camera lens. This work generalizes the previous research on a single-channel coded aperture, by deploying distinct filters in the three primary color channels, in order to cope better with the effect of a Bayer filter and to exploit the correlation among the channels. We derive the model and algorithms for the multi-channel coded aperture, comparing the simulated performance of the reconstruction algorithm against that of the original single-channel coded aperture. We also demonstrate a physical prototype, discussing the challenges arising from the use of multiple filters. We provide a comparison with the single-channel coded aperture in performance, and present results on several scenes of cluttered objects at various depths.
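The single-channel pipeline that this thesis generalizes can be sketched as follows: hypothesize a depth, deconvolve the image with the coded-aperture PSF for that depth, and keep the depth whose reconstruction best re-explains the observation; the multi-channel version repeats this per color channel and exploits the correlation among channels. The Python below is schematic, with illustrative PSF handling and scoring rather than the thesis's exact algorithm.

import numpy as np

def wiener_deconv(img, psf, snr=0.01):
    """Wiener deconvolution; psf is assumed zero-padded/centered for fft2."""
    H = np.fft.fft2(psf, s=img.shape)
    G = np.conj(H) / (np.abs(H) ** 2 + snr)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * G))

def estimate_depth(img, psfs_by_depth):
    """Return the index of the depth whose PSF best explains the image."""
    errors = []
    for psf in psfs_by_depth:
        latent = wiener_deconv(img, psf)
        # Re-blur the deconvolved estimate and measure the residual.
        H = np.fft.fft2(psf, s=img.shape)
        refit = np.real(np.fft.ifft2(np.fft.fft2(latent) * H))
        errors.append(np.mean((refit - img) ** 2))
    return int(np.argmin(errors))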
Education
Here is a list of my degrees; these institutions are also places I hold dear in my heart.
Teaching
I enjoy teaching very much, whether the topic is computer science or something else.
Winter, 2012: At Stanford, I taught CS478: Computational Photography for the winter quarter of 2012 as a teaching fellow, with Dave Jacobs, in Professor Marc Levoy's absence.
Winter, 2010: I was a teaching assistant for CS448A: Computational Photography, Winter 2010, under Marc Levoy, responsible for developing the assignments and advising student projects. This experience eventually helped me teach a future iteration of the course (CS478) as a teaching fellow.
Fall, 2009: I also served as a teaching assistant for CS148: Introduction to Computer Graphics and Imaging, Fall 2009, under Professor Pat Hanrahan.
Spring, 2008: In addition, I served as a graduate teaching assistant for 6.005: Elements of Software Construction in the spring term of 2008 at MIT (rated 6.8/7.0 overall by students in the HKN-run evaluation), under Daniel Jackson and Saman Amarasinghe.
I volunteered for the Educational Studies Program (ESP) at MIT in 2006 and 2008, teaching an 8-week summer course called Paradox, on paradoxes, logic and the philosophy of language, to local high schoolers. I also taught the same course at Stanford Splash in 2012.
Coursework
Spring, 2009: Check out Glacier Cave (non-Stanford link), our entry to the CS348B: Image Synthesis rendering competition, which won the grand prize. This was joint work with Dave Jacobs and Abe Davis. A 2048x2048 render is featured in the second edition of Physically Based Rendering.
Winter, 2008: Dave Jacobs and I worked on a project titled High dynamic range imaging in the presence of motion for CS223B: Introduction to Computer Vision; the project is currently shelved.
Fall, 2008: I created a game called Hazard for the CS248: Introduction to Computer Graphics video game competition, and it was a finalist. It is basically a wacky arcade spin on the well-known game Minesweeper.
Fall, 2007: There are two papers from 18.821: Project Laboratory in Mathematics, titled Points on conics modulo p and Finding geodesics on surfaces (joint work with Katherine Redfield and Anand Deopurkar). Interestingly, the latter manuscript has gathered some citations over the years.
Spring, 2007: I wrote an exposition titled Introduction to infinite Ramsey theory for 18.504: Seminar in Logic.
Summer, 2006: I was in an undergraduate research program with the Software Design Group at MIT CSAIL, under Professor Daniel Jackson, and helped with some visualization work on Alloy, a language for specifying and solving logical constraints. I am credited in the "About" page of the software.
Spring, 2006: A report on a new prototype of a vegetation clipper for demining, from SP.776: Design for Demining, is available here (joint work with Aaron Doody). There are also pictures from a field blast test in 2007.
Accolades
Minor things here and there.
- Stanford Graduate Fellowship (2010-2013)
- NSF Graduate Fellowship: Honorable Mention (2008)
- William Lowell Putnam Mathematics Competition: Honorable Mention (2006)
- USA Mathematical Olympiad: Winner (2004), Honorable Mention (2002, 2003)
- USA Computing Olympiad: Finalist (2002)
Stuff
My Erdős number is 4. (Paul Erdős > Endre Szemerédi > Leo Guibas > Natasha Gelfand > Jongmin Baek) My Bacon number is undefined, but if you would like to help me get one, let me know. :-)
I have served as a reviewer for the following venues:
- ACM SIGGRAPH (2009, 2010, 2012, 2014)
- ACM SIGGRAPH Asia (2010, 2011)
- Eurographics (2011, 2013)
- Pacific Graphics (2013)
- IEEE ICCV (2007)
- IEEE ICCP (2010)
- IEEE TIP (2013, 2015)
- OSA Optics Express (2010)
I have maintained some interest in the philosophy of language, the philosophy of mathematics, epistemology and logic. I have taken classes from Sally Haslanger, Agustin Rayo, Vann McGee and Richard Holton, and try to keep up by finding things to read.
I do not have much wisdom to share (yet). I can nonetheless offer advice on the courses that I took during my four years at MIT.
I serve as one of MIT's Educational Counselors.
In my spare time, I root for, though not with fervor, the Boston Red Sox, the San Jose Sharks and the San Francisco 49ers. Now that I am no longer an undergraduate, I find myself following the seasons somewhat more loosely than before.
I am a novice rock climber and soccer player. I also try to be a good photographer.