metrical tasks
-
Dynamic Data Layout Optimization with Worst-case GuaranteesKexin Rong, Paul Liu, Sarah Ashok Sonje, Moses Charikar
[pdf] [poster] [slides]
Many data analytics systems store and process large datasets in partitions containing millions of rows. By mapping rows to partitions in an optimized way, it is possible to improve query performance by skipping over large numbers of irrelevant partitions during query processing. This mapping is referred to as a data layout. Recent works have shown that customizing the data layout to the anticipated query workload greatly improves query performance, but the performance benefits may disappear if the workload changes. In this paper, we provide improved algorithms with theoretical guarantees for the layout optimization problem.