[ Software Research Lunch ]


Program for the winter quarter of 2023.


1/12: Organizational Lunch

Time: Thursday, January 12, 2023, 12 noon - 1pm
Location: Gates 415

Organizational lunch. Come sign up to give a talk during the quarter.


1/19: Quokka: Fast Cloud Analytics with Resilient Distributed Streams

Time: Thursday, January 19, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Tony Wang

Abstract: We present Quokka, a distributed, pipelined, fault-tolerant batch analytics system. In many Spark applications, mappers and reducers use different computational resources that can be efficiently overlapped. Spark’s execution model does not allow such overlap, leaving performance on the table. Quokka allows reducers to process batches of data as soon as they have been generated by the mappers to fully exploit such pipeline parallelism. To achieve efficient fault tolerance, Quokka exchanges traditional techniques like checkpointing and spooling for a novel strategy that combines upstream backup with dynamic lineage tracking. We show that Quokka is around 2x faster than SparkSQL on a benchmark suite of TPC-H queries, matching the performance of Trino, a specialized SQL engine, while maintaining Spark’s fault tolerance and generality.


1/26: GPU Collectives with MSCCL

Time: Thursday, January 26, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Saeed Maleki

Abstract: Collective communication primitives on GPUs are the primary bottleneck on large neural network models. Although there have been decades of research on optimizing computation kernels, there has been very little done for collective communication kernels on GPUs. There are many challenges in area including unique GPU interconnection topologies, high P2P transfer latency, wide range of use cases for neural networks, and software complexities. In this talk, I will present program synthesis as a primary solution for communication algorithms for these topologies and show how a bespoke algorithm can significantly improve the overall performance of a model. Lastly, I will present a high-level DSL along with a compiler for mapping from an abstract synthesized algorithm to a low-level CUDA code for collective communications.


2/2: Satisfiability Modulo Finite Fields (with applications to compilers to zero-knowledge proofs)

Time: Thursday, February 2, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Alex Ozdemir

Abstract: We study satisfiability modulo the theory of finite fields and give a decision procedure for this theory. We implement our procedure for prime fields inside the cvc5 SMT solver. Using this theory, we construct SMT queries that perform translation validation for various compilers to zero knowledge proofs. Our experiments show that our field solver is vastly superior to previous approaches (which encode field arithmetic using integers or bit-vectors).


2/9: Smoothness Analysis for Probabilistic Programs with Application to Optimised Variational Inference

Time: Thursday, February 9, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Wonyeol Lee

Abstract: We present a static analysis for discovering differentiable or more generally smooth parts of a given probabilistic program, and show how the analysis can be used to improve the pathwise gradient estimator, one of the most popular methods for posterior inference and model learning. Our improvement increases the scope of the estimator from differentiable models to non-differentiable ones without requiring manual intervention of the user; the improved estimator automatically identifies differentiable parts of a given probabilistic program using our static analysis, and applies the pathwise gradient estimator to the identified parts while using a more general but less efficient estimator, called score estimator, for the rest of the program. Our analysis has a surprisingly subtle soundness argument, partly due to the misbehaviours of some target smoothness properties when viewed from the perspective of program analysis designers. For instance, some smoothness properties, such as partial differentiability and partial continuity, are not preserved by function composition, and this makes it difficult to analyse sequential composition soundly without heavily sacrificing precision. We formulate five assumptions on a target smoothness property, prove the soundness of our analysis under those assumptions, and show that our leading examples satisfy these assumptions. We also show that by using information from our analysis instantiated for differentiability, our improved gradient estimator satisfies an important differentiability requirement and thus computes the correct estimate on average (i.e., returns an unbiased estimate) under a regularity condition. Our experiments with representative probabilistic programs in the Pyro language show that our static analysis is capable of identifying smooth parts of those programs accurately, and making our improved pathwise gradient estimator exploit all the opportunities for high performance in those programs.


2/16: Recursive Program Synthesis Using Paramorphisms

Time: Thursday, February 16, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Qiantan Hong

Abstract: We show that synthesizing recursive functional programs using a class of primitive recursive combinators is both simpler and solves more benchmarks from the literature than previously proposed approaches. Our method synthesizes paramorphisms, a class of programs that includes the most common recursive programming patterns on algebraic data types. The crux of our approach is to split the synthesis problem into two parts: a multi-hole skeleton that fixes the recursive structure, and a search for non-recursive program fragments to fill the skeleton holes.


2/23: Program Synthesis for All: A Synergistic Perspective

Time: Thursday, February 23, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Yanju Chen

Abstract: Program synthesis aims to find a program that satisfies user-provided specification. It has gained enormous popularity in automating problem solving: from programming tasks and code repair, to theorem proving and superoptimization. Most modern synthesizers perform some form of backtracking search to find a satisfiable program, with additional logical and/or statistical reasoning to make this approach practical, as the size of the search space grows. However, oftentimes the two modes of reasoning are not combined tightly — feedback from one mode at synthesis time is not fully leveraged to improve the other. In this talk, I will present our recent research towards bridging the gap between the two modes of reasoning via a synergistic bond. The first approach, Concord (CAV’20), connects machine learning with logical reasoning, by formalizing program synthesis as an instance of reinforcement learning and updating its policy based on logical feedback. The second approach, Poe (PLDI’22), connects logical reasoning with machine learning by synthesizing explanations for model predictions and rectifying noisy model outputs based on consistency alignment. The core synergy highlights the huge potential for securing software robustness using program synthesis.


3/2: Verifying Deep Reinforcement Learning Systems

Time: Thursday, March 2, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Guy Amir

Abstract: Deep neural networks (DNNs) have gained significant popularity in recent years, becoming the state of the art in a variety of domains. In particular, deep reinforcement learning (DRL) has recently been employed to train DNNs that realize control policies for various types of real-world systems. In this work, we present recent advances made for formally verifying complex properties of DRL systems, both from the theoretical perspective, as well as the applicability of our approach to real-world robotic navigation platforms. The talk will be mostly based on two papers: Towards Scalable Verification of Deep Reinforcement Learning (FMCAD'2021), Verifying Learning-Based Robotic Navigation Systems (TACAS'2023)


3/9: Bridging Control-Centric and Data-Centric Optimization

Time: Thursday, March 9, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Tal Ben-Nun

Abstract: Code optimization has shifted its focus towards promoting data locality. Most production-grade compilers adopt a control-centric mindset - instruction-driven optimization augmented with scalar-based dataflow - whereas other approaches provide domain-specific and general-purpose data movement minimization, which can miss important control-flow optimizations. As the two representations are not commutable, users must choose one over the other. In this talk, we will explore both control- and data-centric approaches, and how they can work in tandem. We consider the Multi-Level Intermediate Representation (MLIR) compiler framework and how to adapt it in order to enable both optimization tactics. Lastly, we discuss the opportunities of using those optimizations for local and global program transformation.


3/16: Exhaustive Testing

Time: Thursday, March 16, 2023, 12 noon - 1pm
Location: Gates 415

Speaker: Matthew Sotoudeh

Abstract: I will give a technique for proving that you've 'tested enough', i.e., any bugs in the program would have been exposed by your test cases. Analysis of these results gives a new completeness threshold for a class of acyclic heap-manipulating C programs. This is a second attempt at my talk from last quarter, with a refined proof technique and a new application to proving correctness of a class of compiler optimizations. If time permits, I will also describe my work on a source-level partial execution engine for C that significantly reduces the effort needed to reverse-engineer embedded device initialization code.