Fall 2020 · Stanford Software Research Lunch

Program for the fall quarter of 2020.

9/17: Organizational Lunch

Time: Thursday, September 17, 2020, 12 noon - 1pm
Location: Zoom

Organizational lunch. Come sign up to give a talk during the quarter.

9/24: A Theoretical Framework for Symbolic Quick Error Detection

Time: Thursday, September 24, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Florian Lonsing

Abstract: Symbolic quick error detection (SQED) is a formal pre-silicon verification technique targeted at processor designs. It leverages bounded model checking (BMC) to check a design for counterexamples to a self-consistency property: given the instruction set architecture (ISA) of the design, executing an instruction sequence twice on the same inputs must always produce the same outputs. Self-consistency is a universal, implementation-independent property. Consequently, in contrast to traditional verification approaches that use implementation-specific assertions (often generated manually), SQED does not require a full formal design specification or manually-written properties. Case studies have shown that SQED is effective for commercial designs and that SQED substantially improves design productivity. However, until now there has been no formal characterization of its bug-finding capabilities. We aim to close this gap by laying a formal foundation for SQED. We use a transition-system processor model and define the notion of a bug using an abstract specification relation. We prove the soundness of SQED, i.e., that any bug reported by SQED is in fact a real bug in the processor. Importantly, this result holds regardless of what the actual specification relation is. We next describe conditions under which SQED is complete, that is, what kinds of bugs it is guaranteed to find. We show that for a large class of bugs, SQED can always find a trace exhibiting the bug. Ultimately, we prove full completeness of a variant of SQED that uses specialized state reset instructions. Our results enable a rigorous understanding of SQED and its bug-finding capabilities and give insights on how to optimize implementations of SQED in practice.

10/1: Automating System Configuration

Time: Thursday, October 1, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Nestan Tsiskaridze

Abstract: This talk will describe our recent work on employing automated reasoning for solving the system configuration problem. I will present our new framework, which automatically finds correct configurations of a system for desired applications. The framework leverages advances in Model Checking and Satisfiability Modulo Theories (SMT) techniques and utilizes the state-of-the-art, award-winning, tools Pono and Boolector developed in our lab. The framework will be showcased on an ongoing Lake project at the AHA! Agile Hardware Center at Stanford for CGRA memory tile design. In agile design, rapid prototyping and optimization of the hardware require preserving correct execution of applications as the hardware changes. In a traditional setting, a compiler frontend needs to be updated to understand new hardware targets and configure them appropriately. Due to the scale and complexity of designs, in conjunction with decoupled workflows, this is not possible in a timely manner with manual effort, and a gap appears between the compiler and the hardware. To bridge the gap, we depend on automation. I will illustrate how we configure our latest CGRA memory tile designs at AHA! for image processing applications. Through the talk, I will share a story of a synergy between two communities, those of formal methods and system design, and how such a synergy leads us to a more efficient, automation-friendly, and automation-guided, hardware design. Finally, I will highlight some challenges and problems we encountered along this path.

10/8: Plato: An SoC Assembly Tool Using MLIR

Time: Thursday, October 8, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Amalee Wilson

Abstract: I will describe my work during my internship at SiFive over the summer. The goal of the project is to create a more modern, scalable, and modular SoC assembly tool, and my work prototypes a data model which serves as the single source of truth for the SoC design. The data model is an MLIR dialect (Multi-Level Intermediate Representation, https://mlir.llvm.org/), and uses the RTL dialect and Verilog emission of CIRCT (Circuit IR Compilers and Tools, https://github.com/llvm/circt), an open source effort for using MLIR to represent hardware. Both MLIR and CIRCT will be introduced in more detail during the talk. Currently the data model / MLIR dialect can represent many of the important pieces of an SoC design at the IP composition level, and some support exists for Verilog emission through CIRCT. Two main goals for the future of the data model are as follows: to effectively represent generators in such a way that the assembly tool can interact with generators and update the data model, and to capture enough information about the SoC to generate Verilog, software components, verification collateral, and documentation. In this talk I will discuss the motivation for this single-source-of-truth data model, what the current project is capable of, and other efforts to use MLIR for hardware.

10/15: Circify: Compiling programs to circuits for cryptographic proofs, symbolic execution, and maybe more.

Time: Thursday, October 15, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Alex Ozdemir

Abstract: We present an infrastructure for compiling from programming languages with stateful semantics (i.e. mutable variables and control flow) to existentially quantified circuits (i.e. non-deterministic, stateless computations). This infrastructure addresses the common compilation needs of seemingly disparate applications: symbolic execution, cryptographic proof systems, and multiparty computation. Each of these applications stands to benefit from common transformations and optimization in our infrastructure. Each of the these applications can be supported by our infrastructure through small compiler extensions. Furthermore, mixing and matching extensions enables interesting new applications and techniques. This talk concerns a project in progress. I'll lay out current progress, future plans, and would very much appreciate suggestions and ideas.

10/22: Techniques for Nonlinear Real Arithmetic

Time: Thursday, October 22, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Gereon Kremer

Abstract: Satisfiability Modulo Theories (SMT) has proven to be a powerful way to formulate and solve problems from many applications. One of the most challenging theories considered in current research is nonlinear real arithmetic (NRA): inequalities over real polynomials. This talk surveys techniques to solve nonlinear NRA problems that are used in modern SMT solvers. We start with heuristic approaches (linearization and interval constraint propagation) that aim to shrink the search space until it either becomes empty or is close enough to the actual solution space that an arbitrary guess yields a solution. These methods can be seen as approximation schemes and are usually very efficient, but also vulnerable to non-termination. We then continue with incomplete methods that are algebraically motivated (e.g. Gröbner bases and virtual substitution). Though they do not rely on approximation, they may not be able to proceed at some point and thus yield inconclusive results. In the third part, we discuss a complete decision procedure (e.g. cylindrical algebraic decomposition), which however suffers from bad worst-case runtime complexity. Finally, we mention some further techniques for NRA problems as well as extensions to NRA itself.

10/29: Towards Verified Stochastic Variational Inferece for Probabilistic Programs

Time: Thursday, October 29, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Wonyeol Lee

Abstract: Probabilistic programming is the idea of writing models from statistics and machine learning using program notations and reasoning about these models using generic inference engines. Recently its combination with deep learning has been explored intensely, leading to the development of deep probabilistic programming languages such as Pyro and Edward. At the core of this development lie inference engines based on stochastic variational inference algorithms, which convert a posterior-inference query into an optimisation problem and solve it approximately by gradient ascent.
In this paper, we analyse one of the most fundamental and versatile variational inference algorithms, called score estimator or REINFORCE, using tools from denotational semantics and program analysis. We formally express what this algorithm does on models denoted by programs, and expose implicit assumptions made by the algorithm. The violation of these assumptions may lead to an undefined optimisation objective or the loss of convergence guarantee of the optimisation process. We then describe rules for proving these assumptions, which can be automated by static program analyses. Following our general methodology, we have developed a static program analysis for Pyro that aims at discharging the assumption about what we call model-guide support match. Applied to the eight representative model-guide pairs from the Pyro webpage, our analysis finds a bug in one of these cases and shows that the assumptions are met in other six cases.

11/5: Evaluating Robustness to Context-Sensitive Feature Perturbations of Different Granularities

Time: Thursday, November 5, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Hadrien Pouget

Abstract: We cannot guarantee that training datasets are representative of the distribution of inputs that will be encountered during deployment. So we must have confidence that our models do not over-rely on this assumption. To this end, we introduce a new method that identifies context-sensitive feature perturbations (e.g.~shape, location, texture, colour) to the inputs of image classifiers. We produce these changes by performing small adjustments to the activation values of different layers of a trained generative neural network. Perturbing at layers earlier in the generator causes changes to coarser-grained features; perturbations further on cause finer-grained changes. Unsurprisingly, we find that state-of-the-art classifiers are not robust to any such changes. More surprisingly, when it comes to coarse-grained feature changes, we find that adversarial training against pixel-space perturbations is not just unhelpful: it is counterproductive.

Food: Doordash,

11/12: Counterexample-Guided Prophecy for Model Checking Modulo the Theory of Arrays

Time: Thursday, November 12, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Makai Mann

Abstract: Existing literature and “folklore” knowledge has established that prophecy variables can sometimes play the same role as universally quantified variables, making it possible to transform a system that would require quantified reasoning into one that does not. Until now, however, there has been no automatic method for applying this transformation. In this paper, we introduce a technique we call counterexample- guided prophecy. The idea is that during the refinement step of an abstraction-refinement loop, it may be possible to automatically introduce prophecy variables that aid the refinement step and may also reduce the need for quantified reasoning. We develop this idea in the context of model checking for infinite-state systems over arrays, which often requires quantified reasoning using existing approaches. We show how a standard abstraction for arrays can be augmented with counterexample-guided prophecy to obtain an algorithm that can leverage off-the-shelf solvers (which need only support quantifier-free, array-free reasoning) to solve challenging model checking problems over arrays.

Food: Doordash,

11/19: On the semantic expressiveness of recursive types

Time: Thursday, November 19, 2020, 12 noon - 1pm
Location: Zoom

Speaker: Marco Patrignani

Abstract: Recursive types extend the simply-typed lambda calculus (STLC) with the additional expressive power to enable diverging computation and to encode recursive data-types (e.g., lists). Two formulations of recursive types exist: iso-recursive and equi-recursive. The relative advantages of iso- and equi-recursion are well-studied when it comes to their impact on type-inference. However, the relative semantic expressiveness of the two formulations remains unclear so far.
This paper studies the semantic expressiveness of STLC with iso- and equi-recursive types, proving that these formulations are mph{equally expressive}. In fact, we prove that they are both as expressive as STLC with only term-level recursion. We phrase these equi-expressiveness results in terms of full abstraction of three canonical compilers between these three languages (STLC with iso-, with equi-recursive types and with term-level recursion). Our choice of languages allows us to study expressiveness when interacting over both a simply-typed and a recursively-typed interface. The three proofs all rely on a typed version of a proof technique called approximate backtranslation.
Together, our results show that there is no difference in semantic expressiveness between STLCs with iso- and equi-recursive types. In this paper, we focus on a simply-typed setting but we believe our results scale to more powerful type systems like System~F.

Food: Doordash,