Winter 2017 · Stanford Software Research Lunch

Program for the winter quarter of 2017.

1/13: Mathematical Execution

Time: Friday, January 13, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Zhoulai Fu

Abstract: I will present Mathematical Execution (ME), a new, unconventional method for reasoning about numerical code. The idea is to reduce the problem of testing/verifying a program to the problem of minimizing a derived representing function. ME is particularly efficient for numerical code; it directs input space exploration by only executing the representing function, which avoids static or symbolic reasoning about the program semantics. We have applied ME to four instances: (1) satisfiability solving, (2) boundary value analysis, (3) coverage-based testing, and (4) path reachability. Our results are promising. On (1), for example, evaluated on floating-point constraints from SMT-Competition benchmarks, ME provides an average speedup of more than 700X over MathSat and 800X over Z3.
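
The reduction described in the abstract can be illustrated with a toy sketch. It assumes a hypothetical constraint (x*x <= 4.0 and x >= 1.5), a hand-written representing function, and a simple pattern search as the minimizer; none of these are taken from the talk, and the actual ME construction and optimizer may differ.

```python
def representing_function(x: float) -> float:
    """Non-negative penalty; it is 0.0 when x satisfies the (hypothetical)
    constraint x*x <= 4.0 and x >= 1.5, and positive otherwise."""
    upper = max(x * x - 4.0, 0.0)   # violation of x*x <= 4.0
    lower = max(1.5 - x, 0.0)       # violation of x >= 1.5
    return upper * upper + lower * lower


def minimize(f, x0: float, step: float = 1.0, min_step: float = 1e-12):
    """Pattern search (an assumed stand-in optimizer): move to a neighbor
    if it lowers f, otherwise halve the step. Only f is ever executed."""
    x, fx = x0, f(x0)
    while fx > 0.0 and step > min_step:
        for candidate in (x - step, x + step):
            fc = f(candidate)
            if fc < fx:
                x, fx = candidate, fc
                break
        else:
            step /= 2.0
    return x, fx


solution, residual = minimize(representing_function, x0=10.0)
print(solution, residual)
```

A residual of zero means the returned input satisfies the constraint; a positive residual only means this particular search did not find a solution, not that none exists.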

Food: Stefan

1/20: SEAM: A Language for Local Mutations of Graph-like Data Structures

Time: Friday, January 20, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Manolis Papadakis

Abstract: In this talk I will present SEAM, a domain-specific language that allows programmers to describe collections of complex, interconnected data structures (e.g. unstructured meshes) over a relational data model, and program local modification operations over them. Our goal with SEAM is to make such programs easy to write and verify, while offering transparent performance and remaining competitive with manual implementations. I will describe how we've designed the language in pursuit of these goals, and our current status in its implementation.

Food: Omid

1/27: ACIDRain: Concurrency-Related Attacks on Database-Backed Web Applications

Time: Friday, January 27, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Todd Warszawski

Abstract: In theory, database transactions protect application data from corruption and integrity violations. In practice, database transactions frequently execute under weak isolation that exposes programs to a range of concurrency anomalies, and programmers may fail to correctly employ transactions. While low transaction volumes mask many potential concurrency-related errors under normal operation, determined adversaries can exploit them programmatically for fun and profit. In this work, we formalize a new kind of attack on database-backed applications called an ACIDRain attack, in which an adversary systematically exploits concurrency-related vulnerabilities via programmatically accessible APIs. To proactively detect the potential for ACIDRain attacks, we extend the theory of weak isolation to analyze latent potential for non-serializable behavior under concurrent web API calls. We introduce a language-agnostic method for detecting potential isolation anomalies in web applications, called Abstract Anomaly Detection (2AD), that uses dynamic traces of database accesses to efficiently reason about the space of possible concurrent interleavings. We apply a prototype 2AD analysis tool to 12 popular self-hosted eCommerce applications written in four languages and with a total deployment base of over 2M websites. We identify and verify 22 critical ACIDRain attacks that allow attackers to corrupt store inventory, over-spend gift cards, and steal inventory.
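
The kind of anomaly behind a gift-card over-spend can be sketched as follows. This is an illustrative simulation only: the in-memory `store` dictionary, the `checkout_steps` generator, and the fixed interleaving are assumptions made for the example, not the applications or the 2AD tool from the talk.

```python
def checkout_steps(store: dict, price: int):
    """One hypothetical checkout request, paused between its read and its
    write so that a second request can be interleaved in the gap."""
    seen = store["balance"]              # read the gift-card balance
    yield                                # another request may run here
    if seen >= price:
        store["balance"] = seen - price  # write based on a possibly stale read
        store["spent"] += price
    yield


def run_interleaved() -> dict:
    store = {"balance": 100, "spent": 0}
    a = checkout_steps(store, 100)
    b = checkout_steps(store, 100)
    next(a)  # request A reads balance 100
    next(b)  # request B reads balance 100 before A has written
    next(a)  # A writes balance 0
    next(b)  # B also writes balance 0, using its stale read
    return store


result = run_interleaved()
print(result)
```

Both requests pass the balance check, so 200 is spent from a card that held 100. Running the two requests one after the other (a serializable schedule) would reject the second.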

Food: Pratiksha

2/3: Scalable Global Static Analysis, Automation, and Secrecy

Time: Friday, February 3, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Kwangkeun Yi

Abstract: I will talk about three techniques towards our goal of making scalable, semantic-based global static analysis easily available to non-expert software developers.
Though static analysis is widely deployed in practice (verification, bug-finding, maintenance, optimizations, etc.), it is still of limited use. Developing an impactful static analyzer is difficult. Depending on its deployment model, every static analysis needs to strike a different balance between its soundness, scalability, and precision.
Our position is that sound and scalable analyzers whose precision is open as a parameter can be automatically available at least for C-like languages. I will first present our general sparse analysis framework to achieve sound, scalable, semantic-based global analysis (to globally analyze million-line C programs in about 10 hours). Given a static analysis definition as a fixpoint computation of an approximate semantics of the input program, the sparse framework shows how to make it scalable without compromising the analysis precision. Then I will introduce our ZooBerry system to automatically implement these sparse techniques inside static analyzers. From a high-level approximate semantics definition of a C-like language and its soundness Coq proof, ZooBerry automatically generates a sparse static analyzer and its verified validator. Lastly, I will discuss static analysis of encrypted programs, to help software developers enjoy static analysis services in the cloud.

Food: Andres

2/10: Scalable Safety and Reliability Analyses via Symbolic Model Checking

Time: Friday, February 10, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Cristian Mattarei

Abstract: Assuring safety and reliability is fundamental when developing a safety-critical system. Road, naval and avionic transportation; water and gas distribution; nuclear, wind, and photovoltaic energy production are only some examples where it is mandatory to guarantee those properties. The continuous increase in the design complexity of safety-critical systems calls for a never-ending search for new and more advanced analytical techniques. In fact, they are required to assure that undesired consequences are highly improbable.
In this talk I present a novel methodology for automating the safety and reliability analyses of critical systems. The proposed approach integrates a series of techniques, based on symbolic model checking, into the current development process of safety-critical systems. More specifically, the proposed techniques improve the process by covering three main aspects. First, we have provided a significant improvement in the performance of the back-end engines, by reducing the problem to parametric model checking. Second, we have defined the first fully automated technique, based on the aforementioned technique, for the generation of hierarchical fault trees, a widely used artifact in safety engineering. Third, we have introduced a novel and very efficient technique for the analysis of safety-critical redundant architectures.
The presentation will thereafter concentrate on the application of the proposed methodology and resulting techniques to a series of real-world case studies, developed in collaboration with NASA and the Boeing Company.

Food: Lázaro

2/17: Context-Sensitive Data-Dependence Analysis via Linear Conjunctive Language Reachability

Time: Friday, February 17, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Qirun Zhang

Abstract: Many program analysis problems can be formulated as graph reachability problems. In the literature, context-free language (CFL) reachability has been the most popular formulation and can be computed in subcubic time. The context-sensitive data-dependence analysis is a fundamental abstraction that can express a broad range of program analysis problems. It essentially describes an interleaved matched-parenthesis language reachability problem. The language is not context-free, and the problem is well-known to be undecidable. In practice, many program analyses adopt CFL-reachability to exactly model the matched parentheses for either context-sensitivity or structure-transmitted data-dependence, but not both. Thus, the CFL-reachability formulation for context-sensitive data-dependence analysis is inherently an approximation.
In this talk, I will introduce linear conjunctive language (LCL) reachability, a new, expressive class of graph reachability. Given a graph with n nodes and m edges, we propose an O(mn) time approximation algorithm for solving all-pairs LCL-reachability, which is asymptotically better than known CFL-reachability algorithms. We have applied the LCL-reachability framework to two existing client analyses. The experimental results show that the LCL-reachability framework is both more precise and more scalable than the traditional CFL-reachability framework.
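
For readers unfamiliar with matched-parenthesis reachability, the following is a minimal sketch of the traditional CFL (Dyck) formulation that the abstract takes as its baseline, using a naive fixpoint. It is not the LCL algorithm from the talk; the graph, the edge encoding, and the function name `dyck_reachable` are assumptions made for illustration.

```python
from itertools import product


def dyck_reachable(nodes, open_edges, close_edges):
    """Return all pairs (u, w) joined by a path whose labels form a balanced
    parenthesis string. Edges are (src, label, dst) triples; an open edge
    labeled i only matches a close edge labeled i."""
    reach = {(v, v) for v in nodes}            # empty string: S -> epsilon
    changed = True
    while changed:
        changed = False
        new_pairs = set()
        # S -> (_i S )_i : wrap an already balanced path in a matching pair
        for (u, i, a), (b, j, w) in product(open_edges, close_edges):
            if i == j and (a, b) in reach:
                new_pairs.add((u, w))
        # S -> S S : concatenate two balanced paths
        for (u, v), (x, w) in product(reach, reach):
            if v == x:
                new_pairs.add((u, w))
        if not new_pairs <= reach:
            reach |= new_pairs
            changed = True
    return reach


# Hypothetical graph: 0 -(1-> 1 -(2-> 2 -)2-> 3 -)1-> 4, plus a mismatched 2 -)1-> 5
nodes = range(6)
open_edges = {(0, 1, 1), (1, 2, 2)}
close_edges = {(2, 2, 3), (3, 1, 4), (2, 1, 5)}
pairs = dyck_reachable(nodes, open_edges, close_edges)
print((0, 4) in pairs, (1, 5) in pairs)
```

Node 4 is reachable from node 0 because the path spells a balanced string, while node 5 is not reachable from node 1 because the closing label does not match the opening one. This naive loop is far from the best known running times and is meant only to make the formulation concrete.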

Food: Wonchan

2/24: Sound Loop Superoptimization for Google Native Client

Time: Friday, February 24, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Berkeley Churchill

Abstract: Typical optimizing compilers perform a fixed series of optimizations on an input to generate optimized assembly code. In contrast, superoptimizers search through a space of program optimizations to generate optimal code, and previous work shows that this approach can offer significant performance improvements for straight-line programs. In this presentation we generalize superoptimization to loops. We apply our techniques to Google Native Client, a Software Fault Isolation system that ships inside the Google Chrome web browser and allows web developers to run native code inside the browser within a secure sandbox.
Key to our results are new techniques for superoptimization of loops: we propose a new architecture for superoptimization tools that incorporates both a fully sound verification technique to ensure correctness and a bounded verification technique to guide the search to optimized code. In our evaluation we optimize 13 libc string functions, formally verify the correctness of the optimizations and report a median and average speedup of 25% over the libraries shipped by Google.

Food: Wonyeol

3/3: Reluplex: An Efficient SMT Solver for Verifying Deep Neural Networks

Time: Friday, March 3, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Guy Katz

Abstract: Deep neural networks have emerged as a widely used and effective means for tackling complex, real-world problems. However, a major obstacle in applying them to safety-critical systems is the great difficulty in providing formal guarantees about their behavior. We present a novel, scalable, and efficient technique for verifying properties of deep neural networks (or providing counter-examples). The technique can also be used to measure a network's robustness to adversarial inputs. Our approach is based on the simplex method, extended to handle the non-convex Rectified Linear Unit (ReLU) activation function, which is a crucial ingredient in many modern neural networks. The verification procedure tackles neural networks as a whole, without making any simplifying assumptions. We evaluated our technique on a prototype deep neural network implementation of the next-generation Airborne Collision Avoidance System for unmanned aircraft (ACAS Xu). Results show that our technique can successfully prove properties of networks that are an order of magnitude larger than the largest networks verified using existing methods.
Based on joint work with Clark Barrett, David Dill, Kyle Julian and Mykel Kochenderfer. https://arxiv.org/abs/1702.01135

Food: Todd

3/17: Data pipeline programming for IoT Applications with Ravel

Time: Friday, March 17, 2017, 12 noon - 1pm
Location: Gates 463a

Speaker: Giovanni Campagna

Abstract: Most IoT applications are written using the eMbedded-Gateway-Cloud architecture. A unique challenge in writing IoT applications this way is the need for an end-to-end design that considers all the different issues of embedded, gateway (phone) and cloud. These involve very different platforms and languages, different performance requirements, and different expertise on the part of the programmers. We propose Ravel as a domain-specific language for the data pipeline of IoT applications that abstracts away the differences between these platforms in a single unified Model-View-Controller paradigm. In Ravel, the programmer focuses on the data computation happening on each node, and the system automatically generates storage, network, and encryption code.
While the evaluation is not done yet, we plan to evaluate the Ravel language with a case study of 3 different IoT applications from the literature, as well as benchmarks that would show the impact of using Ravel on embedded devices (where performance is most critical). At this stage, we welcome feedback on the language and on the evaluation plan.

Food: Manolis