Fall 2016 · Stanford Software Research Lunch

Program for the fall quarter of 2016.

9/30: Organizational Lunch

Time: Friday, September 30, 2016, 12 noon - 1pm
Location: Gates 463a

Organizational lunch. Come sign up to give a talk during the quarter.

Food: Stefan

10/7: Proving that Programs do not Discriminate

Time: Friday, October 7, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Aws Albarghouthi

Abstract: Programs have become powerful arbitrators of a range of significant decisions with far-reaching societal impact -- hiring, welfare allocation, prison sentencing, policing, amongst an ever-growing list. In such scenarios, the program is carrying out a sensitive task, and could potentially be illegally discriminating -- advertently or inadvertently -- against a protected group, e.g., African Americans in the United States.
With the range and sensitivity of algorithmic decisions expanding by the day, the question of whether an algorithm is fair (unbiased) has captured the attention of a broad spectrum of experts, from law scholars to computer science theorists. Ultimately, algorithmic fairness is a question about programs and their properties: Does a program discriminate against a subset of the population? In this talk, I will view algorithmic fairness through the lens of program verification. Specifically, I will begin by formalizing the notion of fairness as a probabilistic property of programs. To enable automated verification of fairness, I will show how to reduce the probabilistic verification question to that of volume computation over first-order formulas, and describe a new symbolic volume computation algorithm. Finally, I will present results of applying FairSquare -- the first fairness verification tool -- to a variety of decision-making programs.
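
As a rough illustration of what a probabilistic fairness property can look like, the sketch below estimates the ratio P(hire | minority) / P(hire | non-minority) for a toy decision program by sampling. The population model, the decision rule, the threshold epsilon, and all function names are hypothetical, and the ratio-based property is one common formulation assumed here rather than taken from the talk. Sampling only estimates the probabilities; it is not the symbolic volume computation that the abstract describes.

```python
import random

def sample_person(rng):
    """Draw one applicant from a hypothetical population model."""
    minority = rng.random() < 0.3
    # Assumption: experience is distributed identically in both groups.
    experience = rng.gauss(10.0, 3.0)
    return minority, experience

def decide(minority, experience):
    """Toy hiring program under test; it ignores the protected attribute."""
    return experience > 9.0

def fairness_ratio(n_samples=100_000, seed=0):
    """Estimate P(hire | minority) / P(hire | non-minority) by sampling."""
    rng = random.Random(seed)
    hired = {True: 0, False: 0}
    total = {True: 0, False: 0}
    for _ in range(n_samples):
        minority, experience = sample_person(rng)
        total[minority] += 1
        hired[minority] += decide(minority, experience)
    return (hired[True] / total[True]) / (hired[False] / total[False])

def is_fair(ratio, epsilon=0.1):
    """Check the assumed property: the ratio must exceed 1 - epsilon."""
    return ratio > 1.0 - epsilon

ratio = fairness_ratio()
print(f"estimated ratio: {ratio:.3f}, fair: {is_fair(ratio)}")
```

Because the toy program never reads the protected attribute and both groups share one experience distribution, the estimated ratio should land close to 1.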

Food: Berkeley

10/14: Marrying Generational GC and Region Techniques for High-Throughput, Low-Latency Big Data Memory Management

Time: Friday, October 14, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Harry Xu

Abstract: Most “Big Data” systems are written in managed languages such as Java, C#, or Scala. These systems suffer from severe memory problems due to massive volumes of objects created to process input data. Allocating and deallocating a sea of data objects puts a severe strain on existing garbage collectors (GC), leading to high memory management overhead and reduced performance. We have developed a series of techniques at UC Irvine to tackle this problem. In this talk, I will first talk about Facade (ASPLOS'15), a compiler and runtime system that can statically bound the number of data objects created in the heap. Next, I will talk about our recent work on Yak (OSDI'16), a new hybrid garbage collector that splits the managed heap into a control and a data space, and uses a generational GC and a region-based technique to manage them, respectively.

Food: Lázaro

10/21: Regent: A High-Productivity Programming Language for Implicit Parallelism with Logical Regions

Time: Friday, October 21, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Elliott Slaughter

Abstract: Parallel (and distributed) programming is required for performance on a variety of machines. Existing parallel programming models select trade-offs that enable performance and scalability but impose responsibilities on the user which make programming difficult and error-prone, in a manner analogous to manual memory management. Implicit parallelism removes many of these burdens in a manner analogous to an automatic garbage collector. Some success has been demonstrated for implicitly parallel programming models on structured codes with regular array accesses. However, no implementation of implicit parallelism has been able to successfully generate efficient codes for unstructured applications while maintaining performance equivalent to the best hand-tuned alternatives for parallel, distributed-memory machines.
Regent is a programming language for implicit parallelism which generates efficient code for structured and unstructured applications on parallel, distributed machines. By leveraging a carefully designed programming model with first-class support for partitioning computations (via tasks: functions eligible to be parallelized) and data (via logical regions: collections of data elements), an optimizing compiler for Regent is able to generate efficient code which matches the performance of the best hand-tuned codes for parallel, distributed-memory machines.

Food: Wonchan

10/28: Safety Verification of Deep Neural Networks

Time: Friday, October 28, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Marta Kwiatkowska

Abstract: Deep neural networks have achieved impressive experimental results in image classification, but can surprisingly be unstable with respect to adversarial perturbations, that is, minimal changes to the input image that cause the network to misclassify it. With potential applications including perception modules and end-to-end controllers for self-driving cars, this raises concerns about their safety. We develop the first SMT-based automated verification framework for feed-forward multi-layer neural networks that works directly with the code of the network, exploring it layer by layer. We define safety for a region around a data point in a given layer by requiring that all points in the region are assigned the same class label. Working with a notion of a manipulation, a mapping between points that intuitively corresponds to a modification of an image, we employ discretisation to enable exhaustive search of the region. Our method can guarantee that adversarial examples, if they exist, are found for the given region and set of manipulations. If found, adversarial examples can be shown to human testers and/or used to fine-tune the network, and otherwise the network is declared safe for the given parameters. We implement the techniques using Z3 and evaluate them on state-of-the-art networks, including regularised and deep learning networks.
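
The region-based safety definition can be illustrated with a toy sketch: discretise a small box around an input and check exhaustively that every grid point receives the same label as the input. The linear classifier, the grid step, and the function names are hypothetical stand-ins. The framework in the abstract works layer by layer with an SMT solver, which this sketch does not attempt, and the guarantee here holds only for the grid points that are actually visited.

```python
from itertools import product

def classify(point):
    """Hypothetical stand-in for a network: a fixed linear classifier in 2-D."""
    x, y = point
    return 1 if 2.0 * x - y > 0.5 else 0

def find_adversarial(center, radius, steps):
    """Exhaustively search a discretised box around `center`.

    Returns a grid point whose label differs from the label of `center`,
    or None if every grid point in the box shares that label.
    """
    label = classify(center)
    # Offsets from -radius to +radius, with `steps` increments on each side.
    offsets = [radius * k / steps for k in range(-steps, steps + 1)]
    for delta in product(offsets, repeat=len(center)):
        candidate = tuple(c + d for c, d in zip(center, delta))
        if classify(candidate) != label:
            return candidate
    return None

# A point far from the decision boundary: no grid point changes the label.
far = find_adversarial((2.0, 0.0), radius=0.1, steps=5)
# A point close to the boundary: the search returns a counterexample.
near = find_adversarial((0.3, 0.0), radius=0.1, steps=5)
print(far, near)
```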

Food: Andres

11/4: Active learning for programming by example

Time: Friday, November 4, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Pratiksha Thaker

Abstract: A key challenge in programming-by-example is to minimize the number of input-output examples a user must annotate in order for the synthesis system to determine the correct program. Prior work in the programming languages community proposes heuristics to choose an informative set of examples. In contrast, we cast the task of proposing examples as an active learning problem: we start with a prior distribution over programs and then sequentially choose inputs so as to maximally reduce entropy in the posterior at each step. We propose solutions for several technical challenges that arise in the process: how to avoid having syntactically-specified priors induce unintended distributions in semantic space; how to efficiently sample from a posterior over programs; and how to select an informative next input while taking into account program semantics. We demonstrate that casting the problem as active learning leads to better query complexity than an SMT-based query generator. Finally, we describe how our approach can be practically useful in a SQL-like domain.
Joint work with Daniel Tarlow, Marc Brockschmidt, Alex Gaunt, Pushmeet Kohli, Rishabh Singh.
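
To make the selection criterion concrete, here is a minimal sketch of greedy entropy reduction over a tiny, finite hypothesis space. Because each candidate program is deterministic, the expected reduction in posterior entropy from querying an input equals the entropy of the output distribution that the current posterior induces on that input. The candidate programs, the uniform prior, and all names are hypothetical and not from the talk, and the sketch enumerates the posterior exactly instead of sampling from it.

```python
import math
from collections import defaultdict

# Hypothetical hypothesis space: candidate programs over integers.
PROGRAMS = {
    "double": lambda x: 2 * x,
    "square": lambda x: x * x,
    "add_two": lambda x: x + 2,
    "identity": lambda x: x,
}

def output_entropy(posterior, x):
    """Entropy (in bits) of the output distribution induced on input x."""
    mass = defaultdict(float)
    for name, p in posterior.items():
        mass[PROGRAMS[name](x)] += p
    return -sum(p * math.log2(p) for p in mass.values() if p > 0)

def choose_input(posterior, candidates):
    """Pick the input whose answer is expected to shrink the posterior most."""
    return max(candidates, key=lambda x: output_entropy(posterior, x))

def update(posterior, x, y):
    """Condition on the observation f(x) = y and renormalise."""
    kept = {n: p for n, p in posterior.items() if PROGRAMS[n](x) == y}
    total = sum(kept.values())
    return {n: p / total for n, p in kept.items()}

prior = {name: 1 / len(PROGRAMS) for name in PROGRAMS}
x = choose_input(prior, candidates=[0, 1, 2, 3])
# Simulate the user annotating the chosen input with the intended program.
posterior = update(prior, x, PROGRAMS["square"](x))
print(x, posterior)
```

In this toy space the input 3 separates all four programs, so a single annotated example identifies the intended one.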

Food: Stefan

11/11: Rocket: A Type-Safe Web Framework for Rust

Time: Friday, November 11, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Sergio Benitez

Abstract: Rocket is a new web framework for Rust that uses code generation to provide a clean, simple, and flexible API that enforces type safety at every layer of the web request/response path. Rocket's philosophy is that request handling should be well-typed. In other words, a request handler should be called only if the incoming request has been validated. Rocket enables this through two mechanisms. First, data handlers parse incoming data before handing it off to a request handler. As a result, a request handler never operates on invalid data. Second, request guards verify that an incoming request satisfies some arbitrary policy. As a result, a request handler never operates under invalid assumptions. Programmers declare the use of these mechanisms by simply including types in the request handler’s arguments; Rocket’s code generation does the rest. Together, these mechanisms result in web applications that are more secure, correct, and easier to write, read, and reason about.

Food: Elliott

11/18: Isometry: A Path-Based Distributed Data Transfer System

Time: Friday, November 18, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Zhihao Jia

Abstract: Data transfers within parallel systems have a significant impact on the performance of applications. Most existing systems support only data transfers between memories with a direct hardware connection and have limited facilities for handling transformations to the data’s layout in memory. As a result, to move data between memories that are not directly connected, higher levels of the software stack must explicitly divide a multi-hop transfer into a sequence of single-hop transfers and decide how and where to perform data layout conversions if needed. This approach results in inefficiencies, as the higher levels lack enough information to plan transfers as a whole, while the lower level that does the transfer sees only the individual single-hop requests.
We present Isometry, a path-based distributed data transfer system. The Isometry path planner selects an efficient path for a transfer and submits it to the Isometry runtime, which is optimized for managing and coordinating the direct data transfers. The Isometry runtime automatically pipelines sequential direct transfers within a path and can incorporate flexible scheduling policies, such as prioritizing one transfer over another. Our evaluation shows that Isometry can speed up data transfers by up to 2.2× and reduce the completion time of high-priority transfers by up to 95% compared to the Realm data transfer system. We evaluate Isometry on three benchmarks and show that Isometry reduces transfer time by up to 80% and overall completion time by up to 60%.

Food: Manolis

12/2: Active Learning of Points-To Specifications

Time: Friday, December 2, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Osbert Bastani

Abstract: Many static analyses, e.g., taint analysis, depend on points-to analysis to resolve aliasing relations. However, points-to analysis faces significant challenges when analyzing programs that use large libraries. For example, Android apps use the Android framework, which in turn uses native code and Java reflection (which cannot be statically analyzed), and deep abstractions (which hinder precision and scalability). One solution is to have a human analyst provide points-to specifications that summarize the relevant behaviors of library code, which are much easier to analyze than the library implementation.
We propose ATLAS, a tool that automatically infers points-to specifications. ATLAS synthesizes test cases that exercise the library code, and then generates points-to specifications based on the input-output examples observed from these executions. In particular, ATLAS uses a novel representation of points-to specifications as a formal language, so specification inference reduces to language learning. Then, ATLAS employs a novel language learning algorithm to infer points-to specifications. We show that ATLAS infers a large number of new specifications compared to existing, manually written specifications, and that these specifications significantly improve the points-to analysis.

Food: Todd

12/9: Cantor meets Scott - Semantic Foundations for Probabilistic Networks

Time: Friday, December 9, 2016, 12 noon - 1pm
Location: Gates 463a

Speaker: Steffen Smolka

Abstract: ProbNetKAT is a probabilistic extension of NetKAT with a denotational semantics based on Markov kernels. The language is expressive enough to generate continuous distributions, which raises the question of how to compute effectively in the language. This paper gives a new characterization of ProbNetKAT's semantics using domain theory, which provides the foundation needed to build a practical implementation. We show how to use the semantics to approximate the behavior of arbitrary ProbNetKAT programs using distributions with finite support. We develop a prototype implementation and show how to use it to solve a variety of problems including characterizing the expected congestion induced by different routing schemes and reasoning probabilistically about reachability in a network.

Food: Omid