SSS Abstracts
Why You Should Charge Your Friends for Borrowing Your Stuff
Friday, April 6th, 2018 from 12-1 pm in GHC 6501.
Presented by Kijung Shin, CSD
We consider goods that can be shared with k-hop neighbors (i.e., the set of nodes within k hops of an owner) on a social network. We examine incentives to buy such a good by devising game-theoretic models in which each node decides whether to buy the good or free ride. First, we find that social inefficiency, specifically excessive purchase of the good, occurs in Nash equilibria. Second, the social inefficiency decreases as k increases, i.e., as a good can be shared with more nodes. Third, and most importantly, the social inefficiency can also be significantly reduced by charging free riders an access cost and paying it to owners, leading to the conclusion that organizations and system designers should impose such a cost. These findings are supported by our theoretical analysis in terms of the price of anarchy and the price of stability, and by simulations based on synthetic and real social networks.
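As a toy illustration of how excessive purchase can arise, the sketch below (illustrative code written for this listing, not the paper's model; the function names and the greedy move order are assumptions) computes a Nash equilibrium of the free-access game on a small graph: a node buys exactly when no earlier buyer lies within k hops, so the owner set is a maximal distance-k independent set and every node has access.

```python
def within_k(adj, src, k, owners):
    """BFS from src to depth k; True iff some owner is within k hops."""
    seen = {src}
    frontier = [src]
    for _ in range(k):
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    if v in owners:
                        return True
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return False

def equilibrium_owners(adj, k, order):
    """Visit nodes in the given order; a node buys iff no earlier buyer is
    within k hops. The result is a maximal distance-k independent set: every
    non-owner can free ride off some owner, and no owner is within k hops of
    another owner, so no single node wants to change its decision."""
    owners = set()
    for u in order:
        if not within_k(adj, u, k, owners):
            owners.add(u)
    return owners
```

On a 5-node path with k = 1, for instance, the move order [0, 1, 2, 3, 4] produces three owners while [1, 3, 0, 2, 4] produces two, so the first equilibrium over-purchases relative to the second; the price-of-anarchy analysis in the talk quantifies exactly this kind of gap.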
Joint work with Dr. Euiwoong Lee, Ms. Dhivya Eswaran, and Prof. Ariel D. Procaccia
In Partial Fulfillment of the Speaking Requirement
Designing Algorithms to Tolerate Processor Faults
Monday, April 16th, 2018 from 12-1 pm in GHC 6501.
Presented by Charles McGuffey, CSD
We consider a parallel computational model that consists of P processors, each with a fast local ephemeral memory of limited size, and sharing a large persistent memory. The model allows each processor to fault with bounded probability, and possibly restart. On faulting, all processor state and local ephemeral memory are lost, but the persistent memory remains. This model is motivated by upcoming non-volatile memories that are as fast as existing random access memory, are accessible at the granularity of cache lines, and are capable of surviving power outages. It is further motivated by the observation that in large parallel systems, failure of processors and their caches is not unusual. Within the model, we develop a framework for designing locality-efficient parallel algorithms that are resilient to failures. There are several challenges, including the need to recover from failures, the desire to do so in an asynchronous setting (i.e., not blocking other processors when one fails), and the need for synchronization primitives that are robust to failures. We describe approaches to solving these challenges based on breaking computations into what we call capsules, which have certain properties, and on developing a work-stealing scheduler that functions properly in the presence of failures. The scheduler guarantees a time bound of O(W/P_a + D(P/P_a) * ceil(log_{1/f} W)) in expectation, where W and D are the work and depth of the computation (in the absence of failures), P_a is the average number of processors available during the computation, and f <= 1/2 is the probability that a capsule fails. Within the model and using the proposed methods, we develop efficient algorithms for parallel sorting and other primitives.
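To make the bound concrete, the helper below (an illustrative function written for this listing, not part of the paper) plugs parameters into O(W/P_a + D(P/P_a) * ceil(log_{1/f} W)) with the hidden constant dropped:

```python
import math

def expected_time_bound(W, D, P, P_avg, f):
    """Evaluate W/P_a + D * (P/P_a) * ceil(log_{1/f} W), the shape of the
    scheduler's expected time bound, with the big-O constant dropped.
    W = work, D = depth, P = processors, P_avg = average processors
    available, f <= 1/2 = probability that a capsule fails."""
    assert 0 < f <= 0.5
    return W / P_avg + D * (P / P_avg) * math.ceil(math.log(W, 1 / f))
```

For example, with W = 10^6, D = 100, P = P_a = 64 and f = 1/2, the failure term contributes a factor of ceil(log2 W) = 20 on top of the depth, which dominates only when the computation is depth-bound rather than work-bound.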
In Partial Fulfillment of the Speaking Requirement
[CANCELED] Learning a Gene Network underlying Clinical Phenotypes under Genome-wide SNP Perturbations
Friday, April 20th, 2018 from 12-1 pm in GHC 6501.
Presented by Calvin McCarter, MLD
Recent technologies are generating an abundance of 'omics' data, demanding new approaches for integrating data across the genome, transcriptome, and phenome to understand the genetic architecture and gene networks underlying diseases. Such methods for integrative analysis must also confront high-dimensional genomics data, comprising hundreds of thousands of SNPs and tens of thousands of gene expressions. Previous integrative methods have generally performed two-way 'omic' data analysis, examining genome-transcriptome interactions as in eQTL mapping, or genome-phenome interactions as in genome-wide association studies. Many such methods fall back on simple univariate analyses of the influence of a single SNP on a single gene expression or phenotype for statistical and computational simplicity, ignoring the multiple correlated gene expressions that arise from gene-gene interactions in gene regulatory networks. Network-based methods have been proposed in recent years, but their viability is limited by their computational efficiency on high-dimensional datasets. In this work, we propose a statistical approach for multi-omic data analysis that tackles all these challenges by efficiently learning a cascade of sparse networks under genetic influence. Our statistical approach is based on Gaussian chain graph models that represent how the genome controls transcriptome regulation in a gene regulatory network, which in turn influences the phenome in a network of clinical traits. We propose an optimization algorithm for learning our model that is extremely efficient in both computation time and memory usage. We apply our method to analyze asthma data from patients in the Childhood Asthma Management Program (CAMP) study. With roughly 500,000 SNPs and over 10,000 genes, our method learns the network cascade model for gene networks and phenotype networks under the influence of SNPs in less than two hours.
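The cascade idea can be sketched with a generic sparse estimator; the code below (an illustration written for this listing, using plain iterative soft-thresholding as a stand-in for the talk's structured chain-graph estimator; `lasso_ista` and `fit_cascade` are assumed names) fits each layer of a SNP -> expression -> phenotype cascade as a set of sparse regressions:

```python
import numpy as np

def lasso_ista(X, y, lam=0.1, iters=1000):
    """Sparse regression via iterative soft-thresholding (ISTA) on
    0.5 * ||Xb - y||^2 + lam * ||b||_1."""
    L = np.linalg.norm(X, 2) ** 2           # Lipschitz constant of gradient
    lr = 1.0 / L
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        b = b - lr * (X.T @ (X @ b - y))    # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - lr * lam, 0.0)  # shrink
    return b

def fit_cascade(X_snp, E_expr, Y_pheno, lam=1.0):
    """Fit the two layers of the cascade column by column:
    expressions regressed sparsely on SNPs, phenotypes on expressions."""
    B1 = np.column_stack([lasso_ista(X_snp, E_expr[:, j], lam)
                          for j in range(E_expr.shape[1])])
    B2 = np.column_stack([lasso_ista(E_expr, Y_pheno[:, j], lam)
                          for j in range(Y_pheno.shape[1])])
    return B1, B2
```

The actual method learns all layers jointly as a Gaussian chain graph model with an optimization scheme designed for the 500,000-SNP scale; this sketch only conveys the layered-sparse-regression structure.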
In Partial Fulfillment of the Speaking Requirement
Computing with Univalence in Cartesian Cubical Type Theory
Monday, April 23rd, 2018 from 12-1 pm in GHC 6501.
Presented by Carlo Angiuli, CSD
Dependent type theories, such as those underlying the proof assistants Coq, Agda, and Nuprl, are simultaneously theories of constructive mathematics and of computer programming: a proof of a proposition is at the same time a program that executes according to a specification. In the past decade, mathematicians have uncovered a surprising interpretation of types as abstract topological spaces, leading to extensions of type theory known as higher inductive types and the univalence axiom. Unfortunately, as originally formulated, these extensions disrupt the computational nature of proofs.
We will describe Cartesian cubical type theory, a novel dependent type theory that supports higher inductive types and univalence while restoring the connection between proofs and programs. We will also demonstrate how univalence can simplify ordinary type-theoretic reasoning in the RedPRL proof assistant currently under development at CMU.
Joint work with Kuen-Bang Hou (Favonia) and Robert Harper
In Partial Fulfillment of the Speaking Requirement
Light-Weight In-Situ Indexing For Scientific Workloads
Friday, April 27th, 2018 from 12-1 pm in GHC 6501.
Presented by Qing Zheng, CSD
In this talk, Qing introduces the Indexed Massive Directory, a new technique for indexing data within DeltaFS. DeltaFS is a software-defined user-space file system optimized for HPC systems and workloads. The Indexed Massive Directory is a novel extension to the DeltaFS data plane, enabling in-situ indexing of massive amounts of data written to a single directory simultaneously, and in an arbitrarily large number of files. We achieve this through a memory-efficient I/O pipeline to reorder data, and a log-structured storage layout to pack small writes into large log objects, all while ensuring compute node resources are used frugally. We demonstrate the efficiency of this indexing mechanism through VPIC, a widely-used simulation code that scales to trillions of particles. With DeltaFS, we modify VPIC to create a file for each particle to receive writes of that particle's output data. Dynamically indexing the directory's underlying storage allows us to achieve a 5000x speedup in VPIC's single-particle trajectory queries, which require reading all data for a single particle. This speedup increases with application scale, while the overhead remains at approximately 15% additional I/O time.
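The core storage idea, packing many small per-particle writes into large append-only log objects while an index remembers where each particle's records landed, can be sketched in a few lines. The class below is a toy written for this listing (its names are not DeltaFS APIs):

```python
class IndexedLog:
    """Toy sketch of in-situ indexing: small writes are packed into large
    log objects, and an in-memory index maps each particle id to the
    (object, offset, length) extents of its records, so reading one
    particle's trajectory touches only those extents instead of scanning
    everything."""

    def __init__(self, object_size=1 << 20):
        self.object_size = object_size
        self.objects = [bytearray()]     # large append-only log objects
        self.index = {}                  # particle id -> [(obj, off, len)]

    def append(self, particle_id, record):
        if len(self.objects[-1]) + len(record) > self.object_size:
            self.objects.append(bytearray())   # roll over to a new object
        obj = len(self.objects) - 1
        off = len(self.objects[obj])
        self.objects[obj] += record
        self.index.setdefault(particle_id, []).append((obj, off, len(record)))

    def read(self, particle_id):
        out = bytearray()
        for obj, off, ln in self.index.get(particle_id, []):
            out += self.objects[obj][off:off + ln]
        return bytes(out)
```

The real system additionally keeps the index memory-efficient and builds it in situ as simulation output streams through the I/O pipeline, which is where the engineering difficulty lies.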
In Partial Fulfillment of the Speaking Requirement
[CANCELED] Learning Mixtures of Multi-Output Regression Models By Correlation Clustering for Multi-View Data
Monday, April 30th, 2018 from 12-1 pm in GHC 6501.
Presented by Eric Lei, MLD
Multi-view data are an increasingly prevalent type of dataset that allows exploitation of relationships between sets of variables. It is often interesting to analyze the correlation between two views via multi-view component analysis techniques such as Canonical Correlation Analysis (CCA). However, different parts of the data may have their own patterns of correlation, which CCA cannot reveal. To address this challenge, we propose a method called Canonical Least Squares (CLS) clustering. Somewhat like CCA, a single CLS model can be regarded as a multi-output regression model that finds latent variables to connect inputs and outputs. This method, however, also identifies partitions of data that enhance correlations in each partition, which may be useful when different correlation structures appear in different subsets of the data or when nonlinear correlations may be present. Furthermore, we introduce a supervised classification method that relies on CLS clustering. The value of these methods rests in their capability to find interpretable structure in the data to explain their predictions. We demonstrate the potential utility of the proposed approach using an example application in clinical informatics to detect and characterize slow bleeding in patients whose vital signs are monitored at the bedside. We empirically show how the proposed method can help discover and analyze multiple-to-multiple correlations, which could be nonlinear or vary throughout the population, while retaining interpretability of the resulting models.
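The partition-plus-regression idea can be sketched as a k-means-like alternation; the code below is an illustration written for this listing (it alternates least-squares fits and residual-based assignments, not the paper's exact canonical-least-squares objective, and `cls_cluster` is an assumed name):

```python
import numpy as np

def cls_cluster(X, Y, n_clusters=2, iters=20, seed=0, init=None):
    """Alternate two steps: fit one multi-output linear map per cluster,
    then reassign each sample to the map with the smallest residual.
    Different clusters can thus capture different correlation structures
    between the two views X and Y."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    labels = np.asarray(init) if init is not None else rng.integers(n_clusters, size=n)
    for _ in range(iters):
        Ws = []
        for c in range(n_clusters):
            m = labels == c
            if m.sum() < X.shape[1]:          # keep clusters non-degenerate
                m = rng.random(n) < 0.5
            W, *_ = np.linalg.lstsq(X[m], Y[m], rcond=None)
            Ws.append(W)
        resid = np.stack([((X @ W - Y) ** 2).sum(axis=1) for W in Ws])
        new = resid.argmin(axis=0)
        if np.array_equal(new, labels):       # converged
            break
        labels = new
    return labels, Ws
```

On data generated from two different linear X-to-Y maps, the alternation keeps the two regimes in separate clusters, which is the behavior the abstract exploits to expose population subgroups with distinct correlation structure.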
In Partial Fulfillment of the Speaking Requirement
Convolutional Neural Network Models of V1 Responses to Complex Patterns
Friday, May 4th, 2018 from 12-1 pm in GHC 6501.
Presented by Yimeng Zhang, CSD
In this study, we evaluated the convolutional neural network (CNN) method for modeling V1 neurons of awake macaque monkeys in response to a large set of complex pattern stimuli. CNN models outperformed all the other baseline models, such as Gabor-based standard models for V1 cells and various variants of generalized linear models. We then systematically dissected different components of the CNN and found two key factors that made CNNs outperform other models: thresholding nonlinearity and convolution. In addition, we fitted our data using a pre-trained deep CNN via transfer learning. The deep CNN's higher layers, which encode more complex patterns, outperformed lower ones, and this result was consistent with our earlier work on the complexity of V1 neural code. Our study systematically evaluates the relative merits of different CNN components in the context of V1 neuron modeling.
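The two ingredients the dissection singles out, convolution and a thresholding nonlinearity, compose into a minimal V1-style response model; the sketch below is written for this listing (a one-layer forward pass with assumed names, not the study's fitted architecture):

```python
import numpy as np

def conv2d_valid(img, kern):
    """Plain 'valid' 2-D cross-correlation."""
    H, W = img.shape
    h, w = kern.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * kern).sum()
    return out

def v1_cnn_response(img, kernels, thresholds, readout):
    """One-layer CNN model of a V1 neuron: convolution with filters,
    a thresholding nonlinearity max(0, x - t), average pooling over
    space, and a linear readout to a scalar firing-rate prediction."""
    feats = []
    for kern, t in zip(kernels, thresholds):
        fmap = np.maximum(conv2d_valid(img, kern) - t, 0.0)
        feats.append(fmap.mean())
    return float(np.dot(readout, feats))
```

Removing either ingredient (replacing the thresholding with a linear or quadratic nonlinearity, or the convolution with a full linear map) degrades fits in the study, which is what identifies them as the key factors.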
In Partial Fulfillment of the Speaking Requirement
Multiple Source Domain Adaptation with Adversarial Learning
Monday, May 7th, 2018 from 12-1 pm in GHC 6501.
Presented by Han Zhao, MLD
While domain adaptation has been actively researched, most theoretical results and algorithms focus on the single-source-single-target adaptation setting. In the first part of the talk, I will discuss new generalization bounds for domain adaptation when there are multiple source domains with labeled instances and one target domain with unlabeled instances. The theory also leads to an efficient learning strategy using adversarial neural networks: I will show how to interpret it as learning feature representations that are invariant to the multiple domain shifts while still being discriminative for the learning task.
In the second part, I will discuss two models for multi-source domain adaptation: the first model optimizes the worst-case bound, while the second model is a smoothed approximation of the first one and optimizes a task-adaptive bound. We also demonstrate the effectiveness of both models through extensive experiments showing superior adaptation performance on a real-world application: vehicle counting.
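The relationship between the two models can be illustrated with the standard smoothing of a maximum; the snippet below is a sketch written for this listing (the log-sum-exp form and the temperature `tau` are our illustrative choices, not necessarily the paper's exact parameterization):

```python
import numpy as np

def worst_case_loss(source_losses):
    """Hard objective: optimize against the worst source domain."""
    return float(np.max(source_losses))

def smoothed_loss(source_losses, tau=1.0):
    """Smoothed upper bound on the max via temperature-scaled
    log-sum-exp: as tau -> 0 it approaches the hard maximum, while
    larger tau spreads weight across sources, letting the objective
    adapt to how relevant each source is to the target task."""
    x = np.asarray(source_losses) / tau
    return float(tau * (np.log(np.sum(np.exp(x - x.max()))) + x.max()))
```

Training against `worst_case_loss` gives robustness to the hardest source; training against `smoothed_loss` trades a little of that robustness for a differentiable, source-weighted objective.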
Joint work with Shanghang Zhang, Guanhang Wu, Joao Costeira, Jose Moura and Geoff Gordon
In Partial Fulfillment of the Speaking Requirement
Synchronization Strings: Communication in the Presence of Insertions and Deletions
Friday, May 11th, 2018 from 12-1 pm in GHC 6501.
Presented by Amirbehshad Shahrasbi, CSD
This talk will give an introduction to synchronization strings, which provide a novel way of efficiently reducing synchronization errors, such as insertions and deletions, to much more benign and better-understood Hamming errors, i.e., symbol erasures and substitutions. Synchronization strings have many applications. The talk will focus on using synchronization strings as a new way to generate efficient error-correcting block codes for insertions and deletions. In particular, it will present codes that, for any 0 < eps, delta < 1, achieve a rate of 1 - delta - eps while being able to efficiently decode from a delta fraction of insertions and deletions.
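The flavor of the reduction can be seen in the naive scheme that synchronization strings refine: attach position information to every symbol so the receiver can realign after synchronization errors, turning them into erasures. The sketch below (illustrative code for this listing, handling deletions only; real synchronization strings achieve this with far smaller alphabet overhead than explicit indices):

```python
def index_and_transmit(message):
    """Attach an explicit position symbol to every message symbol."""
    return [(i, s) for i, s in enumerate(message)]

def realign(received, length, erasure="?"):
    """Place each surviving symbol back at its original index; missing
    positions become erasures. Synchronization errors (here, deletions)
    have been reduced to Hamming-type erasures, which a standard block
    code layered underneath could then correct."""
    out = [erasure] * length
    for i, s in received:
        out[i] = s
    return "".join(out)
```

For example, transmitting "HELLO" and losing the symbols at positions 1 and 3 realigns to "H?L?O", two erasures rather than a hopelessly shifted string.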
Further applications of synchronization strings will also be briefly discussed including a general method of simulating symbol corruption channels over any given insertion-deletion channel, an efficient and near-optimal coding scheme for interactive communication over insertion-deletion channels, and list-decodable insertion-deletion codes.
This talk is based on joint work with Bernhard Haeupler.
In Partial Fulfillment of the Speaking Requirement
Web contact: sss+www@cs