SSS Abstracts |
Processing Massive Static and Evolving Graphs
Monday, October 28th, 2019 from 12-1 pm in GHC 6501.
Presented by Laxman Dhulipala, CSD
Today, the largest publicly available graph dataset is the Hyperlink Web graph, with over 3.5 billion vertices and 128 billion edges. Despite being publicly available for several years now, most experimental work in the literature reports results on much smaller graphs, or resorts to external or distributed memory to process the Hyperlink graph. A natural question then, is to ask whether we can efficiently solve a broad class of problems on this graph in memory. Given that existing systems capable of processing this graph only support static processing, another interesting question is whether we can efficiently represent and update this graph in memory.
In the first part of this talk I will present theoretically efficient implementations of parallel graph algorithms for a set of 20 important graph problems. Our codes, which we have publicly open-sourced as part of the Graph Based Benchmark Suite (GBBS) solve these problems on the Hyperlink Web graph using a single commodity multicore machine with a terabyte of RAM, with most of the codes running in under a minute on this graph. The running times of our implementations outperform existing state-of-the-art implementations on the largest real-world graphs, and for many of the problems that we consider, this is the first time they have been solved on graphs at this scale.
In the second part of this talk I will present Aspen, a framework for processing streaming graphs, which are graphs that evolve over time via a stream of updates (batches of edge insertions/deletions). The key idea is to represent the adjacency information using a purely-functional compressed tree data structure called a C-tree, which can be efficiently updated. Aspen achieves millisecond latency for processing small batches, and achieves throughputs of hundreds of millions of updates per second for very large batches, while providing strict serializability for both queries and updates. Using Aspen, we are able to concurrently update and run analytics on the Hyperlink Web graph using a single commodity multicore server with a terabyte of RAM.
Based on joint work with Guy Blelloch and Julian Shun.
C2DN: How to Code on the Edge for Content Delivery Network
Monday, November 18th, 2019 from 12-1 pm in GHC 6501.
Presented by Jason Yang, CSD
Content Delivery Networks (CDNs) cache and serve much of the worlds online content to users around the world. CDNs cache content in a replicated fashion to alleviate the impact of individual server unavailabilities which hurts user, CDN and content provider. Erasure codes are a resource-efficient alternative to replication. In this talk, we show how we introduce erasure coding into CDNs and build Coded CDN (C2DN). Compared to state-of-the-art CDN, C2DN reduces midgress (major CDN operation cost) by up to 30%, and provides better user latency. In addition, we show that C2DN only has a slight resource usage increase.
RADIO: A Robust Behavioral Anomaly Detection for IoT Devices in Enterprise Networks
Monday, November 25th, 2019 from 12-1 pm in GHC 6501.
Presented by Tianlong Yu, CSD
IoT devices deployed inside enterprise networks (e.g., routers, storage appliances, cameras) are emerging security threats for enterprises. However, existing enterprise security mechanisms are often found limited to address such threats. For example, existing IDS and IPS systems deployed in enterprise network heavily rely on attack signatures but zero-day vulnerabilities are common for IoT devices. Fortunately, we observe that unlike general-purpose computing devices, the normal behavior of an IoT device is limited (e.g., a camera has zooming-in, video streaming, and audio recording behaviors). Based on this insight, we revisit behavioral anomaly detection at the network layer. Designing such a system is challenging on two fronts. First, we need a behavior model tailored towards enterprise IoT devices to abstract the key characteristics of those device behaviors (e.g., commands or arguments used) from network traffic. Second, in practical enterprise settings, the network traces for learning normal behavior models are unlabeled and potentially polluted. We address these challenges in designing RADIO, a practical and robust behavioral anomaly detection system for enterprise IoT devices. We design a novel learning mechanism that can build benign behavior models (finite-state-machines) for IoT devices, from unlabeled and potentially polluted network traces. We show that our approach achieves low false positives/false negatives and is robust to pollution (false positive rate<1% and false negative rate<0.01% when 15% of the network traffic is polluted).
Parity Models: Erasure-Coded Resilience for Prediction Serving Systems
Friday, December 6th, 2019 from 12-1 pm in GHC 6501.
Presented by Jack Kosaian, CSD
Machine learning models are becoming the primary workhorses for many applications. Services deploy models through prediction serving systems that take in queries and return predictions by performing inference on models. Prediction serving systems are commonly run on many machines in cluster settings, and thus are prone to slowdowns and failures that inflate tail latency. Erasure coding is a popular technique for imparting resource-efficient resilience against data unavailability in storage and communication systems. However, existing approaches for imparting erasure-coded resilience to distributed computation apply only to a severely limited class of functions, precluding their use for many serving workloads, such as neural network inference.
We introduce parity models, a new approach for enabling erasure-coded resilience in prediction serving systems. A parity model is a neural network trained to transform erasure-coded queries into a form that enables a decoder to reconstruct slow or failed predictions. We implement parity models in ParM, a prediction serving system that makes use of erasure-coded resilience. ParM encodes multiple queries into a ``parity query'', performs inference over parity queries using parity models, and decodes approximations of unavailable predictions by using the output of a parity model. We showcase the applicability of parity models to image classification, speech recognition, and object localization tasks. Parity models enable accurate reconstructions of unavailable predictions and reduce tail latency in the presence of resource contention. These results highlight the potential of parity models to open new doors in imparting resource-efficient resilience to prediction serving systems.
A Domain Semantics for Higher-Order Recursive Processes
Monday, December 9th, 2019 from 12-1 pm in GHC 6115.
Presented by Ryan Kavanagh, CSD
The polarized SILL programming language uniformly integrates functional programming and session-based concurrency. It supports recursion, asynchronous and synchronous communication, and higher-order programs that communicate processes. In this talk, I will motivate and present a denotational semantics for a fragment of polarized SILL. I will explain why domain theory is an ideal setting for this semantics. Session types will be interpreted as domains of bidirectional communications. We use polarity and domain theory to split these into pairs of unidirectional communications in a natural way. Processes will be interpreted as continuous functions on domains of unidirectional communications.
Web contact: sss+www@cs