SCS Student Seminar Series

All SSS talks can be given either in person or over Zoom.


Spring 2022 Schedule

Note: Feb 18, Feb 25, and Feb 28 are unavailable in Spring 2022.
Mon, Jan 10 GHC 6501 available
Fri, Jan 14 GHC 6501 available
Mon, Jan 17 GHC 6501 available
Fri, Jan 21 Remote
DriftSurf: Stable-State / Reactive-State Learning under Concept Drift

Presented by Ellango Jothimurugesan

https://cmu.zoom.us/j/93961739346?pwd=amJjWFppbjJ0dVRGMWp1enZpajFiZz09

When learning from streaming data, a change in the data distribution, also known as concept drift, can render a previously learned model inaccurate and require training a new model. We present an adaptive learning algorithm that extends previous drift-detection-based methods by incorporating drift detection into a broader stable-state/reactive-state process. The advantage of our approach is that we can use aggressive drift detection in the stable state to achieve a high detection rate, but mitigate the false positive rate of standalone drift detection via a reactive state that reacts quickly to true drifts while eliminating most false positives. Our theoretical analysis shows that the risk of the algorithm is competitive with an algorithm with oracle knowledge of when (abrupt) drifts occur. Experiments on synthetic and real datasets with concept drifts confirm our theoretical analysis.
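
A minimal sketch may help make the stable-state/reactive-state mechanism concrete. Everything below is a hypothetical illustration in Python, not the authors' implementation: the drift test, the reactive-window length, and the model interface (partial_fit/score on labeled batches) are all placeholders.

    # Hypothetical sketch of a stable-state/reactive-state learner.
    STABLE, REACTIVE = "stable", "reactive"

    class StableReactiveLearner:
        def __init__(self, make_model, drift_test, reactive_len=10):
            self.make_model = make_model      # factory for a fresh model
            self.drift_test = drift_test      # aggressive detector; may false-alarm
            self.reactive_len = reactive_len  # how long the reactive state lasts
            self.state, self.clock = STABLE, 0
            self.model = make_model()         # current predictive model
            self.candidate = None             # fresh model grown in reactive state

        def step(self, X, y):
            """Process one labeled batch from the stream."""
            self.model.partial_fit(X, y)
            if self.state == STABLE:
                if self.drift_test(self.model, X, y):   # suspected drift
                    self.state, self.clock = REACTIVE, 0
                    self.candidate = self.make_model()
            else:  # REACTIVE: race the old model against a fresh one
                self.candidate.partial_fit(X, y)
                self.clock += 1
                if self.clock >= self.reactive_len:
                    # Keep whichever model now predicts better; after a
                    # false alarm the old model wins and nothing is lost.
                    if self.candidate.score(X, y) > self.model.score(X, y):
                        self.model = self.candidate
                    self.state, self.candidate = STABLE, None

The point of the two states is visible in the last branch: an aggressive, high-recall detector is affordable because a false alarm only costs a short reactive window, after which the incumbent model is retained.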

This talk is in partial fulfillment of the speaking requirement.


Mon, Jan 24 Remote
Training Provably Robust Neural Networks

Presented by Klas Leino

https://cmu.zoom.us/j/93404279609?pwd=a0xROWt0ZzJ2UG5wOHcxOGR2M3BiQT09

Deep networks have repeatedly been shown to be vulnerable to maliciously perturbed inputs, termed "adversarial examples," through which an attacker can cause the network to make arbitrary mistakes. This raises concerns for neural networks deployed in the wild, especially in safety-critical settings, e.g., in autonomous vehicles. In turn, this has motivated a large body of work on practical defenses; among these, this talk will focus primarily on rigorous defenses that provide provable guarantees of a property termed "local robustness," which precludes an important class of adversarial examples. Specifically, we will cover an elegant and effective defense that modifies the architecture of a neural network to naturally provide provable guarantees of local robustness without introducing overhead to the network's inferences. Finally, we will discuss some of the limitations of local robustness, demonstrating that in some contexts it is too stringent (we propose some natural relaxations), while in other ways it is insufficient to capture certain realistic forms of attacks.
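
One widely used route to such guarantees is a Lipschitz-margin argument; the sketch below illustrates that general idea and is not necessarily the exact construction from the talk (the per-logit Lipschitz constant lip_const is assumed to have been computed elsewhere, e.g. from a product of layer norms).

    import numpy as np

    def certified_robust(logits, lip_const, eps):
        """Sufficient check for local robustness at L2 radius eps.

        If every logit function is lip_const-Lipschitz, a perturbation
        of norm <= eps moves each logit by at most lip_const * eps, so
        the top-1 class cannot change as long as the top logit beats
        the runner-up by more than 2 * lip_const * eps.
        """
        top, runner_up = np.sort(logits)[-2:][::-1]
        return (top - runner_up) > 2 * lip_const * eps

Checks like this are conservative: a False result does not prove that an adversarial example exists, only that this particular certificate fails.

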
Fri, Jan 28 Remote
Considering Process in Algorithmic Bias Detection and Mitigation

Presented by Emily Black

https://cmu.zoom.us/j/94694048366?pwd=Zjl2VHcxdHFiK0VleDhjY0tIbTVCdz09

Artificial Intelligence (AI) systems now affect important decisions in people's lives, from the news articles they read to whether or not they receive a loan. While the use of AI may lead to great accuracy and efficiency in the making of important decisions, recent news and research reports have shown that AI models can act unfairly: from exhibiting gender bias in hiring models, to racial bias in recidivism prediction systems. In this talk, I will discuss methods for understanding fairness issues in AI by considering the process by which models arrive at their decisions. This technique contrasts with a large portion of the AI fairness literature, which focuses on studying model outcomes alone. Specifically, I will show how considering a model's end-to-end decision process allows us to expand our understanding of unfair behavior, as in my work demonstrating how model instability can lead to unfairness by making important decisions rely on arbitrary modeling choices (e.g., whether or not a person is granted a loan may depend on whether some unrelated person happened to be in the training set). Second, I will discuss how considering process can help us find bias mitigation techniques that avoid a tradeoff between predictive utility and fairness, with a case study from my collaboration with the Stanford RegLab investigating tax auditing practices.
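
To make the instability point concrete, here is a toy leave-one-out experiment on synthetic data (a hypothetical setup, not the speaker's experiments): retrain a model many times on training sets that differ in a single point and count how often the decision for one fixed, borderline individual flips.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))                      # synthetic applicants
    y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)
    applicant = np.array([[0.01, 0, 0, 0, 0]])         # a borderline case

    base = LogisticRegression().fit(X, y).predict(applicant)[0]
    flips = 0
    for i in range(len(X)):                            # drop one point at a time
        Xi, yi = np.delete(X, i, axis=0), np.delete(y, i)
        if LogisticRegression().fit(Xi, yi).predict(applicant)[0] != base:
            flips += 1
    print(f"decision flipped in {flips} of {len(X)} retrainings")

Any flips observed here are driven purely by which unrelated points happened to be present, which is exactly the arbitrariness the talk highlights.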

This talk is in partial fulfillment of the speaking requirement.


Mon, Jan 31 GHC 6501 available
Wed, Feb 02 Remote
Back to Futures

Presented by Klaas Pruiksma

Concurrent (and parallel) programming offers a way to increase performance without relying on continual improvements in single-threaded performance. A wide variety of approaches to concurrent programming exist, but they are often dealt with in an ad hoc manner, making reasoning about the behavior of concurrent programs notoriously difficult. We present a formal explanation of futures, a particular concurrent programming construct, with foundations in logic. The close ties between this language's type system and logic allow for simpler reasoning about programs, and incidentally provide a better understanding of linear futures, which have been proposed in the context of algorithm design and analysis but never formally explored. Moreover, by adding a construct to enforce sequentiality, we recover a language that looks much like a standard functional language augmented with futures.
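
The talk treats futures at the level of type systems and logic; for readers who have not used them, this is what the construct looks like in an ordinary language (Python's standard library here, purely as a point of reference):

    from concurrent.futures import ThreadPoolExecutor

    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    with ThreadPoolExecutor() as pool:
        fut = pool.submit(fib, 28)        # returns a future immediately
        other = sum(range(1_000_000))     # runs while fib computes
        print(other, fut.result())        # result() blocks only on demand

The essence is the split between spawning a computation and demanding its value; the logical reconstruction in the talk gives that split a proof-theoretic meaning.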


Fri, Feb 04 GHC 6501 available
Mon, Feb 07 GHC 6501 available
Fri, Feb 11 GHC 6501 available
Mon, Feb 14 GHC 6501 available
Mon, Feb 21 GHC 6501 available
Fri, Mar 04 GHC 6501 available
Mon, Mar 07 GHC 6501 unavailable
Fri, Mar 11 GHC 6501 available
Mon, Mar 14 GHC 6501 available
Fri, Mar 18 GHC 6501 available
Mon, Mar 21 GHC 6501 available
Tue, Mar 22 GHC 6501
Enabling Context-Sensitive, Bandwidth-Efficient Visual Search on Edge Data

Presented by Shilpa Anna George

The ability to execute ad hoc search queries on freshly captured visual data at the edge is valuable. A key challenge is the bandwidth-efficient assembly of large training sets for customizing the deep neural networks (DNNs) that express the queries. In this talk, I will present Hawk, an interactive labeling system in which learning is transparent, recursive, and scalable. Its key insight is that even a weak model, learned on the fly, can be used to enrich a live sensor stream by pruning away irrelevant data. Over many rounds, Hawk's tight integration of data collection, inferencing, selective transmission, labeling, and training converges to an accurate model.
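
Schematically, the round structure described above looks like the loop below. This is a hypothetical sketch, not Hawk's API: the training routine, scoring model, labeling oracle, and per-round bandwidth budget are all placeholders.

    # A weak model, retrained every round, prunes the stream so that
    # only the most promising items spend bandwidth and labeling effort.
    def interactive_search(stream_batches, train, score, ask_label, budget):
        labeled, model = [], None
        for batch in stream_batches:             # one round per batch
            if model is not None:                # prune with the weak model
                batch = sorted(batch, key=lambda x: score(model, x),
                               reverse=True)
            for item in batch[:budget]:          # transmit within budget
                labeled.append((item, ask_label(item)))
            model = train(labeled)               # recursive refinement
        return model

Even a poor early model shifts the labeled sample toward relevant items, which is why the loop can converge despite starting from almost nothing.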


Fri, Mar 25 GHC 6501 available
Mon, Mar 28 GHC 6501 available
Fri, Apr 01 GHC 6501 available
Mon, Apr 04 GHC 6501 available
Fri, Apr 08 GHC 6501
Two Studies on Peer Review: Citation Bias and arXiv Bias

Presented by Ivan Stelmakh

In this talk, we will discuss two studies (involving several thousand papers and reviewers) that scrutinize certain aspects of the peer-review process. In the first study, we investigate whether citing potential reviewers can boost the acceptance chances of a paper. For this, we intervene in the paper-reviewer assignment procedure and carefully analyze the outcomes of the review process. In the second study, we compare the pros and cons of releasing a paper on arXiv before it gets published in a conference or a journal. For this, we use surveys to estimate the expected visibility a paper gets on arXiv and the probability of the reviewer breaking the double-blind reviewing policy.


Mon, Apr 11 Remote
Multi-point Queries on Concurrent Data Structures

Presented by Yuanhao Wei

https://cmu.zoom.us/j/92977908123?pwd=M2svSzQ0UXZxS1M3aUt2Y0lkM2ZOUT09

Most work on concurrent data structures has focused on supporting single-point operations, such as insert, delete, and lookup, but many applications require supporting these alongside multi-point queries, such as searching for ranges of keys, finding the first key that matches some criteria, or checking if a collection of keys are all present. In this presentation, I'll cover a general technique for adding linearizable multi-point queries to existing concurrent data structures. This technique maintains the time bounds and progress properties (e.g. lock-freedom/wait-freedom) of the original data structure. Furthermore, multi-point queries written with this technique are wait-free and take time proportional to their sequential complexity plus a contention term representing the number of update operations concurrent with the query.
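
As a rough illustration of one standard ingredient, the sketch below keeps a version list per key and has each query read at a fixed snapshot timestamp. It is deliberately sequential: the lock-free synchronization and the time and progress guarantees are the hard part of the actual technique and are not captured here.

    class VersionedMap:
        def __init__(self):
            self.clock = 0
            self.versions = {}                # key -> [(ts, value), ...]

        def put(self, key, value):
            self.clock += 1                   # single-writer toy clock
            self.versions.setdefault(key, []).append((self.clock, value))

        def read_at(self, key, ts):
            for t, v in reversed(self.versions.get(key, [])):
                if t <= ts:
                    return v
            return None

        def multi_get(self, keys):
            snap = self.clock                 # fix the snapshot time once
            return {k: self.read_at(k, snap) for k in keys}

Because every key is read as of the same timestamp, multi_get observes a single consistent state, which is the linearizability property the talk is after.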


Fri, Apr 15 Remote
Improved Quantum Data Analysis

Presented by Costin Badescu

Let p denote an unknown probability distribution on the set of integers [d] = {1, 2, ..., d} and let E1, ..., Em denote events, i.e., subsets of [d]. Consider the following problem: how many samples from p are necessary and sufficient to estimate the probabilities p(E1), ..., p(Em) to accuracy ε, as a function of m, d, and ε? The answer is known to be Θ(log(m)/ε²) samples. The quantum version of this problem, where the distribution p is replaced by a quantum state and the events are replaced by binary quantum measurements, is known as "shadow tomography." Finding the sample complexity of shadow tomography is an open problem; currently, the best known lower bound matches the classical bound of Ω(log(m)/ε²), and the best known upper bound is O(log²(m)·log(d)/ε⁴). In this talk, I will introduce the shadow tomography problem, explain its significance for quantum learning and estimation, and sketch the main ideas behind joint work with Ryan O'Donnell which improved the upper bound on the sample complexity of shadow tomography to O(log²(m)/ε⁴). No prior knowledge of quantum theory is needed.
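
For context, the classical Θ(log(m)/ε²) upper bound comes from a standard concentration argument: draw n i.i.d. samples from p and let p̂(Eᵢ) be the fraction of samples landing in Eᵢ. Hoeffding's inequality gives Pr[|p̂(Eᵢ) − p(Eᵢ)| > ε] ≤ 2·exp(−2nε²) for each fixed i, and a union bound over the m events bounds the total failure probability by 2m·exp(−2nε²), which drops below any fixed constant once n = O(log(m)/ε²). This logarithmic dependence on m is what shadow tomography tries to preserve in the quantum setting.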


Mon, Apr 18 GHC 6501 available
Fri, Apr 22 GHC 6501 available
Mon, Apr 25 Remote
Convertible Codes: Efficient Conversion of Coded Data in Distributed Storage

Presented by Francisco Jose Maturana Sanguineti

https://cmu.zoom.us/j/97615026876?pwd=VmViSTBrK2sxeFc0bkNHc0pGL0djQT09

Erasure codes are typically used in large-scale distributed storage systems to provide reliability against failures. In this setting, a set of k data chunks is encoded using an [n, k] code to generate n chunks that are then stored on different nodes. Recent work shows that the failure rates of storage devices vary significantly over time, and that changing the rate of the code (via a change in the parameters n and k) in response to such variations provides a significant reduction in storage overhead. However, when using traditional codes, the overhead on CPU, disk IO, and network bandwidth of realizing such a change in code rate is prohibitively high. Motivated by this application, we present a new framework to formalize the notion of code conversion, i.e., the process of converting data encoded with an [n^I, k^I] code into data encoded with an [n^F, k^F] code while maintaining desired decodability properties. As part of this framework, we also introduce convertible codes, a new class of code pairs that allow for code conversions in a resource-efficient manner.
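
A toy special case shows why conversion can be much cheaper than decoding and re-encoding. Using a single XOR parity in place of a general code (illustrative only; convertible codes handle MDS codes and broader parameter changes), merging two [3, 2] stripes into one [5, 4] stripe needs to read only the two old parities, never the data chunks:

    # Two [3, 2] XOR stripes -> one [5, 4] XOR stripe.
    d1, d2, d3, d4 = 0b1010, 0b0110, 0b1111, 0b0001   # data chunks

    p1 = d1 ^ d2                  # parity of stripe 1
    p2 = d3 ^ d4                  # parity of stripe 2

    p_new = p1 ^ p2               # merged parity, from parities alone
    assert p_new == d1 ^ d2 ^ d3 ^ d4                 # 2 reads instead of 4

The general framework in the talk asks when structure like this exists for real erasure codes, and at what cost in the code design.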


Fri, Apr 29 GHC 6501
Optimal Resource Allocation for Parallelizable Jobs

Presented by Ben Berg

Modern computer systems allow resources to be dynamically allocated to parallelizable jobs. When a job is parallelized across many servers or cores, it will complete more quickly. However, jobs typically receive diminishing returns from being allocated additional resources. Hence, given a fixed number of cores, it is not obvious how to dynamically allocate cores to a stream of incoming jobs in order to minimize the overall mean response time across jobs. For example, an optimal allocation policy must favor shorter jobs, but favoring any single job too heavily can cause the system to operate very inefficiently. Additionally, an optimal policy must decide how much, if at all, to favor more parallelizable jobs over less parallelizable jobs. In a variety of settings, we show how to derive an optimal allocation policy which minimizes mean response time. We then show that policies inspired by our theoretical results can be implemented in a modern database to reduce mean response time by a factor of 2.
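
A toy calculation makes the tension concrete (hypothetical numbers; the speedup function s(k) = √k is just one example of diminishing returns): with 8 cores and jobs of size 4 and 16, giving all cores to the short job first actually loses to an even split.

    import math

    def speedup(k):                     # diminishing returns: s(k) = sqrt(k)
        return math.sqrt(k)

    CORES, short, long_ = 8, 4.0, 16.0  # core count and job sizes (work units)

    # Policy 1: all cores to the short job, then all to the long one.
    t1 = short / speedup(CORES)
    t2 = t1 + long_ / speedup(CORES)
    favor_short = (t1 + t2) / 2

    # Policy 2: split evenly; when the short job finishes, give the
    # survivor every core.
    t1 = short / speedup(CORES / 2)
    done = t1 * speedup(CORES / 2)      # long job's progress so far
    t2 = t1 + (long_ - done) / speedup(CORES)
    even_split = (t1 + t2) / 2

    print(f"favor short: {favor_short:.2f}  even split: {even_split:.2f}")
    # favor short: 4.24  even split: 4.12 (concentrating cores wastes work)

Neither extreme is optimal in general, which is exactly the allocation question the talk addresses.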


Mon, May 02 GHC 6501
Learning-based Streaming Codes are Approximately Optimal for Variable-Size Frames

Presented by Michael Rudow

Providing a high quality of service for live communication is a pervasive challenge, plagued by bursts of packet losses during transmission. Streaming codes are designed explicitly to recover lost packets in real time for such low-latency streaming communication settings using minimal bandwidth. Applications such as live video streaming involve transmitting messages whose sizes vary over time due to compressing frames prior to transmission. The variability demands extra redundancy because consecutive large frames can be lost in a burst, leaving less bandwidth remaining for data. Mitigating the adverse effects of variability necessitates spreading the messages over multiple future packets. However, the optimal strategy for spreading depends on the sizes of future messages, which are not available because streaming codes operate in an "online" setting. Algebraic coding techniques alone are hence insufficient for designing optimal streaming codes. We present a learning-based approach that combines machine learning with algebraic coding techniques to design the first approximately optimal streaming codes for a range of parameter regimes suitable for practical applications.


Fri, May 06 GHC 6501 available
Mon, May 09 GHC 6501 available
Fri, May 13 GHC 6501 available
Mon, May 16 GHC 6501 available
Fri, May 20 GHC 6501 available


Fall 2022 Schedule

Mon, Aug 29 GHC 6501 available
Fri, Sep 02 GHC 6501 available
Mon, Sep 05 GHC 6501 available
Fri, Sep 09 GHC 6501 available
Mon, Sep 12 GHC 6501 available
Fri, Sep 16 GHC 6501 available
Mon, Sep 19 GHC 6501 available
Fri, Sep 23 GHC 6501 available
Mon, Sep 26 GHC 6501 available
Fri, Sep 30 GHC 6501 available
Mon, Oct 03 GHC 6501 available
Fri, Oct 07 GHC 6501 available
Mon, Oct 10 GHC 6501 available
Fri, Oct 14 GHC 6501 available
Mon, Oct 17 GHC 6501 available
Fri, Oct 21 GHC 6501 unavailable
Mon, Oct 24 GHC 6501 available
Fri, Oct 28 GHC 6501 available
Mon, Oct 31 GHC 6501 available
Fri, Nov 04 GHC 6501 available
Mon, Nov 07 GHC 6501 available
Fri, Nov 11 GHC 6501 available
Mon, Nov 14 GHC 6501 available
Fri, Nov 18 GHC 6501 available
Mon, Nov 21 GHC 6501 available
Fri, Nov 25 GHC 6501 available
Mon, Nov 28 GHC 6501 available
Fri, Dec 02 GHC 6501 available
Mon, Dec 05 GHC 6501 available
Fri, Dec 09 GHC 6501 available
Mon, Dec 12 GHC 6501 available
Fri, Dec 16 GHC 6501 available
Mon, Dec 19 GHC 6501 available
Fri, Dec 23 GHC 6501 available


General Info

The Student Seminar Series is an informal seminar for SCS graduate students to communicate ideas. Each meeting starts at noon and lasts one hour. Lunch is provided by the Computer Science Department (thanks to Debbie Cavlovich!). At each meeting, a different student speaker gives an informal, 40-minute talk about their research, followed by questions, suggestions, and brainstorming. We try to attract people with a diverse set of interests, and we encourage speakers to present at a very general, accessible level.

So why are we doing this, and why take part? In the best-case scenario, this will lead to some interesting cross-disciplinary work among people in different fields, and people may get some new ideas about their research. In the worst-case scenario, a few people will practice their public speaking and the rest get together for a free lunch.



Previous years

2021Fall, 2021Spring, 2020Fall, 2020Spring, 2019Fall, 2019Spring, 2018Fall, 2018Spring, 2017Fall, 2017Spring, 2016Fall, 2016Spring, 2015Fall, 2015Spring, 2014Fall, 2014Spring, 2013Fall, 2013Spring, 2012Fall, 2012Spring, 2011Fall, 2011Spring, 2010Fall, 2010Spring, 2009Fall, 2009Spring, 2008Fall, 2008Spring, 2007Fall, 2007Spring, 2006Fall, 2006Spring, 2005Fall, 2005Spring, 2004Fall, 2004Spring, 2003Fall, 2003Spring, 2002Fall, 2002Spring, 2001Fall, 2001Spring, 2000Fall, 2000Spring, 1999Fall, 1999Spring, 1998Fall, 1998Spring

SSS Coordinators

Juncheng Yang, CSD


Web contact: sss+www@cs