Large-scale cluster storage systems typically consist of a heterogeneous mix of storage devices with significantly varying failure rates. Despite such differences among devices, redundancy settings are generally configured in a one-scheme-for-all fashion. In this paper, we make a case for exploiting reliability heterogeneity to tailor redundancy settings to different device groups. We present HeART, an online tuning tool that guides selection of, and transitions between, redundancy settings for long-term data reliability, based on the observed reliability properties of each disk group. By processing disk failure data over time, HeART identifies the boundaries and steady-state failure rate of each deployed disk group (e.g., by make/model). Using this information, HeART suggests the most space-efficient redundancy option that achieves the specified target data reliability, requiring far fewer disks than one-scheme-for-all approaches.
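The core selection step described above can be sketched as follows. This is a hedged illustration, not HeART's actual reliability model: it scores each candidate (n, k) erasure-coding scheme with a deliberately simplified binomial approximation of annual data-loss probability (ignoring repair dynamics), using a disk group's observed annual failure rate (AFR), and picks the feasible scheme with the lowest storage overhead. The function names and the candidate schemes are illustrative assumptions.

```python
from math import comb

def annual_loss_prob(afr, n, k):
    """Simplified loss model: probability that more than n - k of n disks
    fail within a year, i.e. data loss for an (n, k) erasure code that
    tolerates n - k failures. Repair is ignored in this sketch."""
    tolerated = n - k
    return sum(comb(n, f) * afr**f * (1 - afr)**(n - f)
               for f in range(tolerated + 1, n + 1))

def pick_scheme(afr, schemes, target):
    """Return the most space-efficient (lowest n/k overhead) scheme whose
    estimated annual loss probability meets the target, else None."""
    feasible = [(n, k) for n, k in schemes
                if annual_loss_prob(afr, n, k) <= target]
    return min(feasible, key=lambda s: s[0] / s[1], default=None)
```

For example, at a 1% AFR and a 1e-6 loss target, a wide (14, 10) code can beat both a narrower (9, 6) code and 3-way replication on overhead while still meeting the target; a disk group with a much higher observed AFR may leave no feasible scheme, signaling a transition is needed.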
This work addresses a key challenge in Educational Data Mining, namely modeling student behavioral trajectories in order to identify the students most at risk, with the goal of providing supportive interventions. While many forms of data, including clickstream data and data from sensors, have been used extensively in time series models for such purposes, in this paper we explore the use of textual data, which is sometimes available in the records of students at large, online universities. We propose a time series model that constructs an evolving student state representation using both clickstream data and a signal extracted from the textual notes recorded by human mentors assigned to each student. We explore how the addition of this textual data improves both the predictive power of student states for identifying students at risk of course failure and the interpretability of insights about student course engagement processes.
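The state construction described above can be illustrated with a minimal sketch. This is not the paper's model: it stands in a hypothetical negative-keyword count for the learned text signal, and simply concatenates it with a weekly clickstream count to form one state vector per week; the keyword list and function names are assumptions for illustration only.

```python
# Hypothetical stand-in for a learned text signal: count of
# concern-related keywords in a mentor's note for the week.
NEGATIVE_KEYWORDS = {"struggling", "overwhelmed", "behind", "withdraw"}

def note_signal(note):
    """Crude text signal: number of negative keywords in the note."""
    return sum(w.strip(".,!?") in NEGATIVE_KEYWORDS
               for w in note.lower().split())

def weekly_state(clicks_per_week, notes_per_week):
    """One state vector per week: [clickstream count, text signal].
    A time series model would consume this sequence."""
    return [[c, note_signal(n)]
            for c, n in zip(clicks_per_week, notes_per_week)]
```

A sequence model over these vectors could then flag weeks where low click activity coincides with a rising text signal.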
Deep learning training accesses vast amounts of data at high velocity, posing bandwidth challenges for datasets retrieved over commodity networks and storage devices. A common approach to reducing bandwidth involves resizing or compressing data prior to training. We introduce Progressive Compressed Records (PCRs), a method that dynamically reduces the overhead of fetching and transporting data. PCRs deviate from previous storage formats by combining progressive compression with an efficient on-disk layout to view a single dataset at multiple fidelities -- all without adding to the total dataset size. We show that the amount of compression a dataset can tolerate depends on the training task at hand. We then show that PCRs enable tasks to readily access appropriate levels of compression at runtime -- resulting in a 2x speedup in training time on average.
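The layout idea behind PCRs can be sketched as follows. This is a hedged toy illustration, not the published format: the scans of each image produced by progressive compression are grouped by fidelity level, so that reading a prefix of the record yields every image at a lower fidelity. The scan bytes and function names here are placeholders.

```python
def build_record(images_scans):
    """images_scans: one list of scan bytes per image, ordered from the
    coarsest fidelity level to the finest. Returns the concatenated
    record plus the byte offset where each fidelity level ends."""
    levels = len(images_scans[0])
    record, offsets = b"", []
    for lvl in range(levels):            # level-major layout
        for scans in images_scans:
            record += scans[lvl]
        offsets.append(len(record))      # end of level lvl in the record
    return record, offsets

def read_at_fidelity(record, offsets, level):
    """A single sequential prefix read fetches levels 0..level for ALL
    images -- no per-image seeks, no extra copies of the dataset."""
    return record[:offsets[level]]
```

Because each fidelity is a prefix of the next, a training task that tolerates heavier compression simply stops reading earlier, which is how a format like this can trade fidelity for I/O at runtime.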