
SCS Honors Senior Thesis 2023 (Pittsburgh)

Senior thesis presentations will take place on Wednesday, May 3.
GHC 4405
(Zoom available to CMU attendees who are not in person)
All times are Eastern Daylight Time.

  10:30AM     Gavin Zhu     WORK IN PROGRESS: Incorporating Instructive Feedback in Human-AI Interaction   
  10:40AM     Kai Franz     Batched Database User-Defined Functions   
  11:00AM     Rae Ying Yee Wong     A Distributed Inference System for Deep Neural Networks   
  11:20AM     Jeff Tan     Distilling Neural Fields for Real-Time Articulated Shape Reconstruction   
  11:40AM     Erica Chiang     Characterizing the Composition of Social Media Narratives   
  12:00PM     POSTER SESSION     [GHC 4th Floor]   
  2:00PM     Naomi Spargo     A Store-Passing Translation for General References   
  2:20PM     Prashanti Anderson     Towards Partial Clustering of Zero-Mean Gaussian Mixtures   
  2:40PM     Jhih-Yi Hsieh     On Trade-Offs Between Fairness, Robustness, and Privacy Through Tilted Losses   
  3:00PM     Rohan Pandey     Semantic Composition in Visually Grounded Language Models   
  3:20PM     Konwoo Kim     Learning Shared Safety Constraints from Multi-task Demonstrations   


ABSTRACTS (alphabetical order by presenter)

Prashanti Anderson, paanders
Research Advisor(s): Pravesh Kothari
Towards Partial Clustering of Zero-Mean Gaussian Mixtures
Learning mixtures of Gaussians is one of the core problems of modern statistics. Recent advancements at the intersection of modern statistics and theoretical computer science give algorithms that can learn d-dimensional mixtures of k Gaussians with d^O(k) samples and time. Such sample complexity is necessary in the general case, as there exist families of mixtures that require d^O(k) samples. However, this family of hard mixtures is the only known hard family, and many natural mixtures do not fall under it. One such category of instances is zero-mean mixtures, for which no prohibitive lower bound is known. We provide a d^f(k) algorithm for partial clustering of zero-mean mixtures when the mixture exhibits “max-vs-rest” spectral separation, a special case of general separability assumptions.

Erica Chiang, eschiang
Research Advisor(s): Kathleen Carley
Characterizing the Composition of Social Media Narratives
The internet has made it easier for entities to distribute information to large numbers of people in a way that would not have been possible with physical media. This shift has introduced an opportunity for both state and non-state actors to attack online networks and actively manipulate the beliefs and ideas that spread within communities, threatening to divide the society, culture, and values of nations and organizations. In this project, I aim to address an open problem regarding the lack of concrete knowledge about the behavior of specific categories of actors (bots, news agencies, etc.) that send messages or launch attacks in online networks. The project explores whether there are significant differences in the types of messaging that different actors tend to use, and whether this contributes to their influence over online communities. A stronger ability to characterize actors in a social network and the types of messaging that they use will provide insight into the strengths and general behaviors of each type of actor, better equipping the research community to explore techniques for mitigating the spread of disinformation.

Kai Franz, kfranz
Research Advisor(s): Samuel Arch, Todd Mowry, Andy Pavlo
Batched Database User-Defined Functions
Database management systems (DBMSs) are typically queried in the Structured Query Language (SQL). SQL's declarative nature allows the DBMS to utilize a suite of optimization techniques to efficiently execute a given query. While SQL is effective for fast query execution, it requires users to write code in terms of relations (e.g. joins, projections, and aggregates), making it unwieldy for implementing procedural algorithms. To address this shortcoming, DBMSs allow users to write functions using an extension to SQL called procedural SQL (PL/SQL), which allows users to mix SQL with statement-by-statement imperative execution, branching, and loop constructs. The convenience of PL/SQL results in billions of daily uses; however, queries using PL/SQL experience significant slowdowns due to (1) the overhead of repeatedly context switching between SQL and PL/SQL interpreters, and (2) the DBMS's inefficiency in optimizing a mix of procedural and relational logic. To remedy this problem, we propose a method for batching procedural SQL functions. This simple enhancement both reduces the number of context switches and allows the database to effectively optimize procedural queries.
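The batching intuition can be sketched in a few lines of Python (a toy model, not the thesis system: the "context switch" between the SQL engine and the UDF interpreter is modeled as a fixed per-call overhead, and the UDF is a hypothetical stand-in scalar function):

```python
# Toy illustration (hypothetical, not the thesis implementation): per-row vs.
# batched user-defined function (UDF) evaluation. Each boundary crossing
# between the SQL engine and the PL/SQL interpreter is modeled as a fixed cost.

CONTEXT_SWITCH_COST = 1.0  # arbitrary units per SQL <-> UDF interpreter crossing

def udf(x):
    """A scalar user-defined function applied to one row's value."""
    return x * 2 + 1

def run_per_row(rows):
    """Classic execution: one interpreter context switch per row."""
    overhead = 0.0
    results = []
    for x in rows:
        overhead += CONTEXT_SWITCH_COST  # enter the UDF interpreter for this row
        results.append(udf(x))
    return results, overhead

def run_batched(rows):
    """Batched execution: one context switch for the whole batch of rows."""
    overhead = CONTEXT_SWITCH_COST       # enter the interpreter once
    results = [udf(x) for x in rows]     # evaluate the function over the batch
    return results, overhead

rows = list(range(1000))
per_row_results, per_row_overhead = run_per_row(rows)
batched_results, batched_overhead = run_batched(rows)
assert per_row_results == batched_results   # same answers either way
assert batched_overhead < per_row_overhead  # far less switching overhead
```

The sketch only models the first source of slowdown named above (context switching); the second benefit, letting the optimizer reason over whole-batch operations, has no analogue in a toy like this.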

Jhih-Yi Hsieh, jhihyih
Research Advisor(s): Virginia Smith, Tian Li
On Trade-Offs Between Fairness, Robustness, and Privacy Through Tilted Losses
Fairness, robustness, and privacy are topics of concern for a wide range of applications in machine learning (ML). While prior works have focused on one or two of these aspects, the trade-offs among all three tightly coupled aspects are underexplored. In this thesis, we investigate the connections between three metrics (fairness in terms of representation disparity, robustness to malicious training samples, and differential privacy) under a unified framework based on exponential tilting. More specifically, we propose a private training algorithm to optimize the tilted losses proposed in prior literature, which characterize robustness/fairness trade-offs. On a set of convex and non-convex models, our empirical results suggest that differential privacy is at odds with the benefits of tilting (i.e., promoting fairness or robustness). We also demonstrate that there is a trade-off between the effectiveness of tilting and the cost of privacy noise.
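The exponentially tilted aggregate from the prior literature referenced here can be sketched concretely (a minimal Python illustration; the loss values and tilt parameters below are hypothetical, and the private training algorithm itself is not shown):

```python
import math

def tilted_loss(losses, t):
    """Exponentially tilted aggregate of per-sample losses, as in the tilted
    ERM literature:
        L_t = (1/t) * log( (1/N) * sum_i exp(t * l_i) )
    t > 0 emphasizes the largest per-sample losses (promoting fairness across
    samples); t < 0 suppresses them (robustness to outliers); t -> 0 recovers
    the ordinary mean. A max-shift (log-sum-exp trick) keeps this stable."""
    n = len(losses)
    m = max(t * l for l in losses)
    lse = m + math.log(sum(math.exp(t * l - m) for l in losses) / n)
    return lse / t

losses = [0.1, 0.2, 5.0]
mean = sum(losses) / len(losses)
assert tilted_loss(losses, 10.0) > mean    # positive tilt leans toward the worst loss
assert tilted_loss(losses, -10.0) < mean   # negative tilt leans toward the best
```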

Konwoo Kim, konwook
Research Advisor(s): Steven Wu, Gokul Swamy
Learning Shared Safety Constraints from Multi-task Demonstrations
As robots move from the lab to the real world, it becomes increasingly important for us to ensure they behave safely. Unfortunately, manually specifying what safe behavior is to a robot is often a time-consuming and error-prone process. In this thesis, we take a first step towards automatically learning safety constraints from data. We approach the problem via an extension of inverse reinforcement learning (IRL) techniques. Traditionally in IRL, one is given demonstrations of desired behavior and tries to extract a reward function that would make the demonstrated behavior optimal. Instead, we assume that we are given access to optimal safe behavior and the task reward and try to extract the constraint that the expert was satisfying. Intuitively, we learn constraints that forbid highly rewarding behavior that the expert could have taken but chose not to. For example, we would learn to forbid crashing into cyclists, even though that would lead to us reaching our destination faster. Unfortunately, the constraint learning problem is rather ill-posed and prior work often learns overly conservative constraints that forbid all behavior that the expert did not take, limiting practical applicability. We therefore propose a multi-task extension of inverse constraint learning that is able to leverage diverse data to learn more reasonable constraints. We validate our method with simulation experiments on high-dimensional continuous control tasks.
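The two key ideas above, forbidding highly rewarding behavior the expert never took, and using multi-task data to avoid overly conservative constraints, can be illustrated with a toy discrete sketch (hypothetical, not the thesis algorithm, which works in high-dimensional continuous control):

```python
# Toy sketch of the stated intuition (hypothetical, not the thesis method):
# candidate-forbidden actions are those at least as rewarding as the expert's
# best choice yet never demonstrated; intersecting candidates across tasks
# keeps only the constraints shared by all of them, trimming the overly
# conservative constraints that single-task data can produce.

def infer_constraint(tasks, expert_actions_per_task, reward):
    """Return the actions forbidden in every task: highly rewarding actions
    that the expert could have taken but never did."""
    candidate_sets = []
    for task, expert_actions in zip(tasks, expert_actions_per_task):
        best_expert = max(reward(task, a) for a in expert_actions)
        candidates = {a for a in task["actions"]
                      if reward(task, a) >= best_expert and a not in expert_actions}
        candidate_sets.append(candidates)
    return set.intersection(*candidate_sets)

# Two driving-flavored toy tasks: "crash" is tempting (highest reward, i.e.
# fastest) in both tasks but never demonstrated, so it is inferred to be the
# shared safety constraint.
tasks = [
    {"actions": {"slow", "fast", "crash"}, "reward": {"slow": 1, "fast": 2, "crash": 3}},
    {"actions": {"slow", "fast", "crash"}, "reward": {"slow": 2, "fast": 1, "crash": 3}},
]
experts = [{"fast"}, {"slow"}]
assert infer_constraint(tasks, experts, lambda t, a: t["reward"][a]) == {"crash"}
```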

Rohan Pandey, rspandey
Research Advisor(s): Louis-Philippe Morency
Semantic Composition in Visually Grounded Language Models
What is sentence meaning and its ideal representation? Much of the expressive power of human language derives from semantic composition, the mind’s ability to represent meaning relationally & hierarchically over constituents. At the same time, much sentential meaning is outside the text and requires grounding in sensory, motor, and experiential modalities to be adequately learned. Although large language models display considerable compositional ability, recent work shows that visually-grounded language models drastically fail to represent compositional structure. In this thesis, we explore whether & how models compose visually grounded semantics, and how we might improve their ability to do so.
First, we propose WinogroundVQA to test the ability of generative vision-language foundation models to capture compositional distinctions, building on previous work that only tested image-text matching models.
Next, we propose methods for understanding how models perform semantic composition. Syntactic Neural Module Distillation enables us to test whether a sentence’s syntax tree is a strong causal model of its embedding’s composition. We build a simple visualization tool to observe if a transformer’s attention activation flow, seen as its causal structure, aligns with a sentence’s syntax, a symbolic model of its composition. We then apply this intuition more rigorously to adapt a mechanistic interpretability approach, causal tracing, to locate neural representations important for composition in image captioning models.
Having better understood composition in visually grounded models, we now explore approaches for improving their ability. To inject a compositional inductive bias into unimodal embeddings, we propose the Syntactic MeanPool which improves CLIP’s image-text match score on Winoground. We propose CACR, a self-supervised objective that encourages models to represent unimodal relations in a way that better enables cross-modal relation alignment, improving on the current Winoground state-of-the-art. We also apply a multimodal information theory framework by developing a training objective that allows vision & language representations to compose more synergistically.
In closing, we draw connections to the cognitive sciences. We explore how some of the behaviors we observe in models may have parallels in the brain since multimodal semantic representations are localized in the left anterior temporal lobe. We discuss vision-language compositional ability from a psycholinguistic perspective by testing a time-constrained version of Winoground on humans. We finally conjecture a multimodal semantics framework to formalize compositional behavior in vision-language embedding space that draws inspiration from homotopy type theory. We conclude with a discussion of future work needed in both deep learning and cognitive science to fully understand how the structure of language is projected onto the world, and the world is reflected in language.

Naomi Spargo, nspargo
Research Advisor(s): Karl Crary
A Store-Passing Translation for General References
Many programming languages allow the manipulation of a global, dynamically allocated memory store. Although individual memory locations may be typed after they are allocated, the entire store is untyped, meaning that programmers have no compile-time guarantees about its use. This manifests practically in the programmer’s struggle to find and debug memory errors at runtime. The theoretical challenge in typing the store results from the store’s inherently circular nature. To see this circularity, try to type an index l into the store. Depending on the current type of other memory locations, l could have type ref int, ref bool, etc. Put another way, the elements of type ref int are different in different store configurations. Types must therefore be parameterized by “the current type of locations in memory”, or, simplistically, a map from naturals (indexes into the store) to types. We then have the recursive equation type = (nat → type) → type. Most type theories do not have a type which satisfies equations of this form, and so most type theories cannot completely represent the interplay between reference types and the memory store. Dr. Crary has designed a type theory, called Istari, which he intended to be powerful enough to type the store. For my thesis, I have verified in Coq that this is so. Dr. Crary has recently finished a theorem prover built on top of Istari; by the end of my thesis, I aim to prove my metatheoretic result about Istari in Istari itself.
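The store-passing idea itself can be made concrete in any language: under a store-passing translation, each effectful operation takes the current store and returns its result paired with an updated store. A minimal Python sketch (purely illustrative; the thesis works inside the Istari type theory, not a programming language):

```python
# Store-passing style, sketched in Python (hypothetical illustration): the
# store is an explicit map from locations (naturals) to values, threaded
# through every operation instead of living as hidden mutable state.

def alloc(store, value):
    """Allocate a fresh location holding `value`; return (location, store')."""
    loc = len(store)                      # fresh index into the store
    return loc, {**store, loc: value}

def read(store, loc):
    """Dereference a location; the store passes through unchanged."""
    return store[loc], store

def write(store, loc, value):
    """Update a location; return a unit result and the new store."""
    return None, {**store, loc: value}

# A small program: let r = ref 1 in (r := !r + 1; !r)
store = {}
r, store = alloc(store, 1)
v, store = read(store, r)
_, store = write(store, r, v + 1)
result, store = read(store, r)
assert result == 2
```

Note that the circularity discussed above does not bite in untyped Python; it only appears when one tries to assign a *type* to the store itself, which is exactly what Istari is designed to handle.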

Jeff Tan, jefftan
Research Advisor(s): Gengshan Yang, Deva Ramanan
Distilling Neural Fields for Real-Time Articulated Shape Reconstruction
We present a method for reconstructing articulated 3D models from videos in real-time, without test-time optimization or manual 3D supervision at training time. Prior work often relies on pre-built deformable models (e.g. SMAL/SMPL), or slow per-scene optimization through differentiable rendering (e.g. dynamic NeRFs). Such methods fail to support arbitrary object categories, or are unsuitable for real-time applications. To address the challenge of collecting large-scale 3D training data for arbitrary deformable object categories, our key insight is to use off-the-shelf video-based dynamic NeRFs as 3D supervision to train a fast feed-forward network, turning 3D shape and motion prediction into a supervised distillation task. Our temporal-aware network uses articulated bones and blend skinning to represent arbitrary deformations, and is self-supervised on video datasets without requiring 3D shapes or viewpoints as input. Through distillation, our network learns to 3D-reconstruct unseen articulated objects at interactive frame rates. Our method yields higher-fidelity 3D reconstructions than prior real-time methods for animals, with the ability to render realistic images at novel viewpoints and poses.

Rae Ying Yee Wong, yingyeew
Research Advisor(s): Zhihao Jia
A Distributed Inference System for Deep Neural Networks
This research explores distributed inference systems for deep neural networks to minimize latency without sacrificing accuracy. One application is in dynamic DNNs, where load imbalance can be a significant problem. For instance, the Mixture-of-Experts model, which has become increasingly popular for scaling parameter count, suffers from load imbalance and excessive all-to-all communication. We thus built an asynchronous inference system with better task allocation. Another strategy explored is writing efficient kernels with explicit caching. Finally, we investigated speculative inference, where a smaller model generates possible outputs for the large model to verify. This reduces latency because verifying possible outputs exposes more parallelism than computing them from scratch. Our system is benchmarked using popular language models such as the recently released LLaMA model.
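The speculative inference loop can be sketched with toy models (hypothetical draft/verify functions, not the thesis system; real implementations score all drafted tokens in a single parallel forward pass of the large model, which the loop below only simulates sequentially):

```python
# Illustrative sketch of speculative inference with toy deterministic models:
# a cheap "draft" model proposes several tokens, the large model verifies
# them, and the longest agreeing prefix is accepted plus one corrected token.

def draft_model(prefix, k):
    """Cheap model: propose the next k tokens (toy rule: previous + 1 mod 10)."""
    out = list(prefix)
    for _ in range(k):
        out.append((out[-1] + 1) % 10)
    return out[len(prefix):]

def large_model_next(prefix):
    """Expensive model's true next token (toy rule that mostly agrees with
    the draft model but occasionally diverges)."""
    nxt = (prefix[-1] + 1) % 10
    return 0 if len(prefix) % 5 == 0 else nxt

def speculative_step(prefix, k=4):
    """Verify k drafted tokens against the large model; accept the agreeing
    prefix, and on the first disagreement keep the large model's token."""
    proposed = draft_model(prefix, k)
    accepted = []
    for tok in proposed:
        true_tok = large_model_next(prefix + accepted)
        if tok != true_tok:
            accepted.append(true_tok)   # large model overrides the draft
            break
        accepted.append(tok)
    return prefix + accepted

seq = speculative_step([1, 2, 3])
assert seq[:3] == [1, 2, 3] and len(seq) > 3  # several tokens per verify step
```

The latency win comes from the verification pass: checking k drafted tokens costs roughly one large-model step instead of k sequential ones, as long as the draft model usually agrees.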

Feiyu Gavin Zhu, feiyuz
Research Advisor: Reid Simmons
WORK IN PROGRESS: Incorporating Instructive Feedback in Human-AI Interaction
Human-in-the-loop learning (HiLL) is commonly used in human-AI interaction to exchange hidden mental state between the human partner and the AI agent. However, existing HiLL agents can only make use of limited types of feedback from humans, requiring multiple interactions with the human partner to understand seemingly simple concepts. In this study, we aim to introduce a new interaction mechanism for receiving and incorporating general instructions from humans. In this way, the human can provide the agent with more generalized feedback, which is hypothesized to accelerate learning. To develop AI agents with this capability, we plan to draw insights from cognitive architecture research and develop a production system framework. This framework consists of a set of production rules that can be matched and applied, and a memory system resembling human memory. We plan to evaluate this framework by comparing it to existing methods through a user study in a simulated highway environment. We expect that an agent using our framework can learn to drive safely with fewer interactions than existing baseline methods, and that the human partner will perceive it as a more intelligent learner.