Senior thesis presentations will take place on
Wednesday, May 3.
GHC 4405
(Zoom available to CMU attendees who are not in person)
All times are Eastern Daylight Time.
10:30AM | Gavin Zhu | WORK IN PROGRESS: Incorporating Instructive Feedback in Human-AI Interaction
10:40AM | Kai Franz | Batched Database User-Defined Functions |
11:00AM | Rae Ying Yee Wong | A Distributed Inference System for Deep Neural Networks |
11:20AM | Jeff Tan | Distilling Neural Fields for Real-Time Articulated Shape Reconstruction |
11:40AM | Erica Chiang | Characterizing the Composition of Social Media Narratives |
12:00PM | POSTER SESSION | GHC 4th Floor
2:00PM | Naomi Spargo | A Store-Passing Translation for General References |
2:20PM | Prashanti Anderson | Towards Partial Clustering of Zero-Mean Gaussian Mixtures |
2:40PM | Jhih-Yi Hsieh | On Trade-Offs Between Fairness, Robustness, and Privacy Through Tilted Losses |
3:00PM | Rohan Pandey | Semantic Composition in Visually Grounded Language Models |
3:20PM | Konwoo Kim | Learning Shared Safety Constraints from Multi-task Demonstrations |
ABSTRACTS (alphabetical order by presenter)
Prashanti Anderson, paanders
Research Advisor(s): Pravesh Kothari
Towards Partial Clustering of Zero-Mean Gaussian Mixtures
Learning mixtures of Gaussians is one of the core problems of modern statistics. Recent advancements at the intersection of modern statistics and theoretical computer science give algorithms that can learn d-dimensional mixtures of k Gaussians with d^{O(k)} samples and time. Such sample complexity is necessary in the general case, as there exist families of mixtures that require d^{O(k)} samples.
However, this is the only known hard family, and many natural mixtures fall outside it. One such category of instances is zero-mean mixtures, for which no prohibitive lower bound is known. We provide a d^{f(k)} algorithm for partial clustering of zero-mean mixtures when the mixture exhibits “max-vs-rest” spectral separation, a special case of general separability assumptions.
Erica Chiang, eschiang
Research Advisor(s): Kathleen Carley
Characterizing the Composition of Social Media Narratives
The internet has made it easier for entities to distribute information to large numbers
of people, in a way that would not have been possible with physical media. This shift
has introduced an opportunity for both state and non-state actors to attack online
networks and actively manipulate the beliefs and ideas that are spread within
communities, threatening consequences such as the fracturing of the societies, cultures, and values of nations or organizations. In this project, I aim to address an open problem: the lack of concrete knowledge about the behavior of specific categories of actors (bots, news agencies, etc.) that send messages or launch attacks in online networks. The project explores whether there are significant differences in the types of messaging that different actors tend to use, and whether these differences contribute to their influence over online communities. A stronger ability to characterize actors in a social network and the types of messaging they use will provide insight into the strengths and general behaviors of each type of actor, better equipping the research community to explore techniques for mitigating the spread of disinformation.
Kai Franz, kfranz
Research Advisor(s): Samuel Arch, Todd Mowry, Andy Pavlo
Batched Database User-Defined Functions
Database management systems (DBMSs) are typically queried in the Structured Query
Language (SQL). SQL's declarative nature allows the DBMS to utilize a suite of optimization
techniques to efficiently execute a given query. While SQL is effective for fast query execution,
it requires users to write code in terms of relations (e.g. joins, projections, and aggregates),
making it unwieldy for implementing procedural algorithms. To address this shortcoming,
DBMSs allow users to write functions using an extension to SQL called procedural SQL
(PL/SQL), which allows users to mix SQL with statement-by-statement imperative execution,
branching, and loop constructs. The convenience of PL/SQL results in billions of daily uses;
however, queries using PL/SQL experience significant slowdowns due to (1) the overhead of
repeatedly context switching between SQL and PL/SQL interpreters, and (2) the DBMS's
inefficiency in optimizing a mix of procedural and relational logic. To remedy this problem, we
propose a method for batching procedural SQL functions. This simple enhancement both
reduces the number of context switches and allows the database to effectively optimize
procedural queries.
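To illustrate the batching idea concretely, here is a toy cost model in Python (illustrative only, not the proposed system; all names and costs are hypothetical). Each UDF invocation crosses the SQL/PL-SQL interpreter boundary; batching pays that crossing once per batch rather than once per row:

```python
# Toy model of per-row vs. batched UDF evaluation. The "context switch"
# is simulated as a fixed per-call overhead; real DBMS costs are more complex.

CROSSING_COST = 1.0  # hypothetical per-call context-switch overhead

def udf(price: float) -> float:
    """A scalar UDF, applied to one row at a time."""
    return price * 1.08

def udf_batched(prices: list[float]) -> list[float]:
    """The same logic applied to a whole batch in one call."""
    return [p * 1.08 for p in prices]

def crossings(n_rows: int, batch_size: int = 1) -> float:
    """Interpreter crossings needed to process n_rows rows."""
    return -(-n_rows // batch_size) * CROSSING_COST  # ceiling division

rows = [10.0, 20.0, 30.0]
assert [udf(r) for r in rows] == udf_batched(rows)  # same results either way
print(crossings(1_000_000))              # 1,000,000 crossings, row-at-a-time
print(crossings(1_000_000, 10_000))      # 100 crossings when batched
```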
Jhih-Yi Hsieh, jhihyih
Research Advisor(s): Virginia Smith, Tian Li
On Trade-Offs Between Fairness, Robustness, and Privacy
Through Tilted Losses
Fairness, robustness, and privacy are topics of concern for a wide range of applications in machine
learning (ML). While prior works have focused on one or two of these aspects, the trade-offs between
all three tightly coupled aspects are underexplored. In this thesis, we investigate the connections between three metrics (fairness in terms of representation disparity, robustness to malicious training samples, and differential privacy) under a unified framework based on exponential tilting. More specifically, we propose a private training algorithm for optimizing the tilted losses proposed in prior literature, which characterize robustness/fairness trade-offs. On a set of convex and non-convex models, our empirical results suggest
that differential privacy is at odds with the benefits of tilting (i.e., promoting fairness or robustness). We
also demonstrate that there is a trade-off between the effectiveness of tilting and the cost of privacy noise.
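For reference, the tilted loss from the prior literature this abstract builds on (tilted empirical risk minimization) replaces the average loss with a log-mean-exp. A minimal numpy sketch follows; the differentially private training step itself is omitted:

```python
import numpy as np

def tilted_loss(per_sample_losses: np.ndarray, t: float) -> float:
    """Tilted empirical risk from prior work (TERM):
        L(t) = (1/t) * log( mean_i exp(t * l_i) ).
    t > 0 magnifies the worst per-sample losses (promoting fairness);
    t < 0 suppresses them (promoting robustness to outliers);
    t -> 0 recovers the ordinary average loss.
    """
    if t == 0.0:
        return float(per_sample_losses.mean())
    z = t * per_sample_losses
    m = z.max()  # stabilized log-mean-exp
    return float((m + np.log(np.exp(z - m).mean())) / t)

losses = np.array([0.1, 0.2, 5.0])   # one outlier sample
print(tilted_loss(losses, 0.0))      # ~1.77: plain average
print(tilted_loss(losses, 5.0))      # ~4.78: dominated by the outlier
print(tilted_loss(losses, -5.0))     # ~0.22: outlier suppressed
```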
Konwoo Kim, konwook
Research Advisor(s): Steven Wu, Gokul Swamy
Learning Shared Safety Constraints from Multi-task Demonstrations
As robots move from the lab to the real world, it becomes increasingly important for
us to ensure they behave safely. Unfortunately, manually specifying safe behavior for a robot is often a time-consuming and error-prone process. In this thesis, we take a first step
towards automatically learning safety constraints from data. We approach the problem via an
extension of inverse reinforcement learning (IRL) techniques. Traditionally in IRL, one is given
demonstrations of desired behavior and tries to extract a reward function that would make the
demonstrated behavior optimal. Instead, we assume that we are given access to optimal safe
behavior and the task reward and try to extract the constraint that the expert was satisfying.
Intuitively, we learn constraints that forbid highly rewarding behavior that the expert could have
taken but chose not to. For example, we would learn to forbid crashing into cyclists, even though
that would lead to us reaching our destination faster. Unfortunately, the constraint learning
problem is rather ill-posed and prior work often learns overly conservative constraints that forbid
all behavior that the expert did not take, limiting practical applicability. We therefore propose a
multi-task extension of inverse constraint learning that is able to leverage diverse data to learn
more reasonable constraints. We validate our method with simulation experiments on
high-dimensional continuous control tasks.
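The alternating structure this suggests can be sketched on a toy problem (illustrative only, not the thesis algorithm): trajectories are feature vectors, the shared constraint is linear, and each iteration best-responds per task and then pushes the constraint toward forbidding learner behavior the experts avoided:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: "trajectories" are feature vectors. Each task has its own
# reward weights; one hidden safety constraint is shared across tasks.
d, n_traj, n_tasks = 5, 200, 4
trajs = rng.normal(size=(n_traj, d))
true_theta = rng.normal(size=d)               # hidden constraint
safe = trajs @ true_theta <= 0.0              # ground-truth safety labels
task_rewards = rng.normal(size=(n_tasks, d))  # per-task reward weights

def best_response(w: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Highest-reward trajectory deemed safe by the current estimate."""
    feasible = trajs @ theta <= 0.0
    scores = np.where(feasible, trajs @ w, -np.inf)
    return trajs[np.argmax(scores)]

# Expert demos: optimal among the trajectories that are truly safe.
experts = np.array([
    trajs[np.argmax(np.where(safe, trajs @ w, -np.inf))]
    for w in task_rewards
])

theta, lr = np.zeros(d), 0.5
for _ in range(100):
    learners = np.array([best_response(w, theta) for w in task_rewards])
    # Push the constraint up on learner behavior and down on expert
    # behavior: forbid rewarding actions the experts chose not to take.
    theta += lr * (learners.mean(axis=0) - experts.mean(axis=0))

recovered = np.array([best_response(w, theta) for w in task_rewards])
print([bool(np.allclose(l, e)) for l, e in zip(recovered, experts)])
```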
Rohan Pandey, rspandey
Research Advisor(s): Louis-Philippe Morency
Semantic Composition in Visually Grounded Language Models
What is sentence meaning and its ideal representation? Much of the expressive power
of human language derives from semantic composition, the mind’s ability to represent
meaning relationally & hierarchically over constituents. At the same time, much sentential
meaning is outside the text and requires grounding in sensory, motor, and experiential
modalities to be adequately learned. Although large language models display considerable
compositional ability, recent work shows that visually grounded language models drastically fail to represent compositional structure. In this thesis, we explore whether & how
models compose visually grounded semantics, and how we might improve their ability to
do so.
First, we propose WinogroundVQA to test the ability of generative vision-language
foundation models to capture compositional distinctions, building on previous work that
only tested image-text matching models.
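For context, the image-text matching setup that Winoground-style benchmarks probe can be reproduced in a few lines with CLIP via the Hugging Face transformers API (the image path below is a placeholder):

```python
# A compositional pair differs only in word order; a matcher that ignores
# composition scores both captions roughly alike.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")  # placeholder image
captions = [
    "some plants surrounding a lightbulb",  # Winoground-style pair:
    "a lightbulb surrounding some plants",  # same words, swapped roles
]

inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
logits = model(**inputs).logits_per_image  # shape (1, 2)
print(logits.softmax(dim=-1))  # does the model prefer the correct caption?
```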
Next, we propose methods for understanding how models perform semantic composition. Syntactic Neural Module Distillation enables us to test whether a sentence’s syntax tree is a strong causal model of its embedding’s composition. We build a simple visualization tool to observe if a transformer’s attention activation flow, seen as its causal structure,
aligns with a sentence’s syntax, a symbolic model of its composition. We then apply this
intuition more rigorously to adapt a mechanistic interpretability approach, causal tracing,
to locate neural representations important for composition in image captioning models.
Having better understood composition in visually grounded models, we now explore
approaches for improving their ability. To inject a compositional inductive bias into unimodal embeddings, we propose the Syntactic MeanPool, which improves CLIP’s image-text match score on Winoground. We propose CACR, a self-supervised objective that encourages models to represent unimodal relations in a way that better enables cross-modal
relation alignment, improving on the current Winoground state-of-the-art. We also apply
a multimodal information theory framework by developing a training objective that allows
vision & language representations to compose more synergistically.
In closing, we draw connections to the cognitive sciences. We explore how some of the
behaviors we observe in models may have parallels in the brain since multimodal semantic
representations are localized in the left anterior temporal lobe. We discuss vision-language
compositional ability from a psycholinguistic perspective by testing a time-constrained version of Winoground on humans. We finally conjecture a multimodal semantics framework to formalize compositional behavior in vision-language embedding space that draws inspiration from homotopy type theory. We conclude with a discussion of future work needed in
both deep learning and cognitive science to fully understand how the structure of language
is projected onto the world, and the world is reflected in language.
Naomi Spargo, nspargo
Research Advisor(s): Karl Crary
A Store-Passing Translation for General References
Many programming languages allow the manipulation of a global, dynamically allocated memory store. Although individual memory locations may be typed after they are allocated, the entire store is untyped, meaning that programmers have no compile-time guarantees about its use. This manifests practically in the programmer’s struggle to find and debug memory errors at runtime. The theoretical challenge in typing the store results from the store’s inherently circular nature. To see this circularity, try to type an index l into the store. Depending on the current type of other memory locations, l could have type ref int, ref bool, etc. Put another way, the elements of type ref int are different in different store configurations. Types must therefore be parameterized by “the current type of locations in memory”, or, simplistically, a map from naturals (indexes into the store) to types. We then have the recursive equation type = (nat → type) → type. Most type theories do not have a type which satisfies equations of this form, and so most type theories cannot completely represent the interplay between reference types and the memory store. Dr. Crary has designed a type theory, called Istari, which he intended to be powerful enough to type the store. For my thesis, I have verified in Coq that this is so. Dr. Crary has recently finished a theorem prover built on top of Istari; by the end of my thesis, I aim to prove my metatheoretic result about Istari in Istari itself.
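To make the untyped-store failure mode concrete, here is a small illustrative Python sketch (unrelated to Istari itself) in which the store is a bare map from naturals to values, so a location's intended type is only a programmer convention:

```python
# The store: a global map from naturals (locations) to values. Locations
# are "typed" only by convention, so type errors surface at runtime.
store: dict[int, object] = {}
next_loc = 0

def alloc(value: object) -> int:
    """Allocate a fresh location holding `value`; return its index."""
    global next_loc
    loc = next_loc
    next_loc += 1
    store[loc] = value
    return loc

l = alloc(42)        # intended as a "ref int"
store[l] = "hello"   # elsewhere, the same cell is reused as a "ref str"
try:
    print(store[l] + 1)  # no compile-time guarantee prevents this
except TypeError as e:
    print("runtime memory-typing error:", e)
```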
Jeff Tan, jefftan
Research Advisor(s): Gengshan Yang, Deva Ramanan
Distilling Neural Fields for Real-Time Articulated Shape Reconstruction
We present a method for reconstructing articulated 3D models from videos in real-time, without test-time optimization or manual 3D supervision at training time. Prior work often relies on pre-built deformable models (e.g. SMAL/SMPL), or slow per-scene optimization through differentiable rendering (e.g. dynamic NeRFs). Such methods fail to support arbitrary object categories, or are unsuitable for real-time applications. To address the challenge of collecting large-scale 3D training data for arbitrary deformable object categories, our key insight is to use off-the-shelf video-based dynamic NeRFs as 3D supervision to train a fast feed-forward network, turning 3D shape and motion prediction into a supervised distillation task. Our temporal-aware network uses articulated bones and blend skinning to represent arbitrary deformations, and is self-supervised on video datasets without requiring 3D shapes or viewpoints as input. Through distillation, our network learns to 3D-reconstruct unseen articulated objects at interactive frame rates. Our method yields higher-fidelity 3D reconstructions than prior real-time methods for animals, with the ability to render realistic images at novel viewpoints and poses.
Rae Ying Yee Wong, yingyeew
Research Advisor(s): Zhihao Jia
A Distributed Inference System for Deep Neural Networks
This research explores distributed inference systems for deep neural networks to minimize
latency without sacrificing accuracy. One application is in dynamic DNNs where load imbalance
may be a significant problem. For instance, the Mixture-of-Experts model, which has become
increasingly popular for the scaling of parameter count, suffers from load imbalance and
excessive all-to-all communications. We thus built an asynchronous inference system with better
task allocation. Another strategy explored is writing efficient kernels with explicit caching.
Finally, we investigated speculative inference where a smaller model generates possible outputs
for the large model to verify. This reduces latency as verification of possible outputs has more
parallelism to exploit than computation from scratch. Our system is benchmarked using popular language models such as LLaMA, released this year.
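A minimal sketch of the speculative idea with greedy verification follows (toy placeholder models, not the actual system; production implementations verify with one batched forward pass and use probabilistic acceptance rules):

```python
# Greedy speculative inference: a small draft model proposes k tokens; the
# large model verifies them and keeps the longest agreeing prefix.
# `small_next` / `large_next` are toy stand-ins for real models.

def small_next(tokens: list[int]) -> int:
    return (tokens[-1] + 1) % 100   # toy draft model

def large_next(tokens: list[int]) -> int:
    return (tokens[-1] + 1) % 100   # toy target model (happens to agree)

def speculative_step(tokens: list[int], k: int = 4) -> list[int]:
    # 1) Draft: the small model proposes k tokens autoregressively.
    draft = list(tokens)
    for _ in range(k):
        draft.append(small_next(draft))
    proposals = draft[len(tokens):]
    # 2) Verify: the large model checks each position; in a real system all
    #    k positions are scored in one parallel pass, which is the speedup.
    accepted, ctx = [], list(tokens)
    for p in proposals:
        target = large_next(ctx)
        if target != p:
            accepted.append(target)  # keep the correction, then stop
            break
        accepted.append(p)
        ctx.append(p)
    return tokens + accepted

print(speculative_step([1, 2, 3]))  # [1, 2, 3, 4, 5, 6, 7]
```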
Feiyu Gavin Zhu, feiyuz
Research Advisor(s): Reid Simmons
WORK IN PROGRESS: Incorporating Instructive Feedback in Human-AI Interaction
Human-in-the-loop learning (HiLL) is commonly used in human-AI interaction for exchanging hidden mental state between the human partner and the AI agent. However, existing HiLL agents can make use of only limited types of feedback from humans, requiring multiple interactions with the human partner to understand seemingly simple concepts. In this study, we aim to introduce a new interaction mechanism for receiving and incorporating general instructions from humans. This allows the human to provide the agent with more general feedback, which we hypothesize will accelerate learning. To develop AI agents with this capability, we plan to draw insights from cognitive architecture research and develop a production system framework. This framework consists of a set of production rules that can be matched and applied, and a memory system resembling human memory. We plan to evaluate the framework by comparing it to existing methods through a user study in a simulated highway environment. We expect that an agent using our framework can learn to drive safely with fewer interactions than existing baseline methods, and that the human partner will perceive it as a more intelligent learner.
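As a rough illustration of the production-system idea (hypothetical rules and names, not the planned framework), a rule base is a set of condition-action pairs matched against a working memory, and a general instruction can be incorporated by adding a single new rule:

```python
# Sketch of a production system: rules fire when their condition matches
# working memory. Names and rules are illustrative only.

working_memory = {"lane": "left", "front_car_close": True}

# Each production rule is a (condition over memory, action producing updates).
rules = [
    (lambda m: m.get("front_car_close") and m.get("lane") == "left",
     lambda m: {"action": "change_lane", "lane": "right"}),
    (lambda m: not m.get("front_car_close"),
     lambda m: {"action": "keep_lane"}),
]

def step(memory: dict, rules: list) -> dict:
    """Match-select-apply: fire the first rule whose condition holds."""
    for cond, act in rules:
        if cond(memory):
            memory.update(act(memory))
            return memory
    return memory

# A general instruction from the human ("never tailgate") could compile
# into one new high-priority rule instead of many low-level feedback signals:
rules.insert(0, (lambda m: m.get("front_car_close"),
                 lambda m: {"action": "slow_down"}))

print(step(working_memory, rules))
# {'lane': 'left', 'front_car_close': True, 'action': 'slow_down'}
```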