When Legalese Meets Code: CMU Research Helps Emerging Tech Companies Navigate the Law

Marylee Williams | Thursday, September 4, 2025

SCS faculty member Travis Breaux directs CMU's Requirements Engineering Lab, where he creates tools that bridge the gaps between people in law and tech.

As Congress debated dollars and cents this summer, it also weighed an AI moratorium, which could have prevented states from passing laws regulating artificial intelligence. This moratorium didn't go into effect, but the discussion about regulating technology remains. As states and the federal government attempt to put guardrails around privacy, healthcare and energy usage, tech companies — which can range from two-person startups to thousand-person teams — must comply with laws or face penalties. But if developers and engineers don't understand why or how they need to comply, they risk running afoul of the law.

Travis Breaux, an associate professor in the School of Computer Science's Software and Societal Systems Department (S3D), directs Carnegie Mellon University's Requirements Engineering Lab. His research focuses on creating tools that translate legal text for developers and bridge the gaps between people in law and tech. In a recent paper, Breaux describes a tool that uses Python code to translate legal text and represent it in a form that's accessible to software engineers.

What makes laws so complicated, especially from a technical perspective?
Think of the fact that there's a legal profession behind the authorship of laws that trains for years to craft language that's predictive, prohibitive, enabling and protective. From a computer science perspective, when analyzing legal text, we focus on two main artifacts: ambiguity and exceptions. Ambiguity can be intentional or accidental, such as legislators making vague references to computer data because they didn't know what data they meant to include or exclude  — and which the court system has to determine. But in the meantime, companies guess. Exceptions are something like those in the HIPAA privacy rule. There are exceptions to health data privacy when it comes to policing, because if somebody goes to a hospital with a gunshot wound, time is of the essence. Laws have these carve-outs that are complicated to read, but they're intended to allow society to function.

How do these legal complexities affect tech companies when they're trying to develop tools?
It's a difference in world views. The law moves slowly, and once the law is on the books, it'll stay on the books forever until it's changed. Technology moves quickly. This misalignment means laws often don't reflect current technological realities. Tech companies must interpret outdated laws and apply them to modern systems, which is incredibly challenging.

Can you share a real-world example of how challenging complying with these laws can be?
The Tilting Point case in California is a good example. California enacted a privacy law similar to Europe's General Data Protection Regulation. In this case, a mobile game had an age gate where users entered their birthdate before accessing the game, but the default birth year was something like 1953. Many users just accepted it without changing it, making them appear older than they were. The app shared user data with advertisers, as many do, but because of the default age, sharing the data violated the California law that restricts advertising to children. This is an interesting example because, on the one hand, there's a real motivation for this law: to keep children's data out of advertiser databases. In this case, that motivation wasn't fully comprehended, and the significance of the age-verification step, along with its link to back-end data processing through a third-party advertising API, wasn't fully appreciated.
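The failure mode Breaux describes can be sketched in a few lines of Python. This is a hypothetical illustration, not code from the actual case: when an age-gate form pre-fills a decades-old birth year, a child who clicks through is classified as an adult, and the data-sharing logic downstream never sees their true age.

```python
from datetime import date

DEFAULT_BIRTH_YEAR = 1953  # hypothetical pre-filled form value


def is_adult(birth_year: int, today: date) -> bool:
    """Naive age check used to gate data sharing with advertisers."""
    return today.year - birth_year >= 18


def share_with_advertisers(birth_year: int, today: date) -> bool:
    # Bug pattern: a child who accepts the pre-filled default
    # "passes" the age gate, so their data gets shared anyway.
    return is_adult(birth_year, today)


today = date(2025, 9, 4)
print(share_with_advertisers(DEFAULT_BIRTH_YEAR, today))  # default accepted -> True
print(share_with_advertisers(2015, today))                # truthful child  -> False
```

A safer design forces an explicit selection (no pre-filled year) or treats the untouched default as missing data rather than as consent to adult data processing.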

What's the biggest gap you've seen between policymakers and tech entrepreneurs?
We've been studying this area for about 20 years, and our research has evolved. Originally, we were focused on analyzing legal text and understanding how to read it, especially from an engineering perspective. After speaking to attorneys and engineers, we realized the problem is rooted in the communication between the compliance team — people literate in what the law requires — and the engineering team, who might not be as informed. The gaps that arise are in technology comprehension. The compliance team doesn't know exactly how the technology works, and the engineering team has a lot of specific knowledge about how the technology works. This leads to misunderstandings, especially when engineers interpret laws too literally and miss the broader context. For example, today we're focused on capturing the speech that people use to describe their technologies and the points of confusion that the compliance team might have when hearing that narration. We want to train models to detect these ambiguities in speech and build visualizations about how the technology works to improve communication.

What about smaller companies? How do they handle compliance challenges?
Small companies tend to be under-resourced. And it depends on how small we're talking. A startup in the capital-raising phase typically won't have in-house counsel and might consult an outside firm or adviser. Typically, as companies mature and investors raise their expectations about the policies and procedures the company uses, you see legal influence come in. Unfortunately, we see a lot of developers in small companies relying on Reddit or forums for legal advice, which can be mixed and inaccurate.

What is the risk if they don't comply?
That's a good question, and probably one better answered by a lawyer.

Very true. But I'm asking you.
There can be fines, sometimes in the range of $1 million. Regulators will often use a noncompliance event to make an example for the industry. Sometimes there are periodic reviews of corporate practices, which can be invasive. Somebody might make a claim of personal harm. For example, with age verification, somebody might claim they were harmed: maybe as an adolescent they were able to buy cigarettes and felt harmed by that. They might file a lawsuit. Ultimately, there are a number of reasons why you'd want to minimize this risk.

How does your tool, which helps developers understand the law, also bridge this gap for smaller companies?
Our tool translates legal text into a graphical representation that makes the relationships, rights and obligations in legal text explicit. It allows developers to get an early read on the law in a representation that they're more familiar with. With this representation, developers can approach legal counsel after they've had time to understand the legal requirements and apply them to their design context.
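The paper's actual tool isn't reproduced here, but the core idea of making the rights and obligations in legal text explicit can be sketched with a small Python data structure. All names below are hypothetical: each provision records who owes what to whom, and a carve-out becomes a nested exception, mirroring the HIPAA example from earlier in the interview.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provision:
    """One legal statement: a party, a modality, and an action."""
    party: str                 # who the provision applies to
    modality: str              # "obligation", "right", or "prohibition"
    action: str                # what must, may, or must not be done
    exceptions: List["Provision"] = field(default_factory=list)


def obligations_for(provisions: List[Provision], party: str) -> List[str]:
    """Collect what a given party must do or must not do."""
    return [p.action for p in provisions
            if p.party == party and p.modality in ("obligation", "prohibition")]


# HIPAA-style example: a privacy prohibition with a policing carve-out.
rule = Provision(
    party="covered entity",
    modality="prohibition",
    action="disclose protected health information",
    exceptions=[Provision("covered entity", "right",
                          "disclose PHI to law enforcement in an emergency")],
)

print(obligations_for([rule], "covered entity"))
```

A developer querying such a structure gets an early read on which duties attach to their system before sitting down with counsel, which is the workflow the tool is meant to support.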

What's next for your research?
We're looking at developing lightweight techniques that help bridge communication and knowledge gaps. People with expertise can meet about some artifact that they're trying to comprehend and make sure it's compliant. This could be a software artifact, like code. We want to work with teams where they have varying degrees of expertise. Lawyers are experts in the law, but they're not skilled in understanding software artifacts. Developers are skilled at interpreting and modifying the artifacts, but they don't fully comprehend the law. We want to get those two groups together to look at these artifacts and turn lawyers into co-designers. To do this, we're leveraging technologies like speech recognition because it's much easier for people to verbally describe what they're doing rather than writing it down.

For More Information

Aaron Aupperlee | 412-268-9068 | aaupperlee@cmu.edu