FUDG Framework for Syntactic Annotation
- FUDG (Fragmentary Unlabeled Dependency Grammar) is a formalism that offers a flexible way to describe the
syntactic structure of text. Beyond the traditional view of dependency syntax in which
the tokens of a sentence form nodes in a tree, FUDG allows for a distinction between nodes and
lexical items (which may be multiword expressions); provides special devices for coordination
and coreference; and facilitates underspecified (partial) annotations where producing
a complete parse would be difficult.
- GFL (Graph Fragment Language) is an ASCII-based encoding of unlabeled
dependency annotations in the FUDG formalism.
This page links to FUDG resources, including annotation guidelines, an annotation interface, annotated data,
and software to parse GFL annotations and compute inter-annotator agreement.
FUDG was created by researchers at Carnegie Mellon University and the University of Texas at Austin.
Example
Two simple dependency trees in GFL notation:
- time > flies < like < (an > arrow)
- fruit > flies > like < (a > banana)
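The chaining convention in these two lines can be made concrete with a small parser. The sketch below is a hypothetical toy, not the GFL software linked on this page. It handles only bare words, `>`, `<`, and parentheses. It assumes each word occurs once and that a parenthesized group attaches through its internal root. Each arrow points toward the head of the adjacent pair, so `a > b` makes b the head of a, and `a < b` makes a the head of b.

```python
import re

# Toy tokenizer for a small GFL subset (hypothetical): parentheses, arrows, bare words.
TOKEN_RE = re.compile(r"\(|\)|<|>|[^\s()<>]+")


def parse_operand(tokens, pos, edges):
    """Parse a word or a parenthesized chain; return (its root, next position)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    if tokens[pos] == "(":
        root, pos = parse_chain(tokens, pos + 1, edges)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("expected ')'")
        return root, pos + 1
    if tokens[pos] in ("<", ">", ")"):
        raise ValueError(f"unexpected {tokens[pos]!r}")
    return tokens[pos], pos + 1


def parse_chain(tokens, pos, edges):
    """Parse `operand (op operand)*`, filling `edges` (dependent -> head) in place."""
    operand, pos = parse_operand(tokens, pos, edges)
    operands, ops = [operand], []
    while pos < len(tokens) and tokens[pos] in ("<", ">"):
        ops.append(tokens[pos])
        operand, pos = parse_operand(tokens, pos + 1, edges)
        operands.append(operand)
    # Each arrow points toward the head of its two adjacent operands.
    for left, op, right in zip(operands, ops, operands[1:]):
        dep, head = (left, right) if op == ">" else (right, left)
        if dep in edges:
            raise ValueError(f"{dep!r} would receive two heads")
        edges[dep] = head
    # The chain's root is the one operand that received no head here.
    roots = [w for w in operands if w not in edges]
    if len(roots) != 1:
        raise ValueError("each word must appear only once")
    return roots[0], pos


def parse_gfl(line):
    """Return (root, {dependent: head}) for one line of the toy subset."""
    tokens = TOKEN_RE.findall(line)
    edges = {}
    root, pos = parse_chain(tokens, 0, edges)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return root, edges


if __name__ == "__main__":
    for line in ("time > flies < like < (an > arrow)",
                 "fruit > flies > like < (a > banana)"):
        print(parse_gfl(line))
```

On the first example line, this yields flies as the root, with time and like attached to flies, arrow attached to like, and an attached to arrow. Markers such as `*`, `**`, and `=` are outside this toy subset.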
But FUDG can express more than just attachments. The tweet
Found the scarriest mystery door in my school . I'M SO CURIOUS D:
might be annotated in GFL notation as:
Found** < (the scarriest mystery door*)
(Found* door in < (my > school))
I'M** < (SO > CURIOUS)
D:**
my = I'M
** marks the root of each utterance within the input;
* marks the head of a fudge expression, whose contents
must form a connected subgraph in any compatible full tree. For instance, the
second line is a fudge expression with Found as its head; the attachment
of in is underspecified (its head may be either Found or door).
The = operator indicates a coreference link (here, between my and I'M).
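The fudge-expression constraint can be illustrated by enumerating the possible heads of an unattached word inside an expression. The following is a minimal sketch under our own assumptions. `FudgeExpr` and `candidate_heads` are hypothetical names, and words are assumed unique. The check also considers one word at a time against a fixed set of known edges, rather than all compatible trees jointly. A non-head word's head must lie inside the expression, which keeps its contents connected. It also must not be one of the word's own descendants, which would create a cycle.

```python
from dataclasses import dataclass


@dataclass
class FudgeExpr:
    """Hypothetical container for one fudge expression (FE)."""
    members: set[str]  # every word the FE covers
    head: str          # the word marked with *


def descendants(word, edges):
    """All words below `word` under the known dependent -> head edges."""
    found, frontier = set(), [word]
    while frontier:
        node = frontier.pop()
        for dep, head in edges.items():
            if head == node and dep not in found:
                found.add(dep)
                frontier.append(dep)
    return found


def candidate_heads(word, fe, edges):
    """Possible heads for a non-head member `word` of `fe` whose head is unknown."""
    if word == fe.head:
        raise ValueError("the FE head attaches outside the expression")
    # Staying inside the FE keeps it connected; skipping descendants avoids cycles.
    below = descendants(word, edges)
    return sorted(m for m in fe.members if m != word and m not in below)


# Second line of the tweet annotation: (Found* door in < (my > school)).
# door -> Found comes from the first line; school -> in and my -> school from the second.
known = {"door": "Found", "school": "in", "my": "school"}
fe = FudgeExpr(members={"Found", "door", "in", "my", "school"}, head="Found")
print(candidate_heads("in", fe, known))  # ['Found', 'door']
```

This recovers the two options named above for in: Found or door.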
Downloads
- Annotation Guidelines
- Software: Tools for checking the validity of GFL annotations, converting them to JSON format, visualizing the FUDG graph, and measuring underspecification and inter-annotator agreement
- Annotation interface: A web application for annotation that integrates the checking and visualization tools.
- Annotated data: Coming soon!
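As a simpler point of comparison that ignores underspecification entirely, the sketch below computes plain unlabeled attachment agreement between two complete annotations of the same sentence. This is not the measure implemented by the FUDG software. The `attachment_agreement` function and the `ROOT` placeholder are our own hypothetical conventions.

```python
ROOT = "ROOT"  # hypothetical placeholder head for the root word


def attachment_agreement(heads_a, heads_b):
    """Fraction of words that two complete annotations attach to the same head.

    Both arguments map every word of the sentence to its head (ROOT for the
    root). This plain measure assumes neither annotation is underspecified.
    """
    if heads_a.keys() != heads_b.keys():
        raise ValueError("annotations must cover the same words")
    if not heads_a:
        raise ValueError("empty annotation")
    same = sum(heads_a[w] == heads_b[w] for w in heads_a)
    return same / len(heads_a)


# Two readings of "time flies like an arrow":
#   A: time > flies < like < (an > arrow)   (flies is the verb)
#   B: time > flies > like < (an > arrow)   (like is the verb)
a = {"time": "flies", "flies": ROOT, "like": "flies", "an": "arrow", "arrow": "like"}
b = {"time": "flies", "flies": "like", "like": ROOT, "an": "arrow", "arrow": "like"}
print(attachment_agreement(a, b))  # 0.6
```

The two readings agree on time, an, and arrow but differ on flies and like, giving 3 of 5 words.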
Further Reading
Please cite the following if you write any papers involving the GFL/FUDG framework or data:
- A Framework for (Under)specifying Dependency Syntax without Overloading Annotators. Nathan Schneider, Brendan O’Connor, Naomi Saphra, David Bamman, Manaal Faruqui, Noah A. Smith, Chris Dyer, and Jason Baldridge. In Proceedings of the 7th Linguistic Annotation Workshop & Interoperability with Discourse, Sofia, Bulgaria, August 2013. (An extended version with additional technical details is also available.)
In addition, please cite the following if you write any papers involving annotation with the GFL-Web interface:
Acknowledgments
This research was supported in part by the U. S. Army Research Laboratory and the U. S. Army Research Office under contract/grant number W911NF-10-1-0533 and by NSF grant IIS-1054319.
Contact
Please e-mail nschneid [strudel] cs.cmu.edu with questions.