Schedule

Unless noted otherwise, meetings are held in NSH 4513, starting at 1530.

Meeting | Time, place | Topic | Speaker | Track
the 12th meeting | Apr 5 (Fri) | Semi-automatic learning of syntactic transfer rules for machine translation | Katharina Probst | T2
the 4th special meeting | Mar 15 (Fri) | Rethinking the Logical Problem of Language Acquisition | Brian MacWhinney | P2
the 11th meeting | Feb 22 (Fri) | Unsupervised Morphology Learning | Christian Monson | T1
the 10th meeting | Nov 15 (Thu) | Language and Sensory-Motor Processes | Sonya Allin | P2
the 9th meeting | Jul 5 (Thu) | Language Learning in Optimality Theory | Harold C Daume | P3
the 8th meeting | May 24 (Thu) | Learning to Read a Non-alphabetic Script - Chinese | Erik Peterson | P2
the 7th meeting | May 10 (Thu) | Grammar induction by Bayesian model merging | Guy Lebanon | T2
the 6th meeting | Apr 26 (Thu), NSH 4513 | Learning Language in Logic | Michael Kohlhase | T4
the 3rd special meeting | Apr 19 (Thu) | Listening to the animals: What nonhuman models can tell us about the role of experience in the development of speech perception | Lori Holt | P2
the 2nd special meeting | Apr 6 (Fri) 1400, NSH 3002 | Identifying Clues of Evaluation and Speculation in Text | Janyce Wiebe | T3
the 5th meeting | Mar 30 (Fri) 1400, NSH 3002 | Information Access to Oral Communication | Klaus Ries | T3
the 4th meeting | Mar 8 (Thu) | Language Learnability | Benjamin Han | T2
the 3rd meeting | Feb 22 (Thu) | How Children Acquire Meanings of Nouns and Verbs | Rachel Chung | T1
the 2nd meeting | Feb 1 (Thu) | Computational Approaches to Parameter-Setting Models of Language Acquisition | Eric Nyberg | P3
the 1st special meeting | Jan 30 (Tue) 1500, NSH 2602 | Semantic Information Process of Spoken Language | Allen Gorin | -
the 1st meeting | Jan 18 (Thu) 1600, NSH 4513 | Overview of Language Acquisition | Natasha Tokowicz | P2

20020405, the 12th meeting

Time: Apr 5, 2002 (Friday) at 1530-1700
Place: NSH 4513
Topic: Semi-automatic learning of syntactic transfer rules for machine translation
Track: Grammar Induction (T2)
Speaker: Katharina Probst (LTI)
Abstract:
I am currently working on the AVENUE project with Alon Lavie, Jaime Carbonell, and Lori Levin. In the past year, we have explored a novel approach to machine translation that elicits bilingual data and uses these data to learn transfer rules. A bilingual user translates a set of carefully constructed sentences and specifies the word-alignments. The system then develops transfer rules from the sentences, using any knowledge that is available on the source and target language sides. The source language is always a major language such as English, and we assume sufficient knowledge of this language, i.e. we assume that we can correctly parse the sentences, etc. The target language is meant to be a minority language such as Mapudungun for which few online resources are available.

While we are still in the beginning stages of this research, we have developed some algorithms and theoretical foundations for the approach, and have discovered a number of issues that need to be addressed. In my talk, I will give a brief introduction to the AVENUE project. I will then describe the two major parts of the learning algorithm, namely seed generation and version space learning. I will briefly explain compositionality, an issue we are currently tackling.
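
For readers unfamiliar with these two terms, here is a minimal Python sketch of what seed generation followed by a single version-space-style generalization step might look like. It is emphatically not the AVENUE implementation: the tiny POS lexicon and the generalization criterion are invented for illustration, and real transfer rules carry much richer structure.

    # Sketch only: "seed generation" turns one word-aligned sentence pair into
    # a maximally specific transfer rule; a version-space-style step then
    # lifts aligned words to shared POS variables. The POS lexicon is an
    # invented toy; AVENUE's real rules and feature structures are far richer.

    POS = {"the": "DET", "dog": "N", "barks": "V",
           "el": "DET", "perro": "N", "ladra": "V"}

    def seed_rule(src_words, tgt_words, alignment):
        """Maximally specific rule: literal words plus their alignment."""
        return {"src": list(src_words), "tgt": list(tgt_words),
                "align": list(alignment)}

    def generalize(rule):
        """Replace each aligned word pair of matching POS with a variable."""
        gen = {"src": list(rule["src"]), "tgt": list(rule["tgt"]),
               "align": list(rule["align"])}
        for i, j in gen["align"]:
            src_pos, tgt_pos = POS.get(gen["src"][i]), POS.get(gen["tgt"][j])
            if src_pos is not None and src_pos == tgt_pos:
                gen["src"][i] = gen["tgt"][j] = src_pos
        return gen

    seed = seed_rule(["the", "dog", "barks"], ["el", "perro", "ladra"],
                     [(0, 0), (1, 1), (2, 2)])
    print(generalize(seed))
    # {'src': ['DET', 'N', 'V'], 'tgt': ['DET', 'N', 'V'], 'align': [...]}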

Reading:

Please note that the following paper has not been accepted for publication, so please do not distribute.

Katharina Probst: Semi-Automatic Learning of Transfer Rules for Machine Translation of Low-Density Languages. Submitted to the student sessions at ACL and ESSLLI.
Paper

Optional:

  1. Katharina Probst, Lori Levin: Challenges in Automated Elicitation of a Controlled Bilingual Corpus. In: Proceedings of TMI 2002.
    Paper

  2. Katharina Probst, Ralf Brown, Jaime Carbonell, Alon Lavie, Lori Levin, and Erik Peterson. 2001. Design and Implementation of Controlled Elicitation for Machine Translation of Low-density Languages. In Proceedings of the MT2010 workshop at MT Summit 2001.
    Paper

20020315, the 4th special meeting

Time: March 15, 2002 (Friday) at 1530-1700
Place: NSH 4513
Topic: Rethinking the Logical Problem of Language Acquisition
Track: Cognitive Science Perspective (P2)
Speaker: Brian MacWhinney (PSY@CMU)
Abstract:
This is a fundamental reanalysis of the so-called logical problem of language acquisition, the "poverty of the stimulus" argument associated with Plato and Chomsky. I argue that, when one views language in terms of competition, the problem goes away.

Reading:

Rethinking the Logical Problem of Language Acquisition by Brian MacWhinney. Manuscript in preparation, please do not quote.
Paper

20020222, the 11th meeting

Time: February 22, 2002 (Friday) at 1500-1630
Place: NSH 4513
Topic: Unsupervised Morphology Learning
Track: Lexical Acquisition (T1)
Speaker: Christian Monson (LTI)
Abstract:
I am currently working on the AVENUE project under Jaime Carbonell, Alon Lavie, and Lori Levin. The AVENUE project is concerned with (semi-)automatically learning to translate between a known language (e.g. English, Spanish) and an unknown minority language (Mapudungun, Inupiak, etc.).

While others in the group (Erik Peterson, Katharina Probst) are looking at learning sentence-level transfer rules, my assignment is to learn the morphology of the unknown language. We plan, eventually, to leverage both our information on the known language and the skills of a linguistically naive bilingual informant as we learn the morphology. Before adding such knowledge, however, we wondered how well we could learn morphology with nothing beyond a corpus of text.

Others have worked on this unsupervised learning problem before me. John Goldsmith, in the June 2001 issue of Computational Linguistics, describes an unsupervised learning system of his own and gives references to earlier work.

In my talk Friday I will describe Goldsmith's algorithm along with one or two other approaches tried in the past. I will also present the simple algorithm that Jaime, Alon, and I have come up with, which after initial investigation performs at least as well as Goldsmith's approach.
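
To give a concrete feel for this style of unsupervised learning, here is a toy signature-style segmenter in Python, loosely in the spirit of Goldsmith's approach; it is not the algorithm from the talk, and a real system would use a criterion such as minimum description length to choose among the competing analyses this produces.

    # Toy illustration of signature-style morphology induction (in the spirit
    # of Goldsmith 2001), not the algorithm from the talk. Every word is split
    # at every position; stems that occur with at least two distinct suffixes
    # are grouped by their suffix set (their "signature").

    from collections import defaultdict

    corpus = ["walk", "walks", "walked", "walking",
              "talk", "talks", "talked", "jump", "jumps"]

    def signatures(words, min_stem=3):
        suffixes = defaultdict(set)            # stem -> suffixes seen with it
        for w in words:
            for cut in range(min_stem, len(w) + 1):
                stem, suffix = w[:cut], w[cut:] or "NULL"
                suffixes[stem].add(suffix)
        sigs = defaultdict(set)                # signature -> stems sharing it
        for stem, sufs in suffixes.items():
            if len(sufs) >= 2:
                sigs[frozenset(sufs)].add(stem)
        return sigs

    for sig, stems in signatures(corpus).items():
        print(sorted(sig), "->", sorted(stems))
    # e.g. ['NULL', 'ed', 'ing', 's'] -> ['walk']; a real system would use a
    # criterion such as MDL to pick among the competing analyses printed here.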

Readings:

20011115, the 10th meeting

Time: November 15, 2001 (Thursday) at 1630-1730
Place: NSH 4513
Topic: Language and Sensory-Motor Processes
Track: Cognitive Science Perspective (P2)
Speaker: Sonya Allin (HCII)
Abstract:
Sonya Allin from the Human-Computer Interaction Institute will be discussing the connections between language and sensory-motor processes. Specific topics may include:
- metaphor interpretation (and its grounding in embodied primitives).
- the relationship between the semantics of verbal aspect and sensory-motor primitives.
- possible parallels between motor skill learning and language learning.

Reading:

S. Narayanan. Talking the Talk is Like Walking the Walk. In Proceedings of CogSci97, Stanford, August 1997.
S. Narayanan. Moving Right Along: A Computational Model of Metaphoric Reasoning about Events. In Proceedings of the National Conference on Artificial Intelligence (AAAI '99), 1999.

20010705, the 9th meeting

Time: July 5, 2001 (Thursday) at 1630-1830
Place: NSH 4513
Topic: Language Learning in Optimality Theory
Track: Linguistic Perspective (P3)
Speaker: Harold C Daume (ISI@USC)
Abstract:
The framework of competing constraints used prevalently in phonology, called Optimality Theory, has recently made its way into both syntax and semantics. In this talk, I will present a brief introduction to OT in syntax (no background in OT is required), sketch some examples of constraints, and then move on to the learning problem. I will present two well-known algorithms for learning Optimality Theoretic grammars, discuss their results, and discuss the computational challenges inherent in the OT framework, both for learning and for general parsing and generation.
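
As a taste of one such algorithm, here is a compact Python sketch of error-driven Constraint Demotion in the style of Tesar and Smolensky (1998): whenever the current grammar prefers an observed loser, every constraint favoring that loser is demoted just below the highest-ranked constraint favoring the winner. The constraint names and violation profiles are invented toy data.

    # Compact sketch of error-driven Constraint Demotion (after Tesar &
    # Smolensky 1998). `strata` is a ranked list of sets of constraints; the
    # violation counts for the observed winner and an informative loser are
    # invented toy data.

    def demote(strata, winner_viols, loser_viols):
        prefers_winner = {c for c in winner_viols
                          if winner_viols[c] < loser_viols[c]}
        prefers_loser = {c for c in winner_viols
                         if winner_viols[c] > loser_viols[c]}
        # Highest stratum containing a constraint that prefers the winner:
        top = min(i for i, s in enumerate(strata) if s & prefers_winner)
        for c in prefers_loser:                  # demote below that stratum
            i = next(i for i, s in enumerate(strata) if c in s)
            if i <= top:
                strata[i].discard(c)
                if top + 1 == len(strata):
                    strata.append(set())
                strata[top + 1].add(c)
        return [s for s in strata if s]

    strata = [{"Faith", "NoCoda"}]               # start: one stratum
    strata = demote(strata,
                    winner_viols={"Faith": 0, "NoCoda": 1},
                    loser_viols={"Faith": 1, "NoCoda": 0})
    print(strata)                                # [{'Faith'}, {'NoCoda'}]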

Reading:

Tesar, B. & Smolensky, P. 1998. Learning Optimality-Theoretic grammars. Lingua, 106: 161-196. Reprinted in Sorace, A., Heycock, C. and Shillcock, R. (eds.) Language Acquisition: Knowledge Representation and Processing. Amsterdam: Elsevier.
Paper

20010524, the 8th meeting

Time: May 24, 2001 (Thursday) at 1630-1830
Place: NSH 4513
Topic: Learning to Read a Non-alphabetic Script - Chinese
Track: Cognitive Science Perspective (P2)
Speaker: Erik Peterson (LTI)
Abstract:
Unique among the world's languages, Chinese uses an entirely non-alphabetic script, relying instead on morphemic glyphs often referred to as "characters" or sometimes "sinographs". In this talk I will review some papers on learning Chinese characters and learning to read Chinese, along with how this differs from learning to read English.

Readings: (2 and 3 are optional)

  1. Angel M. Y. Lin & Nobuhiko Akamatsu: "The learnability and psychological processing of reading in Chinese and reading in English." In Chen, H. C. (Ed.) (1997). Cognitive Processing of Chinese and Related Asian Languages, pp. 369-387. Hong Kong: The Chinese University Press.
  2. Hui Yang and Dan-ling Peng: "The Learning and Naming of Chinese Characters of Elementary School Children." In Chen, H. C. (Ed.) (1997). Cognitive Processing of Chinese and Related Asian Languages, pp. 323-346. Hong Kong: The Chinese University Press.
  3. Insup Taylor: "Psycholinguistic Reasons for Keeping Chinese Characters in Korean and Japanese." In Chen, H. C. (Ed.) (1997). Cognitive Processing of Chinese and Related Asian Languages, pp. 299-319. Hong Kong: The Chinese University Press.

Note: As usual, copies of the papers will be available at the LTI front desk.

20010510, the 7th meeting

Time: May 10, 2001 (Thursday) at 1630-1830
Place: NSH 4513
Topic: Grammar induction by Bayesian model merging
Track: Grammar Induction (T2)
Speaker: Guy Lebanon (LTI/CALD)
Abstract:
This work describes a Bayesian approach to grammar induction. Instead of the popular EM method, the search for a good grammar is carried out by merging smaller models that were induced earlier. Specifically, the work deals with finding the structure of HMMs, stochastic context-free grammars, and probabilistic attribute grammars. In my talk I will concentrate on the latter two.

The search is Bayesian: priors are assigned to grammars in a way that favors simpler models, and the search is guided by the posterior probability of the grammars.
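
The sketch below shows the skeleton of that search on a deliberately simplified model class, where each "state" is just a bag of emitted symbols; the scoring here is invented for illustration, but Stolcke applies the same greedy merge-and-score loop to HMMs and stochastic CFGs.

    # Skeleton of Bayesian model merging (after Stolcke 1994) on a toy model
    # class: each "state" is a bag of symbols it emits, the likelihood is the
    # product of per-state emission probabilities, and the prior simply
    # penalizes the number of states.

    import math
    from itertools import combinations

    def log_likelihood(states):
        ll = 0.0
        for bag in states:
            for sym in bag:
                ll += math.log(bag.count(sym) / len(bag))
        return ll

    def log_posterior(states, alpha=1.0):
        return log_likelihood(states) - alpha * len(states)  # simpler = better

    def merge_models(states, alpha=1.0):
        states = [list(b) for b in states]
        while True:
            best, best_score = None, log_posterior(states, alpha)
            for i, j in combinations(range(len(states)), 2):
                merged = [b for k, b in enumerate(states) if k not in (i, j)]
                merged.append(states[i] + states[j])
                score = log_posterior(merged, alpha)
                if score > best_score:
                    best, best_score = merged, score
            if best is None:                     # no merge improves posterior
                return states
            states = best

    # Incorporate each sample as its own state, then merge greedily:
    print(merge_models([["a"], ["a"], ["b"]]))   # -> [['b'], ['a', 'a']]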

Readings: A. Stolcke. Bayesian Learning of Probabilistic Language Models. PhD thesis, University of California at Berkeley, 1994.
Paper

20010426, the 6th meeting

Time: Apr 26, 2001 (Thursday) at 1630-1830
Place: NSH 4513
Topic: Learning Language in Logic
Track: Knowledge representation/inferences for LA (T4)
Speaker: Michael Kohlhase (LTI)
Abstract:
Learning Language in Logic is a developing research area at the intersection of
  • computational logic (my area),
  • natural language processing (my hobby) and
  • machine learning (related to language acquisition?).

I will give a brief, very basic introduction to (the aims of) logic, computational logic, and inductive logic programming (machine learning with logic techniques), and motivate with some examples why these techniques might be interesting to study in a natural language setting. The main thesis is that machine learning techniques can make the acquisition of the linguistic and world knowledge that is crucial for deep (logic-based) linguistic analysis feasible on a larger scale.
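
To make the inductive logic programming connection concrete, here is a tiny Python implementation of one of its core operations, Plotkin's least general generalization of two atoms; systems like those in the readings build full clause induction on top of operations like this. The example atoms are invented.

    # One core ILP operation: Plotkin's least general generalization (lgg) of
    # two atoms, which replaces disagreeing arguments with shared variables.
    # The example atoms are invented.

    def lgg(atom1, atom2):
        """Atoms are (predicate, arg, ...) tuples with the same predicate."""
        assert atom1[0] == atom2[0], "lgg needs atoms of the same predicate"
        table, out = {}, [atom1[0]]
        for a, b in zip(atom1[1:], atom2[1:]):
            if a == b:
                out.append(a)                    # agreement survives
            else:                                # same mismatch, same variable
                table.setdefault((a, b), f"X{len(table)}")
                out.append(table[(a, b)])
        return tuple(out)

    print(lgg(("daughter", "mary", "ann"), ("daughter", "eve", "tom")))
    # -> ('daughter', 'X0', 'X1')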

Readings:

  1. S. Dzeroski, J. Cussens, and S. Manandhar. An Introduction to Inductive Logic Programming and Learning Language in Logic. In J. Cussens and S. Dzeroski, editors, Proceedings of Learning Language in Logic, LLL-99, pages 3-35, Bled, Slovenia, 30 June 1999.
  2. R. Mooney. Learning for Semantic Interpretation: Scaling Up without Dumbing Down. In J. Cussens and S. Dzeroski, editors, Proceedings of Learning Language in Logic, LLL-99, pages 57-66, Bled, Slovenia, 30 June 1999.

Note: 15 copies of the readings will be available some time after Apr 9 (Mon) at the LTI front desk. Contact Ben if you need the originals to make copies.

20010419, the 3rd special meeting

Time: Apr 19, 2001 (Thursday) at 1630-1830
Place: NSH 4513
Topic: Listening to the animals: What nonhuman models can tell us about the role of experience in the development of speech perception
Track: Cognitive Science Perspective (P2)
Speaker: Lori Holt (Psy@CMU)
Abstract:
Before they become native speakers, infants are native listeners. A great deal of evidence supports the observation that, during their first year, human infants come to respond to elements of language in a manner appropriate to the language environment in which they are being reared. Yet, little is understood about the means by which experience shapes perception of speech. I will describe how nonhuman animal models, computational methods, and human adult learning paradigms can help us to understand the role that experience plays in shaping early speech perception.

Readings: L. L. Holt, A. J. Lotto, and K. R. Kluender. Incorporating principles of general learning in theories of language acquisition. In M. Gruber, D. Higgins, K. S. Olson, and T. Wysocki, editors, Constraints, Acquisition of Spoken Language, Acquisition and the Lexicon, volume 34, pages 253-268. Chicago Linguistic Society, 1998.
Paper

20010406, the 2nd special meeting (joint presentation with LTI Seminars)

Time: Apr 6, 2001 (Friday) at 1400-1530
Place: NSH 3002
Topic: Identifying Clues of Evaluation and Speculation in Text
Track: Discourse/pragmatics Learning (T3)
Speaker: Janyce Wiebe (CS@UPitt)
Abstract:
This talk will describe work on identifying evaluative and speculative language in text. Knowledge of such language ("subjective language", Banfield 1982) would be useful in many text-processing applications, such as flame recognition, email classification, intellectual attribution in text, recognizing speaker role in radio broadcasts, mining Internet forums for reviews, clustering documents by ideological point of view, information extraction, summarization, and any other application that would benefit from knowledge of how opinionated the language is, and whether or not the writer purports to objectively present factual material. Observations derived from a corpus study will be presented, as well as work on identifying clues of subjectivity using the results of a method for clustering words according to distributional similarity (Lin 1998). Results will be presented for both sentence-level subjectivity judgements and document-level editorial classifications.
Brief Bio: Janyce Wiebe's research area is artificial intelligence, specifically discourse processing, word-sense disambiguation, and statistical natural language processing. She recently joined the CS faculty at the University of Pittsburgh, after having been a professor in CS and a researcher in the Computing Research Laboratory at New Mexico State University for eight years. Before going to NMSU, she was a post-doc at the University of Toronto; she received her PhD from SUNY Buffalo.
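
As a rough illustration of the distributional-similarity idea underlying the clustering step, the Python sketch below compares words by the contexts they share. It is not Lin's (1998) measure, which uses mutual information over dependency triples; the tiny corpus is invented.

    # Not Lin's (1998) measure itself (which uses pointwise mutual information
    # over dependency triples); just a minimal picture of distributional
    # similarity: words are compared via the contexts they appear in.

    import math
    from collections import Counter

    def context_vectors(sentences, window=1):
        vecs = {}
        for sent in sentences:
            for i, w in enumerate(sent):
                ctx = sent[max(0, i - window):i] + sent[i + 1:i + 1 + window]
                vecs.setdefault(w, Counter()).update(ctx)
        return vecs

    def cosine(u, v):
        dot = sum(u[k] * v[k] for k in u if k in v)
        norm = math.sqrt(sum(x * x for x in u.values())) * \
               math.sqrt(sum(x * x for x in v.values()))
        return dot / norm if norm else 0.0

    sents = [["a", "great", "movie"], ["a", "terrible", "movie"],
             ["a", "red", "car"]]
    vecs = context_vectors(sents)
    print(cosine(vecs["great"], vecs["terrible"]))   # 1.0: same contexts
    print(cosine(vecs["great"], vecs["red"]))        # 0.5: partial overlap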

Readings:

  1. J. Wiebe. Learning Subjective Adjectives from Corpora. In the 17th National Conference on Artificial Intelligence (AAAI-2000), Austin, Texas, July 2000.
    Paper
    (the most relevant one)
  2. D. Lin. Automatic Retrieval and Clustering of Similar Words. In COLING-ACL98, Montreal, Canada, August 1998.
    Paper
    (the work presented uses results from this paper)

Note: Several other papers (ACL-99 and COLING-00) can be downloaded from http://www.cs.pitt.edu/~wiebe, under publications.

20010330, the 5th meeting (joint presentation with LTI Seminars)

Time: Mar 30, 2001 (Friday) at 1400-1530
Place: NSH 3002
Topic: Information Access to Oral Communication
Track: Discourse/pragmatics Learning (T3)
Speaker: Klaus Ries (LTI)
Abstract:
People constantly engage in oral communication, much of which has important consequences. We therefore often go to the trouble of documenting it in written form. Much too often, however, the communication is not documented and can only be recalled from autobiographical memory. An alternative would be to simply record it and access the content later. In this presentation I will show what kind of applications one could envision and what properties systems need to have in order to be useful.

One of the important problems to be solved is how to search and navigate a database of oral communication. While traditional text-based information retrieval focuses on topical keywords, topic might not be the only important information: oral communications are also characterized by the situation they are embedded in and by their style.

I will therefore present a series of experiments that make use of conversational style to access oral communication: dialogue acts and games are detected using a multilevel hybrid HMM/NN architecture that may also be interesting for tasks such as named entity tagging. For the segmentation of dialogue I will present a novel probabilistic approach that is fairly effective, and I will show that one can obtain good domain-independent segmentation performance using features that pertain to speaker activity rather than change in topic. I will then present a number of experiments that characterize the style of oral communication at various levels of abstraction.
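
The HMM part of that architecture can be made concrete with a minimal sketch: dialogue acts are the hidden states, utterances are the observations, and Viterbi decoding recovers the most likely act sequence. All probabilities below are invented toy numbers; the real system derives its observation likelihoods from word and prosody models, with the neural component layered on top.

    # Minimal sketch of the HMM view of dialogue act tagging: acts are hidden
    # states, utterances are observations, Viterbi finds the best sequence.
    # All probabilities are invented toy numbers.

    import math

    acts = ["question", "statement"]
    start = {"question": 0.5, "statement": 0.5}
    trans = {("question", "statement"): 0.8, ("question", "question"): 0.2,
             ("statement", "statement"): 0.6, ("statement", "question"): 0.4}
    obs = [{"question": 0.7, "statement": 0.1},      # "is it raining"
           {"question": 0.1, "statement": 0.6}]      # "yes it is"

    def viterbi(observations):
        # best (path, log-prob) ending in each act
        path = {a: ([a], math.log(start[a]) + math.log(observations[0][a]))
                for a in acts}
        for o in observations[1:]:
            path = {a: max(((p + [a], lp + math.log(trans[(p[-1], a)])
                             + math.log(o[a]))
                            for p, lp in path.values()),
                           key=lambda t: t[1])
                    for a in acts}
        return max(path.values(), key=lambda t: t[1])[0]

    print(viterbi(obs))                              # ['question', 'statement']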

Finally I will give a preliminary evaluation of the effectiveness of this technique, and I will propose information access to oral communication as a general, theory-independent challenge problem that could foster a significant amount of interesting interdisciplinary research.

Readings: A. Stolcke, K. Ries, N. Coccaro, E. Shriberg, R. Bates, D. Jurafsky, P. Taylor, R. Martin, C. Van Ess-Dykema, and M. Meteer. Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech. Computational Linguistics, 26(3):339-373, September 2000.
Paper

Note: This paper gives a good review of the state of the art in dialogue act detection to date. It contains some formulas that are only important for combination with a speech recognizer; feel free to ignore those.

20010308, the 4th meeting

Time: Mar 8, 2001 (Thursday) at 1630-1830
Place: NSH 4513
Topic: Language Learnability
Track: Grammar Induction (T2)
Speaker: Benjamin Han (LTI)
Abstract:
Grammar induction is the problem of identifying a correct representation of the target language. In his influential 1967 paper, Gold gave a formal definition of the language learnability model and proved results on the identifiability of different classes of languages in the limit. Depending on the perspective, the proofs have far-reaching implications, including the claim that context-free languages cannot be identified in the limit without negative evidence. However, it is widely accepted that young children pick up their mother tongues without any such evidence, which presents a discrepancy between the theoretical and the empirical studies. In this talk I will first present Gold's original proofs, and then discuss one of the empirical studies that attempted to refute the popular no-negative-evidence belief.
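
For readers new to Gold's framework, the following toy Python rendering of "identification by enumeration" may help: the learner walks a fixed enumeration of candidate languages and always conjectures the first one consistent with all the positive examples seen so far. The candidate class here is invented for the demo.

    # Toy rendering of Gold-style "identification by enumeration": after each
    # positive example, conjecture the first language in a fixed enumeration
    # consistent with everything seen so far. The candidate class is invented.

    hypotheses = [("L1", {"a"}),
                  ("L2", {"a", "ab"}),
                  ("L3", {"a", "ab", "abb"})]

    def learner(text):
        seen = set()
        for sentence in text:           # a "text" presents positive data only
            seen.add(sentence)
            name, _ = next((n, l) for n, l in hypotheses if seen <= l)
            print(f"after {sorted(seen)}: conjecture {name}")

    learner(["a", "ab", "a", "abb"])
    # The conjecture converges to L3 and never changes afterwards:
    # identification in the limit (for this finite class of finite languages).
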
Readings:
  1. E. M. Gold. Language Identification in the Limit. Information and Control, 10(5), 1967.
  2. J. N. Bohannon III and L. Stanowicz. The Issue of Negative Evidence: Adult Responses to Children's Language Errors. Developmental Psychology, 24(5):684-689, 1988.
  3. P. Gordon. Learnability and Feedback. Developmental Psychology, 26(2):217-220, 1990.
Note: 15 copies of the readings are available at the LTI front desk. Contact Ben if you need the originals to make copies.

20010222, the 3rd meeting

Time: Feb 22, 2001 (Thursday) at 1630-1830
Place: NSH 4513
Topic: How Children Acquire Meanings of Nouns and Verbs
Track: Lexical Acquisition (T1)
Speaker: Rachel Chung (Psy@UPitt)
Abstract:
In this presentation I will review the core issues in lexical acquisition, with emphasis on how meanings of words, particularly nouns and verbs, are discovered and acquired by children. I will discuss the implications of the principles-and-constraints approach to noun learning, how it may be applied to verb learning, and problems with the approach. I will also discuss the role of syntax in word meaning acquisition. The field of lexical acquisition is largely noun-biased, so I will review some of the recent efforts to explain how meanings are mapped onto verbs.
Readings:
  1. A. L. Woodward and E. M. Markman. Early Word Learning. In W. Damon, D. Kuhn, and R. Siegler, editors, Handbook of Child Psychology, Volume 2: Cognition, Perception, and Language. Wiley & Sons, New York, 5th edition, 1998. (required)
  2. L. Gleitman. The Structural Sources of Verb Meanings. Language Acquisition, pages 3-55, 1990.
  3. P. Gordon. Level-ordering in Lexical Development. Cognition, 21:73-93, 1986.
Note: About 15 copies of the readings are already available at the LTI front desk; please get one before they're all gone (participants first, please). If they run out, contact Ben for the originals so that you can make a copy yourself.

20010201, the 2nd meeting

Time: Feb 1, 2001 (Thursday) at 1630-1830
Place: NSH 4513
Topic: Computational Approaches to Parameter-Setting Models of Language Acquisition
Track: Linguistic Perspective (P3)
Speaker: Eric Nyberg (LTI)
Abstract:
In this presentation I will review the Universal Grammar / parameter-setting approach to language acquisition (mainly due to Chomsky and his followers), and discuss some possible ways to implement this approach in a computational model. I will draw heavily from my own Ph.D. thesis work, and present learning models for syntax and phonology, as well as some theoretically hard problems from the literature. I will also discuss some alternative approaches (such as genetic algorithms) and how the "cognitive argument" places a different set of constraints on what counts as an admissible theory or model.
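
As informal background, here is a generic error-driven parameter-setting loop in Python, in the spirit of triggering learners; it is not Nyberg's success-driven model, and the two-parameter "grammar" and its languages are invented for illustration.

    # Not Nyberg's model: a generic error-driven parameter-setting loop in the
    # spirit of triggering learners. A grammar is a vector of binary
    # parameters; on each input the learner keeps its setting if the sentence
    # parses and flips one random parameter otherwise. The two-parameter
    # "grammar" and its languages are toy inventions.

    import random

    def language(params):
        head_final, v2 = params
        patterns = {"SOV"} if head_final else {"SVO"}
        return patterns | ({"XVSO"} if v2 else set())

    def triggering_learner(target_text, steps=200, seed=0):
        rng = random.Random(seed)
        params = [rng.random() < 0.5, rng.random() < 0.5]
        for _ in range(steps):
            sentence = rng.choice(target_text)
            if sentence not in language(params):   # failure triggers a flip
                i = rng.randrange(len(params))
                params[i] = not params[i]
        return params

    target = sorted(language([True, True]))        # text from an SOV+V2 toy
    print(triggering_learner(target))
    # Converges to [True, True]: once correct, no sentence triggers a flip.
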
Reading: E. H. Nyberg. A non-deterministic, success-driven model of parameter setting in language acquisition. PhD dissertation, Carnegie Mellon University, 1992. (at least Chapter 1)
Paper (PS), Slides (PS)

20010130, the 1st special meeting

Time: Jan 30, 2001 (Tuesday) at 1500-1630 (different from the regular meetings)
Place: NSH 2602 (Interactive Systems Lab, different from the regular meetings)
Topic: Semantic Information Process of Spoken Language
Speaker: Allen Gorin (AT&T Research)
Abstract:
This talk discusses how the next generation of voice-based user interface technology will enable easy-to-use automation of new and existing communication services. A critical issue is to move away from highly-structured menus to a more natural human-machine paradigm. In recent years, we have developed algorithms which learn to extract meaning from fluent speech via automatic acquisition and exploitation of salient words, phrases and grammar fragments from a corpus. These methods have been previously applied to the 'How may I help you?' task for automated operator services, in English, Spanish and Japanese. In this paper, we report on a new application of these language acquisition methods to a more complex customer care task. We report on empirical comparisons which quantify the increased linguistic and semantic complexity over the previous domain. Experimental results on call-type classification will be reported for this new corpus of 30K utterances from live customer traffic. This traffic is drawn from both human/human and human/machine interactions.
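
A toy sketch of the salient-phrase idea: a phrase counts as salient for a call type when it sharply skews the call-type distribution. The measure below, a simple conditional-probability threshold, and the utterances are invented; Gorin's actual salience measure is information-theoretic.

    # Toy sketch of salient-phrase acquisition: count, for each word, the
    # distribution of call types it occurs with, and flag words that sharply
    # skew that distribution. Utterances, labels, and the threshold are
    # invented.

    from collections import Counter, defaultdict

    data = [("collect call to boston please", "collect"),
            ("i want to make a collect call", "collect"),
            ("there is a wrong charge on my bill", "billing"),
            ("question about my bill", "billing"),
            ("call boston please", "dial")]

    counts = defaultdict(Counter)                  # word -> call-type counts
    for utterance, label in data:
        for word in set(utterance.split()):
            counts[word][label] += 1

    for phrase, by_type in sorted(counts.items()):
        total = sum(by_type.values())
        best_type, n = by_type.most_common(1)[0]
        if total >= 2 and n / total >= 0.9:        # crude salience threshold
            print(f"{phrase!r} -> {best_type} (P={n / total:.2f}, n={total})")
    # e.g. 'collect' -> collect, 'bill' -> billing; non-content words like
    # 'to' slip through here, which real systems filter out statistically.
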
Readings: None
Appointments and contacts: Nadine Reaves (nr@cs.cmu.edu) or 8-5733

20010118, the 1st meeting

Time: Jan 18, 2001 (Thursday) at 1600-1800
Place: NSH 4632
Topic: Overview of Language Acquisition
Track: Cognitive Science Perspective (P2)
Speaker: Natasha Tokowicz (Psychology)
Abstract:
I will discuss the major areas of study in the field of language acquisition, from the cognitive science perspective (and psycholinguistics in particular). I will describe the major controversies in each area of research and will then describe my own fields of research, bilingualism and second language acquisition.
Readings: (you can download them from the Bibliography section)
  1. Pavlenko, A. (1999). New approaches to concepts in bilingual memory. Bilingualism, 2, 209-230.
  2. Pinker, S. (1995). Language Acquisition. In Osherson (Ed.) Language: An invitation to cognitive science (2nd ed.). pp. 135-182 (read up to the beginning of section 6.3: The course of language acquisition).

Note: In the first meeting we will discuss the potential schedule conflict with individual members.

Webmaster: Benjamin Han (benhdj at cs dot cmu dot edu)