Introduction to Statistical Parsing

Lecturer: Detlef Prescher
Time: Mondays, 4:15 - 5:45 p.m.
Location: IWR R 432, INF 368, Interdisciplinary Center for Scientific Computing, University of Heidelberg
First Slot: Monday, April 16, 2007


Tutor: Florian Winkelmeier
Time: Thursdays, 2:15 - 3:45 p.m.
Location: Pool, INF 325, Department of Computational Linguistics, University of Heidelberg
First Tutorial: Thursday, May 3, 2007


2007-08-30: The written exam produced two "sehr gut" (very good), five "gut" (good), nine "befriedigend" (satisfactory), one "ausreichend" (sufficient), and one "ungenügend" (insufficient), resulting in the following final grades...
2007-08-30: The series of talks produced five "sehr gut" (very good), ten "gut" (good), and three "befriedigend" (satisfactory)...
2007-07-16: The final exam will be graded soon...
2007-07-12: No assignment this week (because of the final exam on July 16)
2007-07-03: Assignment 10, due July 10
2007-06-26: Assignment 9, due July 3
2007-06-26: The date of the written exam is now final: July 16, 2007 (our slot 13 :-)
2007-06-26: The presentation for slot 8 is now easier to read...
2007-06-19: Assignment 8, due June 26
2007-06-11: No assignment this week
2007-06-01: Assignment 7, due June 20
2007-05-25: Assignment 6, due June 13
2007-05-25: Please sign up! You can register for this course only until June 15...
2007-05-15: Assignment 5, due June 5 [Use this program fragment as a starting point]
2007-05-11: Assignment 4, due May 21
2007-05-05: Assignment 3, due May 14
2007-04-28: Assignment 2, due May 7
2007-04-23: Assignment 1, due April 30
2007-04-16: Website published

Course Description

The course has three parts. Part 1: We start with symbolic parsing methods and get to know the CKY algorithm as well as various parsing strategies (shift-reduce, top-down, and left-corner). Part 2: Building on symbolic parsing, we introduce statistical parsing and get to know the count, inside, and Viterbi algorithms. With these tools in hand, we concentrate on treebank training for English and discuss techniques developed by Charniak, Johnson, Collins, and others (lexicalisation, parent encoding, Markovisation of rules, data-oriented parsing). Part 3: We present parsers developed for German. It may come as a surprise that techniques developed for state-of-the-art parsing of English often fail when applied to other languages. The course is accompanied by a tutorial, in which we experiment with treebanks for English and German.
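To give a first impression of how the count, inside, and Viterbi algorithms of Part 2 relate to the CKY algorithm of Part 1, here is a minimal sketch in Python. It is not taken from the course materials: the toy grammar, its probabilities, and all names (`Semiring`, `LEXICAL`, `BINARY`, `cky`) are assumptions made up for illustration. The sketch assumes a probabilistic context-free grammar in Chomsky normal form and runs the same CKY loop three times, swapping only the semiring operations.

```python
from collections import defaultdict, namedtuple

# A semiring supplies "plus" (combine alternative analyses), "times"
# (combine the parts of one analysis), a zero element, and a "lift"
# function that maps a rule probability into the semiring.
Semiring = namedtuple("Semiring", "plus times zero lift")

COUNT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, lambda p: 1)
INSIDE = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, lambda p: p)
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, lambda p: p)

# Hypothetical toy PCFG in Chomsky normal form (illustration only).
LEXICAL = {  # word -> list of (category, probability)
    "she": [("NP", 0.25)],
    "stars": [("NP", 0.25)],
    "telescopes": [("NP", 0.25)],
    "saw": [("V", 1.0)],
    "with": [("P", 1.0)],
}
BINARY = [  # (parent, left child, right child, probability)
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.5),
    ("VP", "VP", "PP", 0.5),
    ("NP", "NP", "PP", 0.25),
    ("PP", "P", "NP", 1.0),
]


def cky(words, semiring, start="S"):
    """Return the semiring value of `start` spanning the whole sentence."""
    plus, times, zero, lift = semiring
    n = len(words)
    # chart[(i, j)][A] is the value of category A over words[i:j].
    chart = defaultdict(dict)
    for i, word in enumerate(words):
        cell = chart[(i, i + 1)]
        for category, p in LEXICAL.get(word, []):
            cell[category] = plus(cell.get(category, zero), lift(p))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[(i, j)]
            for k in range(i + 1, j):  # split point between the children
                left_cell, right_cell = chart[(i, k)], chart[(k, j)]
                for parent, left, right, p in BINARY:
                    if left in left_cell and right in right_cell:
                        value = times(times(lift(p), left_cell[left]),
                                      right_cell[right])
                        cell[parent] = plus(cell.get(parent, zero), value)
    return chart[(0, n)].get(start, zero)


SENTENCE = "she saw stars with telescopes".split()
print("number of parses:  ", cky(SENTENCE, COUNT))
print("inside probability:", cky(SENTENCE, INSIDE))
print("Viterbi probability:", cky(SENTENCE, VITERBI))
```

Under this toy grammar the sentence has two analyses (the prepositional phrase attaches either to the verb phrase or to the noun phrase). The count semiring reports their number, the inside semiring the sum of their probabilities, and the Viterbi semiring the probability of the more likely one.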


April 16, 2007. Slot 1:
- Course description
- Schedule of student presentations
April 23, 2007. Slot 2:
- Schedule of student presentations
- Introduction to (statistical) parsing
April 30, 2007. Slot 3:
- Introduction to (statistical) parsing
Readings: Detlef Prescher, A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars, ESSLLI 2003. (selected pages only)
May 7, 2007. Slot 4:
- CKY algorithm and parse-forest algorithm
Readings: Detlef Prescher, EM-basierte maschinelle Lernverfahren für natürliche Sprachen, Doctoral Dissertation 2002. (Pages 82 to 88)
Presentation by: Cäcilia Zirn, Samuel Broscheit
May 14, 2007. Slot 5:
- Parsing strategies (shift-reduce, top-down, left-corner)
Readings: Thomas Kalt, Induction of Greedy Controllers for Deterministic Treebank Parsers, EMNLP 2004; Thomas Kalt, Control Models of Natural Language Parsing, PhD Thesis 2005. (selected pages only)
Presentation by: Thomas Wangler, Benjamin Heinzerling
May 21, 2007. Slot 6:
- Semiring parsing: the count, inside, Viterbi, and other algorithms. [1st and 2nd part of the presentation]
Readings: Joshua Goodman, Semiring Parsing. CompLing 1999; Detlef Prescher, EM-basierte maschinelle Lernverfahren für natürliche Sprachen, Doctoral Dissertation 2002. (Pages 82 to 108)
Presentation by: Martina Trognitz, Xiaoxi Pang
May 28, 2007. Whit Monday (Pfingstmontag)
June 4, 2007. Slot 7:
- Treebank Training [1st and 2nd part of the presentation]
Readings: Eugene Charniak, Tree-Bank Grammars, AAAI/IAAI 1996 [pdf version]; Mitch Marcus et al., Building a Large Annotated Corpus of English: The Penn Treebank, CompLing 1993.
Presentation by: Galja Georgieva, Mateusz Dworaczek
June 11, 2007. Slot 8:
- Lexicalised parsing (and parent encoding)
Readings: Eugene Charniak, Statistical Parsing with a Context-free Grammar and Word Statistics, AAAI/IAAI 1997.
Presentation by: Katharina Wäschle, Sharon Friedrich
June 18, 2007. Slot 9:
- Lexicalised parsing + Markovisation of grammar rules + detailed error analysis
Readings: Michael Collins, Three Generative, Lexicalised Models for Statistical Parsing, ACL 1997; Michael Collins, A New Statistical Parser Based on Bigram Lexical Dependencies, ACL 1996; Michael Collins, Head-Driven Statistical Models for Natural Language Parsing, CompLing 2003. (selected pages only)
Presentation by: Eva Mujdricza
June 25, 2007. Slot 10:
- Unlexicalised parsing + manual mark-up of the treebank
Readings: Dan Klein, Christopher D. Manning. Accurate Unlexicalised Parsing, ACL 2003.
Presentation by: Sybille Reuter, Galina Mihailova
July 2, 2007. Slot 11:
- Data-oriented parsing
Readings: Rens Bod, Remko Scha (1996), Data-Oriented Language Processing - An Overview; Rens Bod (2001), What is the Minimal Set of Fragments that Achieves Maximal Parse Accuracy?, ACL 2001; Khalil Sima'an (2003), A Short Introduction to the DOP Model; Khalil Sima'an (2003), Lecture on Data-Oriented Parsing, ESSLLI 2003.
Presentation by: Christine Neupert, Galina Sircu
July 9, 2007. Slot 12:
- Lexicalised parsing for German
Readings: Amit Dubey and Frank Keller, Probabilistic Parsing for German Using Sister-Head Dependencies, ACL 2003.
Presentation by: Carine Dombov, Irina Gossmann
July 16, 2007. Slot 13:
- Written exam (30% of the final grade)
July 23, 2007. Slot 14:
- Unlexicalised parsing for German
Readings: Amit Dubey, What to Do When Lexicalization Fails: Parsing German with Suffix Analysis and Smoothing, ACL 2005.
Presentation by: Nadya Georgieva, Robert Schumann


Class participation (active/passive): 10%
Written exam: 30%
Student presentation: 60%
Note that you must also solve at least 50% of the assignments!


Last updated: June 2007.