Reading Group on The Elements of Statistical Learning

Language and Inference Technology Group
ILLC, University of Amsterdam
Nieuwe Achtergracht 166
1018 WV Amsterdam, The Netherlands

Time: Monday 10:00-11:00
Room: B235

Overview: We are interested in statistical-learning methods (such as nearest-neighbor methods, bootstrap and maximum-likelihood methods, boosting, neural networks, support-vector machines, co-training, and maximum-entropy modeling), and especially in the application of these methods to Natural-Language Processing (NLP). We therefore plan to read selected chapters of

Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2001). The Elements of Statistical Learning. Springer.
Richard Duda, Peter Hart, and David Stork (2001). Pattern Classification. John Wiley and Sons, Inc.

which will provide us with the theoretical background of these methods. Additionally, we plan to discuss papers that apply these methods to tasks involving at least some NLP.

Reading Group Members (so far): Gabriel Infante Lopez (infante@science.uva.nl), Valentin Jijkoun (jijkoun@science.uva.nl), Karin Müller (kmueller@science.uva.nl), Breanndan O Nuallain (bon@science.uva.nl), Detlef Prescher (prescher@science.uva.nl), Yoav Seginer (yseginer@science.uva.nl), and Khalil Sima'an (simaan@science.uva.nl).

Syllabus (so far):

Introduction

TUESDAY

Overview of Supervised Learning

Bayesian Decision Theory

Wray L. Buntine (1994). Operations for Learning with Graphical Models.

Maximum-Likelihood and Bayesian Parameter Estimation

THURSDAY

Nonparametric Techniques

Linear Discriminant Functions

Support Vector Machines

Christopher Burges (1998). Tutorial on Support-Vector Machines for Pattern Recognition.

Thorsten Joachims (2001). A Statistical Learning Model of Text Classification with Support Vector Machines.

Collins (2002). Parameter Estimation for Statistical Parsing Models: Theory and Practice of Distribution-Free Methods.

Collins and Duffy (2002). New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron.

[Session chair (June 16, 2003): Detlef Prescher]

Model Assessment and Selection, Chapter 7 (Hastie et al., 2001).
[Session chair: Khalil Sima'an]

Model Inference and Averaging, Chapter 8 (Hastie et al., 2001), and
Steven Abney (2002). Bootstrapping.
[Session chair: Detlef Prescher]

Boosting, Chapter 10 (Hastie et al., 2001), and
Henderson and Brill (2000). Bagging and Boosting a Treebank Parser.
[Session chair: Gabriel Infante Lopez]

Neural Networks, Chapter 11 (Hastie et al., 2001).
[Session chair: Valentin Jijkoun]

Co-Training,
Blum and Mitchell (1998). Combining Labeled and Unlabeled Data with Co-Training.
[Session chair:]

Maximum-Entropy Modeling,
Robert Malouf (2002). A comparison of algorithms for maximum entropy parameter estimation, and
Adwait Ratnaparkhi (1997). A Simple Introduction to Maximum Entropy Models for Natural Language Processing.
[Session chair:]

Last updated: June 2003.