Natural Language Processing
Syllabus
- Introduction: Word statistics, Zipf's Law, corpora.
- Mathematical Foundations: Elementary probability, Bayesian statistics, essential information theory.
- Linguistic Essentials: Part of speech, phrase structure, semantics and pragmatics.
- Corpus-Based Work: corpora, text preprocessing.
- Collocations: frequency, mean and variance, hypothesis testing.
- Statistical Inference: n-gram models, statistical estimators, combining estimators.
- Word Sense Disambiguation: Supervised disambiguation, dictionary-based disambiguation, unsupervised disambiguation.
- Markov Models: Markov models, Hidden Markov Models, HMM implementation, HMM applications, part-of-speech tagging.
- Probabilistic Context-Free Grammars: Features of PCFG, questions for PCFG, inside and outside probabilities.
- Information Retrieval: background, vector space model, term distribution models, LSI, web search engines.
- Text Categorization: decision trees, maximum entropy modeling, perceptrons, k-nearest neighbor classification, naive Bayes.
- Clustering: Hierarchical clustering, non-hierarchical clustering.
- Advanced topics: Statistical alignment, machine translation, text summarization, question answering.
Textbook
Foundations of Statistical Natural Language Processing by Chris Manning & Hinrich Schütze (companion website)
Errors found in the book are listed in the errata.
Messages
List of exercise submissions in CSV format and PDF.
An equation sheet that will be attached to the exam can be seen here.
Exercises
Slides
PowerPoint slides
Data+
Jane Austen's works, zipped
List of stopwords: stop_words_sorted.txt
Shakespeare's tragedies & comedies: shakes6.zip
Porter stemmer (ANSI C): porter.c, porter_thread_safe.c
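As a starting point for working with these files, the following is a minimal Python sketch that counts word frequencies after stopword removal and lists them by rank, which is one way to look at Zipf's Law from the syllabus. The function names (`tokenize`, `word_frequencies`, `rank_frequency`, `load_stopwords`) are illustrative and not part of the course material, and the assumption that stop_words_sorted.txt holds one word per line has not been checked against the actual file.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of the letters a-z as tokens.

    This is a deliberately crude tokenizer: "don't" becomes "don" and "t".
    """
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(text, stopwords=()):
    """Count tokens in `text`, skipping any token found in `stopwords`."""
    stop = set(stopwords)
    return Counter(tok for tok in tokenize(text) if tok not in stop)


def rank_frequency(counts):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent first.

    Under Zipf's Law the product rank * frequency stays roughly constant
    across ranks in a large corpus.
    """
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(counts.most_common(), start=1)
    ]


def load_stopwords(path):
    """Read a stopword file, ASSUMING one word per line (format not verified)."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().lower() for line in handle if line.strip()}


if __name__ == "__main__":
    # A tiny inline sample, so the sketch runs without any of the data files.
    sample = "the cat sat on the mat and the dog sat on the log"
    counts = word_frequencies(sample, stopwords={"the", "on", "and"})
    for row in rank_frequency(counts):
        print(row)
```

To try it on the real data, the text of an unzipped file could be passed to `word_frequencies` together with `load_stopwords("stop_words_sorted.txt")`; the file's encoding may differ from the UTF-8 assumed here.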
Useful Links
NLTK (Natural Language Toolkit): a comprehensive toolkit for NLP implemented in Python.
Evgeniy Gabrilovich: a site with many internet resources for NLP.
A site with all papers related to Machine Translation.
Companion website for the book
Statistical NLP resources
Software tools for NLP
Text Categorisation: A Survey: a nice tutorial about text categorization.
Machine Learning in Automated Text Categorization: a detailed introduction.
Gideon Dror