# Natural Language Processing

## Syllabus

- Introduction: Word statistics, Zipf's Law, corpora.
- Mathematical Foundations: Elementary probability, Bayesian statistics, essential information theory.
- Linguistic Essentials: Part of speech, phrase structure, semantics and pragmatics.
- Corpus Based Work: corpora, text preprocessing.
- Collocations: frequency, mean and variance, hypothesis testing.
- Statistical Inference: n-gram models, statistical estimators, combining estimators.
- Word Sense Disambiguation: Supervised disambiguation, dictionary based disambiguation, unsupervised disambiguation.
- Markov Models: Markov models, Hidden Markov Models, HMM implementation, HMM applications, part-of-speech tagging.
- Probabilistic Context Free Grammars: Features of PCFG, questions for PCFG, inside and outside probabilities.
- Information Retrieval: background, vector space model, term distribution models, LSI, web search engines.
- Text Categorization: decision trees, maximum entropy modeling, perceptrons, k-nearest neighbor classification, naive Bayes.
- Clustering: Hierarchical clustering, non-hierarchical clustering.
- Advanced topics: Statistical alignment, machine translation, text summarization, question answering.

## Textbook

Foundations of Statistical Natural Language Processing by Chris Manning & Hinrich Schütze (companion website)

Errors found in the book are listed in the errata.

## Exercises

## Slides

PowerPoint slides

## Data+

- Jane Austen's works, zipped
- List of stopwords: stop_words_sorted.txt
- Shakespeare's tragedies & comedies: shakes6.zip
- Porter stemmer (ANSI C): porter.c, porter_thread_safe.c
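
As a rough illustration of how the stopword list and the text archives might be used for word-statistics exercises, here is a minimal Python sketch. It assumes the stopword file holds one word per line, which is not confirmed here, and all function names are illustrative rather than part of the course material.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (letters, optional inner apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines.

    Assumption: one stopword per line; blank lines are ignored.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def word_frequencies(text, stopwords=frozenset()):
    """Count the tokens in text, skipping any token found in stopwords."""
    return Counter(tok for tok in tokenize(text) if tok not in stopwords)


def rank_frequency(counts):
    """Return (rank, word, frequency) triples, most frequent first.

    Ties are broken alphabetically. Plotting log(rank) against
    log(frequency) is the usual way to inspect Zipf's Law.
    """
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, freq)
            for rank, (word, freq) in enumerate(ordered, start=1)]
```

To try it on the files above, one could pass an open file object for stop_words_sorted.txt to `load_stopwords`, read any plain-text file unpacked from the archives into a string, and call `rank_frequency(word_frequencies(text, stopwords))`. Running it once with and once without the stopword set shows how strongly function words dominate the top ranks.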

## Useful Links

NLTK (Natural Language Toolkit): a comprehensive toolkit for NLP implemented in Python.

Free online book - Natural Language Processing with Python.

A site with all papers related to Machine Translation.

companion website for the book

Statistical NLP resources

Software tools for NLP

Text Categorisation: A Survey, a nice tutorial about text categorization.

Machine Learning in Automated Text Categorization, a detailed introduction.

*Gideon Dror*