CS249: Paper List – Winter 2020

This is the list of suggested papers that students may want to read and present. Most of these papers have been highly influential and are considered “required reading” in NLP. Every group will have to pick and present one paper for two hours during the quarter. If you want to see a few example summaries, here are sample submissions from earlier years (on a paper not listed here).

By Instructor

~~(Week 1 Mon)~~ Class Introduction [(Slides)](…/slides/Lec01 Introduction.pptx)
(Week 1 Wed) Yoshua Bengio, et al.: A Neural Probabilistic Language Model, J. of Machine Learning Research, 2003. [(Slides)](…/slides/Lec02 Neural Language Model.pptx)
(Week 2 Mon) Tomas Mikolov, et al.: Distributed Representations of Words and Phrases and their Compositionality, NIPS 2013. [(Slides)](…/slides/Lec03 Neural Language Model.pptx)
(Week 2 Wed) Jeffrey Pennington, et al.: GloVe: Global Vectors for Word Representation, EMNLP 2014. [(Slides)](…/slides/Lec04 Word Embedding.pptx)
~~(Week 3 Mon)~~ Martin Luther King, Jr. Holiday

By Students

(Week 3 Wed) Kamal Nigam, et al.: Text Classification from Labeled and Unlabeled Documents using EM, Machine Learning, 1999. [(Slides)](…/slides/Lec05 Text Classification.pptx)
(Week 4 Mon) Adam Berger, Stephen Della Pietra, Vincent Della Pietra: A Maximum Entropy Approach to Natural Language Processing, Computational Linguistics, 1996.
(Week 4 Wed) Michael Collins: Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, EMNLP 2002.
~~(Week 5 Mon)~~ Project Presentations
~~(Week 5 Wed)~~ Project Presentations
(Week 6 Mon) John Lafferty, Andrew McCallum, Fernando C.N. Pereira: Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, ICML 2001.
(Week 6 Wed) Ryan McDonald, et al.: Non-Projective Dependency Parsing using Spanning-Tree Algorithms, EMNLP 2005.
~~(Week 7 Mon)~~ Presidents’ Day Holiday
(Week 7 Wed) Danqi Chen, Christopher D. Manning: A Fast and Accurate Dependency Parser using Neural Networks, EMNLP 2014.
(Week 8 Mon) Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio: Neural Machine Translation by Jointly Learning to Align and Translate, ICLR 2015.
(Week 8 Wed) Ashish Vaswani, et al.: Attention is All You Need, NIPS 2017.
(Week 9 Mon) Jacob Devlin, et al.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, 2018.
~~(Week 9 Wed)~~ Project Presentations
~~(Week 10 Mon)~~ Project Presentations
~~(Week 10 Wed)~~ Project Presentations

Helpful Tutorials

Maya R. Gupta, Yihua Chen: Theory and Use of the EM Algorithm, 2010.
Kevin Knight: Statistical MT Tutorial Workbook, 1999.
Charles Sutton, Andrew McCallum: An Introduction to Conditional Random Fields, 2012.
Kevin Knight: Bayesian Inference with Tears, 2009.
Bela A. Frigyik, Amol Kapila, Maya R. Gupta: Introduction to the Dirichlet Distribution and Related Processes, 2010.
Philip Resnik, Eric Hardisty: Gibbs Sampling for the Uninitiated, 2010.
Maya R. Gupta: A Measure Theory Tutorial (Measure Theory for Dummies), 2006.