Before we dive into the state of the art in NLP, we wanted to build a strong foundation. This NLP-from-scratch paper explains how to train neural networks that perform well on linguistic tasks without any prior knowledge of language. (The title says “almost” from scratch because a model that truly learned from scratch wouldn’t separate sentences into words; it would learn from raw characters.)
This paper was published in 2011 and was the first major neural-network work to rival state-of-the-art performance on NLP tasks like part-of-speech tagging and chunking. Since its publication, deep neural networks have blown these benchmarks out of the water, along with just about every benchmark set by a program based on linguistic prior knowledge.