Natural Language Processing in a Nutshell
1. Abbreviations in NLP:
LSTM: Long Short-Term Memory
BERT: Bidirectional Encoder Representations from Transformers.
POS: Part of Speech.
NER: Named Entity Recognition.
NLG: Natural Language Generation.
NLU: Natural Language Understanding.
TF-IDF: Term Frequency–Inverse Document Frequency.
re: Regular expression.
LDA: Latent Dirichlet Allocation.
LSI: Latent Semantic Indexing.
NMF: Non-Negative Matrix Factorization.
NLTK: Natural Language Toolkit
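Of the abbreviations above, TF-IDF is the easiest to show in a few lines. The sketch below is only an illustration under stated assumptions: the function name tf_idf and the toy corpus are made up for this example, and it uses one common unsmoothed variant (tf = count / document length, idf = log(N / df)). Libraries usually apply smoothing, so their numbers will differ.

```python
import math
from collections import Counter

def tf_idf(corpus):
    """Score each term of each tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy corpus (hypothetical): each document is already a list of tokens.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "barked"],
]
for doc_scores in tf_idf(corpus):
    print(doc_scores)
```

A term such as "the" that appears in every document gets a score of zero, while terms unique to one document score higher.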
2. Some Common Steps for NLP Problems:
Sentence Segmentation: break the text apart into separate sentences
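Tokenization: split each sentence into words (tokens)
Stemming: the process of reducing words to their word stem by stripping affixes, for example, thinking → think
Lemmatizing: reducing words to their dictionary form (lemma), for example, worse → bad
POS tagging: predicting the part of speech for each token
Identifying Stop Words: flagging very common words that carry little meaning, like “and”, “the”
Named entity recognition: detect and label nouns that refer to real-world concepts, such as people, places, and organizations.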
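The first few steps above can be sketched with only Python's standard library. This is a rough illustration, not how NLTK or spaCy do it: the helper names (split_sentences, tokenize, toy_stem, preprocess), the tiny stop-word set, and the suffix rules are all assumptions made for this example. Lemmatizing, POS tagging, and named entity recognition need dictionaries or trained models, so they are left out.

```python
import re

# Tiny illustrative stop-word set (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in"}

def split_sentences(text):
    """Sentence segmentation: split on whitespace after '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Tokenization: lowercase, then pull out runs of letters/apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())

def toy_stem(word):
    """Stemming: strip a few common suffixes (crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Run segmentation, tokenization, stop-word removal, and stemming."""
    result = []
    for sentence in split_sentences(text):
        tokens = [t for t in tokenize(sentence) if t not in STOP_WORDS]
        result.append([toy_stem(t) for t in tokens])
    return result

print(preprocess("The cat is thinking. Dogs barked in the park!"))
# [['cat', 'think'], ['dog', 'bark', 'park']]
```

The regex-based splitter will break on abbreviations like "Dr." and the toy stemmer mishandles many words, which is why real projects usually rely on a library for these steps.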
3. Applications of NLP in The Real World:
Personal assistant applications
Managing advertisements
Named entity recognition
Part-of-speech tagging
Language model building
4. Python Libraries for NLP:
Gensim: a Python library specifically for topic modeling.
re: Python's built-in module for regular expressions
allennlp: an open-source NLP research library, built on PyTorch
5. A Few Terms in NLP:
Named entity recognition: detecting nouns that refer to real-world concepts
Corpus: A collection of texts
n-gram: a contiguous sequence of n words (tokens) taken from a sentence
Latent Dirichlet Allocation: a technique for topic modeling.
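To make the n-gram idea concrete, here is a minimal sketch that slides a window of n tokens over a sentence. The function name ngrams and the sample sentence are assumptions for this example; NLTK ships its own n-gram utility, which is not used here.

```python
def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # When n is larger than the token list, the range is empty.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "natural language processing in a nutshell".split()
print(ngrams(tokens, 2))
# [('natural', 'language'), ('language', 'processing'),
#  ('processing', 'in'), ('in', 'a'), ('a', 'nutshell')]
```

With n = 1 the result is the individual tokens (unigrams), and n = 2 and n = 3 give bigrams and trigrams.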
6. Word Embedding Libraries:
7. Great Tutorials for NLTK & spaCy:
Python for NLP: Tokenization, Stemming, and Lemmatization with SpaCy Library