Natural Language Processing in a Nutshell

1. Abbreviated Words in NLP:

LSTM: Long Short-Term Memory.
BERT: Bidirectional Encoder Representations from Transformers.
POS: Part of Speech.
NER: Named Entity Recognition.
NLG: Natural Language Generation.
NLU: Natural Language Understanding.
TF-IDF: Term Frequency–Inverse Document Frequency.
re: Regular expression.
LDA: Latent Dirichlet Allocation.
LSI: Latent Semantic Indexing.
NMF: Non-Negative Matrix Factorization.
NLTK: Natural Language Toolkit
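
Of the abbreviations above, TF-IDF is the one that names a formula, so a small worked example may help. The sketch below assumes one common variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); libraries often apply different smoothing, and the function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Toy, already-tokenized documents (illustrative only).
docs = [["nlp", "is", "fun"], ["nlp", "is", "hard"]]
scores = tf_idf(docs)
print(scores[0])
```

Terms that occur in every document ("nlp", "is") score zero under this variant, while terms unique to one document ("fun", "hard") get a positive weight.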

2. Some Common Steps for NLP Problems:

Sentence segmentation: break the text apart into separate sentences.
Tokenization: split each sentence into words (tokens).
Stemming: reduce a word to its stem by stripping affixes, for example, thinking → think.
Lemmatizing: reduce a word to its dictionary form (lemma), for example, worse → bad.
POS tagging: predict the part of speech of each token.
Identifying stop words: flag very common words such as “and” and “the”.
Named entity recognition: detect words or phrases that refer to real-world entities, such as people, places, and organizations.
Text classification
Chunking
Coreference resolution
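
The first few steps above can be sketched with nothing but the standard library. This is a deliberately naive illustration, not how NLTK or spaCy implement them: the stop-word list is a tiny made-up sample, the sentence splitter only looks at end punctuation, and `naive_stem` is a crude suffix stripper rather than a real stemming algorithm.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"and", "the", "a", "an", "is", "of", "to", "in"}

def split_sentences(text):
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize(sentence):
    """Naive tokenization: lowercase runs of letters, digits, and apostrophes."""
    return re.findall(r"[a-z0-9']+", sentence.lower())

def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]

def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

text = "The cat is thinking. Dogs barked and played in the garden!"
sentences = split_sentences(text)
tokens = [remove_stop_words(tokenize(s)) for s in sentences]
stems = [[naive_stem(t) for t in sent] for sent in tokens]
print(stems)
```

Lemmatizing, POS tagging, and named entity recognition need dictionaries or trained models, so in practice they are left to a library.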

3. Applications of NLP in The Real World:

Personal assistant applications
Fighting spam
Chatbots
Advertisement management
Sentiment analysis
Text classification
Text summarization
Toxicity Classification
Named entity recognition
Part of speech tagging
Language model building
Machine translation
Spell checking
Speech recognition
Character recognition

4. Python Libraries for NLP:

NLTK
spaCy
Gensim: a Python library specifically for topic modeling.
Pattern
Stanford CoreNLP
Polyglot
TextBlob
re: Python's standard-library module for regular expressions
WordCloud
AllenNLP: an open-source NLP research library, built on PyTorch
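
Of the libraries above, `re` is the only one that ships with Python, so it is a convenient place to start with text cleanup. The snippet below is a small sketch of typical preprocessing; the sample sentence and the simple URL pattern are illustrative assumptions and will not cover every real-world URL.

```python
import re

text = "Contact us at info@example.com or visit https://example.com today!!!"

# Find anything that looks like an http(s) URL (a simplified pattern).
urls = re.findall(r"https?://\S+", text)

# Remove the URLs, then collapse the leftover runs of whitespace.
no_urls = re.sub(r"https?://\S+", "", text)
collapsed = re.sub(r"\s+", " ", no_urls).strip()

print(urls)
print(collapsed)
```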

5. A few terms in NLP:

Stop words
Punctuation
Word embedding
Word segmentation
Text summarization
Regular expression
Morphological segmentation
Named entity recognition
Corpus: A collection of texts
Document-Term Matrix
n-gram: a contiguous sequence of n tokens (words or characters) taken from a text
Latent Dirichlet Allocation: a technique for topic modeling.
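
Two of the terms above, the n-gram and the document-term matrix, are easy to show on a toy corpus. The sketch below uses only the standard library; the helper names `ngrams` and `document_term_matrix` and the two sample sentences are made up for illustration, and real projects would normally use a library vectorizer instead.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_term_matrix(docs):
    """Build a count matrix: one row per document, one column per vocabulary term."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows

# Toy corpus: a collection of two short texts.
corpus = ["the cat sat", "the dog sat on the cat"]
docs = [text.split() for text in corpus]

bigrams = ngrams(docs[0], 2)
vocab, matrix = document_term_matrix(docs)

print(bigrams)
print(vocab)
print(matrix)
```

Topic-modeling techniques such as Latent Dirichlet Allocation typically take a matrix of this shape as their input.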

6. Word Embedding Libraries:

Word2Vec
GloVe
fastText
Gensim

7. Great Tutorials for NLTK & spaCy:

https://pythonspot.com/category/nltk/
Python for NLP: Tokenization, Stemming, and Lemmatization with SpaCy Library
https://course.spacy.io/

Stay tuned!