Understand Various Steps in Natural Language Processing (NLP) under 5 minutes
January 23, 2023
Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, such as speech and text, by software.
The study of Natural Language Processing has been around for more than four decades.
In this blog, you will discover the main steps in NLP.
Tokenization:
The process of breaking a string into smaller pieces, called tokens, is called tokenization.
Example:
My Name is Aman
After breaking this into tokens, we get:
'My', 'Name', 'is', 'Aman'
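A minimal sketch of this step in Python, using only the standard library. The function name `tokenize` and the regular expression are illustrative assumptions, not a reference implementation; real tokenizers handle many more cases (contractions, URLs, and so on).

```python
import re

def tokenize(text):
    # Hypothetical helper: \w+ grabs runs of letters/digits/underscores,
    # [^\w\s] grabs each punctuation mark as its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("My Name is Aman"))  # ['My', 'Name', 'is', 'Aman']
```

Splitting on whitespace alone would give the same result for this sentence; the regular expression only matters once punctuation appears.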
Stemming:
Normalizing words into their base or root form is called stemming.
Example: all of the words below are treated as one:
Affections, Affects, Affected, Affecting
All of the above tokens will be converted to their root form, that is:
Affect
It simply tries to strip common prefixes and suffixes from a word, so the result is not always a proper dictionary word.
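The idea can be sketched with a toy suffix stripper. This is only an illustration under simplifying assumptions: the suffix list and the `stem` function are made up for this example, only suffixes are handled, and production stemmers (such as the Porter stemmer) apply far more careful rules.

```python
# Hypothetical suffix list, ordered so longer suffixes are tried first.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")

def stem(word):
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words like "is" stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["Affections", "Affects", "Affected", "Affecting"]
print([stem(w) for w in words])  # ['affect', 'affect', 'affect', 'affect']
```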
Lemmatization:
- Takes the morphological analysis of the word into account
- Groups together the different inflected forms of a word under one base form, called the lemma
- Somewhat similar to stemming, as it maps several words to one common root
- The output of lemmatization is a proper word
Example:
A lemmatizer should map gone, going, and went to go.
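Because irregular forms such as "went" cannot be reached by stripping suffixes, lemmatizers typically rely on a vocabulary lookup. A minimal sketch, assuming a tiny hand-written table (`LEMMAS` and `lemmatize` are hypothetical names; a real lemmatizer uses a full dictionary and part-of-speech information):

```python
# Hypothetical lookup table covering only the example above.
LEMMAS = {"gone": "go", "going": "go", "went": "go"}

def lemmatize(word):
    word = word.lower()
    # Fall back to the word itself when it is not in the table.
    return LEMMAS.get(word, word)

print([lemmatize(w) for w in ["gone", "going", "went"]])  # ['go', 'go', 'go']
```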
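POS Tagging (Part-of-Speech Tagging):
Here, each word is tagged with its part of speech. In the example below, DT marks a determiner, NN a singular noun, and VBD a past-tense verb (Penn Treebank tags).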
Example:
The | Dog | Killed | the | Bat |
DT | NN | VBD | DT | NN |
List of Universal POS Tags
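The simplest possible tagger is a dictionary lookup, sketched below. The `LEXICON` table and `pos_tag` function are assumptions made for this example; real taggers are trained on annotated text and use the surrounding context to resolve ambiguous words.

```python
# Hypothetical lexicon covering only the example sentence.
LEXICON = {"the": "DT", "dog": "NN", "killed": "VBD", "bat": "NN"}

def pos_tag(tokens):
    # Unknown words default to NN, a common baseline choice.
    return [(tok, LEXICON.get(tok.lower(), "NN")) for tok in tokens]

print(pos_tag(["The", "Dog", "Killed", "the", "Bat"]))
# [('The', 'DT'), ('Dog', 'NN'), ('Killed', 'VBD'), ('the', 'DT'), ('Bat', 'NN')]
```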
Named Entity Recognition:
It is used to identify named entities in text, such as movies, organisations, locations, people, and so on.
Example:
Google's CEO Sundar Pichai introduced the new Pixel Phone at New York
After named entity recognition, it becomes:
Google’s | CEO | Sundar Pichai | introduced | the | new | Pixel Phone | at | New York |
Organisation | - | Person | - | - | - | Object | - | Location |
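One simple way to approximate this is a gazetteer, that is, a list of known names and their labels. The sketch below assumes a hand-written `GAZETTEER` and a hypothetical `find_entities` helper; real NER systems use statistical or neural models and can recognize names they have never seen.

```python
# Hypothetical gazetteer covering only the example sentence.
GAZETTEER = {
    "Google": "Organisation",
    "Sundar Pichai": "Person",
    "Pixel Phone": "Object",
    "New York": "Location",
}

def find_entities(text):
    found = []
    for name, label in GAZETTEER.items():
        start = text.find(name)  # first occurrence only, for simplicity
        if start != -1:
            found.append((start, name, label))
    found.sort()  # report entities in the order they appear in the text
    return [(name, label) for _, name, label in found]

sentence = "Google's CEO Sundar Pichai introduced the new Pixel Phone at New York"
print(find_entities(sentence))
# [('Google', 'Organisation'), ('Sundar Pichai', 'Person'),
#  ('Pixel Phone', 'Object'), ('New York', 'Location')]
```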
Chunking:
Picking up individual pieces of information, such as tagged words, and grouping them into bigger pieces, such as phrases, is called chunking.
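As an illustration, the sketch below groups a determiner followed by a noun into a noun phrase (NP), reusing the tags from the POS example. The grouping rule and the `chunk_noun_phrases` function are assumptions for this example; the text above does not prescribe a particular rule, and real chunkers use richer grammars or trained models.

```python
def chunk_noun_phrases(tagged):
    # tagged is a list of (word, tag) pairs, e.g. the output of a POS tagger.
    chunks = []
    i = 0
    while i < len(tagged):
        word, tag = tagged[i]
        # Assumed rule: a determiner directly followed by a noun forms an NP.
        if tag == "DT" and i + 1 < len(tagged) and tagged[i + 1][1] == "NN":
            chunks.append(("NP", [word, tagged[i + 1][0]]))
            i += 2
        else:
            chunks.append((tag, [word]))
            i += 1
    return chunks

tagged = [("The", "DT"), ("Dog", "NN"), ("Killed", "VBD"), ("the", "DT"), ("Bat", "NN")]
print(chunk_noun_phrases(tagged))
# [('NP', ['The', 'Dog']), ('VBD', ['Killed']), ('NP', ['the', 'Bat'])]
```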