Python NLP – Stemming and Lemmatization

In this Python NLP article, we are going to learn about NLP Stemming and Lemmatization in Python, so first of all we are going to talk about these concepts and after that we create our examples.

 

 

Understanding Tokenization in NLTK

Before we dive into Stemming and Lemmatization, let’s briefly touch upon the concept of Tokenization. Tokenization involves breaking down text into smaller units, typically words or sentences, to facilitate further analysis. NLTK (Natural Language Toolkit) provides robust functionalities for tokenizing text effortlessly in Python.

 

 

Learn How to Tokenize words in NLTK with Python

 

 

What is Stemming in NLP ? 

Stemming is a process to remove affixes from a word, ending up with the stem. or in literal term we can say that stemming is the process of cutting down the branches to its stem, using stemming we can cut down a word or token to its stem or base word. for example the word eat will have variations like like eating, eaten,eats. Stemming is most commonly used by search engines for indexing words. Instead of storing all forms of a word, a search engine can store only the stems. there are different stemmers that you can use in NLTK for example we have PorterStemmer, LancasterStemmer, SnowballStemmer.

 

 

 

Python NLP – Stemming and Lemmatization

So now let’s start from PorterStemer and it is the default choice for stemming.

 

 

You can see that in the above code we are going to stem the drinking word, and the result will be drink.

 

 

 

Now let’s just use LancasterStemmer, so LancasterStemmer is just like the  PorterStemmer, but It is known to be slightly more aggressive than the PorterStemmer.

 

 

 

So this will be the result.

 

 

Another type is SnowBallStemmer, the best thing about snowball stemmer is this, that it supports 13 language.

 

 

You can check the languages that are available for SnowBallStemmer.

 

 

Run the code and this is the result.

 

 

 

Now let’s stem a word using SnowBallStemmer. you can see that after creating the SnowBallStemmer we need to specify the language that we want to use.

 

 

 

Run the code this is the result.

 

 

What is Lemmatization in NLP ?

Lemmatization process is the same as stemming process,  but it brings context to the words. so it links words with similar meaning to one word. and unlike stemming, you are always left with a valid word that means the same thing. However, the word you end up with can be completely different.

 

 

 

This is the example for Lemmatization.

 

 

 

If you run the code this will be the result.

 

 

 

let’s see the difference between stem and lemmatization.

 

 

 

If you run the code this will be the result, You can see in this example the PortStemmer just removes the es, but the lematizer found the valid root for the word.

 

Subscribe and Get Free Video Courses & Articles in your Email

 

Leave a Comment

Share via
Copy link
Powered by Social Snap
×