In this Python NLP article we look at part-of-speech (POS) tagging. We first cover what parts of
speech are, and then how to tag them with NLTK.
What Are Parts of Speech?
Part-of-speech (POS) tagging is one of the many tasks in NLP. In English, the main parts of speech
are noun, pronoun, adjective, determiner, verb, adverb, preposition, conjunction, and interjection.
You probably already know what adjectives and adverbs are and how they differ. As a human you
apply this knowledge without effort, but now consider how a system could encode it.
A part-of-speech tag identifies whether a word is a noun, verb, adjective, and so on. POS tagging
has numerous applications, such as information retrieval and machine translation.
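To make the idea of encoding this knowledge concrete, here is a minimal sketch of the simplest possible tagger: a dictionary lookup. The lexicon entries, the toy_tag helper and the NN fallback are all assumptions made up for illustration; this is not how NLTK's tagger works.

```python
# Toy lexicon mapping lowercase words to Penn Treebank-style tags.
# The entries are illustrative assumptions, not NLTK data.
TOY_LEXICON = {
    "python": "NN",
    "is": "VBZ",
    "a": "DT",
    "good": "JJ",
    "language": "NN",
}


def toy_tag(tokens, default="NN"):
    """Tag each token by lexicon lookup, falling back to `default`."""
    return [(tok, TOY_LEXICON.get(tok.lower(), default)) for tok in tokens]


tagged = toy_tag("python is a good language".split())
print(tagged)
```

A plain lookup cannot resolve ambiguous words (for example, "book" can be a noun or a verb), which is why practical taggers also look at the surrounding context.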
What Is Part-of-Speech (POS) Tagging?
Part-of-speech tagging is the process of assigning a category tag (for example, noun,
verb, adjective, and so on) to each individual token in a sentence. In NLTK, taggers
live in the nltk.tag package and inherit from the TaggerI base class.
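As a rough illustration of that class hierarchy, the sketch below builds two of the simple taggers that ship in nltk.tag, neither of which needs a downloaded model. The regular-expression rules are illustrative assumptions chosen for this example, not a recommended rule set.

```python
from nltk.tag import DefaultTagger, RegexpTagger, TaggerI

# Fallback tagger: labels every token as a singular common noun.
fallback = DefaultTagger("NN")

# Regex rules are tried in order; tokens that match none of them
# are passed on to the backoff tagger.
patterns = [
    (r"^(a|an|the)$", "DT"),  # a few determiners
    (r".*ing$", "VBG"),       # gerunds / present participles
    (r".*s$", "NNS"),         # crude plural-noun guess
]
rule_tagger = RegexpTagger(patterns, backoff=fallback)

# Both taggers are instances of the TaggerI interface.
print(isinstance(rule_tagger, TaggerI))
print(rule_tagger.tag(["the", "taggers", "running", "python"]))
```

Chaining taggers through the backoff argument is a common way to combine a precise but narrow tagger with a more general one.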
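Now let's create an example. If NLTK raises a LookupError about a missing resource, download the
tokenizer and tagger models it names with nltk.download() first.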
    from nltk.tokenize import word_tokenize
    from nltk import pos_tag

    text = word_tokenize("python is a good language")

    print(pos_tag(text))
Run the code to get the result below. We first tokenize the sentence into words and then tag
each token with its part of speech.
    [('python', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('language', 'NN')]
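Because pos_tag returns a plain list of (word, tag) tuples, ordinary Python is enough to work with the result. The sketch below pastes in the pairs printed above as literal data, so it runs without NLTK, and groups the words by tag; the name words_by_tag is just an illustrative choice.

```python
from collections import defaultdict

# The (word, tag) pairs printed above, pasted in as literal data.
tagged = [('python', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('language', 'NN')]

# Collect the words that share each tag, keeping sentence order.
words_by_tag = defaultdict(list)
for word, tag in tagged:
    words_by_tag[tag].append(word)

for tag, words in words_by_tag.items():
    print(tag, words)
```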
If you don't know what a tag such as NN or VBZ means, NLTK provides a help function that
describes it.
    from nltk.tokenize import word_tokenize
    from nltk import pos_tag, help

    text = word_tokenize("python is a good language")

    print(pos_tag(text))

    # upenn_tagset prints the description itself and returns None,
    # so it is not wrapped in print()
    help.upenn_tagset('NNS')
In this example we look up NNS, and you can see that it means a common plural noun.
    [('python', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('language', 'NN')]
    NNS: noun, common, plural
        undergraduates scotches bric-a-brac products bodyguards facets coasts
        divestitures storehouses designs clubs fragrances averages
        subjectivists apprehensions muses factory-jobs ...
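If you only need a reminder for a handful of tags, a small lookup table can annotate tagged output directly. The sketch below is an assumption-laden convenience, not part of NLTK: the TAG_GLOSSES dictionary and the describe helper are hypothetical names, and the glosses are short paraphrases of the Penn Treebank tag meanings for the tags used in this article.

```python
# Short paraphrased glosses for the Penn Treebank tags used in this article.
# This table is a hand-written assumption, not data loaded from NLTK.
TAG_GLOSSES = {
    "NN": "noun, common, singular or mass",
    "NNS": "noun, common, plural",
    "NNP": "noun, proper, singular",
    "VBZ": "verb, present tense, 3rd person singular",
    "DT": "determiner",
    "JJ": "adjective",
}


def describe(tagged):
    """Attach a human-readable gloss to each (word, tag) pair."""
    return [(word, tag, TAG_GLOSSES.get(tag, "unknown tag")) for word, tag in tagged]


tagged = [('python', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('good', 'JJ'), ('language', 'NN')]
for word, tag, gloss in describe(tagged):
    print(f"{word:10} {tag:4} {gloss}")
```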
In the second example we use the wikipedia library to fetch text from Wikipedia. First install
the library with pip: pip install wikipedia.
    from nltk.tokenize import word_tokenize
    from nltk import pos_tag
    import wikipedia

    # Fetch the summary of the Wikipedia article on grammar
    grammar = wikipedia.summary('grammar')

    # Split the summary into tokens, then tag each token
    gra_token = word_tokenize(grammar)
    sample_pos = pos_tag(gra_token)

    print(sample_pos)
Run the code to get the following result.
[('In', 'IN'), ('linguistics', 'NNS'), (',', ','), ('grammar', 'NN'), ('(', '('), ('from', 'IN'), ('Ancient', 'NNP'), ('Greek', 'NNP'), ('γραμματική', 'NNP'), (')', ')'), ('is', 'VBZ'), ('the', 'DT'), ('set', 'NN'), ('of', 'IN'), ('structural', 'JJ'), ('rules', 'NNS'), ('governing', 'VBG'), ('the', 'DT'), ('composition', 'NN'), ('of', 'IN'), ('clauses', 'NNS'), (',', ','), ('phrases', 'NNS'), ('and', 'CC'), ('words', 'NNS'), ('in', 'IN'), ('a', 'DT'), ('natural', 'JJ'), ('language', 'NN'), ('.', '.'), ('The', 'DT'), ('term', 'NN'), ('refers', 'NNS'), ('also', 'RB'), ('to', 'TO'), ('the', 'DT'), ('study', 'NN'), ('of', 'IN'), ('such', 'JJ'), ('rules', 'NNS'), ('and', 'CC'), ('this', 'DT'), ('field', 'NN'), ('includes', 'VBZ'), ('phonology', 'NN'), (',', ','), ('morphology', 'NN'), ('and', 'CC'), ('syntax', 'NN'), (',', ','), ('often', 'RB'), ('complemented', 'VBN'), ('by', 'IN'), ('phonetics', 'NNS'), (',', ','), ('semantics', 'NNS'), ('and', 'CC'), ('pragmatics', 'NNS'), ('.', '.'), ('Fluent', 'JJ'), ('speakers', 'NNS'), ('of', 'IN'), ('a', 'DT'), ('language', 'NN'), ('variety', 'NN'), ('or', 'CC'), ('lect', 'NN'), ('have', 'VBP'), ('a', 'DT'), ('set', 'NN'), ('of', 'IN'), ('internalized', 'JJ'), ('rules', 'NNS'), ('which', 'WDT'), ('constitutes', 'VBZ'), ('its', 'PRP$'), ('grammar', 'NN'), ('.', '.'), ('The', 'DT'), ('vast', 'JJ'), ('majority', 'NN'), ('of', 'IN'), ('the', 'DT'), ('information', 'NN'), ('in', 'IN'), ('the', 'DT'), ('grammar', 'NN'), ('is', 'VBZ'), ('–', 'VBN'), ('at', 'IN'), ('least', 'JJS'), ('in', 'IN'), ('the', 'DT'), ('case', 'NN'), ('of', 'IN'), ('one', 'CD'), ("'s", 'POS'), ('native', 'JJ'), ('language', 'NN'), ('–', 'NN'), ('acquired', 'VBD'), ('not', 'RB'), ('by', 'IN'), ('conscious', 'JJ'), ('study', 'NN'), ('or', 'CC'), ('instruction', 'NN'), ('but', 'CC'), ('by', 'IN'), ('hearing', 'VBG'), ('other', 'JJ'), ('speakers', 'NNS'), ('.', '.'), ('Much', 'RB'), ('of', 'IN'), ('this', 'DT'), ('work', 'NN'), ('is', 'VBZ'), ('done', 'VBN'),
('during', 'IN'), ('early', 'JJ'), ('childhood', 'NN'), (';', ':'), ('learning', 'VBG'), ('a', 'DT'), ('language', 'NN'), ('later', 'RB'), ('in', 'IN'), ('life', 'NN'), ('usually', 'RB'), ('involves', 'VBZ'), ('more', 'JJR'), ('explicit', 'JJ'), ('instruction', 'NN'), ('.', '.'), ('Thus', 'RB'), (',', ','), ('grammar', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('cognitive', 'JJ'), ('information', 'NN'), ('underlying', 'VBG'), ('language', 'NN'), ('use', 'NN'), ('.', '.'), ('The', 'DT'), ('term', 'NN'), ('``', '``'), ('grammar', 'NN'), ("''", "''"), ('can', 'MD'), ('also', 'RB'), ('describe', 'VB'), ('the', 'DT'), ('rules', 'NNS'), ('which', 'WDT'), ('govern', 'VBP'), ('the', 'DT'), ('linguistic', 'JJ'), ('behavior', 'NN'), ('of', 'IN'), ('a', 'DT'), ('group', 'NN'), ('of', 'IN'), ('speakers', 'NNS'), ('.', '.'), ('For', 'IN'), ('example', 'NN'), (',', ','), ('the', 'DT'), ('term', 'NN'), ('``', '``'), ('English', 'JJ'), ('grammar', 'NN'), ("''", "''"), ('may', 'MD'), ('refer', 'VB'), ('to', 'TO'), ('the', 'DT'), ('whole', 'NN'), ('of', 'IN'), ('English', 'NNP'), ('grammar', 'NN'), (';', ':'), ('that', 'WDT'), ('is', 'VBZ'), (',', ','), ('to', 'TO'), ('the', 'DT'), ('grammars', 'NNS'), ('of', 'IN'), ('all', 'PDT'), ('the', 'DT'), ('speakers', 'NNS'), ('of', 'IN'), ('the', 'DT'), ('language', 'NN'), (',', ','), ('in', 'IN'), ('which', 'WDT'), ('case', 'NN'), ('the', 'DT'), ('term', 'NN'), ('encompasses', 'VBZ'), ('a', 'DT'), ('great', 'JJ'), ('deal', 'NN'), ('of', 'IN'), ('variation', 'NN'), ('.', '.'), ('Alternatively', 'RB'), (',', ','), ('it', 'PRP'), ('may', 'MD'), ('refer', 'VB'), ('only', 'RB'), ('to', 'TO'), ('what', 'WP'), ('is', 'VBZ'), ('common', 'JJ'), ('to', 'TO'), ('the', 'DT'), ('grammars', 'NNS'), ('of', 'IN'), ('all', 'DT'), ('or', 'CC'), ('most', 'JJS'), ('English', 'JJ'), ('speakers', 'NNS'), ('(', '('), ('such', 'JJ'), ('as', 'IN'), ('subject–verb–object', 'JJ'), ('word', 'NN'), ('order', 'NN'), ('in', 'IN'), ('simple', 'JJ'), ('declarative', 'JJ'), 
('sentences', 'NNS'), (')', ')'), ('.', '.'), ('It', 'PRP'), ('may', 'MD'), ('also', 'RB'), ('refer', 'VB'), ('to', 'TO'), ('the', 'DT'), ('rules', 'NNS'), ('of', 'IN'), ('one', 'CD'), ('relatively', 'RB'), ('well-defined', 'JJ'), ('form', 'NN'), ('of', 'IN'), ('English', 'NNP'), ('(', '('), ('such', 'JJ'), ('as', 'IN'), ('standard', 'JJ'), ('English', 'NNP'), ('for', 'IN'), ('a', 'DT'), ('region', 'NN'), (')', ')'), ('.', '.'), ('A', 'DT'), ('description', 'NN'), (',', ','), ('study', 'NN'), (',', ','), ('or', 'CC'), ('analysis', 'NN'), ('of', 'IN'), ('such', 'JJ'), ('rules', 'NNS'), ('may', 'MD'), ('also', 'RB'), ('be', 'VB'), ('referred', 'VBN'), ('to', 'TO'), ('as', 'IN'), ('a', 'DT'), ('grammar', 'NN'), ('.', '.'), ('A', 'DT'), ('reference', 'NN'), ('book', 'NN'), ('describing', 'VBG'), ('the', 'DT'), ('grammar', 'NN'), ('of', 'IN'), ('a', 'DT'), ('language', 'NN'), ('is', 'VBZ'), ('called', 'VBN'), ('a', 'DT'), ('``', '``'), ('reference', 'NN'), ('grammar', 'NN'), ("''", "''"), ('or', 'CC'), ('simply', 'RB'), ('``', '``'), ('a', 'DT'), ('grammar', 'NN'), ("''", "''"), ('(', '('), ('see', 'VB'), ('History', 'NNP'), ('of', 'IN'), ('English', 'NNP'), ('grammars', 'NNS'), (')', ')'), ('.', '.'), ('A', 'DT'), ('fully', 'RB'), ('explicit', 'JJ'), ('grammar', 'NN'), ('which', 'WDT'), ('exhaustively', 'RB'), ('describes', 'VBZ'), ('the', 'DT'), ('grammatical', 'JJ'), ('constructions', 'NNS'), ('of', 'IN'), ('a', 'DT'), ('particular', 'JJ'), ('speech', 'NN'), ('variety', 'NN'), ('is', 'VBZ'), ('called', 'VBN'), ('a', 'DT'), ('descriptive', 'JJ'), ('grammar', 'NN'), ('.', '.'), ('This', 'DT'), ('kind', 'NN'), ('of', 'IN'), ('linguistic', 'JJ'), ('description', 'NN'), ('contrasts', 'NNS'), ('with', 'IN'), ('linguistic', 'JJ'), ('prescription', 'NN'), (',', ','), ('an', 'DT'), ('attempt', 'NN'), ('to', 'TO'), ('actively', 'RB'), ('discourage', 'VB'), ('or', 'CC'), ('suppress', 'VB'), ('some', 'DT'), ('grammatical', 'JJ'), ('constructions', 'NNS'), (',', ','), ('while', 
'IN'), ('codifying', 'VBG'), ('and', 'CC'), ('promoting', 'VBG'), ('others', 'NNS'), (',', ','), ('either', 'RB'), ('in', 'IN'), ('an', 'DT'), ('absolute', 'JJ'), ('sense', 'NN'), ('or', 'CC'), ('about', 'IN'), ('a', 'DT'), ('standard', 'JJ'), ('variety', 'NN'), ('.', '.'), ('For', 'IN'), ('example', 'NN'), (',', ','), ('some', 'DT'), ('prescriptivists', 'NNS'), ('maintain', 'VBP'), ('that', 'IN'), ('sentences', 'NNS'), ('in', 'IN'), ('English', 'NNP'), ('should', 'MD'), ('not', 'RB'), ('end', 'VB'), ('with', 'IN'), ('prepositions', 'NNS'), (',', ','), ('a', 'DT'), ('prohibition', 'NN'), ('that', 'WDT'), ('has', 'VBZ'), ('been', 'VBN'), ('traced', 'VBN'), ('to', 'TO'), ('John', 'NNP'), ('Dryden', 'NNP'), ('(', '('), ('13', 'CD'), ('April', 'NNP'), ('1668', 'CD'), ('–', 'NNP'), ('January', 'NNP'), ('1688', 'CD'), (')', ')'), ('whose', 'WP$'), ('unexplained', 'JJ'), ('objection', 'NN'), ('to', 'TO'), ('the', 'DT'), ('practice', 'NN'), ('perhaps', 'RB'), ('led', 'VBD'), ('other', 'JJ'), ('English', 'JJ'), ('speakers', 'NNS'), ('to', 'TO'), ('avoid', 'VB'), ('the', 'DT'), ('construction', 'NN'), ('and', 'CC'), ('discourage', 'VB'), ('its', 'PRP$'), ('use', 'NN'), ('.', '.'), ('Yet', 'CC'), ('preposition', 'NN'), ('stranding', 'NN'), ('has', 'VBZ'), ('a', 'DT'), ('long', 'JJ'), ('history', 'NN'), ('in', 'IN'), ('Germanic', 'NNP'), ('languages', 'NNS'), ('like', 'IN'), ('English', 'NNP'), (',', ','), ('where', 'WRB'), ('it', 'PRP'), ('is', 'VBZ'), ('so', 'RB'), ('widespread', 'JJ'), ('as', 'IN'), ('to', 'TO'), ('be', 'VB'), ('a', 'DT'), ('standard', 'JJ'), ('usage', 'NN'), ('.', '.'), ('Outside', 'IN'), ('linguistics', 'NNS'), (',', ','), ('the', 'DT'), ('term', 'NN'), ('grammar', 'NN'), ('is', 'VBZ'), ('often', 'RB'), ('used', 'VBN'), ('in', 'IN'), ('a', 'DT'), ('rather', 'RB'), ('different', 'JJ'), ('sense', 'NN'), ('.', '.'), ('It', 'PRP'), ('may', 'MD'), ('be', 'VB'), ('used', 'VBN'), ('more', 'RBR'), ('broadly', 'RB'), ('to', 'TO'), ('include', 'VB'), 
('conventions', 'NNS'), ('of', 'IN'), ('spelling', 'VBG'), ('and', 'CC'), ('punctuation', 'NN'), (',', ','), ('which', 'WDT'), ('linguists', 'VBZ'), ('would', 'MD'), ('not', 'RB'), ('typically', 'RB'), ('consider', 'VB'), ('as', 'IN'), ('part', 'NN'), ('of', 'IN'), ('grammar', 'NN'), ('but', 'CC'), ('rather', 'RB'), ('as', 'IN'), ('part', 'NN'), ('of', 'IN'), ('orthography', 'NN'), (',', ','), ('the', 'DT'), ('conventions', 'NNS'), ('used', 'VBD'), ('for', 'IN'), ('writing', 'VBG'), ('a', 'DT'), ('language', 'NN'), ('.', '.'), ('It', 'PRP'), ('may', 'MD'), ('also', 'RB'), ('be', 'VB'), ('used', 'VBN'), ('more', 'RBR'), ('narrowly', 'RB'), ('to', 'TO'), ('refer', 'VB'), ('to', 'TO'), ('a', 'DT'), ('set', 'NN'), ('of', 'IN'), ('prescriptive', 'JJ'), ('norms', 'NNS'), ('only', 'RB'), (',', ','), ('excluding', 'VBG'), ('those', 'DT'), ('aspects', 'NNS'), ('of', 'IN'), ('a', 'DT'), ('language', 'NN'), ("'s", 'POS'), ('grammar', 'NN'), ('which', 'WDT'), ('are', 'VBP'), ('not', 'RB'), ('subject', 'JJ'), ('to', 'TO'), ('variation', 'NN'), ('or', 'CC'), ('debate', 'NN'), ('on', 'IN'), ('their', 'PRP$'), ('normative', 'JJ'), ('acceptability', 'NN'), ('.', '.'), ('Jeremy', 'NNP'), ('Butterfield', 'NNP'), ('claimed', 'VBD'), ('that', 'IN'), (',', ','), ('for', 'IN'), ('non-linguists', 'NNS'), (',', ','), ('``', '``'), ('Grammar', 'NNP'), ('is', 'VBZ'), ('often', 'RB'), ('a', 'DT'), ('generic', 'JJ'), ('way', 'NN'), ('of', 'IN'), ('referring', 'VBG'), ('to', 'TO'), ('any', 'DT'), ('aspect', 'NN'), ('of', 'IN'), ('English', 'NNP'), ('that', 'IN'), ('people', 'NNS'), ('object', 'VBP'), ('to', 'TO'), ('.', '.'), ("''", "''")]
Now let's extract only the tokens tagged NN (singular common noun) or NNP (singular proper noun)
from the text.
    from nltk.tokenize import word_tokenize
    from nltk import pos_tag
    import wikipedia

    # Fetch the summary of the Wikipedia article on grammar
    grammar = wikipedia.summary('grammar')

    # Split the summary into tokens, then tag each token
    gra_token = word_tokenize(grammar)
    sample_pos = pos_tag(gra_token)

    # Keep only the words tagged as singular common or proper nouns
    all_noun = [word for word, pos in sample_pos if pos in ['NN', 'NNP']]

    print(all_noun)
Run the code to get the following result.
['grammar', 'Ancient', 'Greek', 'γραμματική', 'set', 'composition', 'language', 'term', 'study', 'field', 'phonology', 'morphology', 'syntax', 'language', 'variety', 'lect', 'set', 'grammar', 'majority', 'information', 'grammar', 'case', 'language', '–', 'study', 'instruction', 'work', 'childhood', 'language', 'life', 'instruction', 'grammar', 'information', 'language', 'use', 'term', 'grammar', 'behavior', 'group', 'example', 'term', 'grammar', 'whole', 'English', 'grammar', 'language', 'case', 'term', 'deal', 'variation', 'word', 'order', 'form', 'English', 'English', 'region', 'description', 'study', 'analysis', 'grammar', 'reference', 'book', 'grammar', 'language', 'reference', 'grammar', 'grammar', 'History', 'English', 'grammar', 'speech', 'variety', 'grammar', 'kind', 'description', 'prescription', 'attempt', 'sense', 'variety', 'example', 'English', 'prohibition', 'John', 'Dryden', 'April', '–', 'January', 'objection', 'practice', 'construction', 'use', 'preposition', 'stranding', 'history', 'Germanic', 'English', 'usage', 'term', 'grammar', 'sense', 'punctuation', 'part', 'grammar', 'part', 'orthography', 'language', 'set', 'language', 'grammar', 'variation', 'debate', 'acceptability', 'Jeremy', 'Butterfield', 'Grammar', 'way', 'aspect', 'English']
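Two things stand out in this list: plural nouns (NNS) are left out, and a dash tagged as NN slips in. The sketch below shows one possible refinement, assuming you want every noun tag and only alphabetic tokens; it runs on a small hand-picked sample of (word, tag) pairs rather than the live Wikipedia text, and the variable names are illustrative.

```python
from collections import Counter

# A small hand-picked sample of (word, tag) pairs in the same shape as
# pos_tag output; it stands in for the full Wikipedia result.
sample_pos = [
    ('grammar', 'NN'), ('is', 'VBZ'), ('the', 'DT'), ('set', 'NN'),
    ('of', 'IN'), ('structural', 'JJ'), ('rules', 'NNS'),
    ('language', 'NN'), ('–', 'NN'), ('acquired', 'VBD'),
    ('English', 'NNP'), ('grammar', 'NN'),
]

# Every Penn Treebank noun tag starts with "NN" (NN, NNS, NNP, NNPS).
# str.isalpha() drops punctuation that the tagger mislabels as a noun.
nouns = [word for word, tag in sample_pos if tag.startswith('NN') and word.isalpha()]

# Count how often each noun occurs, most frequent first.
noun_counts = Counter(nouns)
print(nouns)
print(noun_counts.most_common(3))
```

Note that isalpha() also rejects hyphenated words, so a real pipeline may need a gentler filter.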