spaCy NER Example



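1. In your virtual env, install spaCy. The code below uses the spaCy 2.x API (the 'en' shortcut and nlp.update with raw text), so stay below version 3:
   pip install --upgrade "spacy<3"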

2. Install Jupyter too, since the visualization is handy:
   python -m pip install jupyter

3. Download the default English model for spaCy:

   python -m spacy download en

4. Start Jupyter:

   jupyter notebook --no-browser --NotebookApp.token='' --ip='*'



5. Update the existing model's NER component. The default model does not detect Uber as an entity, so let's add it.
-----------------------------------------------------------------------------------------------Notebook--

import random

import spacy
from spacy import displacy

nlp = spacy.load('en')

# Entity spans are (start_char, end_char, label); the MONEY span starts at the '$' (index 18).
train_data = [("Uber blew through $1 million", {'entities': [(0, 4, 'ORG'), (18, 28, 'MONEY')]})]

# Show what the default model finds before any training.
for text, _ in train_data:
    doc = nlp(text)
    # displacy.serve would instead start a web server on port 5000.
    displacy.render(doc, style='ent', jupyter=True)

# Uber is not picked up, so use train_data to update the model.
# Disable every pipe except 'ner' so that only the NER component is updated.
other_pipes = [pipe for pipe in nlp.pipe_names if pipe != 'ner']
with nlp.disable_pipes(*other_pipes):
    # On spaCy 2.1+, nlp.resume_training() may be the better fit for a pretrained model.
    optimizer = nlp.begin_training()
    for i in range(10):
        random.shuffle(train_data)
        for text, annotations in train_data:
            nlp.update([text], [annotations], sgd=optimizer)

# Save the model
nlp.to_disk("/nlp/model5")
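
The entity offsets in train_data are character positions, and counting them by hand is error-prone: a span that starts on the space before the "$" no longer lines up with a token boundary. A minimal sketch of a helper that derives the offsets with str.find is below. The name entity_offsets is hypothetical and not part of spaCy, and it assumes each phrase appears in the text (it takes the first occurrence).

```python
def entity_offsets(text, labelled_phrases):
    """Return [(start, end, label), ...] for each (phrase, label) pair.

    Hypothetical helper, not a spaCy API. Uses the first occurrence of
    each phrase and raises ValueError if a phrase is missing.
    """
    spans = []
    for phrase, label in labelled_phrases:
        start = text.find(phrase)
        if start == -1:
            raise ValueError("phrase not found: %r" % phrase)
        spans.append((start, start + len(phrase), label))
    return spans


text = "Uber blew through $1 million"
entities = entity_offsets(text, [("Uber", "ORG"), ("$1 million", "MONEY")])

# Same shape as the train_data entries used for nlp.update.
train_data = [(text, {"entities": entities})]
print(entities)  # [(0, 4, 'ORG'), (18, 28, 'MONEY')]
```

Slicing the text with a span, for example text[18:28], is a quick way to confirm that an offset pair covers exactly the phrase you meant.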
-------------------------------------------------------------------------------------------Client 

import spacy
from spacy import displacy

# Load the updated model saved by the notebook above.
nlp = spacy.load('/nlp/model5')

text1 = ("Uber brought in $2.8 billion in revenue in the second quarter of 2018, but ultimately lost $891 million thanks "
         "to increased spending by the ride-hailing company, according to Bloomberg.")
text2 = "Ride-hailing company Uber is still on track to book more than $10 billion in revenue this year"

foo22 = nlp(text1)
displacy.render(foo22, style='ent', jupyter=True)
foo23 = nlp(text2)
displacy.render(foo23, style='ent', jupyter=True)


P.S.:
displacy.serve(doc, style='ent') starts a web server on port 5000, and the 'dep' visualizer takes a long time to render.
In a notebook, use displacy.render(foo22, style='ent', jupyter=True) instead.
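
If neither a notebook nor a web server is convenient, the entities can also be checked as plain text. The sketch below is only an illustration: mark_entities is a hypothetical helper, not part of spaCy or displaCy, and it assumes the spans do not overlap. The spans here are written by hand; with a loaded model they could be built from doc.ents.

```python
def mark_entities(text, spans):
    """Wrap each (start, end, label) span in [text](LABEL) markers.

    Hypothetical helper, not part of spaCy or displaCy. Assumes the
    spans do not overlap.
    """
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])  # untouched text before the span
        pieces.append("[%s](%s)" % (text[start:end], label))
        cursor = end
    pieces.append(text[cursor:])  # remaining text after the last span
    return "".join(pieces)


# With a loaded model, the spans could come from:
#   spans = [(e.start_char, e.end_char, e.label_) for e in doc.ents]
sample = "Uber blew through $1 million"
marked = mark_entities(sample, [(0, 4, "ORG"), (18, 28, "MONEY")])
print(marked)  # [Uber](ORG) blew through [$1 million](MONEY)
```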
