NER (Named Entity Recognition)
What is NER?
Named Entity Recognition (NER) is a natural language processing (NLP) technique that identifies and classifies named entities in text into predefined categories such as person names, organizations, locations, dates, monetary values, and more. It is a core step in turning free-form text into structured data that programs can work with.
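NER extracts named entities from unstructured text, turning words into structured facts that software can query. Context matters: a good model tags “Apple” in “Apple released a new iPhone” as an organization but leaves “apple” in “I ate an apple” untagged, because the fruit is a common noun rather than a named entity. This makes NER a common building block for knowledge graphs and entity-aware search.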
NER Categories
| Category | Examples |
|---|---|
| PERSON | “Elon Musk”, “Sundar Pichai” |
| ORG | “Google”, “Reliance Industries” |
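| GPE / LOC | “Mumbai” (GPE), “Silicon Valley” (LOC) |
| DATE | “January 15, 2024”, “Q3 2024” |
| MONEY | “$1.2 billion”, “₹50,000 crore” |
| PRODUCT | “iPhone 15”, “ChatGPT” |

Label names depend on the model. spaCy’s English pipelines use the OntoNotes scheme, which tags countries, cities, and states as GPE and other locations (regions, mountain ranges, bodies of water) as LOC. Models trained on CoNLL-2003, such as dslim/bert-base-NER, use only four entity types: PER, ORG, LOC, and MISC.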
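Whatever the label set, model output usually comes down to records of entity text, label, and character offsets (spaCy exposes offsets as `start_char`/`end_char` on each entity span, and the Hugging Face pipeline returns `start`/`end`). The sketch below shows one way to hold such records and group them by category. The `Entity` class and `group_by_label` helper are hypothetical, and the entities are written by hand to stand in for real model output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the source text
    label: str  # category such as ORG, PERSON, MONEY
    start: int  # character offset where the entity begins
    end: int    # character offset just past the entity's last character


def group_by_label(source: str, entities: list[Entity]) -> dict[str, list[str]]:
    """Group entity strings by category, checking that offsets match the text."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for ent in entities:
        if source[ent.start:ent.end] != ent.text:
            raise ValueError(f"offsets {ent.start}:{ent.end} do not match {ent.text!r}")
        grouped[ent.label].append(ent.text)
    return dict(grouped)


text = "Reliance Industries invested ₹50,000 crore in Mumbai in Q3 2024."
# Hand-written records standing in for model output (spaCy-style labels)
entities = [
    Entity("Reliance Industries", "ORG", 0, 19),
    Entity("₹50,000 crore", "MONEY", 29, 42),
    Entity("Mumbai", "GPE", 46, 52),
    Entity("Q3 2024", "DATE", 56, 63),
]
print(group_by_label(text, entities))
```

Keeping offsets alongside the text makes it possible to highlight entities in the original document or to tell apart two mentions of the same name.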
NER Implementation
1. Using spaCy (Production-Ready)
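```python
import spacy

# Requires the model to be installed first: python -m spacy download en_core_web_lg
nlp = spacy.load("en_core_web_lg")

text = ("Google CEO Sundar Pichai announced "
        "a $1 billion investment in India.")
doc = nlp(text)

# Each entity is a Span with its text and predicted label
for ent in doc.ents:
    print(f"{ent.text:20} → {ent.label_}")

# Typical output (exact entities can vary by model version):
# Google               → ORG
# Sundar Pichai        → PERSON
# $1 billion           → MONEY
# India                → GPE
```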
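Because spaCy follows OntoNotes while CoNLL-2003 models such as dslim/bert-base-NER use PER, ORG, LOC, and MISC, combining results from both kinds of model means mapping them onto one label set first. A minimal sketch, assuming entities have already been reduced to `(text, label)` pairs. The `SPACY_TO_COMMON` and `CONLL_TO_COMMON` tables and the `normalize`/`merge` helpers are hypothetical, and folding GPE and FAC into LOC is one judgment call among several.

```python
# Hypothetical mappings onto a shared label set. Folding GPE and FAC into LOC
# is a design choice; labels missing from a mapping (e.g. CARDINAL, MISC) are dropped.
SPACY_TO_COMMON = {
    "PERSON": "PERSON", "ORG": "ORG", "GPE": "LOC", "LOC": "LOC", "FAC": "LOC",
    "DATE": "DATE", "MONEY": "MONEY", "PRODUCT": "PRODUCT",
}
CONLL_TO_COMMON = {"PER": "PERSON", "ORG": "ORG", "LOC": "LOC"}


def normalize(entities, mapping):
    """Rewrite (text, label) pairs into the shared label set."""
    return [(text, mapping[label]) for text, label in entities if label in mapping]


def merge(*entity_lists):
    """Combine normalized lists, keeping first-seen order and dropping duplicates."""
    merged = {}
    for entities in entity_lists:
        for entity in entities:
            merged.setdefault(entity, None)
    return list(merged)


# Hand-written stand-ins for each model's output, not real predictions
spacy_entities = [("Google", "ORG"), ("Sundar Pichai", "PERSON"),
                  ("$1 billion", "MONEY"), ("India", "GPE")]
conll_entities = [("Google", "ORG"), ("Sundar Pichai", "PER"), ("India", "LOC")]

combined = merge(normalize(spacy_entities, SPACY_TO_COMMON),
                 normalize(conll_entities, CONLL_TO_COMMON))
print(combined)
```

Exact-string matching is deliberately naive here: “Google” and “Google Inc.” would remain separate entries until some form of entity resolution is added.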
2. Using Hugging Face Transformers (Transformer-Based)
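```python
from transformers import pipeline

# aggregation_strategy="simple" merges word pieces (e.g. "Cup", "##ert", "##ino")
# and B-/I- tags into whole entities labeled under "entity_group"
ner = pipeline("ner", model="dslim/bert-base-NER", aggregation_strategy="simple")

text = "Apple Inc. is headquartered in Cupertino, California."
entities = ner(text)

for e in entities:
    print(f"{e['word']:15} → {e['entity_group']} ({e['score']:.2f})")
```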
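Without an aggregation setting, the "ner" pipeline reports one prediction per sub-word token with BIO-style tags: B- opens an entity, I- continues it, and O marks tokens outside any entity. The sketch below does that merging by hand, roughly the job the pipeline's `aggregation_strategy` option performs. `bio_to_spans` is a hypothetical helper, it assumes the full token sequence including O tags, and the offsets and tags are hand-written rather than real model output. Slicing the original text by character offsets avoids having to glue WordPiece fragments such as "##ert" back together.

```python
def bio_to_spans(text, tagged_tokens):
    """Merge BIO-tagged tokens into (entity_text, label) pairs.

    tagged_tokens: (start, end, tag) triples, where start/end are character
    offsets into `text` and tag is e.g. "B-ORG", "I-ORG", or "O".
    """
    spans = []      # finished (start, end, label) triples
    current = None  # entity being built, as [start, end, label]
    for start, end, tag in tagged_tokens:
        if tag == "O":
            if current:
                spans.append(tuple(current))
                current = None
            continue
        prefix, label = tag.split("-", 1)
        if prefix == "I" and current and current[2] == label:
            current[1] = end  # extend the open entity
        else:
            # B- tag, or an I- tag that cannot continue the open entity
            if current:
                spans.append(tuple(current))
            current = [start, end, label]
    if current:
        spans.append(tuple(current))
    # Slice the source text so punctuation and spacing stay exact
    return [(text[s:e], label) for s, e, label in spans]


text = "Apple Inc. is headquartered in Cupertino, California."
# Hand-written token offsets and tags standing in for model output
tagged = [
    (0, 5, "B-ORG"), (6, 9, "I-ORG"), (9, 10, "I-ORG"), (11, 13, "O"),
    (14, 27, "O"), (28, 30, "O"), (31, 40, "B-LOC"), (40, 41, "O"),
    (42, 52, "B-LOC"), (52, 53, "O"),
]
print(bio_to_spans(text, tagged))
```

Treating a stray I- tag as the start of a new entity is a lenient choice; a stricter decoder could discard it instead.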
Custom NER for Business Data
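```python
# Fine-tuned NER for product mentions
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Placeholder name: point this at your own fine-tuned checkpoint
model_name = "your-org/product-ner-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# The pipeline tokenizes, runs the model, and merges word pieces
# back into whole entity spans
product_ner = pipeline("ner", model=model, tokenizer=tokenizer,
                       aggregation_strategy="simple")

# Extract product mentions from reviews as (text, label, score) tuples
def extract_products(review_text):
    return [(e["word"], e["entity_group"], float(e["score"]))
            for e in product_ner(review_text)]
```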
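Fine-tuning a checkpoint like the placeholder above needs labeled reviews. Annotation tools commonly store entities as character spans, while token-classification models learn one label per token, so the spans have to be converted. A minimal sketch, assuming a simple regex tokenizer and a label named PRODUCT; `spans_to_bio` is a hypothetical helper, and a real training pipeline would instead align labels to the model's own sub-word tokens (for example using a fast tokenizer's offset mapping).

```python
import re


def spans_to_bio(text, spans):
    """Convert character-span annotations into per-token BIO labels.

    spans: list of (start, end, label) character offsets into `text`.
    Tokens come from a simple regex (words or single punctuation marks),
    which is an assumption standing in for the model's real tokenizer.
    """
    tokens, labels = [], []
    opened = set()  # indices of spans whose first token has been labeled
    for match in re.finditer(r"\w+|[^\w\s]", text):
        tok_start, tok_end = match.span()
        tag = "O"
        for i, (start, end, label) in enumerate(spans):
            if start <= tok_start and tok_end <= end:
                # First token inside a span gets B-, the rest get I-
                tag = ("I-" if i in opened else "B-") + label
                opened.add(i)
                break
        tokens.append(match.group())
        labels.append(tag)
    return tokens, labels


review = "The iPhone 15 battery lasts longer than my Galaxy S23."
# Hand-written annotations; PRODUCT is an assumed label name
spans = [(review.index(p), review.index(p) + len(p), "PRODUCT")
         for p in ("iPhone 15", "Galaxy S23")]
tokens, labels = spans_to_bio(review, spans)
for token, label in zip(tokens, labels):
    print(f"{token:10} {label}")
```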
Use cases for scraped data:
- Extract company names from news articles
- Identify locations for geographic analysis
- Find product mentions in social media
- Build databases of people and companies (mapping the connections between them takes an extra step, such as relation extraction)
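A simple starting point for linking people and companies is to treat entities mentioned in the same article as connected and count how often each pair co-occurs. The sketch assumes each scraped article has already been reduced to `(text, label)` pairs; `cooccurrence_edges` is a hypothetical helper and the article data is hand-written. Co-occurrence only hints that a relationship exists, it does not say what the relationship is.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs_entities, labels=("PERSON", "ORG")):
    """Count how often two entities appear in the same document.

    docs_entities: one list of (text, label) pairs per document.
    Returns a Counter keyed by alphabetically sorted entity-name pairs.
    """
    edges = Counter()
    for entities in docs_entities:
        # De-duplicate within a document so repeated mentions count once
        names = sorted({text for text, label in entities if label in labels})
        for pair in combinations(names, 2):
            edges[pair] += 1
    return edges


# Hand-written stand-ins for entities extracted from three articles
articles = [
    [("Sundar Pichai", "PERSON"), ("Google", "ORG"), ("India", "GPE")],
    [("Google", "ORG"), ("Sundar Pichai", "PERSON"), ("Google", "ORG")],
    [("Reliance Industries", "ORG"), ("Mumbai", "GPE")],
]
for (a, b), count in cooccurrence_edges(articles).most_common():
    print(f"{a} -- {b}: {count}")
```

The resulting counts can serve as edge weights when loading entities into a graph database.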