I'm trying to use GLiNER2 for semantic entity extraction in Hindi, where I define entity types with textual descriptions and the model extracts entities based on the semantic meaning of those descriptions - not just pattern matching.
Expected Behavior
When I provide entity types with descriptions like:
entities = {
'WEAPON': 'हथियार - बंदूक, चाकू, या जो किसी को नुकसान पहुँचाने के लिए उपयोग होता है',
'DRUG': 'नशीला पदार्थ - दवा, पाउडर या कोई भी पदार्थ जो नशा करने के लिए उपयोग होता है'
}
model.extract_entities(text, entities, threshold=0.3)
The model should:
- Extract बंदूक when the entity type is WEAPON (based on description semantics)
- NOT extract बंदूक for DRUG or other categories
- Understand new entity types it's never seen in training
Actual Behavior
Currently:
- The same entity is extracted under ALL categories simultaneously
- The descriptions appear to be ignored
- The model seems to do surface pattern matching rather than use the meaning of the descriptions
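One way to check the "descriptions appear to be ignored" suspicion is to swap the descriptions of two labels and see whether the output changes at all. The sketch below is only a diagnostic idea, not part of GLiNER2: `swap_descriptions` and `descriptions_have_effect` are hypothetical helper names, and `extract` is any callable you supply that wraps the model call and returns something comparable with `==`.

```python
from typing import Any, Callable, Dict

Schema = Dict[str, str]  # label -> textual description


def swap_descriptions(schema: Schema, a: str, b: str) -> Schema:
    """Return a copy of the schema with the descriptions of labels a and b exchanged."""
    swapped = dict(schema)
    swapped[a], swapped[b] = schema[b], schema[a]
    return swapped


def descriptions_have_effect(
    extract: Callable[[str, Schema], Any],
    text: str,
    schema: Schema,
    a: str,
    b: str,
) -> bool:
    """Return True if exchanging the two descriptions changes the prediction.

    `extract` is a hypothetical wrapper supplied by the caller; it is assumed
    to be deterministic, otherwise a difference may just be noise.
    """
    baseline = extract(text, schema)
    swapped = extract(text, swap_descriptions(schema, a, b))
    return baseline != swapped
```

With the model from this report, `extract` could be `lambda text, schema: model.extract_entities(text, schema, threshold=0.3)`. If the function returns False for WEAPON and DRUG, the predictions do not depend on the description text for this input, which would support the observation above.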
Test Code
from gliner2 import GLiNER2
model = GLiNER2.from_pretrained('./hindi_poc/checkpoint-epoch-2') # trained on Hindi
entities = {
'WEAPON': 'हथियार - बंदूक, चाकू, या जो किसी को नुकसान पहुँचाने के लिए उपयोग होता है',
'DRUG': 'नशीला पदार्थ - दवा, पाउडर'
}
text = "आदमी के पास एक बंदूक और चाकू था"
pred = model.extract_entities(text, entities, threshold=0.3)
# Expected: WEAPON=['बंदूक', 'चाकू'], DRUG=[]
# Actual:   WEAPON=['बंदूक'], DRUG=['बंदूक']  (बंदूक should not appear under DRUG)
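To make the failure easy to spot across many test sentences, the duplicated spans can be detected automatically. This is a minimal sketch that assumes the predictions have already been flattened into a plain label-to-list-of-strings mapping; the actual return value of `extract_entities` may be nested differently, and `spans_under_multiple_labels` is a hypothetical helper name, not a GLiNER2 function.

```python
from collections import defaultdict
from typing import Dict, List


def spans_under_multiple_labels(pred: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each span text to the sorted labels it appears under, keeping only spans with 2+ labels."""
    labels_by_span = defaultdict(set)
    for label, spans in pred.items():
        for span in spans:
            labels_by_span[span].add(label)
    return {
        span: sorted(labels)
        for span, labels in labels_by_span.items()
        if len(labels) > 1
    }


# The "Actual" output reported above, written as a plain dict (assumed shape).
actual = {'WEAPON': ['बंदूक'], 'DRUG': ['बंदूक']}
print(spans_under_multiple_labels(actual))  # {'बंदूक': ['DRUG', 'WEAPON']}
```

An empty result would mean no span was assigned to more than one category, which is the expected behavior for this input.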
Environment
- GLiNER2 version: from git repository
- Python: 3.10
- Model: deberta-v3-base fine-tuned on Hindi IndicNER
Questions
- Does GLiNER2 support semantic extraction with entity descriptions?
- Are descriptions supposed to guide the model to the correct entity type?
- Is this a bug or intended behavior?