
## Issue: Semantic entity extraction with descriptions not working #100

Description

@iamgrootns

I'm trying to use GLiNER2 for semantic entity extraction in Hindi: I define entity types with textual descriptions and expect the model to extract entities based on the meaning of those descriptions, not just surface pattern matching.

Expected Behavior

When I provide entity types with descriptions like:

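entities = {
    # "Weapon - gun, knife, or anything used to harm someone"
    'WEAPON': 'हथियार - बंदूक, चाकू, या जो किसी को नुकसान पहुँचाने के लिए उपयोग होता है',
    # "Intoxicating substance - medicine, powder, or any substance used for intoxication"
    'DRUG': 'नशीला पदार्थ - दवा, पाउडर या कोई भी पदार्थ जो नशा करने के लिए उपयोग होता है'
}
model.extract_entities(text, entities, threshold=0.3)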

The model should:

  • Extract बंदूक ("gun") when the entity type is WEAPON, based on the meaning of the description
  • NOT extract बंदूक for DRUG or any other category
  • Handle new entity types it has never seen in training
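To make this expectation checkable, here is a minimal sketch that compares a prediction against the expected result and lists every mismatch. It assumes the prediction can be reduced to a plain dict mapping each label to a list of extracted strings; `check_expected` is a hypothetical helper name, not part of GLiNER2, and the actual return shape of `extract_entities` may need unwrapping first.

```python
def check_expected(pred, expected):
    """Return human-readable mismatches between pred and expected.

    Both arguments are assumed to map label -> list of extracted strings.
    """
    problems = []
    for label, wanted in expected.items():
        got = set(pred.get(label, []))
        # Spans we wanted under this label but did not get
        for span in sorted(set(wanted) - got):
            problems.append(f"{label}: missing {span!r}")
        # Spans that showed up under this label but should not have
        for span in sorted(got - set(wanted)):
            problems.append(f"{label}: unexpected {span!r}")
    return problems


if __name__ == "__main__":
    expected = {"WEAPON": ["बंदूक"], "DRUG": []}
    observed = {"WEAPON": ["बंदूक"], "DRUG": ["बंदूक"]}
    for line in check_expected(observed, expected):
        print(line)
```

An empty list means the prediction matches the expectation exactly.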

Actual Behavior

Currently:

  • The same entity is extracted under ALL categories at once
  • Descriptions appear to be ignored
  • The model appears to do pattern matching only, with no semantic understanding of the descriptions
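The first symptom can be detected mechanically. The sketch below lists every extracted string that appears under more than one label. It assumes the prediction is (or has been reduced to) a dict of label -> list of strings; `spans_in_multiple_labels` is a hypothetical helper, not a GLiNER2 function.

```python
from collections import defaultdict


def spans_in_multiple_labels(pred):
    """Map each extracted string to its labels, keeping only strings with 2+ labels."""
    labels_by_span = defaultdict(set)
    for label, spans in pred.items():
        for span in spans:
            labels_by_span[span].add(label)
    return {
        span: sorted(labels)
        for span, labels in labels_by_span.items()
        if len(labels) > 1
    }


if __name__ == "__main__":
    observed = {"WEAPON": ["बंदूक"], "DRUG": ["बंदूक"]}
    print(spans_in_multiple_labels(observed))
```

If this returns a non-empty dict for labels that should be mutually exclusive, the model is not separating the categories.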

Test Code

from gliner2 import GLiNER2

# Checkpoint fine-tuned on Hindi data
model = GLiNER2.from_pretrained('./hindi_poc/checkpoint-epoch-2')

# Entity types mapped to Hindi descriptions
entities = {
    # "Weapon - gun, knife, or anything used to harm someone"
    'WEAPON': 'हथियार - बंदूक, चाकू, या जो किसी को नुकसान पहुँचाने के लिए उपयोग होता है',
    # "Intoxicating substance - medicine, powder"
    'DRUG': 'नशीला पदार्थ - दवा, पाउडर'
}

# "The man had a gun and a knife"
text = "आदमी के पास एक बंदूक और चाकू था"
pred = model.extract_entities(text, entities, threshold=0.3)
# Expected: WEAPON=['बंदूक', 'चाकू'], DRUG=[]
# Actual:   WEAPON=['बंदूक'], DRUG=['बंदूक']  (बंदूक is wrongly assigned to DRUG)
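One way to probe whether the descriptions influence the output at all is to run the same text twice, once with the description dict and once with the bare label names, and compare. The sketch below takes the extraction function as a parameter so it does not depend on any particular API. Passing a plain list of labels to `model.extract_entities` is an assumption here, and `descriptions_affect_output` is a hypothetical helper.

```python
def descriptions_affect_output(extract, text, entities):
    """Run extraction with and without descriptions.

    `extract` is any callable taking (text, entity_spec).
    Returns (changed, with_descriptions, labels_only).
    """
    with_descriptions = extract(text, entities)
    # list(dict) yields the keys, i.e. the label names without descriptions
    labels_only = extract(text, list(entities))
    return with_descriptions != labels_only, with_descriptions, labels_only


# Possible usage with the model above (assumes a list of labels is accepted):
# changed, a, b = descriptions_affect_output(
#     lambda t, e: model.extract_entities(t, e, threshold=0.3), text, entities
# )
```

If `changed` is False across several inputs, that would be consistent with the descriptions being ignored, though it does not prove it; identical outputs could also mean the label names alone already determine the result.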

Environment

  • GLiNER2 version: from git repository
  • Python: 3.10
  • Model: deberta-v3-base fine-tuned on Hindi IndicNER

Questions

  1. Does GLiNER2 support semantic extraction with entity descriptions?
  2. Are descriptions supposed to guide the model to the correct entity type?
  3. Is this a bug or intended behavior?
