I ran a simple test on a clinical text and was surprised that the base model extracted the target information while the large model returned nothing. I would expect the larger model to produce results at least as good as the base model's. Can someone explain why this happens, or tell me if I did something wrong? Here is what I ran:
from gliner2 import GLiNER2

# Base model: same text and schema as the large-model run below.
result = GLiNER2.from_pretrained("fastino/gliner2-base-v1").extract_json(
    "Subject has an ECOG performance status score of 0, 1 or 2.",
    {
        # Each field is written as "name::type::description".
        "functional_performance_cognitive_criterion": [
            "performance_scale::list::ECOG, Karnofsky, NYHA performance scales",
            "performance_score::list::Performance score values or percentages",
            "adl_status::list::Activities of daily living status",
            "mobility_status::list::Mobility status",
            "frailty_status::list::Frailty status",
            "cognitive_status::list::Cognitive status",
            "capacity_status::list::Decision making capacity status"
        ]
    }
)
print(result)
# {'functional_performance_cognitive_criterion': [{'performance_scale': ['ECOG'], 'performance_score': ['0'], 'adl_status': [], 'mobility_status': [], 'frailty_status': [], 'cognitive_status': [], 'capacity_status': []}, {'performance_scale': ['ECOG'], 'performance_score': ['1'], 'adl_status': [], 'mobility_status': [], 'frailty_status': [], 'cognitive_status': [], 'capacity_status': []}, {'performance_scale': ['ECOG'], 'performance_score': ['2'], 'adl_status': [], 'mobility_status': [], 'frailty_status': [], 'cognitive_status': [], 'capacity_status': []}]}
result = GLiNER2.from_pretrained("fastino/gliner2-large-v1").extract_json(
    "Subject has an ECOG performance status score of 0, 1 or 2.",
    {
        "functional_performance_cognitive_criterion": [
            "performance_scale::list::ECOG, Karnofsky, NYHA performance scales",
            "performance_score::list::Performance score values or percentages",
            "adl_status::list::Activities of daily living status",
            "mobility_status::list::Mobility status",
            "frailty_status::list::Frailty status",
            "cognitive_status::list::Cognitive status",
            "capacity_status::list::Decision making capacity status"
        ]
    }
)
print(result)
# {'functional_performance_cognitive_criterion': {}}
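To make the gap easier to see at a glance, the two outputs can be compared by counting how many fields each model actually filled. This is only a rough sketch: `count_filled_fields` is a hypothetical helper of mine, not part of GLiNER2, and it assumes the result shapes printed above (a list of dicts when something is found, an empty dict otherwise).

```python
def count_filled_fields(result, key):
    """Count non-empty field values under result[key].

    Assumes result[key] is either a list of dicts (one per extracted
    instance) or an empty dict/list, as in the two outputs above.
    """
    instances = result.get(key) or []
    if isinstance(instances, dict):
        # Treat a single non-empty dict as one instance.
        instances = [instances]
    return sum(1 for inst in instances for values in inst.values() if values)


key = "functional_performance_cognitive_criterion"

# Outputs copied from the two runs above (empty fields omitted for brevity).
base_result = {key: [
    {"performance_scale": ["ECOG"], "performance_score": ["0"]},
    {"performance_scale": ["ECOG"], "performance_score": ["1"]},
    {"performance_scale": ["ECOG"], "performance_score": ["2"]},
]}
large_result = {key: {}}

print("base:", count_filled_fields(base_result, key))    # base: 6
print("large:", count_filled_fields(large_result, key))  # large: 0
```

The same helper could be applied to other sentences to check whether the large model comes back empty consistently or only on this input.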