Emit {0,n} for at_most so the regex compiles on the outlines-core backend #1871

Open
lollinng wants to merge 2 commits into dottxt-ai:main from lollinng:fix/at-most-regex-quantifier

lollinng commented Jun 5, 2026

Problem

Term.at_most(n) / at_most(n, term) (the QuantifyMaximum DSL term) compiles to a regex of the form (...){,n}:

return f"({to_regex(term.term)}){{,{term.max_count}}}"   # -> "(a){,5}"

Python's re accepts the {,n} form, in which the lower bound is omitted, but the default generation backend, outlines-core (Rust regex / regex-automata), does not: it only supports {n}, {n,}, and {n,m}. As a result, any structured generation that uses an at_most quantifier raises when the Index/Guide is built:

>>> from outlines_core import Index, Vocabulary
>>> v = Vocabulary.from_pretrained("gpt2")
>>> Index("(a){,3}", v)
ValueError: Failed to build DFA error building NFA ...
>>> Index("(a){0,3}", v)   # OK

The whole at_most quantifier is effectively unusable with the default backend.
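
To check the re half of this claim without installing outlines-core, here is a minimal standard-library sketch. The sample strings and the helper name `accepted` are illustrative assumptions, not part of the PR. It suggests that re treats both spellings as the same language, so the mismatch is confined to the backend's parser.

```python
import re

# The two spellings of "at most three repetitions of a".
OLD = "(a){,3}"   # lower bound omitted (what QuantifyMaximum used to emit)
NEW = "(a){0,3}"  # explicit lower bound (what this PR emits)

# Illustrative inputs: zero to four "a"s, plus one non-matching string.
samples = ["", "a", "aa", "aaa", "aaaa", "b"]


def accepted(pattern, strings):
    """Return the strings that the pattern matches in full."""
    return [s for s in strings if re.fullmatch(pattern, s) is not None]


old_matches = accepted(OLD, samples)
new_matches = accepted(NEW, samples)

# Both patterns accept the same subset of the samples under Python's re.
print(old_matches == new_matches)
```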

Fix

Emit the equivalent (...){0,n}. Semantically identical (0 to n repetitions), but valid for both re and outlines-core. A unit test asserted the old (a){,5} output; updated to (a){0,5}.
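
For readers who want to see the shape of the change in isolation, here is a hedged sketch. `Literal`, `QuantifyMaximum`, and `to_regex` below are simplified, hypothetical stand-ins written for this example; they are not the real Outlines classes, and only the f-string mirrors the line quoted in the Problem section.

```python
import re
from dataclasses import dataclass


@dataclass
class Literal:
    """Hypothetical stand-in for a DSL term holding a literal string."""
    value: str


@dataclass
class QuantifyMaximum:
    """Hypothetical stand-in for the at_most(n) DSL term."""
    term: Literal
    max_count: int


def to_regex(term):
    """Simplified compiler covering only the two stand-in terms above."""
    if isinstance(term, Literal):
        return re.escape(term.value)
    if isinstance(term, QuantifyMaximum):
        # Emit an explicit lower bound of 0 instead of leaving it empty.
        return f"({to_regex(term.term)}){{0,{term.max_count}}}"
    raise TypeError(f"unsupported term: {term!r}")


pattern = to_regex(QuantifyMaximum(Literal("a"), 5))
print(pattern)
```

The only difference from the old output is the `0` before the comma, which is why the change is limited to one line plus the test expectations.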

Verification

to_regex(at_most 5)            -> "(a){0,5}"
outlines_core.Index("(a){0,5}") -> builds OK   (previously "(a){,5}" failed)
pytest tests/types/test_dsl.py -k "to_regex or at_most" -> 2 passed

QuantifyMaximum (Term.at_most / at_most) compiled to (...){,n}. Python's re
accepts the lower-bound-omitted form, but the default outlines-core backend
(Rust regex-automata) does not, so any structured generation using at_most
raised 'Failed to build DFA' when the Index was built. Emit the equivalent
(...){0,n}. Updates the test that asserted the old output.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@RobinPicard RobinPicard left a comment

Contributor

Good catch, thanks!

@RobinPicard

Contributor

A test in test_to_regex.py is actually failing. Could you fix it, please?

The QuantifyMaximum assertion in test_to_regex.py still expected the
old (a){,2} form; align it with the (a){0,2} output.
@lollinng

Author

Fixed in 802114b. The stale assertion was in tests/types/test_to_regex.py::test_to_regex_simple (line 76), which still expected the old (a){,2} form. I updated it to (a){0,2} to match the new output. (The first commit only updated the equivalent assertion in test_dsl.py.)

- assert to_regex(a) == "(a){,2}"
+ assert to_regex(a) == "(a){0,2}"

The other CI failures (ollama multimodal 400s) are unrelated to this change.
