[Bug]: thinking_token_budget not enforced on re-entry after natural </think> #45974

### Describe the bug

When a model naturally ends a thinking block (emits `</think>` on its own before exhausting the budget), the `ThinkingBudgetStateHolder` state machine fails to track subsequent thinking blocks. A second `<think>` block in the same completion is never recognized as "in think" mode, so the budget is never enforced on it.

This means a model that naturally ends one thinking block early can then open a new `<think>` block and reason indefinitely with no budget enforcement.

### Root Cause

After a natural `</think>`, `start_thinking` and `end_thinking` retain their values from Block 1 and are never reset. When Block 2's `<think>` appears, `_find_last_sequence_index` searches from position 0, because `scan_offset` is not set for natural ends, and finds the *original* Block 1 `<think>`. In the reproduction below that gives `start_thinking = 0` and `end_thinking = 7`; since `start_thinking < end_thinking`, the code takes the "exiting think mode" branch, so Block 2 is never recognized as entering think mode.

Compare with forced-end re-entry (fixed in #43757): `scan_offset` advances past Block 1 after the forced close completes (line 461), so the re-entry `<think>` is correctly detected as new.
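
To make the comparison concrete, here is a minimal standalone sketch of the index check described above. It is not vLLM's code: `last_index` and `is_in_think` are hypothetical helpers written for this issue, and they only model the idea that think mode should be decided from the *most recent* start and end markers rather than from stale Block 1 values.

```python
def last_index(tokens: list[int], seq: list[int]) -> int:
    """Return the start index of the last occurrence of seq in tokens, or -1."""
    n = len(seq)
    for i in range(len(tokens) - n, -1, -1):
        if tokens[i:i + n] == seq:
            return i
    return -1


def is_in_think(tokens: list[int], start_ids: list[int], end_ids: list[int]) -> bool:
    """Hypothetical check: in think mode iff the latest start follows the latest end."""
    start = last_index(tokens, start_ids)
    end = last_index(tokens, end_ids)
    return start != -1 and start > end


THINK_START, THINK_END = [100], [200]

# Block 1 (start at index 0, natural end at index 7), then 3 content tokens.
after_block1 = [100] + [60] * 6 + [200] + [50] * 3
# Block 2 re-entry: a new start marker followed by one think token.
after_reentry = after_block1 + [100, 60]

# Stale Block 1 values reproduce the reported comparison: 0 < 7 means "not thinking".
stale_start, stale_end = 0, 7
stale_verdict = stale_start > stale_end

# Recomputing from the full token list sees the Block 2 start at index 11.
fresh_verdict = is_in_think(after_reentry, THINK_START, THINK_END)
```

With the stale indices the verdict is "not in think mode", which matches the reported behavior; recomputing both indices over the whole output detects the re-entry. Whether the actual fix should rescan, reset the stored indices, or advance `scan_offset` on natural ends is a design choice for the maintainers.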

### Reproduction

```python
import torch
from dataclasses import dataclass
from unittest.mock import MagicMock
from vllm.v1.sample.thinking_budget_state import ThinkingBudgetStateHolder

THINK_START = 100
THINK_END = [200]
BUDGET = 10

@dataclass
class FakeReasoningConfig:
    reasoning_start_token_ids: list
    reasoning_end_token_ids: list
    enabled: bool = True

cfg = FakeReasoningConfig(
    reasoning_start_token_ids=[THINK_START],
    reasoning_end_token_ids=THINK_END,
)
holder = ThinkingBudgetStateHolder(
    reasoning_config=cfg, max_num_seqs=8,
    num_spec_tokens=0, device=torch.device("cpu"), is_pin_memory=False,
)

params = MagicMock()
params.thinking_token_budget = BUDGET
batch_update = MagicMock(removed=[], added=[(0, params, None, [])], moved=[])
holder.sync_batch(batch_update)

output = []

# Block 1: 6 tokens + natural </think>
output.append(THINK_START)
holder.update_state([list(output)], None, None)
for _ in range(6):
    output.append(60)  # think token
    holder.update_state([list(output)], None, None)
output.append(THINK_END[0])  # natural end
holder.update_state([list(output)], None, None)

# Content
for _ in range(3):
    output.append(50)
    holder.update_state([list(output)], None, None)

# Block 2: re-entry with 14 think tokens, which exceeds BUDGET under either policy
output.append(THINK_START)
holder.update_state([list(output)], None, None)
for _ in range(14):
    output.append(60)  # think token
    holder.update_state([list(output)], None, None)

# On affected versions this assertion fails: in_end stays falsy for Block 2
state = holder._state[0]
assert state["in_end"], (
    f"Block 2 should be budget-enforced after 14 tokens (budget={BUDGET}), "
    f"but in_end={state['in_end']}"
)
```

### Expected behavior

Block 2 should be budget-enforced. Either:
- **Cumulative:** the 6 tokens from Block 1 count toward the total budget of 10, so Block 2 is cut off after 4 tokens
- **Per-block reset:** Block 2 gets a fresh budget of 10 and is cut off after 10 tokens

Either policy is acceptable; what's not acceptable is zero enforcement.

### Actual behavior

Block 2 never enters `in_think=True`, so the budget countdown never starts. The model can reason indefinitely in the second block.

### Environment
- vLLM: `vllm/vllm-openai:latest` (also confirmed on current `main`)
- Affects all models using `thinking_token_budget` that can produce multiple think blocks in one completion

### Related
- #43708 — forced-end re-entry (fix in PR #43757, scoped to `scan_offset > 0` case)
- #34668 — original thinking budget implementation
