146 changes: 146 additions & 0 deletions humanlm/README.md
@@ -0,0 +1,146 @@
# Recipe: HumanLM

Train user simulators by aligning on psychological state dimensions (belief, emotion, stance, value, goal, communication) instead of imitating response text.

**Paper:** [HUMANLM: Simulating Users with State Alignment Beats Response Imitation](https://humanlm.stanford.edu/HumanLM_paper.pdf)

**Project Page:** [https://humanlm.stanford.edu/](https://humanlm.stanford.edu/)

## Environment Setup

### Install Dependencies

```bash
# Install verl with trainer support
pip install -e /path/to/verl # verl repo with trainer module

# Additional dependencies
pip install litellm datasets polars
```

### Configure API Keys
```bash
# Required for LLM-as-judge rewards (RL training)
export ANTHROPIC_API_KEY="your-key"
export OPENAI_API_KEY="your-key"
```

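If either key is missing, the judge calls made during RL training are likely to fail partway through a run, so it can help to check up front. The following is a minimal sketch, not part of the recipe: `missing_judge_keys` is a hypothetical helper, and it assumes these two variables are the only keys the judge needs.

```python
import os

# Keys listed above; assumed (not confirmed) to be the full set the judge rewards need.
REQUIRED_JUDGE_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def missing_judge_keys(env=None):
    """Return the required judge keys that are unset or empty (hypothetical helper).

    `env` defaults to os.environ; any mapping with a .get() method works,
    which makes the check easy to run against a parsed .env file too.
    """
    env = os.environ if env is None else env
    return [key for key in REQUIRED_JUDGE_KEYS if not env.get(key)]


# Example: call missing_judge_keys() before launching training and abort if the list is non-empty.
```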
---

## Datasets

Official HuggingFace datasets:

| Dataset | HuggingFace Repo | `--dataset` arg |
|---------|------------------|-----------------|
| Humanual-Books | `snap-stanford/humanual-book` | `amazon` |
| Humanual-Opinion | `snap-stanford/humanual-opinion` | `reddit` |
| Humanual-Politics | `snap-stanford/humanual-politics` | `medium` |
| Humanual-News | `snap-stanford/humanual-news` | `youtube` |
| Humanual-Chat | `snap-stanford/humanual-chat` | `wildchat_english` |
| Humanual-Email | `snap-stanford/humanual-email` | `enron` |

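To keep the `--dataset` value and `--raw_dataset_repo` consistent in your own scripts, the table can be mirrored as a lookup. A sketch; `DATASETS` and `repo_for` are illustrative names, not part of the recipe:

```python
# Mirror of the dataset table above: --dataset arg -> HuggingFace repo.
DATASETS = {
    "amazon": "snap-stanford/humanual-book",
    "reddit": "snap-stanford/humanual-opinion",
    "medium": "snap-stanford/humanual-politics",
    "youtube": "snap-stanford/humanual-news",
    "wildchat_english": "snap-stanford/humanual-chat",
    "enron": "snap-stanford/humanual-email",
}


def repo_for(dataset_arg: str) -> str:
    """Return the HuggingFace repo for a --dataset value (illustrative helper)."""
    try:
        return DATASETS[dataset_arg]
    except KeyError:
        valid = ", ".join(sorted(DATASETS))
        raise ValueError(f"unknown --dataset {dataset_arg!r}; expected one of: {valid}") from None
```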
---

## SFT Training

### Step 1: Process Dataset

Convert a HuggingFace dataset to the SFT format:

```bash
# No-thinking mode (response only)
python -m humanlm.process_dataset \
    --dataset amazon \
    --raw_dataset_repo snap-stanford/humanual-book \
    --save_data_dir ./data/humanual-book \
    --sft \
    --no_tag

# With thinking traces (requires the API key for --thinking_model, e.g. OPENAI_API_KEY for gpt-4o-mini)
python -m humanlm.process_dataset \
    --dataset amazon \
    --raw_dataset_repo snap-stanford/humanual-book \
    --save_data_dir ./data/humanual-book \
    --sft \
    --thinking_sft \
    --thinking_model gpt-4o-mini
```

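If you process several datasets, both commands above can be built from one helper. This sketch only reuses the flags shown above; `build_process_cmd` is a hypothetical name, and the example prints the command rather than running it:

```python
import shlex
import sys


def build_process_cmd(dataset, repo, save_dir, thinking_model=None):
    """Build the argv for `python -m humanlm.process_dataset` (hypothetical helper).

    thinking_model=None reproduces the no-thinking (--no_tag) command;
    otherwise the thinking-trace command is built with that model.
    """
    cmd = [
        sys.executable, "-m", "humanlm.process_dataset",
        "--dataset", dataset,
        "--raw_dataset_repo", repo,
        "--save_data_dir", save_dir,
        "--sft",
    ]
    if thinking_model is None:
        cmd.append("--no_tag")
    else:
        cmd += ["--thinking_sft", "--thinking_model", thinking_model]
    return cmd


# Print the shell-quoted command; pass the list to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(build_process_cmd("amazon", "snap-stanford/humanual-book", "./data/humanual-book")))
```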
This creates:
```
./data/humanual-book/
└── sft/
    └── r_no_tag/
        ├── train.parquet
        ├── val.parquet
        └── test.parquet
```

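Before starting SFT, you may want to confirm that all three splits were written. A minimal sketch assuming the `sft/r_no_tag/` layout above (the thinking-trace run may use a different subdirectory, which can be passed as `variant`); `missing_splits` is a hypothetical helper:

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def missing_splits(data_dir, variant="r_no_tag"):
    """Return paths of split parquet files missing under <data_dir>/sft/<variant>/."""
    root = Path(data_dir) / "sft" / variant
    return [
        str(root / f"{split}.parquet")
        for split in SPLITS
        if not (root / f"{split}.parquet").is_file()
    ]
```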
### Step 2: Run SFT Training

```bash
DATASET_DIR=./data/humanual-book bash humanlm/train_sft_humanlm.sh \
    "0,1,2,3,4,5,6,7" \
    amazon \
    no_thinking
```

**Arguments:**

| Position | Name | Example | Description |
|----------|------|---------|-------------|
| 1 | GPU_LIST | `"0,1,2,3,4,5,6,7"` | Comma-separated GPU IDs |
| 2 | DATASET_NAME | `amazon` | Dataset identifier |
| 3 | THINKING_MODE | `no_thinking` or `thinking` | Whether to use thinking traces |

**Environment Variables:**

| Variable | Default | Description |
|----------|---------|-------------|
| `DATASET_DIR` | (required) | Path to processed data |
| `OUTPUT_ROOT` | `./outputs` | Where to save checkpoints |
| `HF_CACHE_DIR` | system default | HuggingFace cache location |

**Output:**
- Model checkpoints: `./outputs/sft_amazon_no_thinking_r_no_tag/`
- WandB project: `humanlm`

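Judging only from the example above, the checkpoint directory seems to follow `$OUTPUT_ROOT/sft_<DATASET_NAME>_<THINKING_MODE>_<variant>`. The sketch below rebuilds that path under this assumption; the pattern is inferred rather than documented, and `sft_output_dir` is a hypothetical helper:

```python
import os
from pathlib import Path


def sft_output_dir(dataset, thinking_mode, variant="r_no_tag", env=None):
    """Guess the SFT checkpoint dir (pattern inferred from the README example only)."""
    if thinking_mode not in ("no_thinking", "thinking"):
        raise ValueError("thinking_mode must be 'no_thinking' or 'thinking'")
    env = os.environ if env is None else env
    root = env.get("OUTPUT_ROOT", "./outputs")  # README default for OUTPUT_ROOT
    return Path(root) / f"sft_{dataset}_{thinking_mode}_{variant}"
```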
---

## RL Training (GRPO)

Before training, update `cluster_config.sh` with your project paths and the location of your `.env` file.

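`cluster_config.sh` expects the `.env` file to hold your LLM-judge API keys and your WandB API key. To inspect it from Python before submitting a job, a minimal loader might look like this. It is a sketch, not the recipe's own loader, and assumes plain `KEY=value` lines, optionally prefixed with `export` and optionally quoted, with no variable interpolation:

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=value lines into a dict (no interpolation, no multiline values)."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not an assignment; ignore it
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]  # drop matching surrounding quotes
        values[key.strip()] = value
    return values


def load_env_file(path):
    """Read a .env file from disk and parse it with parse_env_text."""
    return parse_env_text(Path(path).read_text())
```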
### Train HumanLM
```bash
bash humanlm/train_rl_humanlm.sh \
    "0,1,2,3,4,5,6,7" \
    amazon \
    train_humanlm \
    "" \
    base
```

### Evaluation
```bash
bash humanlm/train_rl_humanlm.sh \
    "0,1,2,3,4,5,6,7" \
    amazon \
    eval_only \
    "/path/to/checkpoint" \
    humanlm
```

---

## Citation

```bibtex
@article{wu2026humanlm,
  title={HUMANLM: Simulating Users with State Alignment Beats Response Imitation},
  url={https://humanlm.stanford.edu/},
  author={Wu, Shirley and Choi, Evelyn and Khatua, Arpandeep and
          Wang, Zhanghan and He-Yueya, Joy and Weerasooriya, Tharindu Cyril and
          Wei, Wei and Yang, Diyi and Leskovec, Jure and Zou, James},
  year={2026}
}
```
101 changes: 101 additions & 0 deletions humanlm/chat_templates/qwen3_multi_role_template_think.jinja
@@ -0,0 +1,101 @@
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' }}
{%- if message.name %}<name>{{ message.name }}</name>
{%- endif -%}
{{- '\n' + message.content | trim + '<|im_end|>\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{{- '<|im_start|>' + message.role + '\n' }}
{%- if message.name %}<name>{{ message.name }}</name>
{{- '\n' }}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- content }}
{%- endif %}
{%- else %}
{{- content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>user\n' }}
{%- if speak_as is defined and speak_as %}<name>{{ speak_as }}</name>
{%- endif -%}
{{- '\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- elif enable_thinking is defined and enable_thinking is true %}
{{- '<think>' }}
{%- endif %}
{%- endif %}
18 changes: 18 additions & 0 deletions humanlm/cluster_config.sh
@@ -0,0 +1,18 @@
# Paths for RL training
export PROJECT_DIR="/path/to/shared/project"
export SCRATCH_DIR="/path/to/your/scratch/$USER"

# .env file holding your LLM-judge API keys and your WandB API key.
# Defined after PROJECT_DIR so that the path expands correctly.
export ENV_FILE="$PROJECT_DIR/.env"

export DATASET_DIR="$PROJECT_DIR/llm_twin/processed_data"
export MODEL_PATH="$PROJECT_DIR/llm_twin/models/Qwen3-8B"
export CACHE_DIR="$PROJECT_DIR/llm_twin/verl_cache"
export OUTPUT_DIR="$SCRATCH_DIR/humanlm_outputs/$EXP_NAME"

# Set Cache directories
export HF_HOME="$SCRATCH_DIR/hf"
export HF_DATASETS_CACHE="$HF_HOME/datasets"
export TRANSFORMERS_CACHE="$HF_HOME/transformers"
export HUGGINGFACE_HUB_CACHE="$HF_HOME/hub"
export XDG_CACHE_HOME="$HF_HOME/xdg"
2 changes: 2 additions & 0 deletions humanlm/configs/humanlm_agent_loop_config.yaml
@@ -0,0 +1,2 @@
- name: humanlm_agent
  _target_: recipe.humanlm.humanlm_agent_loop.HumanLMAgentLoop