diff --git a/.gitignore b/.gitignore
Lines changed: 3 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 1 addition & 0 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 1 addition & 1 deletion
diff --git a/src/gabriel/__init__.py b/src/gabriel/__init__.py
Lines changed: 2 additions & 0 deletions
diff --git a/src/gabriel/_version.py b/src/gabriel/_version.py
Lines changed: 1 addition & 1 deletion
diff --git a/src/gabriel/api.py b/src/gabriel/api.py
Lines changed: 175 additions & 0 deletions
diff --git a/src/gabriel/prompts/persona_prompt.jinja2 b/src/gabriel/prompts/persona_prompt.jinja2
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,3 @@
+__pycache__/
+.pytest_cache/
+responses.csv
@@ -62,6 +62,7 @@ The tutorial notebook walks through these ideas step-by-step—from setting up a
 | `gabriel.compare` | Identifies similarities / differences between paired items. Output = list of differences. | Contrast op-eds from different districts; compare two ad campaigns. |
 | `gabriel.bucket` | Builds taxonomies from many terms. Output = bucket/cluster labels. | Group technologies, artworks, or HR complaints into emergent categories. |
 | `gabriel.seed` | Enforces a representative distribution / diversity of seeds. | Initialize unique personas that match US population distribution. |
+| `gabriel.poll` | Seeds personas, expands them into full biographies, and surveys them. | Simulate a synthetic opinion poll on policy, trust, and open-ended attitudes. |
 | `gabriel.ideate` | Generates many novel scientific theories and filters the cream of the crop. | Procure novel theories on inflation for potential research. |
 | `gabriel.debias` | Post-process measurements to remove inference bias. | Ensure GPT isn't guessing climate opinions in speeches based on general political lean. |
 | `gabriel.load` | Prepares a folder of text / image / audio files into a spreadsheet for use in GABRIEL. | Image directory converted into spreadsheet of file paths. |
 
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "openai-gabriel"
-version = "1.1.4"
+version = "1.1.5"
 description = "LLM-based library to measure quantitative attributes on qualitative data"
 authors = [
     {name = "Hemanth Asirvatham"},
 
@@ -23,6 +23,7 @@
     view,
     bucket,
     seed,
+    poll,
 )
 from .utils import load
 
@@ -42,6 +43,7 @@
     "compare",
     "discover",
     "seed",
+    "poll",
     "deduplicate",
     "merge",
     "filter",
 
@@ -1 +1 @@
-__version__ = "1.1.4"
+__version__ = "1.1.5"
@@ -34,6 +34,8 @@
     DiscoverConfig,
     Seed,
     SeedConfig,
+    Poll,
+    PollConfig,
     Filter,
     FilterConfig,
     Whatever,
@@ -55,6 +57,7 @@
     "rate",
     "extract",
     "seed",
+    "poll",
     "classify",
     "ideate",
     "id8",
@@ -532,6 +535,178 @@ async def seed(
     )
 
 
+async def poll(
+    df: Optional[pd.DataFrame] = None,
+    column_name: Optional[str] = None,
+    *,
+    population_description: Optional[str] = None,
+    questions: Optional[Union[str, Sequence[str]]] = None,
+    save_dir: str,
+    file_name: str = "poll_results.csv",
+    seed_file_name: str = "poll_seeds.csv",
+    persona_file_name: str = "poll_personas.csv",
+    model: str = "gpt-5.4",
+    seed_model: Optional[str] = None,
+    persona_model: Optional[str] = None,
+    poll_model: Optional[str] = None,
+    n_parallels: int = 650,
+    num_personas: int = 1000,
+    entities_per_generation: int = 50,
+    entity_batch_frac: float = 0.25,
+    existing_entities_cap: int = 100,
+    deduplicate: bool = False,
+    deduplicate_sample_seed: int = 42,
+    n_questions_per_run: int = 8,
+    seed_additional_instructions: Optional[str] = None,
+    additional_instructions: Optional[str] = None,
+    web_search: bool = False,
+    reasoning_effort: Optional[str] = None,
+    seed_template_path: Optional[str] = None,
+    persona_template_path: Optional[str] = None,
+    answer_template_path: Optional[str] = None,
+    reset_files: bool = False,
+    response_fn: Optional[Callable[..., Awaitable[Any]]] = None,
+    get_all_responses_fn: Optional[Callable[..., Awaitable[pd.DataFrame]]] = None,
+    embedding_fn: Optional[Callable[..., Awaitable[Any]]] = None,
+    get_all_embeddings_fn: Optional[Callable[..., Awaitable[Dict[str, List[float]]]]] = None,
+    **cfg_kwargs,
+) -> pd.DataFrame:
+    """Seed a synthetic population, expand it into personas, and survey them.
+
+    Example Use
+    -----------
+    Survey a representative synthetic sample of the U.S. population on one or
+    more poll questions.
+
+    Parameters
+    ----------
+    df:
+        Optional DataFrame containing precomputed respondent seeds. When
+        provided, ``population_description`` is ignored and the seeding stage is
+        skipped. If the DataFrame already contains a ``persona`` column, the
+        task skips directly to the polling stage and reuses those personas.
+    column_name:
+        Column in ``df`` containing the seed descriptions. If omitted, the task
+        will look for ``"seed"`` and then ``"entity"``. This is optional when
+        reusing an existing ``persona`` column.
+    population_description:
+        Natural-language description of the population to seed when ``df`` is
+        not supplied.
+    questions:
+        A single survey question or a sequence of questions. Questions are
+        answered in JSON and become columns in the returned DataFrame.
+    save_dir:
+        Directory where intermediate and final CSV artifacts are written.
+    file_name:
+        Final CSV written by the poll task.
+    seed_file_name:
+        CSV used for the seeded respondent population.
+    persona_file_name:
+        CSV used for the generated personas before question answering.
+    model:
+        Default model used for all three stages unless a stage-specific model is
+        provided.
+    seed_model / persona_model / poll_model:
+        Optional model overrides for each stage.
+    n_parallels:
+        Maximum concurrent requests for persona generation and polling.
+    num_personas:
+        Number of synthetic respondents to create when seeding a population.
+    entities_per_generation / entity_batch_frac / existing_entities_cap /
+    deduplicate / deduplicate_sample_seed:
+        Seeding controls forwarded to :class:`gabriel.tasks.seed.Seed`.
+    n_questions_per_run:
+        Maximum number of questions bundled into one polling prompt.
+    seed_additional_instructions:
+        Extra guidance appended to the seed-generation instructions.
+    additional_instructions:
+        Extra guidance appended to the poll-answering prompt.
+    web_search:
+        Enable web search augmentation for the polling stage only.
+    reasoning_effort:
+        Controls how intensely the model reasons (none/low/medium/high).
+    seed_template_path / persona_template_path / answer_template_path:
+        Optional Jinja2 template overrides for each prompt stage.
+    reset_files:
+        When ``True``, ignore saved checkpoints and regenerate all stages.
+    response_fn / get_all_responses_fn:
+        Optional overrides for the Responses API execution path.
+    embedding_fn / get_all_embeddings_fn:
+        Optional embedding overrides used by nested seed deduplication.
+    **cfg_kwargs:
+        Additional overrides applied to :class:`gabriel.tasks.poll.PollConfig`.
+        Keys matching :func:`gabriel.utils.openai_utils.get_all_responses` /
+        :func:`gabriel.utils.openai_utils.get_response` (for example
+        ``max_output_tokens``) are forwarded to model calls.
+
+    Returns
+    -------
+    pandas.DataFrame
+        DataFrame containing respondent seeds, generated personas, and one
+        column per question when questions are supplied.
+    """
+
+    save_dir = os.path.expandvars(os.path.expanduser(save_dir))
+    os.makedirs(save_dir, exist_ok=True)
+    if df is None and population_description is None:
+        final_path = os.path.join(save_dir, file_name)
+        return _load_cached_dataframe(final_path, task_name="Poll")
+
+    cfg_overrides, response_kwargs = _split_cfg_and_response_kwargs(
+        PollConfig,
+        dict(cfg_kwargs),
+        task_name="poll",
+    )
+    normalized_questions: List[str]
+    if questions is None:
+        normalized_questions = []
+    elif isinstance(questions, str):
+        normalized_questions = [questions]
+    else:
+        normalized_questions = [str(question) for question in questions]
+
+    cfg = PollConfig(
+        population_description=population_description,
+        questions=normalized_questions,
+        save_dir=save_dir,
+        file_name=file_name,
+        seed_file_name=seed_file_name,
+        persona_file_name=persona_file_name,
+        seed_model=seed_model or model,
+        persona_model=persona_model or model,
+        poll_model=poll_model or model,
+        n_parallels=n_parallels,
+        num_personas=num_personas,
+        entities_per_generation=entities_per_generation,
+        entity_batch_frac=entity_batch_frac,
+        existing_entities_cap=existing_entities_cap,
+        deduplicate=deduplicate,
+        deduplicate_sample_seed=deduplicate_sample_seed,
+        n_questions_per_run=n_questions_per_run,
+        seed_additional_instructions=seed_additional_instructions,
+        additional_instructions=additional_instructions,
+        web_search=web_search,
+        reasoning_effort=reasoning_effort,
+        **cfg_overrides,
+    )
+    task = Poll(
+        cfg,
+        seed_template_path=seed_template_path,
+        persona_template_path=persona_template_path,
+        answer_template_path=answer_template_path,
+    )
+    return await task.run(
+        df=df,
+        column_name=column_name,
+        reset_files=reset_files,
+        response_fn=response_fn,
+        get_all_responses_fn=get_all_responses_fn,
+        embedding_fn=embedding_fn,
+        get_all_embeddings_fn=get_all_embeddings_fn,
+        **response_kwargs,
+    )
+
+
 async def classify(
     df: Optional[pd.DataFrame],
     column_name: Optional[str] = None,
 
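The question normalization and per-stage model fallback in `poll` above can be exercised on their own. The sketch below restates those two pieces as standalone helpers and adds a third, `batch_questions`, which is hypothetical: the diff only documents `n_questions_per_run` as the maximum number of questions bundled into one polling prompt, so grouping consecutive questions into fixed-size chunks is an assumption about how `gabriel.tasks.poll.Poll` uses it. All helper names here are illustrative and not part of GABRIEL.

```python
from typing import Dict, List, Optional, Sequence, Union


def normalize_questions(questions: Optional[Union[str, Sequence[str]]]) -> List[str]:
    # Same three branches as poll(): None -> [], a bare string -> [string],
    # any other iterable -> each item coerced to str.
    if questions is None:
        return []
    if isinstance(questions, str):
        return [questions]
    return [str(question) for question in questions]


def resolve_stage_models(
    model: str,
    seed_model: Optional[str] = None,
    persona_model: Optional[str] = None,
    poll_model: Optional[str] = None,
) -> Dict[str, str]:
    # Mirrors the ``override or model`` fallback used when poll() builds PollConfig.
    return {
        "seed_model": seed_model or model,
        "persona_model": persona_model or model,
        "poll_model": poll_model or model,
    }


def batch_questions(questions: List[str], n_questions_per_run: int) -> List[List[str]]:
    # HYPOTHETICAL: assumes consecutive questions are grouped into chunks of at
    # most n_questions_per_run; the real grouping lives in gabriel.tasks.poll,
    # which this diff does not show.
    if n_questions_per_run < 1:
        raise ValueError("n_questions_per_run must be at least 1")
    return [
        questions[start : start + n_questions_per_run]
        for start in range(0, len(questions), n_questions_per_run)
    ]


if __name__ == "__main__":
    qs = normalize_questions([f"Question {i}" for i in range(1, 11)])
    print([len(batch) for batch in batch_questions(qs, 8)])  # [8, 2]
    print(resolve_stage_models("gpt-5.4", poll_model="other-model"))
```

Because the fallback uses `or`, passing an empty string as a stage override behaves the same as passing `None` and falls back to `model`.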
@@ -0,0 +1,118 @@
+Imagine yourself as a real individual that matches the seed below. The idea is to produce a representative and detailed persona biography that mirrors how a typical individual with these characteristics could think, feel and behave.
+
+The seed below is the basis of the persona. Preserve its core demographic and life-situation signals, and build outward from them. Use it as an anchor rather than ignoring it or drifting into a different person.
+
+BEGIN SEED
+{{ seed }}
+END SEED
+
+{% if population_description %}
+This seed was sampled from the following broader population:
+{{ population_description }}
+{% endif %}
+
+Do not fixate on race or other basic demographics; these are just for basic setup.
+Instead, focus on creating a real person whose situation fits the demographics, based on what you know about the population and its statistics.
+Regular people are not obsessed with their race, gender, or ethnicity; they just live their unique lives, which are statistically correlated with their demographics without being defined by them.
+Deeply consider real people. Recall the opinions you know of from everyday people of the type implied by the seed.
+Don't bias with any of your political preconceptions or idealized views of people; recall real people and their views and experiences and what they say, then embody that.
+People have controversial views and people are sometimes deeply flawed or wrong factually; don't overlook these possibilities.
+Embrace the full range of human experiences and perspectives.
+
+Your goal is to write a roughly 750-word personality biography of this person. It should be creative, specific, consistent with the seed, and read like a short life story. Start with concrete details: age, demographic makeup, education history, etc. Proceed to a condensed life narrative, then conclude with the core beliefs and values that summarize the persona.
+You should infer and elaborate only in ways that are plausibly implied by the seed and by your knowledge of the population and environment such a person would likely inhabit.
+Do not contradict the seed. If the seed leaves something open, fill it in realistically rather than generically.
+
+Below is an example of what the personality biography should look like, in both format and content. Use it as inspiration, but do not adhere to the content of this example. The persona should be unique, creatively generated, and specialized to the seed and context described above.
+
+Name: Jenna Perez
+Nationality: USA
+Age: 21
+Gender: Female
+Race: Hispanic
+Sexual Orientation: Heterosexual
+Parents: Annika Perez and Jose Perez (both alive)
+Parental Income: $39,213
+Siblings: 2 (older brother and younger sister)
+Romantic Partner: Boyfriend (Tom Andrews, 23, together for 3 months, lives separately)
+Children: None
+Religion: Catholic
+Religiosity: Infrequently attends church, only with family on special occasions
+Educational History: High school diploma, semester of college at Texas State Technical College
+Employment: Walmart Cashier and Uber Driver (night)
+Income: $29,421
+City of Residence: Waco, Texas
+City of Birth: San Antonio, Texas
+
+Description of Neighborhood: rundown inner city apartment complexes, largely populated by racial minorities, poor with moderate crime but not the poorest area, not safe for a woman to walk alone after dark, large public high school nearby and lots of basketball courts, nearby public housing
+
+Personal Life Story Synopsis: Jenna Perez was born in San Antonio, Texas but moved to Waco when she was 3 years old. She lived in a house with her family in a poorer neighborhood. When she was young, her older brother would tease her a lot, but she doted on her younger sister.
+Her mother was a loving woman, but had multiple struggles with depression. Her father was quieter but more strict, and would occasionally put his foot down very harshly. He was quite religious, so they would consistently go to church.
+Jenna attended a large public elementary school which was near her house, though she still rode the bus. Many of her neighbor friends went to the same school, and they became a close friend group which would last through high school.
+She was a cheery, smiley, mostly extroverted child, but became much more sullen and edgy in her middle school and early high school years. Her friend group shrunk and she spent most of her time either physically with or texting with her closest friends. She grew more distant from her family, especially from her father.
+She didn't care much for school until around 11th grade, when she began to pay attention to what she would do after school. In 12th grade, she applied to a few colleges, but only got into her state school program.
+She decided to attend, and roomed with one of her close friends from childhood. She enjoyed the party culture, but she did not click with the academic programs and could not figure out a field of study. She decided to drop out after a semester, with the plan of coming back later.
+She turned her part-time job as a grocery store clerk into her full-time job. After a while, she started to see some of her grade school friends doing well in nursing programs, on good career paths.
+She has now decided that she wants to save up and attend nursing school within the next few years. She is reasonably content with her life right now; she finds her work boring but enjoys being close to her friends.
+She likes (but does not love) her boyfriend, whom she finds to be a bit aimless and disaffected. She has spent increasing amounts of time back with her family, though her relationship with her father has not been fully repaired.
+
+Greatest Achievements in Personal Life: Jenna took care of her younger sister at her place when her sister got into a bad fight with their parents. Jenna volunteered for many years as part of her church, and still does occasionally for a soup kitchen. Jenna got accepted to college.
+Greatest Achievements in Career: Jenna got accepted to college. She is up for a promotion soon at Walmart. She got good grades in her English classes in high school.
+Greatest Failures in Career:
+
+Core Values: friendship, optimism, calm, enjoyment of life
+Personal Strengths: consistent worker, positive attitude, spends time with family
+Personal Weaknesses: aimless, gossip, spends most of her free time on social media, holds grudges
+Big Five Personality Assessment Score — Extraversion: 72
+Big Five Personality Assessment Score — Agreeableness: 46
+Big Five Personality Assessment Score — Conscientiousness: 59
+Big Five Personality Assessment Score — Neuroticism: 67
+Big Five Personality Assessment Score — Openness to Experience: 26
+
+Basic Political Views: socially liberal but not progressive, not politically invested, didn't vote, dislikes Trump, went to a BLM protest with friends, opposes tax increases, supports gay marriage, supports gun rights
+Life Goals: get an associate's degree, become a nurse, get married to a stable earner, have two or three children, buy a small house in a nicer suburb of Waco
+
+Report an output similar to the example above, and in the exact format shown below. Simply fill in the <...> markers and output the whole personality biography, just as it is written in the example.
+Remember to be extremely creative in this task, and to make a persona who would reasonably be the person implied by the seed. Unlike the example, ensure that the end of the life synopsis specifies the person's relationship with the broader group implied by the seed.
+Your persona must be unique; if I were to ask you to do this task multiple times, the personas must be meaningfully different.
+Do not assume that the person will be politically liberal or conform to any such stereotype. Do not assume they are perfect people. People are complicated, with good and bad sides. This should not read like a job interview, where people say their greatest weaknesses are 'perfectionism' or 'overwork'.
+Use your extensive knowledge about complicated, flawed, real people from social media, biographies, TV shows, books, and film.
+Your created persona should embody these principles. These personas will be used in a research project which will help a lot of people. If the personas are not authentic and are instead liberal or conformist caricatures of real people, the research project will fail.
+
+Remember that the 'average' person is neither perfect nor idealized. They have their struggles, their biases, and their flaws, just like anyone else. They may harbor prejudices or stereotypes, hold contradictory beliefs, or fall short of their own ideals.
+They may not always act in a way that aligns with their stated values. They might have regrets, missed opportunities, and unfulfilled dreams. They are a product of their environment, upbringing, and experiences.
+Their story might be characterized by growth and change, or by stagnation and inertia. You should strive to reflect this complexity and humanity in your character biography.
+They should feel like a real, relatable human being with a rich, multifaceted personality and life story.
+
+Name: <insert name here>
+Nationality: <insert nationality here>
+Age: <insert age here>
+Gender: <insert gender here>
+Race: <insert race here>
+Parents: <insert parents here>
+Parental Income: <insert parental income here>
+Siblings: <insert siblings here>
+Romantic Partner: <insert romantic partner here>
+Children: <insert children here>
+Religion: <insert religion here>
+Religiosity: <insert religiosity here>
+Educational History: <insert educational history here>
+Employment: <insert employment here>
+Income: <insert income here>
+City of Residence: <insert city of residence here>
+City of Birth: <insert city of birth here>
+
+Description of Neighborhood: <insert description of neighborhood here>
+Personal Life Story Synopsis: <insert life synopsis here>
+
+Core Values: <insert core values here>
+Personal Strengths: <insert personal strengths here>
+Personal Weaknesses: <insert personal weaknesses here>
+Big Five Personality Assessment Score — Extraversion: <insert extraversion score here>
+Big Five Personality Assessment Score — Agreeableness: <insert agreeableness score here>
+Big Five Personality Assessment Score — Conscientiousness: <insert conscientiousness score here>
+Big Five Personality Assessment Score — Neuroticism: <insert neuroticism score here>
+Big Five Personality Assessment Score — Openness to Experience: <insert openness to experience score here>
+
+Basic Political Views: <insert basic political views here>
+Life Goals: <insert life goals here>
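The template asks the model to fill a fixed set of labeled fields "in the exact format shown". One way to check a generated persona against that format is a small label parser. The sketch below is a hypothetical helper, not part of this PR or of GABRIEL: the required labels are copied from the template, the optional ones from the Jenna Perez example, and the rule that an unlabeled line continues the previous field is an assumption inferred from the multi-line life synopsis in the example.

```python
from typing import Dict, List, Optional

# Labels copied from the output template in persona_prompt.jinja2.
# "\u2014" is the em dash character used in the Big Five labels.
_BIG_FIVE = [
    "Big Five Personality Assessment Score \u2014 " + trait
    for trait in (
        "Extraversion",
        "Agreeableness",
        "Conscientiousness",
        "Neuroticism",
        "Openness to Experience",
    )
]
REQUIRED_FIELDS: List[str] = [
    "Name", "Nationality", "Age", "Gender", "Race", "Parents",
    "Parental Income", "Siblings", "Romantic Partner", "Children",
    "Religion", "Religiosity", "Educational History", "Employment",
    "Income", "City of Residence", "City of Birth",
    "Description of Neighborhood", "Personal Life Story Synopsis",
    "Core Values", "Personal Strengths", "Personal Weaknesses",
    *_BIG_FIVE,
    "Basic Political Views", "Life Goals",
]
# Labels that appear only in the worked example, not in the output template.
OPTIONAL_FIELDS: List[str] = [
    "Sexual Orientation",
    "Greatest Achievements in Personal Life",
    "Greatest Achievements in Career",
    "Greatest Failures in Career",
]
KNOWN_LABELS = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)


def parse_persona(text: str) -> Dict[str, str]:
    """Split a persona biography into {label: value}.

    A line starts a new field only when the text before its first colon is a
    known label; any other non-blank line is appended to the previous field,
    which keeps a multi-line life synopsis together.
    """
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        label, sep, value = line.partition(":")
        if sep and label.strip() in KNOWN_LABELS:
            current = label.strip()
            fields[current] = value.strip()
        elif current is not None:
            fields[current] = (fields[current] + "\n" + line).strip()
    return fields


def missing_fields(text: str) -> List[str]:
    """Return required template labels that are absent or left empty."""
    parsed = parse_persona(text)
    return [label for label in REQUIRED_FIELDS if not parsed.get(label)]
```

Matching only known labels, rather than any `Word: text` line, avoids splitting the synopsis when a sentence in it happens to contain a colon.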