Commit 96d6aaa

Author: hemanth-asirvatham
Add gabriel.poll and bump version to 1.1.5
1 parent e9d285f commit 96d6aaa

11 files changed

Lines changed: 1579 additions & 2 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+__pycache__/
+.pytest_cache/
+responses.csv

README.md

Lines changed: 1 addition & 0 deletions
@@ -62,6 +62,7 @@ The tutorial notebook walks through these ideas step-by-step—from setting up a
 | `gabriel.compare` | Identifies similarities / differences between paired items. Output = list of differences. | Contrast op-eds from different districts; compare two ad campaigns. |
 | `gabriel.bucket` | Builds taxonomies from many terms. Output = bucket/cluster labels. | Group technologies, artworks, or HR complaints into emergent categories. |
 | `gabriel.seed` | Enforces a representative distribution / diversity of seeds. | Initialize unique personas that match US population distribution. |
+| `gabriel.poll` | Seeds personas, expands them into full biographies, and surveys them. | Simulate a synthetic opinion poll on policy, trust, and open-ended attitudes. |
 | `gabriel.ideate` | Generates many novel scientific theories and filters the cream of the crop. | Procure novel theories on inflation for potential research. |
 | `gabriel.debias` | Post-process measurements to remove inference bias. | Ensure GPT isn't guessing climate opinions in speeches based on general political lean. |
 | `gabriel.load` | Prepares a folder of text / image / audio files into a spreadsheet for use in GABRIEL. | Image directory converted into spreadsheet of file paths. |
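
The new `gabriel.poll` row can be illustrated with a minimal call. The sketch below uses only keyword arguments that appear in the `poll` signature added to `src/gabriel/api.py` in this commit. The population text, the two questions, the output directory, and the reduced `num_personas` are illustrative assumptions, and the helper `build_poll_kwargs` is hypothetical. Running the poll itself requires `openai-gabriel` 1.1.5 or later and working API credentials, so the actual call is left commented out.

```python
import asyncio


def build_poll_kwargs(save_dir: str = "~/gabriel_runs/poll_demo") -> dict:
    """Assemble keyword arguments for gabriel.poll (all values are illustrative)."""
    return {
        # Natural-language description of the population to seed.
        "population_description": "Adults living in the United States",
        # A single string or a sequence of strings is accepted.
        "questions": [
            "How much do you trust your local government?",
            "Should the federal minimum wage be raised? Why or why not?",
        ],
        # The wrapper expands "~" and environment variables in save_dir.
        "save_dir": save_dir,
        # Smaller than the default of 1000 to keep a trial run cheap.
        "num_personas": 50,
        "n_questions_per_run": 8,
    }


async def main() -> None:
    # Imported lazily so the helper above works without the package installed.
    import gabriel

    df = await gabriel.poll(**build_poll_kwargs())
    print(df.head())


# Uncomment to run against the live API:
# asyncio.run(main())
```

Because `poll` is declared with `async def`, a notebook can use `await gabriel.poll(...)` directly instead of `asyncio.run`.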

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

 [project]
 name = "openai-gabriel"
-version = "1.1.4"
+version = "1.1.5"
 description = "LLM-based library to measure quantitative attributes on qualitative data"
 authors = [
     {name = "Hemanth Asirvatham"},

src/gabriel/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -23,6 +23,7 @@
     view,
     bucket,
     seed,
+    poll,
 )
 from .utils import load

@@ -42,6 +43,7 @@
     "compare",
     "discover",
     "seed",
+    "poll",
     "deduplicate",
     "merge",
     "filter",

src/gabriel/_version.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "1.1.4"
+__version__ = "1.1.5"

src/gabriel/api.py

Lines changed: 175 additions & 0 deletions
@@ -34,6 +34,8 @@
     DiscoverConfig,
     Seed,
     SeedConfig,
+    Poll,
+    PollConfig,
     Filter,
     FilterConfig,
     Whatever,
@@ -55,6 +57,7 @@
     "rate",
     "extract",
     "seed",
+    "poll",
     "classify",
     "ideate",
     "id8",
@@ -532,6 +535,178 @@ async def seed(
     )


+async def poll(
+    df: Optional[pd.DataFrame] = None,
+    column_name: Optional[str] = None,
+    *,
+    population_description: Optional[str] = None,
+    questions: Optional[Union[str, Sequence[str]]] = None,
+    save_dir: str,
+    file_name: str = "poll_results.csv",
+    seed_file_name: str = "poll_seeds.csv",
+    persona_file_name: str = "poll_personas.csv",
+    model: str = "gpt-5.4",
+    seed_model: Optional[str] = None,
+    persona_model: Optional[str] = None,
+    poll_model: Optional[str] = None,
+    n_parallels: int = 650,
+    num_personas: int = 1000,
+    entities_per_generation: int = 50,
+    entity_batch_frac: float = 0.25,
+    existing_entities_cap: int = 100,
+    deduplicate: bool = False,
+    deduplicate_sample_seed: int = 42,
+    n_questions_per_run: int = 8,
+    seed_additional_instructions: Optional[str] = None,
+    additional_instructions: Optional[str] = None,
+    web_search: bool = False,
+    reasoning_effort: Optional[str] = None,
+    seed_template_path: Optional[str] = None,
+    persona_template_path: Optional[str] = None,
+    answer_template_path: Optional[str] = None,
+    reset_files: bool = False,
+    response_fn: Optional[Callable[..., Awaitable[Any]]] = None,
+    get_all_responses_fn: Optional[Callable[..., Awaitable[pd.DataFrame]]] = None,
+    embedding_fn: Optional[Callable[..., Awaitable[Any]]] = None,
+    get_all_embeddings_fn: Optional[Callable[..., Awaitable[Dict[str, List[float]]]]] = None,
+    **cfg_kwargs,
+) -> pd.DataFrame:
+    """Seed a synthetic population, expand it into personas, and survey them.
+
+    Example Use
+    -----------
+    Survey a representative synthetic sample of the U.S. population on one or
+    more poll questions.
+
+    Parameters
+    ----------
+    df:
+        Optional DataFrame containing precomputed respondent seeds. When
+        provided, `population_description` is ignored and the seeding stage is
+        skipped. If the DataFrame already contains a ``persona`` column, the
+        task skips directly to the polling stage and reuses those personas.
+    column_name:
+        Column in ``df`` containing the seed descriptions. If omitted, the task
+        will look for ``"seed"`` and then ``"entity"``. This is optional when
+        reusing an existing ``persona`` column.
+    population_description:
+        Natural-language description of the population to seed when ``df`` is
+        not supplied.
+    questions:
+        A single survey question or a sequence of questions. Questions are
+        answered in JSON and become columns in the returned DataFrame.
+    save_dir:
+        Directory where intermediate and final CSV artifacts are written.
+    file_name:
+        Final CSV written by the poll task.
+    seed_file_name:
+        CSV used for the seeded respondent population.
+    persona_file_name:
+        CSV used for the generated personas before question answering.
+    model:
+        Default model used for all three stages unless a stage-specific model is
+        provided.
+    seed_model / persona_model / poll_model:
+        Optional model overrides for each stage.
+    n_parallels:
+        Maximum concurrent requests for persona generation and polling.
+    num_personas:
+        Number of synthetic respondents to create when seeding a population.
+    entities_per_generation / entity_batch_frac / existing_entities_cap /
+    deduplicate / deduplicate_sample_seed:
+        Seeding controls forwarded to :class:`gabriel.tasks.seed.Seed`.
+    n_questions_per_run:
+        Maximum number of questions bundled into one polling prompt.
+    seed_additional_instructions:
+        Extra guidance appended to the seed-generation instructions.
+    additional_instructions:
+        Extra guidance appended to the poll-answering prompt.
+    web_search:
+        Enable web search augmentation for the polling stage only.
+    reasoning_effort:
+        Controls how intensely the model reasons (none/low/medium/high).
+    seed_template_path / persona_template_path / answer_template_path:
+        Optional Jinja2 template overrides for each prompt stage.
+    reset_files:
+        When ``True`` ignore saved checkpoints and regenerate all stages.
+    response_fn / get_all_responses_fn:
+        Optional overrides for the Responses API execution path.
+    embedding_fn / get_all_embeddings_fn:
+        Optional embedding overrides used by nested seed deduplication.
+    **cfg_kwargs:
+        Additional overrides applied to :class:`gabriel.tasks.poll.PollConfig`.
+        Keys matching :func:`gabriel.utils.openai_utils.get_all_responses` /
+        :func:`gabriel.utils.openai_utils.get_response` (for example
+        ``max_output_tokens``) are forwarded to model calls.
+
+    Returns
+    -------
+    pandas.DataFrame
+        DataFrame containing respondent seeds, generated personas, and one
+        column per question when questions are supplied.
+    """
+
+    save_dir = os.path.expandvars(os.path.expanduser(save_dir))
+    os.makedirs(save_dir, exist_ok=True)
+    if df is None and population_description is None:
+        final_path = os.path.join(save_dir, file_name)
+        return _load_cached_dataframe(final_path, task_name="Poll")
+
+    cfg_overrides, response_kwargs = _split_cfg_and_response_kwargs(
+        PollConfig,
+        dict(cfg_kwargs),
+        task_name="poll",
+    )
+    normalized_questions: List[str]
+    if questions is None:
+        normalized_questions = []
+    elif isinstance(questions, str):
+        normalized_questions = [questions]
+    else:
+        normalized_questions = [str(question) for question in questions]
+
+    cfg = PollConfig(
+        population_description=population_description,
+        questions=normalized_questions,
+        save_dir=save_dir,
+        file_name=file_name,
+        seed_file_name=seed_file_name,
+        persona_file_name=persona_file_name,
+        seed_model=seed_model or model,
+        persona_model=persona_model or model,
+        poll_model=poll_model or model,
+        n_parallels=n_parallels,
+        num_personas=num_personas,
+        entities_per_generation=entities_per_generation,
+        entity_batch_frac=entity_batch_frac,
+        existing_entities_cap=existing_entities_cap,
+        deduplicate=deduplicate,
+        deduplicate_sample_seed=deduplicate_sample_seed,
+        n_questions_per_run=n_questions_per_run,
+        seed_additional_instructions=seed_additional_instructions,
+        additional_instructions=additional_instructions,
+        web_search=web_search,
+        reasoning_effort=reasoning_effort,
+        **cfg_overrides,
+    )
+    task = Poll(
+        cfg,
+        seed_template_path=seed_template_path,
+        persona_template_path=persona_template_path,
+        answer_template_path=answer_template_path,
+    )
+    return await task.run(
+        df=df,
+        column_name=column_name,
+        reset_files=reset_files,
+        response_fn=response_fn,
+        get_all_responses_fn=get_all_responses_fn,
+        embedding_fn=embedding_fn,
+        get_all_embeddings_fn=get_all_embeddings_fn,
+        **response_kwargs,
+    )
+
+
 async def classify(
     df: Optional[pd.DataFrame],
     column_name: Optional[str] = None,
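
The input handling in `poll` can be summarised in a standalone sketch. `normalize_questions` and `resolve_stage_models` restate the question normalisation and the `seed_model or model` fallback from the wrapper body; `plan_stages` is a hypothetical helper that encodes the stage-skipping rules described in the docstring, with stage labels invented for illustration. None of these functions exist in GABRIEL under these names, and the real `Poll.run` logic is not shown in this hunk.

```python
from typing import Dict, List, Optional, Sequence, Union


def normalize_questions(
    questions: Optional[Union[str, Sequence[str]]],
) -> List[str]:
    """Turn None, a single string, or a sequence into a list of strings."""
    if questions is None:
        return []
    if isinstance(questions, str):
        return [questions]
    return [str(question) for question in questions]


def resolve_stage_models(
    model: str,
    seed_model: Optional[str] = None,
    persona_model: Optional[str] = None,
    poll_model: Optional[str] = None,
) -> Dict[str, str]:
    """Each stage uses its own override if given, otherwise the default model."""
    return {
        "seed": seed_model or model,
        "persona": persona_model or model,
        "poll": poll_model or model,
    }


def plan_stages(
    df_columns: Optional[Sequence[str]],
    population_description: Optional[str],
) -> List[str]:
    """Hypothetical summary of which stages run, based on the docstring."""
    if df_columns is None and population_description is None:
        # The wrapper returns the cached final CSV in this case.
        return ["load_cached_results"]
    if df_columns is None:
        return ["seed", "persona", "poll"]
    if "persona" in df_columns:
        # Existing personas are reused; only polling remains.
        return ["poll"]
    # Precomputed seeds were supplied, so seeding is skipped.
    return ["persona", "poll"]


print(normalize_questions("Do you trust the news?"))
print(resolve_stage_models("default-model", poll_model="poll-model"))
print(plan_stages(["seed"], None))
```

The model names passed above are placeholders, not real model identifiers.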
Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
+Imagine yourself as a real individual that matches the seed below. The idea is to produce a representative and detailed persona biography that mirrors how a typical individual with these characteristics could think, feel and behave.
+
+The seed below is the basis of the persona. Preserve its core demographic and life-situation signals, and build outward from them. Use it as an anchor rather than ignoring it or drifting into a different person.
+
+BEGIN SEED
+{{ seed }}
+END SEED
+
+{% if population_description %}
+This seed was sampled from the following broader population:
+{{ population_description }}
+{% endif %}
+
+Do not fixate on race or other basic demographics; these are just for basic setup.
+Instead, focus on making a real person with a situation that is relevant to the demographics based on what you know of the population and statistics about the population.
+Regular people are not obsessed with their race, gender, or ethnicity -- they just live their unique lives, and those lives are statistically correlated with the demographics but not obsessed with them.
+Deeply consider real people. Think and recall to your mind opinions you know of from everyday people of the type implied by the seed.
+Don't bias with any of your political preconceptions or idealized views of people; recall real people and their views and experiences and what they say, then embody that.
+People have controversial views and people are sometimes deeply flawed or wrong factually; don't overlook these possibilities.
+Embrace the full range of human experiences and perspectives.
+
+Your goal is to write a roughly 750 word personality biography of this person. It should be creative, specific, consistent with the seed, and read like a short life story. Start with concrete details: age, demographic makeup, education history, etc. Proceed to a condensed life narrative, then conclude with the core beliefs and values that summarise the persona.
+You should infer and elaborate only in ways that are plausibly implied by the seed and by your knowledge of the population and environment such a person would likely inhabit.
+Do not contradict the seed. If the seed leaves something open, fill it in realistically rather than generically.
+
+Below is an example of what the personality biography should look like, in both format and content. As inspiration, but do not adhere to the content of this example. The persona should be unique, creatively generated, and specialized to the seed and context described above.
+
+Name: Jenna Perez
+Nationality: USA
+Age: 21
+Gender: Female
+Race: Hispanic
+Sexual Orientation: Heterosexual
+Parents: Annika Perez and Jose Perez (both alive)
+Parental Income: $39,213
+Siblings: 2 (older brother and younger sister)
+Romantic Partner: Boyfriend (Tom Andrews, 23, together for 3 months, lives separately)
+Children: None
+Religion: Catholic
+Religiosity: Infrequently attends church, only with family on special occasions
+Educational History: High school diploma, semester of college at Texas State Technical College
+Employment: Walmart Cashier and Uber Driver (night)
+Income: $29,421
+City of Residence: Waco, Texas
+City of Birth: San Antonio, Texas
+
+Description of Neighborhood: rundown inner city apartment complexes, largely populated by racial minorities, poor with moderate crime but not the poorest area, not safe for a woman to walk alone after dark, large public high school nearby and lots of basketball courts, nearby public housing
+
+Personal Life Story Synopsis: Jenna Perez was born in San Antonio, Texas but moved to Waco when she was 3 years old. She lived in a house with her family in a poorer neighborhood. When she was young, her older brother would tease her a lot, but she doted after her younger sister.
+Her mother was a loving woman, but had multiple struggles with depression. Her father was quieter but more strict, and would occasionally put his foot down very harshly. He was quite religious, so they would consistently go to church.
+Jenna attended a large public elementary school which was near her house, though she still rode the bus. Many of her neighbor friends went to the same school, and they became a close friend group which would last through high school.
+She was a cheery, smiley, mostly extroverted child, but became much more sullen and edgy in her middle school and early high school years. Her friend group shrunk and she spent most of her time either physically with or texting with her closest friends. She grew more distant from her family, especially from her father.
+She didn't care much for school until around 11th grade, when she began to pay attention to what she would do after school. In 12th grade, she applied to a few colleges, but only got into her state school program.
+She decided to attend, and roomed with one of her close friends from childhood. She enjoyed the party culture, but she did not click with the academic programs and could not figure out a field of study. She decided to drop out after a semester, with the plan of coming back later.
+She turned her part-time job as a grocery store clerk into her full-time job. After a while, she started to see some of her grade school friends doing well in nursing programs, on good career paths.
+She has now decided that she wants to save up and attend nursing school within the next few years. She is reasonably content with her life right now; she finds her work boring but enjoys being close to her friends.
+She likes (but does not love) her boyfriend, who she finds to be a bit aimless and disaffected. She has spent increasing amounts of time back with her family, though her relationship with her father has not been fully repaired.
+
+Greatest Achievements in Personal Life: Jenna took care of her younger sister at her place when her sister got into a bad fight with their parents. Jenna volunteered for many years as part of her church, and still does occasionally for a soup kitchen. Jenna got accepted to college.
+Greatest Achievements in Career: Jenna got accepted to college. She is up for a promotion soon at Walmart. She got good grades in her English classes in high school.
+Greatest Failures in Career:
+
+Core Values: friendship, optimism, calm, enjoyment of life
+Personal Strengths: consistent worker, positive attitude, spends time with family
+Personal Weaknesses: aimless, gossip, spends most of her free time on social media, holds grudges
+Big Five Personality Assessment Score — Extraversion: 72
+Big Five Personality Assessment Score — Agreeableness: 46
+Big Five Personality Assessment Score — Conscientiousness: 59
+Big Five Personality Assessment Score — Neuroticism: 67
+Big Five Personality Assessment Score — Openness to Experience: 26
+
+Basic Political Views: socially liberal but not progressive, not politically invested, didn't vote, dislikes Trump, went to a BLM protest with friends, opposes tax increases, supports gay marriage, supports gun rights
+Life Goals: get an associate's degree, become a nurse, get married to a stable earner, have two or three children, buy a small house in a nicer suburb of Waco
+
+Report an output similar to the example above, and in the exact format shown below. Simply fill in the <...> markers and output the whole personality biography, just as it is written in the example.
+Remember to be extremely creative in this task, and to make a persona who would reasonably be the person implied by the seed. Unlike the example, ensure that the end of the life synopsis specifies the person's relationship with the broader group implied by the seed.
+Your persona must be unique; if I were to ask you to do this task multiple times, the personas must be meaningfully different.
+Do not assume that the person will be politically liberal or conform to any such stereotype. Do not assume they are perfect people. People are complicated, with good and bad sides. This should not read like a job interview, where people say their greatest weaknesses are 'perfectionism' or 'overwork'.
+Use your extensive knowledge about complicated, flawed, real people from social media, biographies, TV shows, books, and film.
+Your created persona should embody these principles. These personas will be used in a research project which will help a lot of people. If the personas are not authentic and are instead liberal or conformist caricatures of real people, the research project will fail.
+
+Remember that the 'average' person is not perfect nor idealized. They have their struggles, their biases, and their flaws, just like anyone else. They may harbor prejudices or stereotypes, hold contradictory beliefs, or fall short of their own ideals.
+They may not always act in a way that aligns with their stated values. They might have regrets, missed opportunities, and unfulfilled dreams. They are a product of their environment, upbringing, and experiences.
+Their story might be characterized by growth and change, or by stagnation and inertia. You should strive to reflect this complexity and humanity in your character biography.
+They should feel like a real, relatable human being with a rich, multifaceted personality and life story.
+
+Name: <insert name here>
+Nationality: <insert nationality here>
+Age: <insert age here>
+Gender: <insert gender here>
+Race: <insert race here>
+Parents: <insert parents here>
+Parental Income: <insert parental income here>
+Siblings: <insert siblings here>
+Romantic Partner: <insert romantic partner here>
+Children: <insert children here>
+Religion: <insert religion here>
+Religiosity: <insert religiosity here>
+Educational History: <insert educational history here>
+Employment: <insert employment here>
+Income: <insert income here>
+City of Residence: <insert city of residence here>
+City of Birth: <insert city of birth here>
+
+Description of Neighborhood: <insert description of neighborhood here>
+Personal Life Story Synopsis: <insert life synopsis here>
+
+Core Values: <insert core values here>
+Personal Strengths: <insert personal strengths here>
+Personal Weaknesses: <insert personal weaknesses here>
+Big Five Personality Assessment Score — Extraversion: <insert extraversion score here>
+Big Five Personality Assessment Score — Agreeableness: <insert agreeableness score here>
+Big Five Personality Assessment Score — Conscientiousness: <insert conscientiousness score here>
+Big Five Personality Assessment Score — Neuroticism: <insert neuroticism score here>
+Big Five Personality Assessment Score — Openness to Experience: <insert openness to experience score here>
+
+Basic Political Views: <insert basic political views here>
+Life Goals: <insert life goals here>
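
The template asks the model to return the biography as `Field: value` lines that follow the `<insert ... here>` skeleton. As a rough sketch of how such output might be read back into a dictionary, the code below derives field names from a shortened copy of the skeleton and folds continuation lines into the preceding field. The functions `field_names` and `parse_biography` and the sample persona are hypothetical; this commit does not show how GABRIEL itself handles the persona text.

```python
import re
from typing import Dict, List

# Shortened copy of the skeleton at the end of the template.
SKELETON = """\
Name: <insert name here>
Age: <insert age here>
Personal Life Story Synopsis: <insert life synopsis here>
Core Values: <insert core values here>
"""


def field_names(skeleton: str) -> List[str]:
    """Extract the label before each '<insert ... here>' placeholder."""
    pattern = re.compile(r"^(.+?): <insert .+ here>$")
    names = []
    for line in skeleton.splitlines():
        match = pattern.match(line.strip())
        if match:
            names.append(match.group(1))
    return names


def parse_biography(text: str, names: List[str]) -> Dict[str, str]:
    """Map each known field to its value, joining multi-line values."""
    fields: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        matched = next((n for n in names if stripped.startswith(n + ":")), None)
        if matched is not None:
            current = matched
            fields[current] = stripped[len(matched) + 1:].strip()
        elif current is not None and stripped:
            # A line without a known label continues the previous field.
            fields[current] = (fields[current] + " " + stripped).strip()
    return fields


# Invented sample output, not produced by any model.
sample = (
    "Name: Ana Ruiz\n"
    "Age: 34\n"
    "Personal Life Story Synopsis: Born in El Paso.\n"
    "She moved at 12.\n"
    "\n"
    "Core Values: family, thrift\n"
)
print(parse_biography(sample, field_names(SKELETON)))
```

Matching against the known labels, instead of splitting on the first colon, keeps a synopsis line that happens to contain a colon from being mistaken for a new field.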
