Scenarios

A Scenario is a dictionary containing a key/value pair that is used to add data or content to questions in a survey, replacing a parameter in a question with a specific value. A ScenarioList is a list of Scenario objects.

Purpose

Scenarios allow you to create variations and versions of questions efficiently. For example, we could create a question “How much do you enjoy {{ activity }}?” and use scenarios to replace the parameter activity with running or reading or other activities. When we add the scenarios to the question, the question will be asked multiple times, once for each scenario, with the parameter replaced by the value in the scenario. This allows us to administer multiple versions of the question together in a survey, either asynchronously (by default) or according to survey rules that we can specify (e.g., skip/stop logic), without having to create each question manually.
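
The substitution described above can be pictured with a small plain-Python sketch. It does not use EDSL; the render() helper and the regular expression are assumptions made for illustration only, not the library's implementation.

```python
import re

# Matches placeholders such as "{{ activity }}" (whitespace inside the braces is optional).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def render(template, scenario):
    """Replace each {{ key }} with scenario[key]; leave unknown keys untouched."""
    def substitute(match):
        key = match.group(1)
        return str(scenario[key]) if key in scenario else match.group(0)
    return PLACEHOLDER.sub(substitute, template)

template = "How much do you enjoy {{ activity }}?"
scenarios = [{"activity": "running"}, {"activity": "reading"}]

# One rendered question text per scenario.
rendered = [render(template, s) for s in scenarios]
for text in rendered:
    print(text)
```

Each scenario yields its own version of the question text, which is the sense in which the question is asked once per scenario.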

Metadata

Scenarios are also a convenient way to keep track of metadata or other information relating to our survey questions that is important to our analysis of the results. For example, say we are using scenarios to parameterize questions with pieces of {{ content }} from a dataset. In our scenarios for the content parameter, we could also include metadata about the source of the content, such as the {{ author }}, the {{ publication_date }}, or the {{ source }}. This will create columns for the additional data in the survey results without passing them to the question texts if there is no corresponding parameter in the question texts. This allows us to analyze the responses in the context of the metadata without needing to match up the data with the metadata post-survey.
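
As a rough illustration of the distinction between parameters and metadata, the plain-Python sketch below separates the scenario keys that appear in a question text from those that are only carried along. The split_keys() helper and the sample values are hypothetical and are not part of EDSL.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def split_keys(template, scenario):
    """Return (keys used in the template, keys carried along as metadata)."""
    used = set(PLACEHOLDER.findall(template))
    prompt_keys = [k for k in scenario if k in used]
    metadata_keys = [k for k in scenario if k not in used]
    return prompt_keys, metadata_keys

template = "Summarize this text: {{ content }}"
scenario = {
    "content": "An example passage.",   # made-up illustrative data
    "author": "A. Writer",
    "publication_date": "2024-01-01",
    "source": "Example Journal",
}

prompt_keys, metadata_keys = split_keys(template, scenario)
print(prompt_keys)
print(metadata_keys)
```

Only content reaches the question text here; the other three keys would simply travel with the scenario and show up as extra columns in the results.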

Constructing a Scenario

To use a scenario, we start by creating a question that takes a parameter in double braces:

from edsl import QuestionMultipleChoice

q = QuestionMultipleChoice(
    question_name = "enjoy",
    question_text = "How much do you enjoy {{ activity }}?",
    question_options = ["Not at all", "Somewhat", "Very much"]
)

Next we create a dictionary for a value that will replace the parameter and store it in a Scenario object:

from edsl import Scenario

scenario = Scenario({"activity": "running"})

We can inspect the scenario and see that it consists of the key/value pair that we created:

scenario

This will return:

key | value
activity | running

ScenarioList

If multiple values will be used, we can create a list of Scenario objects:

from edsl import Scenario

scenarios = [Scenario({"activity": a}) for a in ["running", "reading"]]

We can inspect the scenarios:

scenarios

This will return:

[Scenario({'activity': 'running'}), Scenario({'activity': 'reading'})]

We can also create a ScenarioList object to store multiple scenarios:

from edsl import ScenarioList

scenariolist = ScenarioList([Scenario({"activity": a}) for a in ["running", "reading"]])

We can inspect it:

scenariolist

This will return:

activity

running

reading

A list of scenarios is used in the same way as a ScenarioList. The difference is that a ScenarioList is a class that can be used to create a list of scenarios from a variety of data sources, such as a list, dictionary, or a Wikipedia table (see examples below).

Using f-strings with scenarios

It is possible to use scenarios and f-strings together in a question. An f-string must be evaluated when a question is constructed, whereas a scenario is evaluated when a question is run.
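
One practical consequence of this timing is brace escaping. The plain-Python sketch below (no EDSL calls) shows that inside an f-string a doubled brace is an escape for a single literal brace, so four braces are needed if a double-brace placeholder should survive f-string evaluation.

```python
sentiment = "enjoy"

# Inside an f-string, "{{" and "}}" are escapes for single literal braces,
# so four braces are needed to leave a double-brace placeholder in the result.
two_braces = f"How much do you {sentiment} {{ activity }}?"
four_braces = f"How much do you {sentiment} {{{{ activity }}}}?"

print(two_braces)   # How much do you enjoy { activity }?
print(four_braces)  # How much do you enjoy {{ activity }}?
```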

For example, here we use an f-string to create different versions of a question that also takes a parameter {{ activity }}, together with a list of scenarios to replace the parameter when the questions are run. We optionally include the f-string in the question name as well as the question text in order to simultaneously create unique identifiers for the questions, which are needed in order to pass the questions that are created to a Survey. Then we use the show_prompts() method to examine the user prompts that are created when the scenarios are added to the questions:

from edsl import QuestionFreeText, ScenarioList, Scenario, Survey

questions = []
sentiments = ["enjoy", "hate", "love"]

for sentiment in sentiments:
    q = QuestionFreeText(
        question_name = f"{ sentiment }_activity",
        question_text = f"How much do you { sentiment } {{{{ activity }}}}?" # four braces in an f-string leave {{ activity }} in the text
    )
    questions.append(q)

scenarios = ScenarioList(
    Scenario({"activity": activity}) for activity in ["running", "reading"]
)

survey = Survey(questions = questions)
survey.by(scenarios).show_prompts()

This will print the questions created with the f-string with the scenarios added (note that the system prompts are blank because we have not created any agents):

user_prompt | system_prompt
How much do you enjoy running? |
How much do you hate running? |
How much do you love running? |
How much do you enjoy reading? |
How much do you hate reading? |
How much do you love reading? |

To learn more about prompts, please see the Prompts section.

Using a Scenario

We use a scenario (or scenariolist) by adding it to a question (or a survey of questions), either when constructing the question or else when running it.

We use the by() method to add a scenario to a question when running it:

from edsl import QuestionMultipleChoice, Scenario, Agent

q = QuestionMultipleChoice(
    question_name = "enjoy",
    question_text = "How much do you enjoy {{ activity }}?",
    question_options = ["Not at all", "Somewhat", "Very much"]
)

s = Scenario({"activity": "running"})

a = Agent(traits = {"persona":"You are a human."})

results = q.by(s).by(a).run()

We can check the results to verify that the scenario has been used correctly:

results.select("activity", "enjoy")

This will print a table of the selected components of the results:

scenario.activity | answer.enjoy
running | Somewhat

Looping

We use the loop() method to add a scenario to a question when constructing it, passing it a ScenarioList. This creates a list containing a new question for each scenario that was passed. Note that we can optionally include the scenario key in the question name as well; otherwise a unique identifier is automatically added to each question name.

For example:

from edsl import QuestionMultipleChoice, ScenarioList, Scenario

q = QuestionMultipleChoice(
    question_name = "enjoy_{{ activity }}",
    question_text = "How much do you enjoy {{ activity }}?",
    question_options = ["Not at all", "Somewhat", "Very much"]
)

sl = ScenarioList(
    Scenario({"activity": a}) for a in ["running", "reading"]
)

questions = q.loop(sl)

We can inspect the questions to see that they have been created correctly:

questions

This will return:

[Question('multiple_choice', question_name = """enjoy_running""", question_text = """How much do you enjoy running?""", question_options = ['Not at all', 'Somewhat', 'Very much']),
Question('multiple_choice', question_name = """enjoy_reading""", question_text = """How much do you enjoy reading?""", question_options = ['Not at all', 'Somewhat', 'Very much'])]

We can pass the questions to a survey and run it:

from edsl import Survey, Agent

survey = Survey(questions = questions)

a = Agent(traits = {"persona": "You are a human."})

results = survey.by(a).run()

results.select("answer.*")

This will print a table of the response for each question (note that “activity” is no longer in a separate scenario field):

answer.enjoy_reading | answer.enjoy_running
Very much | Somewhat

Multiple parameters

We can also create a Scenario for multiple parameters:

from edsl import QuestionFreeText, Scenario

q = QuestionFreeText(
    question_name = "counting",
    question_text = "How many {{ unit }} are in a {{ distance }}?",
)

scenario = Scenario({"unit": "inches", "distance": "mile"})

results = q.by(scenario).run()

results.select("unit", "distance", "counting")

This will print a table of the selected components of the results:

scenario.unit | scenario.distance | answer.counting
inches | mile | There are 63,360 inches in a mile.

To learn more about constructing surveys, please see the Surveys module.

Scenarios for question options

In the above examples we created scenarios in the question_text. We can also create a Scenario for question_options, e.g., in a multiple choice, checkbox, linear scale or other question type that requires them:

from edsl import QuestionMultipleChoice, Scenario

q = QuestionMultipleChoice(
    question_name = "capital_of_france",
    question_text = "What is the capital of France?",
    question_options = "{{ question_options }}"
)

s = Scenario({'question_options': ['Paris', 'London', 'Berlin', 'Madrid']})

results = q.by(s).run()

results.select("answer.*")

Output:

answer.capital_of_france

Paris

Combining Scenarios

We can combine multiple scenarios into a single Scenario object:

from edsl import Scenario

scenario1 = Scenario({"food": "apple"})
scenario2 = Scenario({"drink": "water"})

combined_scenario = scenario1 + scenario2

combined_scenario

This will return:

key | value
food | apple
drink | water

We can also combine ScenarioList objects:

from edsl import ScenarioList

scenariolist1 = ScenarioList([Scenario({"food": "apple"}), Scenario({"drink": "water"})])
scenariolist2 = ScenarioList([Scenario({"color": "red"}), Scenario({"shape": "circle"})])

combined_scenariolist = scenariolist1 + scenariolist2

combined_scenariolist

This will return:

food

drink

color

shape

apple

water

red

circle

We can create a cross product of ScenarioList objects (combine the scenarios in each list with each other):

from edsl import ScenarioList

scenariolist1 = ScenarioList([Scenario({"food": "apple"}), Scenario({"drink": "water"})])
scenariolist2 = ScenarioList([Scenario({"color": "red"}), Scenario({"shape": "circle"})])

cross_product_scenariolist = scenariolist1 * scenariolist2

cross_product_scenariolist

This will return:

food

drink

color

shape

apple

red

apple

circle

red

water

water

circle
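
The three ways of combining shown in this section can be sketched with plain dictionaries and lists. This is an illustration under the assumption that scenarios behave like dictionaries; it does not use EDSL, and how the library resolves duplicate keys is not covered here (in the sketch, the later value would win).

```python
from itertools import product

# Adding two scenarios: merge their key/value pairs into one dict.
scenario1 = {"food": "apple"}
scenario2 = {"drink": "water"}
combined = {**scenario1, **scenario2}

# Adding two scenario lists: concatenate them.
list1 = [{"food": "apple"}, {"drink": "water"}]
list2 = [{"color": "red"}, {"shape": "circle"}]
concatenated = list1 + list2

# Multiplying two scenario lists: merge every pairing of one scenario from each list.
cross = [{**a, **b} for a, b in product(list1, list2)]

print(combined)
print(len(concatenated), len(cross))
for s in cross:
    print(s)
```

Concatenation keeps every scenario as it was, while the cross product produces one merged scenario per pairing, so two lists of two scenarios each give four combinations.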

Creating scenarios from a dataset

There are a variety of methods for creating and working with scenarios generated from datasets and different data types.

Turning results into scenarios

The method to_scenario_list() can be used to turn the results of a survey into a list of scenarios.

Example usage:

Say we have some results from a survey where we asked agents to choose a random number between 1 and 1000:

from edsl import QuestionNumerical, Agent

q_random = QuestionNumerical(
    question_name = "random",
    question_text = "Choose a random number between 1 and 1000."
)

agents = [Agent({"persona":p}) for p in ["Child", "Magician", "Olympic breakdancer"]]

results = q_random.by(agents).run()
results.select("persona", "random")

Our results are:

agent.persona | answer.random
Child | 7
Magician | 472
Olympic breakdancer | 529

We can use the to_scenario_list() method to turn components of the results into a list of scenarios to use in a new survey:

scenarios = results.select("persona", "random").to_scenario_list() # excluding other columns of the results

scenarios

We can inspect the scenarios to see that they have been created correctly:

persona | random
Child | 7
Magician | 472
Olympic breakdancer | 529
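
Conceptually, this conversion turns selected result columns into one dictionary per row. The plain-Python sketch below illustrates the idea with the values from the table above; columns_to_rows() is a hypothetical helper, not an EDSL function.

```python
# Selected result columns, as in the table above (column name -> values per row).
selected = {
    "persona": ["Child", "Magician", "Olympic breakdancer"],
    "random": [7, 472, 529],
}

def columns_to_rows(columns):
    """Turn {column: [values]} into one dict per row."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]

rows = columns_to_rows(selected)
for row in rows:
    print(row)
```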

PDFs as textual scenarios

The ScenarioList method from_pdf('path/to/pdf') is a convenient way to extract information from large files. It allows you to read in a PDF and automatically create a list of textual scenarios for the pages of the file. Each scenario has the keys filename, page, and text, which can be used as parameters in a question (or stored as metadata) and renamed as desired.

How it works: Add a placeholder {{ text }} to a question text to use the text of a PDF page as a parameter in the question. When you run the survey with the PDF scenarios, the text of each page will be inserted into the question text in place of the placeholder.

Example usage:

from edsl import QuestionFreeText, ScenarioList, Survey

# Create a survey of questions parameterized by the {{ text }} of the PDF pages:
q1 = QuestionFreeText(
    question_name = "themes",
    question_text = "Identify the key themes mentioned on this page: {{ text }}",
)

q2 = QuestionFreeText(
    question_name = "idea",
    question_text = "Identify the most important idea on this page: {{ text }}",
)

survey = Survey([q1, q2])

scenarios = ScenarioList.from_pdf("path/to/pdf_file.pdf")

# Run the survey with the pages of the PDF as scenarios:
results = survey.by(scenarios).run()

# To print the page and text of each PDF page scenario together with the answers to the question:
results.select("page", "text", "answer.*")

See a demo notebook of this method in the notebooks section of the docs index: “Extracting information from PDFs”.

Image scenarios

The Scenario method from_image('<filepath>.png') converts a PNG into a scenario that can be used with an image model (e.g., gpt-4o). This method generates a scenario with a single key, <filepath>, that can be used in a question text the same as scenarios from other data sources.

Example usage:

from edsl import Scenario

s = Scenario.from_image("logo.png") # Replace with your own local file

Here we use the example scenario, which is the Expected Parrot logo:

from edsl import Scenario

s = Scenario.example(has_image = True)

We can verify the scenario key (the filepath for the image from which the scenario was generated):

s.keys()

Output:

['logo']

We can add the key to questions as we do scenarios from other data sources:

from edsl import Model, QuestionFreeText, QuestionList, Survey

m = Model("gpt-4o")

q1 = QuestionFreeText(
    question_name = "identify",
    question_text = "What animal is in this picture: {{ logo }}" # The scenario key is the filepath
)

q2 = QuestionList(
    question_name = "colors",
    question_text = "What colors do you see in this picture: {{ logo }}"
)

survey = Survey([q1, q2])

results = survey.by(s).by(m).run() # use the image model created above

results.select("logo", "identify", "colors")

Output using the Expected Parrot logo:

answer.identify | answer.colors
The image shows a large letter “E” followed by a pair of square brackets containing an illustration of a parrot. The parrot is green with a yellow beak and some red and blue coloring on its body. This combination suggests the mathematical notation for the expected value, often denoted as “E” followed by a random variable in brackets, commonly used in probability and statistics. | ['gray', 'green', 'orange', 'pink', 'blue', 'black']

See an example of this method in the notebooks section of the docs index: Using images in a survey.

Creating a scenario list from a list

The ScenarioList method from_list() creates a list of scenarios for a specified key and list of values that is passed to it.

Example usage:

from edsl import ScenarioList

scenariolist = ScenarioList.from_list("item", ["color", "food", "animal"])

scenariolist

This will return:

item

color

food

animal

Creating a scenario list from a dictionary

The Scenario method from_dict() creates a scenario for a dictionary that is passed to it.

The ScenarioList method from_nested_dict() creates a list of scenarios for a specified key and nested dictionary.

Example usage:

# Example dictionary
d = {"item": ["color", "food", "animal"]}


from edsl import Scenario

scenario = Scenario.from_dict(d)

scenario

This will return a single scenario for the list of items in the dict:

key | value
item:0 | color
item:1 | food
item:2 | animal

If we instead want to create a scenario for each item in the list individually:

from edsl import ScenarioList

scenariolist = ScenarioList.from_nested_dict(d)

scenariolist

This will return:

item

color

food

animal
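
The difference between one scenario holding a whole list and one scenario per item can be sketched with plain dictionaries. The two helper functions below are hypothetical stand-ins written for this illustration; they are not the EDSL methods, and the behavior for dictionaries with several keys is an assumption.

```python
def scenarios_from_list(key, values):
    """One single-key dict per value."""
    return [{key: value} for value in values]

def scenarios_from_dict_of_lists(d):
    """One single-key dict per value, for every key in the dictionary."""
    return [{key: value} for key, values in d.items() for value in values]

d = {"item": ["color", "food", "animal"]}

single = dict(d)                                   # one scenario holding the whole list
per_item = scenarios_from_dict_of_lists(d)         # one scenario per list element
from_list = scenarios_from_list("item", d["item"])

print(single)
print(per_item)
```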

Creating a scenario list from a Wikipedia table

The ScenarioList method from_wikipedia() can be used to create a list of scenarios from a table on a Wikipedia page. It takes the URL of the page and the index of the table to read.

Example usage:

from edsl import ScenarioList

scenarios = ScenarioList.from_wikipedia("https://en.wikipedia.org/wiki/1990s_in_film", 3)

scenarios

This will return a list of scenarios for the selected table on the Wikipedia page:

Rank | Title | Studios | Worldwide gross | Year
1 | Titanic | Paramount Pictures/20th Century Fox | $1,843,201,268 | 1997
2 | Star Wars: Episode I - The Phantom Menace | 20th Century Fox | $924,317,558 | 1999
3 | Jurassic Park | Universal Pictures | $914,691,118 | 1993
4 | Independence Day | 20th Century Fox | $817,400,891 | 1996
5 | The Lion King | Walt Disney Studios | $763,455,561 | 1994
6 | Forrest Gump | Paramount Pictures | $677,387,716 | 1994
7 | The Sixth Sense | Walt Disney Studios | $672,806,292 | 1999
8 | The Lost World: Jurassic Park | Universal Pictures | $618,638,999 | 1997
9 | Men in Black | Sony Pictures/Columbia Pictures | $589,390,539 | 1997
10 | Armageddon | Walt Disney Studios | $553,709,788 | 1998
11 | Terminator 2: Judgment Day | TriStar Pictures | $519,843,345 | 1991
12 | Ghost | Paramount Pictures | $505,702,588 | 1990
13 | Aladdin | Walt Disney Studios | $504,050,219 | 1992
14 | Twister | Warner Bros./Universal Pictures | $494,471,524 | 1996
15 | Toy Story 2 | Walt Disney Studios | $485,015,179 | 1999
16 | Saving Private Ryan | DreamWorks Pictures/Paramount Pictures | $481,840,909 | 1998
17 | Home Alone | 20th Century Fox | $476,684,675 | 1990
18 | The Matrix | Warner Bros. | $463,517,383 | 1999
19 | Pretty Woman | Walt Disney Studios | $463,406,268 | 1990
20 | Mission: Impossible | Paramount Pictures | $457,696,359 | 1996
21 | Tarzan | Walt Disney Studios | $448,191,819 | 1999
22 | Mrs. Doubtfire | 20th Century Fox | $441,286,195 | 1993
23 | Dances with Wolves | Orion Pictures | $424,208,848 | 1990
24 | The Mummy | Universal Pictures | $415,933,406 | 1999
25 | The Bodyguard | Warner Bros. | $411,006,740 | 1992
26 | Robin Hood: Prince of Thieves | Warner Bros. | $390,493,908 | 1991
27 | Godzilla | TriStar Pictures | $379,014,294 | 1998
28 | True Lies | 20th Century Fox | $378,882,411 | 1994
29 | Toy Story | Walt Disney Studios | $373,554,033 | 1995
30 | There’s Something About Mary | 20th Century Fox | $369,884,651 | 1998
31 | The Fugitive | Warner Bros. | $368,875,760 | 1993
32 | Die Hard with a Vengeance | 20th Century Fox/Cinergi Pictures | $366,101,666 | 1995
33 | Notting Hill | PolyGram Filmed Entertainment | $363,889,678 | 1999
34 | A Bug’s Life | Walt Disney Studios | $363,398,565 | 1998
35 | The World Is Not Enough | Metro-Goldwyn-Mayer Pictures | $361,832,400 | 1999
36 | Home Alone 2: Lost in New York | 20th Century Fox | $358,994,850 | 1992
37 | American Beauty | DreamWorks Pictures | $356,296,601 | 1999
38 | Apollo 13 | Universal Pictures/Imagine Entertainment | $355,237,933 | 1995
39 | Basic Instinct | TriStar Pictures | $352,927,224 | 1992
40 | GoldenEye | MGM/United Artists | $352,194,034 | 1995
41 | The Mask | New Line Cinema | $351,583,407 | 1994
42 | Speed | 20th Century Fox | $350,448,145 | 1994
43 | Deep Impact | Paramount Pictures/DreamWorks Pictures | $349,464,664 | 1998
44 | Beauty and the Beast | Walt Disney Studios | $346,317,207 | 1991
45 | Pocahontas | Walt Disney Studios | $346,079,773 | 1995
46 | The Flintstones | Universal Pictures | $341,631,208 | 1994
47 | Batman Forever | Warner Bros. | $336,529,144 | 1995
48 | The Rock | Walt Disney Studios | $335,062,621 | 1996
49 | Tomorrow Never Dies | MGM/United Artists | $333,011,068 | 1997
50 | Seven | New Line Cinema | $327,311,859 | 1995

The parameters let us know the keys that can be used in the question text or stored as metadata. (They can be edited as needed - e.g., using the rename method discussed above.)

scenarios.parameters

This will return:

{'Rank', 'Ref.', 'Studios', 'Title', 'Worldwide gross', 'Year'}

The scenarios can be used to ask questions about the data in the table:

from edsl import QuestionList

q_leads = QuestionList(
    question_name = "leads",
    question_text = "Who are the lead actors or actresses in {{ Title }}?"
)

results = q_leads.by(scenarios).run()

(
    results
    .sort_by("Title")
    .select("Title", "leads")
)

Output:

Title | Leads
A Bug’s Life | Dave Foley, Kevin Spacey, Julia Louis-Dreyfus, Hayden Panettiere, Phyllis Diller, Richard Kind, David Hyde Pierce
Aladdin | Mena Massoud, Naomi Scott, Will Smith
American Beauty | Kevin Spacey, Annette Bening, Thora Birch, Mena Suvari, Wes Bentley, Chris Cooper
Apollo 13 | Tom Hanks, Kevin Bacon, Bill Paxton
Armageddon | Bruce Willis, Billy Bob Thornton, Liv Tyler, Ben Affleck
Basic Instinct | Michael Douglas, Sharon Stone
Batman Forever | Val Kilmer, Tommy Lee Jones, Jim Carrey, Nicole Kidman, Chris O’Donnell
Beauty and the Beast | Emma Watson, Dan Stevens, Luke Evans, Kevin Kline, Josh Gad
Dances with Wolves | Kevin Costner, Mary McDonnell, Graham Greene, Rodney A. Grant
Deep Impact | Téa Leoni, Morgan Freeman, Elijah Wood, Robert Duvall
Die Hard with a Vengeance | Bruce Willis, Samuel L. Jackson, Jeremy Irons
Forrest Gump | Tom Hanks, Robin Wright, Gary Sinise, Mykelti Williamson, Sally Field
Ghost | Patrick Swayze, Demi Moore, Whoopi Goldberg
Godzilla | Matthew Broderick, Jean Reno, Bryan Cranston, Aaron Taylor-Johnson, Elizabeth Olsen, Kyle Chandler, Vera Farmiga, Millie Bobby Brown
GoldenEye | Pierce Brosnan, Sean Bean, Izabella Scorupco, Famke Janssen
Home Alone | Macaulay Culkin, Joe Pesci, Daniel Stern, Catherine O’Hara, John Heard
Home Alone 2: Lost in New York | Macaulay Culkin, Joe Pesci, Daniel Stern, Catherine O’Hara, John Heard
Independence Day | Will Smith, Bill Pullman, Jeff Goldblum
Jurassic Park | Sam Neill, Laura Dern, Jeff Goldblum, Richard Attenborough
Men in Black | Tommy Lee Jones, Will Smith
Mission: Impossible | Tom Cruise, Ving Rhames, Simon Pegg, Rebecca Ferguson, Jeremy Renner
Mrs. Doubtfire | Robin Williams, Sally Field, Pierce Brosnan, Lisa Jakub, Matthew Lawrence, Mara Wilson
Notting Hill | Julia Roberts, Hugh Grant
Pocahontas | Irene Bedard, Mel Gibson, Judy Kuhn, David Ogden Stiers, Russell Means, Christian Bale
Pretty Woman | Richard Gere, Julia Roberts
Robin Hood: Prince of Thieves | Kevin Costner, Morgan Freeman, Mary Elizabeth Mastrantonio, Christian Slater, Alan Rickman
Saving Private Ryan | Tom Hanks, Matt Damon, Tom Sizemore, Edward Burns, Barry Pepper, Adam Goldberg, Vin Diesel, Giovanni Ribisi, Jeremy Davies
Seven | Brad Pitt, Morgan Freeman, Gwyneth Paltrow
Speed | Keanu Reeves, Sandra Bullock, Dennis Hopper
Star Wars: Episode I - The Phantom Menace | Liam Neeson, Ewan McGregor, Natalie Portman, Jake Lloyd
Tarzan | Johnny Weissmuller, Maureen O’Sullivan
Terminator 2: Judgment Day | Arnold Schwarzenegger, Linda Hamilton, Edward Furlong, Robert Patrick
The Bodyguard | Kevin Costner, Whitney Houston
The Flintstones | John Goodman, Elizabeth Perkins, Rick Moranis, Rosie O’Donnell
The Fugitive | Harrison Ford, Tommy Lee Jones
The Lion King | Matthew Broderick, James Earl Jones, Jeremy Irons, Moira Kelly, Nathan Lane, Ernie Sabella, Rowan Atkinson, Whoopi Goldberg
The Lost World: Jurassic Park | Jeff Goldblum, Julianne Moore, Pete Postlethwaite
The Mask | Jim Carrey, Cameron Diaz
The Matrix | Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss
The Mummy | Brendan Fraser, Rachel Weisz, John Hannah, Arnold Vosloo
The Rock | Sean Connery, Nicolas Cage, Ed Harris
The Sixth Sense | Bruce Willis, Haley Joel Osment, Toni Collette, Olivia Williams
The World Is Not Enough | Pierce Brosnan, Sophie Marceau, Denise Richards, Robert Carlyle
There’s Something About Mary | Cameron Diaz, Ben Stiller, Matt Dillon
Titanic | Leonardo DiCaprio, Kate Winslet
Tomorrow Never Dies | Pierce Brosnan, Michelle Yeoh, Jonathan Pryce, Teri Hatcher
Toy Story | Tom Hanks, Tim Allen
Toy Story 2 | Tom Hanks, Tim Allen, Joan Cusack
True Lies | Arnold Schwarzenegger, Jamie Lee Curtis
Twister | Helen Hunt, Bill Paxton

Creating a scenario list from a CSV

The ScenarioList method from_csv('<filepath>.csv') creates a list of scenarios from a CSV file. The method reads the CSV file and creates a scenario for each row in the file, with the column names as the keys and the row values as the values.

For example, say we have a CSV file containing the following data:

message,user,source,date
I can't log in...,Alice,Customer support,2022-01-01
I need help with my bill...,Bob,Phone,2022-01-02
I have a safety concern...,Charlie,Email,2022-01-03
I need help with a product...,David,Chat,2022-01-04

We can create a list of scenarios from the CSV file:

from edsl import ScenarioList

scenariolist = ScenarioList.from_csv("<filepath>.csv")

scenariolist

This will return a scenario for each row:

message | user | source | date
I can't log in... | Alice | Customer support | 2022-01-01
I need help with my bill... | Bob | Phone | 2022-01-02
I have a safety concern... | Charlie | Email | 2022-01-03
I need help with a product... | David | Chat | 2022-01-04
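
The row-to-scenario mapping can be reproduced with the standard library alone. The sketch below is not how EDSL reads the file; it only illustrates the idea of header names becoming keys, using csv.DictReader on the example data inlined as a string.

```python
import csv
import io

# The example CSV from above, inlined so the sketch runs without a file.
csv_text = """message,user,source,date
I can't log in...,Alice,Customer support,2022-01-01
I need help with my bill...,Bob,Phone,2022-01-02
I have a safety concern...,Charlie,Email,2022-01-03
I need help with a product...,David,Chat,2022-01-04
"""

# DictReader uses the header row as keys and yields one dict per data row.
rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]

print(list(rows[0]))
print(rows[0]["message"], "|", rows[0]["user"])
```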

If the scenario keys are not valid Python identifiers, we can use the give_valid_names() method to convert them to valid identifiers.

For example, our CSV file might contain a header row that is question texts:

"What is the message?","Who is the user?","What is the source?","What is the date?"
"I can't log in...","Alice","Customer support","2022-01-01"
"I need help with my bill...","Bob","Phone","2022-01-02"
"I have a safety concern...","Charlie","Email","2022-01-03"
"I need help with a product...","David","Chat","2022-01-04"

We can create a list of scenarios from the CSV file:

from edsl import ScenarioList

scenariolist = ScenarioList.from_csv("<filepath>.csv")

scenariolist

This will return scenarios with non-Pythonic identifiers:

What is the message? | Who is the user? | What is the source? | What is the date?
I can't log in... | Alice | Customer support | 2022-01-01
I need help with my bill... | Bob | Phone | 2022-01-02
I have a safety concern... | Charlie | Email | 2022-01-03
I need help with a product... | David | Chat | 2022-01-04

We can then use the give_valid_names() method to convert the keys to valid identifiers:

scenariolist = scenariolist.give_valid_names()

scenariolist

This will return scenarios with valid identifiers (removing stop words and using underscores):

message | user | source | date
I can't log in... | Alice | Customer support | 2022-01-01
I need help with my bill... | Bob | Phone | 2022-01-02
I have a safety concern... | Charlie | Email | 2022-01-03
I need help with a product... | David | Chat | 2022-01-04
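
To give a feel for what such a conversion involves, here is a plain-Python sketch that lowercases a header, drops stop words and punctuation, and joins what remains with underscores. The stop-word list and the to_valid_name() helper are assumptions made for this example; the rules EDSL actually applies may differ.

```python
import re

# Hypothetical stop-word list chosen for this example only.
STOP_WORDS = {"what", "who", "is", "the", "a", "an", "of"}

def to_valid_name(header):
    """Lowercase, drop stop words and punctuation, and join the rest with underscores."""
    words = re.findall(r"[a-z0-9]+", header.lower())
    kept = [w for w in words if w not in STOP_WORDS] or words
    name = "_".join(kept)
    # Identifiers cannot start with a digit, so prefix an underscore if needed.
    if not name or name[0].isdigit():
        name = "_" + name
    return name

headers = ["What is the message?", "Who is the user?", "What is the source?", "What is the date?"]
names = [to_valid_name(h) for h in headers]
print(names)
```

The sketch does not guard against Python keywords (for example, a header that reduces to "class"), which a more careful version would need to handle.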

Methods for un/pivoting and grouping scenarios

There are a variety of methods for modifying scenarios and scenario lists.

Unpivoting a scenario list

The ScenarioList method unpivot() can be used to unpivot a scenario list based on one or more specified identifiers. It takes a list of id_vars which are the names of the key/value pairs to keep in each new scenario, and a list of value_vars which are the names of the key/value pairs to unpivot.

For example, say we have a scenario list for the above CSV file:

from edsl import ScenarioList

scenariolist = ScenarioList.from_csv("<filepath>.csv")

scenariolist

We can call the unpivot() method on the scenario list:

scenariolist = scenariolist.unpivot(id_vars = ["user"], value_vars = ["source", "date", "message"])

scenariolist

This will return a list of scenarios with the source, date, and message key/value pairs unpivoted:

user | variable | value
Alice | source | Customer support
Alice | date | 2022-01-01
Alice | message | I can't log in...
Bob | source | Phone
Bob | date | 2022-01-02
Bob | message | I need help with my bill...
Charlie | source | Email
Charlie | date | 2022-01-03
Charlie | message | I have a safety concern...
David | source | Chat
David | date | 2022-01-04
David | message | I need help with a product...

Pivoting a scenario list

We can call the pivot() method to reverse the unpivot operation:

scenariolist = scenariolist.pivot(id_vars = ["user"], var_name="variable", value_name="value")

scenariolist

This will return a list of scenarios with the source, date, and message key/value pairs pivoted back to their original form:

user | source | date | message
Alice | Customer support | 2022-01-01 | I can't log in...
Bob | Phone | 2022-01-02 | I need help with my bill...
Charlie | Email | 2022-01-03 | I have a safety concern...
David | Chat | 2022-01-04 | I need help with a product...
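
The reshaping performed by unpivoting and pivoting can be sketched on plain lists of dictionaries. The functions below borrow the parameter names used above (id_vars, value_vars, var_name, value_name) but are an independent illustration, not EDSL's code.

```python
def unpivot(rows, id_vars, value_vars):
    """Emit one {id..., 'variable', 'value'} dict per (row, value_var) pair."""
    out = []
    for row in rows:
        ids = {k: row[k] for k in id_vars}
        for var in value_vars:
            out.append({**ids, "variable": var, "value": row[var]})
    return out

def pivot(rows, id_vars, var_name="variable", value_name="value"):
    """Collect long-format rows back into one dict per unique id combination."""
    wide = {}
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        target = wide.setdefault(key, {k: row[k] for k in id_vars})
        target[row[var_name]] = row[value_name]
    return list(wide.values())

rows = [
    {"message": "I can't log in...", "user": "Alice", "source": "Customer support", "date": "2022-01-01"},
    {"message": "I need help with my bill...", "user": "Bob", "source": "Phone", "date": "2022-01-02"},
]

long_rows = unpivot(rows, id_vars=["user"], value_vars=["source", "date", "message"])
wide_rows = pivot(long_rows, id_vars=["user"])

print(len(long_rows))
print(wide_rows[0])
```

Pivoting the unpivoted rows recovers the original key/value pairs for each user, which is the round trip shown in the two tables above.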

Grouping scenarios

The group_by() method can be used to group scenarios by one or more specified keys and apply a function to the values of the specified variables.

Example usage:

from edsl import ScenarioList, Scenario

def avg_sum(a, b):
    return {'avg_a': sum(a) / len(a), 'sum_b': sum(b)}

scenariolist = ScenarioList([
    Scenario({'group': 'A', 'year': 2020, 'a': 10, 'b': 20}),
    Scenario({'group': 'A', 'year': 2021, 'a': 15, 'b': 25}),
    Scenario({'group': 'B', 'year': 2020, 'a': 12, 'b': 22}),
    Scenario({'group': 'B', 'year': 2021, 'a': 17, 'b': 27})
])

scenariolist.group_by(id_vars=['group'], variables=['a', 'b'], func=avg_sum)

This will return a list of scenarios with the a and b key/value pairs grouped by the group key and the avg_a and sum_b key/value pairs calculated by the avg_sum function:

group | avg_a | sum_b
A | 12.5 | 45
B | 14.5 | 49
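
For readers who want to see the grouping logic spelled out, the following plain-Python sketch reproduces the numbers above on a list of dictionaries. The group_by() function here is a hypothetical re-implementation for illustration and may differ from EDSL's method in details such as ordering and error handling.

```python
def group_by(rows, id_vars, variables, func):
    """Group rows by id_vars, then call func with one list of values per variable."""
    groups = {}
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        groups.setdefault(key, []).append(row)

    result = []
    for key, members in groups.items():
        columns = [[m[v] for m in members] for v in variables]
        result.append({**dict(zip(id_vars, key)), **func(*columns)})
    return result

def avg_sum(a, b):
    return {"avg_a": sum(a) / len(a), "sum_b": sum(b)}

rows = [
    {"group": "A", "year": 2020, "a": 10, "b": 20},
    {"group": "A", "year": 2021, "a": 15, "b": 25},
    {"group": "B", "year": 2020, "a": 12, "b": 22},
    {"group": "B", "year": 2021, "a": 17, "b": 27},
]

grouped = group_by(rows, id_vars=["group"], variables=["a", "b"], func=avg_sum)
for g in grouped:
    print(g)
```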

Data labeling tasks

Scenarios are particularly useful for conducting data labeling or data coding tasks, where the task can be designed as a survey of questions about each piece of data in a dataset.

For example, say we have a dataset of text messages that we want to sort by topic. We can perform this task by using a language model to answer questions such as “What is the primary topic of this message: {{ message }}?” or “Does this message mention a safety issue? {{ message }}”, where each text message is inserted in the message placeholder of the question text.

Here we use scenarios to conduct the task:

from edsl import QuestionMultipleChoice, Survey, Scenario

# Create questions that take a parameter
q1 = QuestionMultipleChoice(
    question_name = "topic",
    question_text = "What is the topic of this message: {{ message }}?",
    question_options = ["Safety", "Product support", "Billing", "Login issue", "Other"]
)

q2 = QuestionMultipleChoice(
    question_name = "safety",
    question_text = "Does this message mention a safety issue? {{ message }}",
    question_options = ["Yes", "No", "Unclear"]
)

# Create a list of scenarios for the parameter
messages = [
    "I can't log in...",
    "I need help with my bill...",
    "I have a safety concern...",
    "I need help with a product..."
    ]
scenarios = [Scenario({"message": message}) for message in messages]

# Create a survey with the question
survey = Survey(questions = [q1, q2])

# Run the survey with the scenarios
results = survey.by(scenarios).run()

We can then analyze the results to see how the agent answered the questions for each scenario:

results.select("message", "safety", "topic")

This will print a table of the scenarios and the answers to the questions for each scenario:

message | safety | topic
I can't log in... | No | Login issue
I need help with a product... | No | Product support
I need help with my bill... | No | Billing
I have a safety concern... | Yes | Safety

Adding metadata

If we have metadata about the messages that we want to keep track of, we can add it to the scenarios as well. This will create additional columns for the metadata in the results dataset, but without the need to include it in our question texts. Here we modify the above example to use a dataset of messages with metadata. Note that the question texts are unchanged:

from edsl import QuestionMultipleChoice, Survey, ScenarioList, Scenario

# Create a question with a parameter
q1 = QuestionMultipleChoice(
    question_name = "topic",
    question_text = "What is the topic of this message: {{ message }}?",
    question_options = ["Safety", "Product support", "Billing", "Login issue", "Other"]
)

q2 = QuestionMultipleChoice(
    question_name = "safety",
    question_text = "Does this message mention a safety issue? {{ message }}",
    question_options = ["Yes", "No", "Unclear"]
)

# Create scenarios for the sets of parameters
user_messages = [
    {"message": "I can't log in...", "user": "Alice", "source": "Customer support", "date": "2022-01-01"},
    {"message": "I need help with my bill...", "user": "Bob", "source": "Phone", "date": "2022-01-02"},
    {"message": "I have a safety concern...", "user": "Charlie", "source": "Email", "date": "2022-01-03"},
    {"message": "I need help with a product...", "user": "David", "source": "Chat", "date": "2022-01-04"}
]

scenarios = ScenarioList(
    Scenario.from_dict(m) for m in user_messages
)

# Create a survey with the question
survey = Survey(questions = [q1, q2])

# Run the survey with the scenarios
results = survey.by(scenarios).run()

# Inspect the results
results.select("scenario.*", "answer.*")

We can see how the agent answered the questions for each scenario, together with the metadata that was not included in the question text:

user     source            message                      date        topic            safety
-------  ----------------  ---------------------------  ----------  ---------------  ------
Alice    Customer support  I can’t log in…              2022-01-01  Login issue      No
Bob      Phone             I need help with my bill…    2022-01-02  Billing          No
Charlie  Email             I have a safety concern…     2022-01-03  Safety           Yes
David    Chat              I need help with a product…  2022-01-04  Product support  No

To learn more about accessing, analyzing and visualizing survey results, please see the Results section.

Slicing/chunking content into scenarios

We can use the Scenario method chunk() to slice a text scenario into a ScenarioList based on num_words or num_lines.

Example usage:

from edsl import Scenario

my_haiku = """
This is a long text.
Pages and pages, oh my!
I need to chunk it.
"""

text_scenario = Scenario({"my_text": my_haiku})

word_chunks_scenariolist = text_scenario.chunk(
    "my_text",
    num_words = 5, # use num_words or num_lines but not both
    include_original = True, # optional
    hash_original = True # optional
)
word_chunks_scenariolist

This will return:

my_text                  my_text_chunk  my_text_original
-----------------------  -------------  --------------------------------
This is a long text.     0              4aec42eda32b7f32bde8be6a6bc11125
Pages and pages, oh my!  1              4aec42eda32b7f32bde8be6a6bc11125
I need to chunk it.      2              4aec42eda32b7f32bde8be6a6bc11125
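
To make the chunking behavior concrete, here is a minimal pure-Python sketch that works on plain dictionaries rather than Scenario objects. It is not EDSL's implementation: the helper name chunk_text is hypothetical, words are assumed to be split on whitespace and lines on "\n", and the hash is assumed to be an MD5 hex digest (which would be consistent with the 32-character values shown in this section).

```python
import hashlib


def chunk_text(text, num_words=None, num_lines=None,
               include_original=False, hash_original=False, field="text"):
    """Hypothetical helper: split `text` into chunks, one dict per chunk."""
    # Exactly one of num_words / num_lines must be given.
    if (num_words is None) == (num_lines is None):
        raise ValueError("Specify exactly one of num_words or num_lines.")
    if num_words is not None:
        units, size, joiner = text.split(), num_words, " "
    else:
        units, size, joiner = text.split("\n"), num_lines, "\n"
    chunks = [joiner.join(units[i:i + size]) for i in range(0, len(units), size)]
    # Assumption: the "hashed" original is an MD5 hex digest of the full text.
    original = hashlib.md5(text.encode()).hexdigest() if hash_original else text
    rows = []
    for index, chunk in enumerate(chunks):
        row = {field: chunk, f"{field}_chunk": index}
        if include_original:
            row[f"{field}_original"] = original
        rows.append(row)
    return rows


haiku = "This is a long text.\nPages and pages, oh my!\nI need to chunk it."
rows = chunk_text(haiku, num_words=5, field="my_text")
for row in rows:
    print(row)
```

Each line of the haiku happens to contain five words, so chunking by num_words = 5 yields one chunk per line in this sketch.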

Scenario class

A Scenario is a dictionary of key/value pairs used to parameterize a question.

class edsl.scenarios.Scenario.DisplayJSON(input_dict: dict)[source]

Bases: object

__init__(input_dict: dict)[source]
class edsl.scenarios.Scenario.DisplayYAML(input_dict: dict)[source]

Bases: object

__init__(input_dict: dict)[source]
class edsl.scenarios.Scenario.Scenario(data: dict | None = None, name: str = None)[source]

Bases: Base, UserDict, ScenarioHtmlMixin

A Scenario is a dictionary of keys/values that can be used to parameterize questions.

__init__(data: dict | None = None, name: str = None)[source]

Initialize a new Scenario.

Parameters:
  • data – A dictionary of keys/values for parameterizing questions.

  • name – The name of the scenario.

chunk(field, num_words: int | None = None, num_lines: int | None = None, include_original=False, hash_original=False) ScenarioList[source]

Split a field into chunks of a given size.

Parameters:
  • field – The field to split.

  • num_words – The number of words in each chunk.

  • num_lines – The number of lines in each chunk.

  • include_original – Whether to include the original field in the new scenarios.

  • hash_original – Whether to hash the original field in the new scenarios.

If you specify include_original=True, the original field will be included in the new scenarios with an “_original” suffix.

Either num_words or num_lines must be specified, but not both.

The hash_original parameter is useful if you do not want to store the original text, but still want a unique identifier for it.

Example:

>>> s = Scenario({"text": "This is a test.\nThis is a test.\n\nThis is a test."})
>>> s.chunk("text", num_lines = 1)
ScenarioList([Scenario({'text': 'This is a test.', 'text_chunk': 0}), Scenario({'text': 'This is a test.', 'text_chunk': 1}), Scenario({'text': '', 'text_chunk': 2}), Scenario({'text': 'This is a test.', 'text_chunk': 3})])
>>> s.chunk("text", num_words = 2)
ScenarioList([Scenario({'text': 'This is', 'text_chunk': 0}), Scenario({'text': 'a test.', 'text_chunk': 1}), Scenario({'text': 'This is', 'text_chunk': 2}), Scenario({'text': 'a test.', 'text_chunk': 3}), Scenario({'text': 'This is', 'text_chunk': 4}), Scenario({'text': 'a test.', 'text_chunk': 5})])
>>> s = Scenario({"text": "Hello World"})
>>> s.chunk("text", num_words = 1, include_original = True)
ScenarioList([Scenario({'text': 'Hello', 'text_chunk': 0, 'text_original': 'Hello World'}), Scenario({'text': 'World', 'text_chunk': 1, 'text_original': 'Hello World'})])
>>> s = Scenario({"text": "Hello World"})
>>> s.chunk("text", num_words = 1, include_original = True, hash_original = True)
ScenarioList([Scenario({'text': 'Hello', 'text_chunk': 0, 'text_original': 'b10a8db164e0754105b7a99be72e3fe5'}), Scenario({'text': 'World', 'text_chunk': 1, 'text_original': 'b10a8db164e0754105b7a99be72e3fe5'})])
>>> s.chunk("text")
Traceback (most recent call last):
...
ValueError: You must specify either num_words or num_lines.
>>> s.chunk("text", num_words = 1, num_lines = 1)
Traceback (most recent call last):
...
ValueError: You must specify either num_words or num_lines, but not both.
code() List[str][source]

Return the code for the scenario.

drop(list_of_keys: List[str]) Scenario[source]

Drop a subset of keys from a scenario.

Parameters:

list_of_keys – The keys to drop.

Example:

>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.drop(["food"])
Scenario({'drink': 'water'})
classmethod example(randomize: bool = False, has_image=False) Scenario[source]

Returns an example Scenario instance.

Parameters:

randomize – If True, adds a random string to the value of the example key.

classmethod from_dict(d: dict) Scenario[source]

Convert a dictionary to a scenario.

Example:

>>> Scenario.from_dict({"food": "wood chips"})
Scenario({'food': 'wood chips'})
classmethod from_docx(docx_path: str) Scenario[source]

Creates a scenario from the text of a docx file.

Parameters:

docx_path – The path to the docx file.

Example:

>>> from docx import Document
>>> doc = Document()
>>> _ = doc.add_heading("EDSL Survey")
>>> _ = doc.add_paragraph("This is a test.")
>>> doc.save("test.docx")
>>> s = Scenario.from_docx("test.docx")
>>> s
Scenario({'file_path': 'test.docx', 'text': 'EDSL Survey\nThis is a test.'})
>>> import os; os.remove("test.docx")
classmethod from_file(file_path: str, field_name: str) Scenario[source]

Creates a scenario from a file.

>>> import tempfile
>>> with tempfile.NamedTemporaryFile(suffix=".txt", mode="w") as f:
...     _ = f.write("This is a test.")
...     _ = f.flush()
...     s = Scenario.from_file(f.name, "file")
>>> s
Scenario({'file': FileStore(path='...', ...)})
classmethod from_image(image_path: str, image_name: str | None = None) Scenario[source]

Creates a scenario with a base64 encoding of an image.

Args:

image_path (str): Path to the image file.

Returns:

Scenario: A new Scenario instance with image information.

classmethod from_pdf(pdf_path: str)[source]
classmethod from_url(url: str, field_name: str | None = 'text') Scenario[source]

Creates a scenario from a URL.

Parameters:
  • url – The URL to create the scenario from.

  • field_name – The field name to use for the text.

property has_jinja_braces: bool[source]

Return whether the scenario has jinja braces. This matters for rendering.

>>> s = Scenario({"food": "I love {{wood chips}}"})
>>> s.has_jinja_braces
True
json()[source]
keep(list_of_keys: List[str]) Scenario[source]

Keep a subset of keys from a scenario.

Parameters:

list_of_keys – The keys to keep.

Example:

>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.keep(["food"])
Scenario({'food': 'wood chips'})
new_column_names(new_names: List[str]) Scenario[source]

Rename the keys of a scenario.

>>> s = Scenario({"food": "wood chips"})
>>> s.new_column_names(["food_preference"])
Scenario({'food_preference': 'wood chips'})
rename(old_name_or_replacement_dict: str | dict, new_name: str | None = None) Scenario[source]

Rename the keys of a scenario.

Parameters:
  • old_name_or_replacement_dict – A dictionary of old keys to new keys OR a string of the old key.

  • new_name – The new name of the key.

Example:

>>> s = Scenario({"food": "wood chips"})
>>> s.rename({"food": "food_preference"})
Scenario({'food_preference': 'wood chips'})
>>> s = Scenario({"food": "wood chips"})
>>> s.rename("food", "snack")
Scenario({'snack': 'wood chips'})
replicate(n: int) ScenarioList[source]

Replicate a scenario n times to return a ScenarioList.

Parameters:

n – The number of times to replicate the scenario.

Example:

>>> s = Scenario({"food": "wood chips"})
>>> s.replicate(2)
ScenarioList([Scenario({'food': 'wood chips'}), Scenario({'food': 'wood chips'})])
select(list_of_keys: Collection[str]) Scenario[source]

Select a subset of keys from a scenario.

Parameters:

list_of_keys – The keys to select.

Example:

>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.select(["food"])
Scenario({'food': 'wood chips'})
table(tablefmt: str = 'grid') str[source]

Display a scenario as a table.

to_dataset() Dataset[source]

Convert a scenario to a dataset.

>>> s = Scenario({"food": "wood chips"})
>>> s.to_dataset()
Dataset([{'key': ['food']}, {'value': ['wood chips']}])
to_dict(add_edsl_version: bool = True) dict[source]

Convert a scenario to a dictionary.

Example:

>>> s = Scenario({"food": "wood chips"})
>>> s.to_dict()
{'food': 'wood chips', 'edsl_version': '...', 'edsl_class_name': 'Scenario'}
>>> s.to_dict(add_edsl_version = False)
{'food': 'wood chips'}
yaml()[source]

ScenarioList class

A list of Scenarios to be used in a survey.

class edsl.scenarios.ScenarioList.ScenarioList(data: list | None = None, codebook: dict[str, str] | None = None)[source]

Bases: Base, UserList, ScenarioListMixin

Class for creating a list of scenarios to be used in a survey.

__init__(data: list | None = None, codebook: dict[str, str] | None = None)[source]

Initialize the ScenarioList class.

add_list(name: str, values: List[Any]) ScenarioList[source]

Add a list of values to a ScenarioList.

Example:

>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.add_list('age', [30, 25])
ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
add_value(name: str, value: Any) ScenarioList[source]

Add a value to all scenarios in a ScenarioList.

Example:

>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.add_value('age', 30)
ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 30})])
chunk(field, num_words: int | None = None, num_lines: int | None = None, include_original=False, hash_original=False) ScenarioList[source]

Chunk the scenarios based on a field.

Example:

>>> s = ScenarioList([Scenario({'text': 'The quick brown fox jumps over the lazy dog.'})])
>>> s.chunk('text', num_words=3)
ScenarioList([Scenario({'text': 'The quick brown', 'text_chunk': 0}), Scenario({'text': 'fox jumps over', 'text_chunk': 1}), Scenario({'text': 'the lazy dog.', 'text_chunk': 2})])
code() str[source]

Create the Python code representation of the ScenarioList.

concatenate(fields: List[str], separator: str = ';') ScenarioList[source]

Concatenate specified fields into a single field.

Parameters:
  • fields – The fields to concatenate.

  • separator – The separator to use.

Returns:

ScenarioList: A new ScenarioList with concatenated fields.

Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2, 'c': 3}), Scenario({'a': 4, 'b': 5, 'c': 6})])
>>> s.concatenate(['a', 'b', 'c'])
ScenarioList([Scenario({'concat_a_b_c': '1;2;3'}), Scenario({'concat_a_b_c': '4;5;6'})])
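
The effect of concatenate can be sketched with plain dictionaries. This is an illustration only, not EDSL's code: the function below is hypothetical, and it assumes that values are converted with str() and that the concatenated fields are removed from the result, as the example output above suggests.

```python
def concatenate(rows, fields, separator=";"):
    """Hypothetical helper: merge `fields` into one 'concat_...' key per row."""
    new_key = "concat_" + "_".join(fields)
    out = []
    for row in rows:
        # Keep every key that is not being concatenated.
        kept = {k: v for k, v in row.items() if k not in fields}
        kept[new_key] = separator.join(str(row[f]) for f in fields)
        out.append(kept)
    return out


rows = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]
combined = concatenate(rows, ["a", "b", "c"])
print(combined)  # [{'concat_a_b_c': '1;2;3'}, {'concat_a_b_c': '4;5;6'}]
```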
drop(*fields: str) ScenarioList[source]

Drop fields from the scenarios.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.drop('a')
ScenarioList([Scenario({'b': 1}), Scenario({'b': 2})])
duplicate() ScenarioList[source]

Return a copy of the ScenarioList.

>>> sl = ScenarioList.example()
>>> sl_copy = sl.duplicate()
>>> sl == sl_copy
True
>>> sl is sl_copy
False
classmethod example(randomize: bool = False) ScenarioList[source]

Return an example ScenarioList instance.

Parameters:

randomize – If True, use Scenario’s randomize method to randomize the values.

expand(expand_field: str, number_field: bool = False) ScenarioList[source]

Expand the ScenarioList by a field.

Parameters:
  • expand_field – The field to expand.

  • number_field – Whether to add a field with the index of the value.

Example:

>>> s = ScenarioList( [ Scenario({'a':1, 'b':[1,2]}) ] )
>>> s.expand('b')
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.expand('b', number_field=True)
ScenarioList([Scenario({'a': 1, 'b': 1, 'b_number': 1}), Scenario({'a': 1, 'b': 2, 'b_number': 2})])
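
As a rough illustration of what expanding a field means, the following sketch uses plain dictionaries. The function is hypothetical and not taken from EDSL; it assumes the numbering starts at 1 and is stored under a "<field>_number" key, which matches the example output above.

```python
def expand(rows, expand_field, number_field=False):
    """Hypothetical helper: emit one row per element of row[expand_field]."""
    out = []
    for row in rows:
        values = row[expand_field]
        # Treat a scalar as a one-element list (an assumption of this sketch).
        if not isinstance(values, (list, tuple)):
            values = [values]
        for position, value in enumerate(values, start=1):
            new_row = dict(row)
            new_row[expand_field] = value
            if number_field:
                new_row[f"{expand_field}_number"] = position
            out.append(new_row)
    return out


rows = [{"a": 1, "b": [1, 2]}]
print(expand(rows, "b"))                     # [{'a': 1, 'b': 1}, {'a': 1, 'b': 2}]
print(expand(rows, "b", number_field=True))
```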
filter(expression: str) ScenarioList[source]

Filter a list of scenarios based on an expression.

Parameters:

expression – The expression to filter by.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.filter("b == 2")
ScenarioList([Scenario({'a': 1, 'b': 2})])
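
Conceptually, the expression is evaluated once per scenario with the scenario's keys available as variable names. The sketch below shows that idea with plain dictionaries and Python's built-in eval. It is an assumption-laden illustration (the helper filter_rows is hypothetical, and this section does not say which evaluator EDSL uses); eval should not be used on untrusted expressions.

```python
def filter_rows(rows, expression):
    """Hypothetical helper: keep rows for which `expression` evaluates truthy."""
    code = compile(expression, "<filter>", "eval")
    # Builtins are disabled so that only the row's own keys are visible.
    return [row for row in rows if eval(code, {"__builtins__": {}}, dict(row))]


rows = [{"a": 1, "b": 1}, {"a": 1, "b": 2}]
print(filter_rows(rows, "b == 2"))  # [{'a': 1, 'b': 2}]
```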
classmethod from_csv(source: str | 'ParseResult') ScenarioList[source]

Create a ScenarioList from a CSV file or URL.

classmethod from_delimited_file(source: str | 'ParseResult', delimiter: str = ',') ScenarioList[source]

Create a ScenarioList from a delimited file (CSV/TSV) or URL.

classmethod from_dict(data) ScenarioList[source]

Create a ScenarioList from a dictionary.

classmethod from_excel(filename: str, sheet_name: str | None = None) ScenarioList[source]

Create a ScenarioList from an Excel file.

If the Excel file contains multiple sheets and no sheet_name is provided, the method will print the available sheets and require the user to specify one.

Example:

>>> import tempfile
>>> import os
>>> import pandas as pd
>>> with tempfile.NamedTemporaryFile(delete=False, suffix='.xlsx') as f:
...     df1 = pd.DataFrame({
...         'name': ['Alice', 'Bob'],
...         'age': [30, 25],
...         'location': ['New York', 'Los Angeles']
...     })
...     df2 = pd.DataFrame({
...         'name': ['Charlie', 'David'],
...         'age': [35, 40],
...         'location': ['Chicago', 'Boston']
...     })
...     with pd.ExcelWriter(f.name) as writer:
...         df1.to_excel(writer, sheet_name='Sheet1', index=False)
...         df2.to_excel(writer, sheet_name='Sheet2', index=False)
...     temp_filename = f.name
>>> scenario_list = ScenarioList.from_excel(temp_filename, sheet_name='Sheet1')
>>> len(scenario_list)
2
>>> scenario_list[0]['name']
'Alice'
>>> scenario_list = ScenarioList.from_excel(temp_filename)  # Should raise an error and list sheets
Traceback (most recent call last):
...
ValueError: Please provide a sheet name to load data from.
classmethod from_google_doc(url: str) ScenarioList[source]

Create a ScenarioList from a Google Doc.

This method downloads the Google Doc as a Word file (.docx), saves it to a temporary file, and then reads it using the from_docx class method.

Args:

url (str): The URL to the Google Doc.

Returns:

ScenarioList: An instance of the ScenarioList class.

classmethod from_google_sheet(url: str, sheet_name: str = None) ScenarioList[source]

Create a ScenarioList from a Google Sheet.

This method downloads the Google Sheet as an Excel file, saves it to a temporary file, and then reads it using the from_excel class method.

Args:

url (str): The URL to the Google Sheet.
sheet_name (str, optional): The name of the sheet to load. If None, the method will behave the same as from_excel regarding multiple sheets.

Returns:

ScenarioList: An instance of the ScenarioList class.

classmethod from_latex(tex_file_path: str)[source]
classmethod from_list(name: str, values: list, func: Callable | None = None) ScenarioList[source]

Create a ScenarioList from a list of values.

Parameters:
  • name – The name of the field.

  • values – The list of values.

  • func – An optional function to apply to the values.

Example:

>>> ScenarioList.from_list('name', ['Alice', 'Bob'])
ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
classmethod from_list_of_tuples(*names: str, values: List[Tuple]) ScenarioList[source]
classmethod from_nested_dict(data: dict) ScenarioList[source]

Create a ScenarioList from a nested dictionary.

classmethod from_pandas(df) ScenarioList[source]

Create a ScenarioList from a pandas DataFrame.

Example:

>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 25], 'location': ['New York', 'Los Angeles']})
>>> ScenarioList.from_pandas(df)
ScenarioList([Scenario({'name': 'Alice', 'age': 30, 'location': 'New York'}), Scenario({'name': 'Bob', 'age': 25, 'location': 'Los Angeles'})])
classmethod from_sqlite(filepath: str, table: str)[source]

Create a ScenarioList from a SQLite database.

classmethod from_tsv(source: str | 'ParseResult') ScenarioList[source]

Create a ScenarioList from a TSV file or URL.

from_urls(urls: list[str], field_name: str | None = 'text') ScenarioList[source]

Create a ScenarioList from a list of URLs.

Parameters:
  • urls – A list of URLs.

  • field_name – The name of the field to store the text from the URLs.

classmethod from_wikipedia(url: str, table_index: int = 0)[source]

Extracts a table from a Wikipedia page.

Parameters:

url (str): The URL of the Wikipedia page.
table_index (int): The index of the table to extract (default is 0).

Returns:

pd.DataFrame: A DataFrame containing the extracted table.

Example usage:

# url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
# df = from_wikipedia(url, 0)
# if not df.empty:
#     print(df.head())
# else:
#     print("Failed to extract table.")

classmethod gen(scenario_dicts_list: List[dict]) ScenarioList[source]

Create a ScenarioList from a list of dictionaries.

Example:

>>> ScenarioList.gen([{'name': 'Alice'}, {'name': 'Bob'}])
ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
give_valid_names(existing_codebook: dict = None) ScenarioList[source]

Give valid names to the scenario keys, using an existing codebook if provided.

Args:

existing_codebook (dict, optional): Existing mapping of original keys to valid names. Defaults to None.

Returns:

ScenarioList: A new ScenarioList with valid variable names and updated codebook.

>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names()
ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s = ScenarioList([Scenario({'are you there John?': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names()
ScenarioList([Scenario({'john': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names({'are you there John?': 'custom_name'})
ScenarioList([Scenario({'custom_name': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
group_by(id_vars: List[str], variables: List[str], func: Callable) ScenarioList[source]

Group the ScenarioList by id_vars and apply a function to the specified variables.

Parameters:
  • id_vars – Fields to use as identifier variables

  • variables – Fields to group and aggregate

  • func – Function to apply to the grouped variables

Returns:

ScenarioList: A new ScenarioList with the grouped and aggregated results.

Example:

>>> def avg_sum(a, b):
...     return {'avg_a': sum(a) / len(a), 'sum_b': sum(b)}
>>> s = ScenarioList([
...     Scenario({'group': 'A', 'year': 2020, 'a': 10, 'b': 20}),
...     Scenario({'group': 'A', 'year': 2021, 'a': 15, 'b': 25}),
...     Scenario({'group': 'B', 'year': 2020, 'a': 12, 'b': 22}),
...     Scenario({'group': 'B', 'year': 2021, 'a': 17, 'b': 27})
... ])
>>> s.group_by(id_vars=['group'], variables=['a', 'b'], func=avg_sum)
ScenarioList([Scenario({'group': 'A', 'avg_a': 12.5, 'sum_b': 45}), Scenario({'group': 'B', 'avg_a': 14.5, 'sum_b': 49})])
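
The grouping logic can be sketched with plain dictionaries: rows are bucketed by the values of id_vars, the values of each variable are collected into lists, and func receives one list per variable. This is a hypothetical helper written for illustration, not EDSL's implementation.

```python
def group_by(rows, id_vars, variables, func):
    """Hypothetical helper: group rows by id_vars and aggregate with func."""
    groups = {}
    for row in rows:
        key = tuple(row[v] for v in id_vars)
        bucket = groups.setdefault(key, {v: [] for v in variables})
        for v in variables:
            bucket[v].append(row[v])
    out = []
    for key, bucket in groups.items():
        result = dict(zip(id_vars, key))
        # func gets one positional list per variable and returns a dict.
        result.update(func(*(bucket[v] for v in variables)))
        out.append(result)
    return out


def avg_sum(a, b):
    return {"avg_a": sum(a) / len(a), "sum_b": sum(b)}


rows = [
    {"group": "A", "year": 2020, "a": 10, "b": 20},
    {"group": "A", "year": 2021, "a": 15, "b": 25},
    {"group": "B", "year": 2020, "a": 12, "b": 22},
    {"group": "B", "year": 2021, "a": 17, "b": 27},
]
grouped = group_by(rows, ["group"], ["a", "b"], avg_sum)
print(grouped)
```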

property has_jinja_braces: bool[source]

Check if the ScenarioList has Jinja braces.

html(filename: str | None = None, cta: str = 'Open in browser', return_link: bool = False)[source]
keep(*fields: str) ScenarioList[source]

Keep only the specified fields in the scenarios.

Parameters:

fields – The fields to keep.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.keep('a')
ScenarioList([Scenario({'a': 1}), Scenario({'a': 1})])
left_join(other: ScenarioList, by: str | list[str]) ScenarioList[source]

Perform a left join with another ScenarioList, following SQL join semantics.

Args:

other: The ScenarioList to join with.
by: String or list of strings representing the key(s) to join on. Cannot be empty.

>>> s1 = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s2 = ScenarioList([Scenario({'name': 'Alice', 'location': 'New York'}), Scenario({'name': 'Charlie', 'location': 'Los Angeles'})])
>>> s3 = s1.left_join(s2, 'name')
>>> s3 == ScenarioList([Scenario({'age': 30, 'location': 'New York', 'name': 'Alice'}), Scenario({'age': 25, 'location': None, 'name': 'Bob'})])
True
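
A left join keeps every row of the left list and fills in None where the right list has no match. The sketch below illustrates this with plain dictionaries. It is a simplified, hypothetical helper: it assumes at most one matching right-hand row per key (the first match wins) and keeps the left value when a non-key field exists on both sides, which may differ from EDSL's behavior.

```python
def left_join(left, right, by):
    """Hypothetical helper: left join two lists of dicts on the key(s) in `by`."""
    keys = [by] if isinstance(by, str) else list(by)
    if not keys:
        raise ValueError("'by' cannot be empty")
    # Index the right-hand rows by their join key; the first match wins.
    index = {}
    for row in right:
        index.setdefault(tuple(row[k] for k in keys), row)
    # Collect the non-key fields that only the join can contribute.
    right_fields = []
    for row in right:
        for k in row:
            if k not in keys and k not in right_fields:
                right_fields.append(k)
    out = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        joined = dict(row)
        for k in right_fields:
            if k not in joined:
                joined[k] = match.get(k) if match is not None else None
        out.append(joined)
    return out


s1 = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
s2 = [{"name": "Alice", "location": "New York"},
      {"name": "Charlie", "location": "Los Angeles"}]
print(left_join(s1, s2, "name"))
```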
mutate(new_var_string: str, functions_dict: dict[str, Callable] | None = None) ScenarioList[source]

Return a new ScenarioList with a new variable added.

Parameters:
  • new_var_string – A string with the new variable assignment.

  • functions_dict – A dictionary of functions to use in the assignment.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.mutate("c = a + b")
ScenarioList([Scenario({'a': 1, 'b': 2, 'c': 3}), Scenario({'a': 1, 'b': 1, 'c': 2})])
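
One way to picture the assignment string is as "name = expression", where the expression is evaluated against each scenario's keys. The following sketch shows that reading with plain dictionaries and Python's built-in eval. The helper is hypothetical and the parsing rule (split on the first "=") is an assumption; eval should not be used on untrusted input.

```python
def mutate(rows, new_var_string, functions=None):
    """Hypothetical helper: add a computed key to every row."""
    # Assumption: the string has the form "<new name> = <expression>".
    name, expression = (part.strip() for part in new_var_string.split("=", 1))
    code = compile(expression, "<mutate>", "eval")
    out = []
    for row in rows:
        # Extra functions are visible by name alongside the row's keys.
        namespace = {**(functions or {}), **row}
        out.append({**row, name: eval(code, {"__builtins__": {}}, namespace)})
    return out


rows = [{"a": 1, "b": 2}, {"a": 1, "b": 1}]
print(mutate(rows, "c = a + b"))
print(mutate(rows, "d = double(b)", {"double": lambda x: 2 * x}))
```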
num_observations()[source]

Return the number of observations in the dataset.

>>> from edsl.results.Results import Results
>>> Results.example().num_observations()
4
order_by(*fields: str, reverse: bool = False) ScenarioList[source]

Order the scenarios by one or more fields.

Parameters:
  • fields – The fields to order by.

  • reverse – Whether to reverse the order.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.order_by('b', 'a')
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
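
Ordering by several fields can be thought of as sorting on a tuple of the field values, with earlier fields taking precedence. A minimal sketch with plain dictionaries (the helper is hypothetical, not EDSL's code):

```python
def order_by(rows, *fields, reverse=False):
    """Hypothetical helper: sort rows by the given fields, in priority order."""
    return sorted(rows, key=lambda row: tuple(row[f] for f in fields), reverse=reverse)


rows = [{"a": 1, "b": 2}, {"a": 1, "b": 1}]
print(order_by(rows, "b", "a"))          # [{'a': 1, 'b': 1}, {'a': 1, 'b': 2}]
print(order_by(rows, "b", reverse=True)) # [{'a': 1, 'b': 2}, {'a': 1, 'b': 1}]
```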
property parameters: set[source]

Return the set of parameters in the ScenarioList.

Example:

>>> s = ScenarioList([Scenario({'a': 1}), Scenario({'b': 2})])
>>> s.parameters == {'a', 'b'}
True
pivot(id_vars: List[str] = None, var_name='variable', value_name='value') ScenarioList[source]

Pivot the ScenarioList from long to wide format.

Parameters:
  • id_vars (list) – Fields to use as identifier variables.

  • var_name (str) – Name of the variable column (default: 'variable').

  • value_name (str) – Name of the value column (default: 'value').

Example:

>>> s = ScenarioList([
...     Scenario({'id': 1, 'year': 2020, 'variable': 'a', 'value': 10}),
...     Scenario({'id': 1, 'year': 2020, 'variable': 'b', 'value': 20}),
...     Scenario({'id': 2, 'year': 2021, 'variable': 'a', 'value': 15}),
...     Scenario({'id': 2, 'year': 2021, 'variable': 'b', 'value': 25})
... ])
>>> s.pivot(id_vars=['id', 'year'])
ScenarioList([Scenario({'id': 1, 'year': 2020, 'a': 10, 'b': 20}), Scenario({'id': 2, 'year': 2021, 'a': 15, 'b': 25})])
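
The long-to-wide reshaping can be sketched with plain dictionaries: rows that share the same id_vars values are merged into one row, and each variable name becomes a key holding its value. The helper below is hypothetical and only illustrates the idea.

```python
def pivot(rows, id_vars, var_name="variable", value_name="value"):
    """Hypothetical helper: reshape long-format rows into wide format."""
    wide = {}
    for row in rows:
        key = tuple(row[v] for v in id_vars)
        # Start each wide row with its identifier fields.
        target = wide.setdefault(key, {v: row[v] for v in id_vars})
        target[row[var_name]] = row[value_name]
    return list(wide.values())


long_rows = [
    {"id": 1, "year": 2020, "variable": "a", "value": 10},
    {"id": 1, "year": 2020, "variable": "b", "value": 20},
    {"id": 2, "year": 2021, "variable": "a", "value": 15},
    {"id": 2, "year": 2021, "variable": "b", "value": 25},
]
wide_rows = pivot(long_rows, ["id", "year"])
print(wide_rows)
```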

print_long()[source]

Print the results in a long format.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').print_long()
answer.how_feeling: OK
answer.how_feeling: Great
answer.how_feeling: Terrible
answer.how_feeling: OK

relevant_columns(data_type: str | None = None, remove_prefix=False) list[source]

Return the set of keys that are present in the dataset.

Parameters:
  • data_type – The data type to filter by.

  • remove_prefix – Whether to remove the prefix from the column names.

>>> from edsl.results.Dataset import Dataset
>>> d = Dataset([{'a.b':[1,2,3,4]}])
>>> d.relevant_columns()
['a.b']
>>> d.relevant_columns(remove_prefix=True)
['b']
>>> d = Dataset([{'a':[1,2,3,4]}, {'b':[5,6,7,8]}])
>>> d.relevant_columns()
['a', 'b']
>>> from edsl.results import Results; Results.example().select('how_feeling', 'how_feeling_yesterday').relevant_columns()
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> from edsl.results import Results
>>> sorted(Results.example().select().relevant_columns(data_type = "model"))
['model.frequency_penalty', ...]
>>> Results.example().relevant_columns(data_type = "flimflam")
Traceback (most recent call last):
...
ValueError: No columns found for data type: flimflam. Available data types are: ...
rename(replacement_dict: dict) ScenarioList[source]

Rename the fields in the scenarios.

Parameters:

replacement_dict – A dictionary with the old names as keys and the new names as values.

Example:

>>> s = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s.rename({'name': 'first_name', 'age': 'years'})
ScenarioList([Scenario({'first_name': 'Alice', 'years': 30}), Scenario({'first_name': 'Bob', 'years': 25})])
reorder_keys(new_order: List[str]) ScenarioList[source]

Reorder the keys in the scenarios.

Parameters:

new_order – The new order of the keys.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 3, 'b': 4})])
>>> s.reorder_keys(['b', 'a'])
ScenarioList([Scenario({'b': 2, 'a': 1}), Scenario({'b': 4, 'a': 3})])
>>> s.reorder_keys(['a', 'b', 'c'])
Traceback (most recent call last):
...
AssertionError
sample(n: int, seed: str | None = None) ScenarioList[source]

Return a random sample from the ScenarioList.

>>> s = ScenarioList.from_list("a", [1,2,3,4,5,6])
>>> s.sample(3, seed = "edsl")
ScenarioList([Scenario({'a': 2}), Scenario({'a': 1}), Scenario({'a': 3})])
select(*fields: str) ScenarioList[source]

Selects scenarios with only the referenced fields.

Parameters:

fields – The fields to select.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.select('a')
ScenarioList([Scenario({'a': 1}), Scenario({'a': 1})])
sem_filter(language_predicate: str) ScenarioList[source]

Filter the ScenarioList based on a language predicate.

Parameters:

language_predicate – The language predicate to use.

Inspired by:

@misc{patel2024semanticoperators,
  title={Semantic Operators: A Declarative Model for Rich, AI-based Analytics Over Text Data},
  author={Liana Patel and Siddharth Jha and Parth Asawa and Melissa Pan and Carlos Guestrin and Matei Zaharia},
  year={2024},
  eprint={2407.11418},
  archivePrefix={arXiv},
  primaryClass={cs.DB},
  url={https://arxiv.org/abs/2407.11418},
}

shuffle(seed: str | None = None) ScenarioList[source]

Shuffle the ScenarioList.

>>> s = ScenarioList.from_list("a", [1,2,3,4])
>>> s.shuffle(seed = "1234")
ScenarioList([Scenario({'a': 1}), Scenario({'a': 4}), Scenario({'a': 3}), Scenario({'a': 2})])
sql(query: str, transpose: bool = None, transpose_by: str = None, remove_prefix: bool = True) pd.DataFrame | str[source]

Execute a SQL query and return the results as a DataFrame.

Args:

query: The SQL query to execute
shape: The shape of the data in the database (wide or long)
remove_prefix: Whether to remove the prefix from the column names
transpose: Whether to transpose the DataFrame
transpose_by: The column to use as the index when transposing
csv: Whether to return the DataFrame as a CSV string
to_list: Whether to return the results as a list
to_latex: Whether to return the results as LaTeX
filename: Optional filename to save the results to

Returns:

DataFrame, CSV string, list, or LaTeX string depending on parameters

table(*fields: str, tablefmt: Literal['plain', 'simple', 'github', 'grid', 'fancy_grid', 'pipe', 'orgtbl', 'rst', 'mediawiki', 'html', 'latex', 'latex_raw', 'latex_booktabs', 'tsv'] | None = None, pretty_labels: dict[str, str] | None = None) str[source]

Return the ScenarioList as a table.

tally(*fields: str | None, top_n: int | None = None, output='Dataset') dict | Dataset[source]

Tally the values of a field or perform a cross-tab of multiple fields.

Parameters:

fields – The field(s) to tally, multiple fields for cross-tabulation.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').tally('answer.how_feeling', output = "dict")
{'OK': 2, 'Great': 1, 'Terrible': 1}
>>> from edsl.results.Dataset import Dataset
>>> expected = Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible']}, {'count': [2, 1, 1]}])
>>> r.select('how_feeling').tally('answer.how_feeling', output = "Dataset") == expected
True
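
Tallying amounts to counting how often each value (or each combination of values, for a cross-tab) occurs. A minimal sketch using collections.Counter on plain dictionaries follows; the helper name and the dict return shape are assumptions made for illustration.

```python
from collections import Counter


def tally(rows, *fields, top_n=None):
    """Hypothetical helper: count values of one field, or tuples of several."""
    if len(fields) == 1:
        values = [row[fields[0]] for row in rows]
    else:
        values = [tuple(row[f] for f in fields) for row in rows]
    # most_common sorts by count, largest first; top_n=None keeps everything.
    return dict(Counter(values).most_common(top_n))


rows = [
    {"how_feeling": "OK", "period": "morning"},
    {"how_feeling": "Great", "period": "morning"},
    {"how_feeling": "Terrible", "period": "evening"},
    {"how_feeling": "OK", "period": "evening"},
]
print(tally(rows, "how_feeling"))            # {'OK': 2, 'Great': 1, 'Terrible': 1}
print(tally(rows, "how_feeling", "period"))  # cross-tab keyed by tuples
```

The "period" field is invented here purely to demonstrate the cross-tab case.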
times(other: ScenarioList) ScenarioList[source]

Takes the cross product of two ScenarioLists.

Example:

>>> s1 = ScenarioList([Scenario({'a': 1}), Scenario({'a': 2})])
>>> s2 = ScenarioList([Scenario({'b': 1}), Scenario({'b': 2})])
>>> s1.times(s2)
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2}), Scenario({'a': 2, 'b': 1}), Scenario({'a': 2, 'b': 2})])
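
The cross product pairs every scenario in the first list with every scenario in the second and merges each pair. With plain dictionaries this can be sketched using itertools.product; the helper below is hypothetical, and it assumes that on a key collision the value from the second list wins.

```python
from itertools import product


def times(left, right):
    """Hypothetical helper: merge every pair of rows from the two lists."""
    return [{**x, **y} for x, y in product(left, right)]


s1 = [{"a": 1}, {"a": 2}]
s2 = [{"b": 1}, {"b": 2}]
crossed = times(s1, s2)
print(crossed)
# [{'a': 1, 'b': 1}, {'a': 1, 'b': 2}, {'a': 2, 'b': 1}, {'a': 2, 'b': 2}]
```

The result has len(s1) * len(s2) entries, so crossing large lists grows the number of question variations quickly.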
to(survey: 'Survey' | 'QuestionBase') Jobs[source]

Create a Jobs object from a ScenarioList and a Survey object.

Parameters:

survey – The Survey object to use for the Jobs object.

Example:

>>> from edsl import Survey
>>> from edsl.jobs.Jobs import Jobs
>>> from edsl import ScenarioList
>>> isinstance(ScenarioList.example().to(Survey.example()), Jobs)
True

to_agent_list(remove_prefix: bool = True)[source]

Convert the results to a list of dictionaries, one per agent.

Parameters:

remove_prefix – Whether to remove the prefix from the column names.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_agent_list()
AgentList([Agent(traits = {'how_feeling': 'OK'}), Agent(traits = {'how_feeling': 'Great'}), Agent(traits = {'how_feeling': 'Terrible'}), Agent(traits = {'how_feeling': 'OK'})])
to_csv(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None) FileStore[source]

Export the results to a FileStore instance containing CSV data.

Args:

filename: Optional filename for the CSV (defaults to “results.csv”)
remove_prefix: Whether to remove the prefix from column names
pretty_labels: Dictionary mapping original column names to pretty labels

Returns:

FileStore: Instance containing the CSV data

to_dataset() Dataset[source]

Convert the ScenarioList to a Dataset.

>>> s = ScenarioList.from_list("a", [1,2,3])
>>> s.to_dataset()
Dataset([{'a': [1, 2, 3]}])
>>> s = ScenarioList.from_list("a", [1,2,3]).add_list("b", [4,5,6])
>>> s.to_dataset()
Dataset([{'a': [1, 2, 3]}, {'b': [4, 5, 6]}])
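
The Dataset shown above is column-oriented: one single-key dictionary per field, each holding the list of values across scenarios. The row-to-column conversion can be sketched with plain dictionaries as follows. The helper is hypothetical, and filling missing keys with None is an assumption of this sketch.

```python
def to_columns(rows):
    """Hypothetical helper: turn a list of row dicts into a list of columns."""
    keys = []
    for row in rows:
        for k in row:
            if k not in keys:  # preserve first-seen key order
                keys.append(k)
    return [{k: [row.get(k) for row in rows]} for k in keys]


rows = [{"a": 1, "b": 4}, {"a": 2, "b": 5}, {"a": 3, "b": 6}]
print(to_columns(rows))  # [{'a': [1, 2, 3]}, {'b': [4, 5, 6]}]
```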
to_dict(sort: bool = False, add_edsl_version: bool = True) dict[source]
>>> s = ScenarioList([Scenario({'food': 'wood chips'}), Scenario({'food': 'wood-fired pizza'})])
>>> s.to_dict()
{'scenarios': [{'food': 'wood chips', 'edsl_version': '...', 'edsl_class_name': 'Scenario'}, {'food': 'wood-fired pizza', 'edsl_version': '...', 'edsl_class_name': 'Scenario'}], 'edsl_version': '...', 'edsl_class_name': 'ScenarioList'}
to_dicts(remove_prefix: bool = True) list[dict][source]

Convert the results to a list of dictionaries.

Parameters:

remove_prefix – Whether to remove the prefix from the column names.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_dicts()
[{'how_feeling': 'OK'}, {'how_feeling': 'Great'}, {'how_feeling': 'Terrible'}, {'how_feeling': 'OK'}]
to_excel(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, sheet_name: str | None = None) FileStore[source]

Export the results to a FileStore instance containing Excel data.

Args:

filename: Optional filename for the Excel file (defaults to “results.xlsx”)
remove_prefix: Whether to remove the prefix from column names
pretty_labels: Dictionary mapping original column names to pretty labels
sheet_name: Name of the worksheet (defaults to “Results”)

Returns:

FileStore: Instance containing the Excel data

to_jsonl(filename: str | None = None) FileStore[source]

Export the results to a FileStore instance containing JSONL data.

Args:

filename: Optional filename for the JSONL file (defaults to “results.jsonl”)

Returns:

FileStore: Instance containing the JSONL data

to_key_value(field: str, value=None) dict | set[source]

Return the set of values in the field.

Parameters:
  • field – The field to extract values from.

  • value – An optional field to use as the value in the key-value pair.

Example:

>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.to_key_value('name') == {'Alice', 'Bob'}
True
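The two return shapes (a set when value is omitted, a dictionary when it is given) can be sketched in plain Python over ordinary dictionaries. This is an illustration of the behaviour described above, not edsl's code, and the `age` field is made up for the example.

```python
def to_key_value(scenarios, field, value=None):
    """Return a set of field values, or a dict of field -> value pairs."""
    if value is None:
        return {s[field] for s in scenarios}
    return {s[field]: s[value] for s in scenarios}


scenarios = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
names = to_key_value(scenarios, "name")
ages_by_name = to_key_value(scenarios, "name", "age")
```

Note that in the dictionary form, repeated values of `field` would overwrite one another, so the mapping is only lossless when the key field is unique.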
to_list(flatten=False, remove_none=False, unzipped=False) list[list][source]

Convert the results to a list of lists.

Parameters:
  • flatten – Whether to flatten the list of lists.

  • remove_none – Whether to remove None values from the list.

>>> from edsl.results import Results
>>> Results.example().select('how_feeling', 'how_feeling_yesterday')
Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK']}, {'answer.how_feeling_yesterday': ['Great', 'Good', 'OK', 'Terrible']}])
>>> Results.example().select('how_feeling', 'how_feeling_yesterday').to_list()
[('OK', 'Great'), ('Great', 'Good'), ('Terrible', 'OK'), ('OK', 'Terrible')]
>>> r = Results.example()
>>> r.select('how_feeling').to_list()
['OK', 'Great', 'Terrible', 'OK']
>>> from edsl.results.Dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}]).select('a.b').to_list(flatten = True)
[1, 9, 2, 3, 4]
>>> from edsl.results.Dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}, {'c': [6, 2, 3, 4]}]).select('a.b', 'c').to_list(flatten = True)
Traceback (most recent call last):
...
ValueError: Cannot flatten a list of lists when there are multiple columns selected.
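The three behaviours in the examples above (a flat list for one column, tuples for several columns, and one-level flattening) can be mimicked in plain Python. The sketch below works on bare lists of column values rather than a Dataset and is only an approximation of the documented behaviour; the function is a stand-in, not the library method.

```python
def to_list(columns, flatten=False):
    """columns is a list of value lists, one per selected column."""
    if len(columns) > 1:
        if flatten:
            raise ValueError(
                "Cannot flatten a list of lists when there are multiple columns selected."
            )
        # Several columns: pair the values up row by row.
        return list(zip(*columns))
    values = list(columns[0])
    if not flatten:
        return values
    flat = []
    for v in values:
        # Flatten one level only: list entries are spliced in, scalars kept.
        if isinstance(v, list):
            flat.extend(v)
        else:
            flat.append(v)
    return flat


pairs = to_list([["OK", "Great"], ["Great", "Good"]])
flat = to_list([[[1, 9], 2, 3, 4]], flatten=True)
```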
to_pandas(remove_prefix: bool = False, lists_as_strings=False) DataFrame[source]

Convert the results to a pandas DataFrame, ensuring that lists remain as lists.

Parameters:

remove_prefix – Whether to remove the prefix from the column names.

to_scenario_list(remove_prefix: bool = True) list[dict][source]

Convert the results to a ScenarioList containing one Scenario per row.

Parameters:

remove_prefix – Whether to remove the prefix from the column names.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_scenario_list()
ScenarioList([Scenario({'how_feeling': 'OK'}), Scenario({'how_feeling': 'Great'}), Scenario({'how_feeling': 'Terrible'}), Scenario({'how_feeling': 'OK'})])
transform(field: str, func: Callable, new_name: str | None = None) ScenarioList[source]

Transform a field using a function.

Parameters:
  • field – The field to transform.

  • func – The function to apply to the field.

  • new_name – An optional new name for the transformed field.

>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.transform('b', lambda x: x + 1)
ScenarioList([Scenario({'a': 1, 'b': 3}), Scenario({'a': 1, 'b': 2})])
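The role of new_name can be sketched with plain dictionaries. In this illustration the transformed value is written to new_name when one is given, leaving the original field untouched; that behaviour is an assumption, since the example above only covers the in-place case. The field name `b_plus_one` is made up.

```python
def transform(scenarios, field, func, new_name=None):
    """Apply func to one field of every scenario, returning new dicts."""
    target = new_name or field
    # {**s, target: ...} copies the scenario, so the inputs are not mutated.
    return [{**s, target: func(s[field])} for s in scenarios]


scenarios = [{"a": 1, "b": 2}, {"a": 1, "b": 1}]
in_place = transform(scenarios, "b", lambda x: x + 1)
renamed = transform(scenarios, "b", lambda x: x + 1, new_name="b_plus_one")
```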
tree(node_list: List[str] | None = None) str[source]

Return the ScenarioList as a tree.

Parameters:

node_list – The list of nodes to include in the tree.

unique() ScenarioList[source]

Return a list of unique scenarios.

>>> s = ScenarioList([Scenario({'a': 1}), Scenario({'a': 1}), Scenario({'a': 2})])
>>> s.unique()
ScenarioList([Scenario({'a': 1}), Scenario({'a': 2})])
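A minimal sketch of order-preserving de-duplication over plain dictionaries follows. Dictionaries are not hashable, so this version uses a canonical JSON string as the comparison key; that choice is an assumption made for the illustration (it requires JSON-serializable values) and may differ from how edsl compares scenarios.

```python
import json


def unique(scenarios):
    """Keep the first occurrence of each distinct scenario, in order."""
    seen = set()
    result = []
    for s in scenarios:
        # sort_keys makes {'a': 1, 'b': 2} and {'b': 2, 'a': 1} compare equal.
        key = json.dumps(s, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(s)
    return result


deduplicated = unique([{"a": 1}, {"a": 1}, {"a": 2}])
```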
unpack(field: str, new_names: List[str] | None = None, keep_original=True) ScenarioList[source]

Unpack a field into multiple fields.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': [2, True]}), Scenario({'a': 3, 'b': [3, False]})])
>>> s.unpack('b')
ScenarioList([Scenario({'a': 1, 'b': [2, True], 'b_0': 2, 'b_1': True}), Scenario({'a': 3, 'b': [3, False], 'b_0': 3, 'b_1': False})])
>>> s.unpack('b', new_names=['c', 'd'], keep_original=False)
ScenarioList([Scenario({'a': 1, 'c': 2, 'd': True}), Scenario({'a': 3, 'c': 3, 'd': False})])
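The naming rule visible in the example (default names `b_0`, `b_1`, ... unless new_names is supplied) can be sketched over plain dictionaries as follows. This is an illustration consistent with the outputs above, not the library's implementation.

```python
def unpack(scenarios, field, new_names=None, keep_original=True):
    """Spread a list-valued field across several new fields."""
    result = []
    for s in scenarios:
        values = s[field]
        # Default names are "<field>_<index>", matching the example above.
        names = new_names or [f"{field}_{i}" for i in range(len(values))]
        new = {k: v for k, v in s.items() if keep_original or k != field}
        new.update(zip(names, values))
        result.append(new)
    return result


scenarios = [{"a": 1, "b": [2, True]}, {"a": 3, "b": [3, False]}]
default_names = unpack(scenarios, "b")
custom_names = unpack(scenarios, "b", new_names=["c", "d"], keep_original=False)
```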
unpack_dict(field: str, prefix: str | None = None, drop_field: bool = False) ScenarioList[source]

Unpack a dictionary field into separate fields.

Parameters:
  • field – The field to unpack.

  • prefix – An optional prefix to add to the new fields.

  • drop_field – Whether to drop the original field.

Example:

>>> s = ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}})])
>>> s.unpack_dict('b')
ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}, 'c': 2, 'd': 3})])
>>> s.unpack_dict('b', prefix='new_')
ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}, 'new_c': 2, 'new_d': 3})])
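The examples above do not show drop_field, so the sketch below illustrates all three parameters on plain dictionaries. It assumes drop_field=True simply omits the original dictionary field from the output; that reading follows the parameter description but is not confirmed by an example here.

```python
def unpack_dict(scenarios, field, prefix=None, drop_field=False):
    """Promote the keys of a dict-valued field to top-level fields."""
    result = []
    for s in scenarios:
        # Copy everything, optionally leaving out the field being unpacked.
        new = {k: v for k, v in s.items() if not (drop_field and k == field)}
        for key, value in s[field].items():
            new[f"{prefix}{key}" if prefix else key] = value
        result.append(new)
    return result


scenarios = [{"a": 1, "b": {"c": 2, "d": 3}}]
plain = unpack_dict(scenarios, "b")
prefixed = unpack_dict(scenarios, "b", prefix="new_")
dropped = unpack_dict(scenarios, "b", drop_field=True)
```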
unpivot(id_vars: List[str] | None = None, value_vars: List[str] | None = None) ScenarioList[source]

Unpivot the ScenarioList, allowing for id variables to be specified.

Parameters:
  • id_vars – Fields to use as identifier variables (kept in each entry).

  • value_vars – Fields to unpivot. If None, all fields not in id_vars will be used.

Example:

>>> s = ScenarioList([
...     Scenario({'id': 1, 'year': 2020, 'a': 10, 'b': 20}),
...     Scenario({'id': 2, 'year': 2021, 'a': 15, 'b': 25})
... ])
>>> s.unpivot(id_vars=['id', 'year'], value_vars=['a', 'b'])
ScenarioList([Scenario({'id': 1, 'year': 2020, 'variable': 'a', 'value': 10}), Scenario({'id': 1, 'year': 2020, 'variable': 'b', 'value': 20}), Scenario({'id': 2, 'year': 2021, 'variable': 'a', 'value': 15}), Scenario({'id': 2, 'year': 2021, 'variable': 'b', 'value': 25})])
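The wide-to-long reshaping performed by unpivot can be sketched in plain Python over ordinary dictionaries. The version below reproduces the `variable` / `value` output fields from the example and the documented default for value_vars; it is an illustration, not edsl's implementation.

```python
def unpivot(rows, id_vars=None, value_vars=None):
    """Emit one output row per (input row, unpivoted field) pair."""
    id_vars = id_vars or []
    long_rows = []
    for row in rows:
        # Default: unpivot every field that is not an identifier.
        fields = value_vars if value_vars is not None else [
            k for k in row if k not in id_vars
        ]
        for name in fields:
            new = {k: row[k] for k in id_vars}
            new["variable"] = name
            new["value"] = row[name]
            long_rows.append(new)
    return long_rows


rows = [
    {"id": 1, "year": 2020, "a": 10, "b": 20},
    {"id": 2, "year": 2021, "a": 15, "b": 25},
]
long_rows = unpivot(rows, id_vars=["id", "year"], value_vars=["a", "b"])
```

Two input rows with two value fields each yield four output rows, one per original cell.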

class edsl.scenarios.ScenarioList.ScenarioListMixin[source]

Bases: ScenarioListPdfMixin, ScenarioListExportMixin

html(filename: str | None = None, cta: str = 'Open in browser', return_link: bool = False)[source]
num_observations()[source]

Return the number of observations in the dataset.

>>> from edsl.results.Results import Results
>>> Results.example().num_observations()
4
print_long()[source]

Print the results in a long format.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').print_long()
answer.how_feeling: OK
answer.how_feeling: Great
answer.how_feeling: Terrible
answer.how_feeling: OK

relevant_columns(data_type: str | None = None, remove_prefix=False) list[source]

Return the set of keys that are present in the dataset.

Parameters:
  • data_type – The data type to filter by.

  • remove_prefix – Whether to remove the prefix from the column names.

>>> from edsl.results.Dataset import Dataset
>>> d = Dataset([{'a.b':[1,2,3,4]}])
>>> d.relevant_columns()
['a.b']
>>> d.relevant_columns(remove_prefix=True)
['b']
>>> d = Dataset([{'a':[1,2,3,4]}, {'b':[5,6,7,8]}])
>>> d.relevant_columns()
['a', 'b']
>>> from edsl.results import Results; Results.example().select('how_feeling', 'how_feeling_yesterday').relevant_columns()
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> from edsl.results import Results
>>> sorted(Results.example().select().relevant_columns(data_type = "model"))
['model.frequency_penalty', ...]
>>> Results.example().relevant_columns(data_type = "flimflam")
Traceback (most recent call last):
...
ValueError: No columns found for data type: flimflam. Available data types are: ...
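The filtering logic suggested by these examples can be sketched as follows: column names take the form `<data_type>.<name>`, data_type selects on the part before the first dot, and remove_prefix strips it. This plain-Python version works on a list of names rather than a Dataset; the column `model.temperature` is included purely for illustration.

```python
def relevant_columns(names, data_type=None, remove_prefix=False):
    """Filter column names by their prefix and optionally strip it."""
    if data_type is None:
        selected = list(names)
    else:
        selected = [n for n in names if n.split(".", 1)[0] == data_type]
        if not selected:
            available = sorted({n.split(".", 1)[0] for n in names if "." in n})
            raise ValueError(
                f"No columns found for data type: {data_type}. "
                f"Available data types are: {available}"
            )
    if remove_prefix:
        # Names without a dot are returned unchanged.
        selected = [n.split(".", 1)[-1] for n in selected]
    return selected


names = ["answer.how_feeling", "model.temperature", "model.frequency_penalty"]
model_columns = sorted(relevant_columns(names, data_type="model"))
bare_names = relevant_columns(names, remove_prefix=True)
```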
sql(query: str, transpose: bool = None, transpose_by: str = None, remove_prefix: bool = True) pd.DataFrame | str[source]

Execute a SQL query and return the results as a DataFrame.

Args:

  • query: The SQL query to execute

  • shape: The shape of the data in the database (wide or long)

  • remove_prefix: Whether to remove the prefix from the column names

  • transpose: Whether to transpose the DataFrame

  • transpose_by: The column to use as the index when transposing

  • csv: Whether to return the DataFrame as a CSV string

  • to_list: Whether to return the results as a list

  • to_latex: Whether to return the results as LaTeX

  • filename: Optional filename to save the results to

Returns:

DataFrame, CSV string, list, or LaTeX string depending on parameters
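To show the general idea of querying tabular results with SQL, the sketch below loads rows into an in-memory SQLite table and runs a query against it using only the standard library. It is not edsl's implementation: the helper `query_rows` is hypothetical, the table name `self` is an assumption, and the result is a list of tuples rather than a DataFrame.

```python
import sqlite3


def query_rows(rows, query, table="self"):
    """Load dict rows into an in-memory SQLite table and run a query."""
    columns = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    try:
        column_defs = ", ".join(f'"{c}"' for c in columns)
        conn.execute(f'CREATE TABLE "{table}" ({column_defs})')
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(
            f'INSERT INTO "{table}" VALUES ({placeholders})',
            [tuple(row[c] for c in columns) for row in rows],
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


rows = [
    {"how_feeling": "OK"},
    {"how_feeling": "Great"},
    {"how_feeling": "Terrible"},
    {"how_feeling": "OK"},
]
counts = query_rows(
    rows,
    "SELECT how_feeling, COUNT(*) FROM self GROUP BY how_feeling ORDER BY how_feeling",
)
```

The column names here have no prefix, which corresponds to the remove_prefix=True default in the signature above; dotted names such as `answer.how_feeling` would need quoting in the SQL text.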

tally(*fields: str | None, top_n: int | None = None, output='Dataset') dict | Dataset[source]

Tally the values of a field or perform a cross-tab of multiple fields.

Parameters:

fields – The field(s) to tally, multiple fields for cross-tabulation.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').tally('answer.how_feeling', output = "dict")
{'OK': 2, 'Great': 1, 'Terrible': 1}
>>> from edsl.results.Dataset import Dataset
>>> expected = Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible']}, {'count': [2, 1, 1]}])
>>> r.select('how_feeling').tally('answer.how_feeling', output = "Dataset") == expected
True
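Both uses of tally (counting one field, or cross-tabulating several) reduce to counting keys, which can be sketched with `collections.Counter`. This plain-Python version returns a dictionary ordered by descending count, in line with the `output = "dict"` example above; the treatment of top_n and the tuple keys used for cross-tabs are assumptions.

```python
from collections import Counter


def tally(rows, *fields, top_n=None):
    """Count values of one field, or tuples of values for several fields."""
    if len(fields) == 1:
        keys = (row[fields[0]] for row in rows)
    else:
        # Cross-tab: each distinct combination of values is one key.
        keys = (tuple(row[f] for f in fields) for row in rows)
    # most_common sorts by count, keeping first-seen order among ties.
    return dict(Counter(keys).most_common(top_n))


rows = [
    {"how_feeling": "OK", "how_feeling_yesterday": "Great"},
    {"how_feeling": "Great", "how_feeling_yesterday": "Good"},
    {"how_feeling": "Terrible", "how_feeling_yesterday": "OK"},
    {"how_feeling": "OK", "how_feeling_yesterday": "Terrible"},
]
single = tally(rows, "how_feeling")
top_one = tally(rows, "how_feeling", top_n=1)
cross = tally(rows, "how_feeling", "how_feeling_yesterday")
```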
to_agent_list(remove_prefix: bool = True)[source]

Convert the results to an AgentList containing one Agent per row, with the selected columns as traits.

Parameters:

remove_prefix – Whether to remove the prefix from the column names.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_agent_list()
AgentList([Agent(traits = {'how_feeling': 'OK'}), Agent(traits = {'how_feeling': 'Great'}), Agent(traits = {'how_feeling': 'Terrible'}), Agent(traits = {'how_feeling': 'OK'})])
to_csv(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None) FileStore[source]

Export the results to a FileStore instance containing CSV data.

Args:

  • filename: Optional filename for the CSV (defaults to “results.csv”)

  • remove_prefix: Whether to remove the prefix from column names

  • pretty_labels: Dictionary mapping original column names to pretty labels

Returns:

FileStore: Instance containing the CSV data
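The effect of pretty_labels on the exported header can be sketched with the standard `csv` module. The helper below returns the CSV text directly instead of a FileStore, and it is an illustration only; the name `rows_to_csv` and the label "Feeling today" are made up for the example.

```python
import csv
import io


def rows_to_csv(rows, pretty_labels=None):
    """Write dict rows as CSV text, renaming header cells where requested."""
    pretty_labels = pretty_labels or {}
    columns = list(rows[0].keys())
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Columns without an entry in pretty_labels keep their original name.
    writer.writerow([pretty_labels.get(c, c) for c in columns])
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buffer.getvalue()


rows = [{"answer.how_feeling": "OK"}, {"answer.how_feeling": "Great"}]
csv_text = rows_to_csv(rows, pretty_labels={"answer.how_feeling": "Feeling today"})
```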

to_dicts(remove_prefix: bool = True) list[dict][source]

Convert the results to a list of dictionaries.

Parameters:

remove_prefix – Whether to remove the prefix from the column names.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_dicts()
[{'how_feeling': 'OK'}, {'how_feeling': 'Great'}, {'how_feeling': 'Terrible'}, {'how_feeling': 'OK'}]
to_excel(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, sheet_name: str | None = None) FileStore[source]

Export the results to a FileStore instance containing Excel data.

Args:

  • filename: Optional filename for the Excel file (defaults to “results.xlsx”)

  • remove_prefix: Whether to remove the prefix from column names

  • pretty_labels: Dictionary mapping original column names to pretty labels

  • sheet_name: Name of the worksheet (defaults to “Results”)

Returns:

FileStore: Instance containing the Excel data

to_jsonl(filename: str | None = None) FileStore[source]

Export the results to a FileStore instance containing JSONL data.

Args:

filename: Optional filename for the JSONL file (defaults to “results.jsonl”)

Returns:

FileStore: Instance containing the JSONL data

to_list(flatten=False, remove_none=False, unzipped=False) list[list][source]

Convert the results to a list of lists.

Parameters:
  • flatten – Whether to flatten the list of lists.

  • remove_none – Whether to remove None values from the list.

>>> from edsl.results import Results
>>> Results.example().select('how_feeling', 'how_feeling_yesterday')
Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK']}, {'answer.how_feeling_yesterday': ['Great', 'Good', 'OK', 'Terrible']}])
>>> Results.example().select('how_feeling', 'how_feeling_yesterday').to_list()
[('OK', 'Great'), ('Great', 'Good'), ('Terrible', 'OK'), ('OK', 'Terrible')]
>>> r = Results.example()
>>> r.select('how_feeling').to_list()
['OK', 'Great', 'Terrible', 'OK']
>>> from edsl.results.Dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}]).select('a.b').to_list(flatten = True)
[1, 9, 2, 3, 4]
>>> from edsl.results.Dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}, {'c': [6, 2, 3, 4]}]).select('a.b', 'c').to_list(flatten = True)
Traceback (most recent call last):
...
ValueError: Cannot flatten a list of lists when there are multiple columns selected.
to_pandas(remove_prefix: bool = False, lists_as_strings=False) DataFrame[source]

Convert the results to a pandas DataFrame, ensuring that lists remain as lists.

Parameters:

remove_prefix – Whether to remove the prefix from the column names.

to_scenario_list(remove_prefix: bool = True) list[dict][source]

Convert the results to a ScenarioList containing one Scenario per row.

Parameters:

remove_prefix – Whether to remove the prefix from the column names.

>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_scenario_list()
ScenarioList([Scenario({'how_feeling': 'OK'}), Scenario({'how_feeling': 'Great'}), Scenario({'how_feeling': 'Terrible'}), Scenario({'how_feeling': 'OK'})])