Scenarios

A Scenario is a dictionary containing one or more key/value pairs that is used to add data or content to questions in a survey, replacing a parameter in a question with a specific value (e.g., numerical or textual) or content (e.g., an image or PDF). A ScenarioList is a list of Scenario objects.

Purpose

Scenarios allow you to create variations and versions of questions efficiently. For example, we could create a question “How much do you enjoy {{ scenario.activity }}?” and use scenarios to replace the parameter activity with running or reading or other activities. Similarly, we could create a question “What do you see in this image? {{ scenario.image }}” and use scenarios to replace the parameter image with different images.

How it works

Adding scenarios to a question–or to multiple questions at once in a survey–causes it to be administered multiple times, once for each scenario, with the parameter(s) replaced by the value(s) in the scenario. This allows us to administer multiple versions of a question together, either asynchronously (by default) or according to survey rules that we can specify (e.g., skip/stop logic), without having to create each version of a question manually.
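The substitution described above can be pictured in a few lines of plain Python. This is only a sketch of the idea and not EDSL's implementation: the render helper and its regular expression are assumptions introduced here for illustration.

```python
import re

def render(question_text: str, scenario: dict) -> str:
    """Replace each {{ scenario.key }} placeholder with the scenario's value."""
    pattern = re.compile(r"\{\{\s*scenario\.(\w+)\s*\}\}")
    return pattern.sub(lambda m: str(scenario[m.group(1)]), question_text)

question_text = "How much do you enjoy {{ scenario.activity }}?"
scenarios = [{"activity": "running"}, {"activity": "reading"}]

# The question is "administered" once per scenario
rendered = [render(question_text, s) for s in scenarios]
for text in rendered:
    print(text)
```

Each scenario yields its own version of the question text, which is why two scenarios produce two sets of results.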

Metadata

Scenarios are also a convenient way to keep track of metadata or other information relating to a survey that is important to an analysis of the results. For example, say we are using scenarios to parameterize question texts with pieces of {{ scenario.content }} from a dataset. In the scenarios that we create for the content parameter we could also include key/value pairs for metadata about the content, such as the {{ scenario.author }}, {{ scenario.publication_date }}, or {{ scenario.source }}. This will automatically include the data in the survey results but without requiring us to also parameterize the question texts with those fields. This allows us to analyze the responses in the context of the metadata and avoid having to match up the data with the metadata post-survey. Please see more details on this feature in examples below.
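As a rough plain-Python illustration of this point (not EDSL code), the sketch below builds one result row per scenario. Only the content key is inserted into the prompt, while every key of the scenario is carried into the row. The example values and the row layout are hypothetical.

```python
# Hypothetical scenarios: "content" is used in the question, the rest is metadata
scenarios = [
    {"content": "Text of the first excerpt", "author": "Author A", "source": "Source X"},
    {"content": "Text of the second excerpt", "author": "Author B", "source": "Source Y"},
]

question_text = "What is the main topic of this text: {{ scenario.content }}"

rows = []
for scenario in scenarios:
    # Only the content parameter appears in the prompt
    prompt = question_text.replace("{{ scenario.content }}", scenario["content"])
    # Every scenario key, including the metadata, travels with the result row
    row = {f"scenario.{key}": value for key, value in scenario.items()}
    row["prompt"] = prompt
    rows.append(row)

print(rows[0])
```

Because the metadata sits in the same row as the prompt, there is nothing to join back together after the survey.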

Constructing a Scenario

To use a scenario, we start by creating a question that takes a parameter in double braces:

from edsl import QuestionMultipleChoice

q = QuestionMultipleChoice(
  question_name = "enjoy",
  question_text = "How much do you enjoy {{ scenario.activity }}?",
  question_options = ["Not at all", "Somewhat", "Very much"]
)

Next we create a dictionary for a value that will replace the parameter and store it in a Scenario object:

from edsl import Scenario

scenario = Scenario({"activity": "running"})

We can inspect the scenario and see that it consists of the key/value pair that we created:

scenario

This will return:

key | value
activity | running

ScenarioList

If multiple values will be used with a question or survey, we can create a list of Scenario objects that will be passed to the question or survey together. For example, here we create a list of scenarios and inspect them:

from edsl import Scenario

scenarios = [Scenario({"activity": a}) for a in ["running", "reading"]]
scenarios

Output:

[Scenario({'activity': 'running'}), Scenario({'activity': 'reading'})]

Alternatively, we can create a ScenarioList object. A list of scenarios is used in the same way as a ScenarioList; the difference is that a ScenarioList is a class that can be used to create a list of scenarios from a variety of data sources, such as a list, dictionary, a Wikipedia table or a PDF. These special methods are discussed below.

For example, here we create a ScenarioList for the same list as above:

from edsl import Scenario, ScenarioList

scenariolist = ScenarioList(Scenario({"activity": a}) for a in ["running", "reading"])
scenariolist

Output:

activity

running

reading

Special methods for creating scenarios

Special methods are available for creating a Scenario or ScenarioList from various data source types:

  • The constructor method from_pdf() can be used to create a single scenario for a PDF or a scenario list where each page of a PDF is stored as an individual scenario.

  • The constructor method from_directory() can be used to create a scenario list from all files in a directory, where each file is wrapped in a Scenario object with a specified key (default is “content”).

  • The constructor methods from_list(), from_csv(), from_nested_dict() and from_wikipedia_table() will create a scenario list from a list, CSV, nested dictionary or Wikipedia table.

For example, the following code will create the same scenario list as above:

from edsl import ScenarioList

scenariolist = ScenarioList.from_list("activity", ["running", "reading"])

Example of creating a scenario list from files in a directory:

from edsl import ScenarioList, QuestionFreeText

# Create a ScenarioList from all image files in a directory
# Each file will be wrapped in a Scenario with key "content"
scenarios = ScenarioList.from_directory("images_folder/*.png")

# Or specify a custom key name
scenarios = ScenarioList.from_directory("images_folder", key_name="image")

# Create a question that uses the scenario key
q = QuestionFreeText(
    question_name="image_description",
    question_text="Please describe this image: {{ scenario.image }}"
)

# Run the question with the scenarios
results = q.by(scenarios).run()

Examples for each of these methods are provided below, and in this notebook.

Using a scenario

We use a Scenario or ScenarioList by adding it to a question or survey of questions, either when we are constructing questions or when running them. The most common situation is to add a scenario to a question when running it. This is done by passing the Scenario or ScenarioList object to the by() method of a question or survey and then chaining the run() method.

For example, here we call the by() method on the example question created above and pass a scenario list at the same time that we run it:

from edsl import QuestionMultipleChoice, Scenario, ScenarioList, Agent, Model

q = QuestionMultipleChoice(
  question_name = "enjoy",
  question_text = "How much do you enjoy {{ scenario.activity }}?",
  question_options = ["Not at all", "Somewhat", "Very much"]
)

s = ScenarioList(Scenario({"activity":a}) for a in ["running", "sleeping"])

a = Agent(traits = {"persona":"You are a human."})

m = Model("gemini-1.5-flash")

results = q.by(s).by(a).by(m).run()

We can check the results to verify that the scenario has been used correctly:

results.select("activity", "enjoy")

This will print a table of the selected components of the results:

scenario.activity | answer.enjoy
running | Somewhat
sleeping | Very much

Looping

We use the loop() method to add scenarios to a question when constructing the question. This method takes a ScenarioList and returns a list of new questions for each scenario that was passed. We can optionally include the scenario key in the question name as well as the question text. This allows us to control the question names when the new questions are created; otherwise a number is automatically added to the original question name in order to ensure uniqueness. Note that we do not include the scenario. prefix when looping.

For example:

from edsl import QuestionMultipleChoice, ScenarioList

q = QuestionMultipleChoice(
  question_name = "enjoy_{{ scenario.activity }}",
  question_text = "How much do you enjoy {{ scenario.activity }}?",
  question_options = ["Not at all", "Somewhat", "Very much"]
)

activities = ["running", "reading"]

sl = ScenarioList.from_list("activity", activities)

questions = q.loop(sl)

We can inspect the questions to see that they have been created correctly:

questions

This will return:

[Question('multiple_choice', question_name = """enjoy_running""", question_text = """How much do you enjoy running?""", question_options = ['Not at all', 'Somewhat', 'Very much']),
Question('multiple_choice', question_name = """enjoy_reading""", question_text = """How much do you enjoy reading?""", question_options = ['Not at all', 'Somewhat', 'Very much'])]

We can pass the questions to a survey and run it:

from edsl import Survey, Agent

survey = Survey(questions = questions)

a = Agent(traits = {"persona": "You are a human."})

results = survey.by(a).run()

results.select("answer.*")

This will print a table of the response for each question. Note that “activity” is no longer in a separate scenario field; instead, there is a single column for each question that was constructed with the scenarios:

answer.enjoy_reading | answer.enjoy_running
Very much | Somewhat

Note: The loop() method cannot be used with image or PDF scenarios, as these are not evaluated when the question is constructed. Instead, use the by() method to add these types of scenarios when running a survey (see image scenario examples below).
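The difference between looping and by() is that looping renders the templates when the questions are built. A minimal plain-Python sketch of that idea follows. It is not EDSL's implementation: the loop and render helpers work on plain dictionaries, and the numbering format used for names without a placeholder is an assumption.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*scenario\.(\w+)\s*\}\}")

def render(template: str, scenario: dict) -> str:
    """Fill {{ scenario.key }} placeholders from the scenario."""
    return PLACEHOLDER.sub(lambda m: str(scenario[m.group(1)]), template)

def loop(question: dict, scenarios: list) -> list:
    """Build one new question per scenario at construction time."""
    questions = []
    for index, scenario in enumerate(scenarios):
        name = render(question["question_name"], scenario)
        if name == question["question_name"]:
            # No placeholder in the name: append a number to keep names unique
            # (the exact numbering format is an assumption of this sketch).
            name = f"{name}_{index}"
        questions.append({
            **question,
            "question_name": name,
            "question_text": render(question["question_text"], scenario),
        })
    return questions

template = {
    "question_name": "enjoy_{{ scenario.activity }}",
    "question_text": "How much do you enjoy {{ scenario.activity }}?",
    "question_options": ["Not at all", "Somewhat", "Very much"],
}
scenarios = [{"activity": "running"}, {"activity": "reading"}]

questions = loop(template, scenarios)
names = [q["question_name"] for q in questions]
print(names)
```

Since each scenario becomes its own question, the results have one answer column per question instead of a shared scenario column.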

Multiple parameters

We can also create a Scenario for multiple parameters at once:

from edsl import QuestionFreeText, Scenario

q = QuestionFreeText(
  question_name = "counting",
  question_text = "How many {{ scenario.unit }} are in a {{ scenario.distance }}?",
)

scenario = Scenario({"unit": "inches", "distance": "mile"})

results = q.by(scenario).run()

results.select("unit", "distance", "counting")

This will print a table of the selected components of the results:

scenario.unit | scenario.distance | answer.counting
inches | mile | There are 63,360 inches in a mile.

To learn more about constructing surveys, please see the Surveys module.

Scenarios for question options

In the above examples we created scenarios in the question_text. We can also create a Scenario for question_options, e.g., in a multiple choice, checkbox, linear scale or other question type that requires them. Note that we do not include the scenario. prefix when using scenarios for question options.

from edsl import QuestionMultipleChoice, Scenario

q = QuestionMultipleChoice(
  question_name = "capital_of_france",
  question_text = "What is the capital of France?",
  question_options = "{{ scenario.question_options }}"
)

s = Scenario({'question_options': ['Paris', 'London', 'Berlin', 'Madrid']})

results = q.by(s).run()

results.select("answer.*")

Output:

answer.capital_of_france

Paris

Combining Scenarios

We can combine multiple scenarios into a single Scenario object:

from edsl import Scenario

scenario1 = Scenario({"food": "apple"})
scenario2 = Scenario({"drink": "water"})

combined_scenario = scenario1 + scenario2

combined_scenario

This will return:

key | value
food | apple
drink | water

We can also combine ScenarioList objects:

from edsl import Scenario, ScenarioList

scenariolist1 = ScenarioList([Scenario({"food": "apple"}), Scenario({"drink": "water"})])
scenariolist2 = ScenarioList([Scenario({"color": "red"}), Scenario({"shape": "circle"})])

combined_scenariolist = scenariolist1 + scenariolist2

combined_scenariolist

This will return:

food | drink | shape | color
apple | nan | nan | nan
nan | water | nan | nan
nan | nan | nan | red
nan | nan | circle | nan

We can create a cross product of ScenarioList objects (combine the scenarios in each list with each other):

from edsl import Scenario, ScenarioList

scenariolist1 = ScenarioList([Scenario({"food": "apple"}), Scenario({"drink": "water"})])
scenariolist2 = ScenarioList([Scenario({"color": "red"}), Scenario({"shape": "circle"})])

cross_product_scenariolist = scenariolist1 * scenariolist2

cross_product_scenariolist

This will return:

food | drink | shape | color
apple | nan | nan | red
apple | nan | circle | nan
nan | water | nan | red
nan | water | circle | nan
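The three operations in this section can be mimicked with plain dictionaries and lists. The sketch below only illustrates the semantics and is not how EDSL implements the + and * operators.

```python
from itertools import product

list1 = [{"food": "apple"}, {"drink": "water"}]
list2 = [{"color": "red"}, {"shape": "circle"}]

# Scenario + Scenario: merge the key/value pairs into one dictionary
combined = {**{"food": "apple"}, **{"drink": "water"}}

# ScenarioList + ScenarioList: append one list to the other
concatenated = list1 + list2

# ScenarioList * ScenarioList: merge every pairing of a scenario from each list
cross = [{**a, **b} for a, b in product(list1, list2)]

print(combined)
print(len(concatenated), len(cross))
for scenario in cross:
    print(scenario)
```

Keys that are missing from a given scenario are what show up as nan when the list is displayed as a table.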

Concatenating scenarios

There are several ScenarioList methods for concatenating scenarios.

The method concatenate() can be used to concatenate specified fields into a single string field; the default separator is a semicolon:

from edsl import Scenario, ScenarioList

sl = ScenarioList([
  Scenario({"a":1, "b":2, "c":3}),
  Scenario({"a":4, "b":5, "c":6})
])

slc = sl.concatenate(["a", "b"])

slc

This will return:

c | concat_a_b
3 | 1;2
6 | 4;5

We can specify a different separator:

slc = sl.concatenate(["a", "b"], separator = ",")

slc

This will return:

c | concat_a_b
3 | 1,2
6 | 4,5

The method concatenate_to_list() can be used to concatenate specified fields into a single list field:

from edsl import Scenario, ScenarioList

sl = ScenarioList([
  Scenario({"a":1, "b":2, "c":3}),
  Scenario({"a":4, "b":5, "c":6})
])

slc = sl.concatenate_to_list(["a", "b"])

slc

This will return:

c | concat_a_b
3 | [1,2]
6 | [4,5]

The method concatenate_to_set() can be used to concatenate specified fields into a single set field:

from edsl import Scenario, ScenarioList

sl = ScenarioList([
  Scenario({"a":1, "b":2, "c":3}),
  Scenario({"a":4, "b":5, "c":6})
])

slc = sl.concatenate_to_set(["a", "b"])

slc

This will return:

c | concat_a_b
3 | {1,2}
6 | {4,5}
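To make the three variants concrete, here is a plain-Python sketch that reproduces the tables above on lists of dictionaries. The concatenate function and its kind argument are hypothetical names for this illustration and are not part of EDSL.

```python
def concatenate(rows, fields, kind="string", separator=";"):
    """Merge the given fields of each row into a single 'concat_...' field."""
    new_key = "concat_" + "_".join(fields)
    result = []
    for row in rows:
        values = [row[field] for field in fields]
        if kind == "string":
            merged = separator.join(str(value) for value in values)
        elif kind == "list":
            merged = values
        elif kind == "set":
            merged = set(values)
        else:
            raise ValueError(f"unknown kind: {kind}")
        # Keep the untouched fields and add the merged one
        rest = {k: v for k, v in row.items() if k not in fields}
        result.append({**rest, new_key: merged})
    return result

rows = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]

as_string = concatenate(rows, ["a", "b"])
as_list = concatenate(rows, ["a", "b"], kind="list")
as_set = concatenate(rows, ["a", "b"], kind="set")
print(as_string, as_list, as_set, sep="\n")
```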

The method collapse() can be used to collapse a scenario list by grouping on all fields except a specified field:

from edsl import Scenario, ScenarioList

s = ScenarioList([
  Scenario({'category': 'fruit', 'color': 'red', 'item': 'apple'}),
  Scenario({'category': 'fruit', 'color': 'yellow', 'item': 'banana'}),
  Scenario({'category': 'fruit', 'color': 'red', 'item': 'cherry'}),
  Scenario({'category': 'vegetable', 'color': 'green', 'item': 'spinach'})
])

s.collapse('item')

This will return:

category | color | item
fruit | red | ['apple', 'cherry']
fruit | yellow | ['banana']
vegetable | green | ['spinach']
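The grouping behind collapse() can be sketched in plain Python as follows. This is an illustration of the idea on lists of dictionaries, not EDSL's implementation.

```python
def collapse(rows, field):
    """Group rows on every key except `field`, gathering `field` into a list."""
    groups = {}
    for row in rows:
        # The grouping key is made of all other key/value pairs
        key = tuple((k, v) for k, v in row.items() if k != field)
        groups.setdefault(key, []).append(row[field])
    return [{**dict(key), field: values} for key, values in groups.items()]

rows = [
    {"category": "fruit", "color": "red", "item": "apple"},
    {"category": "fruit", "color": "yellow", "item": "banana"},
    {"category": "fruit", "color": "red", "item": "cherry"},
    {"category": "vegetable", "color": "green", "item": "spinach"},
]

collapsed = collapse(rows, "item")
for row in collapsed:
    print(row)
```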

The method from_sqlite() can be used to create a scenario list from a SQLite database. It takes a filepath to the database file and optional parameters table and sql_query.

Creating scenarios from a dataset

There are a variety of methods for creating and working with scenarios generated from datasets and different data types.

Turning results into scenarios

The method to_scenario_list() can be used to turn the results of a survey into a list of scenarios.

Example usage:

Say we have some results from a survey where we asked agents to choose a random number between 1 and 1000:

from edsl import QuestionNumerical, Agent, AgentList

q_random = QuestionNumerical(
  question_name = "random",
  question_text = "Choose a random number between 1 and 1000."
)

agents = AgentList(Agent({"persona":p}) for p in ["Child", "Magician", "Olympic breakdancer"])

results = q_random.by(agents).run()

results.select("persona", "random")

Our results are:

agent.persona | answer.random
Child | 7
Magician | 472
Olympic breakdancer | 529

We can use the to_scenario_list() method to turn components of the results into a list of scenarios to use in a new survey:

scenarios = results.select("persona", "random").to_scenario_list() # excluding other columns of the results
scenarios

We can inspect the scenarios to see that they have been created correctly:

persona | random
Child | 7
Magician | 472
Olympic breakdancer | 529
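Conceptually, this step keeps the selected columns of each result row and uses the short column names as scenario keys. The plain-Python sketch below illustrates that with the values shown above. The to_scenarios helper and the extra.note column are hypothetical and are not EDSL code.

```python
# Result rows as plain dictionaries; "extra.note" is a hypothetical extra column
result_rows = [
    {"agent.persona": "Child", "answer.random": 7, "extra.note": "ignored"},
    {"agent.persona": "Magician", "answer.random": 472, "extra.note": "ignored"},
    {"agent.persona": "Olympic breakdancer", "answer.random": 529, "extra.note": "ignored"},
]

def to_scenarios(rows, columns):
    """Keep the selected columns and drop the prefix before the first dot."""
    return [
        {column.split(".", 1)[1]: row[column] for column in columns}
        for row in rows
    ]

scenarios = to_scenarios(result_rows, ["agent.persona", "answer.random"])
for scenario in scenarios:
    print(scenario)
```

The resulting keys, persona and random, can then be used as parameters in the questions of a follow-up survey.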

PDFs as textual scenarios

The ScenarioList method from_pdf('path/to/pdf') is a convenient way to extract information from large files. It allows you to read in a PDF and automatically create a list of textual scenarios for the pages of the file. Each scenario has the following keys which can be used as parameters in a question or stored as metadata, and renamed as desired: filename, page, text.

If you prefer to create a single Scenario for the entire PDF file, you can use the Scenario.from_pdf('path/to/pdf') method instead.

To use this method with either object, we start by adding a placeholder {{ scenario.text }} to a question text where the text of a PDF or PDF page will be inserted. When the question or survey is run with the PDF scenario or scenario list, the text of the PDF or individual pages will be inserted into the question text at the placeholder.

For example, this code can be used to insert the text of each page of a PDF in a survey of questions:

from edsl import QuestionFreeText, ScenarioList, Survey

# Create a survey of questions parameterized by the {{ text }} of the PDF pages:
q1 = QuestionFreeText(
  question_name = "themes",
  question_text = "Identify the key themes mentioned on this page: {{ scenario.text }}",
)

q2 = QuestionFreeText(
  question_name = "idea",
  question_text = "Identify the most important idea on this page: {{ scenario.text }}",
)

survey = Survey([q1, q2])

scenarios = ScenarioList.from_pdf("path/to/pdf_file.pdf") # modify the filepath

# Run the survey with the pages of the PDF as scenarios:
results = survey.by(scenarios).run()

# To print the page and text of each PDF page scenario together with the answers to the question:
results.select("page", "text", "answer.*")

Examples of this method can be viewed in a demo notebook.

Image scenarios

A Scenario can be generated from an image by passing the filepath as the value. This is done by using the FileStore module to store the image and then passing the FileStore object to a Scenario.

Example usage:

from edsl import Scenario, FileStore

fs = FileStore("parrot_logo.png") # modify filepath

s = Scenario({"image":fs})

We can add the key to questions as we do with scenarios from other data sources:

from edsl import Model, QuestionFreeText, QuestionList, Survey

m = Model("gemini-1.5-flash") # we need to use a vision model

q1 = QuestionFreeText(
  question_name = "identify",
  question_text = "What animal is in this picture: {{ scenario.image }}"
)

q2 = QuestionList(
  question_name = "colors",
  question_text = "What colors do you see in this picture: {{ scenario.image }}"
)

survey = Survey([q1, q2])

results = survey.by(s).by(m).run()

results.select("identify", "colors")

Output using the Expected Parrot logo:

answer.identify | answer.colors
The animal in the picture is a parrot. | ['gray', 'green', 'yellow', 'pink', 'blue', 'black']

See the demo notebook in the documentation for an example of this method.

Note: You must use a vision model in order to run questions with images. We recommend testing whether a model can reliably identify your images before running a survey with them. You can also check the model pricing page to see available models’ performance with test questions, including images.

Creating a scenario list from a list

The ScenarioList method from_list() creates a list of scenarios for a specified key and list of values that is passed to it.

Example usage:

from edsl import ScenarioList

scenariolist = ScenarioList.from_list("item", ["color", "food", "animal"])

scenariolist

This will return:

item

color

food

animal

Creating a scenario list from a dictionary

The ScenarioList method from_nested_dict() creates a list of scenarios for a specified key and nested dictionary.

Example usage:

from edsl import ScenarioList

d = {"item": ["color", "food", "animal"]}

scenariolist = ScenarioList.from_nested_dict(d)
scenariolist

This will return:

item

color

food

animal

Creating a scenario list from a Wikipedia table

The ScenarioList method from_wikipedia_table('url') can be used to create a list of scenarios from a Wikipedia table.

Example usage:

from edsl import ScenarioList

scenarios = ScenarioList.from_wikipedia("https://en.wikipedia.org/wiki/1990s_in_film", 3)
scenarios

This will return a list of scenarios for the table at the specified index on the Wikipedia page:

Rank | Title | Studios | Worldwide gross | Year
1 | Titanic | Paramount Pictures/20th Century Fox | $1,843,201,268 | 1997
2 | Star Wars: Episode I - The Phantom Menace | 20th Century Fox | $924,317,558 | 1999
3 | Jurassic Park | Universal Pictures | $914,691,118 | 1993
4 | Independence Day | 20th Century Fox | $817,400,891 | 1996
5 | The Lion King | Walt Disney Studios | $763,455,561 | 1994
6 | Forrest Gump | Paramount Pictures | $677,387,716 | 1994
7 | The Sixth Sense | Walt Disney Studios | $672,806,292 | 1999
8 | The Lost World: Jurassic Park | Universal Pictures | $618,638,999 | 1997
9 | Men in Black | Sony Pictures/Columbia Pictures | $589,390,539 | 1997
10 | Armageddon | Walt Disney Studios | $553,709,788 | 1998
11 | Terminator 2: Judgment Day | TriStar Pictures | $519,843,345 | 1991
12 | Ghost | Paramount Pictures | $505,702,588 | 1990
13 | Aladdin | Walt Disney Studios | $504,050,219 | 1992
14 | Twister | Warner Bros./Universal Pictures | $494,471,524 | 1996
15 | Toy Story 2 | Walt Disney Studios | $485,015,179 | 1999
16 | Saving Private Ryan | DreamWorks Pictures/Paramount Pictures | $481,840,909 | 1998
17 | Home Alone | 20th Century Fox | $476,684,675 | 1990
18 | The Matrix | Warner Bros. | $463,517,383 | 1999
19 | Pretty Woman | Walt Disney Studios | $463,406,268 | 1990
20 | Mission: Impossible | Paramount Pictures | $457,696,359 | 1996
21 | Tarzan | Walt Disney Studios | $448,191,819 | 1999
22 | Mrs. Doubtfire | 20th Century Fox | $441,286,195 | 1993
23 | Dances with Wolves | Orion Pictures | $424,208,848 | 1990
24 | The Mummy | Universal Pictures | $415,933,406 | 1999
25 | The Bodyguard | Warner Bros. | $411,006,740 | 1992
26 | Robin Hood: Prince of Thieves | Warner Bros. | $390,493,908 | 1991
27 | Godzilla | TriStar Pictures | $379,014,294 | 1998
28 | True Lies | 20th Century Fox | $378,882,411 | 1994
29 | Toy Story | Walt Disney Studios | $373,554,033 | 1995
30 | There’s Something About Mary | 20th Century Fox | $369,884,651 | 1998
31 | The Fugitive | Warner Bros. | $368,875,760 | 1993
32 | Die Hard with a Vengeance | 20th Century Fox/Cinergi Pictures | $366,101,666 | 1995
33 | Notting Hill | PolyGram Filmed Entertainment | $363,889,678 | 1999
34 | A Bug’s Life | Walt Disney Studios | $363,398,565 | 1998
35 | The World Is Not Enough | Metro-Goldwyn-Mayer Pictures | $361,832,400 | 1999
36 | Home Alone 2: Lost in New York | 20th Century Fox | $358,994,850 | 1992
37 | American Beauty | DreamWorks Pictures | $356,296,601 | 1999
38 | Apollo 13 | Universal Pictures/Imagine Entertainment | $355,237,933 | 1995
39 | Basic Instinct | TriStar Pictures | $352,927,224 | 1992
40 | GoldenEye | MGM/United Artists | $352,194,034 | 1995
41 | The Mask | New Line Cinema | $351,583,407 | 1994
42 | Speed | 20th Century Fox | $350,448,145 | 1994
43 | Deep Impact | Paramount Pictures/DreamWorks Pictures | $349,464,664 | 1998
44 | Beauty and the Beast | Walt Disney Studios | $346,317,207 | 1991
45 | Pocahontas | Walt Disney Studios | $346,079,773 | 1995
46 | The Flintstones | Universal Pictures | $341,631,208 | 1994
47 | Batman Forever | Warner Bros. | $336,529,144 | 1995
48 | The Rock | Walt Disney Studios | $335,062,621 | 1996
49 | Tomorrow Never Dies | MGM/United Artists | $333,011,068 | 1997
50 | Seven | New Line Cinema | $327,311,859 | 1995

The parameters let us know the keys that can be used in the question text or stored as metadata. (They can be edited as needed - e.g., using the rename method discussed above.)

scenarios.parameters

This will return:

{'Rank', 'Ref.', 'Studios', 'Title', 'Worldwide gross', 'Year'}

The scenarios can be used to ask questions about the data in the table:

from edsl import QuestionList

q_leads = QuestionList(
  question_name = "leads",
  question_text = "Who are the lead actors or actresses in {{ scenario.Title }}?"
)

results = q_leads.by(scenarios).run()

(
  results
  .sort_by("Title")
  .select("Title", "leads")
)

Output:

Title | Leads
A Bug’s Life | Dave Foley, Kevin Spacey, Julia Louis-Dreyfus, Hayden Panettiere, Phyllis Diller, Richard Kind, David Hyde Pierce
Aladdin | Mena Massoud, Naomi Scott, Will Smith
American Beauty | Kevin Spacey, Annette Bening, Thora Birch, Mena Suvari, Wes Bentley, Chris Cooper
Apollo 13 | Tom Hanks, Kevin Bacon, Bill Paxton
Armageddon | Bruce Willis, Billy Bob Thornton, Liv Tyler, Ben Affleck
Basic Instinct | Michael Douglas, Sharon Stone
Batman Forever | Val Kilmer, Tommy Lee Jones, Jim Carrey, Nicole Kidman, Chris O’Donnell
Beauty and the Beast | Emma Watson, Dan Stevens, Luke Evans, Kevin Kline, Josh Gad
Dances with Wolves | Kevin Costner, Mary McDonnell, Graham Greene, Rodney A. Grant
Deep Impact | Téa Leoni, Morgan Freeman, Elijah Wood, Robert Duvall
Die Hard with a Vengeance | Bruce Willis, Samuel L. Jackson, Jeremy Irons
Forrest Gump | Tom Hanks, Robin Wright, Gary Sinise, Mykelti Williamson, Sally Field
Ghost | Patrick Swayze, Demi Moore, Whoopi Goldberg
Godzilla | Matthew Broderick, Jean Reno, Bryan Cranston, Aaron Taylor-Johnson, Elizabeth Olsen, Kyle Chandler, Vera Farmiga, Millie Bobby Brown
GoldenEye | Pierce Brosnan, Sean Bean, Izabella Scorupco, Famke Janssen
Home Alone | Macaulay Culkin, Joe Pesci, Daniel Stern, Catherine O’Hara, John Heard
Home Alone 2: Lost in New York | Macaulay Culkin, Joe Pesci, Daniel Stern, Catherine O’Hara, John Heard
Independence Day | Will Smith, Bill Pullman, Jeff Goldblum
Jurassic Park | Sam Neill, Laura Dern, Jeff Goldblum, Richard Attenborough
Men in Black | Tommy Lee Jones, Will Smith
Mission: Impossible | Tom Cruise, Ving Rhames, Simon Pegg, Rebecca Ferguson, Jeremy Renner
Mrs. Doubtfire | Robin Williams, Sally Field, Pierce Brosnan, Lisa Jakub, Matthew Lawrence, Mara Wilson
Notting Hill | Julia Roberts, Hugh Grant
Pocahontas | Irene Bedard, Mel Gibson, Judy Kuhn, David Ogden Stiers, Russell Means, Christian Bale
Pretty Woman | Richard Gere, Julia Roberts
Robin Hood: Prince of Thieves | Kevin Costner, Morgan Freeman, Mary Elizabeth Mastrantonio, Christian Slater, Alan Rickman
Saving Private Ryan | Tom Hanks, Matt Damon, Tom Sizemore, Edward Burns, Barry Pepper, Adam Goldberg, Vin Diesel, Giovanni Ribisi, Jeremy Davies
Seven | Brad Pitt, Morgan Freeman, Gwyneth Paltrow
Speed | Keanu Reeves, Sandra Bullock, Dennis Hopper
Star Wars: Episode I - The Phantom Menace | Liam Neeson, Ewan McGregor, Natalie Portman, Jake Lloyd
Tarzan | Johnny Weissmuller, Maureen O’Sullivan
Terminator 2: Judgment Day | Arnold Schwarzenegger, Linda Hamilton, Edward Furlong, Robert Patrick
The Bodyguard | Kevin Costner, Whitney Houston
The Flintstones | John Goodman, Elizabeth Perkins, Rick Moranis, Rosie O’Donnell
The Fugitive | Harrison Ford, Tommy Lee Jones
The Lion King | Matthew Broderick, James Earl Jones, Jeremy Irons, Moira Kelly, Nathan Lane, Ernie Sabella, Rowan Atkinson, Whoopi Goldberg
The Lost World: Jurassic Park | Jeff Goldblum, Julianne Moore, Pete Postlethwaite
The Mask | Jim Carrey, Cameron Diaz
The Matrix | Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss
The Mummy | Brendan Fraser, Rachel Weisz, John Hannah, Arnold Vosloo
The Rock | Sean Connery, Nicolas Cage, Ed Harris
The Sixth Sense | Bruce Willis, Haley Joel Osment, Toni Collette, Olivia Williams
The World Is Not Enough | Pierce Brosnan, Sophie Marceau, Denise Richards, Robert Carlyle
There’s Something About Mary | Cameron Diaz, Ben Stiller, Matt Dillon
Titanic | Leonardo DiCaprio, Kate Winslet
Tomorrow Never Dies | Pierce Brosnan, Michelle Yeoh, Jonathan Pryce, Teri Hatcher
Toy Story | Tom Hanks, Tim Allen
Toy Story 2 | Tom Hanks, Tim Allen, Joan Cusack
True Lies | Arnold Schwarzenegger, Jamie Lee Curtis
Twister | Helen Hunt, Bill Paxton

Creating a scenario list from a CSV

The ScenarioList method from_csv('<filepath>.csv') creates a list of scenarios from a CSV file. The method reads the CSV file and creates a scenario for each row in the file, with the keys as the column names and the values as the row values.

For example, say we have a CSV file containing the following data:

message,user,source,date
I can't log in...,Alice,Customer support,2022-01-01
I need help with my bill...,Bob,Phone,2022-01-02
I have a safety concern...,Charlie,Email,2022-01-03
I need help with a product...,David,Chat,2022-01-04

We can create a list of scenarios from the CSV file:

from edsl import ScenarioList

scenariolist = ScenarioList.from_csv("path/to/file.csv") # update filepath
scenariolist

This will return a scenario for each row:

message | user | source | date
I can't log in... | Alice | Customer support | 2022-01-01
I need help with my bill... | Bob | Phone | 2022-01-02
I have a safety concern... | Charlie | Email | 2022-01-03
I need help with a product... | David | Chat | 2022-01-04
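The row-to-scenario mapping can be illustrated with Python's standard csv module. This is a sketch of the idea using an in-memory copy of the example data and is not how from_csv() is implemented.

```python
import csv
import io

# In-memory copy of the example file, so the sketch needs no file on disk
csv_text = """message,user,source,date
I can't log in...,Alice,Customer support,2022-01-01
I need help with my bill...,Bob,Phone,2022-01-02
I have a safety concern...,Charlie,Email,2022-01-03
I need help with a product...,David,Chat,2022-01-04
"""

# Each row becomes one dictionary keyed by the column names of the header row
reader = csv.DictReader(io.StringIO(csv_text))
scenarios = [dict(row) for row in reader]

for scenario in scenarios:
    print(scenario)
```

Note that every value read this way is a string, including the dates.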

If the scenario keys are not valid Python identifiers, we can use the give_valid_names() method to convert them to valid identifiers.

For example, our CSV file might contain a header row consisting of question texts:

"What is the message?","Who is the user?","What is the source?","What is the date?"
"I can't log in...","Alice","Customer support","2022-01-01"
"I need help with my bill...","Bob","Phone","2022-01-02"
"I have a safety concern...","Charlie","Email","2022-01-03"
"I need help with a product...","David","Chat","2022-01-04"

We can create a list of scenarios from the CSV file:

from edsl import ScenarioList

scenariolist = ScenarioList.from_csv("path/to/file.csv") # update filepath
scenariolist

This will return scenarios with non-Pythonic identifiers:

What is the message? | Who is the user? | What is the source? | What is the date?
I can't log in... | Alice | Customer support | 2022-01-01
I need help with my bill... | Bob | Phone | 2022-01-02
I have a safety concern... | Charlie | Email | 2022-01-03
I need help with a product... | David | Chat | 2022-01-04

We can then use the give_valid_names() method to convert the keys to valid identifiers:

scenariolist = scenariolist.give_valid_names()
scenariolist

This will return scenarios with valid identifiers (removing stop words and using underscores):

message | user | source | date
I can't log in... | Alice | Customer support | 2022-01-01
I need help with my bill... | Bob | Phone | 2022-01-02
I have a safety concern... | Charlie | Email | 2022-01-03
I need help with a product... | David | Chat | 2022-01-04
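One way to picture this kind of renaming is the plain-Python sketch below, which lowercases a key, drops stop words and joins the remaining words with underscores. It is not EDSL's algorithm: the valid_name helper and the small stop-word set are assumptions made for this illustration.

```python
import re

# A small, hypothetical stop-word list chosen for this example
STOP_WORDS = {"what", "who", "is", "the", "a", "an", "of"}

def valid_name(key: str) -> str:
    """Turn an arbitrary header into a valid Python identifier."""
    words = re.findall(r"[a-z0-9]+", key.lower())
    # Drop stop words, but fall back to all words if nothing would remain
    kept = [word for word in words if word not in STOP_WORDS] or words
    name = "_".join(kept)
    if not name or name[0].isdigit():
        name = "_" + name  # identifiers cannot be empty or start with a digit
    return name

headers = [
    "What is the message?",
    "Who is the user?",
    "What is the source?",
    "What is the date?",
]
renamed = [valid_name(header) for header in headers]
print(renamed)
```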

Methods for un/pivoting and grouping scenarios

There are a variety of methods for modifying scenarios and scenario lists.

Unpivoting a scenario list

The ScenarioList method unpivot() can be used to unpivot a scenario list based on one or more specified identifiers. It takes a list of id_vars which are the names of the key/value pairs to keep in each new scenario, and a list of value_vars which are the names of the key/value pairs to unpivot.

For example, say we have a scenario list for the above CSV file:

from edsl import ScenarioList

scenariolist = ScenarioList.from_csv("<filepath>.csv")
scenariolist

We can call the unpivot() method on the scenario list:

scenariolist = scenariolist.unpivot(id_vars = ["user"], value_vars = ["source", "date", "message"])
scenariolist

This will return a list of scenarios with the source, date, and message key/value pairs unpivoted:

user | variable | value
Alice | source | Customer support
Alice | date | 2022-01-01
Alice | message | I can't log in...
Bob | source | Phone
Bob | date | 2022-01-02
Bob | message | I need help with my bill...
Charlie | source | Email
Charlie | date | 2022-01-03
Charlie | message | I have a safety concern...
David | source | Chat
David | date | 2022-01-04
David | message | I need help with a product...
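The reshaping performed by unpivoting can be sketched in plain Python on a list of dictionaries. This illustrates the wide-to-long transformation only and is not EDSL's implementation.

```python
def unpivot(rows, id_vars, value_vars):
    """Emit one (variable, value) row per value_var, keeping the id_vars."""
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_vars}
        for variable in value_vars:
            long_rows.append({**ids, "variable": variable, "value": row[variable]})
    return long_rows

wide = [
    {"message": "I can't log in...", "user": "Alice",
     "source": "Customer support", "date": "2022-01-01"},
    {"message": "I need help with my bill...", "user": "Bob",
     "source": "Phone", "date": "2022-01-02"},
]

long_rows = unpivot(wide, id_vars=["user"], value_vars=["source", "date", "message"])
for row in long_rows:
    print(row)
```

Two wide rows with three value variables each become six long rows.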

Pivoting a scenario list

We can call the pivot() method to reverse the unpivot operation:

scenariolist = scenariolist.pivot(id_vars = ["user"], var_name="variable", value_name="value")
scenariolist

This will return a list of scenarios with the source, date, and message key/value pairs pivoted back to their original form:

user | source | date | message
Alice | Customer support | 2022-01-01 | I can't log in...
Bob | Phone | 2022-01-02 | I need help with my bill...
Charlie | Email | 2022-01-03 | I have a safety concern...
David | Chat | 2022-01-04 | I need help with a product...
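The long-to-wide direction can be sketched in plain Python as well. As before, this only illustrates the transformation on dictionaries and is not EDSL's implementation.

```python
def pivot(rows, id_vars, var_name="variable", value_name="value"):
    """Collect the (variable, value) pairs of each id into one wide row."""
    wide = {}
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        # Start a new wide row the first time this id combination is seen
        record = wide.setdefault(key, {k: row[k] for k in id_vars})
        record[row[var_name]] = row[value_name]
    return list(wide.values())

long_rows = [
    {"user": "Alice", "variable": "source", "value": "Customer support"},
    {"user": "Alice", "variable": "date", "value": "2022-01-01"},
    {"user": "Alice", "variable": "message", "value": "I can't log in..."},
    {"user": "Bob", "variable": "source", "value": "Phone"},
    {"user": "Bob", "variable": "date", "value": "2022-01-02"},
    {"user": "Bob", "variable": "message", "value": "I need help with my bill..."},
]

wide_rows = pivot(long_rows, id_vars=["user"])
for row in wide_rows:
    print(row)
```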

Grouping scenarios

The group_by() method can be used to group scenarios by one or more specified keys and apply a function to the values of the specified variables.

Example usage:

from edsl import Scenario, ScenarioList

def avg_sum(a, b):
  return {'avg_a': sum(a) / len(a), 'sum_b': sum(b)}

scenariolist = ScenarioList([
  Scenario({'group': 'A', 'year': 2020, 'a': 10, 'b': 20}),
  Scenario({'group': 'A', 'year': 2021, 'a': 15, 'b': 25}),
  Scenario({'group': 'B', 'year': 2020, 'a': 12, 'b': 22}),
  Scenario({'group': 'B', 'year': 2021, 'a': 17, 'b': 27})
])

scenariolist.group_by(id_vars=['group'], variables=['a', 'b'], func=avg_sum)

This will return a list of scenarios with the a and b key/value pairs grouped by the group key and the avg_a and sum_b key/value pairs calculated by the avg_sum function:

group | avg_a | sum_b
A | 12.5 | 45
B | 14.5 | 49
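The grouping step can be reproduced in plain Python to see where the numbers come from. This is a sketch on lists of dictionaries and not EDSL's implementation. It assumes, as in the example above, that the function receives one list of values per variable, in the order the variables are given.

```python
def avg_sum(a, b):
    return {"avg_a": sum(a) / len(a), "sum_b": sum(b)}

def group_by(rows, id_vars, variables, func):
    """Group rows on id_vars and apply func to the per-group value lists."""
    groups = {}
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        columns = groups.setdefault(key, {v: [] for v in variables})
        for variable in variables:
            columns[variable].append(row[variable])
    return [
        {**dict(zip(id_vars, key)), **func(*(columns[v] for v in variables))}
        for key, columns in groups.items()
    ]

rows = [
    {"group": "A", "year": 2020, "a": 10, "b": 20},
    {"group": "A", "year": 2021, "a": 15, "b": 25},
    {"group": "B", "year": 2020, "a": 12, "b": 22},
    {"group": "B", "year": 2021, "a": 17, "b": 27},
]

grouped = group_by(rows, id_vars=["group"], variables=["a", "b"], func=avg_sum)
for row in grouped:
    print(row)
```

For group A the function receives a = [10, 15] and b = [20, 25], giving an average of 12.5 and a sum of 45.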

Data labeling tasks

Scenarios are particularly useful for conducting data labeling or data coding tasks, where the task can be designed as a survey of questions about each piece of data in a dataset.

For example, say we have a dataset of text messages that we want to sort by topic. We can perform this task by using a language model to answer questions such as “What is the primary topic of this message: {{ scenario.message }}?” or “Does this message mention a safety issue? {{ scenario.message }}”, where each text message is inserted in the message placeholder of the question text.

Here we use scenarios to conduct the task:

from edsl import QuestionMultipleChoice, Survey, Scenario, ScenarioList

# Create questions that take a parameter
q1 = QuestionMultipleChoice(
  question_name = "topic",
  question_text = "What is the topic of this message: {{ scenario.message }}?",
  question_options = ["Safety", "Product support", "Billing", "Login issue", "Other"]
)

q2 = QuestionMultipleChoice(
  question_name = "safety",
  question_text = "Does this message mention a safety issue? {{ scenario.message }}",
  question_options = ["Yes", "No", "Unclear"]
)

# Create a list of scenarios for the parameter
messages = [
  "I can't log in...",
  "I need help with my bill...",
  "I have a safety concern...",
  "I need help with a product..."
]

scenarios = ScenarioList(Scenario({"message": message}) for message in messages)

# Create a survey with the questions
survey = Survey(questions = [q1, q2])

# Run the survey with the scenarios
results = survey.by(scenarios).run()

We can then analyze the results to see how the model answered the questions for each scenario:

results.select("message", "safety", "topic")

This will print a table of the scenarios and the answers to the questions for each scenario:

| message | safety | topic |
| --- | --- | --- |
| I can’t log in… | No | Login issue |
| I need help with a product… | No | Product support |
| I need help with my bill… | No | Billing |
| I have a safety concern… | Yes | Safety |
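The way scenarios expand a survey into one prompt per question and scenario can be pictured with a small plain-Python sketch. The render helper is a hypothetical stand-in for template rendering: EDSL uses Jinja syntax, while this sketch uses a regular expression and only builds prompt strings without calling a model.

```python
import re


def render(template, scenario):
    """Replace each {{ scenario.key }} placeholder with the scenario's value."""
    def substitute(match):
        return str(scenario[match.group(1)])
    return re.sub(r"\{\{\s*scenario\.(\w+)\s*\}\}", substitute, template)


question_texts = {
    "topic": "What is the topic of this message: {{ scenario.message }}?",
    "safety": "Does this message mention a safety issue? {{ scenario.message }}",
}

scenarios = [
    {"message": "I can't log in..."},
    {"message": "I have a safety concern..."},
]

# Every question is rendered once per scenario
prompts = [
    (name, render(text, scenario))
    for scenario in scenarios
    for name, text in question_texts.items()
]

for name, prompt in prompts:
    print(f"{name}: {prompt}")
```

Two questions and two scenarios give four prompts, so a labeling task over a dataset of N records with Q questions produces N × Q prompts.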

Adding metadata

If we have metadata about the messages that we want to keep track of, we can add it to the scenarios as well. This will create additional columns for the metadata in the results dataset, but without the need to include it in our question texts. Here we modify the above example to use a dataset of messages with metadata. Note that the question texts are unchanged:

from edsl import QuestionMultipleChoice, Survey, Scenario, ScenarioList

# Create a question with a parameter
q1 = QuestionMultipleChoice(
  question_name = "topic",
  question_text = "What is the topic of this message: {{ scenario.message }}?",
  question_options = ["Safety", "Product support", "Billing", "Login issue", "Other"]
)

q2 = QuestionMultipleChoice(
  question_name = "safety",
  question_text = "Does this message mention a safety issue? {{ scenario.message }}",
  question_options = ["Yes", "No", "Unclear"]
)

# Create scenarios for the sets of parameters
user_messages = [
  {"message": "I can't log in...", "user": "Alice", "source": "Customer support", "date": "2022-01-01"},
  {"message": "I need help with my bill...", "user": "Bob", "source": "Phone", "date": "2022-01-02"},
  {"message": "I have a safety concern...", "user": "Charlie", "source": "Email", "date": "2022-01-03"},
  {"message": "I need help with a product...", "user": "David", "source": "Chat", "date": "2022-01-04"}
]

scenarios = ScenarioList(
    Scenario.from_dict(m) for m in user_messages
)

# Create a survey with the questions
survey = Survey(questions = [q1, q2])

# Run the survey with the scenarios
results = survey.by(scenarios).run()

# Inspect the results
results.select("scenario.*", "answer.*")

We can see how the model answered the questions for each scenario, together with the metadata that was not included in the question texts:

| user | source | message | date | topic | safety |
| --- | --- | --- | --- | --- | --- |
| Alice | Customer support | I can’t log in… | 2022-01-01 | Login issue | No |
| Bob | Phone | I need help with my bill… | 2022-01-02 | Billing | No |
| Charlie | Email | I have a safety concern… | 2022-01-03 | Safety | Yes |
| David | Chat | I need help with a product… | 2022-01-04 | Product support | No |

To learn more about accessing, analyzing and visualizing survey results, please see the Results section.

Slicing/chunking content into scenarios

We can use the Scenario method chunk() to slice a text scenario into a ScenarioList based on num_words or num_lines.

Example usage:

my_haiku = """
This is a long text.
Pages and pages, oh my!
I need to chunk it.
"""

text_scenario = Scenario({"my_text": my_haiku})

word_chunks_scenariolist = text_scenario.chunk(
  "my_text",
  num_words = 5, # use num_words or num_lines but not both
  include_original = True, # optional
  hash_original = True # optional
)
word_chunks_scenariolist

This will return:

| my_text | my_text_chunk | my_text_original |
| --- | --- | --- |
| This is a long text. | 0 | 4aec42eda32b7f32bde8be6a6bc11125 |
| Pages and pages, oh my! | 1 | 4aec42eda32b7f32bde8be6a6bc11125 |
| I need to chunk it. | 2 | 4aec42eda32b7f32bde8be6a6bc11125 |
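As a rough illustration of word-based chunking, the plain-Python sketch below splits a text into groups of num_words words and records each chunk's index. The helper names are hypothetical, and the use of MD5 for the hashed original is an assumption (chosen only because the hash shown above is 32 hexadecimal characters). This is not EDSL's implementation and is not expected to reproduce that exact hash.

```python
import hashlib


def chunk_by_words(text, num_words):
    """Split text on whitespace and regroup it into chunks of num_words words."""
    words = text.split()
    return [" ".join(words[i:i + num_words]) for i in range(0, len(words), num_words)]


def chunk_rows(field, text, num_words, include_original=False, hash_original=False):
    """Build one row per chunk, optionally carrying the original text or its hash."""
    rows = []
    for index, chunk in enumerate(chunk_by_words(text, num_words)):
        row = {field: chunk, f"{field}_chunk": index}
        if include_original:
            # Assumption: MD5 hex digest stands in for the hashed original
            original = hashlib.md5(text.encode("utf-8")).hexdigest() if hash_original else text
            row[f"{field}_original"] = original
        rows.append(row)
    return rows


my_haiku = """
This is a long text.
Pages and pages, oh my!
I need to chunk it.
"""

rows = chunk_rows("my_text", my_haiku, num_words=5, include_original=True, hash_original=True)
for row in rows:
    print(row["my_text_chunk"], row["my_text"])
```

Hashing the original keeps every chunk traceable to its source text without repeating a long document in each row.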

Using f-strings with scenarios

It is possible to use scenarios and f-strings together in a question. An f-string is evaluated when a question is constructed, whereas a scenario is evaluated either when a question is run (using the by method) or when a question is constructed (using the loop method).

For example, here we use an f-string to create different versions of a question that also takes a parameter {{ scenario.activity }}, together with a list of scenarios to replace the parameter when the question is run. We optionally include the f-string in the question name in addition to the question text in order to control the unique identifiers for the questions, which are needed in order to pass the questions that are created to a Survey. (If you do not include the f-string in the question name, a number is automatically appended to each question name to ensure uniqueness.) Then we use the show_prompts() method to examine the user prompts that are created when the scenarios are added to the questions:

from edsl import QuestionFreeText, Scenario, ScenarioList, Survey

questions = []
sentiments = ["enjoy", "hate", "love"]
activities = ["running", "reading"]

for sentiment in sentiments:
  q = QuestionFreeText(
    question_name = f"{ sentiment }_activity",
    question_text = f"How much do you { sentiment } {{{{ scenario.activity }}}}?" # four braces leave a literal {{ ... }} after the f-string is evaluated
  )
  questions.append(q)

scenarios = ScenarioList.from_list("activity", activities)

survey = Survey(questions = questions)
survey.by(scenarios).show_prompts()

The show_prompts method will return the questions created with the f-string with the scenarios added. (Note that the system prompts are blank because we have not created any agents.)

| user_prompt | system_prompt |
| --- | --- |
| How much do you enjoy running? | |
| How much do you hate running? | |
| How much do you love running? | |
| How much do you enjoy reading? | |
| How much do you hate reading? | |
| How much do you love reading? | |

To learn more about user and system prompts, please see the Prompts section.
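One detail worth checking when mixing f-strings with scenario placeholders is brace escaping. In a Python f-string, a doubled brace produces a single literal brace, so a placeholder that must survive as {{ ... }} has to be written with four braces. The short sketch below uses only standard Python and shows the difference:

```python
sentiment = "enjoy"

# Two braces in an f-string collapse to ONE literal brace,
# so the result no longer contains a {{ ... }} placeholder
too_few = f"How much do you {sentiment} {{ scenario.activity }}?"

# Four braces collapse to the two literal braces a Jinja-style template expects
correct = f"How much do you {sentiment} {{{{ scenario.activity }}}}?"

print(too_few)   # How much do you enjoy { scenario.activity }?
print(correct)   # How much do you enjoy {{ scenario.activity }}?
```

An alternative that avoids the escaping altogether is to build the text with ordinary string concatenation, keeping the placeholder in a plain (non-f) string.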

Scenario class

A dictionary-like object that stores key-value pairs for parameterizing questions.

A Scenario inherits from both the EDSL Base class and Python’s UserDict, allowing it to function as a dictionary while providing additional functionality. Scenarios are used to parameterize questions by providing variable data that can be referenced within question templates using Jinja syntax.

Scenarios can be created directly with dictionary data or constructed from various sources using class methods (from_file, from_url, from_pdf, etc.). They support operations like addition (combining scenarios) and multiplication (creating cross products with other scenarios or scenario lists).

Attributes:

data (dict): The underlying dictionary data.
name (str, optional): A name for the scenario.

Examples:

Create a simple scenario:

>>> s = Scenario({"product": "coffee", "price": 4.99})

Combine scenarios:

>>> s1 = Scenario({"product": "coffee"})
>>> s2 = Scenario({"price": 4.99})
>>> s3 = s1 + s2
>>> s3
Scenario({'product': 'coffee', 'price': 4.99})

Create a scenario from a file:

>>> import tempfile
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
...     _ = f.write("Hello World")
...     data_path = f.name
>>> s = Scenario.from_file(data_path, "document")
>>> import os
>>> os.unlink(data_path)  # Clean up temp file
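To illustrate how a dictionary-like class can support addition and cross products, here is a minimal sketch built on Python's UserDict. MiniScenario and cross are hypothetical names, and the rule that the right-hand value wins on a key clash is an assumption of this sketch. It is not the EDSL Scenario class.

```python
from collections import UserDict
from itertools import product


class MiniScenario(UserDict):
    """Hypothetical stand-in for a dictionary-like scenario."""

    def __add__(self, other):
        # Combining merges the keys; on a clash the right-hand value wins (assumption)
        return MiniScenario({**self.data, **other.data})


def cross(left, right):
    """Return every pairwise combination of two lists of scenarios."""
    return [a + b for a, b in product(left, right)]


s3 = MiniScenario({"product": "coffee"}) + MiniScenario({"price": 4.99})
print(dict(s3))

products = [MiniScenario({"product": "coffee"}), MiniScenario({"product": "tea"})]
sizes = [MiniScenario({"size": "small"}), MiniScenario({"size": "large"})]
combos = cross(products, sizes)
print([dict(c) for c in combos])
```

Because UserDict keeps its contents in the data attribute, the subclass behaves like a dictionary everywhere else (indexing, iteration, len) while adding its own operators.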

ScenarioList class

A collection of Scenario objects with advanced operations for manipulation and analysis.

ScenarioList extends Python’s UserList to provide specialized functionality for working with collections of Scenario objects. It inherits from Base to integrate with EDSL’s object model and from ScenarioListOperationsMixin to provide powerful data manipulation capabilities.

The class provides methods for filtering, sorting, joining, transforming, and analyzing collections of Scenarios. It’s designed to work seamlessly with other EDSL components like Surveys, Jobs, and Questions.

Attributes:

data (list): The underlying list of Scenario objects.
codebook (dict): Optional metadata describing the fields in the scenarios.

Examples:

Create a ScenarioList from Scenario objects:

>>> from edsl.scenarios import Scenario, ScenarioList
>>> s1 = Scenario({"product": "apple", "price": 1.99})
>>> s2 = Scenario({"product": "banana", "price": 0.99})
>>> sl = ScenarioList([s1, s2])

Filter scenarios based on a condition:

>>> cheap_fruits = sl.filter("price < 1.50")
>>> len(cheap_fruits)
1
>>> cheap_fruits[0]["product"]
'banana'

Add a new column based on existing data:

>>> sl_with_tax = sl.mutate("tax = price * 0.08")
>>> sl_with_tax[0]["tax"]
0.1592
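The string expressions passed to filter and mutate can be thought of as small formulas evaluated once per scenario, with the scenario's keys available as variable names. The plain-Python sketch below imitates that idea with eval over ordinary dictionaries. The helper names are hypothetical, the parsing is deliberately naive, and EDSL's own evaluation mechanism may differ. Using eval on untrusted input is unsafe, so treat this purely as an illustration.

```python
def evaluate(expression, row):
    """Evaluate an expression with the row's keys as variables and no builtins."""
    return eval(expression.strip(), {"__builtins__": {}}, dict(row))


def filter_rows(rows, expression):
    """Keep the rows for which the expression evaluates to a true value."""
    return [row for row in rows if evaluate(expression, row)]


def mutate_rows(rows, assignment):
    """Add a column from an assignment string such as 'tax = price * 0.08'."""
    new_key, _, expression = assignment.partition("=")
    return [{**row, new_key.strip(): evaluate(expression, row)} for row in rows]


rows = [
    {"product": "apple", "price": 1.99},
    {"product": "banana", "price": 0.99},
]

cheap = filter_rows(rows, "price < 1.50")
with_tax = mutate_rows(rows, "tax = price * 0.08")

print([row["product"] for row in cheap])
print(round(with_tax[0]["tax"], 4))
```

Both helpers return new lists and leave the input rows unchanged, which keeps chained transformations easy to reason about.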