Scenarios
A Scenario is a dictionary containing one or more key/value pairs that is used to add data or content to questions in a survey, replacing a parameter in a question with a specific value (e.g., numerical or textual) or content (e.g., an image or PDF). A ScenarioList is a list of Scenario objects.
Purpose
Scenarios allow you to create variations and versions of questions efficiently. For example, we could create a question “How much do you enjoy {{ scenario.activity }}?” and use scenarios to replace the parameter activity with running or reading or other activities. Similarly, we could create a question “What do you see in this image? {{ scenario.image }}” and use scenarios to replace the parameter image with different images.
How it works
Adding scenarios to a question–or to multiple questions at once in a survey–causes it to be administered multiple times, once for each scenario, with the parameter(s) replaced by the value(s) in the scenario. This allows us to administer different versions of a question together, either asynchronously (by default) or according to survey rules that we can specify (e.g., skip/stop logic), without having to create each version of a question manually.
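As a rough illustration of the substitution described above, the following plain-Python sketch renders one question text per scenario. It does not use EDSL: the render helper and the regular expression are assumptions made for illustration, not the library's implementation.

```python
import re

# Matches placeholders of the form {{ scenario.key }}
PLACEHOLDER = re.compile(r"\{\{\s*scenario\.(\w+)\s*\}\}")

def render(template, scenario):
    """Replace each {{ scenario.key }} with the matching scenario value."""
    return PLACEHOLDER.sub(lambda m: str(scenario[m.group(1)]), template)

template = "How much do you enjoy {{ scenario.activity }}?"
scenarios = [{"activity": "running"}, {"activity": "reading"}]

# One rendered question text per scenario
texts = [render(template, s) for s in scenarios]
for text in texts:
    print(text)
```

The same template is reused for every scenario, which is why no version of the question has to be written by hand.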
Metadata
Scenarios are also a convenient way to keep track of metadata or other information relating to a survey that is important to an analysis of the results. For example, say we are using scenarios to parameterize question texts with pieces of {{ scenario.content }} from a dataset. In the scenarios that we create for the content parameter we could also include key/value pairs for metadata about the content, such as the {{ scenario.author }}, {{ scenario.publication_date }}, or {{ scenario.source }}. This will automatically include the data in the survey results without requiring us to also parameterize the question texts with those fields. This allows us to analyze the responses in the context of the metadata and avoid having to match up the data with the metadata post-survey. Please see more details on this feature in examples below.
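The idea that metadata travels with each response can be pictured with a small plain-Python sketch. Everything here is hypothetical: the data, the stand_in_answer helper (a placeholder for a model response), and the use of str.format in place of the double-brace syntax. It is not how EDSL builds its results.

```python
# Hypothetical scenarios: only "content" appears in the question text;
# "author" and "source" ride along as metadata.
scenarios = [
    {"content": "Prices rose in March.", "author": "Example Author A", "source": "Example Source 1"},
    {"content": "Rainfall was below average.", "author": "Example Author B", "source": "Example Source 2"},
]

template = "What is the main topic of this text? {content}"

def stand_in_answer(prompt):
    """Placeholder for a model response: returns the last word of the prompt."""
    return prompt.rstrip(".").split()[-1]

rows = []
for s in scenarios:
    prompt = template.format(content=s["content"])
    # Every scenario key is kept in the result row, whether or not the prompt used it
    row = {f"scenario.{k}": v for k, v in s.items()}
    row["answer.topic"] = stand_in_answer(prompt)
    rows.append(row)

for row in rows:
    print(row)
```

Because each row already holds the author and source, no join against the original dataset is needed afterwards.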
Constructing a Scenario
To use a scenario, we start by creating a question that takes a parameter in double braces:
from edsl import QuestionMultipleChoice
q = QuestionMultipleChoice(
question_name = "enjoy",
question_text = "How much do you enjoy {{ scenario.activity }}?",
question_options = ["Not at all", "Somewhat", "Very much"]
)
Next we create a dictionary for a value that will replace the parameter and store it in a Scenario object:
from edsl import Scenario
scenario = Scenario({"activity": "running"})
We can inspect the scenario and see that it consists of the key/value pair that we created:
scenario
This will return:
key | value |
---|---|
activity | running |
ScenarioList
If multiple values will be used with a question or survey, we can create a list of Scenario objects that will be passed to the question or survey together. For example, here we create a list of scenarios and inspect them:
from edsl import Scenario
scenarios = [Scenario({"activity": a}) for a in ["running", "reading"]]
scenarios
Output:
[Scenario({'activity': 'running'}), Scenario({'activity': 'reading'})]
Alternatively, we can create a ScenarioList object. A list of scenarios is used in the same way as a ScenarioList; the difference is that a ScenarioList is a class that can be used to create a list of scenarios from a variety of data sources, such as a CSV, dataframe, list, dictionary, Wikipedia table or the pages of a PDF. These special methods are discussed below.
For example, here we create a ScenarioList for the same list as above:
from edsl import Scenario, ScenarioList
scenariolist = ScenarioList(Scenario({"activity": a}) for a in ["running", "reading"])
scenariolist
Output:
activity |
---|
running |
reading |
Special method for creating scenarios
We can use the general purpose from_source() method to create a ScenarioList from a variety of data source types. For example, the following code will create the same scenario list as above:
Each source type has its own set of parameters that can be passed to it:
“csv”
“dataframe”
“delimited_file”
“dict”
“directory”
“dta”
“excel”
“google_doc”
“google_sheet”
“json”
“latex”
“list”
“list_of_tuples”
“pandas”
“parquet”
“pdf”
“png”
“pdf_to_image”
“text”
“tsv”
“sqlite”
“urls”
“wikipedia”
Here we create a scenario list from files in a directory:
from edsl import ScenarioList, QuestionFreeText
# Create a ScenarioList from all image files in a directory
# Each file will be wrapped in a Scenario with key "content"
scenarios = ScenarioList.from_source("directory", "images_folder/*.png")
# Or specify a custom key name (e.g., "image")
scenarios = ScenarioList.from_source("directory", "images_folder/*.png", "image")
# Create a question that uses the scenario key
q = QuestionFreeText(
question_name="image_description",
question_text="Please describe this image: {{ scenario.image }}"
)
# Run the question with the scenarios
results = q.by(scenarios).run()
Examples of these methods are provided below and in this notebook.
Using a scenario
We use a Scenario or ScenarioList by adding it to a question or survey of questions, either when we are constructing questions or when running them. If we add scenarios to a question when running a survey (using the by() method), the scenario contents replace the parameters in the question text at runtime, and are stored in a separate column of the results. If we add scenarios to a question when constructing a survey (using the loop() method), the scenario contents become part of the question text and there is no separate column of the results for the scenarios.
The most common situation is to add a scenario to a question when running it. This is done by passing the Scenario or ScenarioList object to the by() method of a question or survey and then chaining the run() method.
For example, here we call the by() method on the example question created above and pass a scenario list when we run it:
from edsl import QuestionMultipleChoice, Scenario, ScenarioList, Agent, Model
q = QuestionMultipleChoice(
question_name = "enjoy",
question_text = "How much do you enjoy {{ scenario.activity }}?",
question_options = ["Not at all", "Somewhat", "Very much"]
)
s = ScenarioList(Scenario({"activity":a}) for a in ["running", "sleeping"])
a = Agent(traits = {"persona":"You are a human."})
m = Model("gemini-1.5-flash")
results = q.by(s).by(a).by(m).run()
We can check the results to verify that the scenario has been used correctly:
results.select("activity", "enjoy")
This will print a table of the selected components of the results:
scenario.activity | answer.enjoy |
---|---|
running | Somewhat |
sleeping | Very much |
Looping
We use the loop() method to add scenarios to a question when constructing a survey. This method takes a ScenarioList and returns a list of new questions for each scenario that was passed. We can optionally include the scenario key in the question name as well as the question text. This allows us to control the question names when the new questions are created; otherwise a number is automatically added to the original question name in order to ensure uniqueness.
For example:
from edsl import QuestionMultipleChoice, ScenarioList
q = QuestionMultipleChoice(
question_name = "enjoy_{{ scenario.activity }}",
question_text = "How much do you enjoy {{ scenario.activity }}?",
question_options = ["Not at all", "Somewhat", "Very much"]
)
activities = ["running", "reading"]
sl = ScenarioList.from_list("activity", activities)
questions = q.loop(sl)
We can inspect the questions to see that they have been created correctly:
questions
This will return:
[Question('multiple_choice', question_name = """enjoy_running""", question_text = """How much do you enjoy running?""", question_options = ['Not at all', 'Somewhat', 'Very much']),
Question('multiple_choice', question_name = """enjoy_reading""", question_text = """How much do you enjoy reading?""", question_options = ['Not at all', 'Somewhat', 'Very much'])]
We can pass the questions to a survey and run it:
from edsl import Survey, Agent
survey = Survey(questions = questions)
a = Agent(traits = {"persona": "You are a human."})
results = survey.by(a).run()
results.select("answer.*")
This will print a table of the response for each question. Note that “activity” is no longer in a separate scenario field; instead, there is a single column for each question that was constructed with the scenarios:
answer.enjoy_reading | answer.enjoy_running |
---|---|
Very much | Somewhat |
Note: The loop() method cannot be used with image or PDF scenarios, as these are not evaluated when the question is constructed. Instead, use the by() method to add these types of scenarios when running a survey (see image scenario examples below).
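The construction-time substitution performed by loop() can be pictured with the plain-Python sketch below. Questions are modelled as dictionaries, and the render and loop helpers are illustrative assumptions rather than EDSL code.

```python
import re

# Matches placeholders of the form {{ scenario.key }}
PLACEHOLDER = re.compile(r"\{\{\s*scenario\.(\w+)\s*\}\}")

def render(template, scenario):
    """Fill {{ scenario.key }} placeholders from a scenario dictionary."""
    return PLACEHOLDER.sub(lambda m: str(scenario[m.group(1)]), template)

def loop(question, scenarios):
    """Return one new question per scenario with name and text filled in."""
    return [
        {
            **question,
            "question_name": render(question["question_name"], s),
            "question_text": render(question["question_text"], s),
        }
        for s in scenarios
    ]

question = {
    "question_name": "enjoy_{{ scenario.activity }}",
    "question_text": "How much do you enjoy {{ scenario.activity }}?",
    "question_options": ["Not at all", "Somewhat", "Very much"],
}

questions = loop(question, [{"activity": "running"}, {"activity": "reading"}])
for q in questions:
    print(q["question_name"], "->", q["question_text"])
```

After this step the placeholders are gone from the questions, which is consistent with the results having one answer column per question and no separate scenario column.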
Multiple parameters
We can also create a Scenario for multiple parameters at once:
from edsl import QuestionFreeText, Scenario
q = QuestionFreeText(
question_name = "counting",
question_text = "How many {{ scenario.unit }} are in a {{ scenario.distance }}?",
)
scenario = Scenario({"unit": "inches", "distance": "mile"})
results = q.by(scenario).run()
results.select("unit", "distance", "counting")
This will print a table of the selected components of the results:
scenario.unit | scenario.distance | answer.counting |
---|---|---|
inches | mile | There are 63,360 inches in a mile. |
To learn more about constructing surveys, please see the Surveys module.
Scenarios for question options
In the above examples we created scenarios in the question_text. We can also create a Scenario for question_options, e.g., in a multiple choice, checkbox, linear scale or other question type that requires them. Note that we do not include the scenario. prefix when using scenarios for question options.
from edsl import QuestionMultipleChoice, Scenario
q = QuestionMultipleChoice(
question_name = "capital_of_france",
question_text = "What is the capital of France?",
question_options = "{{ scenario.question_options }}"
)
s = Scenario({'question_options': ['Paris', 'London', 'Berlin', 'Madrid']})
results = q.by(s).run()
results.select("answer.*")
Output:
answer.capital_of_france |
---|
Paris |
Scenario methods
There are a variety of methods for working with scenarios and scenario lists, including: concatenate, concatenate_to_list, concatenate_to_set, drop, duplicate, expand, filter, keep, mutate, order_by, rename, sample, shuffle, times, transform, unpack_dict.
These methods can be used to manipulate scenarios and scenario lists in various ways, such as sampling a subset of scenarios, shuffling the order of scenarios, concatenating scenarios together, filtering scenarios based on certain criteria, and more. Examples of some of these methods are provided below.
Combining Scenarios
We can combine multiple scenarios into a single Scenario object:
from edsl import Scenario
scenario1 = Scenario({"food": "apple"})
scenario2 = Scenario({"drink": "water"})
combined_scenario = scenario1 + scenario2
combined_scenario
This will return:
key | value |
---|---|
food | apple |
drink | water |
We can also combine ScenarioList objects:
from edsl import Scenario, ScenarioList
scenariolist1 = ScenarioList([Scenario({"food": "apple"}), Scenario({"drink": "water"})])
scenariolist2 = ScenarioList([Scenario({"color": "red"}), Scenario({"shape": "circle"})])
combined_scenariolist = scenariolist1 + scenariolist2
combined_scenariolist
This will return:
food | drink | color | shape |
---|---|---|---|
apple | nan | nan | nan |
nan | water | nan | nan |
nan | nan | red | nan |
nan | nan | nan | circle |
We can create a cross product of ScenarioList objects (combine the scenarios in each list with each other):
from edsl import Scenario, ScenarioList
scenariolist1 = ScenarioList([Scenario({"food": "apple"}), Scenario({"drink": "water"})])
scenariolist2 = ScenarioList([Scenario({"color": "red"}), Scenario({"shape": "circle"})])
cross_product_scenariolist = scenariolist1 * scenariolist2
cross_product_scenariolist
This will return:
food | drink | color | shape |
---|---|---|---|
apple | nan | red | nan |
apple | nan | nan | circle |
nan | water | red | nan |
nan | water | nan | circle |
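The difference between adding and multiplying scenario lists can be sketched with plain dictionaries and lists. This is only a conceptual picture of the two operations described above, not EDSL's implementation.

```python
from itertools import product

list1 = [{"food": "apple"}, {"drink": "water"}]
list2 = [{"color": "red"}, {"shape": "circle"}]

# "+" behaves like list concatenation: 2 + 2 = 4 scenarios, each unchanged
added = list1 + list2

# "*" pairs every scenario in list1 with every scenario in list2 and merges
# each pair into one scenario: 2 * 2 = 4 scenarios, each with two keys
crossed = [{**a, **b} for a, b in product(list1, list2)]

print(added)
print(crossed)
```

The nan entries in the tables above correspond to keys that a given scenario does not have.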
Concatenating scenarios
There are several ScenarioList methods for concatenating scenarios.
The method concatenate() can be used to concatenate specified fields into a single string field; the default separator is a semicolon:
from edsl import Scenario, ScenarioList
sl = ScenarioList([
Scenario({"a":1, "b":2, "c":3}),
Scenario({"a":4, "b":5, "c":6})
])
slc = sl.concatenate(["a", "b"])
slc
This will return:
c | concat_a_b |
---|---|
3 | 1;2 |
6 | 4;5 |
We can specify a different separator:
slc = sl.concatenate(["a", "b"], separator = ",")
slc
This will return:
c | concat_a_b |
---|---|
3 | 1,2 |
6 | 4,5 |
The method concatenate_to_list() can be used to concatenate specified fields into a single list field:
from edsl import Scenario, ScenarioList
sl = ScenarioList([
Scenario({"a":1, "b":2, "c":3}),
Scenario({"a":4, "b":5, "c":6})
])
slc = sl.concatenate_to_list(["a", "b"])
slc
This will return:
c | concat_a_b |
---|---|
3 | [1,2] |
6 | [4,5] |
The method concatenate_to_set() can be used to concatenate specified fields into a single set field:
from edsl import Scenario, ScenarioList
sl = ScenarioList([
Scenario({"a":1, "b":2, "c":3}),
Scenario({"a":4, "b":5, "c":6})
])
slc = sl.concatenate_to_set(["a", "b"])
slc
This will return:
c | concat_a_b |
---|---|
3 | {1,2} |
6 | {4,5} |
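The three concatenation variants differ only in how the selected values are combined. The plain-Python sketch below makes that explicit; concatenate_fields is a hypothetical helper operating on dictionaries, not an EDSL function.

```python
def concatenate_fields(scenarios, fields, combine):
    """Replace `fields` in each scenario with a single combined field."""
    new_key = "concat_" + "_".join(fields)
    result = []
    for s in scenarios:
        # Keep every key that is not being concatenated
        kept = {k: v for k, v in s.items() if k not in fields}
        kept[new_key] = combine([s[f] for f in fields])
        result.append(kept)
    return result

scenarios = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]

as_string = concatenate_fields(scenarios, ["a", "b"], lambda vals: ";".join(map(str, vals)))
as_list = concatenate_fields(scenarios, ["a", "b"], list)
as_set = concatenate_fields(scenarios, ["a", "b"], set)

print(as_string)
print(as_list)
print(as_set)
```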
The method collapse() can be used to collapse a scenario list by grouping on all fields except a specified field:
from edsl import Scenario, ScenarioList
s = ScenarioList([
Scenario({'category': 'fruit', 'color': 'red', 'item': 'apple'}),
Scenario({'category': 'fruit', 'color': 'yellow', 'item': 'banana'}),
Scenario({'category': 'fruit', 'color': 'red', 'item': 'cherry'}),
Scenario({'category': 'vegetable', 'color': 'green', 'item': 'spinach'})
])
s.collapse('item')
This will return:
category | color | item |
---|---|---|
fruit | red | ['apple', 'cherry'] |
fruit | yellow | ['banana'] |
vegetable | green | ['spinach'] |
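Collapsing can be read as "group on every other key and collect the named field into a list". A minimal plain-Python sketch of that idea follows. The collapse function here is an illustrative stand-in that assumes all grouping values are hashable; it is not the EDSL method itself.

```python
def collapse(scenarios, field):
    """Group on every key except `field`; collect `field` values in a list."""
    groups = {}
    for s in scenarios:
        # The grouping key is the scenario without the collapsed field
        key = tuple((k, v) for k, v in s.items() if k != field)
        groups.setdefault(key, []).append(s[field])
    return [{**dict(key), field: values} for key, values in groups.items()]

scenarios = [
    {"category": "fruit", "color": "red", "item": "apple"},
    {"category": "fruit", "color": "yellow", "item": "banana"},
    {"category": "fruit", "color": "red", "item": "cherry"},
    {"category": "vegetable", "color": "green", "item": "spinach"},
]

collapsed = collapse(scenarios, "item")
for row in collapsed:
    print(row)
```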
The method from_source(“sqlite”) can be used to create a scenario list from a SQLite database. It takes a filepath to the database file and optional parameters table and sql_query.
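To picture what reading scenarios from SQLite involves, the sketch below uses Python's built-in sqlite3 module to turn each row of a table or query into a dictionary. The rows_as_dicts helper, the in-memory database, and its data are all hypothetical. Unlike the method described above, the helper takes an open connection rather than a filepath so that the example runs without a file.

```python
import sqlite3

def rows_as_dicts(connection, table=None, sql_query=None):
    """Return each row of a table (or of a query result) as a dictionary."""
    if sql_query is None:
        if table is None:
            raise ValueError("Provide either table or sql_query")
        # Quote the table name so it is treated as an identifier
        sql_query = 'SELECT * FROM "{}"'.format(table.replace('"', '""'))
    connection.row_factory = sqlite3.Row  # rows become addressable by column name
    return [dict(row) for row in connection.execute(sql_query)]

# In-memory database with made-up data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activities (activity TEXT, minutes INTEGER)")
conn.executemany(
    "INSERT INTO activities VALUES (?, ?)",
    [("running", 30), ("reading", 45)],
)

all_rows = rows_as_dicts(conn, table="activities")
long_rows = rows_as_dicts(
    conn, sql_query="SELECT activity FROM activities WHERE minutes > 40"
)
print(all_rows)
print(long_rows)
```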
Creating scenarios from a dataset
There are a variety of methods for creating and working with scenarios generated from datasets and different data types.
Turning results into scenarios
The method to_scenario_list() can be used to turn the results of a survey into a list of scenarios.
Example usage:
Say we have some results from a survey where we asked agents to choose a random number between 1 and 1000:
from edsl import QuestionNumerical, Agent, AgentList
q_random = QuestionNumerical(
question_name = "random",
question_text = "Choose a random number between 1 and 1000."
)
agents = AgentList(Agent({"persona":p}) for p in ["Child", "Magician", "Olympic breakdancer"])
results = q_random.by(agents).run()
results.select("persona", "random")
Our results are:
agent.persona | answer.random |
---|---|
Child | 7 |
Magician | 472 |
Olympic breakdancer | 529 |
We can use the to_scenario_list() method to turn components of the results into a list of scenarios to use in a new survey:
scenarios = results.select("persona", "random").to_scenario_list() # excluding other columns of the results
scenarios
We can inspect the scenarios to see that they have been created correctly:
persona | random |
---|---|
Child | 7 |
Magician | 472 |
Olympic breakdancer | 529 |
PDFs as textual scenarios
The ScenarioList method from_source(“pdf”, “path/to/pdf”) is a convenient way to extract information from large files. It allows you to read in a PDF and automatically create a list of textual scenarios for the individual pages of the file. Each scenario has the following keys which can be used as parameters in a question or stored as metadata, and renamed as desired: filename, page, text:
from edsl import ScenarioList
scenarios = ScenarioList.from_source("pdf", "path/to/pdf_file.pdf") # modify the filepath
If you prefer to create a single Scenario for the entire PDF file, you can use the FileStore module to pass the file to a Scenario in the usual way (e.g., this method is identical for PNG image files):
from edsl import Scenario, FileStore
fs = FileStore("path/to/pdf") # create a FileStore object for the PDF file (or image file)
scenario = Scenario({"my_pdf": fs}) # pass the FileStore object to a Scenario
To use this method with either object, we start by adding a placeholder {{ scenario.text }} to a question text where the text of a PDF or PDF page will be inserted. When the question or survey is run with the PDF scenario or scenario list, the text of the PDF or individual pages will be inserted into the question text at the placeholder.
For example, this code can be used to insert the text of each page of a PDF in a survey of questions:
from edsl import QuestionFreeText, ScenarioList, Survey
# Create a survey of questions parameterized by the {{ text }} of the PDF pages:
q1 = QuestionFreeText(
question_name = "themes",
question_text = "Identify the key themes mentioned on this page: {{ scenario.text }}",
)
q2 = QuestionFreeText(
question_name = "idea",
question_text = "Identify the most important idea on this page: {{ scenario.text }}",
)
survey = Survey([q1, q2])
scenarios = ScenarioList.from_source("pdf", "path/to/pdf_file.pdf") # modify the filepath
# Run the survey with the pages of the PDF as scenarios:
results = survey.by(scenarios).run()
# To print the page and text of each PDF page scenario together with the answers to the question:
results.select("page", "text", "answer.*")
Examples of this method can be viewed in a demo notebook.
Image scenarios
A Scenario can be generated from an image by passing the filepath as the value (the same as a PDF, as shown above). This is done by using the FileStore module to store the image and then passing the FileStore object to a Scenario.
Example usage:
from edsl import Scenario, FileStore
fs = FileStore("parrot_logo.png") # modify filepath
s = Scenario({"image":fs})
We can add the key to questions as we do scenarios from other data sources:
from edsl import Model, QuestionFreeText, QuestionList, Survey
m = Model("gemini-1.5-flash") # we need to use a vision model
q1 = QuestionFreeText(
question_name = "identify",
question_text = "What animal is in this picture: {{ scenario.image }}"
)
q2 = QuestionList(
question_name = "colors",
question_text = "What colors do you see in this picture: {{ scenario.image }}"
)
survey = Survey([q1, q2])
results = survey.by(s).by(m).run()
results.select("identify", "colors")
Output using the Expected Parrot logo:
answer.identify | answer.colors |
---|---|
The animal in the picture is a parrot. | ['gray', 'green', 'yellow', 'pink', 'blue', 'black'] |
See a demo notebook using this method on the documentation page.
Note: You must use a vision model in order to run questions with images. We recommend testing whether a model can reliably identify your images before running a survey with them. You can also use the models page to check available models’ performance with test questions, including images.
Creating a scenario list from a list
Example usage:
from edsl import ScenarioList
scenariolist = ScenarioList.from_source("list", "item", ["color", "food", "animal"])
scenariolist
This will return:
item |
---|
color |
food |
animal |
Creating a scenario list from a dictionary
Example usage:
from edsl import ScenarioList
d = {"item": ["color", "food", "animal"]}
scenariolist = ScenarioList.from_source("nested_dict", d)
scenariolist
This will return:
item |
---|
color |
food |
animal |
Creating a scenario list from a Wikipedia table
Example usage:
from edsl import ScenarioList
scenarios = ScenarioList.from_source("wikipedia", "https://en.wikipedia.org/wiki/1990s_in_film", 3)
scenarios
This will return a list of scenarios for the selected table on the Wikipedia page:
Rank | Title | Studios | Worldwide gross | Year |
---|---|---|---|---|
1 | Titanic | Paramount Pictures/20th Century Fox | $1,843,201,268 | 1997 |
2 | Star Wars: Episode I - The Phantom Menace | 20th Century Fox | $924,317,558 | 1999 |
3 | Jurassic Park | Universal Pictures | $914,691,118 | 1993 |
4 | Independence Day | 20th Century Fox | $817,400,891 | 1996 |
5 | The Lion King | Walt Disney Studios | $763,455,561 | 1994 |
6 | Forrest Gump | Paramount Pictures | $677,387,716 | 1994 |
7 | The Sixth Sense | Walt Disney Studios | $672,806,292 | 1999 |
8 | The Lost World: Jurassic Park | Universal Pictures | $618,638,999 | 1997 |
9 | Men in Black | Sony Pictures/Columbia Pictures | $589,390,539 | 1997 |
10 | Armageddon | Walt Disney Studios | $553,709,788 | 1998 |
11 | Terminator 2: Judgment Day | TriStar Pictures | $519,843,345 | 1991 |
12 | Ghost | Paramount Pictures | $505,702,588 | 1990 |
13 | Aladdin | Walt Disney Studios | $504,050,219 | 1992 |
14 | Twister | Warner Bros./Universal Pictures | $494,471,524 | 1996 |
15 | Toy Story 2 | Walt Disney Studios | $485,015,179 | 1999 |
16 | Saving Private Ryan | DreamWorks Pictures/Paramount Pictures | $481,840,909 | 1998 |
17 | Home Alone | 20th Century Fox | $476,684,675 | 1990 |
18 | The Matrix | Warner Bros. | $463,517,383 | 1999 |
19 | Pretty Woman | Walt Disney Studios | $463,406,268 | 1990 |
20 | Mission: Impossible | Paramount Pictures | $457,696,359 | 1996 |
21 | Tarzan | Walt Disney Studios | $448,191,819 | 1999 |
22 | Mrs. Doubtfire | 20th Century Fox | $441,286,195 | 1993 |
23 | Dances with Wolves | Orion Pictures | $424,208,848 | 1990 |
24 | The Mummy | Universal Pictures | $415,933,406 | 1999 |
25 | The Bodyguard | Warner Bros. | $411,006,740 | 1992 |
26 | Robin Hood: Prince of Thieves | Warner Bros. | $390,493,908 | 1991 |
27 | Godzilla | TriStar Pictures | $379,014,294 | 1998 |
28 | True Lies | 20th Century Fox | $378,882,411 | 1994 |
29 | Toy Story | Walt Disney Studios | $373,554,033 | 1995 |
30 | There’s Something About Mary | 20th Century Fox | $369,884,651 | 1998 |
31 | The Fugitive | Warner Bros. | $368,875,760 | 1993 |
32 | Die Hard with a Vengeance | 20th Century Fox/Cinergi Pictures | $366,101,666 | 1995 |
33 | Notting Hill | PolyGram Filmed Entertainment | $363,889,678 | 1999 |
34 | A Bug’s Life | Walt Disney Studios | $363,398,565 | 1998 |
35 | The World Is Not Enough | Metro-Goldwyn-Mayer Pictures | $361,832,400 | 1999 |
36 | Home Alone 2: Lost in New York | 20th Century Fox | $358,994,850 | 1992 |
37 | American Beauty | DreamWorks Pictures | $356,296,601 | 1999 |
38 | Apollo 13 | Universal Pictures/Imagine Entertainment | $355,237,933 | 1995 |
39 | Basic Instinct | TriStar Pictures | $352,927,224 | 1992 |
40 | GoldenEye | MGM/United Artists | $352,194,034 | 1995 |
41 | The Mask | New Line Cinema | $351,583,407 | 1994 |
42 | Speed | 20th Century Fox | $350,448,145 | 1994 |
43 | Deep Impact | Paramount Pictures/DreamWorks Pictures | $349,464,664 | 1998 |
44 | Beauty and the Beast | Walt Disney Studios | $346,317,207 | 1991 |
45 | Pocahontas | Walt Disney Studios | $346,079,773 | 1995 |
46 | The Flintstones | Universal Pictures | $341,631,208 | 1994 |
47 | Batman Forever | Warner Bros. | $336,529,144 | 1995 |
48 | The Rock | Walt Disney Studios | $335,062,621 | 1996 |
49 | Tomorrow Never Dies | MGM/United Artists | $333,011,068 | 1997 |
50 | Seven | New Line Cinema | $327,311,859 | 1995 |
The parameters let us know the keys that can be used in the question text or stored as metadata. (They can be edited as needed - e.g., using the rename method discussed above.)
scenarios.parameters
This will return:
{'Rank', 'Ref.', 'Studios', 'Title', 'Worldwide gross', 'Year'}
The scenarios can be used to ask questions about the data in the table:
from edsl import QuestionList
q_leads = QuestionList(
question_name = "leads",
question_text = "Who are the lead actors or actresses in {{ scenario.Title }}?"
)
results = q_leads.by(scenarios).run()
(
results
.sort_by("Title")
.select("Title", "leads")
)
Output:
Title | Leads |
---|---|
A Bug’s Life | Dave Foley, Kevin Spacey, Julia Louis-Dreyfus, Hayden Panettiere, Phyllis Diller, Richard Kind, David Hyde Pierce |
Aladdin | Mena Massoud, Naomi Scott, Will Smith |
American Beauty | Kevin Spacey, Annette Bening, Thora Birch, Mena Suvari, Wes Bentley, Chris Cooper |
Apollo 13 | Tom Hanks, Kevin Bacon, Bill Paxton |
Armageddon | Bruce Willis, Billy Bob Thornton, Liv Tyler, Ben Affleck |
Basic Instinct | Michael Douglas, Sharon Stone |
Batman Forever | Val Kilmer, Tommy Lee Jones, Jim Carrey, Nicole Kidman, Chris O’Donnell |
Beauty and the Beast | Emma Watson, Dan Stevens, Luke Evans, Kevin Kline, Josh Gad |
Dances with Wolves | Kevin Costner, Mary McDonnell, Graham Greene, Rodney A. Grant |
Deep Impact | Téa Leoni, Morgan Freeman, Elijah Wood, Robert Duvall |
Die Hard with a Vengeance | Bruce Willis, Samuel L. Jackson, Jeremy Irons |
Forrest Gump | Tom Hanks, Robin Wright, Gary Sinise, Mykelti Williamson, Sally Field |
Ghost | Patrick Swayze, Demi Moore, Whoopi Goldberg |
Godzilla | Matthew Broderick, Jean Reno, Bryan Cranston, Aaron Taylor-Johnson, Elizabeth Olsen, Kyle Chandler, Vera Farmiga, Millie Bobby Brown |
GoldenEye | Pierce Brosnan, Sean Bean, Izabella Scorupco, Famke Janssen |
Home Alone | Macaulay Culkin, Joe Pesci, Daniel Stern, Catherine O’Hara, John Heard |
Home Alone 2: Lost in New York | Macaulay Culkin, Joe Pesci, Daniel Stern, Catherine O’Hara, John Heard |
Independence Day | Will Smith, Bill Pullman, Jeff Goldblum |
Jurassic Park | Sam Neill, Laura Dern, Jeff Goldblum, Richard Attenborough |
Men in Black | Tommy Lee Jones, Will Smith |
Mission: Impossible | Tom Cruise, Ving Rhames, Simon Pegg, Rebecca Ferguson, Jeremy Renner |
Mrs. Doubtfire | Robin Williams, Sally Field, Pierce Brosnan, Lisa Jakub, Matthew Lawrence, Mara Wilson |
Notting Hill | Julia Roberts, Hugh Grant |
Pocahontas | Irene Bedard, Mel Gibson, Judy Kuhn, David Ogden Stiers, Russell Means, Christian Bale |
Pretty Woman | Richard Gere, Julia Roberts |
Robin Hood: Prince of Thieves | Kevin Costner, Morgan Freeman, Mary Elizabeth Mastrantonio, Christian Slater, Alan Rickman |
Saving Private Ryan | Tom Hanks, Matt Damon, Tom Sizemore, Edward Burns, Barry Pepper, Adam Goldberg, Vin Diesel, Giovanni Ribisi, Jeremy Davies |
Seven | Brad Pitt, Morgan Freeman, Gwyneth Paltrow |
Speed | Keanu Reeves, Sandra Bullock, Dennis Hopper |
Star Wars: Episode I - The Phantom Menace | Liam Neeson, Ewan McGregor, Natalie Portman, Jake Lloyd |
Tarzan | Johnny Weissmuller, Maureen O’Sullivan |
Terminator 2: Judgment Day | Arnold Schwarzenegger, Linda Hamilton, Edward Furlong, Robert Patrick |
The Bodyguard | Kevin Costner, Whitney Houston |
The Flintstones | John Goodman, Elizabeth Perkins, Rick Moranis, Rosie O’Donnell |
The Fugitive | Harrison Ford, Tommy Lee Jones |
The Lion King | Matthew Broderick, James Earl Jones, Jeremy Irons, Moira Kelly, Nathan Lane, Ernie Sabella, Rowan Atkinson, Whoopi Goldberg |
The Lost World: Jurassic Park | Jeff Goldblum, Julianne Moore, Pete Postlethwaite |
The Mask | Jim Carrey, Cameron Diaz |
The Matrix | Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss |
The Mummy | Brendan Fraser, Rachel Weisz, John Hannah, Arnold Vosloo |
The Rock | Sean Connery, Nicolas Cage, Ed Harris |
The Sixth Sense | Bruce Willis, Haley Joel Osment, Toni Collette, Olivia Williams |
The World Is Not Enough | Pierce Brosnan, Sophie Marceau, Denise Richards, Robert Carlyle |
There’s Something About Mary | Cameron Diaz, Ben Stiller, Matt Dillon |
Titanic | Leonardo DiCaprio, Kate Winslet |
Tomorrow Never Dies | Pierce Brosnan, Michelle Yeoh, Jonathan Pryce, Teri Hatcher |
Toy Story | Tom Hanks, Tim Allen |
Toy Story 2 | Tom Hanks, Tim Allen, Joan Cusack |
True Lies | Arnold Schwarzenegger, Jamie Lee Curtis |
Twister | Helen Hunt, Bill Paxton |
Creating a scenario list from a CSV
The ScenarioList method from_source(“csv”, “<filepath>.csv”) creates a list of scenarios from a CSV file. The method reads the CSV file and creates a scenario for each row in the file, with the keys as the column names and the values as the row values.
For example, say we have a CSV file containing the following data:
message,user,source,date
I can't log in...,Alice,Customer support,2022-01-01
I need help with my bill...,Bob,Phone,2022-01-02
I have a safety concern...,Charlie,Email,2022-01-03
I need help with a product...,David,Chat,2022-01-04
We can create a list of scenarios from the CSV file:
from edsl import ScenarioList
scenariolist = ScenarioList.from_source("csv", "path/to/file.csv") # update filepath
scenariolist
This will return a scenario for each row:
message | user | source | date |
---|---|---|---|
I can’t log in… | Alice | Customer support | 2022-01-01 |
I need help with my bill… | Bob | Phone | 2022-01-02 |
I have a safety concern… | Charlie | Email | 2022-01-03 |
I need help with a product… | David | Chat | 2022-01-04 |
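The row-to-scenario mapping described above (column names become keys, each row becomes one scenario) is the same shape that Python's built-in csv.DictReader produces. The sketch below shows it on an in-memory copy of the first two rows. It illustrates the mapping only and is not EDSL's loader.

```python
import csv
import io

csv_text = """message,user,source,date
I can't log in...,Alice,Customer support,2022-01-01
I need help with my bill...,Bob,Phone,2022-01-02
"""

# DictReader uses the header row as keys and yields one dict per data row
scenarios = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]

for s in scenarios:
    print(s)
```

Note that every value is read as a string; any type conversion would be a separate step.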
If the scenario keys are not valid Python identifiers, we can use the give_valid_names() method to convert them to valid identifiers.
For example, our CSV file might contain a header row that is question texts:
"What is the message?","Who is the user?","What is the source?","What is the date?"
"I can't log in...","Alice","Customer support","2022-01-01"
"I need help with my bill...","Bob","Phone","2022-01-02"
"I have a safety concern...","Charlie","Email","2022-01-03"
"I need help with a product...","David","Chat","2022-01-04"
We can create a list of scenarios from the CSV file:
from edsl import ScenarioList
scenariolist = ScenarioList.from_source("csv", "path/to/file.csv") # update filepath
scenariolist
This will return scenarios with non-Pythonic identifiers:
What is the message? | Who is the user? | What is the source? | What is the date? |
---|---|---|---|
I can’t log in… | Alice | Customer support | 2022-01-01 |
I need help with my bill… | Bob | Phone | 2022-01-02 |
I have a safety concern… | Charlie | Email | 2022-01-03 |
I need help with a product… | David | Chat | 2022-01-04 |
We can then use the give_valid_names() method to convert the keys to valid identifiers:
scenariolist = scenariolist.give_valid_names()
scenariolist
This will return scenarios with valid identifiers (removing stop words and using underscores):
message | user | source | date |
---|---|---|---|
I can’t log in… | Alice | Customer support | 2022-01-01 |
I need help with my bill… | Bob | Phone | 2022-01-02 |
I have a safety concern… | Charlie | Email | 2022-01-03 |
I need help with a product… | David | Chat | 2022-01-04 |
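One way to picture the conversion to valid identifiers is the plain-Python sketch below. The stop-word list and the to_valid_name helper are assumptions chosen so that the example headers reduce to the names shown above; the rules EDSL actually applies may differ, and the sketch does not handle two headers that reduce to the same name.

```python
import keyword
import re

# Hypothetical stop-word list; the words EDSL actually removes may differ
STOP_WORDS = {"what", "who", "is", "the", "a", "an", "of"}

def to_valid_name(text):
    """Lowercase, drop stop words and punctuation, join with underscores."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS] or words
    name = "_".join(kept) or "field"
    # Identifiers cannot start with a digit or be a Python keyword
    if name[0].isdigit() or keyword.iskeyword(name):
        name = "_" + name
    return name

headers = [
    "What is the message?",
    "Who is the user?",
    "What is the source?",
    "What is the date?",
]
names = [to_valid_name(h) for h in headers]
print(names)
```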
Methods for un/pivoting and grouping scenarios
There are a variety of methods for modifying scenarios and scenario lists.
Unpivoting a scenario list
The ScenarioList method unpivot() can be used to unpivot a scenario list based on one or more specified identifiers. It takes a list of id_vars which are the names of the key/value pairs to keep in each new scenario, and a list of value_vars which are the names of the key/value pairs to unpivot.
For example, say we have a scenario list for the above CSV file:
from edsl import ScenarioList
scenariolist = ScenarioList.from_source("csv", "<filepath>.csv")
scenariolist
We can call the unpivot() method on the scenario list:
scenariolist = scenariolist.unpivot(id_vars = ["user"], value_vars = ["source", "date", "message"])
scenariolist
This will return a list of scenarios with the source, date, and message key/value pairs unpivoted:
user | variable | value |
---|---|---|
Alice | source | Customer support |
Alice | date | 2022-01-01 |
Alice | message | I can’t log in… |
Bob | source | Phone |
Bob | date | 2022-01-02 |
Bob | message | I need help with my bill… |
Charlie | source | Email |
Charlie | date | 2022-01-03 |
Charlie | message | I have a safety concern… |
David | source | Chat |
David | date | 2022-01-04 |
David | message | I need help with a product… |
Pivoting a scenario list
We can call the pivot() method to reverse the unpivot operation:
scenariolist = scenariolist.pivot(id_vars = ["user"], var_name="variable", value_name="value")
scenariolist
This will return a list of scenarios with the source, date, and message key/value pairs pivoted back to their original form:
user | source | date | message |
---|---|---|---|
Alice | Customer support | 2022-01-01 | I can’t log in… |
Bob | Phone | 2022-01-02 | I need help with my bill… |
Charlie | | 2022-01-03 | I have a safety concern… |
David | Chat | 2022-01-04 | I need help with a product… |
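The reverse ("long to wide") reshape can be sketched the same way with plain dictionaries. The helper name `pivot_rows` is a hypothetical stand-in used for illustration; it only mirrors the parameter names shown above and is not EDSL's implementation.

```python
# Plain-Python sketch of a long-to-wide reshape (hypothetical helper, not EDSL code).
def pivot_rows(long_rows, id_vars, var_name="variable", value_name="value"):
    wide = {}  # identifier tuple -> rebuilt row; dicts preserve insertion order
    for row in long_rows:
        key = tuple(row[k] for k in id_vars)
        wide_row = wide.setdefault(key, {k: row[k] for k in id_vars})
        wide_row[row[var_name]] = row[value_name]  # turn the variable name back into a key
    return list(wide.values())

long_rows = [
    {"user": "Alice", "variable": "source", "value": "Customer support"},
    {"user": "Alice", "variable": "date", "value": "2022-01-01"},
    {"user": "Bob", "variable": "source", "value": "Phone"},
    {"user": "Bob", "variable": "date", "value": "2022-01-02"},
]
wide_rows = pivot_rows(long_rows, id_vars=["user"])
for r in wide_rows:
    print(r)
```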
Grouping scenarios
The group_by() method can be used to group scenarios by one or more specified keys and apply a function to the values of the specified variables.
Example usage:
from edsl import Scenario, ScenarioList
def avg_sum(a, b):
    return {'avg_a': sum(a) / len(a), 'sum_b': sum(b)}
scenariolist = ScenarioList([
    Scenario({'group': 'A', 'year': 2020, 'a': 10, 'b': 20}),
    Scenario({'group': 'A', 'year': 2021, 'a': 15, 'b': 25}),
    Scenario({'group': 'B', 'year': 2020, 'a': 12, 'b': 22}),
    Scenario({'group': 'B', 'year': 2021, 'a': 17, 'b': 27})
])
scenariolist.group_by(id_vars=['group'], variables=['a', 'b'], func=avg_sum)
This will return a list of scenarios with the a and b key/value pairs grouped by the group key and the avg_a and sum_b key/value pairs calculated by the avg_sum function:
group | avg_a | sum_b |
---|---|---|
A | 12.5 | 45 |
B | 14.5 | 49 |
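To make the arithmetic in the table concrete, here is a plain-Python sketch of the grouping step. It assumes that the function receives one list of values per variable for each group; the helper name `group_rows` is hypothetical and this is not EDSL's implementation.

```python
from collections import defaultdict

def avg_sum(a, b):
    return {"avg_a": sum(a) / len(a), "sum_b": sum(b)}

# Hypothetical helper: collect each variable's values per group, then apply func.
def group_rows(rows, id_vars, variables, func):
    groups = defaultdict(lambda: defaultdict(list))
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        for var in variables:
            groups[key][var].append(row[var])
    result = []
    for key, columns in groups.items():
        new_row = dict(zip(id_vars, key))                          # group identifiers
        new_row.update(func(*(columns[var] for var in variables)))  # aggregated values
        result.append(new_row)
    return result

rows = [
    {"group": "A", "year": 2020, "a": 10, "b": 20},
    {"group": "A", "year": 2021, "a": 15, "b": 25},
    {"group": "B", "year": 2020, "a": 12, "b": 22},
    {"group": "B", "year": 2021, "a": 17, "b": 27},
]
grouped = group_rows(rows, id_vars=["group"], variables=["a", "b"], func=avg_sum)
print(grouped)
```

For group A the function sees a = [10, 15] and b = [20, 25], giving an average of 12.5 and a sum of 45, which matches the first row of the table.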
Data labeling tasks
Scenarios are particularly useful for conducting data labeling or data coding tasks, where the task can be designed as a survey of questions about each piece of data in a dataset.
For example, say we have a dataset of text messages that we want to sort by topic. We can perform this task by using a language model to answer questions such as “What is the primary topic of this message: {{ scenario.message }}?” or “Does this message mention a safety issue? {{ scenario.message }}”, where each text message is inserted in the message placeholder of the question text.
Here we use scenarios to conduct the task:
from edsl import QuestionMultipleChoice, Survey, Scenario, ScenarioList
# Create questions that take a parameter
q1 = QuestionMultipleChoice(
    question_name = "topic",
    question_text = "What is the topic of this message: {{ scenario.message }}?",
    question_options = ["Safety", "Product support", "Billing", "Login issue", "Other"]
)
q2 = QuestionMultipleChoice(
    question_name = "safety",
    question_text = "Does this message mention a safety issue? {{ scenario.message }}",
    question_options = ["Yes", "No", "Unclear"]
)
# Create a list of scenarios for the parameter
messages = [
    "I can't log in...",
    "I need help with my bill...",
    "I have a safety concern...",
    "I need help with a product..."
]
scenarios = ScenarioList.from_source("list", "message", messages)
# Create a survey with the questions
survey = Survey(questions = [q1, q2])
# Run the survey with the scenarios
results = survey.by(scenarios).run()
We can then analyze the results to see how the model answered the questions for each scenario:
results.select("message", "safety", "topic")
This will print a table of the scenarios and the answers to the questions for each scenario:
message | safety | topic |
---|---|---|
I can’t log in… | No | Login issue |
I need help with a product… | No | Product support |
I need help with my bill… | No | Billing |
I have a safety concern… | Yes | Safety |
Adding metadata
If we have metadata about the messages that we want to keep track of, we can add it to the scenarios as well. This will create additional columns for the metadata in the results dataset, but without the need to include it in our question texts. Here we modify the above example to use a dataset of messages with metadata. Note that the question texts are unchanged:
from edsl import QuestionMultipleChoice, Survey, Scenario, ScenarioList
# Create questions that take a parameter
q1 = QuestionMultipleChoice(
    question_name = "topic",
    question_text = "What is the topic of this message: {{ scenario.message }}?",
    question_options = ["Safety", "Product support", "Billing", "Login issue", "Other"]
)
q2 = QuestionMultipleChoice(
    question_name = "safety",
    question_text = "Does this message mention a safety issue? {{ scenario.message }}",
    question_options = ["Yes", "No", "Unclear"]
)
# Create scenarios for the sets of parameters
user_messages = [
    {"message": "I can't log in...", "user": "Alice", "source": "Customer support", "date": "2022-01-01"},
    {"message": "I need help with my bill...", "user": "Bob", "source": "Phone", "date": "2022-01-02"},
    {"message": "I have a safety concern...", "user": "Charlie", "source": "Email", "date": "2022-01-03"},
    {"message": "I need help with a product...", "user": "David", "source": "Chat", "date": "2022-01-04"}
]
scenarios = ScenarioList.from_source("dict", user_messages)
# Create a survey with the questions
survey = Survey(questions = [q1, q2])
# Run the survey with the scenarios
results = survey.by(scenarios).run()
# Inspect the results
results.select("scenario.*", "answer.*")
We can see how the model answered the questions for each scenario, together with the metadata that was not included in the question text:
user | source | message | date | topic | safety |
---|---|---|---|---|---|
Alice | Customer support | I can’t log in… | 2022-01-01 | Login issue | No |
Bob | Phone | I need help with my bill… | 2022-01-02 | Billing | No |
Charlie | Email | I have a safety concern… | 2022-01-03 | Safety | Yes |
David | Chat | I need help with a product… | 2022-01-04 | Product support | No |
To learn more about accessing, analyzing and visualizing survey results, please see the Results section.
Slicing/chunking content into scenarios
We can use the Scenario method chunk() to slice a text scenario into a ScenarioList based on num_words or num_lines.
Example usage:
from edsl import Scenario
my_haiku = """
This is a long text.
Pages and pages, oh my!
I need to chunk it.
"""
text_scenario = Scenario({"my_text": my_haiku})
word_chunks_scenariolist = text_scenario.chunk(
    "my_text",
    num_words = 5, # use num_words or num_lines but not both
    include_original = True, # optional
    hash_original = True # optional
)
word_chunks_scenariolist
This will return:
my_text | my_text_chunk | my_text_original |
---|---|---|
This is a long text. | 0 | 4aec42eda32b7f32bde8be6a6bc11125 |
Pages and pages, oh my! | 1 | 4aec42eda32b7f32bde8be6a6bc11125 |
I need to chunk it. | 2 | 4aec42eda32b7f32bde8be6a6bc11125 |
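The word-based chunking shown above can be approximated in plain Python. This is a sketch under stated assumptions: the helper name `chunk_by_words` is hypothetical, splitting is done on whitespace, and the 32-character digest is assumed to be an MD5 hash of the original text. It is not EDSL's implementation.

```python
import hashlib

# Hypothetical helper: split text into fixed-size word chunks and tag each chunk
# with its index and either the original text or an (assumed MD5) hash of it.
def chunk_by_words(text, num_words, hash_original=False):
    words = text.split()
    original = hashlib.md5(text.encode("utf-8")).hexdigest() if hash_original else text
    chunks = []
    for index, start in enumerate(range(0, len(words), num_words)):
        chunks.append({
            "text": " ".join(words[start:start + num_words]),
            "text_chunk": index,
            "text_original": original,
        })
    return chunks

my_haiku = """
This is a long text.
Pages and pages, oh my!
I need to chunk it.
"""
haiku_chunks = chunk_by_words(my_haiku, num_words=5)
hello_chunks = chunk_by_words("Hello World", num_words=1, hash_original=True)
print([c["text"] for c in haiku_chunks])
print(hello_chunks[0]["text_original"])
```

Storing a hash rather than the full original keeps each chunk small while still letting chunks from the same source be matched to one another.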
Using f-strings with scenarios
It is possible to use scenarios and f-strings together in a question. An f-string must be evaluated when a question is constructed, whereas a scenario is either evaluated when a question is run (using the by method) or when a question is constructed (using the loop method).
For example, here we use an f-string to create different versions of a question that also takes a parameter {{ scenario.activity }}, together with a list of scenarios to replace the parameter when the question is run. We optionally include the f-string in the question name in addition to the question text in order to control the unique identifiers for the questions, which are needed in order to pass the questions that are created to a Survey. (If you do not include the f-string in the question name, a number is automatically appended to each question name to ensure uniqueness.) Then we use the show_prompts() method to examine the user prompts that are created when the scenarios are added to the questions:
from edsl import QuestionFreeText, Scenario, ScenarioList, Survey
questions = []
sentiments = ["enjoy", "hate", "love"]
activities = ["running", "reading"]
for sentiment in sentiments:
    q = QuestionFreeText(
        question_name = f"{ sentiment }_activity",
        # Inside an f-string, {{{{ and }}}} produce the literal {{ and }} of the scenario placeholder
        question_text = f"How much do you { sentiment } {{{{ scenario.activity }}}}?"
    )
    questions.append(q)
scenarios = ScenarioList.from_source("list", "activity", activities)
survey = Survey(questions = questions)
survey.by(scenarios).show_prompts()
The show_prompts method will return the questions created with the f-string with the scenarios added. (Note that the system prompts are blank because we have not created any agents.)
user_prompt | system_prompt |
---|---|
How much do you enjoy running? | |
How much do you hate running? | |
How much do you love running? | |
How much do you enjoy reading? | |
How much do you hate reading? | |
How much do you love reading? | |
To learn more about user and system prompts, please see the Prompts section.
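One detail worth checking when mixing the two mechanisms is brace escaping: inside a Python f-string, a doubled brace collapses to a single literal brace, so a placeholder that must survive as {{ ... }} needs four braces on each side. The sketch below shows this in plain Python; the `str.replace` call is only a naive stand-in for template rendering and is an assumption made for illustration.

```python
sentiments = ["enjoy", "hate", "love"]
activities = ["running", "reading"]

# A doubled brace in an f-string yields ONE literal brace.
single = f"{{ scenario.activity }}"

# Four braces on each side leave a {{ ... }} placeholder in the resulting string.
templates = [f"How much do you {s} {{{{ scenario.activity }}}}?" for s in sentiments]

# Naive stand-in for rendering the placeholder once per scenario value.
prompts = [t.replace("{{ scenario.activity }}", a) for a in activities for t in templates]

print(single)
print(templates[0])
print(prompts)
```

The f-string part is fixed when the question is constructed, while the placeholder is filled later, once per scenario.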
Special methods
Special methods are available for generating or modifying scenarios using web searches:
The from_prompt method allows you to create scenarios from a prompt, which can be useful for generating scenarios based on user input or other dynamic sources:
from edsl import ScenarioList
scenarios = ScenarioList.from_prompt(
    description = "What are some popular programming languages?",
    name = "programming_languages", # optional name for the scenarios; default is "item"
    target_number = 5, # optional number of scenarios to generate; default is 10
    verbose = True # optional flag to return verbose output; default is False
)
The from_search_terms method allows you to create scenarios from a list of search terms, which can be useful for generating scenarios based on search queries or other dynamic sources:
from edsl import ScenarioList
search_terms = ["Python", "Java", "JavaScript"]
scenarios = ScenarioList.from_search_terms(search_terms)
The method augment_with_wikipedia allows you to augment scenarios with information from Wikipedia, which can be useful for enriching scenarios with additional context or data:
from edsl import ScenarioList
# method is used to augment existing scenarios
scenarios = ScenarioList.from_prompt(
    description = "What are some popular programming languages?",
    name = "programming_languages"
)
scenarios.augment_with_wikipedia(
    search_key = "programming languages",
    content_only = True, # default optional flag to return only the content
    key_name = "wikipedia_content" # default optional key name for the content
)
Scenario class
- class edsl.scenarios.Scenario(data: Dict[str, Any] | Mapping[str, Any] | None = None, name: str | None = None)[source]
Bases: Base, UserDict
A dictionary-like object that stores key-value pairs for parameterizing questions.
A Scenario inherits from both the EDSL Base class and Python’s UserDict, allowing it to function as a dictionary while providing additional functionality. Scenarios are used to parameterize questions by providing variable data that can be referenced within question templates using Jinja syntax.
Scenarios can be created directly with dictionary data or constructed from various sources using class methods (from_file, from_url, from_pdf, etc.). They support operations like addition (combining scenarios) and multiplication (creating cross products with other scenarios or scenario lists).
- Attributes:
data (dict): The underlying dictionary data.
name (str, optional): A name for the scenario.
- Examples:
Create a simple scenario:
>>> s = Scenario({"product": "coffee", "price": 4.99})
Combine scenarios:
>>> s1 = Scenario({"product": "coffee"})
>>> s2 = Scenario({"price": 4.99})
>>> s3 = s1 + s2
>>> s3
Scenario({'product': 'coffee', 'price': 4.99})
Create a scenario from a file:
>>> import tempfile
>>> with tempfile.NamedTemporaryFile(mode='w', suffix='.txt', delete=False) as f:
...     _ = f.write("Hello World")
...     data_path = f.name
>>> s = Scenario.from_file(data_path, "document")
>>> import os
>>> os.unlink(data_path)  # Clean up temp file
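The addition and multiplication operations described above can be pictured with plain dictionaries: addition behaves like merging key/value pairs, and multiplication like taking every pairing of two collections. This is only an analogy in plain Python, with made-up sample values; it does not use or reproduce the Scenario class itself.

```python
from itertools import product

# "Addition": merge the key/value pairs of two dictionaries.
s1 = {"product": "coffee"}
s2 = {"price": 4.99}
combined = {**s1, **s2}

# "Multiplication": one merged dictionary for every pairing (a cross product).
products = [{"product": "coffee"}, {"product": "tea"}]
prices = [{"price": 4.99}, {"price": 2.50}]
cross = [{**a, **b} for a, b in product(products, prices)]

print(combined)
print(cross)
```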
- __init__(data: Dict[str, Any] | Mapping[str, Any] | None = None, name: str | None = None)[source]
Initialize a new Scenario.
- Args:
- data: A dictionary of key-value pairs for parameterizing questions. Any dictionary-like object that can be converted to a dict is accepted.
- name: An optional name for the scenario to aid in identification.
- Raises:
ScenarioError: If the data cannot be converted to a dictionary.
- Examples:
>>> s = Scenario({"product": "coffee", "price": 4.99})
>>> s = Scenario({"question": "What is your favorite color?"}, name="color_question")
- chunk(field: str, num_words: int | None = None, num_lines: int | None = None, include_original: bool = False, hash_original: bool = False) ScenarioList [source]
Splits a text field into chunks of a specified size, creating a ScenarioList.
This method takes a field containing text and divides it into smaller chunks based on either word count or line count. It’s particularly useful for processing large text documents in manageable pieces, such as for summarization, analysis, or when working with models that have token limits.
- Args:
- field: The key name of the field in the Scenario to split.
- num_words: The number of words to include in each chunk. Mutually exclusive with num_lines.
- num_lines: The number of lines to include in each chunk. Mutually exclusive with num_words.
- include_original: If True, includes the original complete text in each chunk with a “_original” suffix.
- hash_original: If True and include_original is True, stores a hash of the original text instead of the full text.
- Returns:
A ScenarioList containing multiple Scenarios, each with a chunk of the original text. Each Scenario includes the chunk text, chunk index, character count, and word count.
- Raises:
ValueError: If neither num_words nor num_lines is specified, or if both are.
KeyError: If the specified field doesn’t exist in the Scenario.
- Examples:
Split by lines (1 line per chunk):
>>> s = Scenario({"text": "This is a test.\nThis is a test.\n\nThis is a test."})
>>> s.chunk("text", num_lines=1)
ScenarioList([Scenario({'text': 'This is a test.', 'text_chunk': 0, 'text_char_count': 15, 'text_word_count': 4}), Scenario({'text': 'This is a test.', 'text_chunk': 1, 'text_char_count': 15, 'text_word_count': 4}), Scenario({'text': '', 'text_chunk': 2, 'text_char_count': 0, 'text_word_count': 0}), Scenario({'text': 'This is a test.', 'text_chunk': 3, 'text_char_count': 15, 'text_word_count': 4})])
Split by words (2 words per chunk):
>>> s.chunk("text", num_words=2)
ScenarioList([Scenario({'text': 'This is', 'text_chunk': 0, 'text_char_count': 7, 'text_word_count': 2}), Scenario({'text': 'a test.', 'text_chunk': 1, 'text_char_count': 7, 'text_word_count': 2}), Scenario({'text': 'This is', 'text_chunk': 2, 'text_char_count': 7, 'text_word_count': 2}), Scenario({'text': 'a test.', 'text_chunk': 3, 'text_char_count': 7, 'text_word_count': 2}), Scenario({'text': 'This is', 'text_chunk': 4, 'text_char_count': 7, 'text_word_count': 2}), Scenario({'text': 'a test.', 'text_chunk': 5, 'text_char_count': 7, 'text_word_count': 2})])
Include original text in each chunk:
>>> s = Scenario({"text": "Hello World"})
>>> s.chunk("text", num_words=1, include_original=True)
ScenarioList([Scenario({'text': 'Hello', 'text_chunk': 0, 'text_char_count': 5, 'text_word_count': 1, 'text_original': 'Hello World'}), Scenario({'text': 'World', 'text_chunk': 1, 'text_char_count': 5, 'text_word_count': 1, 'text_original': 'Hello World'})])
Use a hash of the original text:
>>> s.chunk("text", num_words=1, include_original=True, hash_original=True)
ScenarioList([Scenario({'text': 'Hello', 'text_chunk': 0, 'text_char_count': 5, 'text_word_count': 1, 'text_original': 'b10a8db164e0754105b7a99be72e3fe5'}), Scenario({'text': 'World', 'text_chunk': 1, 'text_char_count': 5, 'text_word_count': 1, 'text_original': 'b10a8db164e0754105b7a99be72e3fe5'})])
- Notes:
Either num_words or num_lines must be specified, but not both
Each chunk is assigned a sequential index in the ‘text_chunk’ field
Character and word counts for each chunk are included
When include_original is True, the original text is preserved in each chunk
The hash_original option is useful to save space while maintaining traceability
- code() List[str] [source]
Generate Python code to recreate this scenario.
- Returns:
A list of strings representing Python code lines that can be executed to recreate this scenario.
- Examples:
>>> s = Scenario({"name": "Alice", "age": 30})
>>> code_lines = s.code()
>>> print("\n".join(code_lines))
from edsl.scenarios import Scenario
s = Scenario({'name': 'Alice', 'age': 30})
- drop(*args: str | Iterable[str]) Scenario [source]
Drop a subset of keys from a scenario.
This method delegates to ScenarioSelector for the actual dropping logic. It supports both individual string arguments and collection arguments for backward compatibility.
- Args:
- *args: Either a single collection of keys (for backward compatibility)
or individual string arguments for keys to drop.
- Returns:
A new Scenario containing all keys except the dropped ones.
- Raises:
ValueError: If no arguments are provided.
- Examples:
Using a list (backward compatible):
>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.drop(["food"])
Scenario({'drink': 'water'})
Using individual string arguments:
>>> s = Scenario({"food": "wood chips", "drink": "water", "dessert": "cookies"})
>>> s.drop("drink", "dessert")
Scenario({'food': 'wood chips'})
Single string argument:
>>> s.drop("drink")
Scenario({'food': 'wood chips', 'dessert': 'cookies'})
- classmethod example(randomize: bool = False) Scenario [source]
Returns an example Scenario instance.
- Args:
- randomize: If True, adds a random string to the value of the example key
to ensure uniqueness.
- Returns:
A Scenario instance with example data suitable for testing or demonstration.
- Examples:
>>> s = Scenario.example()
>>> 'persona' in s
True
>>> s1 = Scenario.example(randomize=True)
>>> s2 = Scenario.example(randomize=True)
>>> s1.data != s2.data  # Should be different due to randomization
True
- classmethod from_dict(d: Dict[str, Any]) Scenario [source]
Creates a Scenario from a dictionary, with special handling for FileStore objects.
This method creates a Scenario using the provided dictionary. It has special handling for dictionary values that represent serialized FileStore objects, which it will deserialize back into proper FileStore instances.
- Args:
d: A dictionary to convert to a Scenario.
- Returns:
A new Scenario containing the provided dictionary data.
- Examples:
>>> Scenario.from_dict({"food": "wood chips"})
Scenario({'food': 'wood chips'})
>>> # Example with a serialized FileStore
>>> from edsl import FileStore
>>> file_dict = {"path": "example.txt", "base64_string": "SGVsbG8gV29ybGQ="}
>>> s = Scenario.from_dict({"document": file_dict})
>>> isinstance(s["document"], FileStore)
True
- Notes:
Any dictionary values that match the FileStore format will be converted to FileStore objects
The method detects FileStore objects by looking for “base64_string” and “path” keys
EDSL version information is automatically removed by the @remove_edsl_version decorator
This method is commonly used when deserializing scenarios from JSON or other formats
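The detection rule in the notes above (a dictionary value with both “base64_string” and “path” keys) can be sketched in plain Python. The helper name `looks_like_filestore` is a hypothetical illustration of that rule, not the library's code; the decoding step simply shows what the sample payload from the example contains.

```python
import base64

# Hypothetical check mirroring the rule described in the notes:
# a dict value with both "base64_string" and "path" keys is treated as a serialized file.
def looks_like_filestore(value):
    return isinstance(value, dict) and "base64_string" in value and "path" in value

file_dict = {"path": "example.txt", "base64_string": "SGVsbG8gV29ybGQ="}
decoded = base64.b64decode(file_dict["base64_string"]).decode("utf-8")

print(looks_like_filestore(file_dict))
print(looks_like_filestore({"food": "wood chips"}))
print(decoded)
```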
- classmethod from_docx(docx_path: str) Scenario [source]
Creates a Scenario containing text extracted from a Microsoft Word document.
This method extracts text and structure from a DOCX file and creates a Scenario containing this information. It uses the DocxScenario class to handle the extraction process and maintain document structure where possible.
- Args:
docx_path: Path to the DOCX file to extract content from.
- Returns:
A Scenario containing the file path and extracted text from the DOCX file.
- Raises:
FileNotFoundError: If the specified DOCX file does not exist.
ImportError: If the python-docx library is not installed.
- Examples:
>>> from docx import Document
>>> doc = Document()
>>> _ = doc.add_heading("EDSL Survey")
>>> _ = doc.add_paragraph("This is a test.")
>>> doc.save("test.docx")
>>> s = Scenario.from_docx("test.docx")
>>> s
Scenario({'file_path': 'test.docx', 'text': 'EDSL Survey\nThis is a test.'})
>>> import os; os.remove("test.docx")
- Notes:
The returned Scenario typically contains the file path and extracted text
The extraction process attempts to maintain document structure
Requires the python-docx library to be installed
- classmethod from_file(file_path: str, field_name: str) Scenario [source]
Creates a Scenario containing a FileStore object from a file.
This method creates a Scenario with a single key-value pair where the value is a FileStore object that encapsulates the specified file. The FileStore handles appropriate file loading, encoding, and extraction based on the file type.
- Args:
- file_path: Path to the file to be incorporated into the Scenario.
- field_name: Key name to use for storing the FileStore in the Scenario.
- Returns:
A Scenario containing a FileStore object linked to the specified file.
- Raises:
FileNotFoundError: If the specified file does not exist.
- Examples:
>>> import tempfile
>>> with tempfile.NamedTemporaryFile(suffix=".txt", mode="w") as f:
...     _ = f.write("This is a test.")
...     _ = f.flush()
...     s = Scenario.from_file(f.name, "file")
>>> s
Scenario({'file': FileStore(path='...', ...)})
- Notes:
The FileStore object handles various file formats differently
FileStore provides methods to access file content, extract text, and manage file operations appropriate to the file type
- classmethod from_html(url: str, field_name: str | None = None) Scenario [source]
Creates a Scenario containing both HTML content and extracted text from a URL.
This method fetches HTML content from a URL, extracts readable text from it, and creates a Scenario containing the original URL, the raw HTML, and the extracted text. Unlike from_url, this method preserves the raw HTML content.
- Args:
- url: URL to fetch HTML content from.
- field_name: Key name to use for the extracted text in the Scenario. If not provided, defaults to “text”.
- Returns:
A Scenario containing the URL, raw HTML, and extracted text.
- Raises:
requests.exceptions.RequestException: If the URL cannot be accessed.
- Examples:
Create a scenario from HTML content (requires network access):
s = Scenario.from_html("https://example.com")  # Returns a Scenario with "url", "html", and "text" fields
s = Scenario.from_html("https://example.com", field_name="content")  # Returns a Scenario with "url", "html", and "content" fields
- Notes:
Uses BeautifulSoup for HTML parsing when available
Stores both the raw HTML and the extracted text
Provides a more comprehensive representation than from_url
Useful when the HTML structure or specific elements are needed
- classmethod from_image(image_path: str, image_name: str | None = None) Scenario [source]
Creates a Scenario containing an image file as a FileStore object.
This method creates a Scenario with a single key-value pair where the value is a FileStore object that encapsulates the specified image file. The image is stored as a base64-encoded string, allowing it to be easily serialized and transmitted.
- Args:
- image_path: Path to the image file to be incorporated into the Scenario.
- image_name: Key name to use for storing the FileStore in the Scenario. If not provided, uses the filename without extension.
- Returns:
A Scenario containing a FileStore object with the image data.
- Raises:
FileNotFoundError: If the specified image file does not exist.
- Examples:
>>> import os
>>> # Assuming an image file exists
>>> if os.path.exists("image.jpg"):
...     s = Scenario.from_image("image.jpg")
...     s_named = Scenario.from_image("image.jpg", "picture")
- Notes:
The resulting FileStore can be displayed in notebooks or used in questions
Supported image formats include JPG, PNG, GIF, etc.
The image is stored as a base64-encoded string for portability
- classmethod from_pdf(pdf_path: str) Scenario [source]
Creates a Scenario containing text extracted from a PDF file.
This method extracts text and metadata from a PDF file and creates a Scenario containing this information. It uses the PdfExtractor class which provides access to text content, metadata, and structure from PDF files.
- Args:
pdf_path: Path to the PDF file to extract content from.
- Returns:
A Scenario containing extracted text and metadata from the PDF.
- Raises:
FileNotFoundError: If the specified PDF file does not exist.
ImportError: If the required PDF extraction libraries are not installed.
- Examples:
>>> import os
>>> # Assuming a PDF file exists
>>> if os.path.exists("document.pdf"):
...     s = Scenario.from_pdf("document.pdf")
- Notes:
The returned Scenario contains various keys with PDF content and metadata
PDF extraction requires the PyMuPDF library
The extraction process parses the PDF to maintain structure where possible
- classmethod from_pdf_to_image(pdf_path: str, image_format: str = 'jpeg') Scenario [source]
Converts each page of a PDF into an image and creates a Scenario containing them.
This method takes a PDF file, converts each page to an image in the specified format, and creates a Scenario containing the original file path and FileStore objects for each page image. This is particularly useful for visualizing PDF content or for image-based processing of PDF documents.
- Args:
- pdf_path: Path to the PDF file to convert to images.
- image_format: Format of the output images (default is ‘jpeg’). Other formats include ‘png’, ‘tiff’, etc.
- Returns:
A Scenario containing the original PDF file path and FileStore objects for each page image, with keys like “page_0”, “page_1”, etc.
- Raises:
FileNotFoundError: If the specified PDF file does not exist.
ImportError: If pdf2image is not installed.
- Examples:
>>> import os
>>> # Assuming a PDF file exists
>>> if os.path.exists("document.pdf"):
...     s = Scenario.from_pdf_to_image("document.pdf")
...     s_png = Scenario.from_pdf_to_image("document.pdf", "png")
- Notes:
Requires the pdf2image library which depends on poppler
Creates a separate image for each page of the PDF
Images are stored in FileStore objects for easy display and handling
Images are created in a temporary directory which is automatically cleaned up
- classmethod from_url(url: str, field_name: str | None = 'text', testing: bool = False) Scenario [source]
Creates a Scenario from the content of a URL.
This method fetches content from a web URL and creates a Scenario containing the URL and the extracted text. When available, BeautifulSoup is used for better HTML parsing and text extraction, otherwise a basic requests approach is used.
- Args:
- url: The URL to fetch content from.
- field_name: The key name to use for storing the extracted text in the Scenario. Defaults to “text”.
- testing: If True, uses a simplified requests method instead of BeautifulSoup. This is primarily for testing purposes.
- Returns:
A Scenario containing the URL and extracted text.
- Raises:
requests.exceptions.RequestException: If the URL cannot be accessed.
- Examples:
Create a scenario from a URL (requires network access):
s = Scenario.from_url("https://example.com", testing=True)  # Returns a Scenario with "url" and "text" fields
s = Scenario.from_url("https://example.com", field_name="content", testing=True)  # Returns a Scenario with "url" and "content" fields
- Notes:
The method attempts to use BeautifulSoup and fake_useragent for better HTML parsing and to mimic a real browser.
If these packages are not available, it falls back to basic requests.
When using BeautifulSoup, it extracts text from paragraph and heading tags.
- get_filestore_info() Dict[str, Any] [source]
Returns information about FileStore objects present in this Scenario.
This method is useful for determining how many signed URLs need to be generated and what file extensions/types are present before calling save_to_gcs_bucket().
- Returns:
- dict: Information about FileStore objects containing:
total_count: Total number of FileStore objects
filestore_keys: List of scenario keys that contain FileStore objects
file_extensions: Dictionary mapping keys to file extensions
file_types: Dictionary mapping keys to MIME types
is_filestore_scenario: Boolean indicating if this Scenario was created from a FileStore
summary: Human-readable summary of files
- property has_jinja_braces: bool[source]
Return whether the scenario has jinja braces. This matters for rendering.
>>> s = Scenario({"food": "I love {{wood chips}}"})
>>> s.has_jinja_braces
True
- keep(*args: str | Iterable[str]) Scenario [source]
Keep a subset of keys from a scenario (alias for select).
This method delegates to ScenarioSelector for the actual selection logic. It is functionally identical to select() but provides more intuitive naming.
- Args:
- *args: Either a single collection of keys (for backward compatibility)
or individual string arguments for keys to keep.
- Returns:
A new Scenario containing only the kept keys and their values.
- Raises:
KeyError: If any of the specified keys don’t exist in the scenario.
ValueError: If no arguments are provided.
- Examples:
Using a list (backward compatible):
>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.keep(["food"])
Scenario({'food': 'wood chips'})
Using individual string arguments:
>>> s = Scenario({"food": "wood chips", "drink": "water", "dessert": "cookies"})
>>> s.keep("food", "drink")
Scenario({'food': 'wood chips', 'drink': 'water'})
- new_column_names(new_names: List[str]) Scenario [source]
Rename all keys of a scenario using a list of new names.
- Args:
- new_names: A list of new key names. Must have the same length as the
number of keys in the scenario.
- Returns:
A new Scenario with keys renamed according to the provided list.
- Raises:
ValueError: If the length of new_names doesn’t match the number of keys.
- Examples:
>>> s = Scenario({"food": "wood chips"})
>>> s.new_column_names(["food_preference"])
Scenario({'food_preference': 'wood chips'})
- offload(inplace: bool = False) Scenario [source]
Offload base64-encoded content from the scenario by replacing ‘base64_string’ fields with ‘offloaded’. This reduces memory usage.
This method delegates to ScenarioOffloader for the actual offloading logic. It handles three types of base64 content:
1. Direct base64_string in the scenario (from FileStore.to_dict())
2. FileStore objects containing base64 content
3. Dictionary values containing base64_string fields
- Args:
inplace: If True, modify the current scenario. If False, return a new one.
- Returns:
The modified scenario (either self or a new instance).
- Examples:
Basic offloading:
>>> s = Scenario({"base64_string": "SGVsbG8gV29ybGQ=", "name": "test"})
>>> offloaded = s.offload()
>>> offloaded["base64_string"]
'offloaded'
>>> offloaded["name"]
'test'
In-place offloading:
>>> s = Scenario({"base64_string": "SGVsbG8gV29ybGQ=", "name": "test"})
>>> result = s.offload(inplace=True)
>>> result is s
True
>>> s["base64_string"]
'offloaded'
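The replacement that offloading performs can be pictured with plain dictionaries. The sketch below covers only two of the three cases listed above (a top-level base64_string and dictionary values that contain one); the helper name `offload_dict` is hypothetical, FileStore objects are not modelled, and this is not the library's implementation.

```python
# Hypothetical sketch: return a copy of the data with base64 payloads replaced by a marker.
def offload_dict(data):
    result = {}
    for key, value in data.items():
        if key == "base64_string":
            result[key] = "offloaded"                              # top-level payload
        elif isinstance(value, dict) and "base64_string" in value:
            result[key] = {**value, "base64_string": "offloaded"}  # nested payload
        else:
            result[key] = value                                    # everything else is kept
    return result

data = {
    "base64_string": "SGVsbG8gV29ybGQ=",
    "name": "test",
    "document": {"path": "example.txt", "base64_string": "SGVsbG8gV29ybGQ="},
}
slim = offload_dict(data)
print(slim)
```

Because the sketch builds a new dictionary, the input is left untouched, which corresponds to the default inplace=False behaviour described above.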
- open_url(position: int = 0) None [source]
Open a URL field from the scenario in the default web browser.
- Args:
position: The index of the URL to open (0-based). Defaults to 0 for the first URL.
- Raises:
- ValueError: If no URL fields are found in the scenario, or if the position
is out of range.
- Examples:
>>> s = Scenario({"website": "https://example.com", "name": "test"})
>>> s.open_url()  # Opens the first URL found
- rename(old_name_or_replacement_dict: str | Dict[str, str], new_name: str | None = None) Scenario [source]
Rename the keys of a scenario.
- Args:
- old_name_or_replacement_dict: Either a dictionary mapping old keys to new keys,
or a string representing the old key name.
- new_name: The new name for the key. Required if old_name_or_replacement_dict
is a string, ignored if it’s a dictionary.
- Returns:
A new Scenario with renamed keys.
- Raises:
TypeError: If old_name_or_replacement_dict is a string but new_name is None.
- Examples:
Using a dictionary:
>>> s = Scenario({"food": "wood chips"})
>>> s.rename({"food": "food_preference"})
Scenario({'food_preference': 'wood chips'})
Using individual arguments:
>>> s = Scenario({"food": "wood chips"})
>>> s.rename("food", "snack")
Scenario({'snack': 'wood chips'})
- replicate(n: int) ScenarioList [source]
Replicate a scenario n times to return a ScenarioList.
- Args:
n: The number of times to replicate the scenario. Must be non-negative.
- Returns:
A ScenarioList containing n copies of this scenario.
- Raises:
ValueError: If n is negative.
- Examples:
>>> s = Scenario({"food": "wood chips"})
>>> s.replicate(2)
ScenarioList([Scenario({'food': 'wood chips'}), Scenario({'food': 'wood chips'})])
- save_to_gcs_bucket(signed_url_or_dict: str | Dict[str, str]) Dict[str, Any] [source]
Saves FileStore objects contained within this Scenario to a Google Cloud Storage bucket.
This method finds all FileStore objects in the Scenario and uploads them to GCS using the provided signed URL(s). If the Scenario itself was created from a FileStore (has base64_string as a top-level key), it uploads that content directly.
- Args:
signed_url_or_dict: Either:
- str: Single signed URL (for a single FileStore, or a Scenario created from a FileStore)
- dict: Mapping of scenario keys to signed URLs for multiple FileStore objects, e.g., {"video": "signed_url_1", "image": "signed_url_2"}
- Returns:
dict: Summary of upload operations performed
- Raises:
ValueError: If no uploadable content is found or the content is offloaded.
requests.RequestException: If any upload fails.
- select(*args: str | Iterable[str]) Scenario [source]
Select a subset of keys from a scenario.
This method delegates to ScenarioSelector for the actual selection logic. It supports both individual string arguments and collection arguments for backward compatibility.
- Args:
*args: Either a single collection of keys (for backward compatibility) or individual string arguments for keys to select.
- Returns:
A new Scenario containing only the selected keys and their values.
- Raises:
KeyError: If any of the specified keys don’t exist in the scenario.
ValueError: If no arguments are provided.
- Examples:
Using a list (backward compatible):
>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.select(["food"])
Scenario({'food': 'wood chips'})
Using individual string arguments:
>>> s = Scenario({"food": "wood chips", "drink": "water", "dessert": "cookies"})
>>> s.select("food", "drink")
Scenario({'food': 'wood chips', 'drink': 'water'})
Single string argument:
>>> s.select("food")
Scenario({'food': 'wood chips'})
- table(tablefmt: str = 'grid') str [source]
Display a scenario as a formatted table.
- Args:
tablefmt: The table format to use. Common options include "grid", "simple", "pipe", "orgtbl", "rst", "mediawiki", "html", "latex".
- Returns:
A string representation of the scenario formatted as a table.
- Examples:
>>> s = Scenario({"food": "chips", "drink": "water"})
>>> print(s.table("simple"))
key    value
-----  -------
food   chips
drink  water
- to(question_or_survey: 'Question' | 'Survey') Jobs [source]
Send the scenario to a question or survey for execution.
- Args:
question_or_survey: A Question or Survey object to parameterize with this scenario.
- Returns:
A Jobs object that can be run to execute the question or survey with this scenario.
- Examples:
>>> from edsl.questions import QuestionFreeText
>>> s = Scenario({"name": "Alice"})
>>> q = QuestionFreeText(question_name="greeting", question_text="Hello {{name}}")
>>> jobs = s.to(q)
- to_dataset() Dataset [source]
Convert a scenario to a dataset.
>>> s = Scenario({"food": "wood chips"})
>>> s.to_dataset()
Dataset([{'key': ['food']}, {'value': ['wood chips']}])
- to_dict(add_edsl_version: bool = True, offload_base64: bool = False) Dict[str, Any] [source]
Convert a scenario to a dictionary.
- Args:
add_edsl_version: If True, adds the EDSL version to the returned dictionary.
offload_base64: If True, replaces any base64_string fields with 'offloaded' to reduce memory usage.
Example:
>>> s = Scenario({"food": "wood chips"})
>>> s.to_dict()
{'food': 'wood chips', 'edsl_version': '...', 'edsl_class_name': 'Scenario'}
>>> s.to_dict(add_edsl_version = False)
{'food': 'wood chips'}
ScenarioList class
- class edsl.scenarios.ScenarioList(data: list | None = None, codebook: dict[str, str] | None = None, data_class: type | None = <class 'list'>)[source]
Bases: MutableSequence, Base, ScenarioListOperationsMixin
A collection of Scenario objects with advanced operations for manipulation and analysis.
ScenarioList provides specialized functionality for working with collections of Scenario objects. It inherits from MutableSequence to provide standard list operations, from Base to integrate with EDSL’s object model, and from ScenarioListOperationsMixin to provide powerful data manipulation capabilities.
- Attributes:
data (list): The underlying list containing Scenario objects.
codebook (dict): Optional metadata describing the fields in the scenarios.
- __init__(data: list | None = None, codebook: dict[str, str] | None = None, data_class: type | None = <class 'list'>)[source]
Initialize a new ScenarioList with optional data and codebook.
- add_list(name: str, values: List[Any]) ScenarioList [source]
Add a list of values to a ScenarioList.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.add_list('age', [30, 25])
ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
- add_value(name: str, value: Any) ScenarioList [source]
Add a value to all scenarios in a ScenarioList.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.add_value('age', 30)
ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 30})])
- apply(func: Callable, field: str, new_name: str | None, replace: bool = False) ScenarioList [source]
Apply a function to a field and return a new ScenarioList.
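The signature suggests that func is called on the value of field in every scenario. The following is a minimal plain-Python sketch on a list of dictionaries, not EDSL's implementation. It assumes (the text above does not confirm this) that the result is stored under new_name, falling back to field when new_name is None, and that replace=True drops the original field. The helper name apply_to_field is hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional


def apply_to_field(
    rows: List[Dict[str, Any]],
    func: Callable[[Any], Any],
    field: str,
    new_name: Optional[str] = None,
    replace: bool = False,
) -> List[Dict[str, Any]]:
    """Return new rows with func(row[field]) stored under new_name (assumed semantics)."""
    target = new_name or field
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        new_row[target] = func(row[field])
        if replace and target != field:
            del new_row[field]  # assumption: replace=True removes the source field
        result.append(new_row)
    return result


rows = [{"name": "alice"}, {"name": "bob"}]
upper = apply_to_field(rows, str.upper, "name", new_name="name_upper")
replaced = apply_to_field(rows, str.upper, "name", new_name="n", replace=True)
```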
- at(index: int) Scenario [source]
Get the scenario at the specified index position.
>>> sl = ScenarioList.from_list("a", [1, 2, 3])
>>> sl.at(0)
Scenario({'a': 1})
>>> sl.at(-1)
Scenario({'a': 3})
- augment_with_wikipedia(search_key: str, content_only: bool = True, key_name: str = 'wikipedia_content') ScenarioList [source]
Augment the ScenarioList with Wikipedia content.
- chunk(field, num_words: int | None = None, num_lines: int | None = None, include_original=False, hash_original=False) ScenarioList [source]
Chunk the scenarios based on a field.
Example:
>>> s = ScenarioList([Scenario({'text': 'The quick brown fox jumps over the lazy dog.'})])
>>> s.chunk('text', num_words=3)
ScenarioList([Scenario({'text': 'The quick brown', 'text_chunk': 0, 'text_char_count': 15, 'text_word_count': 3}), Scenario({'text': 'fox jumps over', 'text_chunk': 1, 'text_char_count': 14, 'text_word_count': 3}), Scenario({'text': 'the lazy dog.', 'text_chunk': 2, 'text_char_count': 13, 'text_word_count': 3})])
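To make the word-based chunking in the example above concrete, here is a plain-Python sketch that reproduces the same fields on ordinary dictionaries. It is an illustration only: chunk_by_words is a hypothetical helper, whitespace splitting is an assumption, and num_lines, include_original and hash_original are not covered.

```python
from typing import Dict, List


def chunk_by_words(text: str, num_words: int, field: str = "text") -> List[Dict[str, object]]:
    """Split text into consecutive groups of num_words whitespace-separated words."""
    words = text.split()
    chunks = []
    for start in range(0, len(words), num_words):
        piece = " ".join(words[start:start + num_words])
        chunks.append({
            field: piece,
            f"{field}_chunk": start // num_words,    # 0-based chunk index
            f"{field}_char_count": len(piece),
            f"{field}_word_count": len(piece.split()),
        })
    return chunks


chunks = chunk_by_words("The quick brown fox jumps over the lazy dog.", 3)
```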
- clipboard_data() str [source]
Return TSV representation of this object for clipboard operations.
This method is called by the clipboard() method in the base class to provide a custom format for copying objects to the system clipboard.
- Returns:
str: Tab-separated values representation of the object
- collapse(field: str, separator: str | None = None, prefix: str = '', postfix: str = '', add_count: bool = False) ScenarioList [source]
Collapse a ScenarioList by grouping on all fields except the specified one, collecting the values of the specified field into a list.
- Args:
field: The field to collapse (whose values will be collected into lists).
separator: Optional string to join the values with instead of keeping them as a list.
prefix: String to prepend to each value before joining (only used with separator).
postfix: String to append to each value before joining (only used with separator).
add_count: If True, adds a field showing the number of collapsed rows.
- Returns:
ScenarioList: A new ScenarioList with the specified field collapsed into lists
Example:
>>> s = ScenarioList([
...     Scenario({'category': 'fruit', 'color': 'red', 'item': 'apple'}),
...     Scenario({'category': 'fruit', 'color': 'red', 'item': 'cherry'}),
...     Scenario({'category': 'vegetable', 'color': 'green', 'item': 'spinach'})
... ])
>>> s.collapse('item', add_count=True)
ScenarioList([Scenario({'category': 'fruit', 'color': 'red', 'item': ['apple', 'cherry'], 'num_collapsed_rows': 2}), Scenario({'category': 'vegetable', 'color': 'green', 'item': ['spinach'], 'num_collapsed_rows': 1})])
>>> s.collapse('item', separator='; ', prefix='<example>', postfix='</example>')
ScenarioList([Scenario({'category': 'fruit', 'color': 'red', 'item': '<example>apple</example>; <example>cherry</example>'}), Scenario({'category': 'vegetable', 'color': 'green', 'item': '<example>spinach</example>'})])
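The grouping behaviour described for collapse can be sketched in plain Python on a list of dictionaries. This is not EDSL's code: collapse_rows is a hypothetical helper, and it assumes the non-collapsed values are hashable and that rows list their keys in the same order.

```python
from typing import Any, Dict, List, Optional, Tuple


def collapse_rows(
    rows: List[Dict[str, Any]],
    field: str,
    separator: Optional[str] = None,
    prefix: str = "",
    postfix: str = "",
    add_count: bool = False,
) -> List[Dict[str, Any]]:
    """Group rows on every key except field and collect that field's values."""
    groups: Dict[Tuple, List[Any]] = {}
    for row in rows:
        # The group key is every (key, value) pair other than the collapsed field.
        key = tuple((k, v) for k, v in row.items() if k != field)
        groups.setdefault(key, []).append(row[field])

    result = []
    for key, values in groups.items():
        new_row = dict(key)
        if separator is not None:
            new_row[field] = separator.join(f"{prefix}{v}{postfix}" for v in values)
        else:
            new_row[field] = values
        if add_count:
            new_row["num_collapsed_rows"] = len(values)
        result.append(new_row)
    return result


rows = [
    {"category": "fruit", "color": "red", "item": "apple"},
    {"category": "fruit", "color": "red", "item": "cherry"},
    {"category": "vegetable", "color": "green", "item": "spinach"},
]
as_lists = collapse_rows(rows, "item", add_count=True)
as_text = collapse_rows(rows, "item", separator="; ", prefix="<example>", postfix="</example>")
```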
- concatenate(fields: List[str], separator: str = ';', prefix: str = '', postfix: str = '', new_field_name: str | None = None) ScenarioList [source]
Concatenate specified fields into a single string field.
- Parameters:
fields – The fields to concatenate.
separator – The separator to use.
prefix – String to prepend to each value before concatenation.
postfix – String to append to each value before concatenation.
new_field_name – Optional custom name for the concatenated field.
- Returns:
ScenarioList: A new ScenarioList with concatenated fields.
- Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2, 'c': 3}), Scenario({'a': 4, 'b': 5, 'c': 6})])
>>> s.concatenate(['a', 'b', 'c'])
ScenarioList([Scenario({'concat_a_b_c': '1;2;3'}), Scenario({'concat_a_b_c': '4;5;6'})])
>>> s.concatenate(['a', 'b', 'c'], new_field_name='combined')
ScenarioList([Scenario({'combined': '1;2;3'}), Scenario({'combined': '4;5;6'})])
>>> s.concatenate(['a', 'b', 'c'], prefix='[', postfix=']')
ScenarioList([Scenario({'concat_a_b_c': '[1];[2];[3]'}), Scenario({'concat_a_b_c': '[4];[5];[6]'})])
- concatenate_to_list(fields: List[str], prefix: str = '', postfix: str = '', new_field_name: str | None = None) ScenarioList [source]
Concatenate specified fields into a single list field.
- Parameters:
fields – The fields to concatenate.
prefix – String to prepend to each value before concatenation.
postfix – String to append to each value before concatenation.
new_field_name – Optional custom name for the concatenated field.
- Returns:
ScenarioList: A new ScenarioList with fields concatenated into a list.
- Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2, 'c': 3}), Scenario({'a': 4, 'b': 5, 'c': 6})])
>>> s.concatenate_to_list(['a', 'b', 'c'])
ScenarioList([Scenario({'concat_a_b_c': [1, 2, 3]}), Scenario({'concat_a_b_c': [4, 5, 6]})])
>>> s.concatenate_to_list(['a', 'b', 'c'], new_field_name='values')
ScenarioList([Scenario({'values': [1, 2, 3]}), Scenario({'values': [4, 5, 6]})])
>>> s.concatenate_to_list(['a', 'b', 'c'], prefix='[', postfix=']')
ScenarioList([Scenario({'concat_a_b_c': ['[1]', '[2]', '[3]']}), Scenario({'concat_a_b_c': ['[4]', '[5]', '[6]']})])
- concatenate_to_set(fields: List[str], prefix: str = '', postfix: str = '', new_field_name: str | None = None) ScenarioList [source]
Concatenate specified fields into a single set field.
- Parameters:
fields – The fields to concatenate.
prefix – String to prepend to each value before concatenation.
postfix – String to append to each value before concatenation.
new_field_name – Optional custom name for the concatenated field.
- Returns:
ScenarioList: A new ScenarioList with fields concatenated into a set.
- Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2, 'c': 3}), Scenario({'a': 4, 'b': 5, 'c': 6})])
>>> result = s.concatenate_to_set(['a', 'b', 'c'])
>>> result[0]['concat_a_b_c'] == {1, 2, 3}
True
>>> result[1]['concat_a_b_c'] == {4, 5, 6}
True
>>> result = s.concatenate_to_set(['a', 'b', 'c'], new_field_name='unique_values')
>>> result[0]['unique_values'] == {1, 2, 3}
True
>>> result = s.concatenate_to_set(['a', 'b', 'c'], prefix='[', postfix=']')
>>> result[0]['concat_a_b_c'] == {'[1]', '[2]', '[3]'}
True
>>> result[1]['concat_a_b_c'] == {'[4]', '[5]', '[6]'}
True
- copy()[source]
Create a copy of this ScenarioList.
- Returns:
A new ScenarioList with copies of the same scenarios
- create_comparisons(bidirectional: bool = False, num_options: int = 2, option_prefix: str = 'option_', use_alphabet: bool = False) ScenarioList [source]
Create a new ScenarioList with comparisons between scenarios.
Each scenario in the result contains multiple original scenarios as dictionaries, allowing for side-by-side comparison.
- Args:
bidirectional (bool): If True, include both (A,B) and (B,A) comparisons. If False, only include (A,B) where A comes before B in the original list.
num_options (int): Number of scenarios to include in each comparison. Default is 2 for pairwise comparisons.
option_prefix (str): Prefix for the keys in the resulting scenarios. Default is "option_", resulting in keys like "option_1", "option_2", etc. Ignored if use_alphabet is True.
use_alphabet (bool): If True, use letters as keys (A, B, C, etc.) instead of the option_prefix with numbers.
- Returns:
ScenarioList: A new ScenarioList where each scenario contains multiple original scenarios as dictionaries.
- Example:
>>> s = ScenarioList([
...     Scenario({'id': 1, 'text': 'Option A'}),
...     Scenario({'id': 2, 'text': 'Option B'}),
...     Scenario({'id': 3, 'text': 'Option C'})
... ])
>>> s.create_comparisons(use_alphabet=True)
ScenarioList([Scenario({'A': {'id': 1, 'text': 'Option A'}, 'B': {'id': 2, 'text': 'Option B'}}), Scenario({'A': {'id': 1, 'text': 'Option A'}, 'B': {'id': 3, 'text': 'Option C'}}), Scenario({'A': {'id': 2, 'text': 'Option B'}, 'B': {'id': 3, 'text': 'Option C'}})])
>>> s.create_comparisons(num_options=3, use_alphabet=True)
ScenarioList([Scenario({'A': {'id': 1, 'text': 'Option A'}, 'B': {'id': 2, 'text': 'Option B'}, 'C': {'id': 3, 'text': 'Option C'}})])
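One way to picture the pairing logic is with itertools on plain dictionaries. The sketch below is illustrative rather than EDSL's implementation: make_comparisons is a hypothetical helper, and treating bidirectional=True as "all orderings" (permutations) for num_options greater than 2 is an assumption that generalizes the (A,B)/(B,A) description above.

```python
import string
from itertools import combinations, permutations
from typing import Any, Dict, List


def make_comparisons(
    rows: List[Dict[str, Any]],
    bidirectional: bool = False,
    num_options: int = 2,
    option_prefix: str = "option_",
    use_alphabet: bool = False,
) -> List[Dict[str, Dict[str, Any]]]:
    """Build one comparison dict per group of num_options rows."""
    # combinations keeps original order (A before B); permutations adds the reversed orderings.
    picker = permutations if bidirectional else combinations
    if use_alphabet:
        keys = list(string.ascii_uppercase[:num_options])  # supports up to 26 options
    else:
        keys = [f"{option_prefix}{i}" for i in range(1, num_options + 1)]
    return [dict(zip(keys, group)) for group in picker(rows, num_options)]


rows = [
    {"id": 1, "text": "Option A"},
    {"id": 2, "text": "Option B"},
    {"id": 3, "text": "Option C"},
]
pairs = make_comparisons(rows, use_alphabet=True)
both_ways = make_comparisons(rows, bidirectional=True)
```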
- classmethod create_empty_scenario_list(n: int) ScenarioList [source]
Create an empty ScenarioList with n scenarios.
- Args:
n: The number of empty scenarios to create
Example:
>>> ScenarioList.create_empty_scenario_list(3)
ScenarioList([Scenario({}), Scenario({}), Scenario({})])
- drop(*fields: str) ScenarioList [source]
Drop fields from the scenarios.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.drop('a')
ScenarioList([Scenario({'b': 1}), Scenario({'b': 2})])
- duplicate() ScenarioList [source]
Return a copy of the ScenarioList using streaming to avoid loading everything into memory.
>>> sl = ScenarioList.example()
>>> sl_copy = sl.duplicate()
>>> sl == sl_copy
True
>>> sl is sl_copy
False
- classmethod example(randomize: bool = False) ScenarioList [source]
Return an example ScenarioList instance.
- Params randomize:
If True, use Scenario’s randomize method to randomize the values.
- expand(expand_field: str, number_field: bool = False) ScenarioList [source]
Expand the ScenarioList by a field.
- Parameters:
expand_field – The field to expand.
number_field – Whether to add a field with the index of the value
Example:
>>> s = ScenarioList( [ Scenario({'a':1, 'b':[1,2]}) ] )
>>> s.expand('b')
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.expand('b', number_field=True)
ScenarioList([Scenario({'a': 1, 'b': 1, 'b_number': 1}), Scenario({'a': 1, 'b': 2, 'b_number': 2})])
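The expansion shown above amounts to emitting one row per element of the list-valued field. A plain-Python sketch on dictionaries follows. expand_rows is a hypothetical helper, the 1-based numbering follows the example output, and wrapping non-list values in a single-element list is an assumption.

```python
from typing import Any, Dict, List


def expand_rows(
    rows: List[Dict[str, Any]],
    expand_field: str,
    number_field: bool = False,
) -> List[Dict[str, Any]]:
    """Emit one copy of each row per value found in row[expand_field]."""
    result = []
    for row in rows:
        values = row[expand_field]
        # Assumption: scalars and strings are treated as a single value.
        if isinstance(values, str) or not hasattr(values, "__iter__"):
            values = [values]
        for number, value in enumerate(values, start=1):
            new_row = dict(row)
            new_row[expand_field] = value
            if number_field:
                new_row[f"{expand_field}_number"] = number
            result.append(new_row)
    return result


expanded = expand_rows([{"a": 1, "b": [1, 2]}], "b", number_field=True)
```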
- fillna(value: Any = '', inplace: bool = False) ScenarioList [source]
Fill None/NaN values in all scenarios with a specified value.
This method is equivalent to pandas’ df.fillna() functionality, allowing you to replace None, NaN, or other null-like values across all scenarios in the list.
- Args:
value: The value to use for filling None/NaN values. Defaults to empty string "".
inplace: If True, modify the original ScenarioList. If False (default), return a new ScenarioList with filled values.
- Returns:
ScenarioList: A new ScenarioList with filled values, or self if inplace=True
- Examples:
>>> scenarios = ScenarioList([
...     Scenario({'a': None, 'b': 1, 'c': 'hello'}),
...     Scenario({'a': 2, 'b': None, 'c': None}),
...     Scenario({'a': None, 'b': 3, 'c': 'world'})
... ])
>>> # Fill None values with empty string (default)
>>> filled = scenarios.fillna()
>>> print(filled)
ScenarioList([Scenario({'a': '', 'b': 1, 'c': 'hello'}), Scenario({'a': 2, 'b': '', 'c': ''}), Scenario({'a': '', 'b': 3, 'c': 'world'})])
>>> # Fill with custom value
>>> filled_custom = scenarios.fillna(value="N/A")
>>> print(filled_custom)
ScenarioList([Scenario({'a': 'N/A', 'b': 1, 'c': 'hello'}), Scenario({'a': 2, 'b': 'N/A', 'c': 'N/A'}), Scenario({'a': 'N/A', 'b': 3, 'c': 'world'})])
>>> # Original scenarios remain unchanged
>>> print(scenarios)
ScenarioList([Scenario({'a': None, 'b': 1, 'c': 'hello'}), Scenario({'a': 2, 'b': None, 'c': None}), Scenario({'a': None, 'b': 3, 'c': 'world'})])
>>> # Modify in place
>>> _ = scenarios.fillna(value="MISSING", inplace=True)
>>> print(scenarios)
ScenarioList([Scenario({'a': 'MISSING', 'b': 1, 'c': 'hello'}), Scenario({'a': 2, 'b': 'MISSING', 'c': 'MISSING'}), Scenario({'a': 'MISSING', 'b': 3, 'c': 'world'})])
- filter(expression: str) ScenarioList [source]
Filter a list of scenarios based on an expression.
- Parameters:
expression – The expression to filter by.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.filter("b == 2")
ScenarioList([Scenario({'a': 1, 'b': 2})])
- flatten(field: str, keep_original: bool = False) Dataset [source]
Expand a field containing dictionaries into separate fields.
This method takes a field that contains a list of dictionaries and expands it into multiple fields, one for each key in the dictionaries. This is useful when working with nested data structures or results from extraction operations.
- Parameters:
field: The field containing dictionaries to flatten.
keep_original: Whether to retain the original field in the result.
- Returns:
A new Dataset with the dictionary keys expanded into separate fields
- Notes:
Each key in the dictionaries becomes a new field with name pattern “{field}.{key}”
All dictionaries in the field must have compatible structures
If a dictionary is missing a key, the corresponding value will be None
Non-dictionary values in the field will cause a warning
- Examples:
>>> from edsl.dataset import Dataset
# Basic flattening of nested dictionaries
>>> Dataset([{'a': [{'a': 1, 'b': 2}]}, {'c': [5]}]).flatten('a')
Dataset([{'c': [5]}, {'a.a': [1]}, {'a.b': [2]}])
# Works with prefixed fields too
>>> Dataset([{'answer.example': [{'a': 1, 'b': 2}]}, {'c': [5]}]).flatten('answer.example')
Dataset([{'c': [5]}, {'answer.example.a': [1]}, {'answer.example.b': [2]}])
# Keep the original field if needed
>>> d = Dataset([{'a': [{'a': 1, 'b': 2}]}, {'c': [5]}])
>>> d.flatten('a', keep_original=True)
Dataset([{'a': [{'a': 1, 'b': 2}]}, {'c': [5]}, {'a.a': [1]}, {'a.b': [2]}])
# Can also use unambiguous unprefixed field name
>>> result = Dataset([{'answer.pros_cons': [{'pros': ['Safety'], 'cons': ['Cost']}]}]).flatten('pros_cons')
>>> sorted(result.keys()) == ['answer.pros_cons.cons', 'answer.pros_cons.pros']
True
>>> sorted(result.to_dicts()[0].items()) == sorted({'cons': ['Cost'], 'pros': ['Safety']}.items())
True
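The naming rule from the notes above ("{field}.{key}", with None for missing keys) can be illustrated on row-oriented dictionaries. This sketch is not the Dataset implementation, which is column-oriented. flatten_field is a hypothetical helper, and silently treating non-dictionary values as empty (rather than warning) is a simplification.

```python
from typing import Any, Dict, List


def flatten_field(
    rows: List[Dict[str, Any]],
    field: str,
    keep_original: bool = False,
) -> List[Dict[str, Any]]:
    """Expand a dict-valued field into '{field}.{key}' entries on every row."""
    # Collect every nested key, in first-seen order, so all rows get the same columns.
    nested_keys: List[str] = []
    for row in rows:
        value = row.get(field)
        if isinstance(value, dict):
            for key in value:
                if key not in nested_keys:
                    nested_keys.append(key)

    result = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if keep_original or k != field}
        value = row.get(field)
        nested = value if isinstance(value, dict) else {}
        for key in nested_keys:
            new_row[f"{field}.{key}"] = nested.get(key)  # None when the key is missing
        result.append(new_row)
    return result


rows = [{"a": {"x": 1, "y": 2}, "c": 5}, {"a": {"x": 3}, "c": 6}]
flat = flatten_field(rows, "a")
```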
- for_n(target: 'Question' | 'Survey' | 'Job', iterations: int) Jobs [source]
Execute a target multiple times, feeding each iteration’s output into the next.
- Parameters:
target (Question | Survey | Job): The object to be executed on each round. A fresh duplicate() of target is taken for every iteration so that state is not shared between runs.
iterations (int): How many times to run target.
- Returns:
Jobs: A Jobs instance containing the results of the final iteration.
Example (non-doctest):
from edsl import ScenarioList, QuestionFreeText

base_personas = ScenarioList.from_list(
    "persona",
    [
        "- Likes basketball",
        "- From Germany",
        "- Once owned a sawmill",
    ],
)

persona_detail_jobs = (
    QuestionFreeText(
        question_text=(
            "Take this persona: {{ scenario.persona }} and add one additional detail, "
            "preserving the original details."
        ),
        question_name="enhance",
    )
    .to_jobs()
    .select("enhance")
    .to_scenario_list()
    .rename({"enhance": "persona"})
)

# Run the enrichment five times
enriched_personas = base_personas.for_n(persona_detail_jobs, 5)
print(enriched_personas.select("persona"))
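The control flow of for_n, where each round's output becomes the next round's input, can be sketched without EDSL or a language model. In this illustration run_n_times and add_detail are hypothetical names, the step is an ordinary function on a list of dictionaries, and the per-iteration duplicate() of the target is omitted because a pure function carries no state.

```python
from typing import Callable, Dict, List

Rows = List[Dict[str, str]]


def run_n_times(initial: Rows, step: Callable[[Rows], Rows], iterations: int) -> Rows:
    """Feed the output of each round into the next and return the final output."""
    current = initial
    for _ in range(iterations):
        current = step(current)
    return current


def add_detail(personas: Rows) -> Rows:
    # Stand-in for the model call: append a marker instead of a generated detail.
    return [{"persona": p["persona"] + " +detail"} for p in personas]


base = [{"persona": "- Likes basketball"}]
final = run_n_times(base, add_detail, 3)
```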
- classmethod from_csv(source: str | 'ParseResult', has_header: bool = True, encoding: str = 'utf-8', **kwargs) ScenarioList [source]
Create a ScenarioList from a CSV file or URL.
- Args:
source: Path to a local file or URL to a remote file.
has_header: Whether the file has a header row (default is True).
encoding: The file encoding to use (default is 'utf-8').
**kwargs: Additional parameters for csv reader.
- Returns:
ScenarioList: An instance of the ScenarioList class.
- classmethod from_delimited_file(source: str | 'ParseResult', delimiter: str = ',', encoding: str = 'utf-8', **kwargs) ScenarioList [source]
Create a ScenarioList from a delimited file (CSV/TSV) or URL.
- Args:
source: Path to a local file or URL to a remote file.
delimiter: The delimiter character used in the file (default is ',').
encoding: The file encoding to use (default is 'utf-8').
**kwargs: Additional parameters for csv reader.
- Returns:
ScenarioList: An instance of the ScenarioList class.
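As a rough picture of what reading a delimited source into one record per row involves, the sketch below uses only the standard csv module on in-memory text. It is not EDSL's reader: rows_from_delimited is a hypothetical helper, the has_header flag and the col0, col1, ... fallback names are assumptions, and all values stay strings because csv performs no type conversion.

```python
import csv
import io
from typing import Dict, List


def rows_from_delimited(text: str, delimiter: str = ",", has_header: bool = True) -> List[Dict[str, str]]:
    """Parse delimited text into a list of dicts, one per data row."""
    lines = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not lines:
        return []
    if has_header:
        header, data = lines[0], lines[1:]
    else:
        # Hypothetical fallback names when the file has no header row.
        header = [f"col{i}" for i in range(len(lines[0]))]
        data = lines
    return [dict(zip(header, line)) for line in data]


tsv_rows = rows_from_delimited("name\tage\nAlice\t30\nBob\t25\n", delimiter="\t")
```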
- classmethod from_dict(data: dict) ScenarioList [source]
Create a ScenarioList from a dictionary.
>>> d = {'scenarios': [{'food': 'wood chips'}], 'codebook': {'food': 'description'}}
>>> s = ScenarioList.from_dict(d)
>>> s.codebook == {'food': 'description'}
True
>>> s[0]['food']
'wood chips'
- classmethod from_directory(path: str | None = None, recursive: bool = False, key_name: str = 'content') ScenarioList [source]
Create a ScenarioList of Scenario objects from files in a directory.
This method scans a directory and creates a Scenario object for each file found, where each Scenario contains a FileStore object under the specified key. Optionally filters files based on a wildcard pattern. If no path is provided, the current working directory is used.
- Args:
path: The directory path to scan, optionally including a wildcard pattern. If None, uses the current working directory. Examples:
- "/path/to/directory" - scans all files in the directory
- "/path/to/directory/*.py" - scans only Python files in the directory
- "*.txt" - scans only text files in the current working directory
recursive: Whether to scan subdirectories recursively. Defaults to False.
key_name: The key to use for the FileStore object in each Scenario. Defaults to "content".
- Returns:
A ScenarioList containing Scenario objects for all matching files, where each Scenario has a FileStore object under the specified key.
- Raises:
FileNotFoundError: If the specified directory does not exist.
- Examples:
# Get all files in the current directory with default key "content"
sl = ScenarioList.from_directory()
# Get all Python files in a specific directory with custom key "python_file"
sl = ScenarioList.from_directory('*.py', key_name="python_file")
# Get all image files in the current directory
sl = ScenarioList.from_directory('*.png', key_name="image")
# Get all files recursively including subdirectories
sl = ScenarioList.from_directory(recursive=True, key_name="document")
- classmethod from_dta(filepath: str, include_metadata: bool = True) ScenarioList [source]
Create a ScenarioList from a Stata file.
- Args:
filepath (str): Path to the Stata (.dta) file.
include_metadata (bool): If True, extract and preserve variable labels and value labels as additional metadata in the ScenarioList.
- Returns:
ScenarioList: A ScenarioList containing the data from the Stata file
- classmethod from_excel(filename: str, sheet_name: str | None = None, skip_rows: List[int] | None = None, use_codebook: bool = False, **kwargs) ScenarioList [source]
Create a ScenarioList from an Excel file.
If the Excel file contains multiple sheets and no sheet_name is provided, the method will print the available sheets and require the user to specify one.
- Args:
filename (str): Path to the Excel file.
sheet_name (Optional[str]): Name of the sheet to load. If None and multiple sheets exist, will raise an error listing available sheets.
skip_rows (Optional[List[int]]): List of row indices to skip (0-based). If None, all rows are included.
use_codebook (bool): If True, rename columns to standard format and store original names in codebook.
**kwargs: Additional parameters to pass to pandas.read_excel.
Example:
>>> import tempfile
>>> import os
>>> import pandas as pd
>>> with tempfile.NamedTemporaryFile(delete=False, suffix='.xlsx') as f:
...     df1 = pd.DataFrame({
...         'name': ['Alice', 'Bob', 'Charlie'],
...         'age': [30, 25, 35],
...         'location': ['New York', 'Los Angeles', 'Chicago']
...     })
...     df2 = pd.DataFrame({
...         'name': ['David', 'Eve'],
...         'age': [40, 45],
...         'location': ['Boston', 'Seattle']
...     })
...     with pd.ExcelWriter(f.name) as writer:
...         df1.to_excel(writer, sheet_name='Sheet1', index=False)
...         df2.to_excel(writer, sheet_name='Sheet2', index=False)
...     temp_filename = f.name
>>> # Load all rows
>>> from edsl.scenarios.scenario_source import ScenarioSource
>>> scenario_list = ScenarioSource.from_source('excel', temp_filename, sheet_name='Sheet1')
>>> len(scenario_list)
3
>>> # Skip the second row (index 1)
>>> scenario_list = ScenarioSource.from_source('excel', temp_filename, sheet_name='Sheet1', skip_rows=[1])
>>> len(scenario_list)
2
>>> scenario_list[0]['name']
'Alice'
>>> scenario_list[1]['name']
'Charlie'
- classmethod from_google_doc(url: str) ScenarioList [source]
Create a ScenarioList from a Google Doc.
This method downloads the Google Doc as a Word file (.docx), saves it to a temporary file, and then reads it using the from_docx class method.
- Args:
url (str): The URL to the Google Doc.
- Returns:
ScenarioList: An instance of the ScenarioList class.
- classmethod from_google_sheet(url: str, sheet_name: str = None, column_names: List[str] | None = None, **kwargs) ScenarioList [source]
Create a ScenarioList from a Google Sheet.
This method downloads the Google Sheet as an Excel file, saves it to a temporary file, and then reads it using the from_excel class method.
- Args:
url (str): The URL to the Google Sheet.
sheet_name (str, optional): The name of the sheet to load. If None, the method will behave the same as from_excel regarding multiple sheets.
column_names (List[str], optional): If provided, use these names for the columns instead of the default column names from the sheet.
**kwargs: Additional parameters to pass to pandas.read_excel.
- Returns:
ScenarioList: An instance of the ScenarioList class.
- classmethod from_latex(tex_file_path: str, table_index: int = 0, has_header: bool = True)[source]
Create a ScenarioList from a LaTeX file.
- Args:
tex_file_path: The path to the LaTeX file.
table_index: The index of the table to extract (if multiple tables exist). Default is 0 (first table).
has_header: Whether the table has a header row. Default is True.
- Returns:
ScenarioList: A new ScenarioList containing the data from the LaTeX table.
- classmethod from_list(field_name: str, values: list, use_indexes: bool = False) ScenarioList [source]
Create a ScenarioList from a list of values with a specified field name.
>>> ScenarioList.from_list('text', ['a', 'b', 'c'])
ScenarioList([Scenario({'text': 'a'}), Scenario({'text': 'b'}), Scenario({'text': 'c'})])
- classmethod from_list_of_tuples(field_names: list[str], values: list[tuple], use_indexes: bool = False) ScenarioList [source]
Create a ScenarioList from a list of tuples with specified field names.
- Args:
field_names: A list of field names for the tuples.
values: A list of tuples with values matching the field_names.
use_indexes: Whether to add an index field to each scenario.
- Returns:
A ScenarioList containing the data from the tuples
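Pairing field names with tuple values is essentially a zip per tuple. The plain-Python sketch below shows the idea on dictionaries. rows_from_tuples is a hypothetical helper, the length check is an added safeguard, and the index key name "idx" is an assumption, since the text above does not say what the index field is called.

```python
from typing import Any, Dict, List, Sequence, Tuple


def rows_from_tuples(
    field_names: Sequence[str],
    values: List[Tuple[Any, ...]],
    use_indexes: bool = False,
) -> List[Dict[str, Any]]:
    """Build one dict per tuple by zipping it with field_names."""
    rows = []
    for index, item in enumerate(values):
        if len(item) != len(field_names):
            raise ValueError(
                f"Tuple {index} has {len(item)} values but {len(field_names)} field names were given"
            )
        row = dict(zip(field_names, item))
        if use_indexes:
            row["idx"] = index  # hypothetical key name for the index field
        rows.append(row)
    return rows


people = rows_from_tuples(["name", "age"], [("Alice", 30), ("Bob", 25)])
```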
- classmethod from_nested_dict(data: dict) ScenarioList [source]
Create a ScenarioList from a nested dictionary.
>>> data = {"headline": ["Armistice Signed, War Over: Celebrations Erupt Across City"], "date": ["1918-11-11"], "author": ["Jane Smith"]}
>>> ScenarioList.from_nested_dict(data)
ScenarioList([Scenario({'headline': 'Armistice Signed, War Over: Celebrations Erupt Across City', 'date': '1918-11-11', 'author': 'Jane Smith'})])
- classmethod from_pandas(df) ScenarioList [source]
Create a ScenarioList from a pandas DataFrame.
Example:
>>> import pandas as pd
>>> from edsl.scenarios.scenario_source import ScenarioSource
>>> df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 25], 'location': ['New York', 'Los Angeles']})
>>> ScenarioSource.from_source('pandas', df)
ScenarioList([Scenario({'name': 'Alice', 'age': 30, 'location': 'New York'}), Scenario({'name': 'Bob', 'age': 25, 'location': 'Los Angeles'})])
- classmethod from_parquet(filepath: str) ScenarioList [source]
Create a ScenarioList from a Parquet file.
- Args:
filepath (str): The path to the Parquet file.
- Returns:
ScenarioList: A new ScenarioList containing the scenarios from the Parquet file.
- classmethod from_pdf(filename_or_url, collapse_pages=False)[source]
Create a ScenarioList from a PDF file or URL.
- classmethod from_pdf_to_image(pdf_path, image_format='jpeg')[source]
Create a ScenarioList with images extracted from a PDF file.
- classmethod from_prompt(description: str, name: str | None = 'item', target_number: int = 10, verbose=False)[source]
- classmethod from_search_terms(search_terms: List[str]) ScenarioList [source]
Create a ScenarioList from a list of search terms, using Wikipedia.
- Args:
search_terms: A list of search terms.
- classmethod from_source(source_type: str, *args, **kwargs) ScenarioList [source]
Create a ScenarioList from a specified source type.
This method serves as the main entry point for creating ScenarioList objects, providing a unified interface for various data sources.
- Args:
source_type: The type of source to create a ScenarioList from. Valid values include: 'urls', 'directory', 'csv', 'tsv', 'excel', 'pdf', 'pdf_to_image', and others.
*args: Positional arguments to pass to the source-specific method.
**kwargs: Keyword arguments to pass to the source-specific method.
- Returns:
A ScenarioList object created from the specified source.
- Examples:
>>> # This is a simplified example for doctest
>>> # In real usage, you would provide a path to your CSV file:
>>> # sl_csv = ScenarioList.from_source('csv', 'your_data.csv')
>>> # Or use other source types like 'directory', 'excel', etc.
>>> # Examples of other source types:
>>> # sl_dir = ScenarioList.from_source('directory', '/path/to/files')
- classmethod from_sqlite(filepath: str, table: str | None = None, sql_query: str | None = None)[source]
Create a ScenarioList from a SQLite database.
- Args:
filepath (str): Path to the SQLite database file.
table (Optional[str]): Name of table to query. If None, sql_query must be provided.
sql_query (Optional[str]): SQL query to execute. Used if table is None.
- Returns:
ScenarioList: List of scenarios created from database rows
- Raises:
ValueError: If both table and sql_query are None.
sqlite3.Error: If there is an error executing the database query.
- classmethod from_tsv(source: str | 'ParseResult', has_header: bool = True, encoding: str = 'utf-8', **kwargs) ScenarioList [source]
Create a ScenarioList from a TSV file or URL.
- Args:
source: Path to a local file or URL to a remote file.
has_header: Whether the file has a header row (default is True).
encoding: The file encoding to use (default is 'utf-8').
**kwargs: Additional parameters for csv reader.
- Returns:
ScenarioList: An instance of the ScenarioList class.
- classmethod from_urls(urls: list[str], field_name: str | None = 'text') ScenarioList [source]
- classmethod from_wikipedia(url: str, table_index: int = 0, header: bool = True)[source]
Extracts a table from a Wikipedia page.
- Parameters:
url (str): The URL of the Wikipedia page.
table_index (int): The index of the table to extract (default is 0).
header (bool): Whether the table has a header row (default is True).
- Returns:
ScenarioList: A ScenarioList containing data from the Wikipedia table.
- Example usage:
url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
scenarios = ScenarioList.from_wikipedia(url, 0)
- classmethod gen(scenario_dicts_list: List[dict]) ScenarioList [source]
Create a ScenarioList from a list of dictionaries.
Example:
>>> ScenarioList.gen([{'name': 'Alice'}, {'name': 'Bob'}])
ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
- get_tabular_data(remove_prefix: bool = False, pretty_labels: dict | None = None) Tuple[List[str], List[List]] [source]
Internal method to get tabular data in a standard format.
- Args:
remove_prefix: Whether to remove the prefix from column names.
pretty_labels: Dictionary mapping original column names to pretty labels.
- Returns:
Tuple containing (header_row, data_rows)
- ggplot2(ggplot_code: str, shape: str = 'wide', sql: str | None = None, remove_prefix: bool = True, debug: bool = False, height: float = 4, width: float = 6, factor_orders: dict | None = None)[source]
Create visualizations using R’s ggplot2 library.
This method provides a bridge to R’s powerful ggplot2 visualization library, allowing you to create sophisticated plots directly from EDSL data structures.
- Parameters:
ggplot_code: R code string containing ggplot2 commands
shape: Data shape to use ("wide" or "long")
sql: Optional SQL query to transform data before visualization
remove_prefix: Whether to remove prefixes (like "answer.") from column names
debug: Whether to display debugging information
height: Plot height in inches
width: Plot width in inches
factor_orders: Dictionary mapping factor variables to their desired order
- Returns:
A plot object that renders in Jupyter notebooks
- Notes:
Requires R and the ggplot2 package to be installed
Data is automatically converted to a format suitable for ggplot2
The ggplot2 code should reference column names as they appear after any transformations from the shape and remove_prefix parameters
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> # The following would create a plot if R is installed (not shown in doctest):
>>> # r.ggplot2('''
>>> #     ggplot(df, aes(x=how_feeling)) +
>>> #     geom_bar() +
>>> #     labs(title="Distribution of Feelings")
>>> # ''')
- give_valid_names(existing_codebook: dict = None) ScenarioList [source]
Give valid names to the scenario keys, using an existing codebook if provided.
- Args:
existing_codebook (dict, optional): Existing mapping of original keys to valid names. Defaults to None.
- Returns:
ScenarioList: A new ScenarioList with valid variable names and updated codebook.
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names()
ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s = ScenarioList([Scenario({'are you there John?': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names()
ScenarioList([Scenario({'john': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names({'are you there John?': 'custom_name'})
ScenarioList([Scenario({'custom_name': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
- group_by(id_vars: List[str], variables: List[str], func: Callable) ScenarioList [source]
Group the ScenarioList by id_vars and apply a function to the specified variables.
- Parameters:
id_vars – Fields to use as identifier variables
variables – Fields to group and aggregate
func – Function to apply to the grouped variables
- Returns:
ScenarioList: A new ScenarioList with the grouped and aggregated results
Example:
>>> def avg_sum(a, b):
...     return {'avg_a': sum(a) / len(a), 'sum_b': sum(b)}
>>> s = ScenarioList([
...     Scenario({'group': 'A', 'year': 2020, 'a': 10, 'b': 20}),
...     Scenario({'group': 'A', 'year': 2021, 'a': 15, 'b': 25}),
...     Scenario({'group': 'B', 'year': 2020, 'a': 12, 'b': 22}),
...     Scenario({'group': 'B', 'year': 2021, 'a': 17, 'b': 27})
... ])
>>> s.group_by(id_vars=['group'], variables=['a', 'b'], func=avg_sum)
ScenarioList([Scenario({'group': 'A', 'avg_a': 12.5, 'sum_b': 45}), Scenario({'group': 'B', 'avg_a': 14.5, 'sum_b': 49})])
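To make the calling convention of func concrete, here is a plain-Python sketch of the grouping logic on ordinary dictionaries. It assumes, based on the avg_sum example, that func receives one list per entry in variables (in the same order) and returns a dictionary of aggregated fields. It is not EDSL's implementation.

```python
from collections import defaultdict


def group_by(rows, id_vars, variables, func):
    """Group rows by id_vars and pass the collected variables to func."""
    groups = defaultdict(lambda: {v: [] for v in variables})
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        for v in variables:
            groups[key][v].append(row[v])

    result = []
    for key, collected in groups.items():
        # func gets one list per variable, in the order given in `variables`.
        aggregated = func(*(collected[v] for v in variables))
        result.append({**dict(zip(id_vars, key)), **aggregated})
    return result


def avg_sum(a, b):
    return {"avg_a": sum(a) / len(a), "sum_b": sum(b)}


rows = [
    {"group": "A", "year": 2020, "a": 10, "b": 20},
    {"group": "A", "year": 2021, "a": 15, "b": 25},
    {"group": "B", "year": 2020, "a": 12, "b": 22},
    {"group": "B", "year": 2021, "a": 17, "b": 27},
]
grouped = group_by(rows, ["group"], ["a", "b"], avg_sum)
```

Fields that are neither identifiers nor aggregated variables (here, year) do not appear in the output, matching the documented result.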
- property has_jinja_braces: bool[source]
Check if any Scenario in the list contains values with Jinja template braces.
This property checks all Scenarios in the list to determine if any contain string values with Jinja template syntax ({{ and }}). This is important for rendering templates and avoiding conflicts with other templating systems.
- Returns:
True if any Scenario contains values with Jinja braces, False otherwise.
- Examples:
>>> from edsl.scenarios import Scenario, ScenarioList
>>> s1 = Scenario({"text": "Plain text"})
>>> s2 = Scenario({"text": "Template with {{variable}}"})
>>> sl1 = ScenarioList([s1])
>>> sl1.has_jinja_braces
False
>>> sl2 = ScenarioList([s1, s2])
>>> sl2.has_jinja_braces
True
- inner_join(other: ScenarioList, by: str | list[str]) ScenarioList [source]
Perform an inner join with another ScenarioList, following SQL join semantics.
- Args:
other: The ScenarioList to join with.
by: String or list of strings representing the key(s) to join on. Cannot be empty.
- Returns:
A new ScenarioList containing only scenarios that have matches in both ScenarioLists
>>> s1 = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s2 = ScenarioList([Scenario({'name': 'Alice', 'location': 'New York'}), Scenario({'name': 'Charlie', 'location': 'Los Angeles'})])
>>> s4 = s1.inner_join(s2, 'name')
>>> s4 == ScenarioList([Scenario({'age': 30, 'location': 'New York', 'name': 'Alice'})])
True
- items()[source]
Make this class compatible with dict.items() by accessing first scenario items.
This ensures the class works as a drop-in replacement for UserList in code that expects a dictionary-like interface.
- Returns:
items view from the first scenario object if available, empty list otherwise
- keep(*fields: str) ScenarioList [source]
Keep only the specified fields in the scenarios.
- Parameters:
fields – The fields to keep.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})]) >>> s.keep('a') ScenarioList([Scenario({'a': 1}), Scenario({'a': 1})])
- left_join(other: ScenarioList, by: str | list[str]) ScenarioList [source]
Perform a left join with another ScenarioList, following SQL join semantics.
- Args:
other: The ScenarioList to join with.
by: String or list of strings representing the key(s) to join on. Cannot be empty.
>>> s1 = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s2 = ScenarioList([Scenario({'name': 'Alice', 'location': 'New York'}), Scenario({'name': 'Charlie', 'location': 'Los Angeles'})])
>>> s3 = s1.left_join(s2, 'name')
>>> s3 == ScenarioList([Scenario({'age': 30, 'location': 'New York', 'name': 'Alice'}), Scenario({'age': 25, 'location': None, 'name': 'Bob'})])
True
- make_tabular(remove_prefix: bool, pretty_labels: dict | None = None) tuple[list, List[list]] [source]
Turn the results into a tabular format.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').make_tabular(remove_prefix = True)
(['how_feeling'], [['OK'], ['Great'], ['Terrible'], ['OK']])
>>> r.select('how_feeling').make_tabular(remove_prefix = True, pretty_labels = {'how_feeling': "How are you feeling"})
(['How are you feeling'], [['OK'], ['Great'], ['Terrible'], ['OK']])
- mutate(new_var_string: str, functions_dict: dict[str, Callable] | None = None) ScenarioList [source]
Return a new ScenarioList with a new variable added.
- Parameters:
new_var_string – A string with the new variable assignment.
functions_dict – A dictionary of functions to use in the assignment.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.mutate("c = a + b")
ScenarioList([Scenario({'a': 1, 'b': 2, 'c': 3}), Scenario({'a': 1, 'b': 1, 'c': 2})])
- num_observations()[source]
Return the number of observations in the dataset.
>>> from edsl.results import Results
>>> Results.example().num_observations()
4
- offload(inplace: bool = False) ScenarioList [source]
Offloads base64-encoded content from all scenarios in the list by replacing ‘base64_string’ fields with ‘offloaded’. This reduces memory usage.
- Args:
inplace (bool): If True, modify the current scenario list. If False, return a new one.
- Returns:
ScenarioList: The modified scenario list (either self or a new instance).
- order_by(*fields: str, reverse: bool = False) ScenarioList [source]
Order the scenarios by one or more fields.
- Parameters:
fields – The fields to order by.
reverse – Whether to reverse the order.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.order_by('b', 'a')
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
- property parameters: set[source]
Return the set of parameters in the ScenarioList
Example:
>>> s = ScenarioList([Scenario({'a': 1}), Scenario({'b': 2})])
>>> s.parameters == {'a', 'b'}
True
- pivot(id_vars: List[str] = None, var_name='variable', value_name='value') ScenarioList [source]
Pivot the ScenarioList from long to wide format.
- Parameters:
id_vars (list): Fields to use as identifier variables
var_name (str): Name of the variable column (default: 'variable')
value_name (str): Name of the value column (default: 'value')
Example:
>>> s = ScenarioList([
...     Scenario({'id': 1, 'year': 2020, 'variable': 'a', 'value': 10}),
...     Scenario({'id': 1, 'year': 2020, 'variable': 'b', 'value': 20}),
...     Scenario({'id': 2, 'year': 2021, 'variable': 'a', 'value': 15}),
...     Scenario({'id': 2, 'year': 2021, 'variable': 'b', 'value': 25})
... ])
>>> s.pivot(id_vars=['id', 'year'])
ScenarioList([Scenario({'id': 1, 'year': 2020, 'a': 10, 'b': 20}), Scenario({'id': 2, 'year': 2021, 'a': 15, 'b': 25})])
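The long-to-wide reshaping can be sketched on plain dictionaries as follows. This is an illustration of the transformation, not EDSL's code: rows sharing the same id_vars values collapse into one record, and each var_name/value_name pair becomes a column.

```python
def pivot(rows, id_vars, var_name="variable", value_name="value"):
    """Collapse long-format rows into one wide record per id_vars key."""
    wide = {}
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        # Start each record with its identifier fields, then add one
        # column per (variable, value) pair seen for that key.
        record = wide.setdefault(key, {k: row[k] for k in id_vars})
        record[row[var_name]] = row[value_name]
    return list(wide.values())


long_rows = [
    {"id": 1, "year": 2020, "variable": "a", "value": 10},
    {"id": 1, "year": 2020, "variable": "b", "value": 20},
    {"id": 2, "year": 2021, "variable": "a", "value": 15},
    {"id": 2, "year": 2021, "variable": "b", "value": 25},
]
wide_rows = pivot(long_rows, id_vars=["id", "year"])
```

In this sketch, if the same variable appears twice for one key, the later value overwrites the earlier one; how EDSL handles that case is not stated here.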
- print_long()[source]
Print the results in a long format.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').print_long()
answer.how_feeling: OK
answer.how_feeling: Great
answer.how_feeling: Terrible
answer.how_feeling: OK
- relevant_columns(data_type: str | None = None, remove_prefix: bool = False) list [source]
Return the set of keys that are present in the dataset.
- Parameters:
data_type – The data type to filter by.
remove_prefix – Whether to remove the prefix from the column names.
>>> from ..dataset import Dataset
>>> d = Dataset([{'a.b':[1,2,3,4]}])
>>> d.relevant_columns()
['a.b']
>>> d.relevant_columns(remove_prefix=True)
['b']
>>> d = Dataset([{'a':[1,2,3,4]}, {'b':[5,6,7,8]}])
>>> d.relevant_columns()
['a', 'b']
>>> from edsl.results import Results; Results.example().select('how_feeling', 'how_feeling_yesterday').relevant_columns()
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> from edsl.results import Results
>>> sorted(Results.example().select().relevant_columns(data_type = "model"))
['model.canned_response', 'model.inference_service', 'model.model', 'model.model_index', 'model.temperature']
>>> # Testing relevant_columns with invalid data_type raises DatasetValueError - tested in unit tests
- remove_prefix()[source]
Returns a new Dataset with the prefix removed from all column names.
The prefix is defined as everything before the first dot (.) in the column name. If removing prefixes would result in duplicate column names, an exception is raised.
- Returns:
Dataset: A new Dataset with prefixes removed from column names
- Raises:
ValueError: If removing prefixes would result in duplicate column names
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling', 'how_feeling_yesterday').relevant_columns()
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> r.select('how_feeling', 'how_feeling_yesterday').remove_prefix().relevant_columns()
['how_feeling', 'how_feeling_yesterday']
>>> from edsl.dataset import Dataset
>>> d = Dataset([{'a.x': [1, 2, 3]}, {'b.x': [4, 5, 6]}])
>>> # d.remove_prefix() would fail here because both columns would be renamed to 'x'
>>> # Testing remove_prefix with duplicate column names raises DatasetValueError - tested in unit tests
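The prefix rule ("everything before the first dot") and the duplicate check can be sketched in a few lines of plain Python. The function below works on a list of column names only and raises a built-in ValueError as a stand-in for whichever exception class EDSL actually uses.

```python
def remove_prefix(columns):
    """Strip everything up to and including the first dot in each name."""
    stripped = [
        name.split(".", 1)[1] if "." in name else name for name in columns
    ]
    duplicates = sorted({n for n in stripped if stripped.count(n) > 1})
    if duplicates:
        raise ValueError(
            f"Removing prefixes would create duplicate columns: {duplicates}"
        )
    return stripped


ok = remove_prefix(["answer.how_feeling", "answer.how_feeling_yesterday"])

# 'a.x' and 'b.x' would both become 'x', so the call is rejected.
try:
    remove_prefix(["a.x", "b.x"])
    clash_error = None
except ValueError as exc:
    clash_error = str(exc)
```

Only the first dot counts, so a name such as 'a.b.c' becomes 'b.c' under this rule.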
- rename(replacement_dict: dict) ScenarioList [source]
Rename the fields in the scenarios.
- Parameters:
replacement_dict – A dictionary with the old names as keys and the new names as values.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s.rename({'name': 'first_name', 'age': 'years'})
ScenarioList([Scenario({'first_name': 'Alice', 'years': 30}), Scenario({'first_name': 'Bob', 'years': 25})])
- reorder_keys(new_order: List[str]) ScenarioList [source]
Reorder the keys in the scenarios.
- Parameters:
new_order – The new order of the keys.
Example:
# s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 3, 'b': 4})])
# s.reorder_keys(['b', 'a'])  # Returns a new ScenarioList with reordered keys
# Attempting s.reorder_keys(['a', 'b', 'c']) would fail as 'c' is not a valid key
- replace_names(new_names: list) ScenarioList [source]
Replace the field names in the scenarios with a new list of names.
- Parameters:
new_names – A list of new field names to use.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s.replace_names(['first_name', 'years'])
ScenarioList([Scenario({'first_name': 'Alice', 'years': 30}), Scenario({'first_name': 'Bob', 'years': 25})])
- replace_values(replacements: dict) ScenarioList [source]
Create new scenarios with values replaced according to the provided replacement dictionary.
- Args:
replacements (dict): Dictionary of values to replace {old_value: new_value}
- Returns:
ScenarioList: A new ScenarioList with replaced values
- Examples:
>>> scenarios = ScenarioList([
...     Scenario({'a': 'nan', 'b': 1}),
...     Scenario({'a': 2, 'b': 'nan'})
... ])
>>> replaced = scenarios.replace_values({'nan': None})
>>> print(replaced)
ScenarioList([Scenario({'a': None, 'b': 1}), Scenario({'a': 2, 'b': None})])
>>> # Original scenarios remain unchanged
>>> print(scenarios)
ScenarioList([Scenario({'a': 'nan', 'b': 1}), Scenario({'a': 2, 'b': 'nan'})])
- report(*fields: str | None, top_n: int | None = None, header_fields: List[str] | None = None, divider: bool = True, return_string: bool = False, format: str = 'markdown', filename: str | None = None) str | Document | None [source]
Generates a report of the results by iterating through rows.
- Args:
*fields: The fields to include in the report. If none provided, all fields are used.
top_n: Optional limit on the number of observations to include.
header_fields: Optional list of fields to include in the main header instead of as sections.
divider: If True, adds a horizontal rule between observations (markdown only).
return_string: If True, returns the markdown string. If False (default in notebooks), only displays the markdown without returning.
format: Output format - either "markdown" or "docx".
filename: If provided and format is "docx", saves the document to this file.
- Returns:
Depending on format and return_string:
- For markdown: A string if return_string is True, otherwise None (displays in notebook)
- For docx: A docx.Document object, or None if filename is provided (saves to file)
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> report = r.select('how_feeling').report(return_string=True)
>>> "# Observation: 1" in report
True
>>> doc = r.select('how_feeling').report(format="docx")
>>> isinstance(doc, object)
True
- report_from_template(template: str, *fields: str | None, top_n: int | None = None, remove_prefix: bool = True, return_string: bool = False, format: str = 'text', filename: str | None = None, separator: str = '\n\n', observation_title_template: str | None = None, explode: bool = False, filestore: bool = False) str | Document | List | FileStore | None [source]
Generates a report using a Jinja2 template for each row in the dataset.
This method renders a user-provided Jinja2 template for each observation in the dataset, with template variables populated from the row data. This allows for completely customized report formatting using pandoc for advanced output formats.
- Args:
template: Jinja2 template string to render for each row.
*fields: The fields to include in template context. If none provided, all fields are used.
top_n: Optional limit on the number of observations to include.
remove_prefix: Whether to remove type prefixes (e.g., "answer.") from field names in template context.
return_string: If True, returns the rendered content. If False (default in notebooks), only displays the content without returning.
format: Output format - one of "text", "html", "pdf", or "docx". Formats other than "text" require pandoc.
filename: If provided, saves the rendered content to this file. For exploded output, this becomes a template (e.g., "report_{index}.html").
separator: String to use between rendered templates for each row (ignored when explode=True).
observation_title_template: Optional Jinja2 template for observation titles. Defaults to "Observation {index}" where index is 1-based. Template has access to all row data plus 'index' and 'index0' variables.
explode: If True, creates separate files for each observation instead of one combined file.
filestore: If True, wraps the generated file(s) in FileStore object(s). If no filename is provided, creates temporary files. For exploded output, returns a list of FileStore objects.
- Returns:
Depending on explode, format, return_string, and filestore:
- For text format: String content or None (if displayed in notebook)
- For html format: HTML string content or None (if displayed in notebook)
- For docx format: Document object or None (if saved to file)
- For pdf format: PDF bytes or None (if saved to file)
- If explode=True: List of created filenames (when filename provided) or list of documents/content
- If filestore=True: FileStore object(s) containing the generated file(s)
- Notes:
Pandoc is required for HTML, PDF, and DOCX output formats
Templates are treated as Markdown for all non-text formats
PDF output uses XeLaTeX engine through pandoc
HTML output includes standalone document structure
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> template = "Person feels: {{ how_feeling }}"
>>> report = r.select('how_feeling').report_from_template(template, return_string=True)
>>> "Person feels: OK" in report
True
>>> "Person feels: Great" in report
True
# Custom observation titles
>>> custom_title = "Response {{ index }}: {{ how_feeling }}"
>>> report = r.select('how_feeling').report_from_template(
...     template, observation_title_template=custom_title, return_string=True)
>>> "Response 1: OK" in report
True
# HTML output (requires pandoc)
>>> html_report = r.select('how_feeling').report_from_template(
...     template, format="html", return_string=True)  # doctest: +SKIP
>>> # Creates HTML with proper document structure
# PDF output (requires pandoc with XeLaTeX)
>>> pdf_report = r.select('how_feeling').report_from_template(
...     template, format="pdf")  # doctest: +SKIP
>>> # Returns PDF bytes
# Basic template functionality
>>> template2 = "Feeling: {{ how_feeling }}, Index: {{ index }}"
>>> report2 = r.select('how_feeling').report_from_template(
...     template2, return_string=True, top_n=2)
>>> "Feeling: OK, Index: 1" in report2
True
- right_join(other: ScenarioList, by: str | list[str]) ScenarioList [source]
Perform a right join with another ScenarioList, following SQL join semantics.
- Args:
other: The ScenarioList to join with.
by: String or list of strings representing the key(s) to join on. Cannot be empty.
- Returns:
A new ScenarioList containing all right scenarios with matching left data added
>>> s1 = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s2 = ScenarioList([Scenario({'name': 'Alice', 'location': 'New York'}), Scenario({'name': 'Charlie', 'location': 'Los Angeles'})])
>>> s5 = s1.right_join(s2, 'name')
>>> s5 == ScenarioList([Scenario({'age': 30, 'location': 'New York', 'name': 'Alice'}), Scenario({'age': None, 'location': 'Los Angeles', 'name': 'Charlie'})])
True
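The SQL-style semantics shared by inner_join, left_join, and right_join can be compared side by side with a small plain-Python sketch over lists of dictionaries. The single `join` function and its `how` argument are inventions of this sketch, not part of the EDSL API, and letting the primary side win when both rows carry the same non-key field is an assumption.

```python
def join(left, right, by, how="inner"):
    """Sketch of inner/left/right joins over lists of dicts."""
    keys = [by] if isinstance(by, str) else list(by)
    if not keys:
        raise ValueError("'by' cannot be empty")

    def key_of(row):
        return tuple(row[k] for k in keys)

    # Every field seen on either side; unmatched rows get None for the rest.
    all_fields = {f for row in left for f in row} | {
        f for row in right for f in row
    }

    # The "primary" side is the one whose rows are always kept.
    primary, secondary = (right, left) if how == "right" else (left, right)

    index = {}
    for row in secondary:
        index.setdefault(key_of(row), []).append(row)

    result = []
    for row in primary:
        matches = index.get(key_of(row), [])
        for match in matches:
            result.append({**match, **row})
        if not matches and how != "inner":
            result.append({**{f: None for f in all_fields}, **row})
    return result


s1 = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]
s2 = [
    {"name": "Alice", "location": "New York"},
    {"name": "Charlie", "location": "Los Angeles"},
]
inner = join(s1, s2, "name", how="inner")
left = join(s1, s2, "name", how="left")
right = join(s1, s2, "name", how="right")
```

With the sample data from the doctests, the inner join keeps only Alice, the left join adds Bob with location None, and the right join adds Charlie with age None.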
- sample(n: int, seed: str | None = None) ScenarioList [source]
Return a random sample from the ScenarioList
>>> s = ScenarioList.from_list("a", [1,2,3,4,5,6])
>>> s.sample(3, seed = "edsl")
ScenarioList([Scenario({'a': 2}), Scenario({'a': 1}), Scenario({'a': 3})])
- select(*fields: str) ScenarioList [source]
Select only specified fields from all scenarios in the list.
This method applies the select operation to each scenario in the list, returning a new ScenarioList where each scenario contains only the specified fields.
- Args:
*fields: Field names to select from each scenario.
- Returns:
A new ScenarioList with each scenario containing only the selected fields.
- Raises:
KeyError: If any specified field doesn’t exist in any scenario.
- Examples:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.select('a')
ScenarioList([Scenario({'a': 1}), Scenario({'a': 1})])
- shuffle(seed: str | None = None) ScenarioList [source]
Shuffle the ScenarioList.
>>> s = ScenarioList.from_list("a", [1,2,3,4])
>>> s.shuffle(seed = "1234")
ScenarioList([Scenario({'a': 1}), Scenario({'a': 4}), Scenario({'a': 3}), Scenario({'a': 2})])
- sql(query: str, transpose: bool = None, transpose_by: str = None, remove_prefix: bool = True, shape: str = 'wide') Dataset [source]
Execute SQL queries on the dataset.
This powerful method allows you to use SQL to query and transform your data, combining the expressiveness of SQL with EDSL’s data structures. It works by creating an in-memory SQLite database from your data and executing the query against it.
- Parameters:
query: SQL query string to execute
transpose: Whether to transpose the resulting table (rows become columns)
transpose_by: Column to use as the new index when transposing
remove_prefix: Whether to remove type prefixes (e.g., "answer.") from column names
shape: Data shape to use ("wide" or "long")
"wide": Default tabular format with columns for each field
"long": Melted format with key-value pairs, useful for certain queries
- Returns:
A Dataset object containing the query results
- Notes:
The data is stored in a table named “self” in the SQLite database
In wide format, column names include their type prefix unless remove_prefix=True
In long format, the data is melted into columns: row_number, key, value, data_type
Complex objects like lists and dictionaries are converted to strings
- Examples:
>>> from edsl import Results
>>> r = Results.example()
# Basic selection
>>> len(r.sql("SELECT * FROM self", shape="wide"))
4
# Filtering with WHERE clause
>>> r.sql("SELECT * FROM self WHERE how_feeling = 'Great'").num_observations()
1
# Aggregation
>>> r.sql("SELECT how_feeling, COUNT(*) as count FROM self GROUP BY how_feeling").keys()
['how_feeling', 'count']
# Using long format
>>> len(r.sql("SELECT * FROM self", shape="long"))
200
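The mechanism described above (load the data into an in-memory SQLite database, then run the query against a table named "self") can be reproduced with the standard library alone. The sketch below does that for a single how_feeling column; the sample rows are hard-coded for illustration, and the code is not taken from EDSL.

```python
import sqlite3

# Illustrative data in wide format, with the "answer." prefix already removed.
rows = [
    {"how_feeling": "OK"},
    {"how_feeling": "Great"},
    {"how_feeling": "Terrible"},
    {"how_feeling": "OK"},
]

conn = sqlite3.connect(":memory:")
# Per the notes above, the data is exposed as a table named "self".
conn.execute("CREATE TABLE self (how_feeling TEXT)")
conn.executemany("INSERT INTO self VALUES (:how_feeling)", rows)

# Aggregation; ORDER BY makes the row order deterministic.
counts = conn.execute(
    "SELECT how_feeling, COUNT(*) AS n FROM self "
    "GROUP BY how_feeling ORDER BY n DESC, how_feeling"
).fetchall()

# Filtering with a WHERE clause.
great = conn.execute(
    "SELECT COUNT(*) FROM self WHERE how_feeling = 'Great'"
).fetchone()[0]
conn.close()
```

Because the table is rebuilt from the in-memory data for each call, queries cannot modify the original object; they only produce a new result set.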
- table(*fields: str, tablefmt: Literal['plain', 'simple', 'github', 'grid', 'fancy_grid', 'pipe', 'orgtbl', 'rst', 'mediawiki', 'html', 'latex', 'latex_raw', 'latex_booktabs', 'tsv'] | None = 'rich', pretty_labels: dict[str, str] | None = None) str [source]
Return the ScenarioList as a table.
- tack_on(replacements: dict[str, Any], index: int = -1) ScenarioList [source]
Add a duplicate of an existing scenario with optional value replacements.
This method duplicates the scenario at index (default -1, which refers to the last scenario), applies the key/value pairs provided in replacements, and returns a new ScenarioList with the modified scenario appended.
- Args:
replacements: Mapping of field names to new values to overwrite in the cloned scenario.
index: Index of the scenario to duplicate. Supports negative indexing just like normal Python lists (-1 is the last item).
- Returns:
ScenarioList: A new ScenarioList containing all original scenarios plus the newly created one.
- Raises:
ScenarioError: If the ScenarioList is empty, index is out of range, or if any key in replacements does not exist in the reference scenario.
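A minimal sketch of the documented behaviour on plain dictionaries follows: duplicate one entry, overwrite selected keys, append the copy, and leave the input untouched. It raises a built-in ValueError where EDSL is documented to raise ScenarioError, and the deep copy is an assumption of this sketch.

```python
import copy


def tack_on(rows, replacements, index=-1):
    """Append a modified copy of rows[index]; the input list is not changed."""
    if not rows:
        raise ValueError("Cannot tack on to an empty list")
    try:
        reference = rows[index]
    except IndexError:
        raise ValueError(f"Index {index} is out of range") from None
    unknown = [k for k in replacements if k not in reference]
    if unknown:
        raise ValueError(f"Keys not in the reference scenario: {unknown}")
    clone = copy.deepcopy(reference)
    clone.update(replacements)
    return rows + [clone]


rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
extended = tack_on(rows, {"b": 99})             # duplicates the last entry
from_first = tack_on(rows, {"a": 0}, index=0)   # duplicates the first entry

try:
    tack_on(rows, {"c": 1})  # 'c' does not exist in the reference entry
    unknown_key_error = None
except ValueError as exc:
    unknown_key_error = str(exc)
```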
- tally(*fields: str | None, top_n: int | None = None, output='Dataset') dict | Dataset [source]
Count frequency distributions of values in specified fields.
This method tallies the occurrence of unique values within one or more fields, similar to a GROUP BY and COUNT in SQL. When multiple fields are provided, it performs cross-tabulation across those fields.
- Parameters:
*fields: Field names to tally. If none provided, uses all available fields.
top_n: Optional limit to return only the top N most frequent values.
output: Format for results, either "Dataset" (recommended) or "dict".
- Returns:
By default, returns a Dataset with columns for the field(s) and a 'count' column. If output="dict", returns a dictionary mapping values to counts.
- Notes:
For single fields, returns counts of each unique value
For multiple fields, returns counts of each unique combination of values
Results are sorted in descending order by count
Fields can be specified with or without their type prefix
- Examples:
>>> from edsl import Results
>>> r = Results.example()
# Single field frequency count
>>> r.select('how_feeling').tally('answer.how_feeling', output="dict")
{'OK': 2, 'Great': 1, 'Terrible': 1}
# Return as Dataset (default)
>>> from edsl.dataset import Dataset
>>> expected = Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible']}, {'count': [2, 1, 1]}])
>>> r.select('how_feeling').tally('answer.how_feeling', output="Dataset") == expected
True
# Multi-field cross-tabulation - exact output varies based on data
>>> result = r.tally('how_feeling', 'how_feeling_yesterday')
>>> 'how_feeling' in result.keys() and 'how_feeling_yesterday' in result.keys() and 'count' in result.keys()
True
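The counting behaviour in the notes (single-field counts, combination counts for several fields, descending order, optional top_n) maps closely onto `collections.Counter`. The sketch below mimics the output="dict" form on plain dictionaries; it requires at least one field name and is not EDSL's implementation. The sample values mirror the how_feeling and how_feeling_yesterday columns of the example results.

```python
from collections import Counter


def tally(rows, *fields, top_n=None):
    """Count values of one field, or combinations of several fields."""
    if len(fields) == 1:
        counter = Counter(row[fields[0]] for row in rows)
    else:
        counter = Counter(tuple(row[f] for f in fields) for row in rows)
    # most_common sorts by count, descending; top_n=None keeps everything.
    return dict(counter.most_common(top_n))


rows = [
    {"how_feeling": "OK", "how_feeling_yesterday": "Great"},
    {"how_feeling": "Great", "how_feeling_yesterday": "Good"},
    {"how_feeling": "Terrible", "how_feeling_yesterday": "OK"},
    {"how_feeling": "OK", "how_feeling_yesterday": "Terrible"},
]
single = tally(rows, "how_feeling")
cross = tally(rows, "how_feeling", "how_feeling_yesterday")
top = tally(rows, "how_feeling", top_n=1)
```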
- times(other: ScenarioList) ScenarioList [source]
Takes the cross product of two ScenarioLists.
Example:
>>> s1 = ScenarioList([Scenario({'a': 1}), Scenario({'a': 2})])
>>> s2 = ScenarioList([Scenario({'b': 1}), Scenario({'b': 2})])
>>> s1 * s2
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2}), Scenario({'a': 2, 'b': 1}), Scenario({'a': 2, 'b': 2})])
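For readers who want to see the cross product spelled out, the same pairing can be written with `itertools.product` over plain dictionaries. This is an illustrative equivalent rather than EDSL's code; the behaviour when both sides share a key (the right-hand value wins here) is an assumption of the sketch.

```python
from itertools import product


def times(left, right):
    """Merge every dict in left with every dict in right."""
    return [{**a, **b} for a, b in product(left, right)]


s1 = [{"a": 1}, {"a": 2}]
s2 = [{"b": 1}, {"b": 2}]
crossed = times(s1, s2)
```

The result has len(s1) * len(s2) entries, so crossing several large lists grows the number of scenarios multiplicatively.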
- to(survey: 'Survey' | 'QuestionBase') Jobs [source]
Create a Jobs object from a ScenarioList and a Survey object.
- Parameters:
survey – The Survey object to use for the Jobs object.
Example:
>>> from edsl import Survey, Jobs, ScenarioList  # doctest: +SKIP
>>> isinstance(ScenarioList.example().to(Survey.example()), Jobs)  # doctest: +SKIP
True
- to_agent_list(remove_prefix: bool = True)[source]
Convert the results to a list of dictionaries, one per agent.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_agent_list()
AgentList([Agent(traits = {'how_feeling': 'OK'}), Agent(traits = {'how_feeling': 'Great'}), Agent(traits = {'how_feeling': 'Terrible'}), Agent(traits = {'how_feeling': 'OK'})])
- to_csv(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None) FileStore [source]
Export the results to a FileStore instance containing CSV data.
- to_dataset() Dataset [source]
Convert the ScenarioList to a Dataset.
>>> s = ScenarioList.from_list("a", [1,2,3])
>>> s.to_dataset()
Dataset([{'a': [1, 2, 3]}])
>>> s = ScenarioList.from_list("a", [1,2,3]).add_list("b", [4,5,6])
>>> s.to_dataset()
Dataset([{'a': [1, 2, 3]}, {'b': [4, 5, 6]}])
- to_dict(sort: bool = False, add_edsl_version: bool = False) dict [source]
>>> s = ScenarioList([Scenario({'food': 'wood chips'}), Scenario({'food': 'wood-fired pizza'})])
>>> s.to_dict()
{'scenarios': [{'food': 'wood chips'}, {'food': 'wood-fired pizza'}]}
>>> s = ScenarioList([Scenario({'food': 'wood chips'})], codebook={'food': 'description'})
>>> d = s.to_dict()
>>> 'codebook' in d
True
>>> d['codebook'] == {'food': 'description'}
True
>>> # To include edsl_version and edsl_class_name, explicitly set add_edsl_version=True
>>> s.to_dict(add_edsl_version=True)
{'scenarios': [{'food': 'wood chips', 'edsl_version': '...', 'edsl_class_name': 'Scenario'}], 'codebook': {'food': 'description'}, 'edsl_version': '...', 'edsl_class_name': 'ScenarioList'}
- to_dicts(remove_prefix: bool = True) list[dict] [source]
Convert the results to a list of dictionaries.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_dicts()
[{'how_feeling': 'OK'}, {'how_feeling': 'Great'}, {'how_feeling': 'Terrible'}, {'how_feeling': 'OK'}]
- to_docx(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None) FileStore [source]
Export the results to a FileStore instance containing DOCX data.
Each row of the dataset will be rendered on its own page, with a 2-column table that lists the keys and associated values for that observation.
- to_excel(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, sheet_name: str | None = None)[source]
Export the results to a FileStore instance containing Excel data.
- to_jsonl(filename: str | None = None)[source]
Export the results to a FileStore instance containing JSONL data.
- to_key_value(field: str, value=None) dict | set [source]
Return the set of values in the field, or, when value is given, a dictionary of key-value pairs built from the field and value fields.
- Parameters:
field – The field to extract values from.
value – An optional field to use as the value in the key-value pair.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.to_key_value('name') == {'Alice', 'Bob'}
True
- to_list(flatten=False, remove_none=False, unzipped=False) list[list] [source]
Convert the results to a list of lists.
- Parameters:
flatten – Whether to flatten the list of lists.
remove_none – Whether to remove None values from the list.
>>> from edsl.results import Results
>>> Results.example().select('how_feeling', 'how_feeling_yesterday')
Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK']}, {'answer.how_feeling_yesterday': ['Great', 'Good', 'OK', 'Terrible']}])
>>> Results.example().select('how_feeling', 'how_feeling_yesterday').to_list()
[('OK', 'Great'), ('Great', 'Good'), ('Terrible', 'OK'), ('OK', 'Terrible')]
>>> r = Results.example()
>>> r.select('how_feeling').to_list()
['OK', 'Great', 'Terrible', 'OK']
>>> from edsl.dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}]).select('a.b').to_list(flatten = True)
[1, 9, 2, 3, 4]
>>> # Testing to_list flatten with multiple columns raises DatasetValueError - tested in unit tests
- to_pandas(remove_prefix: bool = False, lists_as_strings=False)[source]
Convert the results to a pandas DataFrame, ensuring that lists remain as lists.
- Args:
remove_prefix: Whether to remove the prefix from the column names.
lists_as_strings: Whether to convert lists to strings.
- Returns:
A pandas DataFrame.
- to_polars(remove_prefix: bool = False, lists_as_strings=False)[source]
Convert the results to a Polars DataFrame.
- Args:
remove_prefix: Whether to remove the prefix from the column names.
lists_as_strings: Whether to convert lists to strings.
- Returns:
A Polars DataFrame.
- to_scenario_list(remove_prefix: bool = True) list[dict] [source]
Convert the results to a list of dictionaries, one per scenario.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_scenario_list()
ScenarioList([Scenario({'how_feeling': 'OK'}), Scenario({'how_feeling': 'Great'}), Scenario({'how_feeling': 'Terrible'}), Scenario({'how_feeling': 'OK'})])
- to_sqlite(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, table_name: str = 'results', if_exists: str = 'replace')[source]
Export the results to a SQLite database file.
- transform(field: str, func: Callable, new_name: str | None = None) ScenarioList [source]
Transform a field using a function.
- Parameters:
field – The field to transform.
func – The function to apply to the field.
new_name – An optional new name for the transformed field.
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.transform('b', lambda x: x + 1)
ScenarioList([Scenario({'a': 1, 'b': 3}), Scenario({'a': 1, 'b': 2})])
- transform_by_key(key_field: str) Scenario [source]
Transform the ScenarioList into a single Scenario with key/value pairs.
This method transforms the ScenarioList by:
1. Using the value of the specified key_field from each Scenario as a new key
2. Automatically formatting the remaining values as “key: value, key: value”
3. Creating a single Scenario containing all the transformed key/value pairs
- Args:
key_field: The field name whose value will become the new key
- Returns:
A single Scenario with all the transformed key/value pairs
- Examples:
>>> # Original scenarios: [{'topic': 'party', 'location': 'offsite', 'time': 'evening'}]
>>> scenarios = ScenarioList([
...     Scenario({'topic': 'party', 'location': 'offsite', 'time': 'evening'})
... ])
>>> transformed = scenarios.transform_by_key('topic')
>>> # Result: Scenario({'party': 'location: offsite, time: evening'})
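The three steps described for transform_by_key can be sketched with plain dictionaries. The function name collapse_by_key is hypothetical, and the sketch assumes that a later row with the same key value overwrites an earlier one; the source does not say how EDSL handles duplicate keys.

```python
def collapse_by_key(rows: list, key_field: str) -> dict:
    """Collapse a list of dicts into one dict keyed by each row's `key_field` value."""
    result = {}
    for row in rows:
        key = row[key_field]
        # Format every remaining field as "name: value", joined by commas.
        rest = ", ".join(f"{k}: {v}" for k, v in row.items() if k != key_field)
        result[key] = rest  # assumption: duplicate keys are overwritten
    return result


rows = [{'topic': 'party', 'location': 'offsite', 'time': 'evening'}]
print(collapse_by_key(rows, 'topic'))
# {'party': 'location: offsite, time: evening'}
```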
- tree(node_order: List[str] | None = None)[source]
Convert the results to a Tree.
- Args:
node_order: The order of the nodes.
- Returns:
A Tree object.
- unique() ScenarioList [source]
Return a new ScenarioList containing only unique Scenario objects.
This method removes duplicate Scenario objects based on their hash values, which are determined by their content. Two Scenarios with identical key-value pairs will have the same hash and be considered duplicates.
- Returns:
A new ScenarioList containing only unique Scenario objects.
- Examples:
>>> from edsl.scenarios import Scenario, ScenarioList
>>> s1 = Scenario({"a": 1})
>>> s2 = Scenario({"a": 1})  # Same content as s1
>>> s3 = Scenario({"a": 2})
>>> sl = ScenarioList([s1, s2, s3])
>>> unique_sl = sl.unique()
>>> len(unique_sl)
2
>>> unique_sl
ScenarioList([Scenario({'a': 1}), Scenario({'a': 2})])
- Notes:
The order of scenarios in the result is not guaranteed due to the use of sets
Uniqueness is determined by the Scenario’s __hash__ method
The original ScenarioList is not modified
This implementation is memory efficient as it processes scenarios one at a time
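Because the notes above say the output order is not guaranteed, it may help to see what an order-preserving, content-based de-duplication looks like. The following is a plain-Python sketch on dictionaries, not the EDSL implementation: unique_rows is a hypothetical helper, and it assumes every value is JSON-serializable so that a canonical string can stand in for the content hash.

```python
import json


def unique_rows(rows: list) -> list:
    """Drop rows whose content has already been seen, keeping first-seen order."""
    seen = set()
    out = []
    for row in rows:
        # sort_keys makes the fingerprint independent of key insertion order.
        fingerprint = json.dumps(row, sort_keys=True)
        if fingerprint not in seen:
            seen.add(fingerprint)
            out.append(row)
    return out


print(unique_rows([{"a": 1}, {"a": 1}, {"a": 2}]))
# [{'a': 1}, {'a': 2}]
```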
- unpack(field: str, new_names: List[str] | None = None, keep_original=True) ScenarioList [source]
Unpack a field into multiple fields.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': [2, True]}), Scenario({'a': 3, 'b': [3, False]})])
>>> s.unpack('b')
ScenarioList([Scenario({'a': 1, 'b': [2, True], 'b_0': 2, 'b_1': True}), Scenario({'a': 3, 'b': [3, False], 'b_0': 3, 'b_1': False})])
>>> s.unpack('b', new_names=['c', 'd'], keep_original=False)
ScenarioList([Scenario({'a': 1, 'c': 2, 'd': True}), Scenario({'a': 3, 'c': 3, 'd': False})])
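The naming rule visible in the example (zero-based suffixes such as b_0 and b_1 unless new_names is supplied) can be sketched on plain dictionaries. The helper unpack_rows is hypothetical and only mirrors the documented outputs; it is not the EDSL implementation.

```python
from typing import List, Optional


def unpack_rows(rows: list, field: str, new_names: Optional[List[str]] = None,
                keep_original: bool = True) -> list:
    """Spread the sequence stored in `field` across one new key per element."""
    out = []
    for row in rows:
        values = row[field]
        # Default names follow the pattern in the example: field_0, field_1, ...
        names = new_names or [f"{field}_{i}" for i in range(len(values))]
        new_row = {k: v for k, v in row.items() if keep_original or k != field}
        new_row.update(zip(names, values))
        out.append(new_row)
    return out


rows = [{'a': 1, 'b': [2, True]}, {'a': 3, 'b': [3, False]}]
print(unpack_rows(rows, 'b', new_names=['c', 'd'], keep_original=False))
# [{'a': 1, 'c': 2, 'd': True}, {'a': 3, 'c': 3, 'd': False}]
```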
- unpack_dict(field: str, prefix: str | None = None, drop_field: bool = False) ScenarioList [source]
Unpack a dictionary field into separate fields.
- Parameters:
field – The field to unpack.
prefix – An optional prefix to add to the new fields.
drop_field – Whether to drop the original field.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}})])
>>> s.unpack_dict('b')
ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}, 'c': 2, 'd': 3})])
>>> s.unpack_dict('b', prefix='new_')
ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}, 'new_c': 2, 'new_d': 3})])
- unpack_list(field: str, new_names: List[str] | None = None, keep_original: bool = True) Dataset [source]
Unpack list columns into separate columns with provided names or numeric suffixes.
For example, if a dataset contains: [{'data': [[1, 2, 3], [4, 5, 6]], 'other': ['x', 'y']}]
After d.unpack_list('data'), it should become: [{'other': ['x', 'y'], 'data_1': [1, 4], 'data_2': [2, 5], 'data_3': [3, 6]}]
- Args:
field: The field containing lists to unpack.
new_names: Optional list of names for the unpacked fields. If None, uses numeric suffixes.
keep_original: If True, keeps the original field in the dataset.
- Returns:
A new Dataset with unpacked columns
- Examples:
>>> from edsl.dataset import Dataset
>>> d = Dataset([{'data': [[1, 2, 3], [4, 5, 6]]}])
>>> d.unpack_list('data')
Dataset([{'data': [[1, 2, 3], [4, 5, 6]]}, {'data_1': [1, 4]}, {'data_2': [2, 5]}, {'data_3': [3, 6]}])
>>> d.unpack_list('data', new_names=['first', 'second', 'third'])
Dataset([{'data': [[1, 2, 3], [4, 5, 6]]}, {'first': [1, 4]}, {'second': [2, 5]}, {'third': [3, 6]}])
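Since a Dataset is column-oriented, unpacking a list column amounts to transposing it: element i of every row becomes its own column. The sketch below shows that transposition on a plain list of lists. The helper unpack_list_column is hypothetical, the one-based suffixes follow the example above, and padding short rows with None is an assumption rather than documented EDSL behavior.

```python
from typing import Dict, List, Optional


def unpack_list_column(values: List[list], field: str,
                       new_names: Optional[List[str]] = None) -> Dict[str, list]:
    """Turn one column of lists into several columns, one per list position."""
    width = max((len(v) for v in values), default=0)
    # Default names use one-based suffixes: field_1, field_2, ...
    names = new_names or [f"{field}_{i + 1}" for i in range(width)]
    return {
        # Assumption: rows shorter than `width` are padded with None.
        name: [v[i] if i < len(v) else None for v in values]
        for i, name in enumerate(names)
    }


print(unpack_list_column([[1, 2, 3], [4, 5, 6]], 'data'))
# {'data_1': [1, 4], 'data_2': [2, 5], 'data_3': [3, 6]}
```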
- unpivot(id_vars: List[str] | None = None, value_vars: List[str] | None = None) ScenarioList [source]
Unpivot the ScenarioList, allowing for id variables to be specified.
Parameters:
id_vars (list): Fields to use as identifier variables (kept in each entry).
value_vars (list): Fields to unpivot. If None, all fields not in id_vars will be used.
Example:
>>> s = ScenarioList([
...     Scenario({'id': 1, 'year': 2020, 'a': 10, 'b': 20}),
...     Scenario({'id': 2, 'year': 2021, 'a': 15, 'b': 25})
... ])
>>> s.unpivot(id_vars=['id', 'year'], value_vars=['a', 'b'])
ScenarioList([Scenario({'id': 1, 'year': 2020, 'variable': 'a', 'value': 10}), Scenario({'id': 1, 'year': 2020, 'variable': 'b', 'value': 20}), Scenario({'id': 2, 'year': 2021, 'variable': 'a', 'value': 15}), Scenario({'id': 2, 'year': 2021, 'variable': 'b', 'value': 25})])
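The wide-to-long reshaping that unpivot performs can be sketched on plain dictionaries. The helper unpivot_rows is hypothetical and is only meant to mirror the documented example: each input row yields one output row per unpivoted field, carrying the id_vars plus a 'variable' and a 'value' key.

```python
from typing import List, Optional


def unpivot_rows(rows: list, id_vars: Optional[List[str]] = None,
                 value_vars: Optional[List[str]] = None) -> list:
    """Reshape wide dicts into long form with 'variable' and 'value' keys."""
    id_vars = id_vars or []
    out = []
    for row in rows:
        # With value_vars unset, unpivot every field that is not an id variable.
        cols = value_vars if value_vars is not None else [k for k in row if k not in id_vars]
        for col in cols:
            new_row = {k: row[k] for k in id_vars}
            new_row['variable'] = col
            new_row['value'] = row[col]
            out.append(new_row)
    return out


rows = [{'id': 1, 'year': 2020, 'a': 10, 'b': 20}]
print(unpivot_rows(rows, id_vars=['id', 'year']))
# [{'id': 1, 'year': 2020, 'variable': 'a', 'value': 10},
#  {'id': 1, 'year': 2020, 'variable': 'b', 'value': 20}]
```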