Scenarios
A Scenario is a dictionary containing one or more key/value pairs that is used to add data or content to questions in a survey, replacing a parameter in a question with a specific value (e.g., numerical or textual) or content (e.g., an image or PDF). A ScenarioList is a list of Scenario objects.
Purpose
Scenarios allow you to create variations and versions of questions efficiently. For example, we could create a question “How much do you enjoy {{ activity }}?” and use scenarios to replace the parameter activity with running or reading or other activities. Similarly, we could create a question “What do you see in this image? {{ image }}” and use scenarios to replace the parameter image with different images.
Adding scenarios to a question (or multiple questions in a survey) causes it to be administered multiple times, once for each scenario, with the parameter(s) replaced by the value(s) in the scenario. This allows us to administer multiple versions of a question together, either asynchronously (by default) or according to survey rules that we can specify (e.g., skip/stop logic), without having to create each version of a question manually.
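To make the substitution concrete, here is a minimal plain-Python sketch of the idea. It is not EDSL's actual implementation: the `render` helper is hypothetical, and plain dictionaries stand in for Scenario objects. It renders the question text once per scenario.

```python
import re

def render(template: str, scenario: dict) -> str:
    """Replace each {{ key }} placeholder with the matching scenario value."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched rather than failing
        return str(scenario.get(key, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

question_text = "How much do you enjoy {{ activity }}?"
scenarios = [{"activity": "running"}, {"activity": "reading"}]

# One rendered version of the question per scenario
versions = [render(question_text, s) for s in scenarios]
for text in versions:
    print(text)
```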
Metadata
Scenarios are also a convenient way to keep track of metadata or other information relating to survey questions that is important to an analysis of the results. For example, say we are using scenarios to parameterize questions with pieces of {{ content }} from a dataset. In scenarios for the content parameter, we could also include metadata about the source of the content, such as the {{ author }}, {{ publication_date }}, or {{ source }}. This will automatically create columns for the additional data in the survey results without passing them to the question texts (if there is no corresponding parameter in the question texts). This allows us to analyze the responses in the context of the metadata without needing to match up the data with the metadata post-survey.
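As a rough illustration of the distinction between parameters and metadata, the sketch below separates the keys of a scenario that appear as placeholders in a question text from those that do not. The helper `split_keys` and the example values are hypothetical, and a plain dictionary stands in for a Scenario.

```python
import re

def split_keys(question_text: str, scenario: dict) -> tuple[dict, dict]:
    """Separate keys used as placeholders from keys that act as metadata only."""
    used = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", question_text))
    parameters = {k: v for k, v in scenario.items() if k in used}
    metadata = {k: v for k, v in scenario.items() if k not in used}
    return parameters, metadata

question_text = "Summarize this text: {{ content }}"
# Hypothetical example values
scenario = {"content": "Some text...", "author": "A. Writer", "source": "Example Journal"}

parameters, metadata = split_keys(question_text, scenario)
print(parameters)  # keys inserted into the question text
print(metadata)    # keys carried along only as extra columns
```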
Constructing a Scenario
To use a scenario, we start by creating a question that takes a parameter in double braces:
from edsl import QuestionMultipleChoice
q = QuestionMultipleChoice(
    question_name = "enjoy",
    question_text = "How much do you enjoy {{ activity }}?",
    question_options = ["Not at all", "Somewhat", "Very much"]
)
Next we create a dictionary for a value that will replace the parameter and store it in a Scenario object:
from edsl import Scenario
scenario = Scenario({"activity": "running"})
We can inspect the scenario and see that it consists of the key/value pair that we created:
scenario
This will return:
key | value |
---|---|
activity | running |
ScenarioList
If multiple values will be used, we can create a list of Scenario objects:
from edsl import Scenario
scenarios = [Scenario({"activity": a}) for a in ["running", "reading"]]
We can inspect the scenarios:
scenarios
This will return:
[Scenario({'activity': 'running'}), Scenario({'activity': 'reading'})]
We can also create a ScenarioList object to store multiple scenarios:
from edsl import ScenarioList
scenariolist = ScenarioList([Scenario({"activity": a}) for a in ["running", "reading"]])
We can inspect it:
scenariolist
This will return:
activity |
---|
running |
reading |
We can also create a ScenarioList from a list of values for a key. The following code will generate the same scenario list as above:
scenariolist = ScenarioList.from_list("activity", ["running", "reading"])
A list of scenarios is used in the same way as a ScenarioList. The difference is that a ScenarioList is a class that can be used to create a list of scenarios from a variety of data sources, such as a list, dictionary, or a Wikipedia table (see examples below).
Using f-strings with scenarios
It is possible to use scenarios and f-strings together in a question. An f-string must be evaluated when a question is constructed, whereas a scenario is evaluated when a question is run.
For example, here we use an f-string to create different versions of a question that also takes a parameter {{ activity }}, together with a list of scenarios to replace the parameter when the questions are run. We optionally include the f-string in the question name as well as the question text in order to simultaneously create unique identifiers for the questions, which are needed in order to pass the questions that are created to a Survey. Then we use the show_prompts() method to examine the user prompts that are created when the scenarios are added to the questions:
from edsl import QuestionFreeText, ScenarioList, Scenario, Survey
questions = []
sentiments = ["enjoy", "hate", "love"]
for sentiment in sentiments:
    q = QuestionFreeText(
        question_name = f"{ sentiment }_activity",
        # Quadrupled braces are needed so that {{ activity }} survives f-string evaluation
        question_text = f"How much do you { sentiment } {{{{ activity }}}}?"
    )
    questions.append(q)
scenarios = ScenarioList(
    Scenario({"activity": activity}) for activity in ["running", "reading"]
)
survey = Survey(questions = questions)
survey.by(scenarios).show_prompts()
This will print the questions created with the f-string with the scenarios added (note that the system prompts are blank because we have not created any agents):
user_prompt | system_prompt |
---|---|
How much do you enjoy running? | |
How much do you hate running? | |
How much do you love running? | |
How much do you enjoy reading? | |
How much do you hate reading? | |
How much do you love reading? | |
To learn more about prompts, please see the Prompts section.
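One detail to keep in mind when mixing f-strings and scenario parameters: in a Python f-string, a doubled brace is an escape for a single literal brace. The short sketch below uses only standard Python and shows how the number of braces affects what remains in the string after the f-string is evaluated.

```python
sentiment = "enjoy"

# Doubled braces collapse to single braces, so the placeholder loses one pair
collapsed = f"How much do you {sentiment} {{ activity }}?"
print(collapsed)  # How much do you enjoy { activity }?

# Quadrupled braces leave a double-brace placeholder in the resulting string
preserved = f"How much do you {sentiment} {{{{ activity }}}}?"
print(preserved)  # How much do you enjoy {{ activity }}?
```

Only the second form leaves a {{ activity }} placeholder for a scenario to fill when the question is run.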
Using a Scenario
We use a scenario (or scenariolist) by adding it to a question (or a survey of questions), either when constructing the question or else when running it.
We use the by() method to add a scenario to a question when running it:
from edsl import QuestionMultipleChoice, Scenario, Agent
q = QuestionMultipleChoice(
    question_name = "enjoy",
    question_text = "How much do you enjoy {{ activity }}?",
    question_options = ["Not at all", "Somewhat", "Very much"]
)
s = Scenario({"activity": "running"})
a = Agent(traits = {"persona": "You are a human."})
results = q.by(s).by(a).run()
We can check the results to verify that the scenario has been used correctly:
results.select("activity", "enjoy")
This will print a table of the selected components of the results:
scenario.activity | answer.enjoy |
---|---|
running | Somewhat |
Looping
We use the loop() method to add a scenario to a question when constructing it, passing it a ScenarioList. This creates a list containing a new question for each scenario that was passed. Note that we can optionally include the scenario key in the question name as well; otherwise a unique identifier is automatically added to each question name.
For example:
from edsl import QuestionMultipleChoice, ScenarioList, Scenario
q = QuestionMultipleChoice(
    question_name = "enjoy_{{ activity }}",
    question_text = "How much do you enjoy {{ activity }}?",
    question_options = ["Not at all", "Somewhat", "Very much"]
)
sl = ScenarioList(
    Scenario({"activity": a}) for a in ["running", "reading"]
)
questions = q.loop(sl)
We can inspect the questions to see that they have been created correctly:
questions
This will return:
[Question('multiple_choice', question_name = """enjoy_running""", question_text = """How much do you enjoy running?""", question_options = ['Not at all', 'Somewhat', 'Very much']),
Question('multiple_choice', question_name = """enjoy_reading""", question_text = """How much do you enjoy reading?""", question_options = ['Not at all', 'Somewhat', 'Very much'])]
We can pass the questions to a survey and run it:
from edsl import Survey, Agent
survey = Survey(questions = questions)
a = Agent(traits = {"persona": "You are a human."})
results = survey.by(a).run()
results.select("answer.*")
This will print a table of the response for each question (note that “activity” is no longer in a separate scenario field):
answer.enjoy_reading | answer.enjoy_running |
---|---|
Very much | Somewhat |
Note: The loop() method cannot be used with image or PDF scenarios, as these are not evaluated when the question is constructed. Instead, use the by() method to add these types of scenarios when running a survey (see image scenario examples below).
Multiple parameters
We can also create a Scenario for multiple parameters:
from edsl import QuestionFreeText, Scenario
q = QuestionFreeText(
    question_name = "counting",
    question_text = "How many {{ unit }} are in a {{ distance }}?",
)
scenario = Scenario({"unit": "inches", "distance": "mile"})
results = q.by(scenario).run()
results.select("unit", "distance", "counting")
This will print a table of the selected components of the results:
scenario.unit | scenario.distance | answer.counting |
---|---|---|
inches | mile | There are 63,360 inches in a mile. |
To learn more about constructing surveys, please see the Surveys module.
Scenarios for question options
In the above examples we created scenarios in the question_text. We can also create a Scenario for question_options, e.g., in a multiple choice, checkbox, linear scale or other question type that requires them:
from edsl import QuestionMultipleChoice, Scenario
q = QuestionMultipleChoice(
    question_name = "capital_of_france",
    question_text = "What is the capital of France?",
    question_options = "{{ question_options }}"
)
s = Scenario({'question_options': ['Paris', 'London', 'Berlin', 'Madrid']})
results = q.by(s).run()
results.select("answer.*")
Output:
answer.capital_of_france |
---|
Paris |
Combining Scenarios
We can combine multiple scenarios into a single Scenario object:
from edsl import Scenario
scenario1 = Scenario({"food": "apple"})
scenario2 = Scenario({"drink": "water"})
combined_scenario = scenario1 + scenario2
combined_scenario
This will return:
key | value |
---|---|
food | apple |
drink | water |
We can also combine ScenarioList objects:
from edsl import ScenarioList
scenariolist1 = ScenarioList([Scenario({"food": "apple"}), Scenario({"drink": "water"})])
scenariolist2 = ScenarioList([Scenario({"color": "red"}), Scenario({"shape": "circle"})])
combined_scenariolist = scenariolist1 + scenariolist2
combined_scenariolist
This will return:
food | drink | color | shape |
---|---|---|---|
apple | | | |
| water | | |
| | red | |
| | | circle |
We can create a cross product of ScenarioList objects (combine the scenarios in each list with each other):
from edsl import ScenarioList
scenariolist1 = ScenarioList([Scenario({"food": "apple"}), Scenario({"drink": "water"})])
scenariolist2 = ScenarioList([Scenario({"color": "red"}), Scenario({"shape": "circle"})])
cross_product_scenariolist = scenariolist1 * scenariolist2
cross_product_scenariolist
This will return:
food | drink | color | shape |
---|---|---|---|
apple | | red | |
apple | | | circle |
| water | red | |
| water | | circle |
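The cross product can be pictured with plain dictionaries: every entry of the first list is merged with every entry of the second. The sketch below uses only the standard library and is meant as an illustration of the operation, not as EDSL's implementation.

```python
from itertools import product

list1 = [{"food": "apple"}, {"drink": "water"}]
list2 = [{"color": "red"}, {"shape": "circle"}]

# Merge every dictionary in the first list with every dictionary in the second
cross = [{**a, **b} for a, b in product(list1, list2)]
for row in cross:
    print(row)
```

Two lists of two entries each yield four merged entries, matching the four rows in the table above.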
Creating scenarios from a dataset
There are a variety of methods for creating and working with scenarios generated from datasets and different data types.
Turning results into scenarios
The method to_scenario_list() can be used to turn the results of a survey into a list of scenarios.
Example usage:
Say we have some results from a survey where we asked agents to choose a random number between 1 and 1000:
from edsl import QuestionNumerical, Agent
q_random = QuestionNumerical(
    question_name = "random",
    question_text = "Choose a random number between 1 and 1000."
)
agents = [Agent({"persona": p}) for p in ["Child", "Magician", "Olympic breakdancer"]]
results = q_random.by(agents).run()
results.select("persona", "random")
Our results are:
agent.persona | answer.random |
---|---|
Child | 7 |
Magician | 472 |
Olympic breakdancer | 529 |
We can use the to_scenario_list() method to turn components of the results into a list of scenarios to use in a new survey:
scenarios = results.select("persona", "random").to_scenario_list() # excluding other columns of the results
scenarios
We can inspect the scenarios to see that they have been created correctly:
persona | random |
---|---|
Child | 7 |
Magician | 472 |
Olympic breakdancer | 529 |
PDFs as textual scenarios
The ScenarioList method from_pdf('path/to/pdf') is a convenient way to extract information from large files. It allows you to read in a PDF and automatically create a list of textual scenarios for the pages of the file. Each scenario has the keys filename, page and text, which can be used as parameters in a question (or stored as metadata) and renamed as desired.
How it works: Add a placeholder {{ text }} to a question text to use the text of a PDF page as a parameter in the question. When you run the survey with the PDF scenarios, the text of each page will be inserted into the question text in place of the placeholder.
Example usage:
from edsl import QuestionFreeText, ScenarioList, Survey
# Create a survey of questions parameterized by the {{ text }} of the PDF pages:
q1 = QuestionFreeText(
    question_name = "themes",
    question_text = "Identify the key themes mentioned on this page: {{ text }}",
)
q2 = QuestionFreeText(
    question_name = "idea",
    question_text = "Identify the most important idea on this page: {{ text }}",
)
survey = Survey([q1, q2])
scenarios = ScenarioList.from_pdf("path/to/pdf_file.pdf")
# Run the survey with the pages of the PDF as scenarios:
results = survey.by(scenarios).run()
# To print the page and text of each PDF page scenario together with the answers to the questions:
results.select("page", "text", "answer.*")
See a demo notebook of this method in the notebooks section of the docs index: “Extracting information from PDFs”.
Image scenarios
A Scenario can be generated from an image by passing the filepath as the value.
Example usage:
from edsl import Scenario
s = Scenario({"logo": "logo.png"}) # Replace with your own local file
We can add the key to questions as we do with scenarios from other data sources:
from edsl import Model, QuestionFreeText, QuestionList, Survey
m = Model("gpt-4o")
q1 = QuestionFreeText(
    question_name = "identify",
    question_text = "What animal is in this picture: {{ logo }}"
)
q2 = QuestionList(
    question_name = "colors",
    question_text = "What colors do you see in this picture: {{ logo }}"
)
survey = Survey([q1, q2])
# Pass the vision model explicitly so that it is used to run the survey
results = survey.by(s).by(m).run()
results.select("logo", "identify", "colors")
Output using the Expected Parrot logo:
answer.identify | answer.colors |
---|---|
The image shows a large letter “E” followed by a pair of square brackets containing an illustration of a parrot. The parrot is green with a yellow beak and some red and blue coloring on its body. This combination suggests the mathematical notation for the expected value, often denoted as “E” followed by a random variable in brackets, commonly used in probability and statistics. | ['gray', 'green', 'orange', 'pink', 'blue', 'black'] |
See an example of this method in the notebooks section of the docs index: Using images in a survey.
Note: You must use a vision model in order to run questions with images. It is recommended to test that a model can reliably identify each image before running a survey with image scenarios.
Creating a scenario list from a list
The ScenarioList method from_list() creates a list of scenarios for a specified key and list of values that is passed to it.
Example usage:
from edsl import ScenarioList
scenariolist = ScenarioList.from_list("item", ["color", "food", "animal"])
scenariolist
This will return:
item |
---|
color |
food |
animal |
Creating a scenario list from a dictionary
The Scenario method from_dict() creates a scenario for a dictionary that is passed to it.
The ScenarioList method from_nested_dict() creates a list of scenarios for a specified key and nested dictionary.
Example usage:
# Example dictionary
d = {"item": ["color", "food", "animal"]}
from edsl import Scenario
scenario = Scenario.from_dict(d)
scenario
This will return a single scenario for the list of items in the dict:
key | value |
---|---|
item:0 | color |
item:1 | food |
item:2 | animal |
If we instead want to create a scenario for each item in the list individually:
from edsl import ScenarioList
scenariolist = ScenarioList.from_nested_dict(d)
scenariolist
This will return:
item |
---|
color |
food |
animal |
Creating a scenario list from a Wikipedia table
The ScenarioList method from_wikipedia('url') can be used to create a list of scenarios from a Wikipedia table. As in the example below, a second argument can be passed to select which table on the page to use.
Example usage:
from edsl import ScenarioList
scenarios = ScenarioList.from_wikipedia("https://en.wikipedia.org/wiki/1990s_in_film", 3)
scenarios
This will return a list of scenarios for the selected table on the Wikipedia page:
Rank | Title | Studios | Worldwide gross | Year |
---|---|---|---|---|
1 | Titanic | Paramount Pictures/20th Century Fox | $1,843,201,268 | 1997 |
2 | Star Wars: Episode I - The Phantom Menace | 20th Century Fox | $924,317,558 | 1999 |
3 | Jurassic Park | Universal Pictures | $914,691,118 | 1993 |
4 | Independence Day | 20th Century Fox | $817,400,891 | 1996 |
5 | The Lion King | Walt Disney Studios | $763,455,561 | 1994 |
6 | Forrest Gump | Paramount Pictures | $677,387,716 | 1994 |
7 | The Sixth Sense | Walt Disney Studios | $672,806,292 | 1999 |
8 | The Lost World: Jurassic Park | Universal Pictures | $618,638,999 | 1997 |
9 | Men in Black | Sony Pictures/Columbia Pictures | $589,390,539 | 1997 |
10 | Armageddon | Walt Disney Studios | $553,709,788 | 1998 |
11 | Terminator 2: Judgment Day | TriStar Pictures | $519,843,345 | 1991 |
12 | Ghost | Paramount Pictures | $505,702,588 | 1990 |
13 | Aladdin | Walt Disney Studios | $504,050,219 | 1992 |
14 | Twister | Warner Bros./Universal Pictures | $494,471,524 | 1996 |
15 | Toy Story 2 | Walt Disney Studios | $485,015,179 | 1999 |
16 | Saving Private Ryan | DreamWorks Pictures/Paramount Pictures | $481,840,909 | 1998 |
17 | Home Alone | 20th Century Fox | $476,684,675 | 1990 |
18 | The Matrix | Warner Bros. | $463,517,383 | 1999 |
19 | Pretty Woman | Walt Disney Studios | $463,406,268 | 1990 |
20 | Mission: Impossible | Paramount Pictures | $457,696,359 | 1996 |
21 | Tarzan | Walt Disney Studios | $448,191,819 | 1999 |
22 | Mrs. Doubtfire | 20th Century Fox | $441,286,195 | 1993 |
23 | Dances with Wolves | Orion Pictures | $424,208,848 | 1990 |
24 | The Mummy | Universal Pictures | $415,933,406 | 1999 |
25 | The Bodyguard | Warner Bros. | $411,006,740 | 1992 |
26 | Robin Hood: Prince of Thieves | Warner Bros. | $390,493,908 | 1991 |
27 | Godzilla | TriStar Pictures | $379,014,294 | 1998 |
28 | True Lies | 20th Century Fox | $378,882,411 | 1994 |
29 | Toy Story | Walt Disney Studios | $373,554,033 | 1995 |
30 | There’s Something About Mary | 20th Century Fox | $369,884,651 | 1998 |
31 | The Fugitive | Warner Bros. | $368,875,760 | 1993 |
32 | Die Hard with a Vengeance | 20th Century Fox/Cinergi Pictures | $366,101,666 | 1995 |
33 | Notting Hill | PolyGram Filmed Entertainment | $363,889,678 | 1999 |
34 | A Bug’s Life | Walt Disney Studios | $363,398,565 | 1998 |
35 | The World Is Not Enough | Metro-Goldwyn-Mayer Pictures | $361,832,400 | 1999 |
36 | Home Alone 2: Lost in New York | 20th Century Fox | $358,994,850 | 1992 |
37 | American Beauty | DreamWorks Pictures | $356,296,601 | 1999 |
38 | Apollo 13 | Universal Pictures/Imagine Entertainment | $355,237,933 | 1995 |
39 | Basic Instinct | TriStar Pictures | $352,927,224 | 1992 |
40 | GoldenEye | MGM/United Artists | $352,194,034 | 1995 |
41 | The Mask | New Line Cinema | $351,583,407 | 1994 |
42 | Speed | 20th Century Fox | $350,448,145 | 1994 |
43 | Deep Impact | Paramount Pictures/DreamWorks Pictures | $349,464,664 | 1998 |
44 | Beauty and the Beast | Walt Disney Studios | $346,317,207 | 1991 |
45 | Pocahontas | Walt Disney Studios | $346,079,773 | 1995 |
46 | The Flintstones | Universal Pictures | $341,631,208 | 1994 |
47 | Batman Forever | Warner Bros. | $336,529,144 | 1995 |
48 | The Rock | Walt Disney Studios | $335,062,621 | 1996 |
49 | Tomorrow Never Dies | MGM/United Artists | $333,011,068 | 1997 |
50 | Seven | New Line Cinema | $327,311,859 | 1995 |
The parameters let us know the keys that can be used in the question text or stored as metadata. (They can be edited as needed - e.g., using the rename method discussed above.)
scenarios.parameters
This will return:
{'Rank', 'Ref.', 'Studios', 'Title', 'Worldwide gross', 'Year'}
The scenarios can be used to ask questions about the data in the table:
from edsl import QuestionList
q_leads = QuestionList(
    question_name = "leads",
    question_text = "Who are the lead actors or actresses in {{ Title }}?"
)
results = q_leads.by(scenarios).run()
(
    results
    .sort_by("Title")
    .select("Title", "leads")
)
Output:
Title | Leads |
---|---|
A Bug’s Life | Dave Foley, Kevin Spacey, Julia Louis-Dreyfus, Hayden Panettiere, Phyllis Diller, Richard Kind, David Hyde Pierce |
Aladdin | Mena Massoud, Naomi Scott, Will Smith |
American Beauty | Kevin Spacey, Annette Bening, Thora Birch, Mena Suvari, Wes Bentley, Chris Cooper |
Apollo 13 | Tom Hanks, Kevin Bacon, Bill Paxton |
Armageddon | Bruce Willis, Billy Bob Thornton, Liv Tyler, Ben Affleck |
Basic Instinct | Michael Douglas, Sharon Stone |
Batman Forever | Val Kilmer, Tommy Lee Jones, Jim Carrey, Nicole Kidman, Chris O’Donnell |
Beauty and the Beast | Emma Watson, Dan Stevens, Luke Evans, Kevin Kline, Josh Gad |
Dances with Wolves | Kevin Costner, Mary McDonnell, Graham Greene, Rodney A. Grant |
Deep Impact | Téa Leoni, Morgan Freeman, Elijah Wood, Robert Duvall |
Die Hard with a Vengeance | Bruce Willis, Samuel L. Jackson, Jeremy Irons |
Forrest Gump | Tom Hanks, Robin Wright, Gary Sinise, Mykelti Williamson, Sally Field |
Ghost | Patrick Swayze, Demi Moore, Whoopi Goldberg |
Godzilla | Matthew Broderick, Jean Reno, Bryan Cranston, Aaron Taylor-Johnson, Elizabeth Olsen, Kyle Chandler, Vera Farmiga, Millie Bobby Brown |
GoldenEye | Pierce Brosnan, Sean Bean, Izabella Scorupco, Famke Janssen |
Home Alone | Macaulay Culkin, Joe Pesci, Daniel Stern, Catherine O’Hara, John Heard |
Home Alone 2: Lost in New York | Macaulay Culkin, Joe Pesci, Daniel Stern, Catherine O’Hara, John Heard |
Independence Day | Will Smith, Bill Pullman, Jeff Goldblum |
Jurassic Park | Sam Neill, Laura Dern, Jeff Goldblum, Richard Attenborough |
Men in Black | Tommy Lee Jones, Will Smith |
Mission: Impossible | Tom Cruise, Ving Rhames, Simon Pegg, Rebecca Ferguson, Jeremy Renner |
Mrs. Doubtfire | Robin Williams, Sally Field, Pierce Brosnan, Lisa Jakub, Matthew Lawrence, Mara Wilson |
Notting Hill | Julia Roberts, Hugh Grant |
Pocahontas | Irene Bedard, Mel Gibson, Judy Kuhn, David Ogden Stiers, Russell Means, Christian Bale |
Pretty Woman | Richard Gere, Julia Roberts |
Robin Hood: Prince of Thieves | Kevin Costner, Morgan Freeman, Mary Elizabeth Mastrantonio, Christian Slater, Alan Rickman |
Saving Private Ryan | Tom Hanks, Matt Damon, Tom Sizemore, Edward Burns, Barry Pepper, Adam Goldberg, Vin Diesel, Giovanni Ribisi, Jeremy Davies |
Seven | Brad Pitt, Morgan Freeman, Gwyneth Paltrow |
Speed | Keanu Reeves, Sandra Bullock, Dennis Hopper |
Star Wars: Episode I - The Phantom Menace | Liam Neeson, Ewan McGregor, Natalie Portman, Jake Lloyd |
Tarzan | Johnny Weissmuller, Maureen O’Sullivan |
Terminator 2: Judgment Day | Arnold Schwarzenegger, Linda Hamilton, Edward Furlong, Robert Patrick |
The Bodyguard | Kevin Costner, Whitney Houston |
The Flintstones | John Goodman, Elizabeth Perkins, Rick Moranis, Rosie O’Donnell |
The Fugitive | Harrison Ford, Tommy Lee Jones |
The Lion King | Matthew Broderick, James Earl Jones, Jeremy Irons, Moira Kelly, Nathan Lane, Ernie Sabella, Rowan Atkinson, Whoopi Goldberg |
The Lost World: Jurassic Park | Jeff Goldblum, Julianne Moore, Pete Postlethwaite |
The Mask | Jim Carrey, Cameron Diaz |
The Matrix | Keanu Reeves, Laurence Fishburne, Carrie-Anne Moss |
The Mummy | Brendan Fraser, Rachel Weisz, John Hannah, Arnold Vosloo |
The Rock | Sean Connery, Nicolas Cage, Ed Harris |
The Sixth Sense | Bruce Willis, Haley Joel Osment, Toni Collette, Olivia Williams |
The World Is Not Enough | Pierce Brosnan, Sophie Marceau, Denise Richards, Robert Carlyle |
There’s Something About Mary | Cameron Diaz, Ben Stiller, Matt Dillon |
Titanic | Leonardo DiCaprio, Kate Winslet |
Tomorrow Never Dies | Pierce Brosnan, Michelle Yeoh, Jonathan Pryce, Teri Hatcher |
Toy Story | Tom Hanks, Tim Allen |
Toy Story 2 | Tom Hanks, Tim Allen, Joan Cusack |
True Lies | Arnold Schwarzenegger, Jamie Lee Curtis |
Twister | Helen Hunt, Bill Paxton |
Creating a scenario list from a CSV
The ScenarioList method from_csv('<filepath>.csv') creates a list of scenarios from a CSV file. The method reads the CSV file and creates a scenario for each row in the file, with the keys as the column names and the values as the row values.
For example, say we have a CSV file containing the following data:
message,user,source,date
I can't log in...,Alice,Customer support,2022-01-01
I need help with my bill...,Bob,Phone,2022-01-02
I have a safety concern...,Charlie,Email,2022-01-03
I need help with a product...,David,Chat,2022-01-04
We can create a list of scenarios from the CSV file:
from edsl import ScenarioList
scenariolist = ScenarioList.from_csv("<filepath>.csv")
scenariolist
This will return a scenario for each row:
message | user | source | date |
---|---|---|---|
I can’t log in… | Alice | Customer support | 2022-01-01 |
I need help with my bill… | Bob | Phone | 2022-01-02 |
I have a safety concern… | Charlie | Email | 2022-01-03 |
I need help with a product… | David | Chat | 2022-01-04 |
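The row-to-scenario mapping is the same one that Python's standard csv.DictReader performs: each row becomes a dictionary keyed by the header row. The sketch below illustrates this with an in-memory copy of the example data; it uses plain dictionaries rather than Scenario objects and is not EDSL's implementation.

```python
import csv
import io

# In-memory stand-in for the CSV file shown above
csv_text = """message,user,source,date
I can't log in...,Alice,Customer support,2022-01-01
I need help with my bill...,Bob,Phone,2022-01-02
I have a safety concern...,Charlie,Email,2022-01-03
I need help with a product...,David,Chat,2022-01-04
"""

# One dictionary per row, with the column names as keys
rows = [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]
for row in rows:
    print(row)
```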
If the scenario keys are not valid Python identifiers, we can use the give_valid_names() method to convert them to valid identifiers.
For example, our CSV file might contain a header row that is question texts:
"What is the message?","Who is the user?","What is the source?","What is the date?"
"I can't log in...","Alice","Customer support","2022-01-01"
"I need help with my bill...","Bob","Phone","2022-01-02"
"I have a safety concern...","Charlie","Email","2022-01-03"
"I need help with a product...","David","Chat","2022-01-04"
We can create a list of scenarios from the CSV file:
from edsl import ScenarioList
scenariolist = ScenarioList.from_csv("<filepath>.csv")
scenariolist
This will return scenarios with non-Pythonic identifiers:
What is the message? | Who is the user? | What is the source? | What is the date? |
---|---|---|---|
I can’t log in… | Alice | Customer support | 2022-01-01 |
I need help with my bill… | Bob | Phone | 2022-01-02 |
I have a safety concern… | Charlie | Email | 2022-01-03 |
I need help with a product… | David | Chat | 2022-01-04 |
We can then use the give_valid_names() method to convert the keys to valid identifiers:
scenariolist = scenariolist.give_valid_names()
scenariolist
This will return scenarios with valid identifiers (removing stop words and using underscores):
message | user | source | date |
---|---|---|---|
I can’t log in… | Alice | Customer support | 2022-01-01 |
I need help with my bill… | Bob | Phone | 2022-01-02 |
I have a safety concern… | Charlie | Email | 2022-01-03 |
I need help with a product… | David | Chat | 2022-01-04 |
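To give a feel for this kind of conversion, here is a minimal sketch of one possible heuristic: lowercase the text, drop stop words and join the remaining words with underscores. The function `to_identifier` and the stop-word list are assumptions made for this example; the rules that give_valid_names() actually applies may differ.

```python
import keyword
import re

# Hypothetical stop-word list chosen for this example only
STOP_WORDS = {"what", "who", "is", "the", "a", "an", "of"}

def to_identifier(text: str) -> str:
    """Turn free text into a lowercase, underscore-separated Python identifier."""
    words = re.findall(r"[A-Za-z0-9]+", text.lower())
    # Fall back to all words if every word was a stop word
    kept = [w for w in words if w not in STOP_WORDS] or words
    name = "_".join(kept) or "key"
    # Identifiers cannot start with a digit or collide with a Python keyword
    if name[0].isdigit() or keyword.iskeyword(name):
        name = f"_{name}"
    return name

headers = ["What is the message?", "Who is the user?", "What is the source?", "What is the date?"]
names = [to_identifier(h) for h in headers]
print(names)
```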
Methods for un/pivoting and grouping scenarios
There are a variety of methods for modifying scenarios and scenario lists.
Unpivoting a scenario list
The ScenarioList method unpivot() can be used to unpivot a scenario list based on one or more specified identifiers. It takes a list of id_vars which are the names of the key/value pairs to keep in each new scenario, and a list of value_vars which are the names of the key/value pairs to unpivot.
For example, say we have a scenario list for the above CSV file:
from edsl import ScenarioList
scenariolist = ScenarioList.from_csv("<filepath>.csv")
scenariolist
We can call the unpivot() method on the scenario list:
scenariolist = scenariolist.unpivot(id_vars = ["user"], value_vars = ["source", "date", "message"])
scenariolist
This will return a list of scenarios with the source, date, and message key/value pairs unpivoted:
user | variable | value |
---|---|---|
Alice | source | Customer support |
Alice | date | 2022-01-01 |
Alice | message | I can’t log in… |
Bob | source | Phone |
Bob | date | 2022-01-02 |
Bob | message | I need help with my bill… |
Charlie | source | Email |
Charlie | date | 2022-01-03 |
Charlie | message | I have a safety concern… |
David | source | Chat |
David | date | 2022-01-04 |
David | message | I need help with a product… |
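The reshaping itself can be sketched in a few lines of plain Python. The `unpivot` function below is a hypothetical stand-in that works on dictionaries, intended only to show how each wide row turns into one row per unpivoted key; it is not EDSL's implementation.

```python
def unpivot(rows, id_vars, value_vars):
    """Emit one row per (row, value_var) pair, keeping the id_vars in each new row."""
    long_rows = []
    for row in rows:
        for var in value_vars:
            new_row = {k: row[k] for k in id_vars}
            new_row["variable"] = var
            new_row["value"] = row[var]
            long_rows.append(new_row)
    return long_rows

rows = [
    {"user": "Alice", "source": "Customer support", "date": "2022-01-01", "message": "I can't log in..."},
    {"user": "Bob", "source": "Phone", "date": "2022-01-02", "message": "I need help with my bill..."},
]
long_rows = unpivot(rows, id_vars=["user"], value_vars=["source", "date", "message"])
for row in long_rows:
    print(row)
```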
Pivoting a scenario list
We can call the pivot() method to reverse the unpivot operation:
scenariolist = scenariolist.pivot(id_vars = ["user"], var_name="variable", value_name="value")
scenariolist
This will return a list of scenarios with the source, date, and message key/value pairs pivoted back to their original form:
user | source | date | message |
---|---|---|---|
Alice | Customer support | 2022-01-01 | I can’t log in… |
Bob | Phone | 2022-01-02 | I need help with my bill… |
Charlie | Email | 2022-01-03 | I have a safety concern… |
David | Chat | 2022-01-04 | I need help with a product… |
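For comparison, the reverse reshaping can also be sketched with plain dictionaries. The `pivot` function below is hypothetical and only mirrors the parameter names used above; it collects all long rows that share the same id values into a single wide row.

```python
def pivot(long_rows, id_vars, var_name="variable", value_name="value"):
    """Collect rows that share the same id values back into one wide row each."""
    wide = {}
    for row in long_rows:
        key = tuple(row[k] for k in id_vars)
        # Dictionaries preserve insertion order, so ids appear in first-seen order
        wide_row = wide.setdefault(key, {k: row[k] for k in id_vars})
        wide_row[row[var_name]] = row[value_name]
    return list(wide.values())

long_rows = [
    {"user": "Alice", "variable": "source", "value": "Customer support"},
    {"user": "Alice", "variable": "date", "value": "2022-01-01"},
    {"user": "Bob", "variable": "source", "value": "Phone"},
    {"user": "Bob", "variable": "date", "value": "2022-01-02"},
]
wide_rows = pivot(long_rows, id_vars=["user"])
for row in wide_rows:
    print(row)
```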
Grouping scenarios
The group_by() method can be used to group scenarios by one or more specified keys and apply a function to the values of the specified variables.
Example usage:
from edsl import ScenarioList, Scenario
# The function receives one list of values per variable for each group
def avg_sum(a, b):
    return {'avg_a': sum(a) / len(a), 'sum_b': sum(b)}
scenariolist = ScenarioList([
    Scenario({'group': 'A', 'year': 2020, 'a': 10, 'b': 20}),
    Scenario({'group': 'A', 'year': 2021, 'a': 15, 'b': 25}),
    Scenario({'group': 'B', 'year': 2020, 'a': 12, 'b': 22}),
    Scenario({'group': 'B', 'year': 2021, 'a': 17, 'b': 27})
])
scenariolist.group_by(id_vars=['group'], variables=['a', 'b'], func=avg_sum)
This will return a list of scenarios with the a and b key/value pairs grouped by the group key and the avg_a and sum_b key/value pairs calculated by the avg_sum function:
group | avg_a | sum_b |
---|---|---|
A | 12.5 | 45 |
B | 14.5 | 49 |
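The arithmetic behind this table can be checked with a small plain-Python sketch of the grouping step. The `group_by` function here is a hypothetical stand-in operating on dictionaries, not EDSL's implementation: it gathers the values of each variable per group and passes the resulting lists to the aggregation function.

```python
from collections import defaultdict

def avg_sum(a, b):
    return {"avg_a": sum(a) / len(a), "sum_b": sum(b)}

def group_by(rows, id_vars, variables, func):
    """Group rows by id_vars and apply func to the per-group lists of each variable."""
    groups = defaultdict(lambda: {v: [] for v in variables})
    for row in rows:
        key = tuple(row[k] for k in id_vars)
        for v in variables:
            groups[key][v].append(row[v])
    result = []
    for key, columns in groups.items():
        out = dict(zip(id_vars, key))
        out.update(func(*(columns[v] for v in variables)))
        result.append(out)
    return result

rows = [
    {"group": "A", "year": 2020, "a": 10, "b": 20},
    {"group": "A", "year": 2021, "a": 15, "b": 25},
    {"group": "B", "year": 2020, "a": 12, "b": 22},
    {"group": "B", "year": 2021, "a": 17, "b": 27},
]
grouped = group_by(rows, id_vars=["group"], variables=["a", "b"], func=avg_sum)
print(grouped)
```

For group A, the average of 10 and 15 is 12.5 and the sum of 20 and 25 is 45; for group B, the average of 12 and 17 is 14.5 and the sum of 22 and 27 is 49.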
Data labeling tasks
Scenarios are particularly useful for conducting data labeling or data coding tasks, where the task can be designed as a survey of questions about each piece of data in a dataset.
For example, say we have a dataset of text messages that we want to sort by topic. We can perform this task by using a language model to answer questions such as “What is the primary topic of this message: {{ message }}?” or “Does this message mention a safety issue? {{ message }}”, where each text message is inserted in the message placeholder of the question text.
Here we use scenarios to conduct the task:
from edsl import QuestionMultipleChoice, Survey, Scenario
# Create questions that take a parameter
q1 = QuestionMultipleChoice(
    question_name = "topic",
    question_text = "What is the topic of this message: {{ message }}?",
    question_options = ["Safety", "Product support", "Billing", "Login issue", "Other"]
)
q2 = QuestionMultipleChoice(
    question_name = "safety",
    question_text = "Does this message mention a safety issue? {{ message }}",
    question_options = ["Yes", "No", "Unclear"]
)
# Create a list of scenarios for the parameter
messages = [
    "I can't log in...",
    "I need help with my bill...",
    "I have a safety concern...",
    "I need help with a product..."
]
scenarios = [Scenario({"message": message}) for message in messages]
# Create a survey with the questions
survey = Survey(questions = [q1, q2])
# Run the survey with the scenarios
results = survey.by(scenarios).run()
We can then analyze the results to see how the agent answered the questions for each scenario:
results.select("message", "safety", "topic")
This will print a table of the scenarios and the answers to the questions for each scenario:
message | safety | topic |
---|---|---|
I can’t log in… | No | Login issue |
I need help with a product… | No | Product support |
I need help with my bill… | No | Billing |
I have a safety concern… | Yes | Safety |
Adding metadata
If we have metadata about the messages that we want to keep track of, we can add it to the scenarios as well. This will create additional columns for the metadata in the results dataset, but without the need to include it in our question texts. Here we modify the above example to use a dataset of messages with metadata. Note that the question texts are unchanged:
from edsl import QuestionMultipleChoice, Survey, ScenarioList, Scenario
# Create a question with a parameter
q1 = QuestionMultipleChoice(
question_name = "topic",
question_text = "What is the topic of this message: {{ message }}?",
question_options = ["Safety", "Product support", "Billing", "Login issue", "Other"]
)
q2 = QuestionMultipleChoice(
question_name = "safety",
question_text = "Does this message mention a safety issue? {{ message }}?",
question_options = ["Yes", "No", "Unclear"]
)
# Create scenarios for the sets of parameters
user_messages = [
{"message": "I can't log in...", "user": "Alice", "source": "Customer support", "date": "2022-01-01"},
{"message": "I need help with my bill...", "user": "Bob", "source": "Phone", "date": "2022-01-02"},
{"message": "I have a safety concern...", "user": "Charlie", "source": "Email", "date": "2022-01-03"},
{"message": "I need help with a product...", "user": "David", "source": "Chat", "date": "2022-01-04"}
]
scenarios = ScenarioList(
Scenario.from_dict(m) for m in user_messages
)
# Create a survey with the questions
survey = Survey(questions = [q1, q2])
# Run the survey with the scenarios
results = survey.by(scenarios).run()
# Inspect the results
results.select("scenario.*", "answer.*")
We can see how the agent answered the questions for each scenario, together with the metadata that was not included in the question text:
user | source | message | date | topic | safety |
---|---|---|---|---|---|
Alice | Customer support | I can’t log in… | 2022-01-01 | Login issue | No |
Bob | Phone | I need help with my bill… | 2022-01-02 | Billing | No |
Charlie | Email | I have a safety concern… | 2022-01-03 | Safety | Yes |
David | Chat | I need help with a product… | 2022-01-04 | Product support | No |
To learn more about accessing, analyzing and visualizing survey results, please see the Results section.
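To make the distinction between parameters and metadata concrete, here is a plain-Python sketch that separates the scenario keys referenced by a question text from the keys that are only carried along into the results. The helper `split_scenario` is hypothetical and is not part of EDSL; it assumes simple `{{ name }}` placeholders.

```python
import re

def split_scenario(question_texts: list, scenario: dict) -> tuple:
    """Separate keys used as placeholders from keys carried along as metadata."""
    used = set()
    for text in question_texts:
        # Collect every placeholder name that appears in any question text.
        used.update(re.findall(r"\{\{\s*(\w+)\s*\}\}", text))
    parameters = {k: v for k, v in scenario.items() if k in used}
    metadata = {k: v for k, v in scenario.items() if k not in used}
    return parameters, metadata

question_texts = [
    "What is the topic of this message: {{ message }}?",
    "Does this message mention a safety issue? {{ message }}?",
]
scenario = {
    "message": "I can't log in...",
    "user": "Alice",
    "source": "Customer support",
    "date": "2022-01-01",
}

parameters, metadata = split_scenario(question_texts, scenario)
print(parameters)  # keys substituted into the question texts
print(metadata)    # keys that only appear as extra result columns
```

Only `message` reaches the question texts; `user`, `source` and `date` stay out of the prompts but remain available for analysis.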
Slicing/chunking content into scenarios
We can use the Scenario method chunk() to slice a text scenario into a ScenarioList based on num_words or num_lines.
Example usage:
my_haiku = """
This is a long text.
Pages and pages, oh my!
I need to chunk it.
"""
text_scenario = Scenario({"my_text": my_haiku})
word_chunks_scenariolist = text_scenario.chunk(
"my_text",
num_words = 5, # use num_words or num_lines but not both
include_original = True, # optional
hash_original = True # optional
)
word_chunks_scenariolist
This will return:
my_text | my_text_chunk | my_text_original |
---|---|---|
This is a long text. | 0 | 4aec42eda32b7f32bde8be6a6bc11125 |
Pages and pages, oh my! | 1 | 4aec42eda32b7f32bde8be6a6bc11125 |
I need to chunk it. | 2 | 4aec42eda32b7f32bde8be6a6bc11125 |
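The word-based chunking shown above can be approximated in plain Python. The sketch below is not EDSL's code: `chunk_words` is a hypothetical helper, splitting on whitespace is an assumption, and MD5 is assumed for the hash only because the identifiers in the example are 32 hexadecimal characters long.

```python
import hashlib

def chunk_words(text: str, field: str, num_words: int,
                include_original: bool = False,
                hash_original: bool = False) -> list:
    """Split text into chunks of num_words words, one dictionary per chunk."""
    words = text.split()  # assumption: words are separated by any whitespace
    chunks = [" ".join(words[i:i + num_words])
              for i in range(0, len(words), num_words)]
    rows = []
    for index, chunk in enumerate(chunks):
        row = {field: chunk, f"{field}_chunk": index}
        if include_original:
            original = text
            if hash_original:
                # Assumption: an MD5 hex digest stands in for the original text.
                original = hashlib.md5(text.encode("utf-8")).hexdigest()
            row[f"{field}_original"] = original
        rows.append(row)
    return rows

my_haiku = """
This is a long text.
Pages and pages, oh my!
I need to chunk it.
"""

rows = chunk_words(my_haiku, "my_text", num_words=5,
                   include_original=True, hash_original=True)
for row in rows:
    print(row)
```

Because each line of the haiku happens to contain five words, chunking with `num_words = 5` yields one chunk per line, and every chunk shares the same identifier for the original text.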
Scenario class
A Scenario is a dictionary with a key/value to parameterize a question.
- class edsl.scenarios.Scenario.DisplayJSON(input_dict: dict)[source]
Bases: object
Display a dictionary as JSON.
- class edsl.scenarios.Scenario.DisplayYAML(input_dict: dict)[source]
Bases: object
Display a dictionary as YAML.
- class edsl.scenarios.Scenario.Scenario(data: dict | None = None, name: str | None = None)[source]
Bases: Base, UserDict, ScenarioHtmlMixin
A Scenario is a dictionary of keys/values that can be used to parameterize questions.
- __init__(data: dict | None = None, name: str | None = None)[source]
Initialize a new Scenario.
- Parameters:
data – A dictionary of keys/values for parameterizing questions.
name – The name of the scenario.
- chunk(field, num_words: int | None = None, num_lines: int | None = None, include_original=False, hash_original=False) ScenarioList [source]
Split a field into chunks of a given size.
- Parameters:
field – The field to split.
num_words – The number of words in each chunk.
num_lines – The number of lines in each chunk.
include_original – Whether to include the original field in the new scenarios.
hash_original – Whether to hash the original field in the new scenarios.
If you specify include_original=True, the original field will be included in the new scenarios with an “_original” suffix.
Either num_words or num_lines must be specified, but not both.
The hash_original parameter is useful if you do not want to store the original text, but still want a unique identifier for it.
Example:
>>> s = Scenario({"text": "This is a test.\nThis is a test.\n\nThis is a test."})
>>> s.chunk("text", num_lines = 1)
ScenarioList([Scenario({'text': 'This is a test.', 'text_chunk': 0}), Scenario({'text': 'This is a test.', 'text_chunk': 1}), Scenario({'text': '', 'text_chunk': 2}), Scenario({'text': 'This is a test.', 'text_chunk': 3})])
>>> s.chunk("text", num_words = 2)
ScenarioList([Scenario({'text': 'This is', 'text_chunk': 0}), Scenario({'text': 'a test.', 'text_chunk': 1}), Scenario({'text': 'This is', 'text_chunk': 2}), Scenario({'text': 'a test.', 'text_chunk': 3}), Scenario({'text': 'This is', 'text_chunk': 4}), Scenario({'text': 'a test.', 'text_chunk': 5})])
>>> s = Scenario({"text": "Hello World"})
>>> s.chunk("text", num_words = 1, include_original = True)
ScenarioList([Scenario({'text': 'Hello', 'text_chunk': 0, 'text_original': 'Hello World'}), Scenario({'text': 'World', 'text_chunk': 1, 'text_original': 'Hello World'})])
>>> s = Scenario({"text": "Hello World"})
>>> s.chunk("text", num_words = 1, include_original = True, hash_original = True)
ScenarioList([Scenario({'text': 'Hello', 'text_chunk': 0, 'text_original': 'b10a8db164e0754105b7a99be72e3fe5'}), Scenario({'text': 'World', 'text_chunk': 1, 'text_original': 'b10a8db164e0754105b7a99be72e3fe5'})])
>>> s.chunk("text")
Traceback (most recent call last):
...
ValueError: You must specify either num_words or num_lines.
>>> s.chunk("text", num_words = 1, num_lines = 1)
Traceback (most recent call last):
...
ValueError: You must specify either num_words or num_lines, but not both.
- drop(list_of_keys: Collection[str]) Scenario [source]
Drop a subset of keys from a scenario.
- Parameters:
list_of_keys – The keys to drop.
Example:
>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.drop(["food"])
Scenario({'drink': 'water'})
- classmethod example(randomize: bool = False) Scenario [source]
Returns an example Scenario instance.
- Parameters:
randomize – If True, adds a random string to the value of the example key.
- classmethod from_dict(d: dict) Scenario [source]
Convert a dictionary to a scenario.
Example:
>>> Scenario.from_dict({"food": "wood chips"})
Scenario({'food': 'wood chips'})
- classmethod from_docx(docx_path: str) Scenario [source]
Creates a scenario from the text of a docx file.
- Parameters:
docx_path – The path to the docx file.
Example:
>>> from docx import Document
>>> doc = Document()
>>> _ = doc.add_heading("EDSL Survey")
>>> _ = doc.add_paragraph("This is a test.")
>>> doc.save("test.docx")
>>> s = Scenario.from_docx("test.docx")
>>> s
Scenario({'file_path': 'test.docx', 'text': 'EDSL Survey\nThis is a test.'})
>>> import os; os.remove("test.docx")
- classmethod from_file(file_path: str, field_name: str) Scenario [source]
Creates a scenario from a file.
>>> import tempfile
>>> with tempfile.NamedTemporaryFile(suffix=".txt", mode="w") as f:
...     _ = f.write("This is a test.")
...     _ = f.flush()
...     s = Scenario.from_file(f.name, "file")
>>> s
Scenario({'file': FileStore(path='...', ...)})
- classmethod from_image(image_path: str, image_name: str | None = None) Scenario [source]
Creates a scenario with a base64 encoding of an image.
- Args:
image_path (str): Path to the image file.
- Returns:
Scenario: A new Scenario instance with image information.
- classmethod from_url(url: str, field_name: str | None = 'text') Scenario [source]
Creates a scenario from a URL.
- Parameters:
url – The URL to create the scenario from.
field_name – The field name to use for the text.
- property has_jinja_braces: bool[source]
Return whether the scenario has jinja braces. This matters for rendering.
>>> s = Scenario({"food": "I love {{wood chips}}"})
>>> s.has_jinja_braces
True
- keep(list_of_keys: List[str]) Scenario [source]
Keep a subset of keys from a scenario.
- Parameters:
list_of_keys – The keys to keep.
Example:
>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.keep(["food"])
Scenario({'food': 'wood chips'})
- new_column_names(new_names: List[str]) Scenario [source]
Rename the keys of a scenario.
>>> s = Scenario({"food": "wood chips"})
>>> s.new_column_names(["food_preference"])
Scenario({'food_preference': 'wood chips'})
- rename(old_name_or_replacement_dict: str | dict[str, str], new_name: str | None = None) Scenario [source]
Rename the keys of a scenario.
- Parameters:
old_name_or_replacement_dict – A dictionary of old keys to new keys OR a string of the old key.
new_name – The new name of the key.
Example:
>>> s = Scenario({"food": "wood chips"})
>>> s.rename({"food": "food_preference"})
Scenario({'food_preference': 'wood chips'})
>>> s = Scenario({"food": "wood chips"})
>>> s.rename("food", "snack")
Scenario({'snack': 'wood chips'})
- replicate(n: int) ScenarioList [source]
Replicate a scenario n times to return a ScenarioList.
- Parameters:
n – The number of times to replicate the scenario.
Example:
>>> s = Scenario({"food": "wood chips"})
>>> s.replicate(2)
ScenarioList([Scenario({'food': 'wood chips'}), Scenario({'food': 'wood chips'})])
- select(list_of_keys: Collection[str]) Scenario [source]
Select a subset of keys from a scenario.
- Parameters:
list_of_keys – The keys to select.
Example:
>>> s = Scenario({"food": "wood chips", "drink": "water"})
>>> s.select(["food"])
Scenario({'food': 'wood chips'})
- to_dataset() Dataset [source]
Convert a scenario to a dataset.
>>> s = Scenario({"food": "wood chips"})
>>> s.to_dataset()
Dataset([{'key': ['food']}, {'value': ['wood chips']}])
ScenarioList class
A list of Scenarios to be used in a survey.
- class edsl.scenarios.ScenarioList.ScenarioList(data: list | None = None, codebook: dict[str, str] | None = None)[source]
Bases: Base, UserList, ScenarioListMixin
Class for creating a list of scenarios to be used in a survey.
- __init__(data: list | None = None, codebook: dict[str, str] | None = None)[source]
Initialize the ScenarioList class.
- add_list(name: str, values: List[Any]) ScenarioList [source]
Add a list of values to a ScenarioList.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.add_list('age', [30, 25])
ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
- add_value(name: str, value: Any) ScenarioList [source]
Add a value to all scenarios in a ScenarioList.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.add_value('age', 30)
ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 30})])
- chunk(field, num_words: int | None = None, num_lines: int | None = None, include_original=False, hash_original=False) ScenarioList [source]
Chunk the scenarios based on a field.
Example:
>>> s = ScenarioList([Scenario({'text': 'The quick brown fox jumps over the lazy dog.'})])
>>> s.chunk('text', num_words=3)
ScenarioList([Scenario({'text': 'The quick brown', 'text_chunk': 0}), Scenario({'text': 'fox jumps over', 'text_chunk': 1}), Scenario({'text': 'the lazy dog.', 'text_chunk': 2})])
- concatenate(fields: List[str], separator: str = ';') ScenarioList [source]
Concatenate specified fields into a single field.
- Parameters:
fields – The fields to concatenate.
separator – The separator to use.
- Returns:
ScenarioList: A new ScenarioList with concatenated fields.
- Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2, 'c': 3}), Scenario({'a': 4, 'b': 5, 'c': 6})])
>>> s.concatenate(['a', 'b', 'c'])
ScenarioList([Scenario({'concat_a_b_c': '1;2;3'}), Scenario({'concat_a_b_c': '4;5;6'})])
- classmethod create_empty_scenario_list(n: int) ScenarioList [source]
Create an empty ScenarioList with n scenarios.
Example:
>>> ScenarioList.create_empty_scenario_list(3)
ScenarioList([Scenario({}), Scenario({}), Scenario({})])
- drop(*fields: str) ScenarioList [source]
Drop fields from the scenarios.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.drop('a')
ScenarioList([Scenario({'b': 1}), Scenario({'b': 2})])
- duplicate() ScenarioList [source]
Return a copy of the ScenarioList.
>>> sl = ScenarioList.example()
>>> sl_copy = sl.duplicate()
>>> sl == sl_copy
True
>>> sl is sl_copy
False
- classmethod example(randomize: bool = False) ScenarioList [source]
Return an example ScenarioList instance.
- Parameters:
randomize – If True, use Scenario’s randomize method to randomize the values.
- expand(expand_field: str, number_field: bool = False) ScenarioList [source]
Expand the ScenarioList by a field.
- Parameters:
expand_field – The field to expand.
number_field – Whether to add a field with the index of the value
Example:
>>> s = ScenarioList( [ Scenario({'a':1, 'b':[1,2]}) ] )
>>> s.expand('b')
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.expand('b', number_field=True)
ScenarioList([Scenario({'a': 1, 'b': 1, 'b_number': 1}), Scenario({'a': 1, 'b': 2, 'b_number': 2})])
- filter(expression: str) ScenarioList [source]
Filter a list of scenarios based on an expression.
- Parameters:
expression – The expression to filter by.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.filter("b == 2")
ScenarioList([Scenario({'a': 1, 'b': 2})])
- classmethod from_csv(source: str | 'ParseResult') ScenarioList [source]
Create a ScenarioList from a CSV file or URL.
- classmethod from_delimited_file(source: str | 'ParseResult', delimiter: str = ',') ScenarioList [source]
Create a ScenarioList from a delimited file (CSV/TSV) or URL.
- classmethod from_dict(data) ScenarioList [source]
Create a ScenarioList from a dictionary.
- classmethod from_excel(filename: str, sheet_name: str | None = None) ScenarioList [source]
Create a ScenarioList from an Excel file.
If the Excel file contains multiple sheets and no sheet_name is provided, the method will print the available sheets and require the user to specify one.
Example:
>>> import tempfile
>>> import os
>>> import pandas as pd
>>> with tempfile.NamedTemporaryFile(delete=False, suffix='.xlsx') as f:
...     df1 = pd.DataFrame({
...         'name': ['Alice', 'Bob'],
...         'age': [30, 25],
...         'location': ['New York', 'Los Angeles']
...     })
...     df2 = pd.DataFrame({
...         'name': ['Charlie', 'David'],
...         'age': [35, 40],
...         'location': ['Chicago', 'Boston']
...     })
...     with pd.ExcelWriter(f.name) as writer:
...         df1.to_excel(writer, sheet_name='Sheet1', index=False)
...         df2.to_excel(writer, sheet_name='Sheet2', index=False)
...     temp_filename = f.name
>>> scenario_list = ScenarioList.from_excel(temp_filename, sheet_name='Sheet1')
>>> len(scenario_list)
2
>>> scenario_list[0]['name']
'Alice'
>>> scenario_list = ScenarioList.from_excel(temp_filename)  # Should raise an error and list sheets
Traceback (most recent call last):
...
ValueError: Please provide a sheet name to load data from.
- classmethod from_google_doc(url: str) ScenarioList [source]
Create a ScenarioList from a Google Doc.
This method downloads the Google Doc as a Word file (.docx), saves it to a temporary file, and then reads it using the from_docx class method.
- Args:
url (str): The URL to the Google Doc.
- Returns:
ScenarioList: An instance of the ScenarioList class.
- classmethod from_google_sheet(url: str, sheet_name: str = None) ScenarioList [source]
Create a ScenarioList from a Google Sheet.
This method downloads the Google Sheet as an Excel file, saves it to a temporary file, and then reads it using the from_excel class method.
- Args:
url (str): The URL to the Google Sheet.
sheet_name (str, optional): The name of the sheet to load. If None, the method will behave the same as from_excel regarding multiple sheets.
- Returns:
ScenarioList: An instance of the ScenarioList class.
- classmethod from_list(name: str, values: list, func: Callable | None = None) ScenarioList [source]
Create a ScenarioList from a list of values.
- Parameters:
name – The name of the field.
values – The list of values.
func – An optional function to apply to the values.
Example:
>>> ScenarioList.from_list('name', ['Alice', 'Bob'])
ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
- classmethod from_list_of_tuples(*names: str, values: List[Tuple]) ScenarioList [source]
- classmethod from_nested_dict(data: dict) ScenarioList [source]
Create a ScenarioList from a nested dictionary.
>>> data = {"headline": ["Armistice Signed, War Over: Celebrations Erupt Across City"], "date": ["1918-11-11"], "author": ["Jane Smith"]}
>>> ScenarioList.from_nested_dict(data)
ScenarioList([Scenario({'headline': 'Armistice Signed, War Over: Celebrations Erupt Across City', 'date': '1918-11-11', 'author': 'Jane Smith'})])
- classmethod from_pandas(df) ScenarioList [source]
Create a ScenarioList from a pandas DataFrame.
Example:
>>> import pandas as pd
>>> df = pd.DataFrame({'name': ['Alice', 'Bob'], 'age': [30, 25], 'location': ['New York', 'Los Angeles']})
>>> ScenarioList.from_pandas(df)
ScenarioList([Scenario({'name': 'Alice', 'age': 30, 'location': 'New York'}), Scenario({'name': 'Bob', 'age': 25, 'location': 'Los Angeles'})])
- classmethod from_sqlite(filepath: str, table: str)[source]
Create a ScenarioList from a SQLite database.
- classmethod from_tsv(source: str | 'ParseResult') ScenarioList [source]
Create a ScenarioList from a TSV file or URL.
- from_urls(urls: list[str], field_name: str | None = 'text') ScenarioList [source]
Create a ScenarioList from a list of URLs.
- Parameters:
urls – A list of URLs.
field_name – The name of the field to store the text from the URLs.
- classmethod from_wikipedia(url: str, table_index: int = 0)[source]
Extracts a table from a Wikipedia page.
- Parameters:
url (str): The URL of the Wikipedia page.
table_index (int): The index of the table to extract (default is 0).
- Returns:
pd.DataFrame: A DataFrame containing the extracted table.
# Example usage
# url = "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)"
# df = from_wikipedia(url, 0)
# if not df.empty:
#     print(df.head())
# else:
#     print("Failed to extract table.")
- classmethod gen(scenario_dicts_list: List[dict]) ScenarioList [source]
Create a ScenarioList from a list of dictionaries.
Example:
>>> ScenarioList.gen([{'name': 'Alice'}, {'name': 'Bob'}])
ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
- give_valid_names(existing_codebook: dict = None) ScenarioList [source]
Give valid names to the scenario keys, using an existing codebook if provided.
- Args:
- existing_codebook (dict, optional): Existing mapping of original keys to valid names.
Defaults to None.
- Returns:
ScenarioList: A new ScenarioList with valid variable names and updated codebook.
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names()
ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s = ScenarioList([Scenario({'are you there John?': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names()
ScenarioList([Scenario({'john': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.give_valid_names({'are you there John?': 'custom_name'})
ScenarioList([Scenario({'custom_name': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
- group_by(id_vars: List[str], variables: List[str], func: Callable) ScenarioList [source]
Group the ScenarioList by id_vars and apply a function to the specified variables.
- Parameters:
id_vars – Fields to use as identifier variables
variables – Fields to group and aggregate
func – Function to apply to the grouped variables
- Returns:
ScenarioList: A new ScenarioList with the grouped and aggregated results
Example:
>>> def avg_sum(a, b):
...     return {'avg_a': sum(a) / len(a), 'sum_b': sum(b)}
>>> s = ScenarioList([
...     Scenario({'group': 'A', 'year': 2020, 'a': 10, 'b': 20}),
...     Scenario({'group': 'A', 'year': 2021, 'a': 15, 'b': 25}),
...     Scenario({'group': 'B', 'year': 2020, 'a': 12, 'b': 22}),
...     Scenario({'group': 'B', 'year': 2021, 'a': 17, 'b': 27})
... ])
>>> s.group_by(id_vars=['group'], variables=['a', 'b'], func=avg_sum)
ScenarioList([Scenario({'group': 'A', 'avg_a': 12.5, 'sum_b': 45}), Scenario({'group': 'B', 'avg_a': 14.5, 'sum_b': 49})])
- keep(*fields: str) ScenarioList [source]
Keep only the specified fields in the scenarios.
- Parameters:
fields – The fields to keep.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.keep('a')
ScenarioList([Scenario({'a': 1}), Scenario({'a': 1})])
- left_join(other: ScenarioList, by: str | list[str]) ScenarioList [source]
Perform a left join with another ScenarioList, following SQL join semantics.
- Args:
other: The ScenarioList to join with.
by: String or list of strings representing the key(s) to join on. Cannot be empty.
Example:
>>> s1 = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s2 = ScenarioList([Scenario({'name': 'Alice', 'location': 'New York'}), Scenario({'name': 'Charlie', 'location': 'Los Angeles'})])
>>> s3 = s1.left_join(s2, 'name')
>>> s3 == ScenarioList([Scenario({'age': 30, 'location': 'New York', 'name': 'Alice'}), Scenario({'age': 25, 'location': None, 'name': 'Bob'})])
True
- mutate(new_var_string: str, functions_dict: dict[str, Callable] | None = None) ScenarioList [source]
Return a new ScenarioList with a new variable added.
- Parameters:
new_var_string – A string with the new variable assignment.
functions_dict – A dictionary of functions to use in the assignment.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.mutate("c = a + b")
ScenarioList([Scenario({'a': 1, 'b': 2, 'c': 3}), Scenario({'a': 1, 'b': 1, 'c': 2})])
- num_observations()[source]
Return the number of observations in the dataset.
>>> from edsl.results.Results import Results
>>> Results.example().num_observations()
4
- order_by(*fields: str, reverse: bool = False) ScenarioList [source]
Order the scenarios by one or more fields.
- Parameters:
fields – The fields to order by.
reverse – Whether to reverse the order.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.order_by('b', 'a')
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
- property parameters: set[source]
Return the set of parameters in the ScenarioList
Example:
>>> s = ScenarioList([Scenario({'a': 1}), Scenario({'b': 2})])
>>> s.parameters == {'a', 'b'}
True
- pivot(id_vars: List[str] = None, var_name='variable', value_name='value') ScenarioList [source]
Pivot the ScenarioList from long to wide format.
- Parameters:
id_vars (list): Fields to use as identifier variables
var_name (str): Name of the variable column (default: 'variable')
value_name (str): Name of the value column (default: 'value')
Example:
>>> s = ScenarioList([
...     Scenario({'id': 1, 'year': 2020, 'variable': 'a', 'value': 10}),
...     Scenario({'id': 1, 'year': 2020, 'variable': 'b', 'value': 20}),
...     Scenario({'id': 2, 'year': 2021, 'variable': 'a', 'value': 15}),
...     Scenario({'id': 2, 'year': 2021, 'variable': 'b', 'value': 25})
... ])
>>> s.pivot(id_vars=['id', 'year'])
ScenarioList([Scenario({'id': 1, 'year': 2020, 'a': 10, 'b': 20}), Scenario({'id': 2, 'year': 2021, 'a': 15, 'b': 25})])
- print_long()[source]
Print the results in a long format.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').print_long()
answer.how_feeling: OK
answer.how_feeling: Great
answer.how_feeling: Terrible
answer.how_feeling: OK
- relevant_columns(data_type: str | None = None, remove_prefix=False) list [source]
Return the set of keys that are present in the dataset.
- Parameters:
data_type – The data type to filter by.
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results.Dataset import Dataset
>>> d = Dataset([{'a.b':[1,2,3,4]}])
>>> d.relevant_columns()
['a.b']
>>> d.relevant_columns(remove_prefix=True)
['b']
>>> d = Dataset([{'a':[1,2,3,4]}, {'b':[5,6,7,8]}])
>>> d.relevant_columns()
['a', 'b']
>>> from edsl.results import Results; Results.example().select('how_feeling', 'how_feeling_yesterday').relevant_columns()
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> from edsl.results import Results
>>> sorted(Results.example().select().relevant_columns(data_type = "model"))
['model.frequency_penalty', ...]
>>> Results.example().relevant_columns(data_type = "flimflam")
Traceback (most recent call last):
...
ValueError: No columns found for data type: flimflam. Available data types are: ...
- rename(replacement_dict: dict) ScenarioList [source]
Rename the fields in the scenarios.
- Parameters:
replacement_dict – A dictionary with the old names as keys and the new names as values.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice', 'age': 30}), Scenario({'name': 'Bob', 'age': 25})])
>>> s.rename({'name': 'first_name', 'age': 'years'})
ScenarioList([Scenario({'first_name': 'Alice', 'years': 30}), Scenario({'first_name': 'Bob', 'years': 25})])
- reorder_keys(new_order: List[str]) ScenarioList [source]
Reorder the keys in the scenarios.
- Parameters:
new_order – The new order of the keys.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 3, 'b': 4})])
>>> s.reorder_keys(['b', 'a'])
ScenarioList([Scenario({'b': 2, 'a': 1}), Scenario({'b': 4, 'a': 3})])
>>> s.reorder_keys(['a', 'b', 'c'])
Traceback (most recent call last):
...
AssertionError
- sample(n: int, seed: str | None = None) ScenarioList [source]
Return a random sample from the ScenarioList
>>> s = ScenarioList.from_list("a", [1,2,3,4,5,6])
>>> s.sample(3, seed = "edsl")
ScenarioList([Scenario({'a': 2}), Scenario({'a': 1}), Scenario({'a': 3})])
- select(*fields: str) ScenarioList [source]
Select scenarios with only the referenced fields.
- Parameters:
fields – The fields to select.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2})])
>>> s.select('a')
ScenarioList([Scenario({'a': 1}), Scenario({'a': 1})])
- sem_filter(language_predicate: str) ScenarioList [source]
Filter the ScenarioList based on a language predicate.
- Parameters:
language_predicate – The language predicate to use.
Inspired by:
@misc{patel2024semanticoperators,
    title={Semantic Operators: A Declarative Model for Rich, AI-based Analytics Over Text Data},
    author={Liana Patel and Siddharth Jha and Parth Asawa and Melissa Pan and Carlos Guestrin and Matei Zaharia},
    year={2024},
    eprint={2407.11418},
    archivePrefix={arXiv},
    primaryClass={cs.DB},
    url={https://arxiv.org/abs/2407.11418},
}
- shuffle(seed: str | None = None) ScenarioList [source]
Shuffle the ScenarioList.
>>> s = ScenarioList.from_list("a", [1,2,3,4])
>>> s.shuffle(seed = "1234")
ScenarioList([Scenario({'a': 1}), Scenario({'a': 4}), Scenario({'a': 3}), Scenario({'a': 2})])
- sql(query: str, transpose: bool = None, transpose_by: str = None, remove_prefix: bool = True) pd.DataFrame | str [source]
Execute a SQL query and return the results as a DataFrame.
- Args:
query: The SQL query to execute
shape: The shape of the data in the database (wide or long)
remove_prefix: Whether to remove the prefix from the column names
transpose: Whether to transpose the DataFrame
transpose_by: The column to use as the index when transposing
csv: Whether to return the DataFrame as a CSV string
to_list: Whether to return the results as a list
to_latex: Whether to return the results as LaTeX
filename: Optional filename to save the results to
- Returns:
DataFrame, CSV string, list, or LaTeX string depending on parameters
- table(*fields: str, tablefmt: Literal['plain', 'simple', 'github', 'grid', 'fancy_grid', 'pipe', 'orgtbl', 'rst', 'mediawiki', 'html', 'latex', 'latex_raw', 'latex_booktabs', 'tsv'] | None = None, pretty_labels: dict[str, str] | None = None) str [source]
Return the ScenarioList as a table.
- tally(*fields: str | None, top_n: int | None = None, output='Dataset') dict | Dataset [source]
Tally the values of a field or perform a cross-tab of multiple fields.
- Parameters:
fields – The field(s) to tally, multiple fields for cross-tabulation.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').tally('answer.how_feeling', output = "dict")
{'OK': 2, 'Great': 1, 'Terrible': 1}
>>> from edsl.results.Dataset import Dataset
>>> expected = Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible']}, {'count': [2, 1, 1]}])
>>> r.select('how_feeling').tally('answer.how_feeling', output = "Dataset") == expected
True
- times(other: ScenarioList) ScenarioList [source]
Takes the cross product of two ScenarioLists.
Example:
>>> s1 = ScenarioList([Scenario({'a': 1}), Scenario({'a': 2})])
>>> s2 = ScenarioList([Scenario({'b': 1}), Scenario({'b': 2})])
>>> s1.times(s2)
ScenarioList([Scenario({'a': 1, 'b': 1}), Scenario({'a': 1, 'b': 2}), Scenario({'a': 2, 'b': 1}), Scenario({'a': 2, 'b': 2})])
- to(survey: 'Survey' | 'QuestionBase') Jobs [source]
Create a Jobs object from a ScenarioList and a Survey object.
- Parameters:
survey – The Survey object to use for the Jobs object.
Example:
>>> from edsl import Survey
>>> from edsl.jobs.Jobs import Jobs
>>> from edsl import ScenarioList
>>> isinstance(ScenarioList.example().to(Survey.example()), Jobs)
True
- to_agent_list(remove_prefix: bool = True)[source]
Convert the results to a list of dictionaries, one per agent.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_agent_list()
AgentList([Agent(traits = {'how_feeling': 'OK'}), Agent(traits = {'how_feeling': 'Great'}), Agent(traits = {'how_feeling': 'Terrible'}), Agent(traits = {'how_feeling': 'OK'})])
- to_csv(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None) FileStore | None [source]
Export the results to a FileStore instance containing CSV data.
- to_dataset() Dataset [source]
Convert the ScenarioList to a Dataset.
>>> s = ScenarioList.from_list("a", [1,2,3])
>>> s.to_dataset()
Dataset([{'a': [1, 2, 3]}])
>>> s = ScenarioList.from_list("a", [1,2,3]).add_list("b", [4,5,6])
>>> s.to_dataset()
Dataset([{'a': [1, 2, 3]}, {'b': [4, 5, 6]}])
- to_dict(sort: bool = False, add_edsl_version: bool = True) dict [source]
>>> s = ScenarioList([Scenario({'food': 'wood chips'}), Scenario({'food': 'wood-fired pizza'})])
>>> s.to_dict()
{'scenarios': [{'food': 'wood chips', 'edsl_version': '...', 'edsl_class_name': 'Scenario'}, {'food': 'wood-fired pizza', 'edsl_version': '...', 'edsl_class_name': 'Scenario'}], 'edsl_version': '...', 'edsl_class_name': 'ScenarioList'}
- to_dicts(remove_prefix: bool = True) list[dict] [source]
Convert the results to a list of dictionaries.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_dicts()
[{'how_feeling': 'OK'}, {'how_feeling': 'Great'}, {'how_feeling': 'Terrible'}, {'how_feeling': 'OK'}]
- to_excel(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, sheet_name: str | None = None) FileStore | None [source]
Export the results to a FileStore instance containing Excel data.
- to_jsonl(filename: str | None = None) FileStore | None [source]
Export the results to a FileStore instance containing JSONL data.
- to_key_value(field: str, value=None) dict | set [source]
Return the set of values in the field.
- Parameters:
field – The field to extract values from.
value – An optional field to use as the value in the key-value pair.
Example:
>>> s = ScenarioList([Scenario({'name': 'Alice'}), Scenario({'name': 'Bob'})])
>>> s.to_key_value('name') == {'Alice', 'Bob'}
True
- to_list(flatten=False, remove_none=False, unzipped=False) list[list] [source]
Convert the results to a list of lists.
- Parameters:
flatten – Whether to flatten the list of lists.
remove_none – Whether to remove None values from the list.
>>> from edsl.results import Results
>>> Results.example().select('how_feeling', 'how_feeling_yesterday')
Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK']}, {'answer.how_feeling_yesterday': ['Great', 'Good', 'OK', 'Terrible']}])
>>> Results.example().select('how_feeling', 'how_feeling_yesterday').to_list()
[('OK', 'Great'), ('Great', 'Good'), ('Terrible', 'OK'), ('OK', 'Terrible')]
>>> r = Results.example()
>>> r.select('how_feeling').to_list()
['OK', 'Great', 'Terrible', 'OK']
>>> from edsl.results.Dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}]).select('a.b').to_list(flatten = True)
[1, 9, 2, 3, 4]
>>> from edsl.results.Dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}, {'c': [6, 2, 3, 4]}]).select('a.b', 'c').to_list(flatten = True)
Traceback (most recent call last):
...
ValueError: Cannot flatten a list of lists when there are multiple columns selected.
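The examples above show three cases: one column yields a flat list, several columns yield a list of tuples, and flatten only applies to a single column. A rough plain-Python sketch of that logic follows. The function is a hypothetical stand-in that takes the selected columns as a list of lists; it is not the library's code, and it ignores the unzipped parameter, which is not described above.

```python
def to_list(columns, flatten=False, remove_none=False):
    """columns: one list of values per selected column (hypothetical stand-in)."""
    if flatten and len(columns) > 1:
        raise ValueError(
            "Cannot flatten a list of lists when there are multiple columns selected."
        )
    if len(columns) == 1:
        values = list(columns[0])
    else:
        # Several columns: pair the values up row by row.
        values = list(zip(*columns))
    if flatten:
        flat = []
        for v in values:
            # Splice nested lists into the result, keep scalars as they are.
            if isinstance(v, list):
                flat.extend(v)
            else:
                flat.append(v)
        values = flat
    if remove_none:
        values = [v for v in values if v is not None]
    return values


pairs = to_list([['OK', 'Great'], ['Great', 'Good']])
flat = to_list([[[1, 9], 2, 3, 4]], flatten=True)
cleaned = to_list([[1, None, 3]], remove_none=True)
```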
- to_pandas(remove_prefix: bool = False, lists_as_strings=False) DataFrame [source]
Convert the results to a pandas DataFrame, ensuring that lists remain as lists.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
- to_polars(remove_prefix: bool = False, lists_as_strings=False) pl.DataFrame [source]
Convert the results to a Polars DataFrame.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
- to_scenario_list(remove_prefix: bool = True) list[dict] [source]
Convert the results to a ScenarioList with one Scenario per row, as the example shows.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_scenario_list()
ScenarioList([Scenario({'how_feeling': 'OK'}), Scenario({'how_feeling': 'Great'}), Scenario({'how_feeling': 'Terrible'}), Scenario({'how_feeling': 'OK'})])
- to_sqlite(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, table_name: str = 'results', if_exists: str = 'replace') FileStore | None [source]
Export the results to a SQLite database file.
- transform(field: str, func: Callable, new_name: str | None = None) ScenarioList [source]
Transform a field using a function.
- Parameters:
field – The field to transform.
func – The function to apply to the field.
new_name – An optional new name for the transformed field.
>>> s = ScenarioList([Scenario({'a': 1, 'b': 2}), Scenario({'a': 1, 'b': 1})])
>>> s.transform('b', lambda x: x + 1)
ScenarioList([Scenario({'a': 1, 'b': 3}), Scenario({'a': 1, 'b': 2})])
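To make the role of new_name concrete, here is a minimal sketch on plain dictionaries. The helper is hypothetical and not the library's implementation; in particular, it assumes that the original field is kept unchanged when new_name is supplied.

```python
def transform(rows, field, func, new_name=None):
    """Apply func to one field of every row (hypothetical stand-in)."""
    target = new_name if new_name is not None else field
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        new_row[target] = func(row[field])
        out.append(new_row)
    return out


rows = [{'a': 1, 'b': 2}, {'a': 1, 'b': 1}]
in_place = transform(rows, 'b', lambda x: x + 1)
renamed = transform(rows, 'b', lambda x: x + 1, new_name='b_plus_one')
```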
- tree(node_list: List[str] | None = None) str [source]
Return the ScenarioList as a tree.
- Parameters:
node_list – The list of nodes to include in the tree.
- unique() ScenarioList [source]
Return a list of unique scenarios.
>>> s = ScenarioList([Scenario({'a': 1}), Scenario({'a': 1}), Scenario({'a': 2})])
>>> s.unique()
ScenarioList([Scenario({'a': 1}), Scenario({'a': 2})])
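Deduplicating dictionaries is slightly awkward because they are not hashable. The sketch below shows one simple way to do it on plain dicts. It is a hypothetical stand-in, and keeping the first occurrence in its original position is an assumption that happens to match the example above.

```python
def unique(rows):
    """Order-preserving de-duplication of dict rows (hypothetical stand-in)."""
    out = []
    for row in rows:
        # Equality comparison avoids hashing, so unhashable values are fine.
        if row not in out:
            out.append(row)
    return out


deduped = unique([{'a': 1}, {'a': 1}, {'a': 2}])
```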
- unpack(field: str, new_names: List[str] | None = None, keep_original=True) ScenarioList [source]
Unpack a field into multiple fields.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': [2, True]}), Scenario({'a': 3, 'b': [3, False]})])
>>> s.unpack('b')
ScenarioList([Scenario({'a': 1, 'b': [2, True], 'b_0': 2, 'b_1': True}), Scenario({'a': 3, 'b': [3, False], 'b_0': 3, 'b_1': False})])
>>> s.unpack('b', new_names=['c', 'd'], keep_original=False)
ScenarioList([Scenario({'a': 1, 'c': 2, 'd': True}), Scenario({'a': 3, 'c': 3, 'd': False})])
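The naming rule visible in the example (field_0, field_1, ... unless new_names is given) can be sketched as follows. This is a plain-dict illustration with a hypothetical helper, not the library's code.

```python
def unpack(rows, field, new_names=None, keep_original=True):
    """Spread a list-valued field over several fields (hypothetical stand-in)."""
    out = []
    for row in rows:
        values = row[field]
        if new_names is not None:
            names = new_names
        else:
            # Default names follow the pattern seen in the example: b_0, b_1, ...
            names = [f"{field}_{i}" for i in range(len(values))]
        new_row = dict(row)
        if not keep_original:
            del new_row[field]
        new_row.update(zip(names, values))
        out.append(new_row)
    return out


rows = [{'a': 1, 'b': [2, True]}, {'a': 3, 'b': [3, False]}]
default_names = unpack(rows, 'b')
custom_names = unpack(rows, 'b', new_names=['c', 'd'], keep_original=False)
```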
- unpack_dict(field: str, prefix: str | None = None, drop_field: bool = False) ScenarioList [source]
Unpack a dictionary field into separate fields.
- Parameters:
field – The field to unpack.
prefix – An optional prefix to add to the new fields.
drop_field – Whether to drop the original field.
Example:
>>> s = ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}})])
>>> s.unpack_dict('b')
ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}, 'c': 2, 'd': 3})])
>>> s.unpack_dict('b', prefix='new_')
ScenarioList([Scenario({'a': 1, 'b': {'c': 2, 'd': 3}, 'new_c': 2, 'new_d': 3})])
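The example above does not exercise drop_field, so the following plain-dict sketch includes it. The helper is hypothetical; the drop_field behavior shown (removing the original dictionary field after its keys have been copied out) is an assumption based on the parameter description.

```python
def unpack_dict(rows, field, prefix=None, drop_field=False):
    """Promote the keys of a dict-valued field to top-level fields (stand-in)."""
    out = []
    for row in rows:
        new_row = dict(row)
        for key, val in row[field].items():
            new_row[(prefix or "") + key] = val
        if drop_field:
            # Assumed behavior: the nested dict itself is removed.
            new_row.pop(field)
        out.append(new_row)
    return out


rows = [{'a': 1, 'b': {'c': 2, 'd': 3}}]
kept = unpack_dict(rows, 'b', prefix='new_')
dropped = unpack_dict(rows, 'b', drop_field=True)
```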
- unpivot(id_vars: List[str] | None = None, value_vars: List[str] | None = None) ScenarioList [source]
Unpivot the ScenarioList, allowing for id variables to be specified.
- Parameters:
id_vars (list) – Fields to use as identifier variables (kept in each entry).
value_vars (list) – Fields to unpivot. If None, all fields not in id_vars will be used.
Example:
>>> s = ScenarioList([
...     Scenario({'id': 1, 'year': 2020, 'a': 10, 'b': 20}),
...     Scenario({'id': 2, 'year': 2021, 'a': 15, 'b': 25})
... ])
>>> s.unpivot(id_vars=['id', 'year'], value_vars=['a', 'b'])
ScenarioList([Scenario({'id': 1, 'year': 2020, 'variable': 'a', 'value': 10}), Scenario({'id': 1, 'year': 2020, 'variable': 'b', 'value': 20}), Scenario({'id': 2, 'year': 2021, 'variable': 'a', 'value': 15}), Scenario({'id': 2, 'year': 2021, 'variable': 'b', 'value': 25})])
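Unpivoting turns each wide row into one long row per value field, similar in spirit to a "melt" in dataframe libraries. The sketch below reproduces the documented output on plain dictionaries. It is a hypothetical stand-in, not the EDSL implementation; the 'variable' and 'value' key names are taken from the example.

```python
def unpivot(rows, id_vars=None, value_vars=None):
    """Wide-to-long reshaping of dict rows (hypothetical stand-in)."""
    id_vars = id_vars or []
    out = []
    for row in rows:
        if value_vars is not None:
            fields = value_vars
        else:
            # Default: every field that is not an identifier is unpivoted.
            fields = [k for k in row if k not in id_vars]
        for name in fields:
            new_row = {k: row[k] for k in id_vars}
            new_row['variable'] = name
            new_row['value'] = row[name]
            out.append(new_row)
    return out


rows = [
    {'id': 1, 'year': 2020, 'a': 10, 'b': 20},
    {'id': 2, 'year': 2021, 'a': 15, 'b': 25},
]
long_rows = unpivot(rows, id_vars=['id', 'year'], value_vars=['a', 'b'])
long_rows_default = unpivot(rows, id_vars=['id', 'year'])
```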
- class edsl.scenarios.ScenarioList.ScenarioListMixin[source]
Bases: ScenarioListPdfMixin, ScenarioListExportMixin
- num_observations()[source]
Return the number of observations in the dataset.
>>> from edsl.results.Results import Results
>>> Results.example().num_observations()
4
- print_long()[source]
Print the results in a long format.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').print_long()
answer.how_feeling: OK
answer.how_feeling: Great
answer.how_feeling: Terrible
answer.how_feeling: OK
- relevant_columns(data_type: str | None = None, remove_prefix=False) list [source]
Return the list of keys (column names) that are present in the dataset.
- Parameters:
data_type – The data type to filter by.
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results.Dataset import Dataset
>>> d = Dataset([{'a.b':[1,2,3,4]}])
>>> d.relevant_columns()
['a.b']
>>> d.relevant_columns(remove_prefix=True)
['b']
>>> d = Dataset([{'a':[1,2,3,4]}, {'b':[5,6,7,8]}])
>>> d.relevant_columns()
['a', 'b']
>>> from edsl.results import Results; Results.example().select('how_feeling', 'how_feeling_yesterday').relevant_columns()
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> from edsl.results import Results
>>> sorted(Results.example().select().relevant_columns(data_type = "model"))
['model.frequency_penalty', ...]
>>> Results.example().relevant_columns(data_type = "flimflam")
Traceback (most recent call last):
...
ValueError: No columns found for data type: flimflam. Available data types are: ...
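The examples suggest that column names have the form prefix.name, that data_type filters on the prefix, and that remove_prefix strips it. A rough sketch under those assumptions is shown below; the helper is hypothetical, works on a plain list of column names, and only approximates the library's error message.

```python
def relevant_columns(column_names, data_type=None, remove_prefix=False):
    """Filter and optionally shorten 'prefix.name' column names (stand-in)."""
    columns = list(column_names)
    if data_type is not None:
        # Assumed rule: the text before the first dot is the data type.
        columns = [c for c in columns if c.split('.', 1)[0] == data_type]
        if not columns:
            available = sorted({c.split('.', 1)[0] for c in column_names if '.' in c})
            raise ValueError(
                f"No columns found for data type: {data_type}. "
                f"Available data types are: {available}."
            )
    if remove_prefix:
        columns = [c.split('.', 1)[-1] for c in columns]
    return columns


names = ['answer.how_feeling', 'answer.how_feeling_yesterday', 'model.temperature']
answers = relevant_columns(names, data_type='answer', remove_prefix=True)
```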
- sql(query: str, transpose: bool = None, transpose_by: str = None, remove_prefix: bool = True) pd.DataFrame | str [source]
Execute a SQL query and return the results as a DataFrame.
- Args:
query: The SQL query to execute
shape: The shape of the data in the database (wide or long)
remove_prefix: Whether to remove the prefix from the column names
transpose: Whether to transpose the DataFrame
transpose_by: The column to use as the index when transposing
csv: Whether to return the DataFrame as a CSV string
to_list: Whether to return the results as a list
to_latex: Whether to return the results as LaTeX
filename: Optional filename to save the results to
- Returns:
DataFrame, CSV string, list, or LaTeX string depending on parameters
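As a rough mental model of querying tabular results with SQL, the sketch below loads columns into an in-memory SQLite table using only the standard library and runs a query against it. Everything here is an assumption for illustration: the run_sql helper, the table name data, and the tuple-based return value are hypothetical, whereas the documented method returns a DataFrame and may name its table differently.

```python
import sqlite3


def run_sql(columns, query, table_name="data"):
    """columns: dict of column name -> list of values (hypothetical stand-in)."""
    names = list(columns)
    conn = sqlite3.connect(":memory:")
    try:
        col_defs = ", ".join(f'"{n}"' for n in names)
        conn.execute(f'CREATE TABLE "{table_name}" ({col_defs})')
        placeholders = ", ".join("?" for _ in names)
        # zip(*...) turns the column lists into one tuple per row.
        conn.executemany(
            f'INSERT INTO "{table_name}" VALUES ({placeholders})',
            zip(*(columns[n] for n in names)),
        )
        cur = conn.execute(query)
        header = [d[0] for d in cur.description]
        return header, cur.fetchall()
    finally:
        conn.close()


columns = {"how_feeling": ["OK", "Great", "Terrible", "OK"]}
header, rows = run_sql(
    columns,
    "SELECT how_feeling, COUNT(*) AS n FROM data "
    "GROUP BY how_feeling ORDER BY n DESC, how_feeling",
)
```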
- tally(*fields: str | None, top_n: int | None = None, output='Dataset') dict | Dataset [source]
Tally the values of a field or perform a cross-tab of multiple fields.
- Parameters:
fields – The field(s) to tally, multiple fields for cross-tabulation.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').tally('answer.how_feeling', output = "dict")
{'OK': 2, 'Great': 1, 'Terrible': 1}
>>> from edsl.results.Dataset import Dataset
>>> expected = Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible']}, {'count': [2, 1, 1]}])
>>> r.select('how_feeling').tally('answer.how_feeling', output = "Dataset") == expected
True
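A tally of one field is a frequency count, and a cross-tab of several fields is a frequency count over tuples of values. The sketch below illustrates this with collections.Counter on plain column lists. It is a hypothetical stand-in rather than the library's code, and it assumes that top_n keeps the n most frequent entries.

```python
from collections import Counter


def tally(columns, *fields, top_n=None):
    """columns: dict of column name -> list of values (hypothetical stand-in)."""
    if len(fields) == 1:
        keys = columns[fields[0]]
    else:
        # Cross-tabulation: count each combination of values across the fields.
        keys = list(zip(*(columns[f] for f in fields)))
    # most_common sorts by count, keeping first-seen order among ties.
    return dict(Counter(keys).most_common(top_n))


columns = {
    'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK'],
    'answer.how_feeling_yesterday': ['Great', 'Good', 'OK', 'Terrible'],
}
single = tally(columns, 'answer.how_feeling')
top_one = tally(columns, 'answer.how_feeling', top_n=1)
cross = tally(columns, 'answer.how_feeling', 'answer.how_feeling_yesterday')
```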
- to_agent_list(remove_prefix: bool = True)[source]
Convert the results to an AgentList with one Agent per row, as the example shows.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_agent_list()
AgentList([Agent(traits = {'how_feeling': 'OK'}), Agent(traits = {'how_feeling': 'Great'}), Agent(traits = {'how_feeling': 'Terrible'}), Agent(traits = {'how_feeling': 'OK'})])
- to_csv(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None) FileStore | None [source]
Export the results to a FileStore instance containing CSV data.
- to_dicts(remove_prefix: bool = True) list[dict] [source]
Convert the results to a list of dictionaries.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_dicts()
[{'how_feeling': 'OK'}, {'how_feeling': 'Great'}, {'how_feeling': 'Terrible'}, {'how_feeling': 'OK'}]
- to_excel(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, sheet_name: str | None = None) FileStore | None [source]
Export the results to a FileStore instance containing Excel data.
- to_jsonl(filename: str | None = None) FileStore | None [source]
Export the results to a FileStore instance containing JSONL data.
- to_list(flatten=False, remove_none=False, unzipped=False) list[list] [source]
Convert the results to a list of lists.
- Parameters:
flatten – Whether to flatten the list of lists.
remove_none – Whether to remove None values from the list.
>>> from edsl.results import Results
>>> Results.example().select('how_feeling', 'how_feeling_yesterday')
Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK']}, {'answer.how_feeling_yesterday': ['Great', 'Good', 'OK', 'Terrible']}])
>>> Results.example().select('how_feeling', 'how_feeling_yesterday').to_list()
[('OK', 'Great'), ('Great', 'Good'), ('Terrible', 'OK'), ('OK', 'Terrible')]
>>> r = Results.example()
>>> r.select('how_feeling').to_list()
['OK', 'Great', 'Terrible', 'OK']
>>> from edsl.results.Dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}]).select('a.b').to_list(flatten = True)
[1, 9, 2, 3, 4]
>>> from edsl.results.Dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}, {'c': [6, 2, 3, 4]}]).select('a.b', 'c').to_list(flatten = True)
Traceback (most recent call last):
...
ValueError: Cannot flatten a list of lists when there are multiple columns selected.
- to_pandas(remove_prefix: bool = False, lists_as_strings=False) DataFrame [source]
Convert the results to a pandas DataFrame, ensuring that lists remain as lists.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
- to_polars(remove_prefix: bool = False, lists_as_strings=False) pl.DataFrame [source]
Convert the results to a Polars DataFrame.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
- to_scenario_list(remove_prefix: bool = True) list[dict] [source]
Convert the results to a ScenarioList with one Scenario per row, as the example shows.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_scenario_list()
ScenarioList([Scenario({'how_feeling': 'OK'}), Scenario({'how_feeling': 'Great'}), Scenario({'how_feeling': 'Terrible'}), Scenario({'how_feeling': 'OK'})])