Results
A Results object represents the outcome of running a Survey. It contains a list of individual Result objects, where each Result corresponds to a response to the survey for a unique combination of Agent, Model, and Scenario objects used with the survey.
For example, if a survey (of one or more questions) is administered to 2 agents and 2 language models (without any scenarios for the questions), the Results will contain 4 Result objects: one for each combination of agent and model used with the survey. If the survey questions are parameterized with 2 scenarios, the Results will expand to include 8 Result objects, accounting for all combinations of agents, models, and scenarios.
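The arithmetic behind the Result count can be sketched with the standard library (illustrative labels only, not EDSL objects):

```python
from itertools import product

# Hypothetical labels standing in for 2 agents, 2 models and 2 scenarios.
agents = ["student", "celebrity"]
models = ["gemini-1.5-flash", "gpt-4o"]
scenarios = ["climate change", "house prices"]

# One Result per (agent, model, scenario) combination:
combos = list(product(agents, models, scenarios))
print(len(combos))  # 8
```

Running the survey for n iterations multiplies this count by n (e.g., n=3 yields 24 Result objects).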
Generating results
A Results object is not typically instantiated directly, but is returned by calling the run() method of a Survey after any agents, language models and scenarios are added to it.
In order to demonstrate how to access and interact with results, we use the following code to generate results for a simple survey. Note that specifying agent traits, scenarios (question parameter values) and language models is optional, and we include those steps here for illustration purposes. See the Agents, scenarios and models sections for more details on these components.
Note: You must store API keys for language models in order to generate results. Please see the Managing Keys section for instructions on activating Remote Inference or storing your own API keys for inference service providers.
To construct a survey we start by creating questions:
from edsl import QuestionLinearScale, QuestionMultipleChoice
q1 = QuestionLinearScale(
    question_name = "important",
    question_text = "On a scale from 1 to 5, how important to you is {{ scenario.topic }}?",
    question_options = [0, 1, 2, 3, 4, 5],
    option_labels = {0:"Not at all important", 5:"Very important"}
)
q2 = QuestionMultipleChoice(
    question_name = "read",
    question_text = "Have you read any books about {{ scenario.topic }}?",
    question_options = ["Yes", "No", "I do not know"]
)
We combine them in a survey to administer them together:
from edsl import Survey
survey = Survey([q1, q2])
We have parameterized our questions, so we can use them with different scenarios:
from edsl import ScenarioList
scenarios = ScenarioList.from_list("topic", ["climate change", "house prices"])
We can optionally create agents with personas or other relevant traits to answer the survey:
from edsl import AgentList, Agent
agents = AgentList(
    Agent(traits = {"persona": p}) for p in ["student", "celebrity"]
)
We can specify the language models that we want to use to generate responses:
from edsl import ModelList, Model
models = ModelList(
    Model(m) for m in ["gemini-1.5-flash", "gpt-4o"]
)
Finally, we generate results by adding the scenarios, agents and models to the survey and calling the run() method:
results = survey.by(scenarios).by(agents).by(models).run()
For more details on each of the above steps, please see the Agents, scenarios and models sections of the docs.
Result objects
We can check the number of Result objects created by inspecting the length of the Results:
len(results)
This will count 2 (scenarios) x 2 (agents) x 2 (models) = 8 Result objects:
8
Generating multiple results
If we want to generate multiple results for a survey (i.e., more than one result for each combination of Agent, Model and Scenario objects used), we can pass the desired number of iterations to the run() method. For example, the following code will generate 3 results for our survey (n=3):
results = survey.by(scenarios).by(agents).by(models).run(n=3)
We can verify that the number of Result objects created is now 24 = 3 iterations x 2 scenarios x 2 agents x 2 models:
len(results)
24
We can readily inspect a result:
results[0]
Results components
Results contain components that can be accessed and analyzed individually or collectively. We can see a list of these components by calling the columns method:
results.columns
The following list will be returned for the results generated by the above code:
agent.agent_index
agent.agent_instruction
agent.agent_name
agent.persona
answer.important
answer.read
cache_keys.important_cache_key
cache_keys.read_cache_key
cache_keys.important_cache_used
cache_keys.read_cache_used
comment.important_comment
comment.read_comment
generated_tokens.important_generated_tokens
generated_tokens.read_generated_tokens
iteration.iteration
model.frequency_penalty
model.logprobs
model.maxOutputTokens
model.max_tokens
model.model
model.presence_penalty
model.stopSequences
model.temperature
model.topK
model.topP
model.top_logprobs
model.top_p
prompt.important_system_prompt
prompt.important_user_prompt
prompt.read_system_prompt
prompt.read_user_prompt
question_options.important_question_options
question_options.read_question_options
question_text.important_question_text
question_text.read_question_text
question_type.important_question_type
question_type.read_question_type
raw_model_response.important_cost
raw_model_response.important_input_price_per_million_tokens
raw_model_response.important_input_tokens
raw_model_response.important_one_usd_buys
raw_model_response.important_output_price_per_million_tokens
raw_model_response.important_output_tokens
raw_model_response.important_raw_model_response
raw_model_response.read_cost
raw_model_response.read_input_price_per_million_tokens
raw_model_response.read_input_tokens
raw_model_response.read_one_usd_buys
raw_model_response.read_output_price_per_million_tokens
raw_model_response.read_output_tokens
raw_model_response.read_raw_model_response
scenario.scenario_index
scenario.topic
The columns include information about each agent, model and corresponding prompts used to simulate the answer to each question and scenario in the survey, together with each raw model response. If the survey was run multiple times (run(n=<integer>)) then the iteration.iteration column will show the iteration number for each result.
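The fully-qualified names follow a "prefix.name" convention, which makes it easy to group columns programmatically; a small illustrative sketch in plain Python (not an EDSL API):

```python
# A few of the column names from the results above.
columns = ["agent.persona", "answer.important", "answer.read",
           "model.model", "scenario.topic"]

# Group the short names under each prefix.
by_prefix: dict[str, list[str]] = {}
for col in columns:
    prefix, name = col.split(".", 1)
    by_prefix.setdefault(prefix, []).append(name)

print(by_prefix["answer"])  # ['important', 'read']
```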
Agent information:
agent.agent_index: The index of the agent in the AgentList used to create the survey.
agent.agent_instruction: The instruction for the agent. This field is the optional instruction that was passed to the agent when it was created.
agent.agent_name: This field is always included in any Results object. It contains a unique identifier for each Agent that can be specified when an agent is created (Agent(name=<name>, traits={<traits_dict>})). If not specified, it is added automatically when results are generated (in the form Agent_0, etc.).
agent.persona: Each of the traits that we pass to an agent is represented in a column of the results. Our example code created a “persona” trait for each agent, so our results include a “persona” column for this information. Note that the keys of the traits dictionary must be valid Python identifiers.
Answer information:
answer.important: Agent responses to the linear scale important question.
answer.read: Agent responses to the multiple choice read question.
Cache information:
cache_keys.important_cache_key: The cache key for the important question.
cache_keys.important_cache_used: Whether the existing cache was used for the important question.
cache_keys.read_cache_key: The cache key for the read question.
cache_keys.read_cache_used: Whether the existing cache was used for the read question.
Comment information:
A “comment” field is automatically included for every question in a survey other than free text questions, to allow the model to provide additional information about its response. The default instruction for the agent to provide a comment is included in the user_prompt for a question, and can be modified or omitted when creating the question. (See the Prompts section for details on modifying user and system prompts, and information about prompts in results below. Comments can also be excluded by passing include_comment=False to a question when creating it.)
comment.important_comment: Agent commentary on responses to the important question.
comment.read_comment: Agent commentary on responses to the read question.
Generated tokens information:
generated_tokens.important_generated_tokens: The generated tokens for the important question.
generated_tokens.read_generated_tokens: The generated tokens for the read question.
Iteration information:
The iteration column shows the number of the run (run(n=<integer>)) for the combination of components used (scenarios, agents and models).
Model information:
Each of the model columns is a modifiable parameter of the models used to generate the responses.
model.frequency_penalty: The frequency penalty for the model.
model.logprobs: The logprobs for the model.
model.maxOutputTokens: The maximum number of output tokens for the model.
model.max_tokens: The maximum number of tokens for the model.
model.model: The name of the model used.
model.presence_penalty: The presence penalty for the model.
model.stopSequences: The stop sequences for the model.
model.temperature: The temperature for the model.
model.topK: The top k for the model.
model.topP: The top p for the model.
model.top_logprobs: The top logprobs for the model.
model.top_p: The top p for the model.
model.use_cache: Whether the model uses cache.
Note: Some of the above fields are particular to specific models, and may have different names (e.g., top_p vs. topP).
Prompt information:
prompt.important_system_prompt: The system prompt for the important question.
prompt.important_user_prompt: The user prompt for the important question.
prompt.read_system_prompt: The system prompt for the read question.
prompt.read_user_prompt: The user prompt for the read question.
For more details about prompts, please see the Prompts section.
Question information:
question_options.important_question_options: The options for the important question, if any.
question_options.read_question_options: The options for the read question, if any.
question_text.important_question_text: The text of the important question.
question_text.read_question_text: The text of the read question.
question_type.important_question_type: The type of the important question.
question_type.read_question_type: The type of the read question.
Raw model response information:
raw_model_response.important_cost: The cost of the result for the important question, applying the token quantities and prices.
raw_model_response.important_input_price_per_million_tokens: The price per million input tokens for the important question for the relevant model.
raw_model_response.important_input_tokens: The number of input tokens for the important question for the relevant model.
raw_model_response.important_one_usd_buys: The number of identical results for the important question that 1 USD would cover.
raw_model_response.important_output_price_per_million_tokens: The price per million output tokens for the important question for the relevant model.
raw_model_response.important_output_tokens: The number of output tokens for the important question for the relevant model.
raw_model_response.important_raw_model_response: The raw model response for the important question.
raw_model_response.read_cost: The cost of the result for the read question, applying the token quantities and prices.
raw_model_response.read_input_price_per_million_tokens: The price per million input tokens for the read question for the relevant model.
raw_model_response.read_input_tokens: The number of input tokens for the read question for the relevant model.
raw_model_response.read_one_usd_buys: The number of identical results for the read question that 1 USD would cover.
raw_model_response.read_output_price_per_million_tokens: The price per million output tokens for the read question for the relevant model.
raw_model_response.read_output_tokens: The number of output tokens for the read question for the relevant model.
raw_model_response.read_raw_model_response: The raw model response for the read question.
Note that the cost of a result for a question is specific to the components (scenario, agent and model) used with it.
Scenario information:
scenario.scenario_index: The index of the scenario.
scenario.topic: The values provided for the “topic” scenario for the questions.
Note: We recently added support for OpenAI reasoning models. See an example notebook for usage here. The Results that are generated with reasoning models include additional fields for reasoning summaries.
Creating tables by selecting columns
Each of these columns can be accessed directly by calling the select() method and passing the column names. Alternatively, we can specify the columns to exclude by calling the drop() method. These methods can be chained together to display the specified columns in a table format.
For example, the following code will print a table showing the answers for read and important together with model, persona and topic columns (because the column names are unique we can drop the model, agent, scenario and answer prefixes when selecting them):
results = survey.by(scenarios).by(agents).by(models).run() # Running the survey once
results.select("model", "persona", "topic", "read", "important")
A table with the selected columns will be printed:
model.model | agent.persona | scenario.topic | answer.read | answer.important
---|---|---|---|---
gemini-1.5-flash | student | climate change | Yes | 5
gpt-4o | student | climate change | Yes | 5
gemini-1.5-flash | student | house prices | No | 1
gpt-4o | student | house prices | No | 3
gemini-1.5-flash | celebrity | climate change | Yes | 5
gpt-4o | celebrity | climate change | Yes | 5
gemini-1.5-flash | celebrity | house prices | Yes | 3
gpt-4o | celebrity | house prices | No | 3
Sorting results
We can sort the results by calling the sort_by method and passing it the column names to sort by:
(
    results
    .sort_by("model", "persona", reverse=False)
    .select("model", "persona", "topic", "read", "important")
)
The following table will be printed:
model.model | agent.persona | scenario.topic | answer.read | answer.important
---|---|---|---|---
gemini-1.5-flash | celebrity | climate change | Yes | 5
gemini-1.5-flash | celebrity | house prices | Yes | 3
gemini-1.5-flash | student | climate change | Yes | 5
gemini-1.5-flash | student | house prices | No | 1
gpt-4o | celebrity | climate change | Yes | 5
gpt-4o | celebrity | house prices | No | 3
gpt-4o | student | climate change | Yes | 5
gpt-4o | student | house prices | No | 3
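Conceptually, sorting by multiple columns behaves like Python's stable sort over the selected keys; a toy analogy with plain dictionaries (not EDSL objects):

```python
from operator import itemgetter

# Toy rows mirroring a few results (illustration only).
rows = [
    {"model": "gpt-4o", "persona": "student"},
    {"model": "gemini-1.5-flash", "persona": "student"},
    {"model": "gemini-1.5-flash", "persona": "celebrity"},
]

# Sort by model first, then persona within each model.
ordered = sorted(rows, key=itemgetter("model", "persona"))
print([(r["model"], r["persona"]) for r in ordered])
```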
Labeling results
We can also add table labels by passing a dictionary to the pretty_labels argument of the print method (note that we need to include the column prefixes when specifying the table labels, as shown below):
(
    results
    .sort_by("model", "persona", reverse=True)
    .select("model", "persona", "topic", "read", "important")
    .print(pretty_labels={
        "model.model": "LLM",
        "agent.persona": "Agent",
        "scenario.topic": "Topic",
        "answer.read": q2.question_text,
        "answer.important": q1.question_text
    }, format="rich")
)
The following table will be printed:
LLM | Agent | Topic | Have you read any books about {{ scenario.topic }}? | On a scale from 1 to 5, how important to you is {{ scenario.topic }}?
---|---|---|---|---
gpt-4o | student | climate change | Yes | 5
gpt-4o | student | house prices | No | 3
gpt-4o | celebrity | climate change | Yes | 5
gpt-4o | celebrity | house prices | No | 3
gemini-1.5-flash | student | climate change | Yes | 5
gemini-1.5-flash | student | house prices | No | 1
gemini-1.5-flash | celebrity | climate change | Yes | 5
gemini-1.5-flash | celebrity | house prices | Yes | 3
Filtering results
Results can be filtered by using the filter method and passing it a logical expression identifying the results that should be selected. For example, the following code will filter results where the answer to important is “5” and then just print the topic and important_comment columns:
(
    results
    .filter("important == 5")
    .select("topic", "important", "important_comment")
)
This will return an abbreviated table:
scenario.topic | answer.important | comment.important_comment
---|---|---
climate change | 5 | It’s, like, a huge deal. The future of the planet is at stake, and that affects everything - from the environment to the economy to social justice. It’s something I worry about a lot.
climate change | 5 | As a student, I’m really concerned about climate change because it affects our future and the planet we’ll inherit. It’s crucial to understand and address it to ensure a sustainable world for generations to come.
climate change | 5 | It’s a huge issue, you know? We only have one planet, and if we don’t take care of it, what kind of world are we leaving for future generations? It’s not just about polar bears; it’s about everything. It’s my responsibility, as someone with a platform, to speak out about it.
climate change | 5 | Climate change is a critical issue that affects everyone globally, and as a public figure, I believe it’s important to use my platform to raise awareness and advocate for sustainable practices.
Note: The filter method allows us to pass the unique short names of the columns (without the prefixes) when specifying the logical expression. However, because the model.model column name is also a prefix, we need to include the prefix when filtering by this column, as shown in the example below:
(
    results
    .filter("model.model == 'gpt-4o'")
    .select("model", "persona", "topic", "read", "important")
)
This will return a table of results where the model is “gpt-4o”:
model.model | agent.persona | scenario.topic | answer.read | answer.important
---|---|---|---|---
gpt-4o | student | climate change | Yes | 5
gpt-4o | student | house prices | No | 3
gpt-4o | celebrity | climate change | Yes | 5
gpt-4o | celebrity | house prices | No | 3
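Conceptually, the filter expression is evaluated against each row's fields; a toy sketch with plain dictionaries and eval() (illustration only, not EDSL's implementation):

```python
# Toy rows standing in for results (illustration only).
rows = [
    {"topic": "climate change", "important": 5},
    {"topic": "house prices", "important": 3},
]

# Evaluate the expression with each row's fields as local variables.
expression = "important == 5"
matching = [row for row in rows if eval(expression, {}, dict(row))]
print(len(matching))  # 1
```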
Limiting results
We can select and print a limited number of results by passing the desired number of rows as max_rows to the print() method. This can be useful for quickly checking the first few results:
(
    results
    .select("model", "persona", "topic", "read", "important")
    .print(max_rows=4, format="rich")
)
This will return a table of the selected components of the first 4 results:
model.model | agent.persona | scenario.topic | answer.read | answer.important
---|---|---|---|---
gemini-1.5-flash | student | climate change | Yes | 5
gpt-4o | student | climate change | Yes | 5
gemini-1.5-flash | student | house prices | No | 1
gpt-4o | student | house prices | No | 3
Sampling results
We can select a random sample of n results by passing the desired number to the sample() method. This can be useful for checking a random subset of the results:
sample_results = results.sample(2)
(
    sample_results
    .sort_by("model")
    .select("model", "persona", "topic", "read", "important")
)
This will return a table of the specified number of randomly selected results:
model.model | agent.persona | scenario.topic | answer.read | answer.important
---|---|---|---|---
gpt-4o | celebrity | house prices | No | 3
gpt-4o | celebrity | climate change | Yes | 5
Shuffling results
We can shuffle results by calling the shuffle() method. This can be useful for randomizing the order of the results:
shuffle_results = results.shuffle()
(
    shuffle_results
    .select("model", "persona", "topic", "read", "important")
)
This will return a table of shuffled results:
model.model | agent.persona | scenario.topic | answer.read | answer.important
---|---|---|---|---
gemini-1.5-flash | celebrity | climate change | Yes | 5
gpt-4o | student | house prices | No | 3
gemini-1.5-flash | celebrity | house prices | Yes | 3
gemini-1.5-flash | student | house prices | No | 1
gpt-4o | celebrity | house prices | No | 3
gpt-4o | celebrity | climate change | Yes | 5
gpt-4o | student | climate change | Yes | 5
gemini-1.5-flash | student | climate change | Yes | 5
Adding results
We can add results together straightforwardly by using the + operator:
add_results = results + results
We can see that the results have doubled:
len(add_results)
This will return the number of results:
16
Flattening results
If a field of results contains dictionaries we can flatten them into separate fields by calling the flatten() method. This method takes the field to flatten and a boolean indicating whether to preserve the original field in the new Results object that is returned.
For example:
from edsl import QuestionDict, Model
m = Model("gemini-1.5-flash")
q = QuestionDict(
    question_name = "recipe",
    question_text = "Please provide a simple recipe for hot chocolate.",
    answer_keys = ["title", "ingredients", "instructions"]
)
r = q.by(m).run()
r.select("model", "recipe").flatten(field="answer.recipe", keep_original=True)
This will return a table of the flattened results:
model.model | answer.recipe | answer.recipe.title | answer.recipe.ingredients | answer.recipe.instructions
---|---|---|---|---
gemini-1.5-flash | {'title': 'Simple Hot Chocolate', 'ingredients': ['1 cup milk (dairy or non-dairy)', '1 tablespoon unsweetened cocoa powder', '1-2 tablespoons sugar (or to taste)', 'Pinch of salt'], 'instructions': ['Combine milk, cocoa powder, sugar, and salt in a small saucepan.', 'Heat over medium heat, stirring constantly, until the mixture is smooth and heated through.', 'Do not boil.', 'Pour into a mug and enjoy!']} | Simple Hot Chocolate | ['1 cup milk (dairy or non-dairy)', '1 tablespoon unsweetened cocoa powder', '1-2 tablespoons sugar (or to taste)', 'Pinch of salt'] | ['Combine milk, cocoa powder, sugar, and salt in a small saucepan.', 'Heat over medium heat, stirring constantly, until the mixture is smooth and heated through.', 'Do not boil.', 'Pour into a mug and enjoy!']
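The flattening behavior can be sketched in plain Python (a hypothetical helper, not EDSL's implementation):

```python
def flatten_field(row: dict, field: str, keep_original: bool = True) -> dict:
    """Expand a dict-valued field into separate '<field>.<key>' entries."""
    out = dict(row)
    nested = out[field] if keep_original else out.pop(field)
    for key, value in nested.items():
        out[f"{field}.{key}"] = value
    return out

# A toy row shaped like the example above.
row = {
    "model.model": "gemini-1.5-flash",
    "answer.recipe": {"title": "Simple Hot Chocolate",
                      "ingredients": ["1 cup milk"],
                      "instructions": ["Heat and stir."]},
}
flat = flatten_field(row, "answer.recipe")
print(flat["answer.recipe.title"])  # Simple Hot Chocolate
```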
Retrieving results
We can retrieve details about results posted to Coop by calling the list() method on the Results class. For example, the following code will return information about the 10 most recent results posted to Coop:
from edsl import Results
results = Results.list()
The following information will be returned:
Column | Description
---|---
last_updated_ts | The timestamp when the results were last updated.
alias | The alias for the results.
uuid | The UUID of the results.
version | The version of the results.
created_ts | The timestamp when the results were created.
visibility | The visibility of the results (public, private or unlisted).
description | A description of the results, if any.
url | The URL to access the results.
object_type | The type of object (e.g., Results).
owner_username | The username of the owner of the results.
alias_url | The URL for the alias, if any.
To access the next page of results, you can specify the page= parameter:
results = Results.list(page=2)
This will return the next page of results, with the same columns as above.
from edsl import Results

# Retrieve the first 2 pages of results and collect their UUIDs
uuids = []
for i in range(1, 3):
    results = Results.list(page=i)
    uuids.extend(list(results.to_key_value("uuid")))
If you have a predetermined number of objects, you can also use page_size= to specify the number of objects per page (up to 100 objects):
results = Results.list(page_size=5)
This will return the first 5 results, with the same columns as above.
By default, the most recently created objects are returned first. You can reverse this by specifying sort_ascending=True:
from edsl import Results
# Retrieve the first 10 results, sorted in ascending order by creation time
results = Results.list(sort_ascending=True)
You can also filter objects by description using the search_query parameter:
from edsl import Results
# Retrieve results with a description containing the word "testing"
results = Results.list(search_query="testing")
If you want not just the metadata, but the actual object, you can call .fetch() on the metadata list:
from edsl import Results
# Retrieve the first 10 results and fetch the actual objects
results = Results.list().fetch()
The list() method can also be called on Agent and Jobs objects, and the Coop client object (to retrieve details of objects of any type).
Generating a report
We can create a report of the results by calling the report() method and passing the columns to include (all columns are included by default). This generates a report in Markdown by iterating through the rows, presented as observations. You can optionally pass headers, a divider and a limit on the number of observations to include. This can be useful when you want to display a sample of larger results in a working notebook you are sharing.
For example, the following code will generate a report of the first 4 results:
from edsl import QuestionFreeText, ScenarioList, Model
m = Model("gemini-1.5-flash")
s = ScenarioList.from_list("language", ["German", "Dutch", "French", "English"])
q = QuestionFreeText(
    question_name = "poem",
    question_text = "Please write me a short poem about winter in {{ language }}."
)
r = q.by(s).by(m).run()
r.select("model", "poem", "language").report(top_n=2, divider=False, return_string=True)
This will return a report of the first 2 results:
Observation: 1
model.model
gemini-1.5-flash
answer.poem
Der Schnee fällt leis', ein weicher Flor, Die Welt in Weiß, ein Zauberchor. Die Bäume stehn, in Stille gehüllt, Der Winterwind, sein Lied erfüllt.
(Translation: The snow falls softly, a gentle veil, / The world in white, a magic choir. / The trees stand, wrapped in silence, / The winter wind, its song fulfilled.)
scenario.language
German
Observation: 2
model.model
gemini-1.5-flash
answer.poem
De winter komt, de dagen kort, De sneeuw valt zacht, een wit decor. De bomen staan, kaal en stil, Een ijzige wind, een koude tril.
(Translation: Winter comes, the days are short, / The snow falls softly, a white décor. / The trees stand, bare and still, / An icy wind, a cold shiver.)
scenario.language
Dutch
"# Observation: 1\n## model.model\ngemini-1.5-flash\n## answer.poem\nDer Schnee fällt leis', ein weicher Flor,\nDie Welt in Weiß, ein Zauberchor.\nDie Bäume stehn, in Stille gehüllt,\nDer Winterwind, sein Lied erfüllt.\n\n(Translation: The snow falls softly, a gentle veil, / The world in white, a magic choir. / The trees stand, wrapped in silence, / The winter wind, its song fulfilled.)\n## scenario.language\nGerman\n\n---\n\n# Observation: 2\n## model.model\ngemini-1.5-flash\n## answer.poem\nDe winter komt, de dagen kort,\nDe sneeuw valt zacht, een wit decor.\nDe bomen staan, kaal en stil,\nEen ijzige wind, een koude tril.\n\n(Translation: Winter comes, the days are short, / The snow falls softly, a white décor. / The trees stand, bare and still, / An icy wind, a cold shiver.)\n## scenario.language\nDutch\n"
Accessing results with SQL
We can interact with results via SQL using the sql method. This is done by passing a SQL query and a shape (“long” or “wide”) for the resulting table, where the table name in the query is “self”.
For example, the following code will return a table showing the model, persona, read and important columns for the first 4 results:
results.sql("select model, persona, read, important from self limit 4")
The following table will be displayed:
model | persona | read | important
---|---|---|---
gemini-1.5-flash | student | Yes | 5
gpt-4o | student | Yes | 5
gemini-1.5-flash | student | No | 1
gpt-4o | student | No | 3
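The long-shape behavior can be approximated with the standard library's sqlite3 module, loading rows into an in-memory table named self (a sketch with made-up rows, not EDSL's implementation):

```python
import sqlite3

# Toy rows standing in for results (illustration only).
rows = [
    ("gemini-1.5-flash", "student", "Yes", 5),
    ("gpt-4o", "student", "Yes", 5),
    ("gemini-1.5-flash", "student", "No", 1),
    ("gpt-4o", "student", "No", 3),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE self (model TEXT, persona TEXT, read TEXT, important INTEGER)")
con.executemany("INSERT INTO self VALUES (?, ?, ?, ?)", rows)

first_rows = con.execute(
    "SELECT model, persona, read, important FROM self LIMIT 4"
).fetchall()
print(first_rows[0])  # ('gemini-1.5-flash', 'student', 'Yes', 5)
```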
Dataframes
We can also export results to other formats. The to_pandas method will turn our results into a Pandas dataframe:
results.to_pandas()
For example, here we use it to create a dataframe consisting of the models, personas and the answers to the important question:
results.to_pandas()[["model.model", "agent.persona", "answer.important"]]
Exporting to CSV or JSON
The to_csv method will write the results to a CSV file:
results.to_pandas().to_csv("results.csv")
The to_json method will write the results to a JSON file:
results.to_pandas().to_json("results.json")
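If you prefer not to go through pandas, long-format rows like these can also be written with the standard library's csv module (a sketch with made-up rows, not an EDSL API):

```python
import csv
import io

# Toy rows shaped like selected results columns (illustration only).
rows = [
    {"model.model": "gpt-4o", "agent.persona": "student", "answer.important": 5},
    {"model.model": "gemini-1.5-flash", "agent.persona": "celebrity", "answer.important": 3},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
print(csv_text.splitlines()[0])  # model.model,agent.persona,answer.important
```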
Revising prompts to improve results
If any of your results are missing model responses, you can use the spot_issues() method to help identify the problems and then revise the prompts to improve the results. This method runs a meta-survey of two questions for any prompts that generated a bad or null response, and then returns the results of the meta-survey.
The first question in the survey is a QuestionFreeText question which prompts the model to describe the likely issues with the prompts:
The following prompts generated a bad or null response: '{{ original_prompts }}'
What do you think was the likely issue(s)?
The second question in the survey is a QuestionDict question which prompts the model to return a dictionary consisting of revised user and system prompts:
The following prompts generated a bad or null response: '{{ original_prompts }}'
You identified the issue(s) as '{{ issues.answer }}'.
Please revise the prompts to address the issue(s).
You can optionally pass a list of models to use with the meta-survey, instead of the default model.
Example usage:
# Returns a Results object with the results of the meta-survey
results.spot_issues(models=["gpt-4o"])
# You can inspect the metadata for your original prompts together with the results of the meta-survey
results.select(
    "original_question",        # The name of the question that generated a bad or null response
    "original_agent_index",     # The index of the agent that generated a bad or null response
    "original_scenario_index",  # The index of the scenario that generated a bad or null response
    "original_prompts",         # The original prompts that generated a bad or null response
    "answer.issues",            # Free text description of potential issues in the original prompts
    "answer.revised"            # A dictionary of revised user and system prompts
)
See an example of the method.
Exceptions
If any exceptions are raised when the survey is run a detailed exceptions report is generated and can be opened in your browser. See the Exceptions & Debugging section for more information on exceptions.
Result class
- class edsl.results.Result(agent: Agent, scenario: Scenario, model: LanguageModel, iteration: int, answer: dict[QuestionName, AnswerValue], prompt: dict[QuestionName, str] = None, raw_model_response: dict | None = None, survey: 'Survey' | None = None, question_to_attributes: dict[QuestionName, Any] | None = None, generated_tokens: dict | None = None, comments_dict: dict | None = None, reasoning_summaries_dict: dict | None = None, cache_used_dict: dict[QuestionName, bool] | None = None, indices: dict | None = None, cache_keys: dict[QuestionName, str] | None = None, validated_dict: dict[QuestionName, bool] | None = None)[source]
Bases: Base, UserDict
The Result class captures the complete data from one agent interview.
A Result object stores the agent, scenario, language model, and all answers provided during an interview, along with metadata such as token usage, caching information, and raw model responses. It provides a rich interface for accessing this data and supports serialization for storage and retrieval.
The Result class inherits from both Base (for serialization) and UserDict (for dictionary-like behavior), allowing it to be accessed like a dictionary while maintaining a rich object model.
- Attributes:
    agent: The Agent object that was interviewed.
    scenario: The Scenario object that was presented to the agent.
    model: The LanguageModel object that was used to generate responses.
    answer: Dictionary mapping question names to answer values.
    sub_dicts: Organized sub-dictionaries for different data types.
    combined_dict: Flattened dictionary combining all sub-dictionaries.
    problem_keys: List of keys that have naming conflicts.
- Note:
Results are typically created by the Jobs system when running interviews and collected into a Results collection for analysis. You rarely need to create Result objects manually.
- Examples:
    >>> result = Result.example()
    >>> result['answer']['how_feeling']
    'OK'
- class ClassOrInstanceMethod(func)[source]
Bases: object
Descriptor that allows a method to be called as both a class method and an instance method.
- __init__(agent: Agent, scenario: Scenario, model: LanguageModel, iteration: int, answer: dict[QuestionName, AnswerValue], prompt: dict[QuestionName, str] = None, raw_model_response: dict | None = None, survey: 'Survey' | None = None, question_to_attributes: dict[QuestionName, Any] | None = None, generated_tokens: dict | None = None, comments_dict: dict | None = None, reasoning_summaries_dict: dict | None = None, cache_used_dict: dict[QuestionName, bool] | None = None, indices: dict | None = None, cache_keys: dict[QuestionName, str] | None = None, validated_dict: dict[QuestionName, bool] | None = None)[source]
Initialize a Result object.
- Args:
agent: The Agent object that was interviewed.
scenario: The Scenario object that was presented.
model: The LanguageModel object that generated responses.
iteration: The iteration number for this result.
answer: Dictionary mapping question names to answer values.
prompt: Dictionary of prompts used for each question. Defaults to None.
raw_model_response: The raw response from the language model. Defaults to None.
survey: The Survey object containing the questions. Defaults to None.
question_to_attributes: Dictionary of question attributes. Defaults to None.
generated_tokens: Dictionary of token usage statistics. Defaults to None.
comments_dict: Dictionary of comments for each question. Defaults to None.
reasoning_summaries_dict: Dictionary of reasoning summaries. Defaults to None.
cache_used_dict: Dictionary indicating cache usage for each question. Defaults to None.
indices: Dictionary of indices for data organization. Defaults to None.
cache_keys: Dictionary of cache keys for each question. Defaults to None.
validated_dict: Dictionary indicating validation status for each question. Defaults to None.
- by_question_data(flatten_nested_dicts: bool = False, separator: str = '_')[source]
Organize result data by question with optional flattening of nested dictionaries.
This method reorganizes the result data structure to be organized by question name, making it easier to analyze answers and related metadata on a per-question basis.
- Args:
- flatten_nested_dicts: Whether to flatten nested dictionaries using the separator.
Defaults to False.
- separator: The separator to use when flattening nested dictionaries.
Defaults to "_".
- Returns:
A dictionary organized by question name, with each question containing its associated data (answer, prompt, metadata, etc.).
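The per-question reorganization described above can be pictured with a small pure-Python sketch. The helper and data names here are illustrative assumptions, not the EDSL implementation:

```python
# Hypothetical sketch: pivot per-category dictionaries (answers, prompts,
# comments) into a per-question layout, similar in spirit to
# Result.by_question_data(). Names and data are invented for illustration.
def by_question(answer: dict, prompt: dict, comment: dict) -> dict:
    questions = set(answer) | set(prompt) | set(comment)
    return {
        q: {
            "answer": answer.get(q),
            "prompt": prompt.get(q),
            "comment": comment.get(q),
        }
        for q in questions
    }

data = by_question(
    answer={"how_feeling": "OK"},
    prompt={"how_feeling": "How are you?"},
    comment={"how_feeling": "Feeling fine."},
)
```

Each question name becomes a top-level key, with its associated data grouped beneath it.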
- check_expression(expression: str) None [source]
Check if an expression references a problematic key.
- Args:
expression: The expression string to check for problematic keys.
- Raises:
- ResultsColumnNotFoundError: If the expression contains a problematic key
that should use the full qualified name instead.
- clipboard()[source]
Copy this object’s representation to the system clipboard.
This method first checks if the object has a custom clipboard_data() method. If it does, it uses that method’s output. Otherwise, it serializes the object to a dictionary (without version info) and copies it to the system clipboard as JSON text.
- Returns:
None, but prints a confirmation message
- code()[source]
Return a string of code that can be used to recreate the Result object.
- Raises:
ResultsError: This method is not implemented for Result objects.
- copy() Result [source]
Return a copy of the Result object.
- Returns:
A new Result object that is a copy of this one.
- Examples:
>>> r = Result.example()
>>> r2 = r.copy()
>>> r == r2
True
>>> id(r) == id(r2)
False
- create_download_link()[source]
Generate a downloadable link for this object.
Creates a temporary file containing the serialized object and generates a download link that can be shared with others.
- Returns:
str: A URL that can be used to download the object
- display_dict()[source]
Create a flattened dictionary representation for display purposes.
This method creates a flattened view of nested structures using colon notation in keys to represent hierarchy.
- Returns:
dict: A flattened dictionary suitable for display
- display_transcript(show_options: bool = True, show_agent_info: bool = True) None [source]
Display a rich-formatted chat transcript of the interview.
This method creates a ChatTranscript object and displays the conversation between questions and agent responses in a beautiful, chat-like format using the Rich library.
- Args:
show_options: Whether to display question options if available. Defaults to True.
show_agent_info: Whether to show agent information at the top. Defaults to True.
- duplicate(add_edsl_version=False)[source]
Create and return a deep copy of the object.
- Args:
add_edsl_version: Whether to include EDSL version information in the duplicated object
- Returns:
A new instance of the same class with identical properties
- classmethod example() Result [source]
Return an example Result object.
- Returns:
A sample Result object for testing and demonstration purposes.
- Examples:
>>> result = Result.example()
>>> type(result)
<class 'edsl.results.result.Result'>
>>> isinstance(result, Result)
True
- classmethod from_dict(data: dict) Result [source]
Return a Result object from a dictionary representation.
- Args:
data: Dictionary containing Result data.
- Returns:
A new Result object created from the dictionary data.
- classmethod from_interview(interview) Result [source]
Return a Result object from an interview dictionary.
This method ensures no reference to the original interview is maintained, creating a clean Result object from the interview data.
- Args:
interview: An interview dictionary containing the raw interview data.
- Returns:
A new Result object created from the interview data.
- classmethod from_yaml(yaml_str: str | None = None, filename: str | None = None)[source]
Create an instance from YAML data.
Deserializes a YAML string or file into a new instance of the class.
- Args:
yaml_str: YAML string containing object data
filename: Path to a YAML file containing object data
- Returns:
A new instance of the class populated with the deserialized data
- Raises:
BaseValueError: If neither yaml_str nor filename is provided
- get_hash() str [source]
Get a string hash representation of this object based on its content.
- Returns:
str: A string representation of the hash value
- get_uuid() str [source]
Get the UUID of this object from the Expected Parrot cloud service based on its hash.
This method calculates the hash of the object and queries the cloud service to find if there’s an uploaded version with the same content. If found, it returns the UUID of that object.
- Returns:
str: The UUID of the object in the cloud service if found
- Raises:
- CoopServerResponseError: If the object is not found or there’s an error
communicating with the server
- get_value(data_type: str, key: str) Any [source]
Return the value for a given data type and key.
This method provides a consistent way to access values across different sub-dictionaries in the Result object. It’s particularly useful when you need to programmatically access values without knowing which data type a particular key belongs to.
- Args:
- data_type: The category of data to retrieve from. Valid options include:
"agent", "scenario", "model", "answer", "prompt", "comment", "generated_tokens", "raw_model_response", "question_text", "question_options", "question_type", "cache_used", "cache_keys".
key: The specific attribute name within that data type.
- Returns:
The value associated with the key in the specified data type.
- Examples:
>>> r = Result.example()
>>> r.get_value("answer", "how_feeling")
'OK'
>>> r.get_value("scenario", "period")
'morning'
- classmethod help()[source]
Display the class documentation string.
This is a convenience method to quickly access the docstring of the class.
- Returns:
None, but prints the class docstring to stdout
- inspect()[source]
Create an interactive inspector widget for this object.
This method uses the InspectorWidget registry system to find the appropriate inspector widget class for this object’s type and returns an instance of it.
- Returns:
InspectorWidget subclass instance: Interactive widget for inspecting this object
- Raises:
KeyError: If no inspector widget is registered for this object's class
ImportError: If the widgets module cannot be imported
- json()[source]
Get a formatted JSON representation of this object.
- Returns:
DisplayJSON: A displayable JSON representation
- keys()[source]
Get the key names in the object’s dictionary representation.
This method returns all the keys in the serialized form of the object, excluding metadata keys like version information.
- Returns:
list: A list of key names
- classmethod list(visibility: Literal['private', 'public', 'unlisted'] | List[Literal['private', 'public', 'unlisted']] | None = None, job_status: Literal['queued', 'running', 'completed', 'failed', 'cancelled', 'cancelling', 'partial_failed'] | List[Literal['queued', 'running', 'completed', 'failed', 'cancelled', 'cancelling', 'partial_failed']] | None = None, search_query: str | None = None, page: int = 1, page_size: int = 10, sort_ascending: bool = False) CoopObjects [source]
List objects from coop.
Notes:
- The visibility parameter is not supported for remote inference jobs.
- The job_status parameter is not supported for objects.
- search_query only works with the description field.
- If sort_ascending is False, then the most recently created objects are returned first.
- classmethod load(filename)[source]
Load the object from a JSON file (compressed or uncompressed).
This method deserializes an object from a file, automatically detecting whether the file is compressed with gzip or not.
- Args:
filename: Path to the file to load
- Returns:
An instance of the class populated with data from the file
- Raises:
Various exceptions may be raised if the file doesn’t exist or contains invalid data
- property model: LanguageModel[source]
Return the LanguageModel object.
- classmethod old_pull(url_or_uuid: str | UUID | None = None)[source]
Pull the object from coop.
- Args:
url_or_uuid: Either a UUID string or a URL pointing to the object
- static open_compressed_file(filename)[source]
Read and parse a compressed JSON file.
- Args:
filename: Path to a gzipped JSON file
- Returns:
dict: The parsed JSON content
- static open_regular_file(filename)[source]
Read and parse an uncompressed JSON file.
- Args:
filename: Path to a JSON file
- Returns:
dict: The parsed JSON content
- classmethod patch_cls(url_or_uuid: str | UUID, description: str | None = None, value: Any | None = None, visibility: str | None = None)[source]
Patch an uploaded object's attributes (class method version).
- description changes the description of the object on Coop
- value changes the value of the object on Coop; it must be an EDSL object
- visibility changes the visibility of the object on Coop
- pop(k[, d]) v, remove specified key and return the corresponding value. [source]
If key is not found, d is returned if given, otherwise KeyError is raised.
- popitem() (k, v), remove and return some (key, value) pair [source]
as a 2-tuple; but raise KeyError if D is empty.
- print(format='rich')[source]
Print a formatted table representation of this object.
- Args:
format: The output format (currently only ‘rich’ is supported)
- Returns:
None, but prints a formatted table to the console
- classmethod pull(url_or_uuid: str | UUID | None = None, expected_parrot_url: str | None = None) dict [source]
Get a signed URL for directly downloading an object from Google Cloud Storage.
This method provides a more efficient way to download objects compared to the old pull() method, especially for large files, by generating a direct signed URL to the storage bucket.
- Args:
- url_or_uuid (Union[str, UUID], optional): Identifier for the object to retrieve.
Can be one of:
- UUID string (e.g., "123e4567-e89b-12d3-a456-426614174000")
- Full URL (e.g., "https://expectedparrot.com/content/123e4567…")
- Alias URL (e.g., "https://expectedparrot.com/content/username/my-survey")
expected_parrot_url (str, optional): Optional custom URL for the coop service
- Returns:
dict: A response containing the signed_url for direct download
- Example:
>>> response = SurveyClass.pull("123e4567-e89b-12d3-a456-426614174000")
>>> response = SurveyClass.pull("https://expectedparrot.com/content/username/my-survey")
>>> print(f"Download URL: {response['signed_url']}")
>>> # Use the signed_url to download the object directly
- push(description: str | None = None, alias: str | None = None, visibility: str | None = 'unlisted', expected_parrot_url: str | None = None) dict [source]
Get a signed URL for directly uploading an object to Google Cloud Storage.
This method provides a more efficient way to upload objects, especially for large files, by generating a direct signed URL to the storage bucket.
- Args:
expected_parrot_url (str, optional): Optional custom URL for the coop service
- Returns:
dict: A response containing the signed_url for direct upload and optionally a job_id
- Example:
>>> from edsl.surveys import Survey
>>> survey = Survey(...)
>>> response = survey.push()
>>> print(f"Upload URL: {response['signed_url']}")
>>> # Use the signed_url to upload the object directly
- save(filename: str | None = None, compress: bool = True)[source]
Save the object to a file as JSON with optional compression.
Serializes the object to JSON and writes it to the specified file. By default, the file will be compressed using gzip. File extensions are handled automatically.
- Args:
filename: Path where the file should be saved
compress: If True, compress the file using gzip (default: True)
- Returns:
None
- Examples:
>>> obj.save("my_object.json.gz")  # Compressed
>>> obj.save("my_object.json", compress=False)  # Uncompressed
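The save/load behavior described above (JSON serialization with optional gzip compression, and automatic detection of compressed files on load) can be sketched in plain Python. This is a minimal sketch of the general technique, not the EDSL implementation:

```python
import gzip
import json
import os
import tempfile

def save_json(obj: dict, filename: str, compress: bool = True) -> None:
    # Serialize to JSON and write, gzip-compressed when requested.
    payload = json.dumps(obj).encode("utf-8")
    if compress:
        with gzip.open(filename, "wb") as f:
            f.write(payload)
    else:
        with open(filename, "wb") as f:
            f.write(payload)

def load_json(filename: str) -> dict:
    # Detect gzip by its two magic bytes, so both compressed and
    # uncompressed files load transparently.
    with open(filename, "rb") as f:
        magic = f.read(2)
    opener = gzip.open if magic == b"\x1f\x8b" else open
    with opener(filename, "rb") as f:
        return json.loads(f.read().decode("utf-8"))

path = os.path.join(tempfile.mkdtemp(), "obj.json.gz")
save_json({"answer": {"how_feeling": "OK"}}, path)
restored = load_json(path)
```

The magic-byte check is what lets a single `load()` handle both file variants without a separate flag.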
- score(scoring_function: Callable) int | float [source]
Score the result using a passed-in scoring function.
- Args:
- scoring_function: A callable that takes parameters from the Result’s combined_dict
and returns a numeric score.
- Returns:
The numeric score returned by the scoring function.
- Raises:
- ResultsError: If a required parameter for the scoring function is not found
in the Result object.
- Examples:
>>> def f(status): return 1 if status == 'Joyful' else 0
>>> result = Result.example()
>>> result.score(f)
1
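The parameter-matching behavior described above (scoring-function arguments looked up by name in the result's combined data) can be sketched with `inspect.signature`. The helper name and the sample `combined` data are assumptions for illustration:

```python
import inspect

def apply_scoring(scoring_function, combined_dict: dict):
    # Look up each parameter the function declares in the combined data,
    # raising if a required key is missing (analogous to ResultsError).
    params = inspect.signature(scoring_function).parameters
    kwargs = {}
    for name in params:
        if name not in combined_dict:
            raise KeyError(f"Scoring function requires '{name}', not found in result")
        kwargs[name] = combined_dict[name]
    return scoring_function(**kwargs)

# Invented sample of what a combined_dict might contain:
combined = {"status": "Joyful", "how_feeling": "OK"}
score = apply_scoring(lambda status: 1 if status == "Joyful" else 0, combined)
```

Naming the function parameter after a field in the result is what binds the two together; no explicit argument passing is needed.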
- score_with_answer_key(answer_key: dict) dict[str, int] [source]
Score the result against a reference answer key.
This method evaluates the correctness of answers by comparing them to a provided answer key. It returns a dictionary with counts of correct, incorrect, and missing answers.
The answer key can contain either single values or lists of acceptable values. If a list is provided, the answer is considered correct if it matches any value in the list.
- Args:
- answer_key: A dictionary mapping question names to expected answers.
Values can be single items or lists of acceptable answers.
- Returns:
A dictionary with keys ‘correct’, ‘incorrect’, and ‘missing’, indicating the counts of each answer type.
- Examples:
>>> result = Result.example()
>>> result.answer
{'how_feeling': 'OK', 'how_feeling_yesterday': 'Great'}
>>> # Using exact match answer key
>>> answer_key = {'how_feeling': 'OK', 'how_feeling_yesterday': 'Great'}
>>> result.score_with_answer_key(answer_key)
{'correct': 2, 'incorrect': 0, 'missing': 0}
>>> # Using answer key with multiple acceptable answers
>>> answer_key = {'how_feeling': 'OK', 'how_feeling_yesterday': ['Great', 'Good']}
>>> result.score_with_answer_key(answer_key)
{'correct': 2, 'incorrect': 0, 'missing': 0}
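The counting logic described above can be sketched in a few lines of plain Python. This is a minimal sketch under the stated semantics (single values or lists of acceptable values), not the EDSL implementation:

```python
def score_with_key(answers: dict, answer_key: dict) -> dict:
    # Tally correct / incorrect / missing answers against a reference key.
    counts = {"correct": 0, "incorrect": 0, "missing": 0}
    for question, expected in answer_key.items():
        if question not in answers:
            counts["missing"] += 1
            continue
        # A list in the key means any of its members counts as correct.
        acceptable = expected if isinstance(expected, list) else [expected]
        if answers[question] in acceptable:
            counts["correct"] += 1
        else:
            counts["incorrect"] += 1
    return counts

scores = score_with_key(
    {"how_feeling": "OK", "how_feeling_yesterday": "Great"},
    {"how_feeling": "OK", "how_feeling_yesterday": ["Great", "Good"]},
)
```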
- show_methods(show_docstrings=True)[source]
Display all public methods available on this object.
This utility method helps explore the capabilities of an object by listing all its public methods and optionally their documentation.
- Args:
- show_docstrings: If True, print method names with docstrings;
if False, return the list of method names
- Returns:
- None or list: If show_docstrings is True, prints methods and returns None.
If show_docstrings is False, returns a list of method names.
- store(d: dict, key_name: str | None = None)[source]
Store this object in a dictionary with an optional key.
- Args:
d: The dictionary in which to store the object
key_name: Optional key to use (defaults to the length of the dictionary)
- Returns:
None
- to_dataset(flatten_nested_dicts: bool = False, separator: str = '_')[source]
Convert the result to a dataset format.
This method transforms the result data into a Dataset object suitable for analysis and data manipulation.
- Args:
- flatten_nested_dicts: Whether to flatten nested dictionaries using the separator.
Defaults to False.
- separator: The separator to use when flattening nested dictionaries.
Defaults to "_".
- Returns:
A Dataset object containing the result data organized for analysis.
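The `flatten_nested_dicts` option described above amounts to joining nested keys with the separator. A minimal sketch of that idea (the function name and sample data are illustrative, not the EDSL implementation):

```python
def flatten_dict(d: dict, separator: str = "_", prefix: str = "") -> dict:
    # Recursively join nested keys: {"a": {"b": 1}} becomes {"a_b": 1}.
    flat = {}
    for key, value in d.items():
        new_key = f"{prefix}{separator}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_dict(value, separator, new_key))
        else:
            flat[new_key] = value
    return flat

# Invented example of a nested field being flattened:
flat = flatten_dict({"usage": {"tokens": {"input": 10, "output": 4}}})
```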
- to_dict(add_edsl_version: bool = True, include_cache_info: bool = False, full_dict: bool = False) dict[str, Any] [source]
Return a dictionary representation of the Result object.
- Args:
- add_edsl_version: Whether to include EDSL version information in the output.
Defaults to True.
- include_cache_info: Whether to include cache information in the output.
Defaults to False.
- Returns:
A dictionary representation of the Result object containing all relevant data.
- Examples:
>>> r = Result.example()
>>> data = r.to_dict()
>>> data['scenario']['period']
'morning'
- to_json()[source]
Serialize this object to a JSON string.
- Returns:
str: A JSON string representation of the object
- to_yaml(add_edsl_version=False, filename: str = None) str | None [source]
Convert the object to YAML format.
Serializes the object to YAML format and optionally writes it to a file.
- Args:
add_edsl_version: Whether to include EDSL version information
filename: If provided, write the YAML to this file path
- Returns:
str: The YAML string representation if no filename is provided
None: If written to file
- transcript(format: str = 'simple') str [source]
Return the questions and answers in a human-readable transcript.
- Args:
- format: The format for the transcript. Either ‘simple’ or ‘rich’.
‘simple’ (default) returns plain-text format with questions, options, and answers separated by blank lines. ‘rich’ uses the rich library to wrap each Q&A block in a Panel with colors and formatting.
- Returns:
A formatted transcript string of the interview.
- Raises:
ImportError: If ‘rich’ format is requested but the rich library is not installed.
- Examples:
>>> result = Result.example()
>>> transcript = result.transcript(format="simple")
>>> print(transcript)
QUESTION: How are you this {{ period }}?
OPTIONS: Good / Great / OK / Terrible
ANSWER: OK

QUESTION: How were you feeling yesterday {{ period }}?
OPTIONS: Good / Great / OK / Terrible
ANSWER: Great
- update([E, ]**F) None. Update D from mapping/iterable E and F. [source]
If E present and has a .keys() method, does: for k in E: D[k] = E[k] If E present and lacks .keys() method, does: for (k, v) in E: D[k] = v In either case, this is followed by: for k, v in F.items(): D[k] = v
- values()[source]
Get the values in the object’s dictionary representation.
- Returns:
set: A set containing all the values in the object
Results class
- class edsl.results.Results(survey: Optional['Survey'] = None, data: Optional[list['Result']] = None, name: Optional[str] = None, created_columns: Optional[list[str]] = None, cache: Optional['Cache'] = None, job_uuid: Optional[str] = None, total_results: Optional[int] = None, task_history: Optional['TaskHistory'] = None, sort_by_iteration: bool = False, data_class: Optional[type] = <class 'list'>)[source]
Bases: MutableSequence, ResultsOperationsMixin, Base
A collection of Result objects with powerful data analysis capabilities.
The Results class is the primary container for working with data from EDSL surveys. It provides a rich set of methods for data analysis, transformation, and visualization inspired by data manipulation libraries like dplyr and pandas. The Results class implements a functional, fluent interface for data manipulation where each method returns a new Results object, allowing method chaining.
- Attributes:
survey: The Survey object containing the questions used to generate results.
data: A list of Result objects containing the responses.
created_columns: A list of column names created through transformations.
cache: A Cache object for storing model responses.
completed: Whether the Results object is ready for use.
task_history: A TaskHistory object containing information about the tasks.
known_data_types: List of valid data type strings for accessing data.
- Key features:
List-like interface for accessing individual Result objects
Selection of specific data columns with select()
Filtering results with boolean expressions using filter()
Creating new derived columns with mutate()
Recoding values with recode() and answer_truncate()
Sorting results with order_by()
Converting to other formats (dataset, table, pandas DataFrame)
Serialization for storage and retrieval
Support for remote execution and result retrieval
- Results objects have a hierarchical structure with the following components:
Each Results object contains multiple Result objects
Each Result object contains data organized by type (agent, scenario, model, answer, etc.)
Each data type contains multiple attributes (e.g., “how_feeling” in the answer type)
You can access data in a Results object using dot notation (answer.how_feeling) or using just the attribute name if it’s not ambiguous (how_feeling).
The Results class also tracks “created columns” - new derived values that aren’t part of the original data but were created through transformations.
- Examples:
>>> # Create a simple Results object from example data
>>> r = Results.example()
>>> len(r) > 0  # Contains Result objects
True
>>> # Filter and transform data
>>> filtered = r.filter("how_feeling == 'Great'")
>>> # Access hierarchical data
>>> 'agent' in r.known_data_types
True
- class ClassOrInstanceMethod(func)[source]
Bases: object
Descriptor that allows a method to be called as both a class method and an instance method.
- __init__(survey: Optional['Survey'] = None, data: Optional[list['Result']] = None, name: Optional[str] = None, created_columns: Optional[list[str]] = None, cache: Optional['Cache'] = None, job_uuid: Optional[str] = None, total_results: Optional[int] = None, task_history: Optional['TaskHistory'] = None, sort_by_iteration: bool = False, data_class: Optional[type] = <class 'list'>)[source]
Instantiate a Results object with a survey and a list of Result objects.
- Args:
survey: A Survey object containing the questions used to generate results.
data: A list of Result objects containing the responses.
created_columns: A list of column names created through transformations.
cache: A Cache object for storing model responses.
job_uuid: A string representing the job UUID.
total_results: An integer representing the total number of results.
task_history: A TaskHistory object containing information about the tasks.
sort_by_iteration: Whether to sort data by iteration before initializing.
data_class: The class to use for the data container (default: list).
- agent_answers_by_question(agent_key_fields: List[str] | None = None, separator: str = ',') dict [source]
Returns a dictionary of agent answers.
The keys are the agent names and the values are the answers.
>>> result = Results.example().agent_answers_by_question()
>>> sorted(result['how_feeling'].values())
['Great', 'OK', 'OK', 'Terrible']
>>> sorted(result['how_feeling_yesterday'].values())
['Good', 'Great', 'OK', 'Terrible']
- property agent_keys: list[str][source]
Return a set of all of the keys that are in the Agent data.
Example:
>>> r = Results.example()
>>> r.agent_keys
['agent_index', 'agent_instruction', 'agent_name', 'status']
- property agents: AgentList[source]
Return a list of all of the agents in the Results.
Example:
>>> r = Results.example()
>>> r.agents
AgentList([Agent(traits = {'status': 'Joyful'}), Agent(traits = {'status': 'Joyful'}), Agent(traits = {'status': 'Sad'}), Agent(traits = {'status': 'Sad'})])
- property all_keys: list[str][source]
Return a set of all of the keys that are in the Results.
Example:
>>> r = Results.example()
>>> r.all_keys
['agent_index', ...]
- property answer_keys: dict[str, str][source]
Return a mapping of answer keys to question text.
Example:
>>> r = Results.example()
>>> r.answer_keys
{'how_feeling': 'How are you this {{ period }}?', 'how_feeling_yesterday': 'How were you feeling yesterday {{ period }}?'}
- bucket_by(*columns: str) dict[tuple, list[Result]] [source]
Group Result objects into buckets keyed by the specified column values.
Each key in the returned dictionary is a tuple containing the values of the requested columns (in the same order as supplied). The associated value is a list of Result instances whose values match that key.
- Args:
columns: One or more column names whose values define the buckets.
- Returns:
dict[tuple, list[Result]]: Mapping from value tuples to lists of Result objects.
- Raises:
- ResultsError: If no columns are provided or an invalid column name is
supplied.
- Examples:
>>> r = Results.example()
>>> buckets = r.bucket_by('how_feeling')
>>> list(buckets.keys())
[('OK',), ('Great',), ('Terrible',)]
>>> all(isinstance(v, list) for v in buckets.values())
True
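The tuple-keyed grouping that bucket_by performs can be sketched with a `defaultdict` over plain dictionaries. The function name and sample rows here are illustrative, not the EDSL implementation:

```python
from collections import defaultdict

def bucket_rows(rows: list[dict], *columns: str) -> dict:
    # Group rows into buckets keyed by tuples of the requested column values,
    # raising when no columns are given (analogous to ResultsError).
    if not columns:
        raise ValueError("At least one column is required")
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[c] for c in columns)].append(row)
    return dict(buckets)

rows = [{"how_feeling": "OK"}, {"how_feeling": "Great"}, {"how_feeling": "OK"}]
buckets = bucket_rows(rows, "how_feeling")
```

Keys are tuples even for a single column, which keeps the shape consistent when grouping by several columns at once.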
- clipboard()[source]
Copy this object’s representation to the system clipboard.
This method first checks if the object has a custom clipboard_data() method. If it does, it uses that method’s output. Otherwise, it serializes the object to a dictionary (without version info) and copies it to the system clipboard as JSON text.
- Returns:
None, but prints a confirmation message
- clipboard_data() str [source]
Return TSV representation of this object for clipboard operations.
This method is called by the clipboard() method in the base class to provide a custom format for copying objects to the system clipboard.
- Returns:
str: Tab-separated values representation of the object
- code()[source]
Method for generating code representations.
- Raises:
ResultsError: This method is not implemented for Results objects.
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> try:
...     r.code()
... except ResultsError as e:
...     str(e).startswith("The code() method is not implemented")
True
- property columns: list[str][source]
Return a list of all of the columns that are in the Results.
Example:
>>> r = Results.example()
>>> r.columns
['agent.agent_index', ...]
- compare(other_results: Results) dict [source]
Compare two Results objects and return the differences.
- compute_job_cost(include_cached_responses_in_cost: bool = False) float [source]
Compute the cost of a completed job in USD.
This method delegates to the JobCostCalculator class to calculate the total cost of all model responses in the results. By default, it only counts the cost of responses that were not cached.
- Args:
- include_cached_responses_in_cost: Whether to include the cost of cached
responses in the total. Defaults to False.
- Returns:
float: The total cost in USD.
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.compute_job_cost()
0.0
- create_download_link()[source]
Generate a downloadable link for this object.
Creates a temporary file containing the serialized object and generates a download link that can be shared with others.
- Returns:
str: A URL that can be used to download the object
- display_dict()[source]
Create a flattened dictionary representation for display purposes.
This method creates a flattened view of nested structures using colon notation in keys to represent hierarchy.
- Returns:
dict: A flattened dictionary suitable for display
- duplicate(add_edsl_version=False)[source]
Create and return a deep copy of the object.
- Args:
add_edsl_version: Whether to include EDSL version information in the duplicated object
- Returns:
A new instance of the same class with identical properties
- classmethod example(randomize: bool = False) Results [source]
Return an example Results object.
Example usage:
>>> r = Results.example()
- Parameters:
randomize – if True, randomizes agent and scenario combinations
- extend_sorted(other)[source]
Extend the Results list with items from another iterable.
This method preserves ordering based on ‘order’ attribute if present, otherwise falls back to ‘iteration’ attribute.
- fetch(polling_interval: float | int = 1.0) Results [source]
Poll the server for job completion and update this Results instance.
This method delegates to the ResultsRemoteFetcher class to handle the polling and fetching operation.
- Args:
polling_interval: Number of seconds to wait between polling attempts (default: 1.0)
- Returns:
Results: The updated Results instance
- Raises:
ResultsError: If no job info is available or if there’s an error during fetch.
- fetch_remote(job_info: Any) bool [source]
Fetch remote Results object and update this instance with the data.
This method delegates to the ResultsRemoteFetcher class to handle the remote fetching operation.
- Args:
job_info: RemoteJobInfo object containing the job_uuid and other remote job details
- Returns:
bool: True if the fetch was successful, False if the job is not yet completed.
- Raises:
ResultsError: If there’s an error during the fetch process.
- filter(expression: str) Results [source]
Filter results based on a boolean expression.
This method delegates to the ResultsFilter class to evaluate a boolean expression against each Result object in the collection and returns a new Results object containing only those that match.
- Args:
- expression: A string containing a Python expression that evaluates to a boolean.
The expression is applied to each Result object individually. Can be a multi-line string for better readability. Supports template-style syntax with {{ field }} notation.
- Returns:
A new Results object containing only the Result objects that satisfy the expression.
- Raises:
- ResultsFilterError: If the expression is invalid or uses improper syntax
(like using ‘=’ instead of ‘==’).
- Examples:
>>> r = Results.example()
>>> # Simple equality filter
>>> r.filter("how_feeling == 'Great'").select('how_feeling')
Dataset([{'answer.how_feeling': ['Great']}])
>>> # Using OR condition
>>> r.filter("how_feeling == 'Great' or how_feeling == 'Terrible'").select('how_feeling')
Dataset([{'answer.how_feeling': ['Great', 'Terrible']}])
>>> # Filter on agent properties
>>> r.filter("agent.status == 'Joyful'").select('agent.status')
Dataset([{'agent.status': ['Joyful', 'Joyful']}])
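The core idea of filtering, evaluating a boolean expression against each record's fields, can be sketched in plain Python. This is a simplified sketch only; the real ResultsFilter also supports dotted names and template-style `{{ field }}` syntax:

```python
def filter_rows(rows: list[dict], expression: str) -> list[dict]:
    # Evaluate the expression with each row's fields as local variables.
    # eval() is used here purely for illustration of the mechanism.
    return [row for row in rows if eval(expression, {}, dict(row))]

# Invented sample rows standing in for Result objects:
rows = [{"how_feeling": "Great"}, {"how_feeling": "OK"}]
kept = filter_rows(rows, "how_feeling == 'Great'")
```

Because the expression is ordinary Python, operators like `==`, `or`, and `in` behave as they do elsewhere in the language.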
- first() Result [source]
Return the first observation in the results.
Example:
>>> r = Results.example()
>>> r.first()
Result(agent...
- flatten(field: str, keep_original: bool = False) Dataset [source]
Expand a field containing dictionaries into separate fields.
This method takes a field that contains a list of dictionaries and expands it into multiple fields, one for each key in the dictionaries. This is useful when working with nested data structures or results from extraction operations.
- Parameters:
field: The field containing dictionaries to flatten
keep_original: Whether to retain the original field in the result
- Returns:
A new Dataset with the dictionary keys expanded into separate fields
- Notes:
Each key in the dictionaries becomes a new field with name pattern “{field}.{key}”
All dictionaries in the field must have compatible structures
If a dictionary is missing a key, the corresponding value will be None
Non-dictionary values in the field will cause a warning
- Examples:
>>> from edsl.dataset import Dataset
# Basic flattening of nested dictionaries
>>> Dataset([{'a': [{'a': 1, 'b': 2}]}, {'c': [5]}]).flatten('a')
Dataset([{'c': [5]}, {'a.a': [1]}, {'a.b': [2]}])
# Works with prefixed fields too
>>> Dataset([{'answer.example': [{'a': 1, 'b': 2}]}, {'c': [5]}]).flatten('answer.example')
Dataset([{'c': [5]}, {'answer.example.a': [1]}, {'answer.example.b': [2]}])
# Keep the original field if needed
>>> d = Dataset([{'a': [{'a': 1, 'b': 2}]}, {'c': [5]}])
>>> d.flatten('a', keep_original=True)
Dataset([{'a': [{'a': 1, 'b': 2}]}, {'c': [5]}, {'a.a': [1]}, {'a.b': [2]}])
# Can also use unambiguous unprefixed field name
>>> result = Dataset([{'answer.pros_cons': [{'pros': ['Safety'], 'cons': ['Cost']}]}]).flatten('pros_cons')
>>> sorted(result.keys()) == ['answer.pros_cons.cons', 'answer.pros_cons.pros']
True
>>> sorted(result.to_dicts()[0].items()) == sorted({'cons': ['Cost'], 'pros': ['Safety']}.items())
True
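The rules in the Notes above (each dictionary key becomes a `{field}.{key}` column, and a missing key yields None) can be sketched in plain Python. This is a hypothetical row-wise illustration, not the EDSL implementation, which operates on columnar Dataset objects:

```python
from typing import Any

def flatten_field(rows: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Expand a dict-valued field into one '{field}.{key}' column per key."""
    # Collect every key that appears in any dictionary under `field`.
    keys: list[str] = []
    for row in rows:
        value = row.get(field)
        if isinstance(value, dict):
            for k in value:
                if k not in keys:
                    keys.append(k)
    flattened = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != field}
        value = row.get(field)
        for k in keys:
            # Missing keys become None, matching the behavior described above.
            new_row[f"{field}.{k}"] = value.get(k) if isinstance(value, dict) else None
        flattened.append(new_row)
    return flattened
```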
- classmethod from_dict(data: dict[str, Any]) Results [source]
Convert a dictionary to a Results object.
This method delegates to the ResultsSerializer class to handle the conversion of a dictionary representation back to a Results object.
- Args:
data: A dictionary representation of a Results object.
- Returns:
Results: A new Results object created from the dictionary data
- Examples:
>>> r = Results.example()
>>> d = r.to_dict()
>>> r2 = Results.from_dict(d)
>>> r == r2
True
- classmethod from_disk(filepath: str) Results [source]
Load a Results object from a zip file.
This method delegates to the ResultsSerializer class to handle the disk deserialization.
This method:
1. Extracts the SQLite database file
2. Loads the metadata
3. Creates a new Results instance with the restored data
- Args:
filepath: Path to the zip file containing the serialized Results
- Returns:
Results: A new Results instance with the restored data
- Raises:
ResultsError: If there’s an error during deserialization
- classmethod from_job_info(job_info: dict) Results [source]
Instantiate a Results object from a job info dictionary.
This method creates a Results object in a not-ready state that will fetch its data from a remote source when methods are called on it.
- Args:
job_info: Dictionary containing information about a remote job.
- Returns:
- Results: A new Results instance with completed=False that will
fetch remote data when needed.
- Examples:
>>> # Create a job info dictionary
>>> job_info = {'job_uuid': '12345', 'creation_data': {'model': 'gpt-4'}}
>>> # Create a Results object from the job info
>>> results = Results.from_job_info(job_info)
>>> results.completed
False
>>> hasattr(results, 'job_info')
True
- classmethod from_yaml(yaml_str: str | None = None, filename: str | None = None)[source]
Create an instance from YAML data.
Deserializes a YAML string or file into a new instance of the class.
- Args:
yaml_str: YAML string containing object data
filename: Path to a YAML file containing object data
- Returns:
A new instance of the class populated with the deserialized data
- Raises:
BaseValueError: If neither yaml_str nor filename is provided
- get_answers(question_name: str) list [source]
Get the answers for a given question name.
- Args:
question_name: The name of the question to fetch answers for.
- Returns:
list: A list of answers, one from each result in the data.
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> answers = r.get_answers('how_feeling')
>>> isinstance(answers, list)
True
>>> len(answers) == len(r)
True
- get_hash() str [source]
Get a string hash representation of this object based on its content.
- Returns:
str: A string representation of the hash value
- get_shelved_result(key: str) Result [source]
Retrieve a Result object from persistent storage.
This method delegates to the ResultsSerializer class to handle the retrieval operation.
- Args:
key: The hash key of the Result to retrieve
- Returns:
Result: The stored Result object
- Raises:
ResultsError: If the key doesn’t exist or if there’s an error retrieving the Result
- get_tabular_data(remove_prefix: bool = False, pretty_labels: dict | None = None) Tuple[List[str], List[List]] [source]
Internal method to get tabular data in a standard format.
- Args:
remove_prefix: Whether to remove the prefix from column names
pretty_labels: Dictionary mapping original column names to pretty labels
- Returns:
Tuple containing (header_row, data_rows)
- get_uuid() str [source]
Get the UUID of this object from the Expected Parrot cloud service based on its hash.
This method calculates the hash of the object and queries the cloud service to find if there’s an uploaded version with the same content. If found, it returns the UUID of that object.
- Returns:
str: The UUID of the object in the cloud service if found
- Raises:
- CoopServerResponseError: If the object is not found or there’s an error
communicating with the server
- ggplot2(ggplot_code: str, shape: str = 'wide', sql: str | None = None, remove_prefix: bool = True, debug: bool = False, height: float = 4, width: float = 6, factor_orders: dict | None = None)[source]
Create visualizations using R’s ggplot2 library.
This method provides a bridge to R’s powerful ggplot2 visualization library, allowing you to create sophisticated plots directly from EDSL data structures.
- Parameters:
ggplot_code: R code string containing ggplot2 commands
shape: Data shape to use ("wide" or "long")
sql: Optional SQL query to transform data before visualization
remove_prefix: Whether to remove prefixes (like "answer.") from column names
debug: Whether to display debugging information
height: Plot height in inches
width: Plot width in inches
factor_orders: Dictionary mapping factor variables to their desired order
- Returns:
A plot object that renders in Jupyter notebooks
- Notes:
Requires R and the ggplot2 package to be installed
Data is automatically converted to a format suitable for ggplot2
The ggplot2 code should reference column names as they appear after any transformations from the shape and remove_prefix parameters
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> # The following would create a plot if R is installed (not shown in doctest):
>>> # r.ggplot2('''
>>> #     ggplot(df, aes(x=how_feeling)) +
>>> #     geom_bar() +
>>> #     labs(title="Distribution of Feelings")
>>> # ''')
- classmethod help()[source]
Display the class documentation string.
This is a convenience method to quickly access the docstring of the class.
- Returns:
None, but prints the class docstring to stdout
- insert_from_shelf() None [source]
Move all shelved results into memory using insert_sorted method.
This method delegates to the ResultsSerializer class to handle the shelf operations. Clears the shelf after successful insertion.
This method preserves the original order of results by using their ‘order’ attribute if available, which ensures consistent ordering even after serialization/deserialization.
- Raises:
ResultsError: If there’s an error accessing or clearing the shelf
- insert_sorted(item: Result) None [source]
Insert a Result object into the Results list while maintaining sort order.
Uses the ‘order’ attribute if present, otherwise falls back to ‘iteration’ attribute. Utilizes bisect for efficient insertion point finding.
- Args:
item: A Result object to insert
- Examples:
>>> r = Results.example()
>>> new_result = r[0].copy()
>>> new_result.order = 1.5  # Insert between items
>>> r.insert_sorted(new_result)
- inspect()[source]
Create an interactive inspector widget for this object.
This method uses the InspectorWidget registry system to find the appropriate inspector widget class for this object’s type and returns an instance of it.
- Returns:
InspectorWidget subclass instance: Interactive widget for inspecting this object
- Raises:
KeyError: If no inspector widget is registered for this object’s class ImportError: If the widgets module cannot be imported
- json()[source]
Get a formatted JSON representation of this object.
- Returns:
DisplayJSON: A displayable JSON representation
- keys()[source]
Get the key names in the object’s dictionary representation.
This method returns all the keys in the serialized form of the object, excluding metadata keys like version information.
- Returns:
list: A list of key names
- classmethod list(visibility: Literal['private', 'public', 'unlisted'] | List[Literal['private', 'public', 'unlisted']] | None = None, job_status: Literal['queued', 'running', 'completed', 'failed', 'cancelled', 'cancelling', 'partial_failed'] | List[Literal['queued', 'running', 'completed', 'failed', 'cancelled', 'cancelling', 'partial_failed']] | None = None, search_query: str | None = None, page: int = 1, page_size: int = 10, sort_ascending: bool = False) CoopObjects [source]
List objects from coop.
Notes:
- The visibility parameter is not supported for remote inference jobs.
- The job_status parameter is not supported for objects.
- search_query only works with the description field.
- If sort_ascending is False, then the most recently created objects are returned first.
- classmethod load(filename)[source]
Load the object from a JSON file (compressed or uncompressed).
This method deserializes an object from a file, automatically detecting whether the file is compressed with gzip or not.
- Args:
filename: Path to the file to load
- Returns:
An instance of the class populated with data from the file
- Raises:
Various exceptions may be raised if the file doesn’t exist or contains invalid data
- make_tabular(remove_prefix: bool, pretty_labels: dict | None = None) tuple[list, List[list]] [source]
Turn the results into a tabular format.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').make_tabular(remove_prefix = True)
(['how_feeling'], [['OK'], ['Great'], ['Terrible'], ['OK']])
>>> r.select('how_feeling').make_tabular(remove_prefix = True, pretty_labels = {'how_feeling': "How are you feeling"})
(['How are you feeling'], [['OK'], ['Great'], ['Terrible'], ['OK']])
- property model_keys: list[str][source]
Return a list of all of the keys that are in the LanguageModel data.
>>> r = Results.example()
>>> r.model_keys
['canned_response', 'inference_service', 'model', 'model_index', 'temperature']
- property models: ModelList[source]
Return a list of all of the models in the Results.
Example:
>>> r = Results.example()
>>> r.models[0]
Model(model_name = ...)
- mutate(new_var_string: str, functions_dict: dict | None = None) Results [source]
Create a new column based on a computational expression.
This method delegates to the ResultsTransformer class to handle the mutation operation.
- Args:
- new_var_string: A string containing an assignment expression in the form
“new_column_name = expression”. The expression can reference any existing column and use standard Python syntax.
- functions_dict: Optional dictionary of custom functions that can be used in
the expression. Keys are function names, values are function objects.
- Returns:
A new Results object with the additional column.
- Examples:
>>> r = Results.example()
>>> # Create a simple derived column
>>> r.mutate('how_feeling_x = how_feeling + "x"').select('how_feeling_x')
Dataset([{'answer.how_feeling_x': ['OKx', 'Greatx', 'Terriblex', 'OKx']}])
>>> # Create a binary indicator column
>>> r.mutate('is_great = 1 if how_feeling == "Great" else 0').select('is_great')
Dataset([{'answer.is_great': [0, 1, 0, 0]}])
>>> # Create a column with custom functions
>>> def sentiment(text):
...     return len(text) > 5
>>> r.mutate('is_long = sentiment(how_feeling)',
...     functions_dict={'sentiment': sentiment}).select('is_long')
Dataset([{'answer.is_long': [False, False, True, False]}])
- num_observations()[source]
Return the number of observations in the dataset.
>>> from edsl.results import Results
>>> Results.example().num_observations()
4
- classmethod old_pull(url_or_uuid: str | UUID | None = None)[source]
Pull the object from coop.
- Args:
url_or_uuid: Either a UUID string or a URL pointing to the object
- static open_compressed_file(filename)[source]
Read and parse a compressed JSON file.
- Args:
filename: Path to a gzipped JSON file
- Returns:
dict: The parsed JSON content
- static open_regular_file(filename)[source]
Read and parse an uncompressed JSON file.
- Args:
filename: Path to a JSON file
- Returns:
dict: The parsed JSON content
- order_by(*columns: str, reverse: bool = False) Results [source]
Sort the results by one or more columns.
This method delegates to the ResultsTransformer class to handle the sorting operation.
- Args:
columns: One or more column names as strings.
reverse: A boolean that determines whether to sort in reverse order.
- Returns:
Results: A new Results object with sorted data.
- Examples:
>>> r = Results.example()
>>> sorted_results = r.order_by('how_feeling')
>>> len(sorted_results) == len(r)
True
- classmethod patch_cls(url_or_uuid: str | UUID, description: str | None = None, value: Any | None = None, visibility: str | None = None)[source]
Patch an uploaded object's attributes (class method version).
- description changes the description of the object on Coop
- value changes the value of the object on Coop; it must be an EDSL object
- visibility changes the visibility of the object on Coop
- print(format='rich')[source]
Print a formatted table representation of this object.
- Args:
format: The output format (currently only ‘rich’ is supported)
- Returns:
None, but prints a formatted table to the console
- print_long()[source]
Print the results in a long format. >>> from edsl.results import Results >>> r = Results.example() >>> r.select(‘how_feeling’).print_long() answer.how_feeling: OK answer.how_feeling: Great answer.how_feeling: Terrible answer.how_feeling: OK
- classmethod pull(url_or_uuid: str | UUID | None = None, expected_parrot_url: str | None = None) dict [source]
Get a signed URL for directly downloading an object from Google Cloud Storage.
This method provides a more efficient way to download objects compared to the old pull() method, especially for large files, by generating a direct signed URL to the storage bucket.
- Args:
- url_or_uuid (Union[str, UUID], optional): Identifier for the object to retrieve.
Can be one of:
- UUID string (e.g., "123e4567-e89b-12d3-a456-426614174000")
- Full URL (e.g., "https://expectedparrot.com/content/123e4567…")
- Alias URL (e.g., "https://expectedparrot.com/content/username/my-survey")
expected_parrot_url (str, optional): Optional custom URL for the coop service
- Returns:
dict: A response containing the signed_url for direct download
- Example:
>>> response = SurveyClass.pull("123e4567-e89b-12d3-a456-426614174000")
>>> response = SurveyClass.pull("https://expectedparrot.com/content/username/my-survey")
>>> print(f"Download URL: {response['signed_url']}")
>>> # Use the signed_url to download the object directly
- push(description: str | None = None, alias: str | None = None, visibility: str | None = 'unlisted', expected_parrot_url: str | None = None) dict [source]
Get a signed URL for directly uploading an object to Google Cloud Storage.
This method provides a more efficient way to upload objects compared to the old push() method, especially for large files, by generating a direct signed URL to the storage bucket.
- Args:
expected_parrot_url (str, optional): Optional custom URL for the coop service
- Returns:
dict: A response containing the signed_url for direct upload and optionally a job_id
- Example:
>>> from edsl.surveys import Survey
>>> survey = Survey(...)
>>> response = survey.push()
>>> print(f"Upload URL: {response['signed_url']}")
>>> # Use the signed_url to upload the object directly
- property question_names: list[str][source]
Return a list of all of the question names.
Example:
>>> r = Results.example()
>>> r.question_names
['how_feeling', 'how_feeling_yesterday']
- relevant_columns(data_type: str | None = None, remove_prefix: bool = False) list [source]
Return the set of keys that are present in the dataset.
- Parameters:
data_type – The data type to filter by.
remove_prefix – Whether to remove the prefix from the column names.
>>> from ..dataset import Dataset
>>> d = Dataset([{'a.b':[1,2,3,4]}])
>>> d.relevant_columns()
['a.b']
>>> d.relevant_columns(remove_prefix=True)
['b']
>>> d = Dataset([{'a':[1,2,3,4]}, {'b':[5,6,7,8]}])
>>> d.relevant_columns()
['a', 'b']
>>> from edsl.results import Results
>>> Results.example().select('how_feeling', 'how_feeling_yesterday').relevant_columns()
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> sorted(Results.example().select().relevant_columns(data_type = "model"))
['model.canned_response', 'model.inference_service', 'model.model', 'model.model_index', 'model.temperature']
>>> # Testing relevant_columns with invalid data_type raises DatasetValueError - tested in unit tests
- remove_prefix()[source]
Returns a new Dataset with the prefix removed from all column names.
The prefix is defined as everything before the first dot (.) in the column name. If removing prefixes would result in duplicate column names, an exception is raised.
- Returns:
Dataset: A new Dataset with prefixes removed from column names
- Raises:
ValueError: If removing prefixes would result in duplicate column names
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling', 'how_feeling_yesterday').relevant_columns()
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> r.select('how_feeling', 'how_feeling_yesterday').remove_prefix().relevant_columns()
['how_feeling', 'how_feeling_yesterday']
>>> from edsl.dataset import Dataset
>>> d = Dataset([{'a.x': [1, 2, 3]}, {'b.x': [4, 5, 6]}])
>>> # d.remove_prefix()
# Testing remove_prefix with duplicate column names raises DatasetValueError - tested in unit tests
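The prefix rule described above (everything before the first dot, with an error when stripping would cause collisions) can be sketched in a few lines. `remove_prefixes` is a hypothetical helper operating on bare column names, not the Dataset method:

```python
def remove_prefixes(columns: list[str]) -> list[str]:
    """Strip the 'type.' prefix from column names, refusing on collisions."""
    # The prefix is everything before the first dot; names without a dot pass through.
    stripped = [c.split(".", 1)[1] if "." in c else c for c in columns]
    # Refuse to strip if two columns would collide on the same short name.
    duplicates = {c for c in stripped if stripped.count(c) > 1}
    if duplicates:
        raise ValueError(f"Removing prefixes would duplicate: {sorted(duplicates)}")
    return stripped
```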
- rename(old_name: str, new_name: str) Results [source]
Rename an answer column in a Results object.
This method delegates to the ResultsTransformer class to handle the renaming operation.
- Args:
old_name: The current name of the column to rename
new_name: The new name for the column
- Returns:
Results: A new Results object with the column renamed
- Examples:
>>> s = Results.example()
>>> s.rename('how_feeling', 'how_feeling_new').select('how_feeling_new')
Dataset([{'answer.how_feeling_new': ['OK', 'Great', 'Terrible', 'OK']}])
- report(*fields: str | None, top_n: int | None = None, header_fields: List[str] | None = None, divider: bool = True, return_string: bool = False, format: str = 'markdown', filename: str | None = None) str | Document | None [source]
Generates a report of the results by iterating through rows.
- Args:
*fields: The fields to include in the report. If none provided, all fields are used.
top_n: Optional limit on the number of observations to include.
header_fields: Optional list of fields to include in the main header instead of as sections.
divider: If True, adds a horizontal rule between observations (markdown only).
return_string: If True, returns the markdown string. If False (default in notebooks), only displays the markdown without returning.
format: Output format - either "markdown" or "docx".
filename: If provided and format is "docx", saves the document to this file.
- Returns:
Depending on format and return_string:
- For markdown: A string if return_string is True, otherwise None (displays in notebook)
- For docx: A docx.Document object, or None if filename is provided (saves to file)
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> report = r.select('how_feeling').report(return_string=True)
>>> "# Observation: 1" in report
True
>>> doc = r.select('how_feeling').report(format="docx")
>>> isinstance(doc, object)
True
- report_from_template(template: str, *fields: str | None, top_n: int | None = None, remove_prefix: bool = True, return_string: bool = False, format: str = 'text', filename: str | None = None, separator: str = '\n\n', observation_title_template: str | None = None, explode: bool = False, filestore: bool = False) str | Document | List | FileStore | None [source]
Generates a report using a Jinja2 template for each row in the dataset.
This method renders a user-provided Jinja2 template for each observation in the dataset, with template variables populated from the row data. This allows for completely customized report formatting using pandoc for advanced output formats.
- Args:
template: Jinja2 template string to render for each row
*fields: The fields to include in template context. If none provided, all fields are used.
top_n: Optional limit on the number of observations to include.
remove_prefix: Whether to remove type prefixes (e.g., "answer.") from field names in template context.
return_string: If True, returns the rendered content. If False (default in notebooks), only displays the content without returning.
format: Output format - one of "text", "html", "pdf", or "docx". Formats other than "text" require pandoc.
filename: If provided, saves the rendered content to this file. For exploded output, this becomes a template (e.g., "report_{index}.html").
separator: String to use between rendered templates for each row (ignored when explode=True).
observation_title_template: Optional Jinja2 template for observation titles. Defaults to "Observation {index}" where index is 1-based. Template has access to all row data plus 'index' and 'index0' variables.
explode: If True, creates separate files for each observation instead of one combined file.
filestore: If True, wraps the generated file(s) in FileStore object(s). If no filename is provided, creates temporary files. For exploded output, returns a list of FileStore objects.
- Returns:
Depending on explode, format, return_string, and filestore:
- For text format: String content or None (if displayed in notebook)
- For html format: HTML string content or None (if displayed in notebook)
- For docx format: Document object or None (if saved to file)
- For pdf format: PDF bytes or None (if saved to file)
- If explode=True: List of created filenames (when filename provided) or list of documents/content
- If filestore=True: FileStore object(s) containing the generated file(s)
- Notes:
Pandoc is required for HTML, PDF, and DOCX output formats
Templates are treated as Markdown for all non-text formats
PDF output uses XeLaTeX engine through pandoc
HTML output includes standalone document structure
- Examples:
>>> from edsl.results import Results
>>> r = Results.example()
>>> template = "Person feels: {{ how_feeling }}"
>>> report = r.select('how_feeling').report_from_template(template, return_string=True)
>>> "Person feels: OK" in report
True
>>> "Person feels: Great" in report
True
# Custom observation titles
>>> custom_title = "Response {{ index }}: {{ how_feeling }}"
>>> report = r.select('how_feeling').report_from_template(
...     template, observation_title_template=custom_title, return_string=True)
>>> "Response 1: OK" in report
True
# HTML output (requires pandoc)
>>> html_report = r.select('how_feeling').report_from_template(
...     template, format="html", return_string=True)  # doctest: +SKIP
>>> # Creates HTML with proper document structure
# PDF output (requires pandoc with XeLaTeX)
>>> pdf_report = r.select('how_feeling').report_from_template(
...     template, format="pdf")  # doctest: +SKIP
>>> # Returns PDF bytes
# Basic template functionality
>>> template2 = "Feeling: {{ how_feeling }}, Index: {{ index }}"
>>> report2 = r.select('how_feeling').report_from_template(
...     template2, return_string=True, top_n=2)
>>> "Feeling: OK, Index: 1" in report2
True
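The per-row rendering loop can be illustrated with the stdlib's `string.Template` instead of Jinja2 (the real method uses Jinja2 syntax, so the `$name` placeholders here are only a stand-in for `{{ name }}`):

```python
from string import Template

def render_rows(template: str, rows: list[dict], separator: str = "\n\n") -> str:
    """Render one template per observation, joined by `separator`."""
    parts = []
    for i, row in enumerate(rows, start=1):
        # Expose a 1-based `index` alongside the row's own fields,
        # mirroring the context variables described above.
        parts.append(Template(template).substitute(index=i, **row))
    return separator.join(parts)
```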
- sample(n: int | None = None, frac: float | None = None, with_replacement: bool = True, seed: str | None = None) Results [source]
Return a random sample of the results.
- Args:
n: The number of samples to take.
frac: The fraction of samples to take (alternative to n).
with_replacement: Whether to sample with replacement.
seed: Random seed for reproducibility.
- Returns:
Results: A new Results object containing the sampled data.
- save(filename: str | None = None, compress: bool = True)[source]
Save the object to a file as JSON with optional compression.
Serializes the object to JSON and writes it to the specified file. By default, the file will be compressed using gzip. File extensions are handled automatically.
- Args:
filename: Path where the file should be saved
compress: If True, compress the file using gzip (default: True)
- Returns:
None
- Examples:
>>> obj.save("my_object.json.gz")  # Compressed
>>> obj.save("my_object.json", compress=False)  # Uncompressed
- property scenario_keys: list[str][source]
Return a list of all of the keys that are in the Scenario data.
>>> r = Results.example()
>>> r.scenario_keys
['period', 'scenario_index']
- property scenarios: ScenarioList[source]
Return a list of all of the scenarios in the Results.
Example:
>>> r = Results.example()
>>> r.scenarios
ScenarioList([Scenario({'period': 'morning'}), Scenario({'period': 'afternoon'}), Scenario({'period': 'morning'}), Scenario({'period': 'afternoon'})])
- score(f: Callable) list [source]
Score the results using a function.
This method delegates to the ResultsScorer class to handle the scoring operation.
- Args:
f: A function that takes values from a Result object and returns a score.
- Returns:
list: A list of scores, one for each Result object.
- Examples:
>>> r = Results.example()
>>> def f(status): return 1 if status == 'Joyful' else 0
>>> r.score(f)
[1, 1, 0, 0]
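Judging from the doctest above, the scoring function's parameter names appear to be matched against field names in each result. A sketch of that dispatch, under that assumption, for plain row dictionaries:

```python
import inspect

def score_rows(rows: list[dict], f) -> list:
    """Call f once per row, binding its parameter names to matching fields."""
    # Read the function's parameter names and pull the same-named
    # fields from each row (an assumed dispatch mechanism, inferred
    # from the doctest, not the actual ResultsScorer code).
    params = list(inspect.signature(f).parameters)
    return [f(**{p: row[p] for p in params}) for row in rows]
```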
- score_with_answer_key(answer_key: dict) list [source]
Score the results using an answer key.
This method delegates to the ResultsScorer class to handle the scoring operation.
- Args:
answer_key: A dictionary that maps answer values to scores.
- Returns:
list: A list of scores, one for each Result object.
- select(*columns: str | list[str]) Dataset [source]
Extract specific columns from the Results into a Dataset.
This method allows you to select specific columns from the Results object and transforms the data into a Dataset for further analysis and visualization. A Dataset is a more general-purpose data structure optimized for analysis operations rather than the hierarchical structure of Result objects.
- Args:
- *columns: Column names to select. Each column can be:
A simple attribute name (e.g., “how_feeling”)
A fully qualified name with type (e.g., “answer.how_feeling”)
A wildcard pattern (e.g., “answer.*” to select all answer fields)
If no columns are provided, selects all data.
- Returns:
A Dataset object containing the selected data.
- Notes:
Column names are automatically disambiguated if needed
When column names are ambiguous, specify the full path with data type
You can use wildcard patterns with “*” to select multiple related fields
Selecting with no arguments returns all data
Results are restructured in a columnar format in the Dataset
- Examples:
>>> results = Results.example()
>>> # Select a single column by name
>>> results.select('how_feeling')
Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK']}])
>>> # Select multiple columns
>>> ds = results.select('how_feeling', 'how_feeling_yesterday')
>>> sorted([list(d.keys())[0] for d in ds])
['answer.how_feeling', 'answer.how_feeling_yesterday']
>>> # Using fully qualified names with data type
>>> results.select('answer.how_feeling')
Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK']}])
>>> # Using partial matching for column names
>>> results.select('answer.how_feeling_y')
Dataset([{'answer.how_feeling_yesterday': ['Great', 'Good', 'OK', 'Terrible']}])
>>> # Select all columns (same as calling select with no arguments)
>>> results.select('*.*')
Dataset([...])
- property shelf_keys: set[source]
Return a copy of the set of shelved result keys.
This property delegates to the ResultsSerializer class.
- shelve_result(result: Result) str [source]
Store a Result object in persistent storage using its hash as the key.
This method delegates to the ResultsSerializer class to handle the shelving operation.
- Args:
result: A Result object to store
- Returns:
str: The hash key for retrieving the result later
- Raises:
ResultsError: If there’s an error storing the Result
- show_methods(show_docstrings=True)[source]
Display all public methods available on this object.
This utility method helps explore the capabilities of an object by listing all its public methods and optionally their documentation.
- Args:
- show_docstrings: If True, print method names with docstrings;
if False, return the list of method names
- Returns:
- None or list: If show_docstrings is True, prints methods and returns None.
If show_docstrings is False, returns a list of method names.
- shuffle(seed: str | None = 'edsl') Results [source]
Return a shuffled copy of the results using the Fisher-Yates algorithm.
- Args:
seed: Random seed for reproducibility.
- Returns:
Results: A new Results object with shuffled data.
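The seeded Fisher-Yates pass over a copy, as described above, can be sketched as follows (a minimal illustration, not the Results implementation; note the doc's default seed of "edsl"):

```python
import random

def shuffled_copy(items: list, seed: str = "edsl") -> list:
    """Fisher-Yates shuffle on a copy, seeded for reproducibility."""
    rng = random.Random(seed)
    out = list(items)  # the original list is left untouched
    # Walk from the end, swapping each element with a random earlier one.
    for i in range(len(out) - 1, 0, -1):
        j = rng.randint(0, i)
        out[i], out[j] = out[j], out[i]
    return out
```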
- sort_by(*columns: str, reverse: bool = False) Results [source]
Sort the results by one or more columns.
This method delegates to the ResultsTransformer class to handle the sorting operation.
- Args:
columns: One or more column names as strings.
reverse: A boolean that determines whether to sort in reverse order.
- Returns:
Results: A new Results object with sorted data.
- Examples:
>>> r = Results.example()
>>> sorted_results = r.sort_by('how_feeling')
>>> len(sorted_results) == len(r)
True
- spot_issues(models: ModelList | None = None) Results [source]
Run a survey to spot issues and suggest improvements for prompts that had no model response.
This method delegates to the ResultsAnalyzer class to handle the analysis and debugging.
- Args:
models: Optional ModelList to use for the analysis. If None, uses the default model.
- Returns:
Results: A new Results object containing the analysis and suggestions for improvement.
- Notes:
Future version: Allow user to optionally pass a list of questions to review, regardless of whether they had a null model response.
- sql(query: str, transpose: bool = None, transpose_by: str = None, remove_prefix: bool = True, shape: str = 'wide') Dataset [source]
Execute SQL queries on the dataset.
This powerful method allows you to use SQL to query and transform your data, combining the expressiveness of SQL with EDSL’s data structures. It works by creating an in-memory SQLite database from your data and executing the query against it.
- Parameters:
query: SQL query string to execute
transpose: Whether to transpose the resulting table (rows become columns)
transpose_by: Column to use as the new index when transposing
remove_prefix: Whether to remove type prefixes (e.g., "answer.") from column names
shape: Data shape to use ("wide" or "long")
“wide”: Default tabular format with columns for each field
“long”: Melted format with key-value pairs, useful for certain queries
- Returns:
A Dataset object containing the query results
- Notes:
The data is stored in a table named “self” in the SQLite database
In wide format, column names include their type prefix unless remove_prefix=True
In long format, the data is melted into columns: row_number, key, value, data_type
Complex objects like lists and dictionaries are converted to strings
- Examples:
>>> from edsl import Results >>> r = Results.example()
# Basic selection
>>> len(r.sql("SELECT * FROM self", shape="wide"))
4
# Filtering with WHERE clause
>>> r.sql("SELECT * FROM self WHERE how_feeling = 'Great'").num_observations()
1
# Aggregation
>>> r.sql("SELECT how_feeling, COUNT(*) as count FROM self GROUP BY how_feeling").keys()
['how_feeling', 'count']
# Using long format
>>> len(r.sql("SELECT * FROM self", shape="long"))
200
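The mechanism described in the Notes (load the data into an in-memory SQLite table named "self", then execute the query against it) can be sketched with the stdlib's sqlite3; `query_self` is a hypothetical helper over columnar data, not the Dataset method:

```python
import sqlite3

def query_self(columns: dict[str, list], query: str) -> list[tuple]:
    """Build an in-memory table named "self" from columns, then run the query."""
    con = sqlite3.connect(":memory:")
    names = list(columns)
    quoted = ", ".join(f'"{n}"' for n in names)
    con.execute(f'CREATE TABLE "self" ({quoted})')
    # Turn columnar data into rows and bulk-insert them.
    rows = list(zip(*(columns[n] for n in names)))
    placeholders = ", ".join("?" for _ in names)
    con.executemany(f'INSERT INTO "self" VALUES ({placeholders})', rows)
    return con.execute(query).fetchall()
```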
- store(d: dict, key_name: str | None = None)[source]
Store this object in a dictionary with an optional key.
- Args:
d: The dictionary in which to store the object
key_name: Optional key to use (defaults to the length of the dictionary)
- Returns:
None
- table(*fields, tablefmt: str | None = 'rich', pretty_labels: dict | None = None, print_parameters: dict | None = None)[source]
- tally(*fields: str | None, top_n: int | None = None, output='Dataset') dict | Dataset [source]
Count frequency distributions of values in specified fields.
This method tallies the occurrence of unique values within one or more fields, similar to a GROUP BY and COUNT in SQL. When multiple fields are provided, it performs cross-tabulation across those fields.
- Parameters:
*fields: Field names to tally. If none provided, uses all available fields.
top_n: Optional limit to return only the top N most frequent values.
output: Format for results, either "Dataset" (recommended) or "dict".
- Returns:
By default, returns a Dataset with columns for the field(s) and a 'count' column. If output="dict", returns a dictionary mapping values to counts.
- Notes:
For single fields, returns counts of each unique value
For multiple fields, returns counts of each unique combination of values
Results are sorted in descending order by count
Fields can be specified with or without their type prefix
- Examples:
>>> from edsl import Results
>>> r = Results.example()

# Single field frequency count
>>> r.select('how_feeling').tally('answer.how_feeling', output="dict")
{'OK': 2, 'Great': 1, 'Terrible': 1}

# Return as Dataset (default)
>>> from edsl.dataset import Dataset
>>> expected = Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible']}, {'count': [2, 1, 1]}])
>>> r.select('how_feeling').tally('answer.how_feeling', output="Dataset") == expected
True

# Multi-field cross-tabulation - exact output varies based on data
>>> result = r.tally('how_feeling', 'how_feeling_yesterday')
>>> 'how_feeling' in result.keys() and 'how_feeling_yesterday' in result.keys() and 'count' in result.keys()
True
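The single-field case behaves like a GROUP BY/COUNT sorted in descending order, which can be sketched with the standard library's Counter (using the same example values as above):

```python
from collections import Counter

# Pure-Python sketch of single-field tallying: counts of unique values,
# sorted in descending order by count.
values = ["OK", "Great", "Terrible", "OK"]
tally = dict(Counter(values).most_common())
print(tally)  # {'OK': 2, 'Great': 1, 'Terrible': 1}
```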
- to_agent_list(remove_prefix: bool = True)[source]
Convert the results to an AgentList, with one Agent per result.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_agent_list()
AgentList([Agent(traits = {'how_feeling': 'OK'}), Agent(traits = {'how_feeling': 'Great'}), Agent(traits = {'how_feeling': 'Terrible'}), Agent(traits = {'how_feeling': 'OK'})])
- to_csv(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None) FileStore [source]
Export the results to a FileStore instance containing CSV data.
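The effect of remove_prefix on the exported header can be illustrated with a pure-Python sketch (the rows below are sample data, not the real export path):

```python
import csv
import io

# Hypothetical sketch of how remove_prefix affects CSV headers: the type
# prefix (e.g. "answer.") is dropped from each column name before writing.
rows = [{"answer.how_feeling": "OK"}, {"answer.how_feeling": "Great"}]
remove_prefix = True

fieldnames = list(rows[0])
header = [f.split(".", 1)[-1] if remove_prefix else f for f in fieldnames]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
for row in rows:
    writer.writerow(row[f] for f in fieldnames)

print(buf.getvalue().splitlines()[0])  # how_feeling
```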
- to_dataset() Dataset [source]
Convert this object to a Dataset for advanced data operations.
- Returns:
Dataset: A Dataset object containing this object's data
- to_dict(sort: bool = False, add_edsl_version: bool = True, include_cache: bool = True, include_task_history: bool = False, include_cache_info: bool = True, offload_scenarios: bool = True, full_dict: bool = False) dict[str, Any] [source]
Convert the Results object to a dictionary representation.
This method delegates to the ResultsSerializer class to handle the conversion of the Results object to a dictionary format suitable for serialization.
- Args:
sort: Whether to sort the results data by hash before serialization
add_edsl_version: Whether to include the EDSL version in the output
include_cache: Whether to include cache data in the output
include_task_history: Whether to include task history in the output
include_cache_info: Whether to include cache information in result data
offload_scenarios: Whether to optimize scenarios before serialization
- Returns:
dict[str, Any]: Dictionary representation of the Results object
- to_dicts(remove_prefix: bool = True) list[dict] [source]
Convert the results to a list of dictionaries.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_dicts()
[{'how_feeling': 'OK'}, {'how_feeling': 'Great'}, {'how_feeling': 'Terrible'}, {'how_feeling': 'OK'}]
- to_disk(filepath: str) None [source]
Serialize the Results object to a zip file, preserving the SQLite database.
This method delegates to the ResultsSerializer class to handle the disk serialization.
This method creates a zip file containing:
1. The SQLite database file from the data container
2. A metadata.json file with the survey, created_columns, and other non-data info
3. The cache data if present
- Args:
filepath: Path where the zip file should be saved
- Raises:
ResultsError: If there's an error during serialization
- to_docx(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None) FileStore [source]
Export the results to a FileStore instance containing DOCX data.
Each row of the dataset will be rendered on its own page, with a 2-column table that lists the keys and associated values for that observation.
- to_excel(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, sheet_name: str | None = None)[source]
Export the results to a FileStore instance containing Excel data.
- to_json()[source]
Serialize this object to a JSON string.
- Returns:
str: A JSON string representation of the object
- to_jsonl(filename: str | None = None)[source]
Export the results to a FileStore instance containing JSONL data.
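JSON Lines is simply one JSON object per line. A minimal stdlib sketch of the layout (the rows here are sample data, not produced by to_jsonl):

```python
import json

# Minimal sketch of the JSON Lines layout: one JSON object per line.
rows = [{"how_feeling": "OK"}, {"how_feeling": "Great"}]
jsonl = "\n".join(json.dumps(row) for row in rows)

print(jsonl)
# {"how_feeling": "OK"}
# {"how_feeling": "Great"}
```

This format streams well: each line can be parsed independently with json.loads.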
- to_list(flatten=False, remove_none=False, unzipped=False) list[list] [source]
Convert the results to a list of values; when multiple fields are selected, each entry is a tuple of values.
- Parameters:
flatten – Whether to flatten the list of lists.
remove_none – Whether to remove None values from the list.
>>> from edsl.results import Results
>>> Results.example().select('how_feeling', 'how_feeling_yesterday')
Dataset([{'answer.how_feeling': ['OK', 'Great', 'Terrible', 'OK']}, {'answer.how_feeling_yesterday': ['Great', 'Good', 'OK', 'Terrible']}])

>>> Results.example().select('how_feeling', 'how_feeling_yesterday').to_list()
[('OK', 'Great'), ('Great', 'Good'), ('Terrible', 'OK'), ('OK', 'Terrible')]

>>> r = Results.example()
>>> r.select('how_feeling').to_list()
['OK', 'Great', 'Terrible', 'OK']

>>> from edsl.dataset import Dataset
>>> Dataset([{'a.b': [[1, 9], 2, 3, 4]}]).select('a.b').to_list(flatten = True)
[1, 9, 2, 3, 4]

>>> # Calling to_list(flatten=True) with multiple columns raises DatasetValueError (covered in unit tests)
- to_pandas(remove_prefix: bool = False, lists_as_strings=False)[source]
Convert the results to a pandas DataFrame, ensuring that lists remain as lists.
- Args:
remove_prefix: Whether to remove the prefix from the column names.
lists_as_strings: Whether to convert lists to strings.
- Returns:
A pandas DataFrame.
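The "lists remain as lists" point can be illustrated directly in pandas; the column name and data below are hypothetical, and the astype(str) line is a rough stand-in for what lists_as_strings=True would do:

```python
import pandas as pd

# Hypothetical sketch: DataFrame cells can hold Python lists as-is.
data = {"answer.items": [["a", "b"], ["c"]]}
df = pd.DataFrame(data)
assert isinstance(df["answer.items"].iloc[0], list)  # still a list, not a string

# Rough equivalent of lists_as_strings=True: render each list as its repr.
df_str = df.assign(**{"answer.items": df["answer.items"].astype(str)})
print(df_str["answer.items"].iloc[0])  # ['a', 'b']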
- to_polars(remove_prefix: bool = False, lists_as_strings=False)[source]
Convert the results to a Polars DataFrame.
- Args:
remove_prefix: Whether to remove the prefix from the column names.
lists_as_strings: Whether to convert lists to strings.
- Returns:
A Polars DataFrame.
- to_scenario_list(remove_prefix: bool = True) list[dict] [source]
Convert the results to a ScenarioList, with one Scenario per result.
- Parameters:
remove_prefix – Whether to remove the prefix from the column names.
>>> from edsl.results import Results
>>> r = Results.example()
>>> r.select('how_feeling').to_scenario_list()
ScenarioList([Scenario({'how_feeling': 'OK'}), Scenario({'how_feeling': 'Great'}), Scenario({'how_feeling': 'Terrible'}), Scenario({'how_feeling': 'OK'})])
- to_sqlite(filename: str | None = None, remove_prefix: bool = False, pretty_labels: dict | None = None, table_name: str = 'results', if_exists: str = 'replace')[source]
Export the results to a SQLite database file.
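A minimal stdlib sketch of the export semantics: rows go into a table (named "results" by default), and if_exists='replace' drops any existing table of that name first. The schema and rows below are hypothetical.

```python
import sqlite3

# Minimal sketch: write rows into a "results" table, replacing any
# existing table (the if_exists='replace' case).
rows = [("OK",), ("Great",)]

conn = sqlite3.connect(":memory:")  # a filename would be used in practice
conn.execute("DROP TABLE IF EXISTS results")
conn.execute("CREATE TABLE results (how_feeling TEXT)")
conn.executemany("INSERT INTO results VALUES (?)", rows)

count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
print(count)  # 2
```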
- to_yaml(add_edsl_version=False, filename: str = None) str | None [source]
Convert the object to YAML format.
Serializes the object to YAML format and optionally writes it to a file.
- Args:
add_edsl_version: Whether to include EDSL version information
filename: If provided, write the YAML to this file path
- Returns:
str: The YAML string representation if no filename is provided
None: If written to file
- tree(node_order: List[str] | None = None)[source]
Convert the results to a Tree.
- Args:
node_order: The order of the nodes.
- Returns:
A Tree object.
- unpack_list(field: str, new_names: List[str] | None = None, keep_original: bool = True) Dataset [source]
Unpack list columns into separate columns with provided names or numeric suffixes.
For example, if a dataset contains: [{'data': [[1, 2, 3], [4, 5, 6]], 'other': ['x', 'y']}]
After d.unpack_list('data'), it should become: [{'other': ['x', 'y'], 'data_1': [1, 4], 'data_2': [2, 5], 'data_3': [3, 6]}]
- Args:
field: The field containing lists to unpack
new_names: Optional list of names for the unpacked fields. If None, uses numeric suffixes.
keep_original: If True, keeps the original field in the dataset
- Returns:
A new Dataset with unpacked columns
- Examples:
>>> from edsl.dataset import Dataset
>>> d = Dataset([{'data': [[1, 2, 3], [4, 5, 6]]}])
>>> d.unpack_list('data')
Dataset([{'data': [[1, 2, 3], [4, 5, 6]]}, {'data_1': [1, 4]}, {'data_2': [2, 5]}, {'data_3': [3, 6]}])

>>> d.unpack_list('data', new_names=['first', 'second', 'third'])
Dataset([{'data': [[1, 2, 3], [4, 5, 6]]}, {'first': [1, 4]}, {'second': [2, 5]}, {'third': [3, 6]}])