Results

A Results object represents the outcome of running a Survey. It contains a list of individual Result objects, where each Result corresponds to a response to the survey for a unique combination of Agent, Model, and Scenario objects used with the survey.

For example, if a survey (of one or more questions) is administered to 2 agents and 2 language models (without any scenarios for the questions), the Results will contain 4 Result objects: one for each combination of agent and model used with the survey. If the survey questions are parameterized with 2 scenarios, the Results will expand to include 8 Result objects, accounting for all combinations of agents, models and scenarios.

Generating results

A Results object is not typically instantiated directly, but is returned by calling the run() method of a Survey after any agents, language models and scenarios are added to it.

In order to demonstrate how to access and interact with results, we use the following code to generate results for a simple survey. Note that specifying agent traits, scenarios (question parameter values) and language models is optional; we include those steps here for illustration purposes. See the Agents, Scenarios and Language Models sections for more details on these components.

Note: You must store API keys for language models in order to generate results. Please see the Managing Keys section for instructions on activating Remote Inference or storing your own API keys for inference service providers.

To construct a survey we start by creating questions:

from edsl import QuestionLinearScale, QuestionMultipleChoice

q1 = QuestionLinearScale(
  question_name = "important",
  question_text = "On a scale from 1 to 5, how important to you is {{ topic }}?",
  question_options = [0, 1, 2, 3, 4, 5],
  option_labels = {0:"Not at all important", 5:"Very important"}
)

q2 = QuestionMultipleChoice(
  question_name = "read",
  question_text = "Have you read any books about {{ topic }}?",
  question_options = ["Yes", "No", "I do not know"]
)

We combine them in a survey to administer them together:

from edsl import Survey

survey = Survey([q1, q2])

We have parameterized our questions, so we can use them with different scenarios:

from edsl import ScenarioList

scenarios = ScenarioList.from_list("topic", ["climate change", "house prices"])

We can optionally create agents with personas or other relevant traits to answer the survey:

from edsl import AgentList, Agent

agents = AgentList(
  Agent(traits = {"persona": p}) for p in ["student", "celebrity"]
)

We can specify the language models that we want to use to generate responses:

from edsl import ModelList, Model

models = ModelList(
  Model(m) for m in ["gemini-1.5-flash", "gpt-4o"]
)

Finally, we generate results by adding the scenarios, agents and models to the survey and calling the run() method:

results = survey.by(scenarios).by(agents).by(models).run()

For more details on each of the above steps, please see the Agents, Scenarios and Language Models sections of the docs.

Result objects

We can check the number of Result objects created by inspecting the length of the Results:

len(results)

This will count 2 (scenarios) x 2 (agents) x 2 (models) = 8 Result objects:

8

Generating multiple results

If we want to generate multiple results for a survey (i.e., more than 1 result for each combination of Agent, Model and Scenario objects used), we can pass the desired number of iterations when calling the run() method. For example, the following code will generate 3 results for our survey (n=3):

results = survey.by(scenarios).by(agents).by(models).run(n=3)

We can verify that the number of Result objects created is now 24 = 3 iterations x 2 scenarios x 2 agents x 2 models:

len(results)
24

We can readily inspect a result:

results[0]

Output:

key                                              value
agent:traits                                     {'persona': 'student'}
scenario:topic                                   climate change
model:model                                      gemini-1.5-flash
model:parameters                                 {'temperature': 0.5, 'topP': 1, 'topK': 1, 'maxOutputTokens': 2048, 'stopSequences': []}
iteration                                        0
answer:important                                 5
answer:read                                      Yes
prompt:important_user_prompt                     {'text': 'On a scale from 1 to 5, how important to you is climate change?\n\n0 : Not at all important\n\n1 : \n\n2 : \n\n3 : \n\n4 : \n\n5 : Very important\n\nOnly 1 option may be selected.\n\nRespond only with the code corresponding to one of the options. E.g., "1" or "5" by itself.\n\nAfter the answer, you can put a comment explaining why you chose that option on the next line.', 'class_name': 'Prompt'}
prompt:important_system_prompt                   {'text': "You are answering questions as if you were a human. Do not break character. Your traits: {'persona': 'student'}", 'class_name': 'Prompt'}
prompt:read_user_prompt                          {'text': '\nHave you read any books about climate change?\n\n \nYes\n \nNo\n \nI do not know\n \n\nOnly 1 option may be selected.\n\nRespond only with a string corresponding to one of the options.\n\n\nAfter the answer, you can put a comment explaining why you chose that option on the next line.', 'class_name': 'Prompt'}
prompt:read_system_prompt                        {'text': "You are answering questions as if you were a human. Do not break character. Your traits: {'persona': 'student'}", 'class_name': 'Prompt'}
raw_model_response:important_raw_model_response  {'candidates': [{'content': {'parts': [{'text': "5\n\nIt's, like, a huge deal. The future of the planet is at stake, you know? We're talking about everything from extreme weather to rising sea levels – it affects everyone, and it's something we all need to be seriously concerned about.\n"}], 'role': 'model'}, 'finish_reason': 1, 'safety_ratings': [{'category': 8, 'probability': 1, 'blocked': False}, {'category': 10, 'probability': 1, 'blocked': False}, {'category': 7, 'probability': 1, 'blocked': False}, {'category': 9, 'probability': 1, 'blocked': False}], 'avg_logprobs': -0.19062816490561274, 'token_count': 0, 'grounding_attributions': []}], 'usage_metadata': {'prompt_token_count': 129, 'candidates_token_count': 59, 'total_token_count': 188, 'cached_content_token_count': 0}}
raw_model_response:important_cost                0.000027
raw_model_response:important_one_usd_buys        36529.685735
raw_model_response:read_raw_model_response       {'candidates': [{'content': {'parts': [{'text': "Yes\n\nI've read a few articles and some chapters from textbooks for my environmental science class, which touched upon climate change. It's not exactly the same as reading a whole book dedicated to the topic, but it counts, right?\n"}], 'role': 'model'}, 'finish_reason': 1, 'safety_ratings': [{'category': 8, 'probability': 1, 'blocked': False}, {'category': 10, 'probability': 1, 'blocked': False}, {'category': 7, 'probability': 1, 'blocked': False}, {'category': 9, 'probability': 1, 'blocked': False}], 'avg_logprobs': -0.13118227790383732, 'token_count': 0, 'grounding_attributions': []}], 'usage_metadata': {'prompt_token_count': 96, 'candidates_token_count': 51, 'total_token_count': 147, 'cached_content_token_count': 0}}
raw_model_response:read_cost                     0.000022
raw_model_response:read_one_usd_buys             44444.451200
question_to_attributes:important                 {'question_text': 'On a scale from 1 to 5, how important to you is {{ topic }}?', 'question_type': 'linear_scale', 'question_options': [0, 1, 2, 3, 4, 5]}
question_to_attributes:read                      {'question_text': 'Have you read any books about {{ topic }}?', 'question_type': 'multiple_choice', 'question_options': ['Yes', 'No', 'I do not know']}
generated_tokens:important_generated_tokens      5 It's, like, a huge deal. The future of the planet is at stake, you know? We're talking about everything from extreme weather to rising sea levels – it affects everyone, and it's something we all need to be seriously concerned about.
generated_tokens:read_generated_tokens           Yes I've read a few articles and some chapters from textbooks for my environmental science class, which touched upon climate change. It's not exactly the same as reading a whole book dedicated to the topic, but it counts, right?
comments_dict:important_comment                  It's, like, a huge deal. The future of the planet is at stake, you know? We're talking about everything from extreme weather to rising sea levels – it affects everyone, and it's something we all need to be seriously concerned about.
comments_dict:read_comment                       I've read a few articles and some chapters from textbooks for my environmental science class, which touched upon climate change. It's not exactly the same as reading a whole book dedicated to the topic, but it counts, right?
cache_keys:important                             98d6961d0529335b74f2363ba9b7a8de
cache_keys:read                                  12af825953d89c1f776bd3af40e37cfb

Results components

Results contain components that can be accessed and analyzed individually or collectively. We can see a list of these components by calling the columns method:

results.columns

The following list will be returned for the results generated by the above code (each column name appears in the component descriptions below):

['agent.agent_name',
 'agent.instruction',
 'agent.persona',
 'answer.important',
 'answer.read',
 'cache_keys.important_cache_key',
 'cache_keys.important_cache_used',
 'cache_keys.read_cache_key',
 'cache_keys.read_cache_used',
 'comment.important_comment',
 'comment.read_comment',
 'generated_tokens.important_generated_tokens',
 'generated_tokens.read_generated_tokens',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.maxOutputTokens',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.stopSequences',
 'model.temperature',
 'model.topK',
 'model.topP',
 'model.top_logprobs',
 'model.top_p',
 'model.use_cache',
 'prompt.important_system_prompt',
 'prompt.important_user_prompt',
 'prompt.read_system_prompt',
 'prompt.read_user_prompt',
 'question_options.important_question_options',
 'question_options.read_question_options',
 'question_text.important_question_text',
 'question_text.read_question_text',
 'question_type.important_question_type',
 'question_type.read_question_type',
 'raw_model_response.important_cost',
 'raw_model_response.important_one_usd_buys',
 'raw_model_response.important_raw_model_response',
 'raw_model_response.read_cost',
 'raw_model_response.read_one_usd_buys',
 'raw_model_response.read_raw_model_response',
 'scenario.scenario_index',
 'scenario.topic']

The columns include information about each agent, model and scenario used with the survey, the prompts used to simulate an answer to each question, and each raw model response. If the survey was run multiple times (run(n=<integer>)), the iteration.iteration column will show the iteration number for each result.

Agent information:

  • agent.instruction: The instruction for the agent. This field is the optional instruction that was passed to the agent when it was created.

  • agent.agent_name: This field is always included in any Results object. It contains a unique identifier for each Agent that can be specified when an agent is created (Agent(name=<name>, traits={<traits_dict>})). If not specified, it is added automatically when results are generated (in the form Agent_0, etc.).

  • agent.persona: Each of the traits that we pass to an agent is represented in a column of the results. Our example code created a “persona” trait for each agent, so our results include a “persona” column for this information. Note that the keys of the traits dictionary must be valid Python identifiers.
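
For example, we can display just the agent columns of our results with the select() method described below (a minimal sketch using the column names listed above):

results.select("agent.agent_name", "agent.instruction", "agent.persona")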

Answer information:

  • answer.important: Agent responses to the linear scale important question.

  • answer.read: Agent responses to the multiple choice read question.

Cache information:

  • cache_keys.important_cache_key: The cache key for the important question.

  • cache_keys.important_cache_used: Whether the existing cache was used for the important question.

  • cache_keys.read_cache_key: The cache key for the read question.

  • cache_keys.read_cache_used: Whether the existing cache was used for the read question.
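
For example, a minimal sketch of checking whether cached responses were reused for the important question, using the cache columns listed above:

results.select("cache_keys.important_cache_key", "cache_keys.important_cache_used")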

Comment information:

A “comment” field is automatically included for every question in a survey other than free text questions, allowing the model to provide additional information about its response. The default instruction for the agent to provide a comment is included in the user_prompt for a question, and can be modified or omitted when creating the question. (See the Prompts section for details on modifying user and system prompts, and information about prompts in results below.) Comments can also be excluded by passing include_comment=False to a question when creating it (see the sketch after the following list).

  • comment.important_comment: Agent commentary on responses to the important question.

  • comment.read_comment: Agent commentary on responses to the read question.
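
For example, a sketch of a version of our multiple choice question that omits the comment field, using the include_comment parameter described above (the question name is hypothetical):

from edsl import QuestionMultipleChoice

q_read_no_comment = QuestionMultipleChoice(
  question_name = "read_no_comment",  # hypothetical name for illustration
  question_text = "Have you read any books about {{ topic }}?",
  question_options = ["Yes", "No", "I do not know"],
  include_comment = False  # omit the comment field from results
)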

Generated tokens information:

  • generated_tokens.important_generated_tokens: The generated tokens for the important question.

  • generated_tokens.read_generated_tokens: The generated tokens for the read question.

Iteration information:

The iteration.iteration column shows the number of the run (run(n=<integer>)) for each combination of components used (scenarios, agents and models).
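
For example, a minimal sketch of displaying the iteration number alongside each answer after running the survey 3 times:

results_n3 = survey.by(scenarios).by(agents).by(models).run(n=3)  # hypothetical variable name
results_n3.select("iteration.iteration", "topic", "important")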

Model information:

Each of the model columns records a modifiable parameter of the models used to generate the responses.

  • model.frequency_penalty: The frequency penalty for the model.

  • model.logprobs: The logprobs for the model.

  • model.maxOutputTokens: The maximum number of output tokens for the model.

  • model.max_tokens: The maximum number of tokens for the model.

  • model.model: The name of the model used.

  • model.presence_penalty: The presence penalty for the model.

  • model.stopSequences: The stop sequences for the model.

  • model.temperature: The temperature for the model.

  • model.topK: The top k for the model.

  • model.topP: The top p for the model.

  • model.top_logprobs: The top logprobs for the model.

  • model.top_p: The top p for the model.

  • model.use_cache: Whether the model uses cache.

Note: Some of the above fields are particular to specific models, and may have different names (e.g., top_p vs. topP).
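
For example, a minimal sketch of comparing a common sampling parameter across the models used (parameter columns that do not apply to a given model may be empty):

results.select("model.model", "model.temperature")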

Prompt information:

  • prompt.important_system_prompt: The system prompt for the important question.

  • prompt.important_user_prompt: The user prompt for the important question.

  • prompt.read_system_prompt: The system prompt for the read question.

  • prompt.read_user_prompt: The user prompt for the read question.

For more details about prompts, please see the Prompts section.

Question information:

  • question_options.important_question_options: The options for the important question, if any.

  • question_options.read_question_options: The options for the read question, if any.

  • question_text.important_question_text: The text of the important question.

  • question_text.read_question_text: The text of the read question.

  • question_type.important_question_type: The type of the important question.

  • question_type.read_question_type: The type of the read question.

Raw model response information:

  • raw_model_response.important_cost: The cost of the result for the important question, based on the token quantities and prices.

  • raw_model_response.important_one_usd_buys: The number of identical results for the important question that 1USD would cover.

  • raw_model_response.important_raw_model_response: The raw model response for the important question.

  • raw_model_response.read_cost: The cost of the result for the read question, based on the token quantities and prices.

  • raw_model_response.read_one_usd_buys: The number of identical results for the read question that 1USD would cover.

  • raw_model_response.read_raw_model_response: The raw model response for the read question.

Note that the cost of a result for a question is specific to the components (scenario, agent and model) used with it.
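
For example, a minimal sketch of reviewing the cost of each response to the important question:

results.select("model.model", "scenario.topic", "raw_model_response.important_cost")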

Scenario information:

  • scenario.scenario_index: The index of the scenario.

  • scenario.topic: The values provided for the “topic” scenario for the questions.

Creating tables by selecting columns

Each of these columns can be accessed directly by calling the select() method and passing the column names. Alternatively, we can specify the columns to exclude by calling the drop() method. These methods can be chained together to display the specified columns in a table format.
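
For instance, a minimal sketch of excluding a column with drop(), assuming it accepts the same dot-notation column names as select():

results.select("model", "persona", "topic", "read", "important").drop("answer.read")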

For example, the following code will print a table showing the answers for read and important together with the model, persona and topic columns (because the column names are unique, we can omit the model, agent, scenario and answer prefixes when selecting them):

results = survey.by(scenarios).by(agents).by(models).run() # Running the survey once
results.select("model", "persona", "topic", "read", "important")

A table with the selected columns will be printed:

model.model        agent.persona   scenario.topic   answer.read   answer.important
gemini-1.5-flash   student         climate change   Yes           5
gpt-4o             student         climate change   Yes           5
gemini-1.5-flash   student         house prices     No            1
gpt-4o             student         house prices     No            3
gemini-1.5-flash   celebrity       climate change   Yes           5
gpt-4o             celebrity       climate change   Yes           5
gemini-1.5-flash   celebrity       house prices     Yes           3
gpt-4o             celebrity       house prices     No            3

Sorting results

We can sort the columns by calling the sort_by method and passing it the column names to sort by:

(
  results
  .sort_by("model", "persona", reverse=False)
  .select("model", "persona", "topic", "read", "important")
)

The following table will be printed:

model.model        agent.persona   scenario.topic   answer.read   answer.important
gemini-1.5-flash   celebrity       climate change   Yes           5
gemini-1.5-flash   celebrity       house prices     Yes           3
gemini-1.5-flash   student         climate change   Yes           5
gemini-1.5-flash   student         house prices     No            1
gpt-4o             celebrity       climate change   Yes           5
gpt-4o             celebrity       house prices     No            3
gpt-4o             student         climate change   Yes           5
gpt-4o             student         house prices     No            3

Labeling results

We can also add table labels by passing a dictionary to the pretty_labels argument of the print() method (note that we need to include the column prefixes when specifying the table labels, as shown below):

(
  results
  .sort_by("model", "persona", reverse=True)
  .select("model", "persona", "topic", "read", "important")
  .print(pretty_labels={
      "model.model": "LLM",
      "agent.persona": "Agent",
      "scenario.topic": "Topic",
      "answer.read": q2.question_text,
      "answer.important": q1.question_text
      }, format="rich")
)

The following table will be printed:

LLM                Agent       Topic            Have you read any books about {{ topic }}?   On a scale from 1 to 5, how important to you is {{ topic }}?
gpt-4o             student     climate change   Yes                                           5
gpt-4o             student     house prices     No                                            3
gpt-4o             celebrity   climate change   Yes                                           5
gpt-4o             celebrity   house prices     No                                            3
gemini-1.5-flash   student     climate change   Yes                                           5
gemini-1.5-flash   student     house prices     No                                            1
gemini-1.5-flash   celebrity   climate change   Yes                                           5
gemini-1.5-flash   celebrity   house prices     Yes                                           3

Filtering results

Results can be filtered by calling the filter method and passing it a logical expression identifying the results that should be selected. For example, the following code filters for results where the answer to important is 5, and then selects the topic, important and important_comment columns:

(
  results
  .filter("important == 5")
  .select("topic", "important", "important_comment")
)

This will return an abbreviated table:

scenario.topic   answer.important   comment.important_comment
climate change   5                  It's, like, a huge deal. The future of the planet is at stake, and that affects everything – from the environment to the economy to social justice. It's something I worry about a lot.
climate change   5                  As a student, I'm really concerned about climate change because it affects our future and the planet we'll inherit. It's crucial to understand and address it to ensure a sustainable world for generations to come.
climate change   5                  It's a huge issue, you know? We only have one planet, and if we don't take care of it, what kind of world are we leaving for future generations? It's not just about polar bears; it's about everything. It's my responsibility, as someone with a platform, to speak out about it.
climate change   5                  Climate change is a critical issue that affects everyone globally, and as a public figure, I believe it's important to use my platform to raise awareness and advocate for sustainable practices.

Note: The filter method allows us to pass the unique short names of the columns (without the prefixes) when specifying the logical expression. However, because “model” is both a column prefix and the short name of the model.model column, we need to include the prefix when filtering by this column, as shown in the example below:

(
  results
  .filter("model.model == 'gpt-4o'")
  .select("model", "persona", "topic", "read", "important")
)

This will return a table of results where the model is “gpt-4o”:

model.model   agent.persona   scenario.topic   answer.read   answer.important
gpt-4o        student         climate change   Yes           5
gpt-4o        student         house prices     No            3
gpt-4o        celebrity       climate change   Yes           5
gpt-4o        celebrity       house prices     No            3

Limiting results

We can select and print a limited number of results by passing the desired number of rows as max_rows to the print() method. This can be useful for quickly checking the first few results:

(
  results
  .select("model", "persona", "topic", "read", "important")
  .print(max_rows=4, format="rich")
)

This will return a table of the selected components of the first 4 results:

model.model        agent.persona   scenario.topic   answer.read   answer.important
gemini-1.5-flash   student         climate change   Yes           5
gpt-4o             student         climate change   Yes           5
gemini-1.5-flash   student         house prices     No            1
gpt-4o             student         house prices     No            3

Sampling results

We can select a random sample of n results by passing the desired number to the sample() method. This can be useful for inspecting a random subset of the results:

sample_results = results.sample(2)

(
  sample_results
  .sort_by("model")
  .select("model", "persona", "topic", "read", "important")
)

This will return a table of the specified number of randomly selected results:

model.model   agent.persona   scenario.topic   answer.read   answer.important
gpt-4o        celebrity       house prices     No            3
gpt-4o        celebrity       climate change   Yes           5

Shuffling results

We can shuffle results by calling the shuffle() method. This can be useful for randomizing the order of the results before inspecting them:

shuffle_results = results.shuffle()

(
  shuffle_results
  .select("model", "persona", "topic", "read", "important")
)

This will return a table of shuffled results:

model.model        agent.persona   scenario.topic   answer.read   answer.important
gemini-1.5-flash   celebrity       climate change   Yes           5
gpt-4o             student         house prices     No            3
gemini-1.5-flash   celebrity       house prices     Yes           3
gemini-1.5-flash   student         house prices     No            1
gpt-4o             celebrity       house prices     No            3
gpt-4o             celebrity       climate change   Yes           5
gpt-4o             student         climate change   Yes           5
gemini-1.5-flash   student         climate change   Yes           5

Adding results

We can add results together straightforwardly by using the + operator:

add_results = results + results

We can see that the results have doubled:

len(add_results)

This will return the number of results:

16

Flattening results

If a field of results contains dictionaries, we can flatten them into separate fields by calling the flatten() method. This method takes the field to flatten and a boolean keep_original indicating whether to keep the original field in the new Results object that is returned.

For example:

from edsl import QuestionDict, Model

m = Model("gemini-1.5-flash")

q = QuestionDict(
  question_name = "recipe",
  question_text = "Please provide a simple recipe for hot chocolate.",
  answer_keys = ["title", "ingredients", "instructions"]
)

r = q.by(m).run()

r.select("model", "recipe").flatten(field="answer.recipe", keep_original=True)

This will return a table of the flattened results:

model.model                  gemini-1.5-flash
answer.recipe                {'title': 'Simple Hot Chocolate', 'ingredients': ['1 cup milk (dairy or non-dairy)', '1 tablespoon unsweetened cocoa powder', '1-2 tablespoons sugar (or to taste)', 'Pinch of salt'], 'instructions': ['Combine milk, cocoa powder, sugar, and salt in a small saucepan.', 'Heat over medium heat, stirring constantly, until the mixture is smooth and heated through.', 'Do not boil.', 'Pour into a mug and enjoy!']}
answer.recipe.title          Simple Hot Chocolate
answer.recipe.ingredients    ['1 cup milk (dairy or non-dairy)', '1 tablespoon unsweetened cocoa powder', '1-2 tablespoons sugar (or to taste)', 'Pinch of salt']
answer.recipe.instructions   ['Combine milk, cocoa powder, sugar, and salt in a small saucepan.', 'Heat over medium heat, stirring constantly, until the mixture is smooth and heated through.', 'Do not boil.', 'Pour into a mug and enjoy!']

Generating a report

We can create a report of the results by calling the report() method and passing the columns to be included (all columns are included by default). This generates a report in markdown by iterating through the rows, presenting each as an observation. You can optionally pass headers, a divider and a limit on the number of observations to include. This can be useful for displaying a sample of larger results in a working notebook that you are sharing.

For example, the following code will generate a report of the first 4 results:

from edsl import QuestionFreeText, ScenarioList, Model

m = Model("gemini-1.5-flash")

s = ScenarioList.from_list("language", ["German", "Dutch", "French", "English"])

q = QuestionFreeText(
    question_name = "poem",
    question_text = "Please write me a short poem about winter in {{ language }}."
)

r = q.by(s).by(m).run()

r.select("model", "poem", "language").report(top_n=2, divider=False, return_string=True)

This will return a report of the first 2 results:

Observation: 1

model.model
gemini-1.5-flash

answer.poem
Der Schnee fällt leis', ein weicher Flor,
Die Welt in Weiß, ein Zauberchor.
Die Bäume stehn, in Stille gehüllt,
Der Winterwind, sein Lied erfüllt.

(Translation: The snow falls softly, a gentle veil, / The world in white, a magic choir. / The trees stand, wrapped in silence, / The winter wind, its song fulfilled.)

scenario.language
German

Observation: 2

model.model
gemini-1.5-flash

answer.poem
De winter komt, de dagen kort,
De sneeuw valt zacht, een wit decor.
De bomen staan, kaal en stil,
Een ijzige wind, een koude tril.

(Translation: Winter comes, the days are short, / The snow falls softly, a white décor. / The trees stand, bare and still, / An icy wind, a cold shiver.)

scenario.language
Dutch

"# Observation: 1\n## model.model\ngemini-1.5-flash\n## answer.poem\nDer Schnee fällt leis', ein weicher Flor,\nDie Welt in Weiß, ein Zauberchor.\nDie Bäume stehn, in Stille gehüllt,\nDer Winterwind, sein Lied erfüllt.\n\n(Translation: The snow falls softly, a gentle veil, / The world in white, a magic choir. / The trees stand, wrapped in silence, / The winter wind, its song fulfilled.)\n## scenario.language\nGerman\n\n---\n\n# Observation: 2\n## model.model\ngemini-1.5-flash\n## answer.poem\nDe winter komt, de dagen kort,\nDe sneeuw valt zacht, een wit decor.\nDe bomen staan, kaal en stil,\nEen ijzige wind, een koude tril.\n\n(Translation: Winter comes, the days are short, / The snow falls softly, a white décor. / The trees stand, bare and still, / An icy wind, a cold shiver.)\n## scenario.language\nDutch\n"

Accessing results with SQL

We can interact with results via SQL using the sql method. This is done by passing a SQL query and a shape (“long” or “wide”) for the resulting table, where the table name in the query is “self”.

For example, the following code will return a table showing the model, persona, read and important columns for the first 4 results:

results.sql("select model, persona, read, important from self limit 4")

The following table will be displayed:

model              persona   read   important
gemini-1.5-flash   student   Yes    5
gpt-4o             student   Yes    5
gemini-1.5-flash   student   No     1
gpt-4o             student   No     3
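
We can also pass the shape parameter mentioned above; a minimal sketch, assuming "long" reshapes the table into one row per data point rather than one row per result:

results.sql("select model, persona, read, important from self limit 4", shape="long")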

Dataframes

We can also export results to other formats. The to_pandas method will turn our results into a Pandas dataframe:

results.to_pandas()

For example, here we use it to create a dataframe consisting of the models, personas and the answers to the important question:

results.to_pandas()[["model.model", "agent.persona", "answer.important"]]

Exporting to CSV or JSON

We can then use the pandas to_csv method to write the results to a CSV file:

results.to_pandas().to_csv("results.csv")

Similarly, the pandas to_json method will write the results to a JSON file:

results.to_pandas().to_json("results.json")

Revising prompts to improve results

If any of your results are missing model responses, you can use the spot_issues() method to help identify the problems and then revise the prompts to improve the results. This method runs a meta-survey of two questions for any prompts that generated a bad or null response, and returns the results of that meta-survey.

The first question in the survey is a QuestionFreeText question which prompts the model to describe the likely issues with the prompts:

The following prompts generated a bad or null response: '{{ original_prompts }}'
What do you think was the likely issue(s)?

The second question in the survey is a QuestionDict question which prompts the model to return a dictionary consisting of revised user and system prompts:

The following prompts generated a bad or null response: '{{ original_prompts }}'
You identified the issue(s) as '{{ issues.answer }}'.
Please revise the prompts to address the issue(s).

You can optionally pass a list of models to use with the meta-survey, instead of the default model.

Example usage:

# Returns a Results object with the results of the meta-survey
results.spot_issues(models=["gpt-4o"])

# You can inspect the metadata for your original prompts together with the results of the meta-survey
results.select(
  "original_question", # The name of the question that generated a bad or null response
  "original_agent_index", # The index of the agent that generated a bad or null response
  "original_scenario_index", # The index of the scenario that generated a bad or null response
  "original_prompts", # The original prompts that generated a bad or null response
  "answer.issues", # Free text description of potential issues in the original prompts
  "answer.revised" # A dictionary of revised user and system prompts
)

See an example of the method.

Exceptions

If any exceptions are raised when the survey is run, a detailed exceptions report is generated and can be opened in your browser. See the Exceptions & Debugging section for more information on exceptions.

Result class

The Result class captures the complete data from one agent interview.

A Result object stores the agent, scenario, language model, and all answers provided during an interview, along with metadata such as token usage, caching information, and raw model responses. It provides a rich interface for accessing this data and supports serialization for storage and retrieval.

Key features:

  • Dictionary-like access to all data through the UserDict interface

  • Properties for convenient access to common attributes (agent, scenario, model, answer)

  • Rich data structure with sub-dictionaries for organization

  • Support for scoring results against reference answers

  • Serialization to/from dictionaries for storage

Results are typically created by the Jobs system when running interviews and collected into a Results collection for analysis. You rarely need to create Result objects manually.
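
For example, a minimal sketch of the dictionary-style and property access described above, assuming the sub-dictionary keys shown in the example result earlier in this section:

result = results[0]            # an individual Result object
result["answer"]["important"]  # dictionary-style access via the UserDict interface
result.answer["read"]          # equivalent access through the answer property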

Results class

A collection of Result objects with powerful data analysis capabilities.

The Results class is the primary container for working with data from EDSL surveys. It provides a rich set of methods for data analysis, transformation, and visualization inspired by data manipulation libraries like dplyr and pandas. The Results class implements a functional, fluent interface for data manipulation where each method returns a new Results object, allowing method chaining.

Attributes:

  • survey: The Survey object containing the questions used to generate results.

  • data: A list of Result objects containing the responses.

  • created_columns: A list of column names created through transformations.

  • cache: A Cache object for storing model responses.

  • completed: Whether the Results object is ready for use.

  • task_history: A TaskHistory object containing information about the tasks.

  • known_data_types: List of valid data type strings for accessing data.

Key features:
  • List-like interface for accessing individual Result objects

  • Selection of specific data columns with select()

  • Filtering results with boolean expressions using filter()

  • Creating new derived columns with mutate()

  • Recoding values with recode() and answer_truncate()

  • Sorting results with order_by()

  • Converting to other formats (dataset, table, pandas DataFrame)

  • Serialization for storage and retrieval

  • Support for remote execution and result retrieval

Results objects have a hierarchical structure with the following components:
  1. Each Results object contains multiple Result objects

  2. Each Result object contains data organized by type (agent, scenario, model, answer, etc.)

  3. Each data type contains multiple attributes (e.g., “how_feeling” in the answer type)

You can access data in a Results object using dot notation (answer.how_feeling) or using just the attribute name if it’s not ambiguous (how_feeling).

The Results class also tracks “created columns” - new derived values that aren’t part of the original data but were created through transformations.
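
For example, a minimal sketch of creating a derived column with mutate() and then ordering by it, assuming mutate() accepts a string assignment expression (consistent with the fluent interface described above; the column name is hypothetical):

(
  results
  .mutate("important_doubled = important * 2")  # hypothetical derived column
  .order_by("important_doubled")
  .select("topic", "important", "important_doubled")
)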

Examples:
>>> # Create a simple Results object from example data
>>> from edsl import Results
>>> r = Results.example()
>>> len(r) > 0  # Contains Result objects
True
>>> # Filter and transform data
>>> filtered = r.filter("how_feeling == 'Great'")
>>> # Access hierarchical data
>>> 'agent' in r.known_data_types
True