Using EDSL to sense-check data
This notebook provides example code for sense-checking survey data using EDSL, an open-source library for simulating surveys, experiments and market research with AI agents and large language models.
Contents
Using a set of responses to a survey about online marketplaces as an example, we demonstrate EDSL methods for:
Evaluating survey questions (e.g., for clarity and improvements)
Analyzing each respondent’s set of answers (e.g., to summarize or identify sentiment, themes, etc.)
Reviewing each answer individually (e.g., to evaluate its relevance or usefulness)
Coop
We also show how to post EDSL questions, surveys, results and notebooks (like this one) to the Coop: a new platform for creating and sharing LLM-based research.
How EDSL works
EDSL is a flexible library that can be used to perform a broad variety of research tasks. A typical workflow consists of the following steps:
Construct questions in EDSL
Add data to the questions (e.g., for data labeling tasks)
Use an AI agent to answer the questions
Select a language model to generate the answers
Analyze results in a formatted dataset
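A minimal sketch of this workflow is shown below; the question, scenario, agent traits and model name are illustrative assumptions rather than part of this notebook's data:
from edsl import QuestionFreeText, Scenario, Agent, Model, Survey

# 1. Construct a question with a {{ placeholder }} for data
q = QuestionFreeText(
    question_name = "feedback",
    question_text = "What do you think of this product description? {{ description }}"
)

# 2. Add data to the question as a scenario
s = Scenario({"description": "A lightweight, waterproof hiking jacket."})

# 3. Design an AI agent to answer the question
a = Agent(traits = {"persona": "Experienced online shopper"})

# 4. Select a language model to generate the answer
m = Model("gpt-4o")

# 5. Run the survey and analyze the formatted results
results = Survey([q]).by(s).by(a).by(m).run()
results.select("feedback")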
Technical setup
Before running the code below, please ensure that you have completed setup:
Install the EDSL library.
Create a Coop account and activate remote inference, OR store your own API keys for the language models that you want to use.
Our Starter Tutorial provides examples of EDSL basic components.
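For reference, setup typically looks something like the following; treat this as a sketch, since the exact steps and key names depend on your account and the model providers you want to use:
# Install the EDSL library (in a terminal or notebook cell):
#   pip install edsl
#
# Option 1: activate remote inference in your Coop account settings and store your
# Expected Parrot key, e.g., in a .env file in your working directory:
#   EXPECTED_PARROT_API_KEY=your_key_here
#
# Option 2: store API keys for the model providers you want to call directly, e.g.:
#   OPENAI_API_KEY=your_key_here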
Example data
Our example data is a CSV consisting of several questions and a few rows of responses. Here we store it at the Coop and then re-import it:
[1]:
from edsl.scenarios.FileStore import CSVFileStore
[2]:
fs = CSVFileStore("marketplace_survey_results.csv")
info = fs.push()
print(info)
{'description': 'File: marketplace_survey_results.csv', 'object_type': 'scenario', 'url': 'https://www.expectedparrot.com/content/ba9148ae-ab7c-4630-baa4-046f8d237b28', 'uuid': 'ba9148ae-ab7c-4630-baa4-046f8d237b28', 'version': '0.1.39.dev1', 'visibility': 'unlisted'}
[3]:
csv_file = CSVFileStore.pull(info["uuid"])
Creating questions about the data
There are many questions we might want to ask about the data, such as:
Does this survey question have any logical or syntactical problems? {{ question }}
What is the overall sentiment of this respondent’s answers? {{ responses }}
Is this answer responsive to the question that was asked? {{ question }} {{ answer }}
Question types
EDSL comes with many common question types that we can select from based on the form of the response that we want to get back from the model: multiple choice, checkbox, linear scale, free text, etc. Learn more about EDSL question types.
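For example, if we wanted the model to select several themes from a fixed set rather than write free text, a checkbox question would constrain the response to a list of the given options (a sketch; this question is not used in the rest of the notebook):
from edsl import QuestionCheckBox

# A checkbox question returns a list of selected options
q_themes = QuestionCheckBox(
    question_name = "themes",
    question_text = "Which themes appear in this respondent's answers? {{ responses }}",
    question_options = ["Product selection", "Ease of use", "Delivery", "Customer service", "Search and filtering"]
)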
Here we construct Question objects for the questions that we want to ask about the data, using {{ placeholders }} for the information that we will add to the questions in the steps that follow:
[4]:
from edsl import QuestionFreeText, QuestionMultipleChoice, QuestionYesNo
[5]:
q_logic = QuestionFreeText(
    question_name = "logic",
    question_text = "Describe any logical or syntactical problems in the following survey question: {{ question }}"
)
[6]:
q_sentiment = QuestionMultipleChoice(
    question_name = "sentiment",
    question_text = "What is the overall sentiment of this respondent's survey answers? {{ responses }}",
    question_options = ["Very unsatisfied", "Somewhat unsatisfied", "Somewhat satisfied", "Very satisfied"]
)
[7]:
q_responsive = QuestionYesNo(
    question_name = "responsive",
    question_text = "Is this answer responsive to the question that was asked? Question: {{ question }} Answer: {{ answer }}"
)
Adding survey data to the questions
Next we’ll add our data to our questions. This can be done efficiently by creating a ScenarioList representing the data. The individual Scenario objects in the list can be constructed in a variety of ways, depending on the information that we want to include in a particular question.
We start by calling the from_csv() method to create a ScenarioList for the data in its original form. We can see that this generates a Scenario dictionary for each respondent’s set of answers, with key/value pairs for the individual questions and answers:
[8]:
from edsl import ScenarioList
[9]:
sl = ScenarioList.from_csv(csv_file.to_tempfile()) # replace with CSV file name if importing a local file
sl
[9]:
ScenarioList scenarios: 3; keys: ['Is there anything else you would like to share about your experience with us?', 'What is one feature you would like to see added to improve your shopping experience?', 'How do you feel about the current product search and filtering options?', 'Respondent ID', 'Can you describe a recent experience where you were dissatisfied with our service?', 'What do you like most about using our online marketplace?'];
Respondent ID | What do you like most about using our online marketplace? | What is one feature you would like to see added to improve your shopping experience? | Can you describe a recent experience where you were dissatisfied with our service? | How do you feel about the current product search and filtering options? | Is there anything else you would like to share about your experience with us? |
---|---|---|---|---|---|
101 | The wide variety of products and the ease of use. | It would be great to have a personalized recommendation system based on my browsing history. | I was disappointed when an item I ordered arrived damaged, but customer service quickly resolved it. | The search and filtering options are intuitive and work well for me. | No, keep up the great work! |
102 | I enjoy the simplicity of the interface. | A feature that helps compare similar products side by side would be useful. | No complaints here. | I find the product search to be pretty effective. | I think the sky is a beautiful shade of purple today. |
103 | The platform is user-friendly and offers a vast selection of products. | Would love to see an option to save and compare different products. | My delivery was late by a few days, which was frustrating. | It’s okay. | No. |
Evaluating the questions
To evaluate the survey questions themselves, we create a Scenario for each question. We also add a follow-up question asking for an improved version of each question, using the add_targeted_memory() method so that the model can see its critique when drafting the revision:
[10]:
from edsl import QuestionFreeText, Survey
q_logic = QuestionFreeText(
    question_name = "logic",
    question_text = "Describe any logical or syntactical problems in the following survey question: {{ question }}"
)

q_improved = QuestionFreeText(
    question_name = "improved",
    question_text = "Please draft an improved version of the survey question. Return only the revised question text."
)

# Targeted memory gives q_improved access to the answer to q_logic
survey = Survey([q_logic, q_improved]).add_targeted_memory(q_improved, q_logic)
The survey questions are the parameters of the ScenarioList created above:
[11]:
questions = list(sl.parameters)
questions
[11]:
['Is there anything else you would like to share about your experience with us?',
'What is one feature you would like to see added to improve your shopping experience?',
'How do you feel about the current product search and filtering options?',
'Respondent ID',
'Can you describe a recent experience where you were dissatisfied with our service?',
'What do you like most about using our online marketplace?']
We can pass them to the from_list() method to create a new ScenarioList, specifying that the key for each Scenario will be question in order to match the parameter of our logic question. (Note that the list includes the 'Respondent ID' column header, which is not actually a survey question; the model's critique of it appears in the results below.)
[12]:
sl_questions = ScenarioList.from_list("question", questions)
sl_questions
[12]:
ScenarioList scenarios: 6; keys: ['question'];
question |
---|
Is there anything else you would like to share about your experience with us? |
What is one feature you would like to see added to improve your shopping experience? |
How do you feel about the current product search and filtering options? |
Respondent ID |
Can you describe a recent experience where you were dissatisfied with our service? |
What do you like most about using our online marketplace? |
We add the scenarios to the survey when we run it:
[13]:
results = survey.by(sl_questions).run()
This generates a dataset of Results that we can access with built-in methods for analysis:
[14]:
results.select("question", "logic", "improved")
[14]:
scenario.question | answer.logic | answer.improved |
---|---|---|
Is there anything else you would like to share about your experience with us? | The survey question "Is there anything else you would like to share about your experience with us?" is generally clear and straightforward, but there are a few considerations that could be addressed to improve its effectiveness: 1. **Open-Ended Nature**: The question is open-ended, which can be beneficial for gathering detailed feedback but may also result in responses that are difficult to categorize or analyze. Depending on the survey's goals, it might be helpful to provide some guidance on the type of information being sought. 2. **Vagueness**: The phrase "anything else" is quite broad and might lead to a wide range of responses. If the survey aims to gather specific types of feedback, it could be helpful to specify or narrow down the focus. 3. **Contextual Clarity**: Without context, respondents might be unsure about what aspects of their experience they should focus on. Adding context or examples could help guide responses. 4. **Emotional Tone**: The question assumes a neutral tone, which is generally appropriate, but if the survey aims to capture emotional responses, it might benefit from a more empathetic or inviting wording. | Could you please share any additional thoughts or specific feedback about your experience with us, such as areas for improvement or aspects you particularly enjoyed? |
What is one feature you would like to see added to improve your shopping experience? | The survey question "What is one feature you would like to see added to improve your shopping experience?" is generally clear and straightforward, but there are a few potential issues to consider: 1. **Limitation to One Feature**: The question restricts respondents to mentioning only one feature. This might limit the feedback you receive, as respondents may have multiple suggestions or ideas that could be valuable. Consider allowing respondents to list more than one feature or providing an option to elaborate further. 2. **Ambiguity in "Feature"**: The term "feature" is somewhat vague and may be interpreted differently by different respondents. Some might think of website or app features, while others might consider in-store features or customer service improvements. Clarifying the context or providing examples could help ensure more focused and useful responses. 3. **Assumption of Improvement**: The question assumes that adding a new feature is the best way to improve the shopping experience. However, some respondents might feel that improving existing features, services, or processes could be more beneficial. Consider rephrasing to allow for suggestions related to both new and existing features. 4. **Lack of Specificity**: The question does not specify the type of shopping experience (e.g., online, in-store, or both). If the survey is aimed at a specific type of shopping experience, it would be helpful to include that context to gather more relevant feedback. | What features or improvements would you suggest to enhance your online or in-store shopping experience? Please feel free to list multiple suggestions. |
How do you feel about the current product search and filtering options? | The survey question "How do you feel about the current product search and filtering options?" is generally clear, but it has a few potential issues that could be improved: 1. **Ambiguity**: The question might be too broad or vague. "Feel" is a subjective term and can lead to varied interpretations. Respondents might not know if they should focus on emotional responses (like satisfaction or frustration) or practical ones (like ease of use or effectiveness). 2. **Lack of Specificity**: It combines two different elements—product search and filtering options—into one question. If respondents have different opinions on these two aspects, they may find it difficult to provide a single, coherent answer. 3. **Open-ended Format**: While open-ended questions can provide rich data, they can also lead to varied responses that are difficult to analyze quantitatively. If the survey aims to gather quantitative data, it might be better to use a closed-ended question with specific response options. 4. **Assumes Familiarity**: The question assumes that the respondent has used the product search and filtering options. If someone hasn't used these features, they might not be able to provide a meaningful response. To address these issues, you might consider breaking the question into two separate questions and providing specific response options. For example: 1. "How satisfied are you with the current product search options?" - Very satisfied - Satisfied - Neutral - Dissatisfied - Very dissatisfied 2. "How satisfied are you with the current filtering options?" - Very satisfied - Satisfied - Neutral - Dissatisfied - Very dissatisfied | 1. "How satisfied are you with the current product search options?" - Very satisfied - Satisfied - Neutral - Dissatisfied - Very dissatisfied 2. "How satisfied are you with the current filtering options?" - Very satisfied - Satisfied - Neutral - Dissatisfied |
Respondent ID | The survey question "Respondent ID" has a few issues: 1. **Lack of Clarity**: The phrase "Respondent ID" is not a question. It lacks a clear query or prompt for the respondent to answer. In a survey, questions should be formulated to elicit specific information from respondents. 2. **Missing Instruction**: There is no instruction on what the respondent is supposed to do with "Respondent ID." Is it asking them to provide their ID, or is it just a placeholder for the survey administrator to fill in? This ambiguity can lead to confusion. 3. **Assumption of Prior Knowledge**: It assumes that respondents know what a "Respondent ID" is and whether they have one. If this is not explained elsewhere in the survey, respondents might not understand what is being asked of them. 4. **Syntactical Structure**: As a standalone item, "Respondent ID" does not have a syntactical structure that forms a complete sentence or question, which can make it seem out of place in a survey context. | Please enter your Respondent ID below: |
Can you describe a recent experience where you were dissatisfied with our service? | The survey question "Can you describe a recent experience where you were dissatisfied with our service?" has a few potential issues: 1. **Assumption of Dissatisfaction**: The question assumes that the respondent has had a recent experience of dissatisfaction. If a respondent has not had such an experience, they might find it difficult to answer the question, which could lead to frustration or confusion. 2. **Lack of Neutrality**: The question is negatively biased as it specifically asks about dissatisfaction. It doesn't provide an opportunity for respondents to share positive or neutral experiences, which could skew the feedback towards negative responses. 3. **Open-ended Nature**: While open-ended questions can provide detailed feedback, they can also be difficult to analyze systematically. Respondents might provide varying levels of detail, making it challenging to categorize and quantify the responses. | Have you had any recent experiences with our service that you would like to share, whether positive, negative, or neutral? If so, please describe them. |
What do you like most about using our online marketplace? | The survey question "What do you like most about using our online marketplace?" is generally clear and straightforward, but there are a few potential issues to consider: 1. **Assumption of Positive Experience**: The question assumes that the respondent has a positive experience with the online marketplace and has something they like about it. This may not be the case for all users, and those who have a neutral or negative experience might find it difficult to answer. 2. **Lack of Options for Negative Feedback**: By focusing solely on what the respondent likes, the question does not provide an opportunity for users to express dissatisfaction or suggest improvements. Including a way to capture negative feedback can provide a more comprehensive understanding of user experiences. 3. **Open-Ended Nature**: While open-ended questions can provide rich qualitative data, they can also lead to varied responses that are difficult to analyze quantitatively. Depending on the survey's goals, it might be beneficial to include some structured response options or follow-up questions. 4. **Ambiguity in Scope**: The term "online marketplace" might be interpreted differently by respondents. It could refer to the website, the mobile app, the range of products, customer service, or other aspects. Clarifying what aspects are being referred to could help in getting more precise answers. | What aspects of our online marketplace do you find most beneficial, and are there any areas where you think we could improve? |
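Beyond select(), the Results object can be exported for further analysis. For example, here is a brief sketch of converting it to a pandas DataFrame; this assumes pandas is available and that to_pandas() preserves the prefixed column names shown in the table above:
# Convert the Results to a pandas DataFrame for further inspection
df = results.to_pandas()
df[["scenario.question", "answer.improved"]]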
Evaluating respondents’ collective answers
Next we create a ScenarioList containing a Scenario for each respondent’s set of answers, to use with our question about sentiment:
[15]:
sl_responses = ScenarioList.from_list("responses", sl['scenarios'])
sl_responses
[15]:
ScenarioList scenarios: 3; keys: ['responses'];
responses |
---|
{'Respondent ID': '101', 'What do you like most about using our online marketplace?': 'The wide variety of products and the ease of use.', 'What is one feature you would like to see added to improve your shopping experience?': 'It would be great to have a personalized recommendation system based on my browsing history.', 'Can you describe a recent experience where you were dissatisfied with our service?': 'I was disappointed when an item I ordered arrived damaged, but customer service quickly resolved it.', 'How do you feel about the current product search and filtering options?': 'The search and filtering options are intuitive and work well for me.', 'Is there anything else you would like to share about your experience with us?': 'No, keep up the great work!'} |
{'Respondent ID': '102', 'What do you like most about using our online marketplace?': 'I enjoy the simplicity of the interface.', 'What is one feature you would like to see added to improve your shopping experience?': 'A feature that helps compare similar products side by side would be useful.', 'Can you describe a recent experience where you were dissatisfied with our service?': 'No complaints here.', 'How do you feel about the current product search and filtering options?': 'I find the product search to be pretty effective.', 'Is there anything else you would like to share about your experience with us?': 'I think the sky is a beautiful shade of purple today.'} |
{'Respondent ID': '103', 'What do you like most about using our online marketplace?': 'The platform is user-friendly and offers a vast selection of products.', 'What is one feature you would like to see added to improve your shopping experience?': 'Would love to see an option to save and compare different products.', 'Can you describe a recent experience where you were dissatisfied with our service?': 'My delivery was late by a few days, which was frustrating.', 'How do you feel about the current product search and filtering options?': 'It’s okay.', 'Is there anything else you would like to share about your experience with us?': 'No.'} |
Next we combine our sentiment question with any other questions we want to ask about the collective responses (here, a likelihood-to-recommend question) into a survey, add the scenarios, and run it:
[16]:
from edsl import QuestionMultipleChoice, QuestionLinearScale, Survey
q_sentiment = QuestionMultipleChoice(
    question_name = "sentiment",
    question_text = "What is the overall sentiment of this respondent's survey answers? {{ responses }}",
    question_options = ["Very unsatisfied", "Somewhat unsatisfied", "Somewhat satisfied", "Very satisfied"]
)

q_recommend = QuestionLinearScale(
    question_name = "recommend",
    question_text = "On a scale from 1 to 5, how likely do you think this respondent is to recommend the company to a friend? {{ responses }}",
    question_options = [1, 2, 3, 4, 5],
    option_labels = {1:"Not at all likely", 5:"Very likely"}
)

survey = Survey([q_sentiment, q_recommend])
[18]:
results = survey.by(sl_responses).run()
[19]:
results.select("responses", "sentiment", "recommend")
[19]:
scenario.responses | answer.sentiment | answer.recommend |
---|---|---|
{'Respondent ID': '101', 'What do you like most about using our online marketplace?': 'The wide variety of products and the ease of use.', 'What is one feature you would like to see added to improve your shopping experience?': 'It would be great to have a personalized recommendation system based on my browsing history.', 'Can you describe a recent experience where you were dissatisfied with our service?': 'I was disappointed when an item I ordered arrived damaged, but customer service quickly resolved it.', 'How do you feel about the current product search and filtering options?': 'The search and filtering options are intuitive and work well for me.', 'Is there anything else you would like to share about your experience with us?': 'No, keep up the great work!'} | Somewhat satisfied | 5 |
{'Respondent ID': '102', 'What do you like most about using our online marketplace?': 'I enjoy the simplicity of the interface.', 'What is one feature you would like to see added to improve your shopping experience?': 'A feature that helps compare similar products side by side would be useful.', 'Can you describe a recent experience where you were dissatisfied with our service?': 'No complaints here.', 'How do you feel about the current product search and filtering options?': 'I find the product search to be pretty effective.', 'Is there anything else you would like to share about your experience with us?': 'I think the sky is a beautiful shade of purple today.'} | Somewhat satisfied | 5 |
{'Respondent ID': '103', 'What do you like most about using our online marketplace?': 'The platform is user-friendly and offers a vast selection of products.', 'What is one feature you would like to see added to improve your shopping experience?': 'Would love to see an option to save and compare different products.', 'Can you describe a recent experience where you were dissatisfied with our service?': 'My delivery was late by a few days, which was frustrating.', 'How do you feel about the current product search and filtering options?': 'It’s okay.', 'Is there anything else you would like to share about your experience with us?': 'No.'} | Somewhat satisfied | 3 |
Evaluating individual answers
Next we create a ScenarioList with a Scenario for each individual question and answer, to use with our question about the responsiveness of each answer. We can use the unpivot() method to expand each respondent’s row into separate question/answer scenarios, keeping the identifiers we specify (e.g., respondent ID) fixed:
[20]:
sl_qa = sl.unpivot(id_vars = ["Respondent ID"])
sl_qa
[20]:
ScenarioList scenarios: 15; keys: ['variable', 'Respondent ID', 'value'];
Respondent ID | variable | value |
---|---|---|
101 | What do you like most about using our online marketplace? | The wide variety of products and the ease of use. |
101 | What is one feature you would like to see added to improve your shopping experience? | It would be great to have a personalized recommendation system based on my browsing history. |
101 | Can you describe a recent experience where you were dissatisfied with our service? | I was disappointed when an item I ordered arrived damaged, but customer service quickly resolved it. |
101 | How do you feel about the current product search and filtering options? | The search and filtering options are intuitive and work well for me. |
101 | Is there anything else you would like to share about your experience with us? | No, keep up the great work! |
102 | What do you like most about using our online marketplace? | I enjoy the simplicity of the interface. |
102 | What is one feature you would like to see added to improve your shopping experience? | A feature that helps compare similar products side by side would be useful. |
102 | Can you describe a recent experience where you were dissatisfied with our service? | No complaints here. |
102 | How do you feel about the current product search and filtering options? | I find the product search to be pretty effective. |
102 | Is there anything else you would like to share about your experience with us? | I think the sky is a beautiful shade of purple today. |
103 | What do you like most about using our online marketplace? | The platform is user-friendly and offers a vast selection of products. |
103 | What is one feature you would like to see added to improve your shopping experience? | Would love to see an option to save and compare different products. |
103 | Can you describe a recent experience where you were dissatisfied with our service? | My delivery was late by a few days, which was frustrating. |
103 | How do you feel about the current product search and filtering options? | It’s okay. |
103 | Is there anything else you would like to share about your experience with us? | No. |
We can call the rename() method to rename the keys so that they match the parameters of our responsiveness question:
[21]:
sl_qa = sl_qa.rename({"Respondent ID": "id", "variable": "question", "value": "answer"})
sl_qa
[21]:
ScenarioList scenarios: 15; keys: ['id', 'question', 'answer'];
id | question | answer |
---|---|---|
101 | What do you like most about using our online marketplace? | The wide variety of products and the ease of use. |
101 | What is one feature you would like to see added to improve your shopping experience? | It would be great to have a personalized recommendation system based on my browsing history. |
101 | Can you describe a recent experience where you were dissatisfied with our service? | I was disappointed when an item I ordered arrived damaged, but customer service quickly resolved it. |
101 | How do you feel about the current product search and filtering options? | The search and filtering options are intuitive and work well for me. |
101 | Is there anything else you would like to share about your experience with us? | No, keep up the great work! |
102 | What do you like most about using our online marketplace? | I enjoy the simplicity of the interface. |
102 | What is one feature you would like to see added to improve your shopping experience? | A feature that helps compare similar products side by side would be useful. |
102 | Can you describe a recent experience where you were dissatisfied with our service? | No complaints here. |
102 | How do you feel about the current product search and filtering options? | I find the product search to be pretty effective. |
102 | Is there anything else you would like to share about your experience with us? | I think the sky is a beautiful shade of purple today. |
103 | What do you like most about using our online marketplace? | The platform is user-friendly and offers a vast selection of products. |
103 | What is one feature you would like to see added to improve your shopping experience? | Would love to see an option to save and compare different products. |
103 | Can you describe a recent experience where you were dissatisfied with our service? | My delivery was late by a few days, which was frustrating. |
103 | How do you feel about the current product search and filtering options? | It’s okay. |
103 | Is there anything else you would like to share about your experience with us? | No. |
[22]:
from edsl import QuestionYesNo
q_responsive = QuestionYesNo(
    question_name = "responsive",
    question_text = "Is this answer responsive to the question that was asked? Question: {{ question }} Answer: {{ answer }}"
)
[23]:
results = q_responsive.by(sl_qa).run()
[24]:
(
    results
    .filter("responsive == 'No'")  # keep only answers flagged as not responsive
    .select("id", "question", "answer")
)
[24]:
scenario.id | scenario.question | scenario.answer |
---|---|---|
102 | How do you feel about the current product search and filtering options? | I find the product search to be pretty effective. |
102 | Is there anything else you would like to share about your experience with us? | I think the sky is a beautiful shade of purple today. |
Uploading content to the Coop
Coop is a new platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL and provides a convenient place to post and access surveys, agents, results and notebooks. Learn more about using the Coop.
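Any EDSL object can be posted with the same push() pattern used for the file store above. For example, here is a sketch of posting the survey and results created in this notebook (the descriptions and visibility settings are illustrative):
# Post other objects created in this notebook to the Coop
survey.push(description = "Questions for sense-checking survey data", visibility = "unlisted")
results.push(description = "Sense-checking results for marketplace survey responses", visibility = "unlisted")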
Here we post the contents of this notebook:
[25]:
from edsl import Notebook
[26]:
n = Notebook(path = "scenariolist_unpivot.ipynb")
[27]:
info = n.push(description = "ScenarioList methods for sense checking survey data", visibility = "public")
To update an object at the Coop:
[28]:
n = Notebook(path = "scenariolist_unpivot.ipynb") # re-create the Notebook object so it reflects the latest changes
[29]:
n.patch(uuid = info["uuid"], value = n)
[29]:
{'status': 'success'}