Cognitive testing & creating new methods

This notebook shows some ways of using EDSL to conduct research: cognitive testing, qualitative reviews, data labeling and creating new methods.

[1]:
# ! pip install edsl

Cognitive testing

In this example we use the tools to evaluate some draft survey questions and suggest improvements.

[2]:
from edsl.questions import QuestionFreeText
from edsl import Agent, Scenario, Model

Create a relevant persona and assign it to an agent:

[3]:
agent_description = (
    "You are an expert in survey methodology and evaluating questionnaires."
)

agent = Agent(traits={"background": agent_description})

Identify a set of texts for review (these can also be imported):

[4]:
draft_texts = [
    "Do you feel the product is almost always of good quality?",
    "On a scale of 1 to 5, where 1 means strongly agree and 5 means strongly disagree, how satisfied are you with our service?",
    "Do you believe our IT team's collaborative synergy effectively optimizes our digital infrastructure?",
    "What do you think of our recent implementation of Project X57?",
]
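
If the draft questions live in a file rather than in the notebook, they can be read into the same list shape. The following is a minimal sketch using only the standard library. It assumes a hypothetical CSV with a `draft_text` column; the column name, the `load_draft_texts` helper and the in-memory sample are illustrative and not part of EDSL.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from open("drafts.csv").
sample_csv = io.StringIO(
    "draft_text\n"
    '"Do you feel the product is almost always of good quality?"\n'
    '"What do you think of our recent implementation of Project X57?"\n'
)


def load_draft_texts(handle, column="draft_text"):
    """Return the non-empty values of `column` from a CSV file handle."""
    reader = csv.DictReader(handle)
    return [row[column].strip() for row in reader if row[column].strip()]


loaded_texts = load_draft_texts(sample_csv)
print(loaded_texts)
```

The resulting list can be used in place of `draft_texts` in the cells that follow.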

Construct a question about the texts, using a placeholder so that each text can be inserted individually as a parameter of the question:

[5]:
question = QuestionFreeText(
    question_name="review_questions",
    question_text="""Consider the following survey question: {{ draft_text }}
    Identify the problematic phrases in the excerpt and suggest a revised version of it.""",
)

Create “scenarios” of the question with the texts as parameters:

[6]:
scenarios = [Scenario({"draft_text": text}) for text in draft_texts]
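
To make the role of the `{{ draft_text }}` placeholder concrete, here is a plain-Python imitation of the substitution that happens for each scenario. EDSL performs its own rendering internally; the `render` function below is a hypothetical stand-in written for illustration, not the library's implementation.

```python
import re


def render(template, params):
    """Replace each {{ name }} placeholder in `template` with params[name]."""

    def substitute(match):
        return str(params[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


template = "Consider the following survey question: {{ draft_text }}"
example_texts = [
    "Do you feel the product is almost always of good quality?",
    "What do you think of our recent implementation of Project X57?",
]

# One rendered prompt per text, mirroring one scenario per text.
rendered_prompts = [render(template, {"draft_text": t}) for t in example_texts]
for prompt in rendered_prompts:
    print(prompt)
```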

Check available language models:

[7]:
Model.available()
[7]:
['claude-3-haiku-20240307',
 'claude-3-opus-20240229',
 'claude-3-sonnet-20240229',
 'dbrx-instruct',
 'gemini_pro',
 'gpt-3.5-turbo',
 'gpt-4-1106-preview',
 'llama-2-13b-chat-hf',
 'llama-2-70b-chat-hf',
 'mixtral-8x7B-instruct-v0.1']

Select a language model (GPT-4 is also the default):

[8]:
model = Model("gpt-4-1106-preview")

Administer the survey:

[9]:
results = question.by(scenarios).by(agent).by(model).run()
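
Each `.by()` call adds one dimension to the job. Assuming that every scenario, agent and model combination produces one row of results (which matches the four evaluations this notebook prints for four texts, one agent and one model), the size of a run can be estimated before launching it. The labels below are placeholders, not EDSL objects.

```python
from itertools import product

scenario_labels = ["text_1", "text_2", "text_3", "text_4"]  # four draft texts
agent_labels = ["survey_methodologist"]  # one agent
model_labels = ["gpt-4-1106-preview"]  # one model

# Assumption: one result row per (scenario, agent, model) combination.
combinations = list(product(scenario_labels, agent_labels, model_labels))
print(len(combinations))
```

Adding a second agent or model to the lists doubles the count, which is worth checking when model calls are billed per request.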

List the components of the results that are generated:

[10]:
results.columns
[10]:
['agent.agent_name',
 'agent.background',
 'answer.review_questions',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.temperature',
 'model.top_logprobs',
 'model.top_p',
 'prompt.review_questions_system_prompt',
 'prompt.review_questions_user_prompt',
 'raw_model_response.review_questions_raw_model_response',
 'scenario.draft_text']
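
The column names follow a `component.field` pattern, so they can be grouped by prefix to see which parts of the survey each one describes. A small sketch in plain Python, using a subset of the column names listed above:

```python
from collections import defaultdict

columns = [
    "agent.agent_name",
    "agent.background",
    "answer.review_questions",
    "iteration.iteration",
    "model.model",
    "model.temperature",
    "prompt.review_questions_user_prompt",
    "scenario.draft_text",
]

# Split each name at the first dot and collect the fields per component.
grouped = defaultdict(list)
for column in columns:
    prefix, _, field = column.partition(".")
    grouped[prefix].append(field)

for prefix in sorted(grouped):
    print(prefix, grouped[prefix])
```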

Print select components of the results:

[11]:
(
    results.select("scenario.*", "answer.*").print(
        pretty_labels={
            "scenario.draft_text": "Draft text",
            "answer.review_questions": "Evaluation",
        }
    )
)
Draft text | Evaluation
Do you feel the product is almost always of good quality? | The phrase 'almost always' is problematic because it is vague and can be interpreted differently by respondents, leading to inconsistent data. Additionally, 'good quality' is subjective without a clear standard for what constitutes 'good.' A revised version could be: 'How often do you find the product to be of high quality?' with response options such as 'Always,' 'Often,' 'Sometimes,' 'Rarely,' or 'Never.' This provides a clearer metric for respondents to evaluate the product's quality.
On a scale of 1 to 5, where 1 means strongly agree and 5 means strongly disagree, how satisfied are you with our service? | The problematic phrase in the question is the scale description, which mixes agreement with satisfaction levels. The scale should consistently measure one concept. A revised version could be: 'On a scale of 1 to 5, where 1 means very satisfied and 5 means very dissatisfied, how satisfied are you with our service?'
Do you believe our IT team's collaborative synergy effectively optimizes our digital infrastructure? | The original survey question contains jargon and complex phrases that may not be clearly understood by all respondents. The phrase 'collaborative synergy' is buzzword-heavy and may not convey a specific meaning, while 'effectively optimizes' could be too technical or vague. A revised version of the question could be: 'Do you think our IT team works well together to improve our digital systems?' This version is more straightforward and uses simpler language, making it easier for respondents to understand and answer accurately.
What do you think of our recent implementation of Project X57? | The phrase 'What do you think of' in the survey question 'What do you think of our recent implementation of Project X57?' is somewhat vague and open-ended, which could lead to a wide range of responses that may not be easily quantifiable or directly actionable. Additionally, 'our recent implementation' and 'Project X57' assume that the respondent is familiar with the project and its implementation timeline, which may not always be the case. A revised version of the question could be: 'How satisfied are you with the implementation of Project X57? Please rate your satisfaction on a scale from 1 (very dissatisfied) to 5 (very satisfied).' This version provides a specific scale for respondents to express their level of satisfaction, which makes the data easier to analyze and compare. It also implies a direct question about the implementation itself, which is the focus of the inquiry.

Qualitative reviews

In this example we use a set of hypothetical customer service tickets and prompt a model to extract a set of themes that we could use in follow-on questions (e.g., as a set of options to multiple choice questions).

[12]:
from edsl.questions import QuestionList
[13]:
tickets = [
    "I waited for 20 minutes past the estimated arrival time, and the driver still hasn't arrived. This made me late for my appointment.",
    "The driver was very rude and had an unpleasant attitude during the entire ride. It was an uncomfortable experience.",
    "The driver was speeding and frequently changing lanes without signaling. I felt unsafe throughout the ride.",
    "The car I rode in was dirty and messy. There were crumbs on the seats, and it didn't look like it had been cleaned in a while.",
    "The driver took a longer route, which resulted in a significantly higher fare than expected. I believe they intentionally extended the trip.",
    "I was charged for a ride that I did not take. The ride appears on my account, but I was not in the vehicle at that time.",
    "I left my wallet in the car during my last ride. I've tried contacting the driver, but I haven't received a response.",
]

Create an agent with a relevant persona:

[14]:
a_customer_service = Agent(
    traits={
        "background": "You are an experienced customer service agent for a ridesharing company."
    }
)

Create a question about the texts:

[15]:
q_topics = QuestionList(
    question_name="ticket_topics",
    question_text="Create a list of the topics raised in these customer service tickets: {{ tickets_texts }}.",
)

Add the texts to the question:

[16]:
scenario = Scenario({"tickets_texts": "; ".join(tickets)})
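
Joining with `"; "` keeps the prompt compact. If the tickets themselves may contain semicolons, a numbered layout is one alternative that keeps the ticket boundaries unambiguous. The `number_tickets` helper below is hypothetical, and the two tickets are shortened versions of the ones above.

```python
def number_tickets(ticket_list):
    """Format tickets as a numbered block, one ticket per line."""
    return "\n".join(
        f"{i}. {text}" for i, text in enumerate(ticket_list, start=1)
    )


example_tickets = [
    "The driver was very rude and had an unpleasant attitude during the entire ride.",
    "The car I rode in was dirty and messy.",
]

numbered = number_tickets(example_tickets)
print(numbered)
```

The returned string could be passed as the `tickets_texts` value in place of the joined string.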

Generate results:

[17]:
topics = q_topics.by(scenario).by(a_customer_service).by(model).run()

Inspect the results:

[18]:
topics.select("ticket_topics").to_list()[0]
[18]:
['late arrival',
 'driver rudeness',
 'unsafe driving',
 'dirty vehicle',
 'route longer than expected',
 'incorrect fare',
 'unauthorized charge',
 'lost property']
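
The extracted topics can then serve as answer options for a follow-on question. The sketch below only assembles the arguments as a plain dictionary. It assumes that a multiple choice question type (for example EDSL's `QuestionMultipleChoice`) accepts the same `question_name`, `question_text` and `question_options` arguments used elsewhere in this notebook; the question name, wording and the extra "other" option are illustrative.

```python
topics_list = [
    "late arrival",
    "driver rudeness",
    "unsafe driving",
    "dirty vehicle",
    "route longer than expected",
    "incorrect fare",
    "unauthorized charge",
    "lost property",
]

# Add a catch-all option so a ticket is not forced into a listed topic.
options = topics_list + ["other"]

follow_up_spec = {
    "question_name": "primary_topic",  # hypothetical name
    "question_text": "Which topic best describes this ticket: {{ ticket }}",
    "question_options": options,
}
print(follow_up_spec["question_options"])
```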

Data labeling

In this example we prompt an LLM to rate the seriousness of tickets about safety issues.

For a more complex data labeling exercise, see also the Data Labeling Agents notebook.

[19]:
from edsl.questions import QuestionLinearScale
[20]:
safety_tickets = [
    "During my ride, I noticed that the driver was frequently checking their phone for directions, which made me a bit uncomfortable. It didn't feel like they were fully focused on the road.",
    "The driver had to brake abruptly to avoid a collision with another vehicle. It was a close call, and it left me feeling quite shaken. Please address this issue.",
    "I had a ride with a driver who was clearly speeding and weaving in and out of traffic. Their reckless driving put my safety at risk, and I'm very concerned about it.",
    "My ride was involved in a minor accident, and although no one was seriously injured, it was a scary experience. The driver is handling the situation, but I wanted to report it.",
    "I had a ride with a driver who exhibited aggressive and threatening behavior towards me during the trip. I felt genuinely unsafe and want this matter to be taken seriously.",
]
[21]:
q_rating = QuestionLinearScale(
    question_name="safety_rating",
    question_text="""On a scale from 0-10 rate the seriousness of the issue raised in this customer service ticket
    (0 = Not serious, 10 = Extremely serious): {{ ticket }}""",
    question_options=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
)
[22]:
scenarios = [Scenario({"ticket": safety_ticket}) for safety_ticket in safety_tickets]
[23]:
r_rating = q_rating.by(scenarios).by(a_customer_service).by(model).run()
[24]:
(r_rating.select("scenario.*", "answer.*").print())
scenario.ticket | answer.safety_rating_comment | answer.safety_rating
During my ride, I noticed that the driver was frequently checking their phone for directions, which made me a bit uncomfortable. It didn't feel like they were fully focused on the road. | While the driver was not involved in an accident, being distracted by a phone for directions can lead to unsafe driving conditions. It is important for drivers to prioritize safety and ensure they are fully focused on the road. This issue will be addressed with the driver to prevent future occurrences. | 6
The driver had to brake abruptly to avoid a collision with another vehicle. It was a close call, and it left me feeling quite shaken. Please address this issue. | The situation described was potentially dangerous and the customer's safety is our top priority. Although no actual collision occurred, the experience was distressing for the passenger and warrants a thorough investigation into the driver's conduct and the circumstances of the near-miss. | 7
I had a ride with a driver who was clearly speeding and weaving in and out of traffic. Their reckless driving put my safety at risk, and I'm very concerned about it. | Reckless driving is a serious safety concern and is treated with utmost urgency. Your safety is our top priority, and such behavior is unacceptable in our service. | 10
My ride was involved in a minor accident, and although no one was seriously injured, it was a scary experience. The driver is handling the situation, but I wanted to report it. | Any accident involving our rides is taken very seriously due to the potential for injury and the importance of safety. Even though no one was seriously injured, the experience can be traumatic and must be thoroughly investigated. | 8
I had a ride with a driver who exhibited aggressive and threatening behavior towards me during the trip. I felt genuinely unsafe and want this matter to be taken seriously. | Customer safety is our top priority. Aggressive and threatening behavior by a driver is completely unacceptable and is taken very seriously. We will investigate this matter immediately and take appropriate action. | 10
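
Once the tickets are labeled, the ratings can drive ordinary downstream logic such as triage. The sketch below copies the five ratings from the output above into a dictionary; the short ticket labels and the escalation cutoff are assumptions made for illustration.

```python
# Ratings taken from the results above, keyed by a shorthand label per ticket.
ratings = {
    "phone distraction": 6,
    "abrupt braking": 7,
    "reckless driving": 10,
    "minor accident": 8,
    "threatening behavior": 10,
}

ESCALATION_THRESHOLD = 8  # hypothetical cutoff for immediate follow-up

# Sort from most to least serious; ties keep their original order.
ranked = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
escalate = [label for label, score in ranked if score >= ESCALATION_THRESHOLD]
print(escalate)
```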

Creating new methods

We can use the question prompts to create new methods, such as a translator:

[25]:
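def translate_to_german(text):
    """Translate `text` into German and print the answer."""
    q = QuestionFreeText(
        question_name="deutsch",
        question_text="Please translate '{{ text }}' into German",
    )
    # No model is specified here, so the default model is used.
    result = q.by(Scenario({"text": text})).run()
    return result.select("deutsch").print()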
[26]:
translate_to_german("Hello, friend, have you been traveling?")
answer.deutsch
Hallo, Freund, bist du gereist?
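
The same pattern extends to other target languages by building the question text from a parameter. The sketch below covers only the prompt construction in plain Python; `translation_prompt` is a hypothetical helper, and the resulting string would be passed as `question_text` in the same way as in the German example.

```python
def translation_prompt(language):
    """Build a question text template asking for a translation into `language`."""
    return "Please translate '{{ text }}' into " + language


# One template per target language; the {{ text }} placeholder is left
# in place so that a scenario can fill it in later.
prompts = {lang: translation_prompt(lang) for lang in ["German", "French", "Spanish"]}
print(prompts["German"])
```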