EDSL is an open-source library for simulating surveys, experiments and other research with AI agents and large language models. Before running the code below, please ensure that you have installed the EDSL library and either activated remote inference from your Coop account or stored API keys for the language models that you want to use with EDSL. Please also see our documentation page for tips and tutorials on getting started using EDSL.

Selecting language models

A list of currently available models can be viewed here. To see a list of service providers:
from edsl import Model

Model.services()
    Service Name
0   anthropic
1   azure
2   bedrock
3   deep_infra
4   deepseek
5   google
6   groq
7   mistral
8   ollama
9   openai
10  perplexity
11  together
12  xai
To inspect the default model:
Model()
LanguageModel
   key                           value
0  model                         gpt-4o
1  parameters:temperature        0.500000
2  parameters:max_tokens         1000
3  parameters:top_p              1
4  parameters:frequency_penalty  0
5  parameters:presence_penalty   0
6  parameters:logprobs           False
7  parameters:top_logprobs       3
8  inference_service             openai
Here we select several models to compare their responses for the survey that we create in the steps below:
from edsl import ModelList

models = ModelList(
    Model(m) for m in ["gemini-1.5-flash", "gpt-4o", "claude-3-5-sonnet-20240620"]
)

Generating content

EDSL comes with a variety of standard survey question types, such as multiple choice, free text, etc. These can be selected based on the desired format of the response. See details about all types here. We can use QuestionFreeText to prompt the models to generate some content for our experiment:
from edsl import QuestionFreeText

q = QuestionFreeText(
    question_name = "poem",
    question_text = "Please draft a short poem about any topic. Return only the poem."
)
We generate a response to the question by adding the models to use with the by method and then calling the run method. This generates a Results object with a Result for each response to the question:
results = q.by(models).run()
To see a list of all components of results:
results.columns
0   agent.agent_index
1   agent.agent_instruction
2   agent.agent_name
3   answer.poem
4   cache_keys.poem_cache_key
5   cache_used.poem_cache_used
6   comment.poem_comment
7   generated_tokens.poem_generated_tokens
8   iteration.iteration
9   model.frequency_penalty
10  model.inference_service
11  model.logprobs
12  model.maxOutputTokens
13  model.max_tokens
14  model.model
15  model.model_index
16  model.presence_penalty
17  model.stopSequences
18  model.temperature
19  model.topK
20  model.topP
21  model.top_logprobs
22  model.top_p
23  prompt.poem_system_prompt
24  prompt.poem_user_prompt
25  question_options.poem_question_options
26  question_text.poem_question_text
27  question_type.poem_question_type
28  raw_model_response.poem_cost
29  raw_model_response.poem_one_usd_buys
30  raw_model_response.poem_raw_model_response
31  scenario.scenario_index
We can inspect components of the results individually:
results.select("model", "poem")
   | model.model | answer.poem
0  | gemini-1.5-flash | The old oak sighs, a whispered plea, Of sun-drenched days and memory. Its leaves, like coins, fall to the ground, A rustling song, without a sound.
1  | gpt-4o | In the hush of dawn’s embrace, Where whispers dance on morning’s face, A gentle breeze begins to weave, Stories that the night must leave. Petals wake with dewdrop dreams, Reflecting light in golden streams. The world, anew, in softest hues, Paints a canvas, fresh and true. Birds compose their morning song, A symphony where hearts belong. Nature’s chorus, pure and clear, Fills the air with hope and cheer. In this moment, time stands still, A promise held in every thrill. The day unfolds, a tender grace, In the hush of dawn’s embrace.
2  | claude-3-5-sonnet-20240620 | Whispers of Autumn Golden leaves dance on the breeze, A crisp chill nips at my knees. Pumpkins grin with candlelit faces, As nature dons her russet laces. Harvest moon hangs low and bright, Guiding spirits through the night. Autumn’s spell, so bittersweet, Makes time both linger and fleet.
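In the output above, the short names passed to select ("model", "poem") appear as the full column names model.model and answer.poem. The following is a minimal sketch of that kind of lookup in plain Python, assuming a short name is matched against the part of each column name after the dot. The `resolve` helper is hypothetical and is not EDSL's implementation; the column list is a subset of the columns shown earlier.

```python
# Hypothetical helper (not part of EDSL): map a short name to the single
# full "type.name" column whose part after the dot equals that name.
def resolve(short_name, columns):
    matches = [c for c in columns if c.split(".", 1)[1] == short_name]
    if len(matches) != 1:
        raise KeyError(f"{short_name!r} matched {len(matches)} columns")
    return matches[0]

# A subset of the column names listed by results.columns
columns = [
    "agent.agent_name",
    "answer.poem",
    "comment.poem_comment",
    "model.model",
    "model.temperature",
]

selected = [resolve(name, columns) for name in ("model", "poem")]
print(selected)  # ['model.model', 'answer.poem']
```

If a short name is ambiguous in your own results, passing the full column name (for example "answer.poem") avoids relying on any matching rule.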

Conducting a review

Next we create a question that asks a model to evaluate a response, which we pass in as an input to the new question:
from edsl import QuestionLinearScale

q_score = QuestionLinearScale(
    question_name = "score",
    question_text = "Please give the following poem a score. No easy grading! Poem: {{ scenario.poem }}",
    question_options = [0, 1, 2, 3, 4, 5],
    option_labels = {0: "Very poor", 5: "Excellent"},
)
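The `{{ scenario.poem }}` placeholder in the question text is replaced with a poem when the question is run with scenarios. The following is a simplified stand-in for that substitution using only the standard library, so the effect on the prompt can be seen without calling a model. The `fill` function is hypothetical and handles only `{{ scenario.<key> }}` placeholders; it is not how EDSL renders templates internally.

```python
import re

# Hypothetical helper: replace each {{ scenario.<key> }} placeholder
# with the matching value from a scenario dictionary.
def fill(template, scenario):
    pattern = re.compile(r"\{\{\s*scenario\.(\w+)\s*\}\}")
    return pattern.sub(lambda m: str(scenario[m.group(1)]), template)

question_text = (
    "Please give the following poem a score. No easy grading! "
    "Poem: {{ scenario.poem }}"
)
scenario = {"poem": "The old oak sighs, a whispered plea"}  # shortened excerpt

prompt = fill(question_text, scenario)
print(prompt)
```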

Parameterizing questions

We use Scenario objects to add each response to the new question. EDSL comes with many methods for creating scenarios from different data sources (PDFs, CSVs, docs, images, lists, etc.), as well as Results objects:
scenarios = (
    results.to_scenario_list()
    .select("model", "poem")
    .rename({"model": "drafting_model"}) # renaming the 'model' field to distinguish the evaluating model
)
scenarios
ScenarioList scenarios: 3; keys: ['drafting_model', 'poem'];
   | drafting_model | poem
0  | gemini-1.5-flash | The old oak sighs, a whispered plea, Of sun-drenched days and memory. Its leaves, like coins, fall to the ground, A rustling song, without a sound.
1  | gpt-4o | In the hush of dawn’s embrace, Where whispers dance on morning’s face, A gentle breeze begins to weave, Stories that the night must leave. Petals wake with dewdrop dreams, Reflecting light in golden streams. The world, anew, in softest hues, Paints a canvas, fresh and true. Birds compose their morning song, A symphony where hearts belong. Nature’s chorus, pure and clear, Fills the air with hope and cheer. In this moment, time stands still, A promise held in every thrill. The day unfolds, a tender grace, In the hush of dawn’s embrace.
2  | claude-3-5-sonnet-20240620 | Whispers of Autumn Golden leaves dance on the breeze, A crisp chill nips at my knees. Pumpkins grin with candlelit faces, As nature dons her russet laces. Harvest moon hangs low and bright, Guiding spirits through the night. Autumn’s spell, so bittersweet, Makes time both linger and fleet.
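The select and rename steps above keep two fields per row and relabel one of them. A rough plain-Python equivalent on a list of dictionaries is sketched below. The `select` and `rename` functions here are illustrative stand-ins, not the EDSL methods, the poems are shortened excerpts, and the extra `iteration` key is included only to show a field being dropped.

```python
# Illustrative rows; poems are shortened and "iteration" is an extra
# field added only so that select has something to drop.
rows = [
    {"model": "gemini-1.5-flash", "poem": "The old oak sighs...", "iteration": 0},
    {"model": "gpt-4o", "poem": "In the hush of dawn's embrace...", "iteration": 0},
    {"model": "claude-3-5-sonnet-20240620", "poem": "Whispers of Autumn...", "iteration": 0},
]

def select(rows, *keys):
    # Keep only the requested keys in each row, in the order given
    return [{k: row[k] for k in keys} for row in rows]

def rename(rows, mapping):
    # Replace keys found in mapping; leave all other keys unchanged
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]

scenarios = rename(select(rows, "model", "poem"), {"model": "drafting_model"})
print(list(scenarios[0].keys()))  # ['drafting_model', 'poem']
```

Renaming matters here because the evaluation results have their own model column for the evaluating model; without the rename, the drafting model and the evaluating model would share a name.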
Finally, we conduct the evaluation by having each model score each poem that was generated (without information about whether the model itself was the source):
results = q_score.by(scenarios).by(models).run()
results.columns
0   agent.agent_index
1   agent.agent_instruction
2   agent.agent_name
3   answer.score
4   cache_keys.score_cache_key
5   cache_used.score_cache_used
6   comment.score_comment
7   generated_tokens.score_generated_tokens
8   iteration.iteration
9   model.frequency_penalty
10  model.inference_service
11  model.logprobs
12  model.maxOutputTokens
13  model.max_tokens
14  model.model
15  model.model_index
16  model.presence_penalty
17  model.stopSequences
18  model.temperature
19  model.topK
20  model.topP
21  model.top_logprobs
22  model.top_p
23  prompt.score_system_prompt
24  prompt.score_user_prompt
25  question_options.score_question_options
26  question_text.score_question_text
27  question_type.score_question_type
28  raw_model_response.score_cost
29  raw_model_response.score_one_usd_buys
30  raw_model_response.score_raw_model_response
31  scenario.drafting_model
32  scenario.poem
33  scenario.scenario_index
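Running the scoring question with three scenarios and three models gives one result for every combination of poem and evaluating model. The sketch below enumerates those combinations with the standard library to make the size of the design explicit; it only counts pairs and does not call EDSL or any model.

```python
from itertools import product

model_names = ["gemini-1.5-flash", "gpt-4o", "claude-3-5-sonnet-20240620"]

# Each poem is identified by the model that drafted it; every model
# then evaluates every poem, including its own.
pairs = list(product(model_names, model_names))  # (drafting_model, evaluating_model)
self_reviews = [(d, e) for d, e in pairs if d == e]

print(len(pairs))         # 9
print(len(self_reviews))  # 3
```

The three self-review pairs are the ones to compare against the other six when looking for a preference toward a model's own output.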
results.sort_by("drafting_model", "model").select("drafting_model", "model", "poem", "score", "score_comment")
   | scenario.drafting_model | model.model | scenario.poem | answer.score | comment.score_comment
0  | claude-3-5-sonnet-20240620 | claude-3-5-sonnet-20240620 | Whispers of Autumn Golden leaves dance on the breeze, A crisp chill nips at my knees. Pumpkins grin with candlelit faces, As nature dons her russet laces. Harvest moon hangs low and bright, Guiding spirits through the night. Autumn’s spell, so bittersweet, Makes time both linger and fleet. | 4 | This poem demonstrates strong imagery and evocative language, capturing the essence of autumn effectively. The rhyme scheme and meter are consistent, and there are some nice poetic devices like alliteration and personification. While it’s a well-crafted poem, it doesn’t quite reach the level of excellence due to its somewhat conventional approach to the subject matter.
1  | claude-3-5-sonnet-20240620 | gemini-1.5-flash | Whispers of Autumn Golden leaves dance on the breeze, A crisp chill nips at my knees. Pumpkins grin with candlelit faces, As nature dons her russet laces. Harvest moon hangs low and bright, Guiding spirits through the night. Autumn’s spell, so bittersweet, Makes time both linger and fleet. | 3 | The poem is competently written, employing imagery and rhythm effectively to evoke the feeling of autumn. However, it lacks originality or depth; the imagery is fairly standard for autumnal poems, and the “bittersweet” sentiment is somewhat cliché. It’s well-crafted but not particularly memorable or innovative.
2  | claude-3-5-sonnet-20240620 | gpt-4o | Whispers of Autumn Golden leaves dance on the breeze, A crisp chill nips at my knees. Pumpkins grin with candlelit faces, As nature dons her russet laces. Harvest moon hangs low and bright, Guiding spirits through the night. Autumn’s spell, so bittersweet, Makes time both linger and fleet. | 4 | The poem captures the essence of autumn beautifully with vivid imagery and a rhythmic flow. It effectively conveys the season’s atmosphere and emotions, though it could benefit from more depth or unique perspective to achieve an “excellent” score.
3  | gemini-1.5-flash | claude-3-5-sonnet-20240620 | The old oak sighs, a whispered plea, Of sun-drenched days and memory. Its leaves, like coins, fall to the ground, A rustling song, without a sound. | 4 | This poem demonstrates strong imagery, effective use of metaphor, and a pleasing rhythm. The personification of the oak tree and the comparison of leaves to coins are evocative. The final line creates a nice paradox. While not perfect, it’s a well-crafted short poem.
4  | gemini-1.5-flash | gemini-1.5-flash | The old oak sighs, a whispered plea, Of sun-drenched days and memory. Its leaves, like coins, fall to the ground, A rustling song, without a sound. | 4 | The poem uses strong imagery (“sun-drenched days,” “leaves, like coins,” “rustling song, without a sound”) and personification (“old oak sighs, a whispered plea”) to create a evocative mood. While not groundbreaking, it’s well-crafted and emotionally resonant.
5  | gemini-1.5-flash | gpt-4o | The old oak sighs, a whispered plea, Of sun-drenched days and memory. Its leaves, like coins, fall to the ground, A rustling song, without a sound. | 4 | The poem effectively uses imagery and personification to evoke a sense of nostalgia and the passage of time. The comparison of leaves to coins is particularly vivid, and the phrase “rustling song, without a sound” adds an intriguing paradox. However, the poem is relatively short and could benefit from further development to achieve a higher score.
6  | gpt-4o | claude-3-5-sonnet-20240620 | In the hush of dawn’s embrace, Where whispers dance on morning’s face, A gentle breeze begins to weave, Stories that the night must leave. Petals wake with dewdrop dreams, Reflecting light in golden streams. The world, anew, in softest hues, Paints a canvas, fresh and true. Birds compose their morning song, A symphony where hearts belong. Nature’s chorus, pure and clear, Fills the air with hope and cheer. In this moment, time stands still, A promise held in every thrill. The day unfolds, a tender grace, In the hush of dawn’s embrace. | 4 | This poem demonstrates strong imagery, consistent rhythm, and effective use of literary devices like alliteration and metaphor. The language is evocative and creates a vivid sensory experience. While it’s a well-crafted piece, it doesn’t quite reach the level of excellence due to its somewhat conventional theme and imagery.
7  | gpt-4o | gemini-1.5-flash | In the hush of dawn’s embrace, Where whispers dance on morning’s face, A gentle breeze begins to weave, Stories that the night must leave. Petals wake with dewdrop dreams, Reflecting light in golden streams. The world, anew, in softest hues, Paints a canvas, fresh and true. Birds compose their morning song, A symphony where hearts belong. Nature’s chorus, pure and clear, Fills the air with hope and cheer. In this moment, time stands still, A promise held in every thrill. The day unfolds, a tender grace, In the hush of dawn’s embrace. | 4 | The poem is well-written and evokes a strong sense of imagery and emotion. The language is beautiful and the structure is consistent. However, it lacks a certain level of originality or complexity to reach a “5”. There’s a predictability to the imagery and metaphors that prevents it from being truly exceptional.
8  | gpt-4o | gpt-4o | In the hush of dawn’s embrace, Where whispers dance on morning’s face, A gentle breeze begins to weave, Stories that the night must leave. Petals wake with dewdrop dreams, Reflecting light in golden streams. The world, anew, in softest hues, Paints a canvas, fresh and true. Birds compose their morning song, A symphony where hearts belong. Nature’s chorus, pure and clear, Fills the air with hope and cheer. In this moment, time stands still, A promise held in every thrill. The day unfolds, a tender grace, In the hush of dawn’s embrace. | 5 | The poem beautifully captures the serene and hopeful atmosphere of dawn with vivid imagery and a rhythmic flow. Each stanza contributes to a cohesive narrative that evokes a sense of peace and renewal, making it an excellent piece.
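One simple way to read the table above for self-preference is to compare the score each evaluating model gave its own poem with the mean score it gave the other two poems. The sketch below does this in plain Python with the nine scores copied from the table. The `self_preference` function and the "gap" measure are illustrative choices, not part of EDSL, and with one poem per model and a single run the numbers are far too few to support conclusions.

```python
from statistics import mean

# (drafting_model, evaluating_model, score), copied from the results table
scores = [
    ("claude-3-5-sonnet-20240620", "claude-3-5-sonnet-20240620", 4),
    ("claude-3-5-sonnet-20240620", "gemini-1.5-flash", 3),
    ("claude-3-5-sonnet-20240620", "gpt-4o", 4),
    ("gemini-1.5-flash", "claude-3-5-sonnet-20240620", 4),
    ("gemini-1.5-flash", "gemini-1.5-flash", 4),
    ("gemini-1.5-flash", "gpt-4o", 4),
    ("gpt-4o", "claude-3-5-sonnet-20240620", 4),
    ("gpt-4o", "gemini-1.5-flash", 4),
    ("gpt-4o", "gpt-4o", 5),
]

def self_preference(evaluator):
    # Score given to the evaluator's own poem minus the mean score
    # it gave to poems drafted by the other models
    own = [s for d, e, s in scores if e == evaluator and d == evaluator]
    others = [s for d, e, s in scores if e == evaluator and d != evaluator]
    return mean(own) - mean(others)

evaluators = sorted({e for _, e, _ in scores})
gaps = {e: self_preference(e) for e in evaluators}

for evaluator, gap in gaps.items():
    print(evaluator, gap)
```

In this run the gap is 0 for claude-3-5-sonnet-20240620, 0.5 for gemini-1.5-flash and 1 for gpt-4o. Repeating the experiment with more poems per model and multiple iterations would be needed before treating such gaps as evidence of bias.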

Posting to the Coop

The Coop is a platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL and accessible from your workspace or Coop account page. Learn more about creating an account and using the Coop. Here we post this notebook:
from edsl import Notebook

nb = Notebook(path = "explore_llm_biases.ipynb")

# Set to True to post the notebook as a new Coop object;
# leave as False to update the existing object at the URL below.
refresh = False

if refresh:
    nb.push(
        description = "Example code for comparing model responses and biases",
        alias = "explore-llm-biases-notebook",
        visibility = "public"
    )
else:
    nb.patch("https://www.expectedparrot.com/content/RobinHorton/explore-llm-biases-notebook", value = nb)