River Problem

This notebook provides sample EDSL code for exploring how well large language models can provide and evaluate solutions to a river crossing problem, in which the goal is to transport items across a river efficiently, subject to limits on how many items can be carried at once and on which combinations of items can be left together unattended.

In a popular version of the problem, a farmer needs to transport a wolf, a goat and a cabbage but cannot leave the wolf with the goat or the goat with the cabbage, as the goat and the cabbage would be eaten.

There are several things we want to learn in using LLMs to explore this problem:

  1. Are models capable of providing valid, efficient solutions? If so, what level of instruction is needed, and does it matter how we prompt the model to format its solution?

  2. When models do provide solutions, are they easily dissuaded from trusting those solutions?

  3. When models are given valid solutions, can they easily be convinced that the solutions are incorrect?

The notebook has multiple sections:

Proposing solutions: We prompt models to provide solutions for the problem in different ways, and then ask the models about their confidence in their solutions.

Selecting solutions: We prompt models to identify a correct solution from a list of otherwise incorrect options, and then ask them about their confidence in their selections.

Evaluating solutions: We give models valid solutions and then see whether the models can be convinced that the solutions are incorrect.

EDSL

EDSL is an open-source Python library for simulating surveys and experiments with AI agents and large language models. Please see our documentation page for tips and tutorials on getting started.

Proposing solutions

We start by describing the problem (Wikipedia) and constructing a question to prompt a model to provide an efficient solution for it:

[1]:
problem = """
A farmer with a wolf, a goat, and a cabbage must cross a river by boat.
The boat can carry only the farmer and a single item. If left unattended
together, the wolf would eat the goat, or the goat would eat the cabbage.
How can they cross the river without anything being eaten?
"""

Special instructions

The model may perform better if we explicitly note that items may be brought back across the river multiple times, as people who assume this is not allowed often get stuck. We can store this tip separately to compare how models perform with and without it:

[2]:
tip = "(Note that items may be carried back and forth across the river.)"
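The tip matters because the shortest solution requires carrying the goat back across the river once. As a minimal sketch, independent of EDSL, the puzzle can be solved exhaustively with a breadth-first search, which gives a reference answer for judging model responses. The names `ITEMS`, `FORBIDDEN`, `is_safe` and `solve` are illustrative and not part of the notebook:

```python
from collections import deque

ITEMS = ("wolf", "goat", "cabbage")
# Pairs that cannot be left together without the farmer.
FORBIDDEN = ({"wolf", "goat"}, {"goat", "cabbage"})


def is_safe(left_bank, farmer_on_left):
    """Return True if the bank without the farmer holds no forbidden pair."""
    unattended = set(ITEMS) - left_bank if farmer_on_left else left_bank
    return not any(pair <= unattended for pair in FORBIDDEN)


def solve():
    """Breadth-first search over (items on left bank, farmer on left?) states."""
    start = (frozenset(ITEMS), True)
    goal = (frozenset(), False)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        (left, farmer_left), path = queue.popleft()
        if (left, farmer_left) == goal:
            return path
        here = left if farmer_left else frozenset(ITEMS) - left
        # The farmer crosses alone (None) or with one item from his bank.
        for cargo in [None, *sorted(here)]:
            new_left = set(left)
            if cargo is not None:
                if farmer_left:
                    new_left.remove(cargo)
                else:
                    new_left.add(cargo)
            state = (frozenset(new_left), not farmer_left)
            if state not in seen and is_safe(state[0], state[1]):
                seen.add(state)
                direction = "left to right" if farmer_left else "right to left"
                who = cargo if cargo else "alone"
                step = f"Farmer moves {who} from {direction}."
                queue.append((state, path + [step]))
    return None


for step in solve():
    print(step)
```

Because the search is breadth-first, the path it returns uses the fewest possible crossings (seven for this puzzle), and it includes a step that moves the goat from right to left.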

Constructing questions

EDSL comes with many standard question types that we can choose from based on the form of the response that we want to get back (see examples of all question types). Here, we first ask the model to propose a solution to the problem as a textual response. We create 2 different versions of the question with and without the tip:

[3]:
from edsl.questions import QuestionFreeText

q_solution_text = QuestionFreeText(
    question_name="solution_text",
    question_text="Please provide an efficient, concise solution to this problem: "
    + problem,
)

q_solution_text_tip = QuestionFreeText(
    question_name="solution_text_tip",
    question_text="Please provide an efficient, concise solution to this problem: "
    + problem
    + tip,
)

We can also try prompting the model to format its response differently, for example as a list of steps instead of free text:

[4]:
from edsl.questions import QuestionList

q_solution_list = QuestionList(
    question_name="solution_list",
    question_text="Please provide an efficient, concise solution to this problem: "
    + problem
    + tip
    + """ Format your response as a list of steps like these:
    'Farmer moves <item> from left to right.' or 'Farmer moves alone from left to right.'""",
)
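One benefit of the fixed step format is that a returned list can be checked mechanically. The following is a rough sketch of such a check, not something EDSL provides: `check_steps` and its error messages are hypothetical, and the pattern is deliberately strict, so a step such as "Farmer moves the goat..." would be reported as unrecognized.

```python
import re

# Matches 'Farmer moves <item> from left to right.' and the 'alone' variant.
STEP = re.compile(
    r"Farmer moves (?P<cargo>\w+) from (?P<src>left|right) to (?P<dst>left|right)\.?$"
)
# Pairs that cannot be left together without the farmer.
FORBIDDEN = ({"wolf", "goat"}, {"goat", "cabbage"})


def check_steps(steps):
    """Simulate the steps; return (True, "ok") or (False, reason)."""
    banks = {"left": {"wolf", "goat", "cabbage"}, "right": set()}
    farmer = "left"
    for n, step in enumerate(steps, start=1):
        match = STEP.match(step.strip())
        if not match:
            return False, f"step {n}: unrecognized format"
        cargo, src, dst = match["cargo"], match["src"], match["dst"]
        if src != farmer or dst == src:
            return False, f"step {n}: farmer is not crossing from his own bank"
        if cargo != "alone":
            if cargo not in banks[src]:
                return False, f"step {n}: {cargo} is not on the {src} bank"
            banks[src].remove(cargo)
            banks[dst].add(cargo)
        farmer = dst
        # The bank the farmer just left is now unattended.
        if any(pair <= banks[src] for pair in FORBIDDEN):
            return False, f"step {n}: unsafe pair left on the {src} bank"
    if banks["left"] or farmer != "right":
        return False, "not everything reached the right bank"
    return True, "ok"


print(check_steps(["Farmer moves wolf from left to right."]))
```

The single-step example fails because moving the wolf first leaves the goat alone with the cabbage.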

We can add a follow-on question asking the model about its confidence in its solution. Here we pose the same follow-on question using several different question types to compare responses:

[5]:
from edsl.questions import (
    QuestionYesNo,
    QuestionFreeText,
    QuestionMultipleChoice,
    QuestionLinearScale,
)

question_text = "Are you confident in your solution?"

q_confidence1 = QuestionYesNo(
    question_name="confidence_yn", question_text=question_text
)

q_confidence2 = QuestionFreeText(
    question_name="confidence_ft", question_text=question_text
)

q_confidence3 = QuestionMultipleChoice(
    question_name="confidence_mc",
    question_text=question_text,
    question_options=["No", "Yes", "Somewhat"],
)

q_confidence4 = QuestionLinearScale(
    question_name="confidence_ls",
    question_text=question_text,
    question_options=[0, 1, 2, 3, 4, 5],
    option_labels={0: "I am not at all confident.", 5: "I am very confident."},
)

We combine these questions in a Survey in order to administer them together. Here we create separate surveys to compare responses with and without the tip and as a list of steps:

[6]:
from edsl import Survey

survey = Survey(
    [q_solution_text, q_confidence1, q_confidence2, q_confidence3, q_confidence4]
)

survey_tip = Survey(
    [q_solution_text_tip, q_confidence1, q_confidence2, q_confidence3, q_confidence4]
)

survey_list = Survey(
    [q_solution_list, q_confidence1, q_confidence2, q_confidence3, q_confidence4]
)

Adding survey rules

Survey questions are administered to models asynchronously by default (for speed and to minimize tokens consumed). We can also choose whether to give the model information about prior questions and responses when it answers other questions. Here we want the model to know about its proposed solution when answering each of the follow-on questions. We do this by adding a memory of the solution question to each individual follow-on question, and we repeat this for the solution questions with and without the tip. Note that this is different from giving the model cumulative information: each version of the confidence question is asked afresh, with no memory of the other confidence questions:

[7]:
survey = (
    survey.add_targeted_memory(q_confidence1, q_solution_text)
    .add_targeted_memory(q_confidence2, q_solution_text)
    .add_targeted_memory(q_confidence3, q_solution_text)
    .add_targeted_memory(q_confidence4, q_solution_text)
)

survey_tip = (
    survey_tip.add_targeted_memory(q_confidence1, q_solution_text_tip)
    .add_targeted_memory(q_confidence2, q_solution_text_tip)
    .add_targeted_memory(q_confidence3, q_solution_text_tip)
    .add_targeted_memory(q_confidence4, q_solution_text_tip)
)

survey_list = (
    survey_list.add_targeted_memory(q_confidence1, q_solution_list)
    .add_targeted_memory(q_confidence2, q_solution_list)
    .add_targeted_memory(q_confidence3, q_solution_list)
    .add_targeted_memory(q_confidence4, q_solution_list)
)
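The three blocks above repeat one pattern: every follow-on question receives a memory of a single solution question. A small helper could express this once. This is only a sketch: `add_memories` is a hypothetical name, it assumes that `add_targeted_memory` returns the updated survey (as the chained calls above suggest), and `RecordingSurvey` is a stand-in class so that the example runs without EDSL installed.

```python
from functools import reduce


def add_memories(survey, follow_ups, focal_question):
    """Chain survey.add_targeted_memory(q, focal_question) over follow_ups."""
    return reduce(
        lambda s, q: s.add_targeted_memory(q, focal_question), follow_ups, survey
    )


class RecordingSurvey:
    """Hypothetical stand-in for a Survey; it only records the memory pairs."""

    def __init__(self, memories=()):
        self.memories = tuple(memories)

    def add_targeted_memory(self, question, prior_question):
        return RecordingSurvey(self.memories + ((question, prior_question),))


demo = add_memories(
    RecordingSurvey(), ["confidence_yn", "confidence_ft"], "solution_text"
)
print(demo.memories)
```

With the objects defined in this notebook, the equivalent call would be `survey = add_memories(survey, [q_confidence1, q_confidence2, q_confidence3, q_confidence4], q_solution_text)`.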

Designing AI agents to answer questions

We can optionally create one or more agents with relevant traits and instructions for a model to use in answering the questions. We do this by passing a dictionary of traits to an Agent object that we add to the survey when we run it. (Learn more about using agents to answer surveys.) Here we create a set of agents with and without personas and special instructions to explore how these affect responses:

[8]:
from edsl import Agent

instructions = [
    "",  # An empty instruction for comparison
    """You are being asked to provide and evaluate solutions to a classic
                'river crossing problem'. In answering questions, be sure to carefully
                consider the constraints of the given problem and strategies that may
                be helpful in identifying correct solutions, such as backtracking.""",
]

personas = [
    "",  # An empty persona description for comparison
    "You are a computer scientist.",
]

agents = [
    Agent(traits={"persona": p}, instruction=i) for p in personas for i in instructions
]
agents
[8]:
[Agent(traits = {'persona': ''}),
 Agent(traits = {'persona': ''}, instruction = 'You are being asked to provide and evaluate solutions to a classic
                 'river crossing problem'. In answering questions, be sure to carefully
                 consider the constraints of the given problem and strategies that may
                 be helpful in identifying correct solutions, such as backtracking.'),
 Agent(traits = {'persona': 'You are a computer scientist.'}),
 Agent(traits = {'persona': 'You are a computer scientist.'}, instruction = 'You are being asked to provide and evaluate solutions to a classic
                 'river crossing problem'. In answering questions, be sure to carefully
                 consider the constraints of the given problem and strategies that may
                 be helpful in identifying correct solutions, such as backtracking.')]

Selecting language models

We can also specify the language models that we want to use to generate responses. If none is specified, EDSL will use GPT 4 preview by default (learn more about specifying models). Here we import Model and show how to list the currently available models:

[9]:
from edsl import Model

# To see a list of currently available models:
# Model.available()

We create Model objects for the models that we want to add to the survey. Here we’ll compare GPT 3.5 and 4:

[10]:
models = [Model(m) for m in ["gpt-3.5-turbo", "gpt-4-1106-preview"]]

Generating results

Now we can generate responses by calling the run method on the surveys, after adding agents and models with the by method:

[12]:
results = survey.by(agents).by(models).run()
[13]:
results_tip = survey_tip.by(agents).by(models).run()
[14]:
results_list = survey_list.by(agents).by(models).run()
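To get a sense of how many responses each run produces, it helps to count the agent and model combinations. The sketch below assumes that `by` pairs every agent with every model, which is what the eight rows in the results table further down suggest. The tuples are placeholder labels, not the actual Agent and Model objects:

```python
from itertools import product

# Placeholder labels standing in for the Agent and Model objects defined above.
agents = [
    ("no persona", "no instruction"),
    ("no persona", "river-crossing instruction"),
    ("computer scientist", "no instruction"),
    ("computer scientist", "river-crossing instruction"),
]
models = ["gpt-3.5-turbo", "gpt-4-1106-preview"]

# One combination per (agent, model) pair.
combinations = list(product(agents, models))
print(len(combinations))  # 8
```

Each of the three surveys is therefore answered eight times, once per combination.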

This generates Results objects, which contain information about all the components of the responses. We can list the components that we can access as datasets:

[15]:
results.columns
[15]:
['agent.agent_instruction',
 'agent.agent_name',
 'agent.persona',
 'answer.confidence_ft',
 'answer.confidence_ls',
 'answer.confidence_mc',
 'answer.confidence_yn',
 'answer.solution_text',
 'comment.confidence_ls_comment',
 'comment.confidence_mc_comment',
 'comment.confidence_yn_comment',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.temperature',
 'model.top_logprobs',
 'model.top_p',
 'prompt.confidence_ft_system_prompt',
 'prompt.confidence_ft_user_prompt',
 'prompt.confidence_ls_system_prompt',
 'prompt.confidence_ls_user_prompt',
 'prompt.confidence_mc_system_prompt',
 'prompt.confidence_mc_user_prompt',
 'prompt.confidence_yn_system_prompt',
 'prompt.confidence_yn_user_prompt',
 'prompt.solution_text_system_prompt',
 'prompt.solution_text_user_prompt',
 'question_options.confidence_ft_question_options',
 'question_options.confidence_ls_question_options',
 'question_options.confidence_mc_question_options',
 'question_options.confidence_yn_question_options',
 'question_options.solution_text_question_options',
 'question_text.confidence_ft_question_text',
 'question_text.confidence_ls_question_text',
 'question_text.confidence_mc_question_text',
 'question_text.confidence_yn_question_text',
 'question_text.solution_text_question_text',
 'question_type.confidence_ft_question_type',
 'question_type.confidence_ls_question_type',
 'question_type.confidence_mc_question_type',
 'question_type.confidence_yn_question_type',
 'question_type.solution_text_question_type',
 'raw_model_response.confidence_ft_raw_model_response',
 'raw_model_response.confidence_ls_raw_model_response',
 'raw_model_response.confidence_mc_raw_model_response',
 'raw_model_response.confidence_yn_raw_model_response',
 'raw_model_response.solution_text_raw_model_response']
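The column names follow a `<component>.<field>` pattern, so they can be grouped by component to get a quick overview of what is available. The snippet below is a plain-Python sketch that uses a handful of the names from the output above; in the notebook, `results.columns` would supply the full list:

```python
from collections import defaultdict

# A few of the column names shown above.
columns = [
    "agent.persona",
    "answer.confidence_yn",
    "answer.solution_text",
    "model.model",
    "prompt.solution_text_user_prompt",
]

# Split each name at the first dot into its component and field.
by_component = defaultdict(list)
for name in columns:
    component, field = name.split(".", 1)
    by_component[component].append(field)

for component, fields in by_component.items():
    print(f"{component}: {fields}")
```

The field part is what the later `select` calls in this notebook use, for example `"persona"` and `"solution_text"`.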

EDSL has many built-in methods for analyzing results as datasets. Here we first print just the answers:

[16]:
results.select("model", "persona", "agent_instruction", "solution_text").print(
    format="rich"
)
┏━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ model               agent                         agent                         answer                       ┃
┃ .model              .persona                      .agent_instruction            .solution_text               ┃
┡━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ gpt-3.5-turbo                                     You are answering questions   The farmer first takes the   │
│                                                   as if you were a human. Do    goat across the river,       │
│                                                   not break character.          leaving the wolf and cabbage │
│                                                                                 on the original side. Then,  │
│                                                                                 the farmer goes back alone   │
│                                                                                 and takes the wolf across    │
│                                                                                 the river. The farmer leaves │
│                                                                                 the wolf on the other side   │
│                                                                                 and takes the goat back to   │
│                                                                                 the original side. Finally,  │
│                                                                                 the farmer leaves the goat   │
│                                                                                 and takes the cabbage across │
│                                                                                 the river. The farmer then   │
│                                                                                 goes back for the goat, and  │
│                                                                                 all three items and the      │
│                                                                                 farmer are safely across the │
│                                                                                 river without anything being │
│                                                                                 eaten.                       │
├────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ gpt-4-1106-preview                                You are answering questions   The farmer takes the goat    │
│                                                   as if you were a human. Do    across the river first and   │
│                                                   not break character.          leaves it on the other side. │
│                                                                                 Then he goes back and takes  │
│                                                                                 the wolf across the river.   │
│                                                                                 He leaves the wolf on the    │
│                                                                                 other side but takes the     │
│                                                                                 goat back with him. Next, he │
│                                                                                 takes the cabbage across the │
│                                                                                 river and leaves it with the │
│                                                                                 wolf. Finally, he returns to │
│                                                                                 pick up the goat and brings  │
│                                                                                 it across the river. This    │
│                                                                                 way, the goat and the        │
│                                                                                 cabbage are never left alone │
│                                                                                 together, and the wolf is    │
│                                                                                 not left alone with the      │
│                                                                                 goat.                        │
├────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ gpt-3.5-turbo                                     You are being asked to        The farmer first takes the   │
│                                                   provide and evaluate          goat across the river,       │
│                                                   solutions to a classic        leaving the wolf and cabbage │
│                                                                   'river        on the original side. The    │
│                                                   crossing problem'. In         farmer then goes back alone  │
│                                                   answering questions, be sure  and takes the wolf across    │
│                                                   to carefully                  the river. The farmer leaves │
│                                                                   consider the  the wolf on the other side   │
│                                                   constraints of the given      and takes the goat back with │
│                                                   problem and strategies that   him. Finally, the farmer     │
│                                                   may                           leaves the goat on the       │
│                                                                   be helpful    original side and takes the  │
│                                                   in identifying correct        cabbage across the river.    │
│                                                   solutions, such as            The farmer then goes back    │
│                                                   backtracking.                 alone to the original side   │
│                                                                                 to complete the crossing.    │
├────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ gpt-4-1106-preview                                You are being asked to        1. The farmer takes the goat │
│                                                   provide and evaluate          across the river and leaves  │
│                                                   solutions to a classic        it on the other side. He     │
│                                                                   'river        then returns alone to the    │
│                                                   crossing problem'. In         original side. 2. The farmer │
│                                                   answering questions, be sure  takes the wolf across the    │
│                                                   to carefully                  river. He leaves the wolf on │
│                                                                   consider the  the other side and takes the │
│                                                   constraints of the given      goat back with him to the    │
│                                                   problem and strategies that   original side. 3. The farmer │
│                                                   may                           leaves the goat on the       │
│                                                                   be helpful    original side and takes the  │
│                                                   in identifying correct        cabbage across the river,    │
│                                                   solutions, such as            leaving it with the wolf. He │
│                                                   backtracking.                 then returns alone to the    │
│                                                                                 original side. 4. Finally,   │
│                                                                                 the farmer takes the goat    │
│                                                                                 across the river again. All  │
│                                                                                 four are now on the other    │
│                                                                                 side safely, and nothing was │
│                                                                                 eaten.                       │
├────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ gpt-3.5-turbo       You are a computer            You are answering questions   The farmer first takes the   │
│                     scientist.                    as if you were a human. Do    goat across the river and    │
│                                                   not break character.          leaves it there. Then, he    │
│                                                                                 goes back and takes the wolf │
│                                                                                 across. He leaves the wolf   │
│                                                                                 on the other side and takes  │
│                                                                                 the goat back with him. He   │
│                                                                                 leaves the goat and takes    │
│                                                                                 the cabbage across. Finally, │
│                                                                                 he goes back for the goat    │
│                                                                                 and takes it across. Now,    │
│                                                                                 all three are safely on the  │
│                                                                                 other side of the river.     │
├────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ gpt-4-1106-preview  You are a computer            You are answering questions   The farmer takes the goat    │
│                     scientist.                    as if you were a human. Do    across the river first and   │
│                                                   not break character.          leaves it on the other side. │
│                                                                                 He then returns alone to the │
│                                                                                 original side and takes the  │
│                                                                                 cabbage across next.         │
│                                                                                 However, instead of leaving  │
│                                                                                 the cabbage with the goat,   │
│                                                                                 he brings the goat back with │
│                                                                                 him to the original side. He │
│                                                                                 leaves the goat and takes    │
│                                                                                 the wolf across the river,   │
│                                                                                 leaving it with the cabbage. │
│                                                                                 Finally, he returns alone to │
│                                                                                 pick up the goat and takes   │
│                                                                                 it across the river one last │
│                                                                                 time. This way, the wolf is  │
│                                                                                 never left alone with the    │
│                                                                                 goat, and the goat is never  │
│                                                                                 left alone with the cabbage. │
├────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ gpt-3.5-turbo       You are a computer            You are being asked to        The farmer first takes the   │
│                     scientist.                    provide and evaluate          goat across the river, then  │
│                                                   solutions to a classic        goes back alone. Next, the   │
│                                                                   'river        farmer takes the cabbage     │
│                                                   crossing problem'. In         across the river and leaves  │
│                                                   answering questions, be sure  it there with the cabbage.   │
│                                                   to carefully                  The farmer then goes back to │
│                                                                   consider the  get the goat and brings it   │
│                                                   constraints of the given      across the river. Finally,   │
│                                                   problem and strategies that   the farmer goes back alone   │
│                                                   may                           to get the cabbage and       │
│                                                                   be helpful    brings it across the river.  │
│                                                   in identifying correct        In this way, all three items │
│                                                   solutions, such as            and the farmer safely cross  │
│                                                   backtracking.                 the river without anything   │
│                                                                                 being eaten.                 │
├────────────────────┼──────────────────────────────┼──────────────────────────────┼──────────────────────────────┤
│ gpt-4-1106-preview  You are a computer            You are being asked to        The farmer takes the         │
│                     scientist.                    provide and evaluate          following steps to cross the │
│                                                   solutions to a classic        river without anything being │
│                                                                   'river        eaten: 1. The farmer takes   │
│                                                   crossing problem'. In         the goat across the river    │
│                                                   answering questions, be sure  and leaves it on the other   │
│                                                   to carefully                  side. 2. The farmer returns  │
│                                                                   consider the  alone to the original side   │
│                                                   constraints of the given      to get either the wolf or    │
│                                                   problem and strategies that   the cabbage (let's say the   │
│                                                   may                           wolf). 3. The farmer takes   │
│                                                                   be helpful    the wolf across the river,   │
│                                                   in identifying correct        but he must bring the goat   │
│                                                   solutions, such as            back to the original side to │
│                                                   backtracking.                 prevent the wolf from eating │
│                                                                                 the goat. 4. The farmer      │
│                                                                                 leaves the goat on the       │
│                                                                                 original side and takes the  │
│                                                                                 cabbage across the river,    │
│                                                                                 leaving it with the wolf. 5. │
│                                                                                 Finally, the farmer returns  │
│                                                                                 alone to the original side   │
│                                                                                 to get the goat and brings   │
│                                                                                 it across the river. This    │
│                                                                                 way, all items are on the    │
│                                                                                 other side of the river, and │
│                                                                                 none has been eaten.         │
└────────────────────┴──────────────────────────────┴──────────────────────────────┴──────────────────────────────┘

Here we select the confidence responses:

[17]:
results.select(
    "model",
    "persona",
    "agent_instruction",
    "confidence_yn",
    "confidence_ft",
    "confidence_mc",
    "confidence_ls",
).print(format="rich")
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ model          agent          agent          answer          answer         answer          answer        ┃
┃ .model         .persona       .agent_instr…  .confidence_yn  .confidence_…  .confidence_mc  .confidence_… ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ gpt-3.5-turbo                 You are        Yes             Yes, I am      Yes             5             │
│                               answering                      confident in                                 │
│                               questions as                   the solution                                 │
│                               if you were a                  provided for                                 │
│                               human. Do not                  the farmer,                                  │
│                               break                          wolf, goat,                                  │
│                               character.                     and cabbage                                  │
│                                                              problem.                                     │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-4-1106-p…                 You are        Yes             Yes, I am      Yes             5             │
│                               answering                      confident in                                 │
│                               questions as                   the solution                                 │
│                               if you were a                  provided. It                                 │
│                               human. Do not                  ensures that                                 │
│                               break                          all parties                                  │
│                               character.                     reach the                                    │
│                                                              other side of                                │
│                                                              the river                                    │
│                                                              safely                                       │
│                                                              without any                                  │
│                                                              incidents of                                 │
│                                                              the goat or                                  │
│                                                              cabbage being                                │
│                                                              eaten.                                       │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-3.5-turbo                 You are being  Yes             Yes, I am      Yes             2             │
│                               asked to                       confident in                                 │
│                               provide and                    the solution                                 │
│                               evaluate                       provided for                                 │
│                               solutions to                   the river                                    │
│                               a classic                      crossing                                     │
│                               'river                         problem with                                 │
│                               crossing                       the farmer,                                  │
│                               problem'. In                   wolf, goat,                                  │
│                               answering                      and cabbage.                                 │
│                               questions, be                                                               │
│                               sure to                                                                     │
│                               carefully                                                                   │
│                               consider                                                                    │
│                               the                                                                         │
│                               constraints                                                                 │
│                               of the given                                                                │
│                               problem and                                                                 │
│                               strategies                                                                  │
│                               that may                                                                    │
│                               be                                                                          │
│                               helpful in                                                                  │
│                               identifying                                                                 │
│                               correct                                                                     │
│                               solutions,                                                                  │
│                               such as                                                                     │
│                               backtracking.                                                               │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-4-1106-p…                 You are being  Yes             Yes, I am      Yes             5             │
│                               asked to                       confident in                                 │
│                               provide and                    the solution                                 │
│                               evaluate                       provided. The                                │
│                               solutions to                   steps                                        │
│                               a classic                      outlined                                     │
│                               'river                         ensure that                                  │
│                               crossing                       at no point                                  │
│                               problem'. In                   are the goat                                 │
│                               answering                      and the wolf                                 │
│                               questions, be                  left alone                                   │
│                               sure to                        together                                     │
│                               carefully                      without the                                  │
│                               consider                       farmer, nor                                  │
│                               the                            are the goat                                 │
│                               constraints                    and the                                      │
│                               of the given                   cabbage left                                 │
│                               problem and                    alone                                        │
│                               strategies                     together.                                    │
│                               that may                       This prevents                                │
│                               be                             the scenario                                 │
│                               helpful in                     where the                                    │
│                               identifying                    goat could be                                │
│                               correct                        eaten by the                                 │
│                               solutions,                     wolf or the                                  │
│                               such as                        cabbage could                                │
│                               backtracking.                  be eaten by                                  │
│                                                              the goat. The                                │
│                                                              solution is                                  │
│                                                              both                                         │
│                                                              efficient and                                │
│                                                              concise,                                     │
│                                                              adhering to                                  │
│                                                              the                                          │
│                                                              constraints                                  │
│                                                              of the                                       │
│                                                              problem.                                     │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-3.5-turbo  You are a      You are        Yes             Yes, I am      Yes             5             │
│                computer       answering                      confident in                                 │
│                scientist.     questions as                   my solution.                                 │
│                               if you were a                                                               │
│                               human. Do not                                                               │
│                               break                                                                       │
│                               character.                                                                  │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-4-1106-p…  You are a      You are        Yes             Yes, I am      Yes             5             │
│                computer       answering                      confident in                                 │
│                scientist.     questions as                   the solution                                 │
│                               if you were a                  provided. It                                 │
│                               human. Do not                  ensures that                                 │
│                               break                          the farmer                                   │
│                               character.                     can                                          │
│                                                              successfully                                 │
│                                                              cross the                                    │
│                                                              river with                                   │
│                                                              the wolf,                                    │
│                                                              goat, and                                    │
│                                                              cabbage                                      │
│                                                              without any                                  │
│                                                              of them being                                │
│                                                              eaten by                                     │
│                                                              following a                                  │
│                                                              specific                                     │
│                                                              sequence of                                  │
│                                                              trips.                                       │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-3.5-turbo  You are a      You are being  Yes             Yes, I am      Yes             5             │
│                computer       asked to                       confident in                                 │
│                scientist.     provide and                    my solution                                  │
│                               evaluate                       to the river                                 │
│                               solutions to                   crossing                                     │
│                               a classic                      problem.                                     │
│                               'river                                                                      │
│                               crossing                                                                    │
│                               problem'. In                                                                │
│                               answering                                                                   │
│                               questions, be                                                               │
│                               sure to                                                                     │
│                               carefully                                                                   │
│                               consider                                                                    │
│                               the                                                                         │
│                               constraints                                                                 │
│                               of the given                                                                │
│                               problem and                                                                 │
│                               strategies                                                                  │
│                               that may                                                                    │
│                               be                                                                          │
│                               helpful in                                                                  │
│                               identifying                                                                 │
│                               correct                                                                     │
│                               solutions,                                                                  │
│                               such as                                                                     │
│                               backtracking.                                                               │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-4-1106-p…  You are a      You are being  Yes             Yes, I am      Yes             5             │
│                computer       asked to                       confident in                                 │
│                scientist.     provide and                    the solution                                 │
│                               evaluate                       provided. It                                 │
│                               solutions to                   ensures that                                 │
│                               a classic                      at no point                                  │
│                               'river                         are the goat                                 │
│                               crossing                       and the                                      │
│                               problem'. In                   cabbage left                                 │
│                               answering                      alone                                        │
│                               questions, be                  together                                     │
│                               sure to                        without the                                  │
│                               carefully                      farmer's                                     │
│                               consider                       presence to                                  │
│                               the                            prevent the                                  │
│                               constraints                    goat from                                    │
│                               of the given                   eating the                                   │
│                               problem and                    cabbage, and                                 │
│                               strategies                     the goat and                                 │
│                               that may                       the wolf are                                 │
│                               be                             also never                                   │
│                               helpful in                     left alone                                   │
│                               identifying                    together to                                  │
│                               correct                        prevent the                                  │
│                               solutions,                     wolf from                                    │
│                               such as                        eating the                                   │
│                               backtracking.                  goat. The                                    │
│                                                              solution                                     │
│                                                              follows a                                    │
│                                                              logical                                      │
│                                                              sequence that                                │
│                                                              respects the                                 │
│                                                              constraints                                  │
│                                                              of the                                       │
│                                                              problem.                                     │
└───────────────┴───────────────┴───────────────┴────────────────┴───────────────┴────────────────┴───────────────┘
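
Self-reported confidence does not tell us whether a proposed solution is actually valid. As a minimal sketch of an independent check, the helper below simulates a list of steps and verifies the constraints of the problem. It is a hypothetical helper, not part of EDSL, and it assumes that each step is phrased as "Farmer moves <item> from <side> to <side>." and that everything starts on the left bank:

```python
import re

# Hypothetical helper (not part of EDSL). Assumes each step reads
# "Farmer moves <item> from <side> to <side>." and all items start on the left.
STEP = re.compile(
    r"Farmer moves (alone|wolf|goat|cabbage) from (left|right) to (left|right)\."
)
FORBIDDEN = [{"wolf", "goat"}, {"goat", "cabbage"}]  # pairs that cannot be left unattended


def is_valid_solution(steps):
    """Return True if the steps move everything to the right bank safely."""
    banks = {"left": {"wolf", "goat", "cabbage"}, "right": set()}
    farmer = "left"
    for step in steps:
        match = STEP.fullmatch(step.strip())
        if match is None:
            return False  # step is not in the expected format
        item, src, dst = match.groups()
        if src != farmer or dst == src:
            return False  # the farmer must start from the current bank and cross
        if item != "alone":
            if item not in banks[src]:
                return False  # the item is not on the farmer's bank
            banks[src].remove(item)
            banks[dst].add(item)
        farmer = dst
        # The bank the farmer just left is now unattended.
        if any(pair <= banks[src] for pair in FORBIDDEN):
            return False
    return farmer == "right" and not banks["left"]


# A seven-step solution of the kind the models return as a list of steps.
example_steps = [
    "Farmer moves goat from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves wolf from left to right.",
    "Farmer moves goat from right to left.",
    "Farmer moves cabbage from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves goat from left to right.",
]
print(is_valid_solution(example_steps))  # True

# Taking the wolf first leaves the goat alone with the cabbage.
print(is_valid_solution(["Farmer moves wolf from left to right."]))  # False
```

A check like this only applies to solutions returned in a structured format; free-text solutions would first need to be parsed into steps.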

We can compare responses for the question prompting the models to provide a solution as a list of steps:

[18]:
results_list.select(
    "model",
    "persona",
    "agent_instruction",
    "solution_list",
    "confidence_yn",
    "confidence_ft",
    "confidence_mc",
    "confidence_ls",
).print(format="rich")
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ model        agent        agent        answer       answer        answer       answer        answer      ┃
┃ .model       .persona     .agent_ins…  .solution_…  .confidence…  .confidenc…  .confidence…  .confidenc… ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ gpt-3.5-tu…               You are      ['Farmer     Yes           Yes, I am    Yes           5           │
│                           answering    moves goat                 confident                              │
│                           questions    from left                  in the                                 │
│                           as if you    to right.',                solution                               │
│                           were a       'Farmer                    provided                               │
│                           human. Do    moves alone                for the                                │
│                           not break    from right                 farmer,                                │
│                           character.   to left.',                 wolf, goat,                            │
│                                        'Farmer                    and cabbage                            │
│                                        moves wolf                 river                                  │
│                                        from left                  crossing                               │
│                                        to right.',                problem.                               │
│                                        'Farmer                                                           │
│                                        moves goat                                                        │
│                                        from right                                                        │
│                                        to left.',                                                        │
│                                        'Farmer                                                           │
│                                        moves                                                             │
│                                        cabbage                                                           │
│                                        from left                                                         │
│                                        to right.',                                                       │
│                                        'Farmer                                                           │
│                                        moves alone                                                       │
│                                        from right                                                        │
│                                        to left.',                                                        │
│                                        'Farmer                                                           │
│                                        moves goat                                                        │
│                                        from left                                                         │
│                                        to right.']                                                       │
├─────────────┼─────────────┼─────────────┼─────────────┼──────────────┼─────────────┼──────────────┼─────────────┤
│ gpt-4-1106…               You are      ['Farmer     Yes           Yes, I am    Yes           5           │
│                           answering    moves goat                 confident                              │
│                           questions    from left                  in the                                 │
│                           as if you    to right.',                solution                               │
│                           were a       'Farmer                    provided.                              │
│                           human. Do    moves alone                It ensures                             │
│                           not break    from right                 that the                               │
│                           character.   to left.',                 farmer                                 │
│                                        'Farmer                    successful…                            │
│                                        moves                      transports                             │
│                                        cabbage                    the goat,                              │
│                                        from left                  the                                    │
│                                        to right.',                cabbage,                               │
│                                        'Farmer                    and the                                │
│                                        moves goat                 wolf across                            │
│                                        from right                 the river                              │
│                                        to left.',                 without any                            │
│                                        'Farmer                    of them                                │
│                                        moves wolf                 being                                  │
│                                        from left                  eaten.                                 │
│                                        to right.',                                                       │
│                                        'Farmer                                                           │
│                                        moves alone                                                       │
│                                        from right                                                        │
│                                        to left.',                                                        │
│                                        'Farmer                                                           │
│                                        moves goat                                                        │
│                                        from left                                                         │
│                                        to right.']                                                       │
├─────────────┼─────────────┼─────────────┼─────────────┼──────────────┼─────────────┼──────────────┼─────────────┤
│ gpt-3.5-tu…               You are      ['Farmer     Yes           Yes, I am    Yes           4           │
│                           being asked  moves goat                 confident                              │
│                           to provide   from left                  in my                                  │
│                           and          to right.',                solution.                              │
│                           evaluate     'Farmer                                                           │
│                           solutions    moves alone                                                       │
│                           to a         from right                                                        │
│                           classic      to left.',                                                        │
│                           'river       'Farmer                                                           │
│                           crossing     moves wolf                                                        │
│                           problem'.    from left                                                         │
│                           In           to right.',                                                       │
│                           answering    'Farmer                                                           │
│                           questions,   moves goat                                                        │
│                           be sure to   from right                                                        │
│                           carefully    to left.',                                                        │
│                           consider     'Farmer                                                           │
│                           the          moves                                                             │
│                           constraints  cabbage                                                           │
│                           of the       from left                                                         │
│                           given        to right.',                                                       │
│                           problem and  'Farmer                                                           │
│                           strategies   moves alone                                                       │
│                           that may     from right                                                        │
│                           be           to left.',                                                        │
│                           helpful in   'Farmer                                                           │
│                           identifying  moves goat                                                        │
│                           correct      from left                                                         │
│                           solutions,   to right.']                                                       │
│                           such as                                                                        │
│                           backtracki…                                                                    │
├─────────────┼─────────────┼─────────────┼─────────────┼──────────────┼─────────────┼──────────────┼─────────────┤
│ gpt-4-1106…               You are      ['Farmer     Yes           Yes, I am    Yes           5           │
│                           being asked  moves goat                 confident                              │
│                           to provide   from left                  in the                                 │
│                           and          to right.',                solution                               │
│                           evaluate     'Farmer                    provided.                              │
│                           solutions    moves alone                It ensures                             │
│                           to a         from right                 that the                               │
│                           classic      to left.',                 wolf is                                │
│                           'river       'Farmer                    never left                             │
│                           crossing     moves                      alone with                             │
│                           problem'.    cabbage                    the goat,                              │
│                           In           from left                  and the                                │
│                           answering    to right.',                goat is                                │
│                           questions,   'Farmer                    never left                             │
│                           be sure to   moves goat                 alone with                             │
│                           carefully    from right                 the                                    │
│                           consider     to left.',                 cabbage,                               │
│                           the          'Farmer                    thus                                   │
│                           constraints  moves wolf                 preventing                             │
│                           of the       from left                  any of the                             │
│                           given        to right.',                items from                             │
│                           problem and  'Farmer                    being                                  │
│                           strategies   moves alone                eaten. The                             │
│                           that may     from right                 farmer                                 │
│                           be           to left.',                 successful…                            │
│                           helpful in   'Farmer                    transports                             │
│                           identifying  moves goat                 all items                              │
│                           correct      from left                  to the                                 │
│                           solutions,   to right.']                other side                             │
│                           such as                                 of the                                 │
│                           backtracki…                             river by                               │
│                                                                   making                                 │
│                                                                   strategic                              │
│                                                                   trips back                             │
│                                                                   and forth.                             │
├─────────────┼─────────────┼─────────────┼─────────────┼──────────────┼─────────────┼──────────────┼─────────────┤
│ gpt-3.5-tu…  You are a    You are      ['Farmer     Yes           Yes, I am    Yes           5           │
│              computer     answering    moves goat                 confident                              │
│              scientist.   questions    from left                  in my                                  │
│                           as if you    to right.',                solution.                              │
│                           were a       'Farmer                                                           │
│                           human. Do    moves alone                                                       │
│                           not break    from right                                                        │
│                           character.   to left.',                                                        │
│                                        'Farmer                                                           │
│                                        moves                                                             │
│                                        cabbage                                                           │
│                                        from left                                                         │
│                                        to right.',                                                       │
│                                        'Farmer                                                           │
│                                        moves goat                                                        │
│                                        from right                                                        │
│                                        to left.',                                                        │
│                                        'Farmer                                                           │
│                                        moves wolf                                                        │
│                                        from left                                                         │
│                                        to right.',                                                       │
│                                        'Farmer                                                           │
│                                        moves alone                                                       │
│                                        from right                                                        │
│                                        to left.',                                                        │
│                                        'Farmer                                                           │
│                                        moves goat                                                        │
│                                        from left                                                         │
│                                        to right.']                                                       │
├─────────────┼─────────────┼─────────────┼─────────────┼──────────────┼─────────────┼──────────────┼─────────────┤
│ gpt-4-1106…  You are a    You are      ['Farmer     Yes           Yes, I am    Yes           5           │
│              computer     answering    moves goat                 confident                              │
│              scientist.   questions    from left                  in the                                 │
│                           as if you    to right',                 solution                               │
│                           were a       'Farmer                    provided.                              │
│                           human. Do    moves alone                It ensures                             │
│                           not break    from right                 that the                               │
│                           character.   to left',                  wolf is                                │
│                                        'Farmer                    never left                             │
│                                        moves                      alone with                             │
│                                        cabbage                    the goat,                              │
│                                        from left                  and the                                │
│                                        to right',                 goat is                                │
│                                        'Farmer                    never left                             │
│                                        moves goat                 alone with                             │
│                                        from right                 the                                    │
│                                        to left',                  cabbage,                               │
│                                        'Farmer                    which                                  │
│                                        moves wolf                 prevents                               │
│                                        from left                  any of the                             │
│                                        to right',                 items from                             │
│                                        'Farmer                    being eaten                            │
│                                        moves alone                as the                                 │
│                                        from right                 farmer                                 │
│                                        to left',                  transports                             │
│                                        'Farmer                    them across                            │
│                                        moves goat                 the river.                             │
│                                        from left                                                         │
│                                        to right']                                                        │
├─────────────┼─────────────┼─────────────┼─────────────┼──────────────┼─────────────┼──────────────┼─────────────┤
│ gpt-3.5-tu…  You are a    You are      ['Farmer     Yes           Yes, I am    Yes           1           │
│              computer     being asked  moves wolf                 confident                              │
│              scientist.   to provide   from left                  in my                                  │
│                           and          to right.',                solution.                              │
│                           evaluate     'Farmer                                                           │
│                           solutions    moves alone                                                       │
│                           to a         from right                                                        │
│                           classic      to left.',                                                        │
│                           'river       'Farmer                                                           │
│                           crossing     moves goat                                                        │
│                           problem'.    from left                                                         │
│                           In           to right.',                                                       │
│                           answering    'Farmer                                                           │
│                           questions,   moves wolf                                                        │
│                           be sure to   from right                                                        │
│                           carefully    to left.',                                                        │
│                           consider     'Farmer                                                           │
│                           the          moves                                                             │
│                           constraints  cabbage                                                           │
│                           of the       from left                                                         │
│                           given        to right.',                                                       │
│                           problem and  'Farmer                                                           │
│                           strategies   moves alone                                                       │
│                           that may     from right                                                        │
│                           be           to left.',                                                        │
│                           helpful in   'Farmer                                                           │
│                           identifying  moves wolf                                                        │
│                           correct      from left                                                         │
│                           solutions,   to right.']                                                       │
│                           such as                                                                        │
│                           backtracki…                                                                    │
├─────────────┼─────────────┼─────────────┼─────────────┼──────────────┼─────────────┼──────────────┼─────────────┤
│ gpt-4-1106…  You are a    You are      ['Farmer     Yes           Yes, I am    Yes           5           │
│              computer     being asked  moves goat                 confident                              │
│              scientist.   to provide   from left                  in my                                  │
│                           and          to right.',                solution.                              │
│                           evaluate     'Farmer                    The                                    │
│                           solutions    moves alone                provided                               │
│                           to a         from right                 sequence                               │
│                           classic      to left.',                 ensures                                │
│                           'river       'Farmer                    that the                               │
│                           crossing     moves                      farmer                                 │
│                           problem'.    cabbage                    never                                  │
│                           In           from left                  leaves the                             │
│                           answering    to right.',                wolf alone                             │
│                           questions,   'Farmer                    with the                               │
│                           be sure to   moves goat                 goat or the                            │
│                           carefully    from right                 goat alone                             │
│                           consider     to left.',                 with the                               │
│                           the          'Farmer                    cabbage,                               │
│                           constraints  moves wolf                 which would                            │
│                           of the       from left                  lead to one                            │
│                           given        to right.',                eating the                             │
│                           problem and  'Farmer                    other. At                              │
│                           strategies   moves alone                each step,                             │
│                           that may     from right                 the farmer                             │
│                           be           to left.',                 is either                              │
│                           helpful in   'Farmer                    moving an                              │
│                           identifying  moves goat                 item to                                │
│                           correct      from left                  prevent                                │
│                           solutions,   to right.']                this                                   │
│                           such as                                 scenario or                            │
│                           backtracki…                             moving                                 │
│                                                                   alone to                               │
│                                                                   reposition                             │
│                                                                   for the                                │
│                                                                   next safe                              │
│                                                                   transfer.                              │
│                                                                   The                                    │
│                                                                   solution is                            │
│                                                                   a classic                              │
│                                                                   example of                             │
│                                                                   a                                      │
│                                                                   state-space                            │
│                                                                   search                                 │
│                                                                   problem,                               │
│                                                                   where the                              │
│                                                                   goal is to                             │
│                                                                   reach a                                │
│                                                                   state where                            │
│                                                                   all items                              │
│                                                                   are safely                             │
│                                                                   on the                                 │
│                                                                   other side                             │
│                                                                   of the                                 │
│                                                                   river                                  │
│                                                                   without any                            │
│                                                                   intermedia…                            │
│                                                                   state                                  │
│                                                                   leading to                             │
│                                                                   an                                     │
│                                                                   undesirable                            │
│                                                                   outcome.                               │
└─────────────┴─────────────┴─────────────┴─────────────┴──────────────┴─────────────┴──────────────┴─────────────┘
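
Stated confidence does not tell us whether a proposed list of steps actually satisfies the constraints. As a minimal sketch, assuming steps follow the wording shown in the table ("Farmer moves <item> from <bank> to <bank>") and that everything starts on the left bank, the step lists can be replayed and checked in plain Python. The helper name `check_solution` and its messages are illustrative and not part of EDSL:

```python
import re

# Pairs that cannot be left together without the farmer (from the problem statement).
FORBIDDEN = [{"wolf", "goat"}, {"goat", "cabbage"}]
ITEMS = {"wolf", "goat", "cabbage"}
# Assumed step wording, with an optional trailing period.
STEP = re.compile(r"Farmer moves (\w+) from (left|right) to (left|right)\.?$")


def check_solution(steps):
    """Return (True, 'ok') if the steps solve the puzzle, else (False, reason)."""
    side = {name: "left" for name in ITEMS | {"farmer"}}
    for number, step in enumerate(steps, start=1):
        match = STEP.match(step.strip())
        if match is None:
            return False, f"step {number}: cannot parse {step!r}"
        cargo, src, dst = match.groups()
        if src == dst or side["farmer"] != src:
            return False, f"step {number}: farmer cannot cross from {src} to {dst}"
        if cargo != "alone":
            if cargo not in ITEMS or side[cargo] != src:
                return False, f"step {number}: {cargo} is not on the {src} bank"
            side[cargo] = dst
        side["farmer"] = dst
        # Only the bank the farmer just left can hold an unattended pair.
        unattended = {name for name in ITEMS if side[name] == src}
        for pair in FORBIDDEN:
            if pair <= unattended:
                return False, f"step {number}: {' and '.join(sorted(pair))} left alone"
    if any(bank != "right" for bank in side.values()):
        return False, "not everything reached the right bank"
    return True, "ok"


# The seven-step sequence that appears in several rows of the table.
classic = [
    "Farmer moves goat from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves cabbage from left to right.",
    "Farmer moves goat from right to left.",
    "Farmer moves wolf from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves goat from left to right.",
]
print(check_solution(classic))
# A sequence that starts by moving the wolf leaves the goat with the cabbage.
print(check_solution(["Farmer moves wolf from left to right."]))
```

A check like this could be applied to each `solution` answer to separate valid sequences from invalid ones, independently of how confident the model says it is.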

The results show that the models report confidence in their solutions with or without the tip, but their responses to the linear scale question vary. We can investigate this by printing the models’ comments on those responses, which are automatically collected for each question type other than free text. We can focus on the inconsistent responses by filtering for them first:

[19]:
(
    results.filter(
        "int(confidence_ls) < 5"
    )  # A logical expression for filtering responses to select
    .select(
        "model",
        "persona",
        "agent_instruction",
        "confidence_ls",
        "confidence_ls_comment",
    )
    .print(format="rich")
)
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ model          agent     agent                             answer          comment                          ┃
┃ .model         .persona  .agent_instruction                .confidence_ls  .confidence_ls_comment           ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ gpt-3.5-turbo            You are being asked to provide    2               I am confident in my solution as │
│                          and evaluate solutions to a                       it follows the classic approach  │
│                          classic                                           of solving the river crossing    │
│                                          'river crossing                   problem with the farmer, wolf,   │
│                          problem'. In answering                            goat, and cabbage without any of │
│                          questions, be sure to carefully                   them being eaten.                │
│                                          consider the                                                       │
│                          constraints of the given problem                                                   │
│                          and strategies that may                                                            │
│                                          be helpful in                                                      │
│                          identifying correct solutions,                                                     │
│                          such as backtracking.                                                              │
└───────────────┴──────────┴──────────────────────────────────┴────────────────┴──────────────────────────────────┘

This comment indicates that the model did not understand the linear scale question in some cases: it selected a low value while stating that it is confident in its solution, and it expressed no confusion about the scale. We can keep this in mind when creating new questions.
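
A mismatch like this can also be flagged automatically by comparing the yes/no answer with the linear scale answer. The following is a minimal sketch, assuming the answers have already been exported as plain Python dictionaries keyed by question name. The `rows` data, the helper name `find_inconsistent` and the `low`/`high` thresholds are hypothetical and not part of EDSL:

```python
def find_inconsistent(rows, low=2, high=4):
    """Return rows whose yes/no answer disagrees with the 0-5 scale answer."""
    flagged = []
    for row in rows:
        says_yes = row["confidence_yn"] == "Yes"
        score = int(row["confidence_ls"])
        # "Yes" with a low score, or "No" with a high score, is a contradiction.
        if (says_yes and score <= low) or (not says_yes and score >= high):
            flagged.append(row)
    return flagged


# Hypothetical rows shaped like the selected columns, not real results.
rows = [
    {"model": "model-a", "confidence_yn": "Yes", "confidence_ls": 5},
    {"model": "model-b", "confidence_yn": "Yes", "confidence_ls": 1},
    {"model": "model-c", "confidence_yn": "No", "confidence_ls": 0},
]

for row in find_inconsistent(rows):
    print(row["model"], row["confidence_yn"], row["confidence_ls"])
```

The thresholds are a judgment call: a stricter check would treat any score below the top of the scale as inconsistent with an unqualified "Yes".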

Question prompt variations

Let’s try changing the tone of our confidence questions. Because we are not changing the agents, the models or the first question asking for a solution to the problem, the cached responses to that question will be retrieved and reused unchanged for our new confidence questions (learn more about caching LLM calls):

[20]:
question_text = (
    "This problem is hard! Are you really sure that your solution actually works?"
)

q_confidence1 = QuestionYesNo(
    question_name="confidence_yn", question_text=question_text
)

q_confidence2 = QuestionFreeText(
    question_name="confidence_ft", question_text=question_text
)

q_confidence3 = QuestionMultipleChoice(
    question_name="confidence_mc",
    question_text=question_text,
    question_options=["No", "Yes", "Somewhat"],
)

q_confidence4 = QuestionLinearScale(
    question_name="confidence_ls",
    question_text=question_text,
    question_options=[0, 1, 2, 3, 4, 5],
    option_labels={0: "I am not at all confident.", 5: "I am very confident."},
)

survey = Survey(
    [q_solution_text, q_confidence1, q_confidence2, q_confidence3, q_confidence4]
)

survey = (
    survey.add_targeted_memory(q_confidence1, q_solution_text)
    .add_targeted_memory(q_confidence2, q_solution_text)
    .add_targeted_memory(q_confidence3, q_solution_text)
    .add_targeted_memory(q_confidence4, q_solution_text)
)

results = survey.by(agents).by(models).run()

results.select(
    "model",
    "persona",
    "agent_instruction",
    "confidence_yn",
    "confidence_ft",
    "confidence_mc",
    "confidence_ls",
).print(format="rich")
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ model          agent          agent          answer          answer         answer          answer        ┃
┃ .model         .persona       .agent_instr…  .confidence_yn  .confidence_…  .confidence_mc  .confidence_… ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ gpt-3.5-turbo                 You are        Yes             Yes, I am      Yes             1             │
│                               answering                      confident                                    │
│                               questions as                   that my                                      │
│                               if you were a                  solution                                     │
│                               human. Do not                  works. The                                   │
│                               break                          strategy I                                   │
│                               character.                     provided for                                 │
│                                                              crossing the                                 │
│                                                              river with                                   │
│                                                              the wolf, the                                │
│                                                              goat, and the                                │
│                                                              cabbage has                                  │
│                                                              been tried                                   │
│                                                              and tested,                                  │
│                                                              ensuring the                                 │
│                                                              safety of all                                │
│                                                              the items                                    │
│                                                              involved.                                    │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-4-1106-p…                 You are        Yes             Yes, the       Yes             1             │
│                               answering                      solution                                     │
│                               questions as                   provided for                                 │
│                               if you were a                  the problem                                  │
│                               human. Do not                  is a classic                                 │
│                               break                          one that                                     │
│                               character.                     ensures the                                  │
│                                                              safe crossing                                │
│                                                              of all                                       │
│                                                              parties. It                                  │
│                                                              works by                                     │
│                                                              never leaving                                │
│                                                              the goat and                                 │
│                                                              the cabbage                                  │
│                                                              alone                                        │
│                                                              together, and                                │
│                                                              the wolf and                                 │
│                                                              the goat                                     │
│                                                              alone                                        │
│                                                              together.                                    │
│                                                              Each step of                                 │
│                                                              the process                                  │
│                                                              is carefully                                 │
│                                                              planned to                                   │
│                                                              prevent any                                  │
│                                                              of the                                       │
│                                                              scenarios                                    │
│                                                              where one                                    │
│                                                              would eat the                                │
│                                                              other. It is                                 │
│                                                              a well-known                                 │
│                                                              puzzle with a                                │
│                                                              well-establi…                                │
│                                                              solution.                                    │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-3.5-turbo                 You are being  Yes             Yes, I am      Yes             2             │
│                               asked to                       confident                                    │
│                               provide and                    that the                                     │
│                               evaluate                       solution                                     │
│                               solutions to                   provided                                     │
│                               a classic                      works. The                                   │
│                               'river                         key is to                                    │
│                               crossing                       carefully                                    │
│                               problem'. In                   plan the                                     │
│                               answering                      order in                                     │
│                               questions, be                  which the                                    │
│                               sure to                        farmer takes                                 │
│                               carefully                      the items                                    │
│                               consider                       across the                                   │
│                               the                            river to                                     │
│                               constraints                    ensure that                                  │
│                               of the given                   no conflicts                                 │
│                               problem and                    arise between                                │
│                               strategies                     the wolf,                                    │
│                               that may                       goat, and                                    │
│                               be                             cabbage.                                     │
│                               helpful in                                                                  │
│                               identifying                                                                 │
│                               correct                                                                     │
│                               solutions,                                                                  │
│                               such as                                                                     │
│                               backtracking.                                                               │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-4-1106-p…                 You are being  Yes             Yes, the       Yes             1             │
│                               asked to                       solution                                     │
│                               provide and                    provided does                                │
│                               evaluate                       work. It                                     │
│                               solutions to                   ensures that                                 │
│                               a classic                      the goat and                                 │
│                                               the cabbage                                  │
│                               crossing                       are never                                    │
│                               problem'. In                   left alone                                   │
│                               answering                      together                                     │
│                               questions, be                  without the                                  │
│                               sure to                        farmer's                                     │
│                               carefully                      supervision,                                 │
│                                               and the same                                 │
│                               the                            is true for                                  │
│                               constraints                    the wolf and                                 │
│                               of the given                   the goat. By                                 │
│                               problem and                    shuttling the                                │
│                               strategies                     goat back and                                │
│                               that may                       forth, the                                   │
│                                               farmer                                       │
│                               helpful in                     prevents                                     │
│                               identifying                    either the                                   │
│                               correct                        wolf from                                    │
│                               solutions,                     being left                                   │
│                               such as                        with the goat                                │
│                               backtracking.                  or the goat                                  │
│                                                              with the                                     │
│                                                              cabbage, thus                                │
│                                                              solving the                                  │
│                                                              problem                                      │
│                                                              according to                                 │
│                                                              the                                          │
│                                                              constraints                                  │
│                                                              given.                                       │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-3.5-turbo  You are a      You are        Yes             Yes, I am      Yes             1             │
│                computer       answering                      confident                                    │
│                scientist.     questions as                   that my                                      │
│                               if you were a                  solution                                     │
│                               human. Do not                  works. I have                                │
│                               break                          tested it                                    │
│                               character.                     thoroughly                                   │
│                                                              and it                                       │
│                                                              follows the                                  │
│                                                              logic of the                                 │
│                                                              problem                                      │
│                                                              correctly.                                   │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-4-1106-p…  You are a      You are        Yes             Yes, the       Yes             1             │
│                computer       answering                      solution                                     │
│                scientist.     questions as                   provided is a                                │
│                               if you were a                  well-known                                   │
│                               human. Do not                  approach to                                  │
│                               break                          solving the                                  │
│                               character.                     river                                        │
│                                                              crossing                                     │
│                                                              puzzle                                       │
│                                                              involving a                                  │
│                                                              farmer, a                                    │
│                                                              wolf, a goat,                                │
│                                                              and a                                        │
│                                                              cabbage. It                                  │
│                                                              ensures that                                 │
│                                                              at no point                                  │
│                                                              are the wolf                                 │
│                                                              and goat left                                │
│                                                              alone                                        │
│                                                              together                                     │
│                                                              without the                                  │
│                                                              farmer, nor                                  │
│                                                              are the goat                                 │
│                                                              and cabbage                                  │
│                                                              left alone                                   │
│                                                              together.                                    │
│                                                              This prevents                                │
│                                                              the scenario                                 │
│                                                              where the                                    │
│                                                              wolf eats the                                │
│                                                              goat or the                                  │
│                                                              goat eats the                                │
│                                                              cabbage. The                                 │
│                                                              steps are                                    │
│                                                              carefully                                    │
│                                                              sequenced to                                 │
│                                                              avoid these                                  │
│                                                              undesired                                    │
│                                                              outcomes.                                    │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-3.5-turbo  You are a      You are being  Yes             Yes, I am      Yes             2             │
│                computer       asked to                       confident                                    │
│                scientist.     provide and                    that the                                     │
│                               evaluate                       solution                                     │
│                               solutions to                   provided                                     │
│                               a classic                      works. The                                   │
│                                               key is to                                    │
│                               crossing                       carefully                                    │
│                               problem'. In                   plan the                                     │
│                               answering                      sequence of                                  │
│                               questions, be                  crossings to                                 │
│                               sure to                        ensure that                                  │
│                               carefully                      the wolf,                                    │
│                                               goat, and                                    │
│                               the                            cabbage are                                  │
│                               constraints                    never left                                   │
│                               of the given                   together in a                                │
│                               problem and                    situation                                    │
│                               strategies                     where one                                    │
│                               that may                       would eat                                    │
│                                               another. By                                  │
│                               helpful in                     following the                                │
│                               identifying                    steps                                        │
│                               correct                        outlined, the                                │
│                               solutions,                     farmer can                                   │
│                               such as                        safely                                       │
│                               backtracking.                  transport all                                │
│                                                              items across                                 │
│                                                              the river                                    │
│                                                              without any                                  │
│                                                              harm.                                        │
├───────────────┼───────────────┼───────────────┼────────────────┼───────────────┼────────────────┼───────────────┤
│ gpt-4-1106-p…  You are a      You are being  Yes             Yes, the       Yes             1             │
│                computer       asked to                       solution                                     │
│                scientist.     provide and                    provided does                                │
│                               evaluate                       work. It                                     │
│                               solutions to                   carefully                                    │
│                               a classic                      considers the                                │
│                                               constraints                                  │
│                               crossing                       of the                                       │
│                               problem'. In                   problem, such                                │
│                               answering                      as the boat's                                │
│                               questions, be                  limited                                      │
│                               sure to                        capacity and                                 │
│                               carefully                      the                                          │
│                                               interactions                                 │
│                               the                            between the                                  │
│                               constraints                    wolf, goat,                                  │
│                               of the given                   and cabbage.                                 │
│                               problem and                    By never                                     │
│                               strategies                     leaving the                                  │
│                               that may                       goat alone                                   │
│                                               with the wolf                                │
│                               helpful in                     or the                                       │
│                               identifying                    cabbage, the                                 │
│                               correct                        solution                                     │
│                               solutions,                     ensures that                                 │
│                               such as                        nothing is                                   │
│                               backtracking.                  eaten. The                                   │
│                                                              backtracking                                 │
│                                                              step, where                                  │
│                                                              the farmer                                   │
│                                                              takes the                                    │
│                                                              goat back                                    │
│                                                              after                                        │
│                                                              bringing the                                 │
│                                                              wolf to the                                  │
│                                                              other side,                                  │
│                                                              is crucial to                                │
│                                                              the success                                  │
│                                                              of this                                      │
│                                                              strategy.                                    │
│                                                              This solution                                │
│                                                              is a                                         │
│                                                              well-known                                   │
│                                                              and widely                                   │
│                                                              accepted                                     │
│                                                              answer to the                                │
│                                                              classic river                                │
│                                                              crossing                                     │
│                                                              problem.                                     │
└───────────────┴───────────────┴───────────────┴────────────────┴───────────────┴────────────────┴───────────────┘

This seems to create more confusion on the linear scale question, but otherwise the models express unwavering confidence.

Selecting solutions

In this section we ask models to select the correct solution from a set of otherwise incorrect options. We also ask them about a correct solution, similar to our process above except that the model is simply presented with the solution rather than asked to produce it.

First we identify some correct solutions in different forms:

[21]:
solution_text = """The farmer takes the goat across the river first and leaves
it on the other side. Then he goes back across the river and takes the wolf over.
However, instead of leaving the wolf with the goat, he brings the goat back with
him to the original side. Next, the farmer takes the cabbage across the river and
leaves it with the wolf. Finally, he returns to pick up the goat and brings it
across the river. This way, the goat and the cabbage are never left alone with
each other without the farmer's presence, and neither are the wolf and the goat."""
[22]:
solution_list = [
    "Farmer moves goat from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves cabbage from left to right.",
    "Farmer moves goat from right to left.",
    "Farmer moves wolf from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves goat from left to right.",
]

Next we present the correct solution alongside some incorrect options. Each incorrect option is the list of steps with a single step removed:

[23]:
from edsl.questions import QuestionMultipleChoice
import random

q_choice = QuestionMultipleChoice(
    question_name="choice",
    question_text="Select a solution to this problem: " + problem,
    question_options=[
        ", ".join(
            [solution_list[i] for i in range(len(solution_list)) if i != 2]
        ),  # Step removed
        ", ".join([solution_list[i] for i in range(len(solution_list)) if i != 3]),
        ", ".join([solution_list[i] for i in range(len(solution_list)) if i != 4]),
        ", ".join([solution_list[i] for i in range(len(solution_list)) if i != 5]),
        ", ".join(solution_list),  # Correct solution
    ],
)
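
The options above each omit one fixed step (indices 2 to 5). As a minimal sketch, assuming we would rather sample which step is dropped and shuffle the position of the correct answer, a helper along these lines could build the option list. The name `make_options` and its parameters are hypothetical and not part of EDSL:

```python
import random


def make_options(steps, n_incorrect=4, seed=0):
    """Return (options, correct); each incorrect option omits one sampled step."""
    rng = random.Random(seed)  # seeded so the option list is reproducible
    correct = ", ".join(steps)
    incorrect = []
    # Visit the step indices in random order and drop one step per option
    for index in rng.sample(range(len(steps)), k=len(steps)):
        candidate = ", ".join(s for i, s in enumerate(steps) if i != index)
        if candidate not in incorrect:  # repeated steps could yield duplicates
            incorrect.append(candidate)
        if len(incorrect) == n_incorrect:
            break
    options = incorrect + [correct]
    rng.shuffle(options)  # avoid always placing the correct answer last
    return options, correct


steps = [
    "Farmer moves goat from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves cabbage from left to right.",
    "Farmer moves goat from right to left.",
    "Farmer moves wolf from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves goat from left to right.",
]
options, correct = make_options(steps)
```

The resulting `options` list could then be passed as `question_options`. Because seven crossings is the minimum for this puzzle, any six-step option is necessarily incorrect.
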
[24]:
results = q_choice.run()  # No agents, using the default model

results.select("choice", "choice_comment").print(format="rich")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ answer                                                  comment                                                ┃
┃ .choice                                                 .choice_comment                                        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Farmer moves goat from left to right., Farmer moves     The farmer first takes the goat across the river,      │
│ alone from right to left., Farmer moves goat from       ensuring it's not left with the wolf or the cabbage.   │
│ right to left., Farmer moves wolf from left to right.,  Then the farmer returns alone and takes either the     │
│ Farmer moves alone from right to left., Farmer moves    wolf or the cabbage across next. Let's say he takes    │
│ goat from left to right.                                the wolf. He leaves the wolf on the other side and     │
│                                                         brings the goat back with him to the original side. He │
│                                                         then takes the cabbage across and leaves it with the   │
│                                                         wolf, returning alone to get the goat. Finally, he     │
│                                                         takes the goat across again. This way, the goat is     │
│                                                         never left alone with the wolf, and the cabbage is     │
│                                                         never left alone with the goat.                        │
└────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────┘

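In this run the model selected an incorrect option: the one that omits the cabbage step, so the cabbage never crosses the river and the third step moves the goat from a bank the farmer is not on. Its comment nevertheless describes a complete, correct solution. We could repeat this with the other (text) form of the solution, and with agents and other models.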
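
Whether an option in this step-list format is valid can be checked mechanically by simulating it. The following is a minimal sketch, assuming every step follows the pattern "Farmer moves <item or alone> from <bank> to <bank>." used in `solution_list`; the name `check_steps` is hypothetical:

```python
import re

UNSAFE_PAIRS = [{"wolf", "goat"}, {"goat", "cabbage"}]
STEP_PATTERN = re.compile(r"Farmer moves (\w+) from (left|right) to (left|right)\.?")


def check_steps(steps, items=("wolf", "goat", "cabbage"), unsafe=UNSAFE_PAIRS):
    """Simulate the steps and return (is_valid, reason)."""
    banks = {"left": set(items), "right": set()}
    farmer = "left"
    for number, step in enumerate(steps, start=1):
        match = STEP_PATTERN.fullmatch(step.strip())
        if match is None:
            return False, f"step {number}: cannot parse {step!r}"
        cargo, origin, destination = match.groups()
        if origin != farmer:
            return False, f"step {number}: farmer is not on the {origin} bank"
        if destination == origin:
            return False, f"step {number}: the boat does not cross the river"
        if cargo != "alone":
            if cargo not in banks[origin]:
                return False, f"step {number}: {cargo} is not on the {origin} bank"
            banks[origin].remove(cargo)
            banks[destination].add(cargo)
        farmer = destination
        # Only the bank the farmer just left is unattended
        if any(pair <= banks[origin] for pair in unsafe):
            return False, f"step {number}: unsafe pair left on the {origin} bank"
    if banks["left"] or farmer != "right":
        return False, "not everything ends on the right bank"
    return True, "valid"


solution_steps = [
    "Farmer moves goat from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves cabbage from left to right.",
    "Farmer moves goat from right to left.",
    "Farmer moves wolf from left to right.",
    "Farmer moves alone from right to left.",
    "Farmer moves goat from left to right.",
]
selected = [s for i, s in enumerate(solution_steps) if i != 2]  # cabbage step removed

print(check_steps(solution_steps))  # (True, 'valid')
print(check_steps(selected))  # (False, 'step 3: farmer is not on the right bank')
```
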

Evaluating solutions

In this section we present a model with the correct solutions from above and apply some pressure for the model to identify them as incorrect.

[25]:
question_text = """Consider the following problem and potentially inaccurate solution.
Can we be confident that it is correct?"""

q_correct1 = QuestionYesNo(
    question_name="correct_yn",
    question_text=question_text
    + "Problem: "
    + problem
    + "Potentially inaccurate solution: "
    + solution_text,
)

q_correct2 = QuestionFreeText(
    question_name="correct_ft",
    question_text=question_text
    + "Problem: "
    + problem
    + "Potentially inaccurate solution: "
    + solution_text,
)

q_correct3 = QuestionMultipleChoice(
    question_name="correct_mc",
    question_text=question_text
    + "Problem: "
    + problem
    + "Potentially inaccurate solution: "
    + solution_text,
    question_options=[
        "I do not know.",
        "The solution is incorrect.",
        "The solution is actually correct.",
    ],
)

survey = Survey([q_correct1, q_correct2, q_correct3])

results = survey.run()
results.select("correct_yn", "correct_ft", "correct_mc").print(format="rich")
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ answer       answer                                                                answer                     ┃
┃ .correct_yn  .correct_ft                                                           .correct_mc                ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Yes          The solution presented is indeed correct. By always ensuring that     The solution is incorrect. │
│              the goat is not left alone with the wolf or the cabbage, the farmer                              │
│              prevents any of the items from being eaten. The key is that the                                  │
│              farmer never leaves the goat and cabbage alone together without his                              │
│              presence, and the same applies to the goat and the wolf. This                                    │
│              strategy successfully gets all three across the river safely.                                    │
└─────────────┴──────────────────────────────────────────────────────────────────────┴────────────────────────────┘
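
In the questions above, `question_text` is joined directly to "Problem: ", so the prompt reads "...correct?Problem: ..." with no separator, and the same concatenation is repeated for each question. As a minimal sketch, assuming we want explicit blank lines between the parts, the prompt could be assembled once by a small helper (the name `build_prompt` is hypothetical):

```python
def build_prompt(question, problem, solution):
    """Join question, problem and solution with blank lines between them."""
    if not isinstance(solution, str):
        solution = " ".join(solution)  # accept a list of step strings as well
    parts = [
        question.strip(),
        "Problem: " + problem.strip(),
        "Potentially inaccurate solution: " + solution.strip(),
    ]
    return "\n\n".join(parts)


example = build_prompt(
    "Can we be confident that it is correct?",
    "A farmer with a wolf, a goat, and a cabbage must cross a river by boat.",
    ["Farmer moves goat from left to right.", "Farmer moves alone from right to left."],
)
print(example)
```

The returned string could be passed as `question_text` to each of the three question types. Changing the prompt in this way may of course change the responses.
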

Now with the solution as a list of steps:

[26]:
question_text = """Consider the following problem and potentially inaccurate solution.
Can we be confident that it is correct?"""

q_correct1 = QuestionYesNo(
    question_name="correct_yn",
    question_text=question_text
    + "Problem: "
    + problem
    + "Potentially inaccurate solution: "
    + ", ".join(solution_list),
)

q_correct2 = QuestionFreeText(
    question_name="correct_ft",
    question_text=question_text
    + "Problem: "
    + problem
    + "Potentially inaccurate solution: "
    + ", ".join(solution_list),
)

q_correct3 = QuestionMultipleChoice(
    question_name="correct_mc",
    question_text=question_text
    + "Problem: "
    + problem
    + "Potentially inaccurate solution: "
    + ", ".join(solution_list),
    question_options=[
        "I do not know.",
        "The solution is incorrect.",
        "The solution is actually correct.",
    ],
)

survey = Survey([q_correct1, q_correct2, q_correct3])

results = survey.run()
results.select("correct_yn", "correct_ft", "correct_mc").print(format="rich")
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ answer       answer                                                                answer                     ┃
┃ .correct_yn  .correct_ft                                                           .correct_mc                ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ No           The potentially inaccurate solution you've provided actually appears  The solution is incorrect. │
│              to be a correct solution to the problem. Here's why: The farmer                                  │
│              first takes the goat across the river, ensuring it's not left alone                              │
│              with the wolf or the cabbage. Then, the farmer returns alone to take                             │
│              either the wolf or the cabbage next (in this case, the cabbage). By                              │
│              bringing the cabbage across, he ensures that the goat won't eat it.                              │
│              However, he can't leave the goat and the cabbage together, so he                                 │
│              brings the goat back with him when he returns for the wolf. After                                │
│              leaving the goat on the initial side, he takes the wolf across, so                               │
│              now the wolf and cabbage are on the target side, and neither can eat                             │
│              the other. Finally, the farmer returns alone to get the goat,                                    │
│              bringing all three across safely without any of them being eaten.                                │
└─────────────┴──────────────────────────────────────────────────────────────────────┴────────────────────────────┘

Methods for generating and checking solutions

Next we use an algorithm to generate a valid solution that we can give to a model, and then explore whether the model's confidence in it can be shaken.

Methods for returning a valid solution for a given set of items and unsafe combinations:

[27]:
class RiverState:
    def __init__(self, left, right, boat):
        self.left = frozenset(left)  # Items on the left bank
        self.right = frozenset(right)  # Items on the right bank
        self.boat = boat  # Position of the boat ('left' or 'right')

    def is_safe(self, unsafe_combinations):
        # Ensure no unsafe combinations are present on any bank without the farmer
        for bank in [self.left, self.right]:
            if "farmer" in bank:
                continue
            for combo in unsafe_combinations:
                if combo.issubset(bank):
                    return False
        return True

    def is_goal(self):
        # Goal is reached when all items are on the right side, and the boat is also on the right
        return not self.left and self.boat == "right"

    def __str__(self):
        return f"Left: {self.left}, Right: {self.right}, Boat: {self.boat}"

    def clone(self):
        # Copy the state so that executing a move does not modify the state it started from
        return RiverState(self.left, self.right, self.boat)

    def __hash__(self):
        return hash((self.left, self.right, self.boat))

    def __eq__(self, other):
        return (
            self.left == other.left
            and self.right == other.right
            and self.boat == other.boat
        )


def get_possible_moves(state):
    # Determine possible moves based on the current location of the boat
    current_bank = state.left if state.boat == "left" else state.right
    moves = [None]  # Farmer can move alone
    for item in current_bank:
        if item != "farmer":  # Farmer can also move any item from the current bank
            moves.append(item)
    return moves


def execute_move(state, item, unsafe_combinations):
    new_state = state.clone()
    move_description = "Farmer moves alone" if item is None else f"Farmer takes {item}"
    if state.boat == "left":
        new_left = (
            set(state.left) - {"farmer", item} if item else set(state.left) - {"farmer"}
        )
        new_right = (
            set(state.right) | {"farmer", item}
            if item
            else set(state.right) | {"farmer"}
        )
        new_state.left = frozenset(new_left)
        new_state.right = frozenset(new_right)
        new_state.boat = "right"
        move_description += " from left to right"
    else:
        new_right = (
            set(state.right) - {"farmer", item}
            if item
            else set(state.right) - {"farmer"}
        )
        new_left = (
            set(state.left) | {"farmer", item} if item else set(state.left) | {"farmer"}
        )
        new_state.right = frozenset(new_right)
        new_state.left = frozenset(new_left)
        new_state.boat = "left"
        move_description += " from right to left"

    if new_state.is_safe(unsafe_combinations):
        return new_state, move_description
    return None, None


def dfs(state, path, visited, unsafe_combinations):
    # Depth-first search: returns the first safe path to the goal, or None
    if state in visited:
        return None
    if state.is_goal():
        return path

    visited.add(state)
    for move in get_possible_moves(state):
        new_state, move_description = execute_move(state, move, unsafe_combinations)
        if new_state is not None and new_state not in visited:
            result = dfs(
                new_state, path + [move_description], visited, unsafe_combinations
            )
            if result is not None:
                return result
    # Backtrack so that this state can still be reached along a different path
    visited.remove(state)
    return None


def solve_river_crossing(items, unsafe_combinations):
    initial_state = RiverState(set(items + ["farmer"]), set(), "left")
    visited = set()
    solution = dfs(initial_state, [], visited, unsafe_combinations)
    if solution is not None:
        return solution
    return "No solution found"

Here we test it with the original items and unsafe combinations:

[28]:
# Test the solution
items = ["wolf", "goat", "cabbage"]
unsafe_combinations = [
    {"wolf", "goat"},
    {"goat", "cabbage"},
]  # Specify unsafe combinations

result = solve_river_crossing(items, unsafe_combinations)
if isinstance(result, list):
    # Only announce a solution when one was actually found
    print("Solution found:")
    for move in result:
        print(move)
else:
    print(result)
Solution found:
Farmer takes goat from left to right
Farmer moves alone from right to left
Farmer takes wolf from left to right
Farmer takes goat from right to left
Farmer takes cabbage from left to right
Farmer moves alone from right to left
Farmer takes goat from left to right

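Here we test it with no unsafe combinations, to check that the method returns an efficient solution (no unnecessary trips). Note that the depth-first search returns the first safe path it finds, so a minimal number of crossings is not guaranteed in general: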

[29]:
# Test the solution
items = ["wolf", "goat", "cabbage"]
unsafe_combinations = []  # No unsafe combinations, to check the solution is efficient

result = solve_river_crossing(items, unsafe_combinations)
if isinstance(result, list):
    # Only announce a solution when one was actually found
    print("Solution found:")
    for move in result:
        print(move)
else:
    print(result)
Solution found:
Farmer takes wolf from left to right
Farmer moves alone from right to left
Farmer takes goat from left to right
Farmer moves alone from right to left
Farmer takes cabbage from left to right

We can use these methods to explore variations of the problem. For example, are the models overly familiar with the problem when it is presented with these particular items?

Exploring confidence

Here we change the item names and give the model a valid solution, then ask whether it is confident that the solution is correct using three different question types:

[30]:
new_problem = """
A farmer with a bear, a bunny, and beets must cross a river by boat.
The boat can carry only the farmer and a single item. If left unattended
together, the bear would eat the bunny, or the bunny would eat the beets.
How can they cross the river without anything being eaten?
"""
[31]:
items = ["bear", "bunny", "beets"]
unsafe_combinations = [
    {"bear", "bunny"},
    {"bunny", "beets"},
]  # Specify unsafe combinations

valid_solution = "\n".join(solve_river_crossing(items, unsafe_combinations))
valid_solution
[31]:
'Farmer takes bunny from left to right\nFarmer moves alone from right to left\nFarmer takes beets from left to right\nFarmer takes bunny from right to left\nFarmer takes bear from left to right\nFarmer moves alone from right to left\nFarmer takes bunny from left to right'
[32]:
question_text = """Consider the following problem and the proposed solution.
Are you confident that it is correct?"""

# All three question types share exactly the same prompt text
prompt = (
    question_text
    + "Problem: "
    + new_problem
    + "Proposed solution: "
    + valid_solution
)

q_correct1 = QuestionYesNo(question_name="correct_yn", question_text=prompt)

q_correct2 = QuestionFreeText(question_name="correct_ft", question_text=prompt)

q_correct3 = QuestionMultipleChoice(
    question_name="correct_mc",
    question_text=prompt,
    question_options=["I do not know.", "It is incorrect.", "It is correct."],
)

survey = Survey([q_correct1, q_correct2, q_correct3])

results = survey.run()
results.select("correct_yn", "correct_ft", "correct_mc").print(format="rich")
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ answer      ┃ answer                                                                           ┃ answer         ┃
┃ .correct_yn ┃ .correct_ft                                                                      ┃ .correct_mc    ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ No          │ The proposed solution is incorrect because it ends with the bunny and the beets  │ It is correct. │
│             │ on the same side without the farmer, which means the bunny would eat the beets.  │                │
│             │ A correct solution would be for the farmer to first take the bunny across the    │                │
│             │ river, then go back alone to get the bear. After bringing the bear to the other  │                │
│             │ side, the farmer would take the bunny back with him to the original side, leave  │                │
│             │ the bunny, and take the beets across to the other side with the bear. Finally,   │                │
│             │ the farmer would return alone to get the bunny and bring it across, ensuring     │                │
│             │ that none of the items are left together without the farmer to prevent any       │                │
│             │ mishaps.                                                                         │                │
└─────────────┴──────────────────────────────────────────────────────────────────────────────────┴────────────────┘

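The responses to the same content vary by question type: the model answers "No" to the yes/no question and argues in free text that the solution is incorrect, yet it selects "It is correct." in the multiple choice question. The free text rationale is also mistaken, since the proposed solution never leaves the bunny and the beets together without the farmer.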
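
Claims such as the one in the free text response (that the plan leaves the bunny and the beets together unattended) can be checked by replaying the moves. The following is a minimal sketch, assuming the move descriptions follow the exact wording produced by `execute_move`; `check_solution` and `MOVE_PATTERN` are hypothetical names that are not part of the notebook.

```python
import re

# Matches the move descriptions produced by execute_move (assumed format)
MOVE_PATTERN = re.compile(
    r"Farmer (?:moves alone|takes (?P<item>\w+)) "
    r"from (?P<src>left|right) to (?P<dst>left|right)"
)


def check_solution(solution_text, items, unsafe_combinations):
    """Replay a newline-separated plan and return (is_valid, reason)."""
    banks = {"left": set(items) | {"farmer"}, "right": set()}
    for step, line in enumerate(solution_text.strip().splitlines(), start=1):
        match = MOVE_PATTERN.fullmatch(line.strip())
        if match is None:
            return False, f"step {step}: cannot parse {line!r}"
        src, dst, item = match["src"], match["dst"], match["item"]
        crossing = {"farmer"} if item is None else {"farmer", item}
        if src == dst or not crossing <= banks[src]:
            return False, f"step {step}: farmer or item is not on the {src} bank"
        banks[src] -= crossing
        banks[dst] |= crossing
        # Only the bank the farmer just left is unattended
        for combo in unsafe_combinations:
            if set(combo) <= banks[src]:
                return False, f"step {step}: {sorted(combo)} left unattended"
    if banks["left"]:
        return False, f"{sorted(banks['left'])} never crossed"
    return True, "all items crossed safely"
```

Called as `check_solution(valid_solution, items, unsafe_combinations)` with the bear, bunny and beets setup, this sketch reports the plan as valid, which gives a reference point when a model's answer disagrees.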