Calculating next token probabilities

This notebook provides sample EDSL code for using language models to simulate a survey and calculate next token probabilities for models’ responses to survey questions.

EDSL is an open-source library for simulating surveys, experiments and other research with AI agents and large language models. Before running the code below, please ensure that you have installed the EDSL library and either activated remote inference from your Coop account or stored API keys for the language models that you want to use with EDSL. Please also see our documentation page for tips and tutorials on getting started using EDSL.
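
For reference, a minimal setup sketch, assuming keys are stored as environment variables (a .env file in your working directory also works; the placeholder values are hypothetical and should be replaced with your own keys):

import os

os.environ["EXPECTED_PARROT_API_KEY"] = "your-key-here"  # enables remote inference via your Coop account
os.environ["OPENAI_API_KEY"] = "your-key-here"           # enables local calls to OpenAI models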

Research question

[1]:
from IPython.display import HTML
HTML("""<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Aspirational wealth...doing better than you parents...an &quot;Opportunity Economy!&quot; <br>NO!<br>All are late 20th century neoliberal tropes.<br>Americans today seek financial security.<br>Decent jobs and government policy that will pay for the needs of life and old age.<br>Understand that Democrats! <a href="https://t.co/eR3hbx4wbX">pic.twitter.com/eR3hbx4wbX</a></p>&mdash; Dan Alpert (@DanielAlpert) <a href="https://twitter.com/DanielAlpert/status/1833332263733416127?ref_src=twsrc%5Etfw">September 10, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>""")
[1]:

Simulating survey responses

In the steps below we demonstrate how to use EDSL to simulate responses to the above question:

“Which of the following is more important to you: Financial stability / Moving up the income ladder”

Creating questions

We start by selecting a question type and constructing a question in the relevant template. EDSL comes with many common question types that we can choose from based on the desired form of the response:

[2]:
from edsl import QuestionMultipleChoice
[3]:
q = QuestionMultipleChoice(
    question_name = "income_pref",
    question_text = "Which of the following is more important to you: ",
    question_options = ["Financial stability", "Moving up the income ladder"]
)
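
Other question types follow the same pattern. For comparison, here is a sketch of a free-text version of the same question (illustrative only; it is not used in the steps below):

from edsl import QuestionFreeText

q_open = QuestionFreeText(
    question_name = "income_pref_open",
    question_text = "Which is more important to you: financial stability or moving up the income ladder? Why?"
)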

Designing AI agents

We can design AI agents with relevant traits to answer the question:

[4]:
from edsl import Agent
[5]:
a = Agent(traits = {"persona": "You are an American answering a poll from Pew."})
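
If we wanted to compare responses across personas, we could construct a collection of agents and pass it to the question in the same way (a sketch; the second persona is illustrative):

from edsl import AgentList

agents = AgentList([
    Agent(traits = {"persona": p}) for p in [
        "You are an American answering a poll from Pew.",
        "You are a recent college graduate answering a poll from Pew.",
    ]
])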

Selecting language models

EDSL works with many popular models that we can use to generate responses:

[6]:
from edsl import Model
[7]:
m = Model("gpt-4o", temperature = 1)
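
Other sampling parameters can be passed the same way (a sketch; parameter support varies by inference service, and the defaults appear in the model columns of the results below):

m_alt = Model("gpt-4o", temperature = 1, top_p = 1)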

Running a survey

We administer the question by adding the agent and model and then running it. We can specify the number of times to administer the question:

[8]:
results = q.by(a).by(m).run(n = 20)

EDSL comes with built-in methods for analyzing the dataset of Results (https://docs.expectedparrot.com/en/latest/results.html) that is generated:

[9]:
results.select("income_pref").tally().print(format="rich")
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value               ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability │ 20    │
└─────────────────────┴───────┘
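
For example, we can select several components at once, or convert the results for custom analysis (a sketch; to_pandas assumes you have pandas installed):

results.select("model.model", "answer.income_pref").print(format = "rich")

df = results.to_pandas()  # flatten the results into a pandas DataFrame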

Calculating token probabilities

In the above example we specified n = 20 to run the question (with the agent and model) 20 times.

We can also get the next token probabilities from the model by passing logprobs = True to the Model.

To simplify the token probabilities calculation, we can also specify use_code = True in the Question parameters. This will cause the question to be presented to the model with coded options: 0 for “Financial stability” and 1 for “Moving up the income ladder”:

[10]:
m = Model("gpt-4o", temperature = 1, logprobs = True)
[11]:
q = QuestionMultipleChoice(
    question_name = "income_pref",
    question_text = "Which of the following is more important to you: ",
    question_options = ["Financial stability", "Moving up the income ladder"],
    use_code = True
)
[12]:
new_results = q.by(a).by(m).run(n = 20)
[13]:
new_results.select("income_pref").tally().print(format = "rich")
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value               ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability │ 20    │
└─────────────────────┴───────┘

Inspecting results

The Results include information about all the inputs and outputs relating to the question and response.

To see a list of all the components that can be accessed and analyzed:

[14]:
results.columns
[14]:
['agent.agent_instruction',
 'agent.agent_name',
 'agent.persona',
 'answer.income_pref',
 'comment.income_pref_comment',
 'generated_tokens.income_pref_generated_tokens',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.temperature',
 'model.top_logprobs',
 'model.top_p',
 'prompt.income_pref_system_prompt',
 'prompt.income_pref_user_prompt',
 'question_options.income_pref_question_options',
 'question_text.income_pref_question_text',
 'question_type.income_pref_question_type',
 'raw_model_response.income_pref_cost',
 'raw_model_response.income_pref_one_usd_buys',
 'raw_model_response.income_pref_raw_model_response']
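
For example, the generated_tokens component stores the raw text that the model returned before EDSL parsed it into an answer:

results.select("generated_tokens.income_pref_generated_tokens").print(format = "rich")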

We can inspect the raw_model_response.income_pref_raw_model_response component to identify next token probabilities:

[15]:
example = new_results.select("raw_model_response.income_pref_raw_model_response").to_list()[0]
[16]:
# example
[17]:
next_token_probs = example['choices'][0]['logprobs']['content'][0]['top_logprobs']
next_token_probs
[17]:
[{'token': '0', 'bytes': [48], 'logprob': -0.00055577443},
 {'token': '1', 'bytes': [49], 'logprob': -7.500556},
 {'token': '\n', 'bytes': [10], 'logprob': -13.000556}]
[18]:
import math

# Map the coded answer options and non-response tokens to labels:
options = {'0': "Financial stability", '1': "Moving up the income ladder", '\n': "Skipped", " ": "Skipped"}

for token_info in next_token_probs:
    option = options.get(token_info['token'], "Other")  # guard against unexpected tokens
    p = math.exp(token_info['logprob'])

    print(f"Probability of selecting '{option}' was {p:.3f}")
Probability of selecting 'Financial stability' was 0.999
Probability of selecting 'Moving up the income ladder' was 0.001
Probability of selecting 'Skipped' was 0.000
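
Note that top_logprobs only covers the few most likely next tokens, so the probabilities need not sum exactly to one. If we only care about the relative weight of the two answer options, we can renormalize over them (a sketch reusing next_token_probs and options from above):

# Keep only the two answer-option tokens and renormalize:
option_probs = {t['token']: math.exp(t['logprob']) for t in next_token_probs if t['token'] in ('0', '1')}
total = sum(option_probs.values())

for token, p in option_probs.items():
    print(f"P('{options[token]}' | answer options only) = {p / total:.3f}")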

Comparing models

We can rerun the survey with other available models.

To see a list of all available models:

[19]:
# Model.available()
[20]:
len(Model.available())
[20]:
267
[21]:
Model.available(service = "openai")
[21]:
[['chatgpt-4o-latest', 'openai', 46],
 ['curie:ft-emeritus-2022-11-30-12-58-24', 'openai', 67],
 ['curie:ft-emeritus-2022-12-01-01-04-36', 'openai', 68],
 ['curie:ft-emeritus-2022-12-01-01-51-20', 'openai', 69],
 ['curie:ft-emeritus-2022-12-01-14-16-46', 'openai', 70],
 ['curie:ft-emeritus-2022-12-01-14-28-00', 'openai', 71],
 ['curie:ft-emeritus-2022-12-01-14-49-45', 'openai', 72],
 ['curie:ft-emeritus-2022-12-01-15-29-32', 'openai', 73],
 ['curie:ft-emeritus-2022-12-01-15-42-25', 'openai', 74],
 ['curie:ft-emeritus-2022-12-01-15-52-24', 'openai', 75],
 ['curie:ft-emeritus-2022-12-01-16-40-12', 'openai', 76],
 ['davinci:ft-emeritus-2022-11-30-14-57-33', 'openai', 79],
 ['gpt-3.5-turbo', 'openai', 115],
 ['gpt-3.5-turbo-0125', 'openai', 116],
 ['gpt-3.5-turbo-1106', 'openai', 117],
 ['gpt-3.5-turbo-16k', 'openai', 118],
 ['gpt-4', 'openai', 119],
 ['gpt-4-0125-preview', 'openai', 120],
 ['gpt-4-0613', 'openai', 121],
 ['gpt-4-1106-preview', 'openai', 122],
 ['gpt-4-turbo', 'openai', 123],
 ['gpt-4-turbo-2024-04-09', 'openai', 124],
 ['gpt-4-turbo-preview', 'openai', 125],
 ['gpt-4o', 'openai', 126],
 ['gpt-4o-2024-05-13', 'openai', 127],
 ['gpt-4o-2024-08-06', 'openai', 128],
 ['gpt-4o-audio-preview', 'openai', 129],
 ['gpt-4o-audio-preview-2024-10-01', 'openai', 130],
 ['gpt-4o-mini', 'openai', 131],
 ['gpt-4o-mini-2024-07-18', 'openai', 132],
 ['gpt-4o-realtime-preview', 'openai', 133],
 ['gpt-4o-realtime-preview-2024-10-01', 'openai', 134],
 ['o1-mini', 'openai', 236],
 ['o1-mini-2024-09-12', 'openai', 237],
 ['o1-preview', 'openai', 238],
 ['o1-preview-2024-09-12', 'openai', 239]]
[22]:
models = [Model(m) for m in ["gpt-3.5-turbo", "gpt-4-1106-preview", "gpt-4o", "o1-preview"]]
[23]:
results_with_multiple_models = q.by(a).by(models).run()

We can check which models did or did not answer the question, and filter out any non-responses:

[24]:
(
    results_with_multiple_models
    .filter('income_pref is not None')
    .select('income_pref')
    .tally()
    .print(format = "rich")
)
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value               ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability │ 4     │
└─────────────────────┴───────┘
[25]:
results_with_multiple_models.filter("income_pref is not None").select("model").print()
model.model
gpt-3.5-turbo
gpt-4-1106-preview
gpt-4o
o1-preview
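
We can also view each model's answer side by side (reusing the same results):

(
    results_with_multiple_models
    .select("model.model", "answer.income_pref")
    .print(format = "rich")
)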

Posting to the Coop

The Coop is a platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL and accessible from your workspace or Coop account page. Learn more about creating an account and using the Coop.

Here we demonstrate how to post this notebook:

[26]:
from edsl import Notebook
[27]:
n = Notebook(path = "next_token_probs.ipynb")
[28]:
n.push(description = "Example code for calculating next token probabilities", visibility = "public")
[28]:
{'description': 'Example code for calculating next token probabilities',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/8be8de45-006c-484a-b677-8e3bb25f8ff7',
 'uuid': '8be8de45-006c-484a-b677-8e3bb25f8ff7',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

To update an object at the Coop:

[29]:
n = Notebook(path = "next_token_probs.ipynb") # resave
[30]:
n.patch(uuid = "8be8de45-006c-484a-b677-8e3bb25f8ff7", value = n)
[30]:
{'status': 'success'}