Calculating next token probabilities
This notebook provides sample EDSL code for using language models to simulate a survey and calculate next token probabilities for models' responses to survey questions.
EDSL is an open-source library for simulating surveys, experiments and other research with AI agents and large language models. Before running the code below, please ensure that you have installed the EDSL library and either activated remote inference from your Coop account or stored API keys for the language models that you want to use with EDSL. Please also see our documentation page for tips and tutorials on getting started using EDSL.
Research question
[1]:
from IPython.display import HTML
HTML("""<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Aspirational wealth...doing better than you parents...an "Opportunity Economy!" <br>NO!<br>All are late 20th century neoliberal tropes.<br>Americans today seek financial security.<br>Decent jobs and government policy that will pay for the needs of life and old age.<br>Understand that Democrats! <a href="https://t.co/eR3hbx4wbX">pic.twitter.com/eR3hbx4wbX</a></p>— Dan Alpert (@DanielAlpert) <a href="https://twitter.com/DanielAlpert/status/1833332263733416127?ref_src=twsrc%5Etfw">September 10, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>""")
[1]:
Aspirational wealth...doing better than you parents...an "Opportunity Economy!"
NO!
All are late 20th century neoliberal tropes.
Americans today seek financial security.
Decent jobs and government policy that will pay for the needs of life and old age.
Understand that Democrats! pic.twitter.com/eR3hbx4wbX
— Dan Alpert (@DanielAlpert) September 10, 2024
Simulating survey responses
In the steps below we demonstrate how to use EDSL to simulate responses to the above question:
“Which of the following is more important to you: Financial stability / Moving up the income ladder”
Creating questions
We start by selecting a question type and constructing a question in the relevant template. EDSL comes with many common question types that we can choose from based on the desired form of the response:
[2]:
from edsl import QuestionMultipleChoice
[3]:
q = QuestionMultipleChoice(
question_name = "income_pref",
question_text = "Which of the following is more important to you: ",
question_options = ["Financial stability", "Moving up the income ladder"]
)
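EDSL includes other question types as well; for example, if we wanted an open-ended response instead, we could use QuestionFreeText. A brief sketch (the question_name and wording here are illustrative; see the documentation for the full list of question types):

from edsl import QuestionFreeText

# A free-text version of the same question (illustrative sketch)
q_open = QuestionFreeText(
    question_name = "income_pref_open",
    question_text = "Which is more important to you, and why: financial stability or moving up the income ladder?"
)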
Designing AI agents
We can design AI agents with relevant traits to answer the question:
[4]:
from edsl import Agent
[5]:
a = Agent(traits = {"persona": "You are an American answering a poll from Pew."})
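A single agent suffices for this example, but a list of agents can be passed to .by() in the same way that a list of models is passed later in this notebook. A minimal sketch, with illustrative personas:

# Illustrative personas; any list of Agent objects can be passed to .by()
personas = [
    "You are an American answering a poll from Pew.",
    "You are a retired American answering a poll from Pew.",
]
agents = [Agent(traits = {"persona": p}) for p in personas]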
Selecting language models
EDSL works with many popular models that we can use to generate responses:
[6]:
from edsl import Model
[7]:
m = Model("gpt-4o", temperature = 1)
Running a survey
We administer the question by adding the agent and model and then running it. We can specify the number of times to administer the question:
[8]:
results = q.by(a).by(m).run(n = 20)
EDSL comes with built-in methods for analyzing the dataset of Results that is generated (see https://docs.expectedparrot.com/en/latest/results.html):
[9]:
results.select("income_pref").tally().print(format="rich")
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value               ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability │    20 │
└─────────────────────┴───────┘
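Results can also be exported for analysis in other tools. A minimal sketch, assuming the Results object's to_pandas() conversion method (see the Results documentation for the available exporters):

# Convert the results to a pandas DataFrame and tally the answers there
df = results.to_pandas()
df["answer.income_pref"].value_counts()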
Calculating token probabilities
In the above example we specified n = 20 to run the question (with the agent and model) 20 times.
We can also get the probabilities from the model by passing logprobs = True to the Model.

To simplify the token probabilities calculation, we can also specify use_code = True in the Question parameters. This will cause the question to be presented to the model with coded options: 0 for “Financial stability” and 1 for “Moving up the income ladder”:
[10]:
m = Model("gpt-4o", temperature = 1, logprobs = True)
[11]:
q = QuestionMultipleChoice(
question_name = "income_pref",
question_text = "Which of the following is more important to you: ",
question_options = ["Financial stability", "Moving up the income ladder"],
use_code = True
)
[12]:
new_results = q.by(a).by(m).run(n = 20)
[13]:
new_results.select("income_pref").tally().print(format = "rich")
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value               ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability │    20 │
└─────────────────────┴───────┘
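To verify that the coded options were in fact presented to the model, we can inspect the user prompt stored in the results (a sketch reusing the select / to_list pattern from this notebook; the prompt component name appears in the columns list in the next section):

# Inspect the user prompt to confirm the options were coded as 0 / 1
new_results.select("prompt.income_pref_user_prompt").to_list()[0]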
Inspecting results
The Results include information about all the inputs and outputs relating to the question and response.
To see a list of all the components that can be accessed and analyzed:
[14]:
results.columns
[14]:
['agent.agent_instruction',
'agent.agent_name',
'agent.persona',
'answer.income_pref',
'comment.income_pref_comment',
'generated_tokens.income_pref_generated_tokens',
'iteration.iteration',
'model.frequency_penalty',
'model.logprobs',
'model.max_tokens',
'model.model',
'model.presence_penalty',
'model.temperature',
'model.top_logprobs',
'model.top_p',
'prompt.income_pref_system_prompt',
'prompt.income_pref_user_prompt',
'question_options.income_pref_question_options',
'question_text.income_pref_question_text',
'question_type.income_pref_question_type',
'raw_model_response.income_pref_cost',
'raw_model_response.income_pref_one_usd_buys',
'raw_model_response.income_pref_raw_model_response']
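Any of these components can be combined in a single call to select(); for example (a sketch reusing the methods shown above):

# View the model, answer and comment side by side
results.select("model.model", "answer.income_pref", "comment.income_pref_comment").print(format = "rich")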
We can inspect the raw_model_response.income_pref_raw_model_response component to identify next token probabilities:
[15]:
example = new_results.select("raw_model_response.income_pref_raw_model_response").to_list()[0]
[16]:
# Uncomment to display the full raw model response:
# example
[17]:
next_token_probs = example['choices'][0]['logprobs']['content'][0]['top_logprobs']
next_token_probs
[17]:
[{'token': '0', 'bytes': [48], 'logprob': -0.00055577443},
{'token': '1', 'bytes': [49], 'logprob': -7.500556},
{'token': '\n', 'bytes': [10], 'logprob': -13.000556}]
[18]:
import math
# Specifying the codes for the answer options and non-responses:
options = {'0': "Financial stability", '1': "Moving up the income ladder", '\n': "Skipped", " ": "Skipped"}
for token_info in next_token_probs:
option = options[token_info['token']]
p = math.exp(token_info['logprob'])
print(f"Probability of selecting '{option}' was {p:.3f}")
Probability of selecting 'Financial stability' was 0.999
Probability of selecting 'Moving up the income ladder' was 0.001
Probability of selecting 'Skipped' was 0.000
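Note that top_logprobs reports only the top few candidate tokens, so the implied probabilities need not sum exactly to 1. A minimal sketch renormalizing over just the two answer codes (reusing next_token_probs and math from the cells above):

# Renormalize over the two valid answer codes to get conditional probabilities
probs = {t['token']: math.exp(t['logprob']) for t in next_token_probs}
p0 = probs.get('0', 0.0)  # "Financial stability"
p1 = probs.get('1', 0.0)  # "Moving up the income ladder"
total = p0 + p1
print(f"P('Financial stability' | valid answer) = {p0 / total:.4f}")
print(f"P('Moving up the income ladder' | valid answer) = {p1 / total:.4f}")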
Comparing models
We can rerun the survey with other available models.
To see a list of all available models:
[19]:
# Model.available()
[20]:
len(Model.available())
[20]:
267
[21]:
Model.available(service = "openai")
[21]:
[['chatgpt-4o-latest', 'openai', 46],
['curie:ft-emeritus-2022-11-30-12-58-24', 'openai', 67],
['curie:ft-emeritus-2022-12-01-01-04-36', 'openai', 68],
['curie:ft-emeritus-2022-12-01-01-51-20', 'openai', 69],
['curie:ft-emeritus-2022-12-01-14-16-46', 'openai', 70],
['curie:ft-emeritus-2022-12-01-14-28-00', 'openai', 71],
['curie:ft-emeritus-2022-12-01-14-49-45', 'openai', 72],
['curie:ft-emeritus-2022-12-01-15-29-32', 'openai', 73],
['curie:ft-emeritus-2022-12-01-15-42-25', 'openai', 74],
['curie:ft-emeritus-2022-12-01-15-52-24', 'openai', 75],
['curie:ft-emeritus-2022-12-01-16-40-12', 'openai', 76],
['davinci:ft-emeritus-2022-11-30-14-57-33', 'openai', 79],
['gpt-3.5-turbo', 'openai', 115],
['gpt-3.5-turbo-0125', 'openai', 116],
['gpt-3.5-turbo-1106', 'openai', 117],
['gpt-3.5-turbo-16k', 'openai', 118],
['gpt-4', 'openai', 119],
['gpt-4-0125-preview', 'openai', 120],
['gpt-4-0613', 'openai', 121],
['gpt-4-1106-preview', 'openai', 122],
['gpt-4-turbo', 'openai', 123],
['gpt-4-turbo-2024-04-09', 'openai', 124],
['gpt-4-turbo-preview', 'openai', 125],
['gpt-4o', 'openai', 126],
['gpt-4o-2024-05-13', 'openai', 127],
['gpt-4o-2024-08-06', 'openai', 128],
['gpt-4o-audio-preview', 'openai', 129],
['gpt-4o-audio-preview-2024-10-01', 'openai', 130],
['gpt-4o-mini', 'openai', 131],
['gpt-4o-mini-2024-07-18', 'openai', 132],
['gpt-4o-realtime-preview', 'openai', 133],
['gpt-4o-realtime-preview-2024-10-01', 'openai', 134],
['o1-mini', 'openai', 236],
['o1-mini-2024-09-12', 'openai', 237],
['o1-preview', 'openai', 238],
['o1-preview-2024-09-12', 'openai', 239]]
[22]:
models = [Model(m) for m in ["gpt-3.5-turbo", "gpt-4-1106-preview", "gpt-4o", "o1-preview"]]
[23]:
results_with_multiple_models = q.by(a).by(models).run()
We can check which models did or did not answer the question, and filter out the non-responses:
[24]:
(
results_with_multiple_models
.filter('income_pref is not None')
.select('income_pref')
.tally()
.print(format = "rich")
)
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value               ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability │     4 │
└─────────────────────┴───────┘
[25]:
results_with_multiple_models.filter("income_pref is not None").select("model").print()
┏━━━━━━━━━━━━━━━━━━━━┓
┃ model.model        ┃
┡━━━━━━━━━━━━━━━━━━━━┩
│ gpt-3.5-turbo      │
│ gpt-4-1106-preview │
│ gpt-4o             │
│ o1-preview         │
└────────────────────┘
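A sketch of the complementary check, listing any models that did not return an answer (the filter syntax mirrors the one used above):

# Models that did not answer the question, if any
(
    results_with_multiple_models
    .filter("income_pref is None")
    .select("model")
    .print(format = "rich")
)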
Posting to the Coop
The Coop is a platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL and accessible from your workspace or Coop account page. Learn more about creating an account and using the Coop.
Here we demonstrate how to post this notebook:
[26]:
from edsl import Notebook
[27]:
n = Notebook(path = "next_token_probs.ipynb")
[28]:
n.push(description = "Example code for calculating next token probabilities", visibility = "public")
[28]:
{'description': 'Example code for calculating next token probabilities',
'object_type': 'notebook',
'url': 'https://www.expectedparrot.com/content/8be8de45-006c-484a-b677-8e3bb25f8ff7',
'uuid': '8be8de45-006c-484a-b677-8e3bb25f8ff7',
'version': '0.1.33.dev1',
'visibility': 'public'}
To update an object at the Coop:
[29]:
n = Notebook(path = "next_token_probs.ipynb") # resave
[30]:
n.patch(uuid = "8be8de45-006c-484a-b677-8e3bb25f8ff7", value = n)
[30]:
{'status': 'success'}