Calculating next token probabilities

This notebook provides sample EDSL code for using language models to simulate a survey and calculate next token probabilities for models’ responses to survey questions.

EDSL is an open-source library for simulating surveys, experiments and other research with AI agents and large language models. Before running the code below, please ensure that you have installed the EDSL library and either activated remote inference from your Coop account or stored API keys for the language models that you want to use with EDSL. Please also see our documentation page for tips and tutorials on getting started using EDSL.

Research question

[1]:
from IPython.display import HTML
HTML("""<blockquote class="twitter-tweet"><p lang="en" dir="ltr">Aspirational wealth...doing better than you parents...an &quot;Opportunity Economy!&quot; <br>NO!<br>All are late 20th century neoliberal tropes.<br>Americans today seek financial security.<br>Decent jobs and government policy that will pay for the needs of life and old age.<br>Understand that Democrats! <a href="https://t.co/eR3hbx4wbX">pic.twitter.com/eR3hbx4wbX</a></p>&mdash; Dan Alpert (@DanielAlpert) <a href="https://twitter.com/DanielAlpert/status/1833332263733416127?ref_src=twsrc%5Etfw">September 10, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>""")
[1]:

Simulating survey responses

In the steps below we demonstrate how to use EDSL to simulate responses to the above question:

“Which of the following is more important to you: Financial stability / Moving up the income ladder”

Creating questions

We start by selecting a question type and constructing a question in the relevant template. EDSL comes with many common question types that we can choose from based on the desired form of the response:

[2]:
from edsl import QuestionMultipleChoice
[3]:
q = QuestionMultipleChoice(
    question_name = "income_pref",
    question_text = "Which of the following is more important to you: ",
    question_options = ["Financial stability", "Moving up the income ladder"]
)
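
EDSL includes other question types that elicit different response formats. As a brief sketch (assuming the QuestionFreeText and QuestionLinearScale classes and their standard parameters; the question names and texts here are hypothetical variants):

from edsl import QuestionFreeText, QuestionLinearScale

# Free-text variant of the same question:
q_text = QuestionFreeText(
    question_name = "income_pref_text",
    question_text = "Which is more important to you: financial stability or moving up the income ladder? Why?"
)

# Linear scale variant:
q_scale = QuestionLinearScale(
    question_name = "income_pref_scale",
    question_text = "How important is financial stability to you?",
    question_options = [1, 2, 3, 4, 5],
    option_labels = {1: "Not at all important", 5: "Very important"}
)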

Designing AI agents

We can design AI agents with relevant traits to answer the question:

[4]:
from edsl import Agent
[5]:
a = Agent(traits = {"persona": "You are an American answering a poll from Pew."})
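
The same question can also be administered to many agents at once. A minimal sketch, assuming the AgentList class and some hypothetical persona variations:

from edsl import AgentList

agents = AgentList([
    Agent(traits = {"persona": f"You are a {age} American answering a poll from Pew."})
    for age in ["young adult", "middle-aged", "retired"]
])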

Selecting language models

EDSL works with many popular models that we can use to generate responses:

[6]:
from edsl import Model
[7]:
m = Model("gpt-4o", temperature = 1)

Running a survey

We administer the question by adding the agent and model to it and then running it. We can specify the number of times to administer the question:

[8]:
results = q.by(a).by(m).run(n = 20)

EDSL comes with built-in methods for analyzing the dataset of Results (https://docs.expectedparrot.com/en/latest/results.html) that is generated:

[9]:
results.select("income_pref").tally().print(format="rich")
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value               ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability │ 20    │
└─────────────────────┴───────┘
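
Results can also be exported for analysis with standard tools. A minimal sketch, assuming the to_pandas method is available in your version of EDSL:

df = results.to_pandas()

# Tally the responses with pandas (column names follow the results.columns format shown below):
df["answer.income_pref"].value_counts()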

Calculating token probabilities

In the above example we specified n = 20 to run the question (with the agent and model) 20 times.

We can also get token probabilities directly from the model by passing logprobs = True to the Model.

To simplify the token probabilities calculation, we can also specify use_code = True in the Question parameters. This will cause the question to be presented to the model with coded options: 0 for “Financial stability” and 1 for “Moving up the income ladder”:

[10]:
m = Model("gpt-4o", temperature = 1, logprobs = True)
[11]:
q = QuestionMultipleChoice(
    question_name = "income_pref",
    question_text = "Which of the following is more important to you: ",
    question_options = ["Financial stability", "Moving up the income ladder"],
    use_code = True
)
[12]:
new_results = q.by(a).by(m).run(n = 20)
[13]:
new_results.select("income_pref").tally().print(format = "rich")
┏━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value               ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability │ 20    │
└─────────────────────┴───────┘
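
To confirm that the coded options were actually presented to the model, we can inspect the user prompt stored with the results (one of the prompt.* components listed in the next section), reusing the select and print methods shown above:

new_results.select("prompt.income_pref_user_prompt").print(format = "rich")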

Inspecting results

The Results include information about all the inputs and outputs relating to the question and response.

To see a list of all the components that can be accessed and analyzed:

[14]:
new_results.columns
[14]:
['agent.agent_instruction',
 'agent.agent_name',
 'agent.persona',
 'answer.income_pref',
 'comment.income_pref_comment',
 'generated_tokens.income_pref_generated_tokens',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.temperature',
 'model.top_logprobs',
 'model.top_p',
 'prompt.income_pref_system_prompt',
 'prompt.income_pref_user_prompt',
 'question_options.income_pref_question_options',
 'question_text.income_pref_question_text',
 'question_type.income_pref_question_type',
 'raw_model_response.income_pref_cost',
 'raw_model_response.income_pref_one_usd_buys',
 'raw_model_response.income_pref_raw_model_response']

We can inspect the raw_model_response.income_pref_raw_model_response component to identify next token probabilities:

[15]:
example = new_results.select("raw_model_response.income_pref_raw_model_response").to_list()[0]
[16]:
next_token_probs = example['choices'][0]['logprobs']['content'][0]['top_logprobs']
next_token_probs
[16]:
[{'token': '0', 'bytes': [48], 'logprob': -0.00028982185},
 {'token': '1', 'bytes': [49], 'logprob': -8.25029},
 {'token': ' ', 'bytes': [32], 'logprob': -11.50029}]
[19]:
import math

# Specifying the codes for the answer options and non-responses:
options = {'0': "Financial stability", '1': "Moving up the income ladder", '\n': "Skipped", ' ': "Skipped"}

for token_info in next_token_probs:
    option = options[token_info['token']]
    p = math.exp(token_info['logprob'])

    print(f"Probability of selecting '{option}' was {p:.3f}")
Probability of selecting 'Financial stability' was 1.000
Probability of selecting 'Moving up the income ladder' was 0.000
Probability of selecting 'Skipped' was 0.000
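
Because we ran the question 20 times, we can also average these probabilities across all of the raw responses. A minimal sketch reusing the parsing logic and the options dictionary from above:

from collections import defaultdict

raw_responses = new_results.select("raw_model_response.income_pref_raw_model_response").to_list()

totals = defaultdict(float)
for response in raw_responses:
    top_logprobs = response['choices'][0]['logprobs']['content'][0]['top_logprobs']
    for token_info in top_logprobs:
        option = options.get(token_info['token'], "Other")
        totals[option] += math.exp(token_info['logprob'])

# Average probability of each option across the runs:
for option, total in totals.items():
    print(f"Average probability of '{option}': {total / len(raw_responses):.3f}")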

Comparing models

We can rerun the survey with other available models.

To see a list of all available models:

[18]:
# Model.available()
[20]:
models = [Model(model_name) for model_name, _, _ in Model.available()]
[21]:
len(models)
[21]:
151

Some of these models will not be appropriate for answering the question; we can add print_exceptions = False to skip the error report:

[21]:
results_with_many_models = q.by(a).by(models).run(print_exceptions = False)

We can check which models did and did not answer the question, and filter out the non-responses:

[22]:
(
    results_with_many_models
    .filter('income_pref is not None')
    .select('income_pref')
    .tally()
    .print(format = "rich")
)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━┓
┃ value                       ┃ count ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━┩
│ Financial stability         │ 86    │
├─────────────────────────────┼───────┤
│ Moving up the income ladder │ 8     │
└─────────────────────────────┴───────┘
[23]:
results_with_many_models.filter("income_pref is not None").select("model").print()
model.model
01-ai/Yi-34B-Chat
Austism/chronos-hermes-13b-v2
Gryphe/MythoMax-L2-13b
Gryphe/MythoMax-L2-13b-turbo
HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1
Phind/Phind-CodeLlama-34B-v2
Qwen/Qwen2-72B-Instruct
Qwen/Qwen2-7B-Instruct
Qwen/Qwen2.5-72B-Instruct
Sao10K/L3-70B-Euryale-v2.1
Sao10K/L3.1-70B-Euryale-v2.2
bigcode/starcoder2-15b
bigcode/starcoder2-15b-instruct-v0.1
chatgpt-4o-latest
claude-3-5-sonnet-20240620
claude-3-haiku-20240307
claude-3-opus-20240229
claude-3-sonnet-20240229
codellama/CodeLlama-34b-Instruct-hf
codellama/CodeLlama-70b-Instruct-hf
codestral-2405
codestral-latest
codestral-mamba-2407
cognitivecomputations/dolphin-2.6-mixtral-8x7b
cognitivecomputations/dolphin-2.9.1-llama-3-70b
databricks/dbrx-instruct
deepinfra/airoboros-70b
gemini-1.0-pro
gemini-1.5-flash
gemini-1.5-pro
gemini-pro
gemma-7b-it
gemma2-9b-it
google/codegemma-7b-it
google/gemma-1.1-7b-it
google/gemma-2-27b-it
google/gemma-2-9b-it
gpt-3.5-turbo-0125
gpt-3.5-turbo-16k
gpt-4
gpt-4-0125-preview
gpt-4-0613
gpt-4-1106-preview
gpt-4-turbo
gpt-4-turbo-2024-04-09
gpt-4-turbo-preview
gpt-4o
gpt-4o-2024-05-13
gpt-4o-2024-08-06
gpt-4o-mini
gpt-4o-mini-2024-07-18
lizpreciatior/lzlv_70b_fp16_hf
llama-3.1-70b-versatile
llama-3.1-8b-instant
llama3-70b-8192
llama3-8b-8192
llama3-groq-70b-8192-tool-use-preview
llama3-groq-8b-8192-tool-use-preview
mattshumer/Reflection-Llama-3.1-70B
meta-llama/Llama-2-13b-chat-hf
meta-llama/Llama-2-70b-chat-hf
meta-llama/Llama-2-7b-chat-hf
meta-llama/Meta-Llama-3-70B-Instruct
meta-llama/Meta-Llama-3-8B-Instruct
meta-llama/Meta-Llama-3.1-405B-Instruct
meta-llama/Meta-Llama-3.1-70B-Instruct
meta-llama/Meta-Llama-3.1-8B-Instruct
microsoft/Phi-3-medium-4k-instruct
mistral-large-2407
mistral-large-latest
mistral-medium
mistral-medium-2312
mistral-medium-latest
mistral-small-2402
mistral-small-2409
mistral-small-latest
mistral-tiny
mistral-tiny-2312
mistral-tiny-2407
mistral-tiny-latest
mistralai/Mistral-Nemo-Instruct-2407
mistralai/Mixtral-8x22B-v0.1
mistralai/Mixtral-8x7B-Instruct-v0.1
nvidia/Nemotron-4-340B-Instruct
open-mistral-7b
open-mistral-nemo
open-mistral-nemo-2407
open-mixtral-8x22b
open-mixtral-8x22b-2404
openchat/openchat-3.6-8b
pixtral
pixtral-12b
pixtral-12b-2409
pixtral-latest
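
We can also see which answer each individual model gave by selecting the model and answer columns together, reusing the filter and select methods shown above:

(
    results_with_many_models
    .filter("income_pref is not None")
    .select("model.model", "answer.income_pref")
    .print(format = "rich")
)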

Posting to the Coop

The Coop is a platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL and accessible from your workspace or Coop account page. Learn more about creating an account and using the Coop.

Here we demonstrate how to post this notebook:

[24]:
from edsl import Notebook
[25]:
n = Notebook(path = "next_token_probs.ipynb")
[26]:
n.push(description = "Example code for calculating next token probabilities", visibility = "public")
[26]:
{'description': 'Example code for calculating next token probabilities',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/8be8de45-006c-484a-b677-8e3bb25f8ff7',
 'uuid': '8be8de45-006c-484a-b677-8e3bb25f8ff7',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

To update an object at the Coop:

[29]:
n = Notebook(path = "next_token_probs.ipynb") # resave
[30]:
n.patch(uuid = "8be8de45-006c-484a-b677-8e3bb25f8ff7", value = n)
[30]:
{'status': 'success'}