Testing model training data
This notebook provides a template for prompting an AI agent to answer questions as of a given point in time and testing its responses for data leaks, i.e., knowledge of events after that date.
The code is readily editable. Before using it, ensure that you have followed the steps for installing the EDSL package and storing API keys for the models that you want to use.
See EDSL Docs on getting started for details: https://docs.expectedparrot.com/en/latest/
Create an agent with a dated persona
We start by creating an agent with a dated persona. We do this by passing a dictionary of traits to an Agent object. Note that it can be convenient to include both a narrative persona and individual traits to facilitate comparison of responses to questions among agents with different traits (more on built-in methods for analysis below and in the docs):
[1]:
from edsl import Agent

agent = Agent(
    traits={
        "persona": "Today is June 1, 2019. You are 40 years old and live in New York City.",
        "location": "New York City",
        "age": 40,
        "education": "Master's degree",
        "occupation": "Lawyer",
    }
)
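To compare agents placed at different points in time, the trait dictionary can be generated rather than written by hand. The following is a minimal sketch using only the standard library; `dated_traits` is a hypothetical helper (not part of EDSL), and each dictionary it returns could be passed as `traits` to `Agent` in the same way as the dictionary above.

```python
from datetime import date


def dated_traits(today: date, age: int, location: str) -> dict:
    """Build a trait dictionary whose narrative persona states the date explicitly."""
    # Format the date as e.g. "June 1, 2019" (month name, unpadded day, year).
    today_text = f"{today:%B} {today.day}, {today.year}"
    return {
        "persona": f"Today is {today_text}. You are {age} years old and live in {location}.",
        "location": location,
        "age": age,
    }


# One trait dictionary per point in time that we want to test.
trait_sets = [dated_traits(date(year, 6, 1), 40, "New York City") for year in (2009, 2019)]
for traits in trait_sets:
    print(traits["persona"])
```

Keeping the date inside the persona text and the age as a separate trait means the same helper can vary either one independently.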
Create a survey of questions testing data leakage
Next we create some questions that test the agent's persona and combine them in a survey. EDSL comes with many standard question types (free text, multiple choice, numerical, etc.) that can be selected based on the form of the response that you want.
[2]:
from edsl.questions import QuestionFreeText

q_birth_year = QuestionFreeText(
    question_name="birth_year", question_text="When were you born?"
)
q_old_news = QuestionFreeText(
    question_name="old_news",
    question_text="Briefly describe some major stories from the year you were born.",
)
q_cutoff_date = QuestionFreeText(
    question_name="cutoff_date", question_text="What is today's date?"
)
q_recent_news = QuestionFreeText(
    question_name="recent_news",
    question_text="Briefly describe some recent stories that you know about.",
)
q_future_event = QuestionFreeText(
    question_name="future_event", question_text="Describe a major news event of 2020."
)
q_expectations = QuestionFreeText(
    question_name="expectations",
    question_text="What do you expect the major stories of 2020 to be about?",
)
Next we combine the questions into a survey. Note that when we administer the survey the questions are executed asynchronously by default. We could also add survey rules/logic and question memory if desired. Learn more about survey construction and methods in the EDSL Docs.
[3]:
from edsl import Survey

survey = Survey(
    questions=[
        q_birth_year,
        q_old_news,
        q_cutoff_date,
        q_recent_news,
        q_future_event,
        q_expectations,
    ]
)
Run the survey with language models
Next we administer the survey to the agent that we've created. We can also specify the language models to use in generating responses. If we do not specify a model, GPT-4 is used by default.
Here we check the current list of available models and select some to use with our survey:
[4]:
from edsl import Model
Model.available()
[4]:
[['01-ai/Yi-34B-Chat', 'deep_infra', 0],
['Austism/chronos-hermes-13b-v2', 'deep_infra', 1],
['Gryphe/MythoMax-L2-13b', 'deep_infra', 2],
['HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1', 'deep_infra', 3],
['Phind/Phind-CodeLlama-34B-v2', 'deep_infra', 4],
['bigcode/starcoder2-15b', 'deep_infra', 5],
['claude-3-haiku-20240307', 'anthropic', 6],
['claude-3-opus-20240229', 'anthropic', 7],
['claude-3-sonnet-20240229', 'anthropic', 8],
['codellama/CodeLlama-34b-Instruct-hf', 'deep_infra', 9],
['codellama/CodeLlama-70b-Instruct-hf', 'deep_infra', 10],
['cognitivecomputations/dolphin-2.6-mixtral-8x7b', 'deep_infra', 11],
['databricks/dbrx-instruct', 'deep_infra', 12],
['deepinfra/airoboros-70b', 'deep_infra', 13],
['gemini-pro', 'google', 14],
['google/gemma-1.1-7b-it', 'deep_infra', 15],
['gpt-3.5-turbo', 'openai', 16],
['gpt-3.5-turbo-0125', 'openai', 17],
['gpt-3.5-turbo-0301', 'openai', 18],
['gpt-3.5-turbo-0613', 'openai', 19],
['gpt-3.5-turbo-1106', 'openai', 20],
['gpt-3.5-turbo-16k', 'openai', 21],
['gpt-3.5-turbo-16k-0613', 'openai', 22],
['gpt-3.5-turbo-instruct', 'openai', 23],
['gpt-3.5-turbo-instruct-0914', 'openai', 24],
['gpt-4', 'openai', 25],
['gpt-4-0125-preview', 'openai', 26],
['gpt-4-0613', 'openai', 27],
['gpt-4-1106-preview', 'openai', 28],
['gpt-4-1106-vision-preview', 'openai', 29],
['gpt-4-turbo', 'openai', 30],
['gpt-4-turbo-2024-04-09', 'openai', 31],
['gpt-4-turbo-preview', 'openai', 32],
['gpt-4-vision-preview', 'openai', 33],
['lizpreciatior/lzlv_70b_fp16_hf', 'deep_infra', 34],
['llava-hf/llava-1.5-7b-hf', 'deep_infra', 35],
['meta-llama/Llama-2-13b-chat-hf', 'deep_infra', 36],
['meta-llama/Llama-2-70b-chat-hf', 'deep_infra', 37],
['meta-llama/Llama-2-7b-chat-hf', 'deep_infra', 38],
['meta-llama/Meta-Llama-3-70B-Instruct', 'deep_infra', 39],
['meta-llama/Meta-Llama-3-8B-Instruct', 'deep_infra', 40],
['microsoft/WizardLM-2-7B', 'deep_infra', 41],
['microsoft/WizardLM-2-8x22B', 'deep_infra', 42],
['mistralai/Mistral-7B-Instruct-v0.1', 'deep_infra', 43],
['mistralai/Mistral-7B-Instruct-v0.2', 'deep_infra', 44],
['mistralai/Mixtral-8x22B-Instruct-v0.1', 'deep_infra', 45],
['mistralai/Mixtral-8x22B-v0.1', 'deep_infra', 46],
['mistralai/Mixtral-8x7B-Instruct-v0.1', 'deep_infra', 47],
['openchat/openchat_3.5', 'deep_infra', 48]]
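The listing above appears to consist of `[model name, inference service, index]` rows, so ordinary list operations are enough to narrow it down. Here is a minimal sketch on a hand-copied sample of the rows; `models_from` and `available` are illustrative names, and in a live session the rows would come from `Model.available()`.

```python
# A few rows copied from the output above: [model name, service, index].
available = [
    ["claude-3-opus-20240229", "anthropic", 7],
    ["gemini-pro", "google", 14],
    ["gpt-4", "openai", 25],
    ["gpt-4-turbo", "openai", 30],
    ["meta-llama/Meta-Llama-3-70B-Instruct", "deep_infra", 39],
]


def models_from(service: str, rows: list) -> list:
    """Return the names of the models offered by one inference service."""
    return [name for name, provider, _ in rows if provider == service]


print(models_from("openai", available))  # ['gpt-4', 'gpt-4-turbo']
```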
Selecting the models that we will use:
[5]:
models = [Model(m) for m in ["gpt-4-turbo", "gemini-pro"]]
To run the survey we add the agent and the models with the by() method and then call the run() method to generate the responses:
[6]:
results = survey.by(agent).by(models).run()
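Chaining by() calls pairs the agent with each model. Assuming that one answer is produced for every agent, model and question combination (an assumption about the size of the job, not a statement about EDSL internals), the expected number of answers can be sketched with the standard library:

```python
from itertools import product

agents = ["dated persona (June 1, 2019)"]
models = ["gpt-4-turbo", "gemini-pro"]
questions = [
    "birth_year", "old_news", "cutoff_date",
    "recent_news", "future_event", "expectations",
]

# Every (agent, model, question) combination is one answer to generate.
jobs = list(product(agents, models, questions))
print(len(jobs))  # 12
```

Adding a second agent or a third model multiplies the count, which is worth checking before running a large survey against paid APIs.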
Inspecting responses
Running a survey generates a Results object with information about the questions, answers, agents, models and prompts that we can access with EDSL's built-in methods for analyzing results in data tables, dataframes, SQL, JSON, CSV and other formats. We can see a list of these components by accessing columns:
[7]:
results.columns
[7]:
['agent.age',
'agent.agent_name',
'agent.education',
'agent.location',
'agent.occupation',
'agent.persona',
'answer.birth_year',
'answer.cutoff_date',
'answer.expectations',
'answer.future_event',
'answer.old_news',
'answer.recent_news',
'iteration.iteration',
'model.frequency_penalty',
'model.logprobs',
'model.maxOutputTokens',
'model.max_tokens',
'model.model',
'model.presence_penalty',
'model.stopSequences',
'model.temperature',
'model.topK',
'model.topP',
'model.top_logprobs',
'model.top_p',
'prompt.birth_year_system_prompt',
'prompt.birth_year_user_prompt',
'prompt.cutoff_date_system_prompt',
'prompt.cutoff_date_user_prompt',
'prompt.expectations_system_prompt',
'prompt.expectations_user_prompt',
'prompt.future_event_system_prompt',
'prompt.future_event_user_prompt',
'prompt.old_news_system_prompt',
'prompt.old_news_user_prompt',
'prompt.recent_news_system_prompt',
'prompt.recent_news_user_prompt',
'question_options.birth_year_question_options',
'question_options.cutoff_date_question_options',
'question_options.expectations_question_options',
'question_options.future_event_question_options',
'question_options.old_news_question_options',
'question_options.recent_news_question_options',
'question_text.birth_year_question_text',
'question_text.cutoff_date_question_text',
'question_text.expectations_question_text',
'question_text.future_event_question_text',
'question_text.old_news_question_text',
'question_text.recent_news_question_text',
'question_type.birth_year_question_type',
'question_type.cutoff_date_question_type',
'question_type.expectations_question_type',
'question_type.future_event_question_type',
'question_type.old_news_question_type',
'question_type.recent_news_question_type',
'raw_model_response.birth_year_raw_model_response',
'raw_model_response.cutoff_date_raw_model_response',
'raw_model_response.expectations_raw_model_response',
'raw_model_response.future_event_raw_model_response',
'raw_model_response.old_news_raw_model_response',
'raw_model_response.recent_news_raw_model_response']
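Each column name above has the form `component.field`, so the list can be grouped by component to see what is available at a glance. This is a small standard-library sketch on a hand-copied subset of the names; in a live session the full list would be `results.columns`.

```python
from collections import defaultdict

# A subset of the column names listed above.
columns = [
    "agent.age",
    "agent.persona",
    "answer.birth_year",
    "answer.cutoff_date",
    "model.model",
    "model.temperature",
    "prompt.birth_year_user_prompt",
]

grouped = defaultdict(list)
for column in columns:
    # Split on the first dot only: "answer.birth_year" -> ("answer", "birth_year").
    component, field = column.split(".", 1)
    grouped[component].append(field)

for component, fields in grouped.items():
    print(f"{component}: {fields}")
```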
Here we show some basic methods for selecting and printing responses for each model in a table:
[8]:
(
    results.select(
        "model",
        "birth_year",
        "old_news",
        "cutoff_date",
        "recent_news",
        "future_event",
        "expectations",
    ).print(format="rich")
)
model.model: gpt-4-turbo

answer.birth_year:
I was born in 1979.

answer.old_news:
I was born in 1979. Some major stories from that year include the Iranian Revolution, which led to the fall of the Shah and the establishment of the Islamic Republic under Ayatollah Khomeini. Another significant event was the Three Mile Island nuclear accident in Pennsylvania, which raised concerns about nuclear safety in the United States. Additionally, Margaret Thatcher became the first female Prime Minister of the United Kingdom, marking a significant political shift in the country.

answer.cutoff_date:
Today is June 1, 2019.

answer.recent_news:
As a lawyer living in New York City, I often come across various news stories. Recently, there have been discussions about the upcoming local elections with a focus on issues such as housing, public transportati… and education reforms. Another story that has been prevalent is the ongoing development of new public parks and green spaces in an effort to enhance urban living. Additionally, there have been debates around the city's policies on sustainabili… and environmental protection, reflecting a growing concern among residents about climate change and its impacts.

answer.future_event:
As of today, June 1, 2019, I cannot provide details about major news events of 2020 as they have not yet occurred.

answer.expectations:
Predicting the major stories of 2020 can be challenging, but based on current events and trends, we might expect several key issues to dominate the headlines. Firstly, the U.S. presidential election will likely be a major focal point, with intense political debates and campaigns. Climate change and environmental policies will also continue to be critical topics, as extreme weather events and sustainable practices gain more attention. Additionally, advancements in technology, particularly in AI, cybersecurit… and biotechnolog… are likely to be significant. Lastly, global economic trends, trade agreements, and tensions might also be pivotal, influencing markets and international relations.

model.model: gemini-pro

answer.birth_year:
I was born on June 1, 1979.

answer.old_news:
I was born in 1979, so some major stories from that year include:

- The Iranian hostage crisis ended after 444 days, with the hostages being released on January 20, 1981.
- The Soviet Union invaded Afghanistan on December 24, 1979, beginning a nine-year war.
- Margaret Thatcher became the first female Prime Minister of the United Kingdom on May 4, 1979.
- The Three Mile Island nuclear accident occurred on March 28, 1979, leading to increased public concern about nuclear power.
- The first Sony Walkman was released in Japan on July 1, 1979, revolutionizi… the way people listened to music.
- The Voyager 1 space probe was launched on September 5, 1979, and is now the farthest man-made object from Earth.
- The first test-tube baby, Louise Brown, was born on July 25, 1978, in the United Kingdom.
- The United States and the Soviet Union signed the SALT II treaty on June 18, 1979, limiting the number of strategic nuclear weapons each country could possess.
- The Camp David Accords were signed on September 17, 1978, between Israel and Egypt, leading to a peace treaty between the two countries in 1979.
- The first personal computer, the Apple II, was released on June 5, 1977, marking the beginning of the personal computer revolution.

answer.cutoff_date:
Today is June 1, 2019

answer.recent_news:
I'm sorry, but I don't have any recent stories to share. My knowledge is only up to June 1, 2019, so I don't have access to any information beyond that date.

answer.future_event:
I'm sorry, but I cannot answer questions about events that happened after my knowledge cutoff date of June 1, 2019. I recommend checking a more up-to-date news source for information about major news events of 2020.

answer.expectations:
As we approach 2020, there are several major stories that I expect to dominate the headlines.

**1. The 2020 US Presidential Election:** This will be one of the most closely watched elections in recent history, as it will determine who will lead the United States for the next four years. The outcome of the election will have a significant impact on both domestic and foreign policy.

**2. The ongoing trade war between the US and China:** This trade war has already had a significant impact on the global economy, and it is expected to continue to be a major issue in 2020. The outcome of the trade war will have a major impact on businesses and consumers around the world.

**3. The ongoing conflict in Syria:** This conflict has been going on for over eight years, and it has caused a humanitarian crisis. The conflict is expected to continue in 2020, and it is likely to have a major impact on the region.

**4. The ongoing climate crisis:** The climate crisis is one of the most pressing issues facing the world today. The effects of climate change are already being felt around the world, and they are expected to become more severe in the years to come. The climate crisis is likely to be a major issue in 2020, as governments and businesses around the world work to address the issue.

**5. The ongoing technological revolution:** The technological revolution is changing the world in many ways. New technologies are being developed all the time, and they are having a major impact on our lives. The technological revolution is likely to continue in 2020, and it is expected to have a major impact on the way we live and work.

These are just a few of the major stories that I expect to dominate the headlines in 2020. The world is changing rapidly, and these stories are likely to have a major impact on our lives in the years to come.
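One way to screen the answers above for leaks is to search them for terms that only became newsworthy after the persona's date. The sketch below is illustrative rather than a definitive test: `LEAK_TERMS` is a hand-picked, hypothetical list (the COVID-19 pandemic dominated the news in 2020, so a mention of it by an agent living on June 1, 2019 would indicate a leak), `find_leaks` is not an EDSL function, and the answers are copied by hand from the output instead of being read from the Results object.

```python
import re

# Hypothetical terms that an agent living on June 1, 2019 should not know about.
LEAK_TERMS = ["covid", "coronavirus", "pandemic", "lockdown"]

# Answers to the "future_event" question, copied from the output above.
future_event = {
    "gpt-4-turbo": (
        "As of today, June 1, 2019, I cannot provide details about major "
        "news events of 2020 as they have not yet occurred."
    ),
    "gemini-pro": (
        "I'm sorry, but I cannot answer questions about events that happened "
        "after my knowledge cutoff date of June 1, 2019. I recommend checking "
        "a more up-to-date news source for information about major news "
        "events of 2020."
    ),
}


def find_leaks(text: str, terms: list) -> list:
    """Return the terms that occur in the text, ignoring case."""
    return [t for t in terms if re.search(re.escape(t), text, re.IGNORECASE)]


for model, answer in future_event.items():
    print(model, find_leaks(answer, LEAK_TERMS))
```

Neither answer matches any of these terms. A keyword screen only catches explicit mentions, so it complements reading the answers rather than replacing it; for instance, it would not notice that the gemini-pro answer steps out of the persona by referring to a "knowledge cutoff date".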