Creating AI agents for survey respondents

This notebook provides code for creating AI agents based on responses to the CES 2022 Pre-Election Questionnaire using EDSL, an open-source Python library for simulating social science research with large language models. EDSL is developed by Expected Parrot, a start-up building tools for AI-powered research.

Sections

The sections below demonstrate how to:

Import survey data into EDSL
Select questions and responses
Use a language model to suggest questions
Create a panel of AI agents representing respondents
Conduct new surveys with agents

Companion notebooks

This notebook is designed to be used with two companion notebooks:

"Using EDSL for AI polling" demonstrates basic methods for designing AI agents and conducting surveys with them.
"Working with CES data in EDSL" provides code for working with CES datasets in EDSL.

Reference & contact

Documentation for the EDSL package is available at https://docs.expectedparrot.com. There you can also find example code, tutorials and notebooks for a variety of use cases.

Please let us know if you have any questions or encounter issues working with this data:

Discord: https://discord.com/invite/mxAYkjfy9m
Email: info@expectedparrot.com

Technical setup

EDSL is compatible with Python 3.9-3.12.

See the instructions on installing the EDSL library and storing API keys for the language models that you want to use. In the examples below where no model is specified, EDSL uses GPT-4 by default (an API key for OpenAI is required). We also show how to use different models.

Sample data

We start by inspecting the dataset of responses to the CES Pre-Election Questionnaire:

[1]:

import pandas as pd

df = pd.read_csv("CCES22_Common_OUTPUT_vv_topost.csv", low_memory=False)

[2]:

df.head()

[2]:

	Unnamed: 0	caseid	commonweight	commonpostweight	vvweight	vvweight_post	tookpost	CCEStake	add_confirm	inputzip	...	TS_g2022	TS_p2022	TS_p2022_party	TS_state	TS_partyreg	starttime	endtime	starttime_post	endtime_post	sourcevar
0	1	1983126005	3.649671	3.525008	4.487326	3.979634	2	1	1.0	NaN	...	1.0	1.0	NaN	23.0	NaN	2022-09-29 00:10:59	2022-09-29 00:26:13	2022-11-28 04:03:31	2022-11-28 04:14:36	2
1	2	1983126559	0.780431	0.818539	0.645634	0.640604	2	1	NaN	1331.0	...	6.0	6.0	6.0	20.0	NaN	2022-09-29 00:13:20	2022-09-29 00:26:35	2022-12-07 11:27:52	2022-12-07 11:40:09	2
2	3	1983126197	0.891555	0.774314	0.869904	0.830826	2	1	1.0	NaN	...	3.0	7.0	NaN	39.0	2.0	2022-09-29 00:12:44	2022-09-29 00:27:29	2022-11-20 23:20:50	2022-11-20 23:40:15	2
3	4	1979974411	1.103598	1.207156	1.063535	0.985901	2	1	NaN	6716.0	...	4.0	4.0	NaN	7.0	2.0	2022-09-29 00:10:33	2022-09-29 00:30:55	2022-11-19 13:42:13	2022-11-19 15:27:12	2
4	5	1983130427	0.542923	0.327550	0.559116	0.391543	2	1	2.0	21401.0	...	7.0	7.0	NaN	21.0	NaN	2022-09-29 00:14:06	2022-09-29 00:35:23	2022-12-02 17:21:47	2022-12-02 17:45:00	2

5 rows × 707 columns

The preview shows non-question columns (such as the weight and timestamp columns) that we may want to drop before importing. Restricting the import to a subset of the questions may also improve efficiency. We show how to do this in the steps below.
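As a rough preview of the subsetting step, the sketch below splits a list of desired variable names into those present in the data and those absent. It is plain Python rather than EDSL code. The `split_columns` helper and the `wanted` list are hypothetical names introduced for illustration, `birth_year` is a deliberately wrong name to show how a typo is caught, and `columns` holds only a few of the 707 column names from the preview above.

```python
# A few of the column names visible in the df.head() preview (not the full list).
columns = [
    "Unnamed: 0", "caseid", "commonweight", "commonpostweight",
    "vvweight", "vvweight_post", "tookpost", "CCEStake",
    "add_confirm", "inputzip", "starttime", "endtime",
]

# Hypothetical selection of variables to keep; "birth_year" is intentionally
# misspelled (the dataset uses a different name) to show the missing-name check.
wanted = ["caseid", "CCEStake", "add_confirm", "inputzip", "birth_year"]


def split_columns(columns, wanted):
    """Return (keep, missing): wanted names found in columns, and those not found."""
    available = set(columns)
    keep = [name for name in wanted if name in available]
    missing = [name for name in wanted if name not in available]
    return keep, missing


keep, missing = split_columns(columns, wanted)
print("keep:", keep)
print("missing:", missing)
```

With the DataFrame loaded above, `df[keep]` would then give the reduced table, and any names reported in `missing` can be corrected before importing.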

Importing survey data with `Conjure`

The Conjure module allows you to import a dataset of survey responses and automatically reconstruct it as EDSL objects:

Method to_agent_list() generates an AgentList of Agent objects with traits for the survey responses.
Method to_survey() generates a Survey of the Question objects.
Method to_results() generates a Results dataset for the responses, including information about the survey and agents.

Creating a `Conjure` object

We create a Conjure object by passing a file of survey responses. We can optionally pass information about the questions, question types and answer codebook; otherwise, they will be inferred from the responses.
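To make the idea of an answer codebook concrete, here is a minimal sketch of how numeric response codes map to answer labels. It is plain Python and is not a description of how Conjure works internally. The `decode_answer` helper is a hypothetical name introduced for illustration, and the small `codebook` uses three entries taken from the full codebook defined later in this notebook.

```python
import math

# Numeric code -> answer label; an empty dict means the raw value is used as-is.
codebook = {
    "add_confirm": {1: "Yes", 2: "No"},
    "gender4": {1: "Man", 2: "Woman", 3: "Non-binary", 4: "Other"},
    "birthyr": {},
}


def decode_answer(variable, raw_value, codebook):
    """Translate one raw response into its label, if the codebook has one."""
    # Treat None and NaN (how pandas marks blanks) as a missing response.
    if raw_value is None or (isinstance(raw_value, float) and math.isnan(raw_value)):
        return None
    labels = codebook.get(variable, {})
    # Float codes such as 1.0 still match integer keys, since 1.0 == 1 in Python.
    # Codes without a label fall through unchanged.
    return labels.get(raw_value, raw_value)


print(decode_answer("add_confirm", 1.0, codebook))
print(decode_answer("gender4", 2, codebook))
print(decode_answer("birthyr", 1975, codebook))
print(decode_answer("add_confirm", float("nan"), codebook))
```

Returning unlisted codes unchanged, rather than raising an error, is a design choice for this sketch: it keeps numerical answers such as birth years intact while still flagging nothing silently as a label.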

Here we start by creating a dictionary for the complete set of questions in the CES Pre-Election Questionnaire, together with dictionaries of the question types and the answer codebook, and verify that the keys of all three match. We also include caseid so that each respondent can be identified for reference:

[3]:

questions = {
    "caseid": "caseid",
    "CCEStake": "Do you agree to participate in the study?",
    "add_confirm": "Is the name and address displayed above correct?",
    "inputzip": "So that we can ask you about the news and events in your area, in what zip code do you currently reside?",
    "birthyr": "In what year were you born?",
    "gender4": "What is your gender?",
    "educ": "What is the highest level of education you have completed?",
    "race": "What racial or ethnic group best describes you?",
    "hispanic": "Are you of Spanish, Latino, or Hispanic origin or descent?",
    "multrace": "Please indicate the racial or ethnic groups that best describe you?",
    "comptype": "What type of device are you currently taking this survey on?",
    "votereg": "Are you registered to vote?",
    "votereg_f": "Is $izip the zip code where you are registered to vote?",
    "pid3": "Generally speaking, do you think of yourself as a ...?",
    "pid7": "$pid7text",
    "inputstate": "What is your State of Residence?",
    "region": "In which census region do you live?",
    "CC22_300": "In the past 24 hours have you...",
    "CC22_300a": "Did you watch local news, national news, or both?",
    "CC22_300c": "Did you read a print newspaper, an online newspaper, or both?",
    "CC22_300b": "Which of these networks did you watch?",
    "CC22_300d": "In the past 24 hours, did you do any of the following on social media (such as Facebook, Youtube or Twitter)?",
    "CC19_302": "Would you say that OVER THE PAST YEAR the nation's economy has ...",
    "CC22_303": "OVER THE PAST YEAR, has your household's annual income...?",
    "CC22_304": "OVER THE PAST YEAR, have the prices of everyday goods and services...?",
    "CC22_305": "Over the past year have you...",
    "CC22_307": "Do the police make you feel...?",
    "CC22_309a": "Have you or someone you know been diagnosed with the novel coronavirus (COVID-19) during the past year?",
    "CC22_306": "Which of the following best describes you when it comes to being vaccinated against COVID-19?",
    "CC22_309b": "Do you know anyone who died from the novel coronavirus (COVID-19)?",
    "CC22_309c": "How did your work status change as a result of the coronavirus pandemic?",
    "CC22_309dx": "Suppose that you have an emergency expense that costs $400. Based on your current financial situation, how would you pay for this expense? If you would use more than one method to cover this expense, please select all that apply.",
    "CC22_309e": "Would you say that in general your health is...",
    "CC22_309f": "Would you say that in general your mental health is...",
    "CC22_310a": "Which party has a majority of seats in the U.S. House of Representatives?",
    "CC22_310b": "Which party has a majority of seats in the U.S. Senate?",
    "CC22_310c": "Which party has a majority of seats in the $inputstate.answer State Senate?",
    "CC22_310d": "Which party has a majority of seats in the $LowerChamberName?",
    "CC22_311a": "Please indicate whether you've heard of this person and if so which party he or she is affiliated with: $CurrentGovName",
    "CC22_311b": "Please indicate whether you've heard of this person and if so which party he or she is affiliated with: $CurrentSen1Name",
    "CC22_311c": "Please indicate whether you've heard of this person and if so which party he or she is affiliated with: $CurrentSen2Name",
    "CC22_311d": "Please indicate whether you've heard of this person and if so which party he or she is affiliated with: $CurrentHouseName",
    "CC22_320a": "Do you approve of the way President Biden is doing his job?",
    "CC22_320b": "Do you approve of the way the U.S. Congress is doing its job?",
    "CC22_320c": "Do you approve of the way the U.S. Supreme Court is doing its job?",
    "CC22_320d": "Do you approve of the way the Governor of $inputstate.answer is doing their job?",
    "CC22_320e": "Do you approve of the way $LegName is doing their job?",
    "CC22_320f": "Do you approve of the way $CurrentHouseName is doing their job?",
    "CC22_320g": "Do you approve of the way $CurrentSen1Name is doing their job?",
    "CC22_320h": "Do you approve of the way $CurrentSen2Name is doing their job?",
    "cit1": "Are you a United States citizen?",
    "immstat": "Which of these statements best describes you?",
    "CC22_321": "What do you think the United States should do in response to Russia's invasion of Ukraine?",
    "CC22_327a": "Would you support or oppose the proposal to expand Medicare to a single comprehensive public health care coverage program that would cover all Americans?",
    "CC22_327b": "Would you support or oppose the proposal to allow the government to negotiate with drug companies to get a lower price on prescription drugs that would apply to both Medicare and private insurance?",
    "CC22_327c": "Would you support or oppose the proposal to repeal the entire Affordable Care Act?",
    "CC22_327d": "Would you support or oppose the proposal to allow states to import prescription drugs from other countries?",
    "CC22_330a": "On the issue of gun regulation, do you support or oppose the proposal to prohibit state and local governments from publishing the names and addresses of all gun owners?",
    "CC22_330b": "On the issue of gun regulation, do you support or oppose the proposal to ban assault rifles?",
    "CC22_330c": "On the issue of gun regulation, do you support or oppose the proposal to make it easier for people to obtain concealed-carry permits?",
    "CC22_330d": "On the issue of gun regulation, do you support or oppose the proposal to provide federal funding to encourage states to take guns away from people who already own them but might pose a threat to themselves or others?",
    "CC22_330e": "On the issue of gun regulation, do you support or oppose the proposal to improve background checks to give authorities time to check the juvenile and mental health records of any prospective gun buyer under the age of 21?",
    "CC22_330f": "On the issue of gun regulation, do you support or oppose the proposal to allow teachers and school officials to carry guns in public schools?",
    "CC22_331a": "What do you think the U.S. government should do about immigration? Do you support or oppose the proposal to grant legal status to all illegal immigrants who have held jobs and paid taxes for at least 3 years, and not been convicted of any felony crimes?",
    "CC22_331b": "What do you think the U.S. government should do about immigration? Do you support or oppose the proposal to increase the number of border patrols on the US-Mexican border?",
    "CC22_331c": "What do you think the U.S. government should do about immigration? Do you support or oppose the proposal to reduce legal immigration by 50 percent over the next 10 years by eliminating the visa lottery and ending family-based migration?",
    "CC22_331d": "What do you think the U.S. government should do about immigration? Do you support or oppose the proposal to increase spending on border security by $25 billion, including building a wall between the U.S. and Mexico?",
    "CC22_332a": "On the topic of abortion, do you support or oppose the proposal to always allow a woman to obtain an abortion as a matter of choice?",
    "CC22_332b": "On the topic of abortion, do you support or oppose the proposal to permit abortion only in case of rape, incest or when the woman's life is in danger?",
    "CC22_332c": "On the topic of abortion, do you support or oppose the proposal to prohibit all abortions after the 20th week of pregnancy?",
    "CC22_332d": "On the topic of abortion, do you support or oppose the proposal to allow employers to decline coverage of abortions in insurance plans?",
    "CC22_332e": "On the topic of abortion, do you support or oppose the proposal to prohibit the expenditure of funds authorized or appropriated by federal law for any abortion?",
    "CC22_332f": "On the topic of abortion, do you support or oppose the proposal to make abortions illegal in all circumstances?",
    "CC22_333": "From what you know about global climate change or global warming, which one of the following statements comes closest to your opinion?",
    "CC22_333a": "Do you support or oppose the proposal to give the Environmental Protection Agency power to regulate carbon dioxide emissions?",
    "CC22_333b": "Do you support or oppose the proposal to require that each state use a minimum amount of renewable fuels (wind, solar, and hydroelectric) in the generation of electricity even if electricity prices increase a little?",
    "CC22_333c": "Do you support or oppose the proposal to strengthen the Environmental Protection Agency enforcement of the Clean Air Act and Clean Water Act even if it costs U.S. jobs?",
    "CC22_333d": "Do you support or oppose the proposal to raise the average fuel efficiency for all cars and trucks in the US from 40 miles per gallon to 54.5 miles per gallon by 2025?",
    "CC22_333e": "Do you support or oppose the proposal to increase fossil fuel production in the U.S. and boost exports of U.S. liquefied natural gas?",
    "CC22_334a": "Do you support or oppose the proposal to eliminate mandatory minimum sentences for non-violent drug offenders?",
    "CC22_334b": "Do you support or oppose the proposal to require police officers to wear body cameras that record all of their activities while on duty?",
    "CC22_334c": "Do you support or oppose the proposal to increase the number of police on the street by 10 percent, even if it means fewer funds for other public services?",
    "CC22_334d": "Do you support or oppose the proposal to decrease the number of police on the street by 10 percent, and increase funding for other public services?",
    "CC22_334e": "Do you support or oppose the proposal to ban the use of choke holds by police?",
    "CC22_334f": "Do you support or oppose the proposal to create a national registry of police who have been investigated for or disciplined for misconduct?",
    "CC22_334g": "Do you support or oppose the proposal to end the Department of Defense program that sends surplus military weapons and equipment to police departments?",
    "CC22_334h": 'Do you support or oppose the proposal to allow individuals or their families to sue a police officer for damages if the officer is found to have "recklessly disregarded" the individual\'s rights?',
    "CC22_340a": "How would you rate Yourself?",
    "CC22_340b": "How would you rate $CurrentGovName?",
    "CC22_340c": "How would you rate Joe Biden?",
    "CC22_340d": "How would you rate Donald Trump?",
    "CC22_340e": "How would you rate The Democratic Party?",
    "CC22_340f": "How would you rate The Republican Party?",
    "CC22_340g": "How would you rate The U.S. Supreme Court?",
    "CC22_340h": "How would you rate $CurrentSen1Name?",
    "CC22_340i": "How would you rate $CurrentSen2Name?",
    "CC22_340j": "How would you rate $SenCand1Name?",
    "CC22_340k": "How would you rate $SenCand2Name?",
    "CC22_340l": "How would you rate $HouseCand1Name?",
    "CC22_340m": "How would you rate $HouseCand2Name?",
    "CC22_340n": "How would you rate $CurrentHouseName?",
    "CC22_350a": "Over the past two years, Congress voted on many issues. Do you support the proposal to authorize spending up to $1.9 trillion for COVID relief from March 2021 through September 2021, including extension of unemployment benefits through September 2021, and emergency funding to state and local governments for the fiscal year?",
    "CC22_350b": "Over the past two years, Congress voted on many issues. Do you support the proposal to spend $150 billion a year for 8 years on construction and repair of roads and bridges, rail, public transit, airports, water systems and broadband internet?",
    "CC22_350c": "Over the past two years, Congress voted on many issues. Do you support the proposal to spend $2.2 trillion over the next decade to provide universal prekindergarten, subsidies for child care, expanded financial aid for college, housing support, home and community care for older Americans, and to shift the U.S. economy away from fossil fuels to renewable energy and electric cars?",
    "CC22_350d": "Over the past two years, Congress voted on many issues. Do you support the proposal to prohibit government restrictions on the provision of, and access to, abortion services?",
    "CC22_350e": "Over the past two years, Congress voted on many issues. Do you support the proposal to provide $52 billion in grants for American semiconductor manufacturing and research and a tax credit subsidizing 25% of investments in semiconductor manufacturing?",
    "CC22_350f": "Over the past two years, Congress voted on many issues. Do you support the proposal to prohibit large online platforms from giving preference to their own products on the platform at the expense of competing products from another business?",
    "CC22_350g": "Over the past two years, Congress voted on many issues. Do you support the proposal to appoint Ketanji Brown Jackson to the U.S. Supreme Court?",
    "CC22_350h": "Over the past two years, Congress voted on many issues. Do you support the proposal to spend $369 billion for tax credits to encourage the production of solar panels, wind turbines, and batteries; lowers Affordable Care Act health care premiums; reduces the deficit by $300 billion by allowing Medicare to negotiate the cost of some prescription drugs and making changes to the tax code?",
    "CC22_355a": "For each of the following tell us whether you support or oppose the decision for the United States to re-join the Paris Climate Agreement?",
    "CC22_355b": "For each of the following tell us whether you support or oppose the decision for the United States to re-join the World Health Organization?",
    "CC22_355c": "For each of the following tell us whether you support or oppose the decision to order all federal agencies to buy clean energy, purchase electric vehicles, and make federal buildings energy efficient?",
    "CC22_355d": "For each of the following tell us whether you support or oppose the decision to increase the minimum wage paid to federal contractors to $15 an hour?",
    "CC22_355e": "For each of the following tell us whether you support or oppose the decision to require that all employees at large companies be vaccinated?",
    "CC22_360": "With which party, if any, are you registered?",
    "CC22_361": "How long have you lived at your present address?",
    "urbancity": "How would you describe the place where you live?",
    "presvote20post": "Who did you vote for in the election for President in 2020?",
    "CC22_363": "Do you intend to vote in the 2022 general election on November 8th?",
    "CC22_365_voted": "For which candidate for U.S. Senator did you vote?",
    "CC22_365b_voted": "For which candidate for the special election for U.S. Senate did you vote?",
    "CC22_366_voted": "For which candidate for Governor did you vote?",
    "CC22_367_voted": "For which candidate for U.S. House of Representatives in your area did you vote?",
    "CC22_365": "In the race for U.S. Senator in your state, who do you prefer?",
    "CC22_365a": "Who is your second choice for U.S. Senator?",
    "CC22_365b": "In the special election for U.S. Senator in your state, who do you prefer?",
    "CC22_365c": "Who is your second choice for U.S. Senator?",
    "CC22_366": "In the race for Governor in your state, who do you prefer?",
    "CC22_367": "In the general election for U.S. House of Representatives in your area, who do you prefer?",
    "CC22_367a": "Who is your second choice for U.S. House of Representatives?",
    "ideo5": "In general, how would you describe your own political viewpoint?",
    "employ": "Which of the following best describes your current employment status?",
    "hadjob": "At any time over the past five years, have you had a job?",
    "investor": "Do you personally (or jointly with a spouse), have any money invested in the stock market right now, either in an individual stock or in a mutual fund?",
    "pew_bornagain": 'Would you describe yourself as a "born-again" or evangelical Christian, or not?',
    "pew_religimp": "How important is religion in your life?",
    "pew_churatd": "Aside from weddings and funerals, how often do you attend religious services?",
    "pew_prayer": "People practice their religion in different ways. Outside of attending religious services, how often do you pray?",
    "religpew": "What is your present religion, if any?",
    "religpew_protestant": "To which Protestant church or group do you belong?",
    "Xreligpew_protestant": "Do you belong to any one of these churches or groups?",
    "religpew_baptist": "To which Baptist church do you belong, if any?",
    "religpew_methodist": "To which Methodist church do you belong, if any?",
    "religpew_nondenom": "To which kind of nondenominational or independent church do you belong, if any?",
    "religpew_lutheran": "To which Lutheran church do you belong?",
    "religpew_presby": "To which Presbyterian church do you belong?",
    "religpew_pentecost": "To which Pentecostal church do you belong?",
    "religpew_episcop": "To which Episcopalian church do you belong?",
    "religpew_christian": "To which Christian church do you belong?",
    "religpew_congreg": "To which congregational church do you belong?",
    "religpew_holiness": "To which Holiness church do you belong?",
    "religpew_reformed": "To which Reformed church do you belong?",
    "religpew_advent": "To which Adventist church do you belong?",
    "religpew_catholic": "To which Catholic church do you belong?",
    "religpew_mormon": "To which Mormon church do you belong?",
    "religpew_orthodox": "To which Orthodox church do you belong?",
    "religpew_jewish": "To which Jewish group do you belong?",
    "religpew_muslim": "To which Muslim group do you belong?",
    "religpew_buddhist": "To which Buddhist group do you belong?",
    "religpew_hindu": "With which of the following Hindu groups, if any, do you identify with most closely?",
    "marstat": "What is your marital status?",
    "union": "Are you a member of a labor union?",
    "union_coverage": "Are you covered by a union contract, also known as a collective bargaining agreement?",
    "unionhh": "Other than yourself, is any member of your household a union member?",
    "ccesmodule": "Survey assigned",
    "dualcit": "Are you also a citizen of another country besides the United States?",
    "dualctry": "What country do you hold citizenship with besides the United States?",
    "ownhome": "Do you own your home or pay rent?",
    "newsint": "Some people seem to follow what's going on in government and public affairs most of the time, whether there's an election going on or not. Others aren't that interested. Would you say you follow what's going on in government and public affairs ...",
    "faminc_new": "Thinking back over the last year, what was your family's annual income?",
    "milstat": "We'd like to know whether you or someone in your immediate family is currently serving or has ever served in the U.S. military. Immediate family is defined as your parents, siblings, spouse, and children. Please check all boxes that apply.",
    "child18": "Are you the parent or guardian of any children under the age of 18?",
    "healthins": "Do you currently have health insurance?",
    "healthins2": "When you purchased health insurance did you use a health insurance exchange?",
    "phone": "Thinking about your phone service, do you have ...?",
    "internethome": "What best describes the access you have to the internet at home?",
    "internetwork": "What best describes the access you have to the internet at work (or at school)?",
    "CC22_hisp": "From which country or region do you trace your heritage or ancestry?",
    "CC22_asian": "From which country or region do you trace your heritage or ancestry?",
    "presvote16post": "Who did you vote for in the election for President in 2016?",
    "industry": "$employtext",
    "sexuality": "Which of the following best describes your sexuality?",
    "transgender": "Do you identify as transgender?",
}

[4]:

question_types = {
    "caseid": "numerical",
    "CCEStake": "yes_no",
    "add_confirm": "yes_no",
    "inputzip": "numerical",
    "birthyr": "numerical",
    "gender4": "multiple_choice",
    "educ": "multiple_choice",
    "race": "multiple_choice",
    "hispanic": "yes_no",
    "multrace": "checkbox",
    "comptype": "multiple_choice",
    "votereg": "multiple_choice",
    "votereg_f": "yes_no",
    "pid3": "multiple_choice",
    "pid7": "multiple_choice",
    "inputstate": "multiple_choice",
    "region": "multiple_choice",
    "CC22_300": "checkbox",
    "CC22_300a": "checkbox",
    "CC22_300c": "checkbox",
    "CC22_300b": "checkbox",
    "CC22_300d": "checkbox",
    "CC19_302": "multiple_choice",
    "CC22_303": "multiple_choice",
    "CC22_304": "multiple_choice",
    "CC22_305": "checkbox",
    "CC22_307": "multiple_choice",
    "CC22_309a": "checkbox",
    "CC22_306": "multiple_choice",
    "CC22_309b": "checkbox",
    "CC22_309c": "checkbox",
    "CC22_309dx": "checkbox",
    "CC22_309e": "multiple_choice",
    "CC22_309f": "multiple_choice",
    "CC22_310a": "multiple_choice",
    "CC22_310b": "multiple_choice",
    "CC22_310c": "multiple_choice",
    "CC22_310d": "multiple_choice",
    "CC22_311a": "multiple_choice",
    "CC22_311b": "multiple_choice",
    "CC22_311c": "multiple_choice",
    "CC22_311d": "multiple_choice",
    "CC22_320a": "multiple_choice",
    "CC22_320b": "multiple_choice",
    "CC22_320c": "multiple_choice",
    "CC22_320d": "multiple_choice",
    "CC22_320e": "multiple_choice",
    "CC22_320f": "multiple_choice",
    "CC22_320g": "multiple_choice",
    "CC22_320h": "multiple_choice",
    "cit1": "yes_no",
    "immstat": "multiple_choice",
    "CC22_321": "checkbox",
    "CC22_327a": "multiple_choice",
    "CC22_327b": "multiple_choice",
    "CC22_327c": "multiple_choice",
    "CC22_327d": "multiple_choice",
    "CC22_330a": "multiple_choice",
    "CC22_330b": "multiple_choice",
    "CC22_330c": "multiple_choice",
    "CC22_330d": "multiple_choice",
    "CC22_330e": "multiple_choice",
    "CC22_330f": "multiple_choice",
    "CC22_331a": "multiple_choice",
    "CC22_331b": "multiple_choice",
    "CC22_331c": "multiple_choice",
    "CC22_331d": "multiple_choice",
    "CC22_332a": "multiple_choice",
    "CC22_332b": "multiple_choice",
    "CC22_332c": "multiple_choice",
    "CC22_332d": "multiple_choice",
    "CC22_332e": "multiple_choice",
    "CC22_332f": "multiple_choice",
    "CC22_333": "multiple_choice",
    "CC22_333a": "multiple_choice",
    "CC22_333b": "multiple_choice",
    "CC22_333c": "multiple_choice",
    "CC22_333d": "multiple_choice",
    "CC22_333e": "multiple_choice",
    "CC22_334a": "multiple_choice",
    "CC22_334b": "multiple_choice",
    "CC22_334c": "multiple_choice",
    "CC22_334d": "multiple_choice",
    "CC22_334e": "multiple_choice",
    "CC22_334f": "multiple_choice",
    "CC22_334g": "multiple_choice",
    "CC22_334h": "multiple_choice",
    "CC22_340a": "multiple_choice",
    "CC22_340b": "multiple_choice",
    "CC22_340c": "multiple_choice",
    "CC22_340d": "multiple_choice",
    "CC22_340e": "multiple_choice",
    "CC22_340f": "multiple_choice",
    "CC22_340g": "multiple_choice",
    "CC22_340h": "multiple_choice",
    "CC22_340i": "multiple_choice",
    "CC22_340j": "multiple_choice",
    "CC22_340k": "multiple_choice",
    "CC22_340l": "multiple_choice",
    "CC22_340m": "multiple_choice",
    "CC22_340n": "multiple_choice",
    "CC22_350a": "multiple_choice",
    "CC22_350b": "multiple_choice",
    "CC22_350c": "multiple_choice",
    "CC22_350d": "multiple_choice",
    "CC22_350e": "multiple_choice",
    "CC22_350f": "multiple_choice",
    "CC22_350g": "multiple_choice",
    "CC22_350h": "multiple_choice",
    "CC22_355a": "multiple_choice",
    "CC22_355b": "multiple_choice",
    "CC22_355c": "multiple_choice",
    "CC22_355d": "multiple_choice",
    "CC22_355e": "multiple_choice",
    "CC22_360": "multiple_choice",
    "CC22_361": "multiple_choice",
    "urbancity": "multiple_choice",
    "presvote20post": "multiple_choice",
    "CC22_363": "multiple_choice",
    "CC22_365_voted": "multiple_choice",
    "CC22_365b_voted": "multiple_choice",
    "CC22_366_voted": "multiple_choice",
    "CC22_367_voted": "multiple_choice",
    "CC22_365": "multiple_choice",
    "CC22_365a": "multiple_choice",
    "CC22_365b": "multiple_choice",
    "CC22_365c": "multiple_choice",
    "CC22_366": "multiple_choice",
    "CC22_367": "multiple_choice",
    "CC22_367a": "multiple_choice",
    "ideo5": "multiple_choice",
    "employ": "multiple_choice",
    "hadjob": "yes_no",
    "investor": "yes_no",
    "pew_bornagain": "yes_no",
    "pew_religimp": "multiple_choice",
    "pew_churatd": "multiple_choice",
    "pew_prayer": "multiple_choice",
    "religpew": "multiple_choice",
    "religpew_protestant": "multiple_choice",
    "Xreligpew_protestant": "multiple_choice",
    "religpew_baptist": "multiple_choice",
    "religpew_methodist": "multiple_choice",
    "religpew_nondenom": "multiple_choice",
    "religpew_lutheran": "multiple_choice",
    "religpew_presby": "multiple_choice",
    "religpew_pentecost": "multiple_choice",
    "religpew_episcop": "multiple_choice",
    "religpew_christian": "multiple_choice",
    "religpew_congreg": "multiple_choice",
    "religpew_holiness": "multiple_choice",
    "religpew_reformed": "multiple_choice",
    "religpew_advent": "multiple_choice",
    "religpew_catholic": "multiple_choice",
    "religpew_mormon": "multiple_choice",
    "religpew_orthodox": "multiple_choice",
    "religpew_jewish": "multiple_choice",
    "religpew_muslim": "multiple_choice",
    "religpew_buddhist": "multiple_choice",
    "religpew_hindu": "multiple_choice",
    "marstat": "multiple_choice",
    "union": "multiple_choice",
    "union_coverage": "multiple_choice",
    "unionhh": "multiple_choice",
    "ccesmodule": "multiple_choice",
    "dualcit": "yes_no",
    "dualctry": "free_text",
    "ownhome": "multiple_choice",
    "newsint": "multiple_choice",
    "faminc_new": "multiple_choice",
    "milstat": "checkbox",
    "child18": "yes_no",
    "healthins": "checkbox",
    "healthins2": "yes_no",
    "phone": "multiple_choice",
    "internethome": "multiple_choice",
    "internetwork": "multiple_choice",
    "CC22_hisp": "checkbox",
    "CC22_asian": "checkbox",
    "presvote16post": "multiple_choice",
    "industry": "multiple_choice",
    "sexuality": "multiple_choice",
    "transgender": "multiple_choice",
}
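One way to verify that the keys match is sketched below. The `compare_keys` helper and the two small demo dictionaries are hypothetical, introduced only for illustration; they are not part of EDSL. The demo dictionaries are deliberately inconsistent so that the check has something to report.

```python
def compare_keys(reference, other):
    """Return (missing, extra).

    missing: keys in `reference` that are absent from `other`
    extra:   keys in `other` that are absent from `reference`
    """
    missing = [key for key in reference if key not in other]
    extra = [key for key in other if key not in reference]
    return missing, extra


# Deliberately inconsistent demo data: "educ" has no type, "gender4" has no text.
questions_demo = {
    "caseid": "caseid",
    "birthyr": "In what year were you born?",
    "educ": "What is the highest level of education you have completed?",
}
question_types_demo = {
    "caseid": "numerical",
    "birthyr": "numerical",
    "gender4": "multiple_choice",
}

missing, extra = compare_keys(questions_demo, question_types_demo)
print("missing from question types:", missing)
print("not in questions:", extra)
```

In this notebook the same call would be `compare_keys(questions, question_types)`, where both returned lists are expected to be empty. The check can be repeated against the codebook once it is defined.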

[5]:

codebook = {
    "caseid": {},
    "CCEStake": {1: "Yes", 2: "No"},
    "add_confirm": {1: "Yes", 2: "No"},
    "inputzip": {},
    "birthyr": {},
    "gender4": {1: "Man", 2: "Woman", 3: "Non-binary", 4: "Other"},
    "educ": {
        1: "Did not graduate from high school",
        2: "High school graduate",
        3: "Some college, but no degree (yet)",
        4: "2-year college degree",
        5: "4-year college degree",
        6: "Postgraduate degree (MA, MBA, MD, JD, PhD, etc.)",
    },
    "race": {
        1: "White",
        2: "Black or African-American",
        3: "Hispanic or Latino",
        4: "Asian or Asian-American",
        5: "Native American",
        8: "Middle Eastern",
        6: "Two or more races",
        7: "Other",
    },
    "hispanic": {1: "Yes", 2: "No"},
    "multrace": {
        1: "White",
        2: "Black or African-American",
        3: "Hispanic or Latino",
        4: "Asian or Asian-American",
        5: "Native American",
        8: "Middle Eastern",
        97: "Other",
        98: "Don't know",
        99: "None of these",
    },
    "comptype": {
        1: "I am taking this survey on a smart phone (e.g., iPhone or Android phone)",
        2: "I am taking this survey on a tablet (e.g., iPad)",
        3: "I am taking this survey on a desktop computer or laptop computer",
    },
    "votereg": {1: "Yes", 2: "No", 3: "Don't know"},
    "votereg_f": {1: "Yes", 2: "No"},
    "pid3": {
        1: "Democrat",
        2: "Republican",
        3: "Independent",
        4: "Other",
        5: "Not sure",
    },
    "pid7": {
        1: "Strong Democrat",
        2: "Not very strong Democrat",
        7: "Strong Republican",
        6: "Not very strong Republican",
        3: "The Democratic Party",
        5: "The Republican Party",
        4: "Neither",
        8: "Not sure",
        9: "Don't know",
    },
    "inputstate": {
        # Keys follow FIPS state codes, so some numbers (e.g. 3, 7, 14) are skipped.
        1: "Alabama",
        2: "Alaska",
        4: "Arizona",
        5: "Arkansas",
        6: "California",
        8: "Colorado",
        9: "Connecticut",
        10: "Delaware",
        11: "District of Columbia",
        12: "Florida",
        13: "Georgia",
        15: "Hawaii",
        16: "Idaho",
        17: "Illinois",
        18: "Indiana",
        19: "Iowa",
        20: "Kansas",
        21: "Kentucky",
        22: "Louisiana",
        23: "Maine",
        24: "Maryland",
        25: "Massachusetts",
        26: "Michigan",
        27: "Minnesota",
        28: "Mississippi",
        29: "Missouri",
        30: "Montana",
        31: "Nebraska",
        32: "Nevada",
        33: "New Hampshire",
        34: "New Jersey",
        35: "New Mexico",
        36: "New York",
        37: "North Carolina",
        38: "North Dakota",
        39: "Ohio",
        40: "Oklahoma",
        41: "Oregon",
        42: "Pennsylvania",
        44: "Rhode Island",
        45: "South Carolina",
        46: "South Dakota",
        47: "Tennessee",
        48: "Texas",
        49: "Utah",
        50: "Vermont",
        51: "Virginia",
        53: "Washington",
        54: "West Virginia",
        55: "Wisconsin",
        56: "Wyoming",
    },
    "region": {1: "Northeast", 2: "Midwest", 3: "South", 4: "West"},
    "CC22_300": {
        1: "Used social media (such as Facebook or Youtube)",
        2: "Watched TV news",
        3: "Read a newspaper in print or online",
        4: "Listened to a radio news program or talk radio",
        5: "None of these",
    },
    "CC22_300a": {1: "Local Newscast", 2: "National Newscast", 3: "Both"},
    "CC22_300c": {1: "Print", 2: "Online", 3: "Both"},
    "CC22_300b": {
        1: "ABC",
        2: "CBS",
        3: "NBC",
        4: "CNN",
        5: "Fox News",
        6: "MSNBC",
        7: "PBS",
        8: "Other",
    },
    "CC22_300d": {
        1: "Posted a story, photo, video or link about politics",
        2: "Posted a comment about politics",
        3: "Read a story or watched a video about politics",
        4: "Followed a political event",
        5: "Forwarded a story, photo, video or link about politics to friends",
        6: "None of the above",
    },
    "CC19_302": {
        1: "Gotten much better",
        2: "Gotten somewhat better",
        3: "Stayed about the same",
        4: "Gotten somewhat worse",
        5: "Gotten much worse",
        6: "Not sure",
    },
    "CC22_303": {
        1: "Increased a lot",
        2: "Increased somewhat",
        3: "Stayed about the same",
        4: "Decreased somewhat",
        5: "Decreased a lot",
    },
    "CC22_304": {
        1: "Increased a lot",
        2: "Increased somewhat",
        3: "Stayed about the same",
        4: "Decreased somewhat",
        5: "Decreased a lot",
    },
    "CC22_305": {
        1: "Married",
        2: "Lost a job",
        3: "Finished school",
        4: "Retired",
        5: "Divorced",
        6: "Had a child",
        7: "Taken a new job",
        9: "Been a victim of a crime",
        10: "Visited an emergency room",
        11: "Visited a doctor for a regular examination",
        12: "Received a raise at work",
        13: "Had a pay cut at work",
    },
    "CC22_307": {
        1: "Mostly safe",
        2: "Somewhat safe",
        3: "Somewhat unsafe",
        4: "Mostly unsafe",
    },
    "CC22_309a": {
        1: "Yes, I have",
        2: "Yes, a family member",
        3: "Yes, a friend",
        4: "Yes, a co-worker",
        5: "No, I do not know anyone who has been diagnosed",
    },
    "CC22_306": {
        1: "I am fully vaccinated and have received at least one booster shot",
        2: "I am fully vaccinated but have not received a booster shot",
        3: "I am partially vaccinated (I have received the first of two shots for either Pfizer or Moderna)",
        4: "I am not vaccinated at all",
    },
    "CC22_309b": {
        1: "Yes, a family member",
        2: "Yes, a friend",
        3: "Yes, a co-worker",
        4: "No, I do not know anyone who has died from coronavirus",
    },
    "CC22_309c": {
        1: "My hours have been reduced",
        2: "My hours were reduced, but they have been restored",
        3: "I have been temporarily laid off",
        4: "I was temporarily laid off but have now been re-hired",
        5: "I had more than one job before the pandemic, and lost one of them",
        6: "I lost my job",
        7: "I was not working when the pandemic began",
        8: "My hours have increased",
        9: "I have taken additional jobs since the pandemic",
        10: "No, nothing about my work has changed",
    },
    "CC22_309dx": {
        1: "Put it on my credit card and pay it off in full at the next statement",
        2: "Put it on my credit card and pay it off over time",
        3: "With the money currently in my checking/savings account or with cash",
        4: "Using money from a bank loan or line of credit",
        5: "By borrowing from a friend or family member",
        6: "Using a payday loan, deposit advance, or overdraft",
        7: "By selling something",
        8: "I wouldn't be able to pay for the expense right now",
        9: "Other",
    },
    "CC22_309e": {1: "Excellent", 2: "Very good", 3: "Good", 4: "Fair", 5: "Poor"},
    "CC22_309f": {1: "Excellent", 2: "Very good", 3: "Good", 4: "Fair", 5: "Poor"},
    "CC22_310a": {1: "Democrats", 2: "Republicans", 3: "Neither", 4: "Not sure"},
    "CC22_310b": {1: "Democrats", 2: "Republicans", 3: "Neither", 4: "Not sure"},
    "CC22_310c": {1: "Democrats", 2: "Republicans", 3: "Neither", 4: "Not sure"},
    "CC22_310d": {1: "Democrats", 2: "Republicans", 3: "Neither", 4: "Not sure"},
    "CC22_311a": {
        1: "Never heard of person",
        2: "Republican",
        3: "Democrat",
        4: "Other Party / Independent",
        5: "Not sure",
    },
    "CC22_311b": {
        1: "Never heard of person",
        2: "Republican",
        3: "Democrat",
        4: "Other Party / Independent",
        5: "Not sure",
    },
    "CC22_311c": {
        1: "Never heard of person",
        2: "Republican",
        3: "Democrat",
        4: "Other Party / Independent",
        5: "Not sure",
    },
    "CC22_311d": {
        1: "Never heard of person",
        2: "Republican",
        3: "Democrat",
        4: "Other Party / Independent",
        5: "Not sure",
    },
    "CC22_320a": {
        1: "Strongly approve",
        2: "Somewhat approve",
        3: "Somewhat disapprove",
        4: "Strongly disapprove",
        5: "Not sure",
    },
    "CC22_320b": {
        1: "Strongly approve",
        2: "Somewhat approve",
        3: "Somewhat disapprove",
        4: "Strongly disapprove",
        5: "Not sure",
    },
    "CC22_320c": {
        1: "Strongly approve",
        2: "Somewhat approve",
        3: "Somewhat disapprove",
        4: "Strongly disapprove",
        5: "Not sure",
    },
    "CC22_320d": {
        1: "Strongly approve",
        2: "Somewhat approve",
        3: "Somewhat disapprove",
        4: "Strongly disapprove",
        5: "Not sure",
    },
    "CC22_320e": {
        1: "Strongly approve",
        2: "Somewhat approve",
        3: "Somewhat disapprove",
        4: "Strongly disapprove",
        5: "Not sure",
    },
    "CC22_320f": {
        1: "Strongly approve",
        2: "Somewhat approve",
        3: "Somewhat disapprove",
        4: "Strongly disapprove",
        5: "Not sure",
    },
    "CC22_320g": {
        1: "Strongly approve",
        2: "Somewhat approve",
        3: "Somewhat disapprove",
        4: "Strongly disapprove",
        5: "Not sure",
    },
    "CC22_320h": {
        1: "Strongly approve",
        2: "Somewhat approve",
        3: "Somewhat disapprove",
        4: "Strongly disapprove",
        5: "Not sure",
    },
    "cit1": {1: "Yes", 2: "No"},
    "immstat": {
        1: "I am an immigrant to the USA and a naturalized citizen",
        2: "I am an immigrant to the USA but not a citizen",
        3: "I was born in the USA but at least one of my parents is an immigrant",
        4: "My parents and I were born in the USA but at least one of my grandparents was an immigrant",
        5: "My parents, grandparents and I were all born in the USA",
    },
    "CC22_321": {
        1: "Not sure",
        2: "Do not get involved",
        3: "Send food, medicine and other aid to countries affected",
        4: "Provide arms to Ukraine",
        5: "Enforce a no fly zone",
        6: "Use drones and air craft to bomb Russian troops",
        7: "Send military support staff (non-combat)",
        8: "Send significant force to fight Russia",
    },
    "CC22_327a": {1: "Support", 2: "Oppose"},
    "CC22_327b": {1: "Support", 2: "Oppose"},
    "CC22_327c": {1: "Support", 2: "Oppose"},
    "CC22_327d": {1: "Support", 2: "Oppose"},
    "CC22_330a": {1: "Support", 2: "Oppose"},
    "CC22_330b": {1: "Support", 2: "Oppose"},
    "CC22_330c": {1: "Support", 2: "Oppose"},
    "CC22_330d": {1: "Support", 2: "Oppose"},
    "CC22_330e": {1: "Support", 2: "Oppose"},
    "CC22_330f": {1: "Support", 2: "Oppose"},
    "CC22_331a": {1: "Support", 2: "Oppose"},
    "CC22_331b": {1: "Support", 2: "Oppose"},
    "CC22_331c": {1: "Support", 2: "Oppose"},
    "CC22_331d": {1: "Support", 2: "Oppose"},
    "CC22_332a": {1: "Support", 2: "Oppose"},
    "CC22_332b": {1: "Support", 2: "Oppose"},
    "CC22_332c": {1: "Support", 2: "Oppose"},
    "CC22_332d": {1: "Support", 2: "Oppose"},
    "CC22_332e": {1: "Support", 2: "Oppose"},
    "CC22_332f": {1: "Support", 2: "Oppose"},
    "CC22_333": {
        1: "Global climate change has been established as a serious problem, and immediate action is necessary",
        2: "There is enough evidence that climate change is taking place and some action should be taken",
        3: "We don't know enough about global climate change, and more research is necessary before we take any actions",
        4: "Concern about global climate change is exaggerated. No action is necessary",
        5: "Global climate change is not occurring; this is not a real issue",
    },
    "CC22_333a": {1: "Support", 2: "Oppose"},
    "CC22_333b": {1: "Support", 2: "Oppose"},
    "CC22_333c": {1: "Support", 2: "Oppose"},
    "CC22_333d": {1: "Support", 2: "Oppose"},
    "CC22_333e": {1: "Support", 2: "Oppose"},
    "CC22_334a": {1: "Support", 2: "Oppose"},
    "CC22_334b": {1: "Support", 2: "Oppose"},
    "CC22_334c": {1: "Support", 2: "Oppose"},
    "CC22_334d": {1: "Support", 2: "Oppose"},
    "CC22_334e": {1: "Support", 2: "Oppose"},
    "CC22_334f": {1: "Support", 2: "Oppose"},
    "CC22_334g": {1: "Support", 2: "Oppose"},
    "CC22_334h": {1: "Support", 2: "Oppose"},
    "CC22_340a": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340b": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340c": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340d": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340e": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340f": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340g": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340h": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340i": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340j": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340k": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340l": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340m": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_340n": {
        1: "Very liberal",
        2: "Liberal",
        3: "Somewhat liberal",
        4: "Middle of the road",
        5: "Somewhat conservative",
        6: "Conservative",
        7: "Very conservative",
        8: "Not sure",
    },
    "CC22_350a": {1: "Favor", 2: "Oppose"},
    "CC22_350b": {1: "Favor", 2: "Oppose"},
    "CC22_350c": {1: "Favor", 2: "Oppose"},
    "CC22_350d": {1: "Favor", 2: "Oppose"},
    "CC22_350e": {1: "Favor", 2: "Oppose"},
    "CC22_350f": {1: "Favor", 2: "Oppose"},
    "CC22_350g": {1: "Favor", 2: "Oppose"},
    "CC22_350h": {1: "Favor", 2: "Oppose"},
    "CC22_355a": {1: "Support", 2: "Oppose"},
    "CC22_355b": {1: "Support", 2: "Oppose"},
    "CC22_355c": {1: "Support", 2: "Oppose"},
    "CC22_355d": {1: "Support", 2: "Oppose"},
    "CC22_355e": {1: "Support", 2: "Oppose"},
    "CC22_360": {
        1: "No Party, Independent, Declined to State",
        2: "Democratic Party",
        3: "Republican Party",
        4: "Other",
    },
    "CC22_361": {
        1: "Less than 1 month",
        2: "2 to 6 months",
        3: "7 to 11 months",
        4: "1 to 2 years",
        5: "3 to 4 years",
        6: "5 or more years",
    },
    "urbancity": {1: "City", 2: "Suburb", 3: "Town", 4: "Rural area", 5: "Other"},
    "presvote20post": {
        1: "Joe Biden",
        2: "Donald Trump",
        3: "Jo Jorgensen",
        4: "Howie Hawkins",
        5: "Other",
        6: "Did not vote for President",
    },
    "CC22_363": {
        1: "Yes, definitely",
        2: "Probably",
        3: "I already voted (early or absentee)",
        4: "I plan to vote before November 8th",
        5: "No",
        6: "Undecided",
    },
    "CC22_365_voted": {
        1: "$SenCand1Name ($SenCand1Party)",
        2: "$SenCand2Name ($SenCand2Party)",
        3: "$SenCand3Name ($SenCand3Party)",
        4: "$SenCand4Name ($SenCand4Party)",
        7: "Other",
        8: "I'm not sure",
        9: "I didn't vote in this election",
    },
    "CC22_365b_voted": {
        1: "$SenCand1Name2 ($SenCand1Party2)",
        2: "$SenCand2Name2 ($SenCand2Party2)",
        7: "Other",
        8: "I'm not sure",
        9: "I didn't vote in this election",
    },
    "CC22_366_voted": {
        1: "$GovCand1Name ($GovCand1Party)",
        2: "$GovCand2Name ($GovCand2Party)",
        3: "$GovCand3Name ($GovCand3Party)",
        7: "Other",
        8: "I'm not sure",
        9: "I didn't vote in this election",
    },
    "CC22_367_voted": {
        1: "$HouseCand1Name ($HouseCand1Party)",
        2: "$HouseCand2Name ($HouseCand2Party)",
        3: "$HouseCand3Name ($HouseCand3Party)",
        4: "$HouseCand4Name ($HouseCand4Party)",
        5: "$HouseCand5Name ($HouseCand5Party)",
        6: "$HouseCand6Name ($HouseCand6Party)",
        7: "$HouseCand7Name ($HouseCand7Party)",
        8: "$HouseCand8Name ($HouseCand8Party)",
        10: "Other",
        98: "I'm not sure",
        99: "I didn't vote in this election",
    },
    "CC22_365": {
        1: "$SenCand1Name ($SenCand1Party)",
        2: "$SenCand2Name ($SenCand2Party)",
        3: "$SenCand3Name ($SenCand3Party)",
        4: "$SenCand4Name ($SenCand4Party)",
        7: "Other",
        8: "I'm not sure",
        9: "No one",
    },
    "CC22_365a": {
        1: "$SenCand1Name ($SenCand1Party)",
        2: "$SenCand2Name ($SenCand2Party)",
        3: "$SenCand3Name ($SenCand3Party)",
        4: "$SenCand4Name ($SenCand4Party)",
        7: "Other",
        8: "I'm not sure",
        9: "No one",
    },
    "CC22_365b": {
        1: "$SenCand1Name2 ($SenCand1Party2)",
        2: "$SenCand2Name2 ($SenCand2Party2)",
        7: "Other",
        8: "I'm not sure",
        9: "No one",
    },
    "CC22_365c": {
        1: "$SenCand1Name2 ($SenCand1Party2)",
        2: "$SenCand2Name2 ($SenCand2Party2)",
        7: "Other",
        8: "I'm not sure",
        9: "No one",
    },
    "CC22_366": {
        1: "$GovCand1Name ($GovCand1Party)",
        2: "$GovCand2Name ($GovCand2Party)",
        3: "$GovCand3Name ($GovCand3Party)",
        7: "Other",
        8: "I'm not sure",
        9: "No one",
    },
    "CC22_367": {
        1: "$HouseCand1Name ($HouseCand1Party)",
        2: "$HouseCand2Name ($HouseCand2Party)",
        3: "$HouseCand3Name ($HouseCand3Party)",
        4: "$HouseCand4Name ($HouseCand4Party)",
        5: "$HouseCand5Name ($HouseCand5Party)",
        6: "$HouseCand6Name ($HouseCand6Party)",
        7: "$HouseCand7Name ($HouseCand7Party)",
        8: "$HouseCand8Name ($HouseCand8Party)",
        10: "Other",
        98: "I'm not sure",
        99: "No one",
    },
    "CC22_367a": {
        1: "$HouseCand1Name ($HouseCand1Party)",
        2: "$HouseCand2Name ($HouseCand2Party)",
        3: "$HouseCand3Name ($HouseCand3Party)",
        4: "$HouseCand4Name ($HouseCand4Party)",
        5: "$HouseCand5Name ($HouseCand5Party)",
        6: "$HouseCand6Name ($HouseCand6Party)",
        7: "$HouseCand7Name ($HouseCand7Party)",
        8: "$HouseCand8Name ($HouseCand8Party)",
        10: "Other",
        98: "I'm not sure",
        99: "No one",
    },
    "ideo5": {
        1: "Very liberal",
        2: "Liberal",
        3: "Moderate",
        4: "Conservative",
        5: "Very conservative",
        6: "Not sure",
    },
    "employ": {
        1: "Working full time now",
        2: "Working part time now",
        3: "Temporarily laid off",
        4: "Unemployed",
        5: "Retired",
        6: "Permanently disabled",
        7: "Taking care of home or family",
        8: "Student",
        9: "Other",
    },
    "hadjob": {1: "Yes", 2: "No"},
    "investor": {1: "Yes", 2: "No"},
    "pew_bornagain": {1: "Yes", 2: "No"},
    "pew_religimp": {
        1: "Very important",
        2: "Somewhat important",
        3: "Not too important",
        4: "Not at all important",
    },
    "pew_churatd": {
        1: "More than once a week",
        2: "Once a week",
        3: "Once or twice a month",
        4: "A few times a year",
        5: "Seldom",
        6: "Never",
        7: "Don't know",
    },
    "pew_prayer": {
        1: "Several times a day",
        2: "Once a day",
        3: "A few times a week",
        4: "Once a week",
        5: "A few times a month",
        6: "Seldom",
        7: "Never",
        8: "Don't know",
    },
    "religpew": {
        1: "Protestant",
        2: "Roman Catholic",
        3: "Mormon",
        4: "Eastern or Greek Orthodox",
        5: "Jewish",
        6: "Muslim",
        7: "Buddhist",
        8: "Hindu",
        9: "Atheist",
        10: "Agnostic",
        11: "Nothing in particular",
        12: "Something else",
    },
    "religpew_protestant": {
        1: "Baptist",
        2: "Methodist",
        3: "Nondenominational or Independent Church",
        4: "Lutheran",
        5: "Presbyterian",
        6: "Pentecostal",
        7: "Episcopalian",
        8: "Church of Christ or Disciples of Christ",
        9: "Congregational or United Church of Christ",
        10: "Holiness",
        11: "Reformed",
        12: "Adventist",
        13: "Jehovah's Witness",
        90: "Something else",
    },
    "Xreligpew_protestant": {
        1: "Baptist",
        2: "Methodist",
        3: "Nondenominational or Independent Church",
        4: "Lutheran",
        5: "Presbyterian",
        6: "Pentecostal",
        7: "Episcopalian",
        8: "Church of Christ or Disciples of Christ",
        9: "Congregational or United Church of Christ",
        10: "Holiness",
        11: "Reformed",
        12: "Adventist",
        13: "Jehovah's Witness",
        90: "None of these",
    },
    "religpew_baptist": {
        1: "Southern Baptist Convention",
        2: "American Baptist Churches in USA",
        3: "National Baptist Convention",
        4: "Progressive Baptist Convention",
        5: "Independent Baptist",
        6: "Baptist General Conference",
        7: "Baptist Missionary Association",
        8: "Conservative Baptist Assoc. of America",
        9: "Free Will Baptist",
        10: "General Association of Regular Baptists",
        90: "Other Baptist",
    },
    "religpew_methodist": {
        1: "United Methodist Church",
        2: "Free Methodist Church",
        3: "African Methodist Episcopal",
        4: "African Methodist Episcopal Zion",
        5: "Christian Methodist Episcopal Church",
        90: "Other Methodist Church",
    },
    "religpew_nondenom": {
        1: "Nondenominational evangelical",
        2: "Nondenominational fundamentalist",
        3: "Nondenominational charismatic",
        4: "Interdenominational",
        5: "Community church",
        90: "Other",
    },
    "religpew_lutheran": {
        1: "Evangelical Lutheran Church in America (ELCA)",
        2: "Lutheran Church, Missouri Synod",
        3: "Lutheran Church, Wisconsin Synod",
        4: "Other Lutheran Church",
    },
    "religpew_presby": {
        1: "Presbyterian Church USA",
        2: "Presbyterian Church in America",
        3: "Associate Reformed Presbyterian",
        4: "Cumberland Presbyterian Church",
        5: "Orthodox Presbyterian",
        6: "Evangelical Presbyterian Church",
        90: "Other Presbyterian Church",
    },
    "religpew_pentecost": {
        1: "Assemblies of God",
        2: "Church of God Cleveland TN",
        3: "Four Square Gospel",
        4: "Pentecostal Church of God",
        5: "Pentecostal Holiness Church",
        6: "Church of God in Christ",
        7: "Church of God of the Apostolic Faith",
        8: "Assembly of Christian Churches",
        9: "Apostolic Christian",
        90: "Other Pentecostal Church",
    },
    "religpew_episcop": {
        1: "Episcopal Church in the USA",
        2: "Anglican Church (Church of England)",
        3: "Anglican Orthodox Church",
        4: "Reformed Episcopal Church",
        90: "Other Episcopalian or Anglican Church",
    },
    "religpew_christian": {
        1: "Church of Christ",
        2: "Disciples of Christ",
        3: "Christian Churches and Churches of Christ",
        90: "Other Christian church",
    },
    "religpew_congreg": {
        1: "United Church of Christ",
        2: "Conservative Congregational Christian",
        3: "National Association of Congregational Christians",
        90: "Other Congregational",
    },
    "religpew_holiness": {
        1: "Church of the Nazarene",
        2: "Wesleyan Church",
        3: "Free Methodist Church",
        4: "Christian and Missionary Alliance",
        5: "Church of God (Anderson, Indiana)",
        6: "Salvation Army, American Rescue workers",
        90: "Other Holiness",
    },
    "religpew_reformed": {
        1: "Reformed Church in America",
        2: "Christian Reformed Church",
        90: "Other Reformed",
    },
    "religpew_advent": {
        1: "Seventh Day Adventist",
        2: "Church of God, General Conference",
        3: "Advent Christian",
        90: "Other Adventist",
    },
    "religpew_catholic": {
        1: "Roman Catholic Church",
        2: "National Polish Catholic Church",
        3: "Greek-rite Catholic",
        4: "Armenian Catholic",
        5: "Old Catholic",
        90: "Other Catholic",
    },
    "religpew_mormon": {
        1: "The Church of Jesus Christ of Latter-day Saints",
        2: "Community of Christ",
        90: "Other Mormon",
    },
    "religpew_orthodox": {
        1: "Greek Orthodox",
        2: "Russian Orthodox",
        3: "Orthodox Church in America",
        4: "Armenian Orthodox",
        5: "Eastern Orthodox",
        6: "Serbian Orthodox",
        90: "Other Orthodox",
    },
    "religpew_jewish": {
        1: "Reform",
        2: "Conservative",
        3: "Orthodox",
        4: "Reconstructionist",
        90: "Other",
    },
    "religpew_muslim": {
        1: "Sunni",
        2: "Shia",
        3: "Nation of Islam (Black Muslim)",
        90: "Other Muslim",
    },
    "religpew_buddhist": {
        1: "Theravada (Vipassana) Buddhism",
        2: "Mahayana (Zen) Buddhism",
        3: "Vajrayana (Tibetan) Buddhism",
        90: "Other Buddhist",
    },
    "religpew_hindu": {
        1: "Vaishnava Hinduism",
        2: "Shaivite Hinduism",
        3: "Shaktism Hinduism",
        90: "Other Hindu",
    },
    "marstat": {
        1: "Married",
        2: "Separated",
        3: "Divorced",
        4: "Widowed",
        5: "Never married",
        6: "Domestic / civil partnership",
    },
    "union": {
        1: "Yes, I am currently a member of a labor union",
        2: "I formerly was a member of a labor union",
        3: "I am not now, nor have I been, a member of a labor union",
    },
    "union_coverage": {1: "Yes", 2: "No", 3: "Not sure"},
    "unionhh": {
        1: "Yes, a member of my household is currently a union member",
        2: "A member of my household was formerly a member of a labor union, but is not now",
        3: "No, no one in my household has ever been a member of a labor union",
        4: "Not sure",
    },
    "ccesmodule": {
        1: "NCC",
        2: "MIA",
        3: "TTU",
        4: "RUT",
        5: "MSU",
        6: "FSU",
        7: "LSU",
        8: "JHU",
        9: "IUA",
        10: "BOS",
        11: "DKU",
        12: "NCW",
        13: "WUS",
        14: "USC",
        15: "ZOU",
        16: "MIC",
        17: "BCJ",
        18: "CPC",
        19: "ASU",
        20: "CAC",
        21: "UWM",
        22: "UCR",
        23: "GTN",
        24: "BTU",
        25: "UTA",
        26: "UTB",
        27: "UVA",
        28: "NCK",
        29: "UTD",
        30: "GWU",
        31: "CUB",
        32: "OSU",
        33: "IOW",
        34: "UCL",
        35: "DKN",
        36: "CLA",
        37: "LBU",
        38: "UND",
        39: "TAM",
        40: "WAS",
        41: "YLS",
        42: "UGA",
        43: "TUF",
        44: "DAR",
        45: "NYU",
        46: "VAN",
        47: "UCM",
        48: "UMA",
        49: "UDE",
        50: "MIZ",
        51: "BYU",
        52: "EMY",
        53: "RCO",
        54: "MSL",
        55: "MCS",
        56: "HUA",
        57: "HUB",
        58: "AMU",
        59: "UMB/CGU",
        60: "DMC / HKS",
    },
    "dualcit": {1: "Yes", 2: "No"},
    "dualctry": {
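        # Keys appear to match ISO 3166-1 numeric country codes; Canada and the
        # United Kingdom are listed first, followed by the other countries alphabetically.
        124: "Canada",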
        826: "United Kingdom",
        4: "Afghanistan",
        248: "Aland Islands",
        8: "Albania",
        12: "Algeria",
        16: "American Samoa",
        20: "Andorra",
        24: "Angola",
        660: "Anguilla",
        10: "Antarctica",
        28: "Antigua and Barbuda",
        32: "Argentina",
        51: "Armenia",
        533: "Aruba",
        36: "Australia",
        40: "Austria",
        31: "Azerbaijan",
        44: "Bahamas",
        48: "Bahrain",
        50: "Bangladesh",
        52: "Barbados",
        112: "Belarus",
        56: "Belgium",
        84: "Belize",
        204: "Benin",
        60: "Bermuda",
        64: "Bhutan",
        68: "Bolivia",
        70: "Bosnia and Herzegovina",
        72: "Botswana",
        74: "Bouvet Island",
        76: "Brazil",
        86: "British Indian Ocean Territory",
        96: "Brunei Darussalam",
        100: "Bulgaria",
        854: "Burkina Faso",
        108: "Burundi",
        116: "Cambodia",
        120: "Cameroon",
        132: "Cape Verde",
        136: "Cayman Islands",
        140: "Central African Republic",
        148: "Chad",
        152: "Chile",
        156: "China",
        162: "Christmas Island",
        166: "Cocos (Keeling) Islands",
        170: "Colombia",
        174: "Comoros",
        178: "Congo",
        180: "Congo, the Democratic Republic of the",
        184: "Cook Islands",
        188: "Costa Rica",
        384: "Cote d'Ivoire",
        191: "Croatia",
        192: "Cuba",
        196: "Cyprus",
        203: "Czech Republic",
        208: "Denmark",
        262: "Djibouti",
        212: "Dominica",
        214: "Dominican Republic",
        218: "Ecuador",
        818: "Egypt",
        222: "El Salvador",
        226: "Equatorial Guinea",
        232: "Eritrea",
        233: "Estonia",
        231: "Ethiopia",
        238: "Falkland Islands (Malvinas)",
        234: "Faroe Islands",
        242: "Fiji",
        246: "Finland",
        250: "France",
        254: "French Guiana",
        258: "French Polynesia",
        260: "French Southern Territories",
        266: "Gabon",
        270: "Gambia",
        268: "Georgia",
        276: "Germany",
        288: "Ghana",
        292: "Gibraltar",
        300: "Greece",
        304: "Greenland",
        308: "Grenada",
        312: "Guadeloupe",
        316: "Guam",
        320: "Guatemala",
        831: "Guernsey",
        324: "Guinea",
        624: "Guinea-Bissau",
        328: "Guyana",
        332: "Haiti",
        334: "Heard Island and McDonald Islands",
        336: "Holy See (Vatican City State)",
        340: "Honduras",
        344: "Hong Kong",
        348: "Hungary",
        352: "Iceland",
        356: "India",
        360: "Indonesia",
        364: "Iran, Islamic Republic of",
        368: "Iraq",
        372: "Ireland",
        833: "Isle of Man",
        376: "Israel",
        380: "Italy",
        388: "Jamaica",
        392: "Japan",
        832: "Jersey",
        400: "Jordan",
        398: "Kazakhstan",
        404: "Kenya",
        296: "Kiribati",
        408: "Korea, Democratic People's Republic of",
        410: "Korea, Republic of",
        414: "Kuwait",
        417: "Kyrgyzstan",
        418: "Lao People's Democratic Republic",
        428: "Latvia",
        422: "Lebanon",
        426: "Lesotho",
        430: "Liberia",
        434: "Libyan Arab Jamahiriya",
        438: "Liechtenstein",
        440: "Lithuania",
        442: "Luxembourg",
        446: "Macao",
        807: "Macedonia, the former Yugoslav Republic of",
        450: "Madagascar",
        454: "Malawi",
        458: "Malaysia",
        462: "Maldives",
        466: "Mali",
        470: "Malta",
        584: "Marshall Islands",
        474: "Martinique",
        478: "Mauritania",
        480: "Mauritius",
        175: "Mayotte",
        484: "Mexico",
        583: "Micronesia, Federated States of",
        498: "Moldova, Republic of",
        492: "Monaco",
        496: "Mongolia",
        500: "Montserrat",
        504: "Morocco",
        508: "Mozambique",
        104: "Myanmar",
        516: "Namibia",
        520: "Nauru",
        524: "Nepal",
        528: "Netherlands",
        530: "Netherlands Antilles",
        540: "New Caledonia",
        554: "New Zealand",
        558: "Nicaragua",
        562: "Niger",
        566: "Nigeria",
        570: "Niue",
        574: "Norfolk Island",
        580: "Northern Mariana Islands",
        578: "Norway",
        512: "Oman",
        586: "Pakistan",
        585: "Palau",
        275: "Palestinian Territory, Occupied",
        591: "Panama",
        598: "Papua New Guinea",
        600: "Paraguay",
        604: "Peru",
        608: "Philippines",
        612: "Pitcairn",
        616: "Poland",
        620: "Portugal",
        630: "Puerto Rico",
        634: "Qatar",
        638: "Reunion",
        642: "Romania",
        643: "Russian Federation",
        646: "Rwanda",
        654: "Saint Helena",
        659: "Saint Kitts and Nevis",
        662: "Saint Lucia",
        666: "Saint Pierre and Miquelon",
        670: "Saint Vincent and the Grenadines",
        882: "Samoa",
        674: "San Marino",
        678: "Sao Tome and Principe",
        682: "Saudi Arabia",
        686: "Senegal",
        891: "Serbia and Montenegro",
        690: "Seychelles",
        694: "Sierra Leone",
        702: "Singapore",
        703: "Slovakia",
        705: "Slovenia",
        90: "Solomon Islands",
        706: "Somalia",
        710: "South Africa",
        239: "South Georgia and the South Sandwich Islands",
        724: "Spain",
        144: "Sri Lanka",
        736: "Sudan",
        740: "Suriname",
        744: "Svalbard and Jan Mayen",
        748: "Swaziland",
        752: "Sweden",
        756: "Switzerland",
        760: "Syrian Arab Republic",
        158: "Taiwan",
        762: "Tajikistan",
        834: "Tanzania, United Republic of",
        764: "Thailand",
        626: "Timor-Leste",
        768: "Togo",
        772: "Tokelau",
        776: "Tonga",
        780: "Trinidad and Tobago",
        788: "Tunisia",
        792: "Turkey",
        795: "Turkmenistan",
        796: "Turks and Caicos Islands",
        798: "Tuvalu",
        800: "Uganda",
        804: "Ukraine",
        784: "United Arab Emirates",
        581: "United States Minor Outlying Islands",
        858: "Uruguay",
        860: "Uzbekistan",
        548: "Vanuatu",
        862: "Venezuela",
        704: "Vietnam",
        92: "Virgin Islands, British",
        850: "Virgin Islands, U.S.",
        876: "Wallis and Futuna",
        732: "Western Sahara",
        887: "Yemen",
        894: "Zambia",
        716: "Zimbabwe",
        9999: "Other",
    },
    "ownhome": {1: "Own", 2: "Rent", 3: "Other"},
    "newsint": {
        1: "Most of the time",
        2: "Some of the time",
        3: "Only now and then",
        4: "Hardly at all",
        7: "Don't know",
    },
    "faminc_new": {
        1: "Less than $10,000",
        2: "$10,000 - $19,999",
        3: "$20,000 - $29,999",
        4: "$30,000 - $39,999",
        5: "$40,000 - $49,999",
        6: "$50,000 - $59,999",
        7: "$60,000 - $69,999",
        8: "$70,000 - $79,999",
        9: "$80,000 - $99,999",
        10: "$100,000 - $119,999",
        11: "$120,000 - $149,999",
        12: "$150,000 - $199,999",
        13: "$200,000 - $249,999",
        14: "$250,000 - $349,999",
        15: "$350,000 - $499,999",
        16: "$500,000 or more",
        97: "Prefer not to say",
    },
    "milstat": {
        1: "I am currently serving in the U.S. military",
        2: "I have immediate family members currently serving in the U.S. military",
        3: "I previously served in the U.S. military but I am no longer active",
        4: "Members of my immediate family have served in the U.S. military but are no longer active",
        5: "Neither myself nor any members of my immediate family have ever served in the U.S. military",
    },
    "child18": {1: "Yes", 2: "No"},
    "healthins": {
        1: "Yes, through my job or a family member's employer",
        2: "Yes, through a government program, such as Medicare or Medicaid",
        3: "Yes, through my school",
        4: "Yes, I purchased my own",
        5: "Not sure",
        6: "No",
    },
    "healthins2": {1: "Yes", 2: "No"},
    "phone": {
        1: "A landline and a cell phone",
        2: "Cell phone only",
        3: "Landline only",
        4: "No landline or cell phone service",
    },
    "internethome": {1: "Cable, DSL, or other broadband", 2: "Dial-up", 3: "None"},
    "internetwork": {1: "Cable, DSL, or other broadband", 2: "Dial-up", 3: "None"},
    "CC22_hisp": {
        1: "No Country in Particular",
        2: "United States",
        3: "Mexico",
        4: "Puerto Rico",
        5: "Cuba",
        6: "Dominican Republic",
        7: "South America",
        8: "Central America",
        9: "Caribbean",
        10: "Spain",
        11: "Other",
        12: "I am not of Latino, Hispanic or Spanish Heritage",
    },
    "CC22_asian": {
        1: "No Country in Particular",
        2: "United States",
        3: "China",
        4: "Japan",
        5: "India",
        6: "Philippines",
        7: "Taiwan",
        8: "Korea",
        9: "Vietnam",
        10: "Pakistan",
        11: "Hmong",
        12: "Cambodia",
        13: "Thailand",
        14: "Other",
        15: "I am not of Asian Heritage",
    },
    "presvote16post": {
        1: "Hillary Clinton",
        2: "Donald Trump",
        3: "Gary Johnson",
        4: "Jill Stein",
        5: "Evan McMullin",
        6: "Other",
        7: "Did not vote for President",
    },
    "industry": {
        1: "Agriculture, forestry, fishing, and hunting",
        2: "Mining",
        3: "Utilities",
        4: "Construction",
        5: "Manufacturing",
        6: "Professional and business services",
        7: "Educational services",
        8: "Health care and social assistance",
        9: "Leisure and hospitality",
        10: "Other services",
        11: "Wholesale trade",
        12: "Retail trade",
        13: "Transportation and warehousing",
        14: "Information",
        15: "Financial activities",
        16: "Federal government",
        17: "State and local government",
    },
    "sexuality": {
        1: "Heterosexual / straight",
        2: "Lesbian / gay woman",
        3: "Gay man",
        4: "Bisexual",
        5: "Other",
        6: "Prefer not to say",
    },
    "transgender": {1: "Yes", 2: "No", 3: "Prefer not to say"},
}
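
Because the codebook is entered by hand, it can be worth checking that every code appearing in the responses has a label. The following is a minimal sketch in plain Python; the helper name find_unmapped_codes and the toy rows are hypothetical and are not part of EDSL or the CES data. Missing values (e.g. NaN in a pandas DataFrame) would need separate handling.

```python
from typing import Dict, List, Set


def find_unmapped_codes(
    rows: List[Dict[str, int]], codebook: Dict[str, Dict[int, str]]
) -> Dict[str, Set[int]]:
    """Return, per column, the response codes that have no label in the codebook."""
    unmapped: Dict[str, Set[int]] = {}
    for row in rows:
        for column, code in row.items():
            labels = codebook.get(column)
            # Skip columns with no codebook entry or an empty one (e.g. numerical questions)
            if not labels:
                continue
            if code not in labels:
                unmapped.setdefault(column, set()).add(code)
    return unmapped


# Toy example using two entries from the codebook above
toy_codebook = {
    "ownhome": {1: "Own", 2: "Rent", 3: "Other"},
    "child18": {1: "Yes", 2: "No"},
}
toy_rows = [
    {"ownhome": 1, "child18": 2},
    {"ownhome": 4, "child18": 1},  # 4 is not a valid ownhome code
]

print(find_unmapped_codes(toy_rows, toy_codebook))
```

An empty result would suggest that the codebook covers every code seen in the checked columns.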

Selecting questions

Here we demonstrate ways of selecting questions that we want to use for the data import: first manually, and then using a language model to suggest a subset of questions.

[6]:

target_questions = {
    "caseid": "caseid",
    "birthyr": "In what year were you born?",
    "gender4": "What is your gender?",
    "educ": "What is the highest level of education you have completed?",
    "race": "What racial or ethnic group best describes you?",
    "votereg": "Are you registered to vote?",
    "pid3": "Generally speaking, do you think of yourself as a ...?",
    "inputstate": "What is your State of Residence?",
}

Formatting data for import

Here we create a function for finalizing the data for import based on the selected questions. The function also checks that each target question appears as a column in the dataset of responses; any that are missing are excluded from the formatted data, question texts, question types and codebook:

[7]:

import pandas as pd
from typing import List, Dict, Union


def format_data(
    filename: str,
    new_filename: str,
    target_questions: Union[
        List[str], Dict[str, str]
    ],  # Allow a list or dict of target questions
    questions: Dict[str, str],
    question_types: Dict[str, str],
    codebook: Dict[str, Dict[int, str]],
):

    # Read in the dataset of responses
    df = pd.read_csv(filename)

    # If target_questions is a dict, use its keys
    if isinstance(target_questions, dict):
        target_questions = list(target_questions.keys())

    # Filter out the target questions that are not present in the responses
    present_questions = [q for q in target_questions if q in df.columns]

    # Create final dictionaries filtered by target questions that are present in the responses
    final_questions = {
        key: value for key, value in questions.items() if key in present_questions
    }
    final_question_types = {
        key: value for key, value in question_types.items() if key in present_questions
    }
    final_codebook = {
        key: value for key, value in codebook.items() if key in present_questions
    }

    # Filter dataframe to keep only the relevant columns
    relevant_columns = list(final_questions.keys())
    df = df[relevant_columns]

    # Save the reduced DataFrame to a new CSV file
    df.to_csv(new_filename, index=False)

    return df, final_questions, final_question_types, final_codebook
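
To see the effect of the presence check without loading the full dataset, the same filtering logic can be sketched on toy inputs. The column names and target list below are made up for illustration, and the extra missing list is an addition that the function above does not compute.

```python
# Hypothetical toy inputs (not taken from the CES data)
columns_in_data = ["caseid", "birthyr", "educ"]
toy_questions = {
    "caseid": "caseid",
    "birthyr": "In what year were you born?",
    "educ": "What is the highest level of education you have completed?",
    "pid3": "Generally speaking, do you think of yourself as a ...?",
}
toy_targets = ["caseid", "educ", "pid3", "not_a_question"]

# Keep only targets that appear as columns in the data
present = [name for name in toy_targets if name in columns_in_data]

# Targets that the filter would drop without any message
missing = [name for name in toy_targets if name not in columns_in_data]

# Restrict the question texts to the targets that are present
final_questions = {k: v for k, v in toy_questions.items() if k in present}

print(present)
print(missing)
print(list(final_questions))
```

Printing the missing names before the import is one way to notice that a target question was excluded.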

Formatting the data based on target questions

[8]:

df, q, t, c = format_data(
    "CCES22_Common_OUTPUT_vv_topost.csv",
    "ces22_target_responses.csv",
    target_questions,
    questions,
    question_types,
    codebook,
)

/var/folders/j0/xq1nxxt51j7_1dgv8s116fmh0000gn/T/ipykernel_36580/3913042285.py:14: DtypeWarning: Columns (362,363,366,367,616,617,620,621) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv(filename)

Inspect the formatted data

[9]:

print("Responses: ", df.head(), "\n")
print("Questions: ", q, "\n")
print("Question types: ", t, "\n")
print("Codebook: ", c, "\n")

Responses:         caseid  birthyr  gender4  educ  race  votereg  pid3  inputstate
0  1983126005     1992        1     6     1        1     1          26
1  1983126559     1957        1     3     1        1     3          25
2  1983126197     1978        2     5     1        1     1          42
3  1979974411     1991        3     6     1        1     4           9
4  1983130427     1991        1     6     1        1     3          24

Questions:  {'caseid': 'caseid', 'birthyr': 'In what year were you born?', 'gender4': 'What is your gender?', 'educ': 'What is the highest level of education you have completed?', 'race': 'What racial or ethnic group best describes you?', 'votereg': 'Are you registered to vote?', 'pid3': 'Generally speaking, do you think of yourself as a ...?', 'inputstate': 'What is your State of Residence?'}

Question types:  {'caseid': 'numerical', 'birthyr': 'numerical', 'gender4': 'multiple_choice', 'educ': 'multiple_choice', 'race': 'multiple_choice', 'votereg': 'multiple_choice', 'pid3': 'multiple_choice', 'inputstate': 'multiple_choice'}

Codebook:  {'caseid': {}, 'birthyr': {}, 'gender4': {1: 'Man', 2: 'Woman', 3: 'Non-binary', 4: 'Other'}, 'educ': {1: 'Did not graduate from high school', 2: 'High school graduate', 3: 'Some college, but no degree (yet)', 4: '2-year college degree', 5: '4-year college degree', 6: 'Postgraduate degree (MA, MBA, MD, JD, PhD, etc.)'}, 'race': {1: 'White', 2: 'Black or African-American', 3: 'Hispanic or Latino', 4: 'Asian or Asian-American', 5: 'Native American', 8: 'Middle Eastern', 6: 'Two or more races', 7: 'Other'}, 'votereg': {1: 'Yes', 2: 'No', 3: "Don't know"}, 'pid3': {1: 'Democrat', 2: 'Republican', 3: 'Independent', 4: 'Other', 5: 'Not sure'}, 'inputstate': {1: 'Alabama', 2: 'Alaska', 4: 'Arizona', 5: 'Arkansas', 6: 'California', 8: 'Colorado', 9: 'Connecticut', 10: 'Delaware', 11: 'District of Columbia', 12: 'Florida', 13: 'Georgia', 15: 'Hawaii', 16: 'Idaho', 17: 'Illinois', 18: 'Indiana', 19: 'Iowa', 20: 'Kansas', 21: 'Kentucky', 22: 'Louisiana', 23: 'Maine', 24: 'Maryland', 25: 'Massachusetts', 26: 'Michigan', 27: 'Minnesota', 28: 'Mississippi', 29: 'Missouri', 30: 'Montana', 31: 'Nebraska', 32: 'Nevada', 33: 'New Hampshire', 34: 'New Jersey', 35: 'New Mexico', 36: 'New York', 37: 'North Carolina', 38: 'North Dakota', 39: 'Ohio', 40: 'Oklahoma', 41: 'Oregon', 42: 'Pennsylvania', 44: 'Rhode Island', 45: 'South Carolina', 46: 'South Dakota', 47: 'Tennessee', 48: 'Texas', 49: 'Utah', 50: 'Vermont', 51: 'Virginia', 53: 'Washington', 54: 'West Virginia', 55: 'Wisconsin', 56: 'Wyoming'}}

Importing formatted data

[10]:

from edsl import Conjure

c = Conjure(
    datafile_name="ces22_target_responses.csv",
    question_names=list(q.keys()),
    question_texts=list(q.values()),
    answer_codebook=c,
    question_types=list(t.values()),
    question_options=[list(options.values()) for options in c.values()],
)
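
The call above passes names, texts, types and options as separate lists, which presumably are matched by position, so they only line up if the q, t and c dictionaries share the same key order. A small check along these lines could be run before the import; assert_same_keys is a hypothetical helper, not part of EDSL, and the toy dictionaries stand in for the real ones.

```python
from typing import Dict


def assert_same_keys(*dicts: Dict) -> None:
    """Raise ValueError unless all dictionaries have identical keys in identical order."""
    key_lists = [list(d.keys()) for d in dicts]
    first = key_lists[0]
    for position, keys in enumerate(key_lists[1:], start=1):
        if keys != first:
            raise ValueError(f"Dictionary {position} has keys {keys}, expected {first}")


# Toy stand-ins for the q, t and c dictionaries returned by format_data
toy_q = {
    "birthyr": "In what year were you born?",
    "votereg": "Are you registered to vote?",
}
toy_t = {"birthyr": "numerical", "votereg": "multiple_choice"}
toy_c = {"birthyr": {}, "votereg": {1: "Yes", 2: "No", 3: "Don't know"}}

assert_same_keys(toy_q, toy_t, toy_c)  # passes silently

# Option lists built the same way as in the call above
toy_options = [list(options.values()) for options in toy_c.values()]
print(toy_options)
```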

Creating agents

[11]:

agents = c.to_agent_list()

We can inspect some of the agents that have been created. Each agent consists of a traits dictionary holding the respondent's answers and a codebook dictionary mapping each question name to its question text:

[12]:

agents[0:2]

[12]:

[
    {
        "traits": {
            "caseid": 1983126005,
            "birthyr": 1992,
            "gender4": "Man",
            "educ": "Postgraduate degree (MA, MBA, MD, JD, PhD, etc.)",
            "race": "White",
            "votereg": "Yes",
            "pid3": "Democrat",
            "inputstate": "Michigan"
        },
        "codebook": {
            "caseid": "caseid",
            "birthyr": "In what year were you born?",
            "gender4": "What is your gender?",
            "educ": "What is the highest level of education you have completed?",
            "race": "What racial or ethnic group best describes you?",
            "votereg": "Are you registered to vote?",
            "pid3": "Generally speaking, do you think of yourself as a ...?",
            "inputstate": "What is your State of Residence?"
        },
        "edsl_version": "0.1.29.dev6",
        "edsl_class_name": "Agent"
    },
    {
        "traits": {
            "caseid": 1983126559,
            "birthyr": 1957,
            "gender4": "Man",
            "educ": "Some college, but no degree (yet)",
            "race": "White",
            "votereg": "Yes",
            "pid3": "Independent",
            "inputstate": "Massachusetts"
        },
        "codebook": {
            "caseid": "caseid",
            "birthyr": "In what year were you born?",
            "gender4": "What is your gender?",
            "educ": "What is the highest level of education you have completed?",
            "race": "What racial or ethnic group best describes you?",
            "votereg": "Are you registered to vote?",
            "pid3": "Generally speaking, do you think of yourself as a ...?",
            "inputstate": "What is your State of Residence?"
        },
        "edsl_version": "0.1.29.dev6",
        "edsl_class_name": "Agent"
    }
]
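
The traits above are the coded responses translated through the codebook. As a rough illustration of that mapping (a plain-Python sketch, not EDSL's actual implementation), the first row of the formatted responses shown earlier can be decoded as follows. decode_row is a hypothetical helper, and only the codebook entries needed for this row are reproduced.

```python
from typing import Dict, Union


def decode_row(
    row: Dict[str, int], codebook: Dict[str, Dict[int, str]]
) -> Dict[str, Union[int, str]]:
    """Replace each coded value with its label; leave values without a label unchanged."""
    decoded = {}
    for name, value in row.items():
        labels = codebook.get(name, {})
        decoded[name] = labels.get(value, value)
    return decoded


# First row of the formatted responses (a subset of its columns)
first_row = {
    "caseid": 1983126005,
    "birthyr": 1992,
    "gender4": 1,
    "educ": 6,
    "pid3": 1,
    "inputstate": 26,
}

# Subset of the codebook, limited to the codes needed here
partial_codebook = {
    "caseid": {},
    "birthyr": {},
    "gender4": {1: "Man", 2: "Woman", 3: "Non-binary", 4: "Other"},
    "educ": {6: "Postgraduate degree (MA, MBA, MD, JD, PhD, etc.)"},
    "pid3": {1: "Democrat", 2: "Republican", 3: "Independent"},
    "inputstate": {26: "Michigan"},
}

print(decode_row(first_row, partial_codebook))
```

Numerical questions such as caseid and birthyr have empty codebook entries, so their values pass through unchanged.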

Using a language model to suggest questions

Here we demonstrate how to use a language model to suggest an appropriate set of questions. We create a Question, administer it to the default model (GPT-4), and then pass the response to the format_data function defined above to format the survey data based on the suggested target questions.

Please see EDSL documentation for details on all available question types and how to select other language models to use in generating responses. In the sections below we show how to administer questions to the agents that we have created.

[13]:

from edsl.questions import QuestionList

q_subset = QuestionList(
    question_name="subset",
    question_text="""
    Consider the following set of survey questions.
    Select a subset of the survey questions likely to provide
    the most relevant and useful information about the respondents
    for estimating answers to other common survey questions.
    Try to select the top 20 questions.
    Questions:
    """
    + ", ".join(questions.values()),
)

[14]:

results_subset = q_subset.run()

[15]:

suggested_target_questions = results_subset.select("subset").first()
suggested_target_questions

[15]:

['In what year were you born?',
 'What is your gender?',
 'What is the highest level of education you have completed?',
 'What racial or ethnic group best describes you?',
 'Are you of Spanish, Latino, or Hispanic origin or descent?',
 'Are you registered to vote?',
 'Generally speaking, do you think of yourself as a ...?',
 'What is your State of Residence?',
 "OVER THE PAST YEAR, has your household's annual income...?",
 'OVER THE PAST YEAR, have the prices of everyday goods and services...?',
 'How did your work status change as a result of the coronavirus pandemic?',
 'Suppose that you have an emergency expense that costs $400. Based on your current financial situation, how would you pay for this expense?',
 'Would you say that in general your health is...',
 'Would you say that in general your mental health is...',
 'Are you a United States citizen?',
 "What do you think the United States should do in response to Russia's invasion of Ukraine?",
 'Which of the following best describes your current employment status?',
 'Do you personally (or jointly with a spouse), have any money invested in the stock market right now, either in an individual stock or in a mutual fund?',
 'How important is religion in your life?',
 'What is your marital status?']

Next we add back the caseid field and look up the question names that correspond to the suggested question texts:

[16]:

suggested_target_questions = {
    name: text
    for name, text in questions.items()
    if text in suggested_target_questions + ["caseid"]
}
suggested_target_questions

[16]:

{'caseid': 'caseid',
 'birthyr': 'In what year were you born?',
 'gender4': 'What is your gender?',
 'educ': 'What is the highest level of education you have completed?',
 'race': 'What racial or ethnic group best describes you?',
 'hispanic': 'Are you of Spanish, Latino, or Hispanic origin or descent?',
 'votereg': 'Are you registered to vote?',
 'pid3': 'Generally speaking, do you think of yourself as a ...?',
 'inputstate': 'What is your State of Residence?',
 'CC22_303': "OVER THE PAST YEAR, has your household's annual income...?",
 'CC22_304': 'OVER THE PAST YEAR, have the prices of everyday goods and services...?',
 'CC22_309c': 'How did your work status change as a result of the coronavirus pandemic?',
 'CC22_309e': 'Would you say that in general your health is...',
 'CC22_309f': 'Would you say that in general your mental health is...',
 'cit1': 'Are you a United States citizen?',
 'CC22_321': "What do you think the United States should do in response to Russia's invasion of Ukraine?",
 'employ': 'Which of the following best describes your current employment status?',
 'investor': 'Do you personally (or jointly with a spouse), have any money invested in the stock market right now, either in an individual stock or in a mutual fund?',
 'pew_religimp': 'How important is religion in your life?',
 'marstat': 'What is your marital status?'}
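
Note that the comprehension above matches on exact question text, so a suggestion that the model rephrased, or that has no identical entry in questions, is dropped without warning: the model returned 20 suggestions, but only 19 appear alongside caseid in the output. A minimal sketch for surfacing such cases, using made-up stand-ins for the two inputs:

```python
# Hypothetical stand-ins for the questions dict and the model's suggestions
toy_questions = {
    "caseid": "caseid",
    "birthyr": "In what year were you born?",
    "marstat": "What is your marital status?",
}
toy_suggestions = [
    "In what year were you born?",
    "What is your marital status?",
    "What is your current marital status?",  # rephrased, so no exact match
]

known_texts = set(toy_questions.values())

# Suggestions whose text does not exactly match any known question
unmatched = [text for text in toy_suggestions if text not in known_texts]

# Same selection rule as the comprehension above, always keeping caseid
matched = {
    name: text
    for name, text in toy_questions.items()
    if text in toy_suggestions + ["caseid"]
}

print(unmatched)
print(matched)
```

Any entries in unmatched could then be reviewed by hand and mapped to the intended question name.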

We then repeat the import with the suggested target questions:

[17]:

df, q, t, c = format_data(
    "CCES22_Common_OUTPUT_vv_topost.csv",
    "ces22_target_responses.csv",
    suggested_target_questions,
    questions,
    question_types,
    codebook,
)

/var/folders/j0/xq1nxxt51j7_1dgv8s116fmh0000gn/T/ipykernel_36580/3913042285.py:14: DtypeWarning: Columns (362,363,366,367,616,617,620,621) have mixed types. Specify dtype option on import or set low_memory=False.
  df = pd.read_csv(filename)

[18]:

from edsl import Conjure

c = Conjure(
    datafile_name="ces22_target_responses.csv",
    question_names=list(q.keys()),
    question_texts=list(q.values()),
    answer_codebook=c,
    question_types=list(t.values()),
    question_options=[list(options.values()) for options in c.values()],
)

[19]:

agents = c.to_agent_list()

[20]:

agents[0:2]

[20]:

[
    {
        "traits": {
            "caseid": 1983126005,
            "birthyr": 1992,
            "gender4": "Man",
            "educ": "Postgraduate degree (MA, MBA, MD, JD, PhD, etc.)",
            "race": "White",
            "hispanic": "No",
            "votereg": "Yes",
            "pid3": "Democrat",
            "inputstate": "Michigan",
            "CC22_303": "Stayed about the same",
            "CC22_304": "Increased somewhat",
            "CC22_309e": "Good",
            "CC22_309f": "Fair",
            "cit1": "Yes",
            "employ": "Working full time now",
            "investor": "Yes",
            "pew_religimp": "Not at all important",
            "marstat": "Never married"
        },
        "codebook": {
            "caseid": "caseid",
            "birthyr": "In what year were you born?",
            "gender4": "What is your gender?",
            "educ": "What is the highest level of education you have completed?",
            "race": "What racial or ethnic group best describes you?",
            "hispanic": "Are you of Spanish, Latino, or Hispanic origin or descent?",
            "votereg": "Are you registered to vote?",
            "pid3": "Generally speaking, do you think of yourself as a ...?",
            "inputstate": "What is your State of Residence?",
            "CC22_303": "OVER THE PAST YEAR, has your household's annual income...?",
            "CC22_304": "OVER THE PAST YEAR, have the prices of everyday goods and services...?",
            "CC22_309e": "Would you say that in general your health is...",
            "CC22_309f": "Would you say that in general your mental health is...",
            "cit1": "Are you a United States citizen?",
            "employ": "Which of the following best describes your current employment status?",
            "investor": "Do you personally (or jointly with a spouse), have any money invested in the stock market right now, either in an individual stock or in a mutual fund?",
            "pew_religimp": "How important is religion in your life?",
            "marstat": "What is your marital status?"
        },
        "edsl_version": "0.1.29.dev6",
        "edsl_class_name": "Agent"
    },
    {
        "traits": {
            "caseid": 1983126559,
            "birthyr": 1957,
            "gender4": "Man",
            "educ": "Some college, but no degree (yet)",
            "race": "White",
            "hispanic": "No",
            "votereg": "Yes",
            "pid3": "Independent",
            "inputstate": "Massachusetts",
            "CC22_303": "Stayed about the same",
            "CC22_304": "Increased a lot",
            "CC22_309e": "Excellent",
            "CC22_309f": "Excellent",
            "cit1": "Yes",
            "employ": "Working full time now",
            "investor": "Yes",
            "pew_religimp": "Somewhat important",
            "marstat": "Divorced"
        },
        "codebook": {
            "caseid": "caseid",
            "birthyr": "In what year were you born?",
            "gender4": "What is your gender?",
            "educ": "What is the highest level of education you have completed?",
            "race": "What racial or ethnic group best describes you?",
            "hispanic": "Are you of Spanish, Latino, or Hispanic origin or descent?",
            "votereg": "Are you registered to vote?",
            "pid3": "Generally speaking, do you think of yourself as a ...?",
            "inputstate": "What is your State of Residence?",
            "CC22_303": "OVER THE PAST YEAR, has your household's annual income...?",
            "CC22_304": "OVER THE PAST YEAR, have the prices of everyday goods and services...?",
            "CC22_309e": "Would you say that in general your health is...",
            "CC22_309f": "Would you say that in general your mental health is...",
            "cit1": "Are you a United States citizen?",
            "employ": "Which of the following best describes your current employment status?",
            "investor": "Do you personally (or jointly with a spouse), have any money invested in the stock market right now, either in an individual stock or in a mutual fund?",
            "pew_religimp": "How important is religion in your life?",
            "marstat": "What is your marital status?"
        },
        "edsl_version": "0.1.29.dev6",
        "edsl_class_name": "Agent"
    }
]

Creating agent personas

We can optionally create agent personas to use together with (or in lieu of) the full agent traits:

[21]:

from edsl import QuestionFreeText

q_persona = QuestionFreeText(
    question_name="persona",
    question_text="Draft a short bio (~5 sentences) based on your traits.",
)

[22]:

personas = q_persona.by(agents[0:100]).run()

[23]:

(
    personas.select("caseid", "persona").print(
        format="html", iframe=True, iframe_width=1000, iframe_height=500
    )
)


Conducting new surveys with agents

We can construct new questions to ask the agents we’ve created, and optionally combine new responses with existing traits. Here we create several questions of different types, combine them in a Survey and administer them to a set of agents. We also show how to parameterize questions with different inputs using Scenario objects. Learn more about all available question types.

[24]:

from edsl import (
    QuestionFreeText,
    QuestionMultipleChoice,
    QuestionLinearScale,
    Survey,
    Scenario,
    ScenarioList,
)

[25]:

q0 = QuestionFreeText(
    question_name="considerations",
    question_text="Describe your main considerations in deciding whether to {{ activity }}.",
)

q1 = QuestionMultipleChoice(
    question_name="past_12_months",
    question_text="Over the past 12 months did you ever {{ activity }}?",
    question_options=["Yes", "No", "I do not remember"],
)

q2 = QuestionLinearScale(
    question_name="next_12_months",
    question_text="On a scale from 1 to 5, in the next 12 months how likely are you to {{ activity }}?",
    question_options=[1, 2, 3, 4, 5],
    option_labels={1: "Not at all likely", 5: "Very likely"},
)

survey = Survey(questions=[q0, q1, q2])

activities = ["vote in a local election", "buy a house", "change jobs"]

scenarios = ScenarioList(Scenario({"activity": a}) for a in activities)

[26]:

results = survey.by(scenarios).by(agents[0:100]).run()
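
To make the parameterization concrete, the sketch below fills the {{ activity }} placeholder for each scenario using plain string replacement and counts the resulting combinations. This is a simplified stand-in for illustration only, not how EDSL renders templates internally; render and the toy_ names are hypothetical, and the counts assume a single model with one interview per scenario-agent pair.

```python
from itertools import product

toy_question_texts = {
    "considerations": "Describe your main considerations in deciding whether to {{ activity }}.",
    "past_12_months": "Over the past 12 months did you ever {{ activity }}?",
}
toy_activities = ["vote in a local election", "buy a house", "change jobs"]


def render(text: str, scenario: dict) -> str:
    """Substitute {{ key }} placeholders with scenario values (simplified templating)."""
    for key, value in scenario.items():
        text = text.replace("{{ " + key + " }}", str(value))
    return text


toy_scenarios = [{"activity": a} for a in toy_activities]

# Every question text paired with every scenario
rendered = [
    render(text, scenario)
    for text, scenario in product(toy_question_texts.values(), toy_scenarios)
]
print(rendered[0])

n_agents = 100  # agents[0:100]
n_questions = 3  # q0, q1, q2
n_interviews = len(toy_scenarios) * n_agents  # one per scenario-agent pair
n_answers = n_interviews * n_questions
print(n_interviews, n_answers)
```

Under these assumptions the run covers 300 scenario-agent pairs and 900 individual answers, which is worth keeping in mind when estimating API usage.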

[27]:

(
    results.select(
        "caseid", "activity", "considerations", "past_12_months", "next_12_months"
    ).print(
        pretty_labels={
            "agent.caseid": "caseid",
            "scenario.activity": "Activity",
            "answer.considerations": q0.question_text,
            "answer.past_12_months": q1.question_text,
            "answer.next_12_months": q2.question_text,
        },
        format="html",
        iframe=True,
        iframe_width=1200,
        iframe_height=500,
    )
)


Learn more

Please see our documentation page for more example code, tutorials and notebooks for a variety of use cases.