Adding metadata to survey results

This notebook provides sample EDSL code for adding metadata to survey results. This is useful when you use EDSL for data labeling or similar tasks and want the results to carry information about the data or content used with a survey (e.g., the data source or date), without a separate post-survey matching step.

In EDSL this is done by including metadata fields in the scenarios that you create for the data or content used with a survey. When the scenarios are added to the survey and the survey is run, columns for the metadata fields are automatically included in the results.

Example

In the steps below we create and run a simple EDSL survey that uses scenarios to add metadata to the results. The steps consist of:

  • Constructing a survey of questions about some data (mock news stories)

  • Creating a scenario (dictionary) for each news story

  • Adding the scenarios to the survey and running it

  • Inspecting the results

Technical setup

Before running the code below, please ensure that you have installed the EDSL library and either activated remote inference from your Coop account or stored API keys for the language models that you want to use with EDSL. Please also see our documentation page for tips and tutorials on getting started using EDSL.

Constructing questions

We start by constructing some questions with a {{ placeholder }} for data that we will add to the question texts. EDSL comes with a variety of question types that we can choose from based on the form of the response that we want to get back from the model:

[1]:
from edsl import QuestionFreeText, QuestionMultipleChoice
[2]:
q_reference = QuestionFreeText(
    question_name = "reference",
    question_text = "What is this headline referring to: {{ headline }}",
)

q_section = QuestionMultipleChoice(
    question_name = "section",
    question_text = "Which section of the paper is most likely to include this story: {{ headline }}",
    question_options = [
        "Front page",
        "Health",
        "Politics",
        "Entertainment",
        "Local",
        "Opinion",
        "Sports",
        "Culture",
        "Housing"
    ]
)
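
To make the role of the {{ placeholder }} concrete, the sketch below fills a question text from a dictionary of values using only the Python standard library. It is an illustration of the idea, not EDSL's own rendering code: the `render` helper and its regular expression are hypothetical names introduced here. Note that the `date` key is present in the dictionary but never reaches the question text, because no placeholder refers to it.

```python
import re

def render(question_text: str, values: dict) -> str:
    """Hypothetical helper: fill each {{ key }} placeholder from `values`."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        # Leave unknown placeholders untouched rather than guessing a value.
        return str(values.get(key, match.group(0)))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, question_text)

values = {
    "headline": "City Welcomes Returning Soldiers with Parade",
    "date": "1918-11-12",  # not referenced by the question text below
}

rendered = render("What is this headline referring to: {{ headline }}", values)
print(rendered)
```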

Creating a survey

Next we pass the questions to a survey in order to administer them together:

[3]:
from edsl import Survey
[4]:
survey = Survey(questions = [q_reference, q_section])

Parameterizing questions with scenarios

Next we create a ScenarioList containing one Scenario per news story. Each scenario has a key/value for the data that we want to add to the questions at the {{ placeholder }}, plus additional key/values for metadata that we want to keep with the results that are generated when the survey is run. EDSL comes with a variety of methods for generating scenarios from different data sources (PDFs, CSVs, images, tables, lists, etc.); here we generate scenarios from a dictionary:

[5]:
from edsl import ScenarioList, Scenario
[6]:
data = {
    "headline": [
        "Armistice Signed, War Over: Celebrations Erupt Across City",
        "Spanish Flu Pandemic: Hospitals Overwhelmed as Cases Surge",
        "Women Gain Right to Vote: Historic Amendment Passed",
        "Broadway Theaters Reopen After Flu Shutdown",
        "City Welcomes Returning Soldiers with Parade",
        "Prohibition Debate Heats Up: Public Opinion Divided",
        "New York Yankees Win First Pennant in Franchise History",
        "Subway Expansion Project Approved by City Council",
        "Harlem Renaissance: New Wave of Cultural Expression",
        "Mayor Announces New Housing Initiative for Veterans",
    ],
    "date": [
        "1918-11-11",
        "1918-10-15",
        "1918-06-05",
        "1918-12-01",
        "1918-11-12",
        "1918-07-20",
        "1918-09-30",
        "1918-08-18",
        "1918-04-25",
        "1918-11-20",
    ],
    "author": [
        "John Doe",
        "Jane Smith",
        "Robert Johnson",
        "Mary Lee",
        "James Brown",
        "Patricia Green",
        "William Davis",
        "Barbara Wilson",
        "Charles Miller",
        "Elizabeth Taylor",
    ]
}
[7]:
scenarios = ScenarioList.from_nested_dict(data)

We can inspect the scenarios that have been created:

[8]:
scenarios
[8]:
{
    "scenarios": [
        {
            "headline": "Armistice Signed, War Over: Celebrations Erupt Across City",
            "date": "1918-11-11",
            "author": "John Doe"
        },
        {
            "headline": "Spanish Flu Pandemic: Hospitals Overwhelmed as Cases Surge",
            "date": "1918-10-15",
            "author": "Jane Smith"
        },
        {
            "headline": "Women Gain Right to Vote: Historic Amendment Passed",
            "date": "1918-06-05",
            "author": "Robert Johnson"
        },
        {
            "headline": "Broadway Theaters Reopen After Flu Shutdown",
            "date": "1918-12-01",
            "author": "Mary Lee"
        },
        {
            "headline": "City Welcomes Returning Soldiers with Parade",
            "date": "1918-11-12",
            "author": "James Brown"
        },
        {
            "headline": "Prohibition Debate Heats Up: Public Opinion Divided",
            "date": "1918-07-20",
            "author": "Patricia Green"
        },
        {
            "headline": "New York Yankees Win First Pennant in Franchise History",
            "date": "1918-09-30",
            "author": "William Davis"
        },
        {
            "headline": "Subway Expansion Project Approved by City Council",
            "date": "1918-08-18",
            "author": "Barbara Wilson"
        },
        {
            "headline": "Harlem Renaissance: New Wave of Cultural Expression",
            "date": "1918-04-25",
            "author": "Charles Miller"
        },
        {
            "headline": "Mayor Announces New Housing Initiative for Veterans",
            "date": "1918-11-20",
            "author": "Elizabeth Taylor"
        }
    ]
}
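
The output above shows that a dictionary of equal-length lists becomes one record per list position. A plain-Python sketch of that reshaping follows, assuming every list has the same length. The helper name `rows_from_columns` is hypothetical and this is not how EDSL implements it; it only reproduces the shape of the records shown above, using the first two stories.

```python
def rows_from_columns(columns: dict) -> list:
    """Hypothetical helper: turn {key: [v0, v1, ...]} into [{key: v0}, {key: v1}, ...]."""
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all lists must have the same length")
    keys = list(columns)
    # zip(*...) walks the lists in parallel, one position (story) at a time.
    return [dict(zip(keys, row)) for row in zip(*columns.values())]

sample = {
    "headline": [
        "Armistice Signed, War Over: Celebrations Erupt Across City",
        "Spanish Flu Pandemic: Hospitals Overwhelmed as Cases Surge",
    ],
    "date": ["1918-11-11", "1918-10-15"],
    "author": ["John Doe", "Jane Smith"],
}

rows = rows_from_columns(sample)
for row in rows:
    print(row)
```

Each resulting record keeps the headline together with its date and author, which is what lets the metadata travel with the headline through the survey.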

Running a survey

To run the survey, we add the scenarios with the by() method and then call the run() method:

[9]:
results = survey.by(scenarios).run()

This generates a dataset of Results that we can access with built-in methods for analysis. To see a list of all the components of the results, uncomment and run the following line:

[10]:
# results.columns

For example, we can filter, sort, select and print components of results in a table:

[11]:
(results
 .filter("section in ['Sports', 'Health', 'Politics']")
 .sort_by("section", "date")
 .select("headline", "date", "author", "section", "reference")
 .print(format="rich")
)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ scenario                            scenario    scenario        answer    answer                            ┃
┃ .headline                           .date       .author         .section  .reference                        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Spanish Flu Pandemic: Hospitals     1918-10-15  Jane Smith      Health    The headline "Spanish Flu         │
│ Overwhelmed as Cases Surge                                                Pandemic: Hospitals Overwhelmed   │
│                                                                           as Cases Surge" likely refers to  │
│                                                                           the historical event of the       │
│                                                                           Spanish flu pandemic that         │
│                                                                           occurred in 1918-1919. The        │
│                                                                           Spanish flu was an unusually      │
│                                                                           deadly influenza pandemic caused  │
│                                                                           by the H1N1 influenza A virus. It │
│                                                                           infected about one-third of the   │
│                                                                           world's population and resulted   │
│                                                                           in at least 50 million deaths     │
│                                                                           worldwide. The headline suggests  │
│                                                                           that during this period, the      │
│                                                                           number of cases surged            │
│                                                                           dramatically, overwhelming        │
│                                                                           hospitals and healthcare systems, │
│                                                                           which struggled to cope with the  │
│                                                                           influx of patients.               │
├────────────────────────────────────┼────────────┼────────────────┼──────────┼───────────────────────────────────┤
│ Women Gain Right to Vote: Historic  1918-06-05  Robert Johnson  Politics  This headline refers to the       │
│ Amendment Passed                                                          passage of the 19th Amendment to  │
│                                                                           the United States Constitution,   │
│                                                                           which granted women the right to  │
│                                                                           vote. The amendment was ratified  │
│                                                                           on August 18, 1920, marking a     │
│                                                                           significant milestone in the      │
│                                                                           women's suffrage movement in the  │
│                                                                           United States. This historic      │
│                                                                           event followed decades of         │
│                                                                           activism and advocacy by          │
│                                                                           suffragists who fought for gender │
│                                                                           equality in voting rights.        │
├────────────────────────────────────┼────────────┼────────────────┼──────────┼───────────────────────────────────┤
│ Prohibition Debate Heats Up:        1918-07-20  Patricia Green  Politics  The headline "Prohibition Debate  │
│ Public Opinion Divided                                                    Heats Up: Public Opinion Divided" │
│                                                                           likely refers to a contentious    │
│                                                                           discussion regarding the          │
│                                                                           enforcement or potential repeal   │
│                                                                           of laws prohibiting the           │
│                                                                           manufacture, sale, and            │
│                                                                           transportation of alcoholic       │
│                                                                           beverages. Historically, this     │
│                                                                           would relate to the period in the │
│                                                                           early 20th century in the United  │
│                                                                           States when Prohibition was       │
│                                                                           enacted through the 18th          │
│                                                                           Amendment and later repealed by   │
│                                                                           the 21st Amendment. In a          │
│                                                                           contemporary context, it could    │
│                                                                           also be referring to debates over │
│                                                                           similar restrictions on other     │
│                                                                           substances or activities, where   │
│                                                                           public opinion is sharply split   │
│                                                                           on whether such prohibitions      │
│                                                                           should remain in place or be      │
│                                                                           lifted.                           │
├────────────────────────────────────┼────────────┼────────────────┼──────────┼───────────────────────────────────┤
│ New York Yankees Win First Pennant  1918-09-30  William Davis   Sports    The headline "New York Yankees    │
│ in Franchise History                                                      Win First Pennant in Franchise    │
│                                                                           History" refers to a significant  │
│                                                                           milestone in the history of the   │
│                                                                           New York Yankees baseball team. A │
│                                                                           "pennant" in Major League         │
│                                                                           Baseball (MLB) is awarded to the  │
│                                                                           team that wins their league's     │
│                                                                           championship, either the American │
│                                                                           League (AL) or the National       │
│                                                                           League (NL), thereby earning a    │
│                                                                           spot in the World Series, which   │
│                                                                           is the championship series of     │
│                                                                           MLB.                              │
└────────────────────────────────────┴────────────┴────────────────┴──────────┴───────────────────────────────────┘
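
For readers who want to see what the filter, sort and select steps amount to, here is a rough plain-Python equivalent over a list of dictionaries. It does not use EDSL. The rows are copied from the table above (long answers omitted), the dotted keys mirror the table's column headers, and the variable names are illustrative assumptions. The filter is narrowed to two sections so that it visibly drops a row.

```python
rows = [
    {"scenario.headline": "New York Yankees Win First Pennant in Franchise History",
     "scenario.date": "1918-09-30", "scenario.author": "William Davis", "answer.section": "Sports"},
    {"scenario.headline": "Prohibition Debate Heats Up: Public Opinion Divided",
     "scenario.date": "1918-07-20", "scenario.author": "Patricia Green", "answer.section": "Politics"},
    {"scenario.headline": "Spanish Flu Pandemic: Hospitals Overwhelmed as Cases Surge",
     "scenario.date": "1918-10-15", "scenario.author": "Jane Smith", "answer.section": "Health"},
    {"scenario.headline": "Women Gain Right to Vote: Historic Amendment Passed",
     "scenario.date": "1918-06-05", "scenario.author": "Robert Johnson", "answer.section": "Politics"},
]

# Filter: keep only rows whose section answer is in the wanted set.
wanted = {"Health", "Politics"}
kept = [row for row in rows if row["answer.section"] in wanted]

# Sort: by section, then by date (ISO dates sort correctly as strings).
kept.sort(key=lambda row: (row["answer.section"], row["scenario.date"]))

# Select: keep a subset of columns, metadata included.
columns = ("scenario.date", "scenario.author", "answer.section")
selected = [{key: row[key] for key in columns} for row in kept]

for row in selected:
    print(row)
```

Because the date and author metadata sit in the same row as the model's answers, they are available for sorting and selection without any join against the original data.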

Posting to the Coop

The Coop is a platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL and accessible from your workspace or Coop account page. Learn more about creating an account and using the Coop.

Here we post the scenarios, survey and results from above, and this notebook:

[12]:
scenarios.push(description = "Scenarios for example survey using metadata", visibility = "public")
[12]:
{'description': 'Scenarios for example survey using metadata',
 'object_type': 'scenario_list',
 'url': 'https://www.expectedparrot.com/content/711d3d8d-3e60-4b9b-9b64-9c5c1a5f749d',
 'uuid': '711d3d8d-3e60-4b9b-9b64-9c5c1a5f749d',
 'version': '0.1.33.dev1',
 'visibility': 'public'}
[13]:
survey.push(description = "Example survey using scenarios to add metadata to results", visibility = "public")
[13]:
{'description': 'Example survey using scenarios to add metadata to results',
 'object_type': 'survey',
 'url': 'https://www.expectedparrot.com/content/333395bb-bfe1-4795-a17f-93cc67da88a9',
 'uuid': '333395bb-bfe1-4795-a17f-93cc67da88a9',
 'version': '0.1.33.dev1',
 'visibility': 'public'}
[14]:
results.push(description = "Results for example survey using scenarios to add metadata", visibility = "public")
[14]:
{'description': 'Results for example survey using scenarios to add metadata',
 'object_type': 'results',
 'url': 'https://www.expectedparrot.com/content/5cdf086d-45cb-4bc4-896a-a48f45621919',
 'uuid': '5cdf086d-45cb-4bc4-896a-a48f45621919',
 'version': '0.1.33.dev1',
 'visibility': 'public'}
[15]:
from edsl import Notebook
[16]:
n = Notebook(path = "adding_metadata.ipynb")
[17]:
n.push(description = "Adding metadata to survey results", visibility = "public")
[17]:
{'description': 'Adding metadata to survey results',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/0837130c-5983-482b-ae1b-a6ba2bbef07e',
 'uuid': '0837130c-5983-482b-ae1b-a6ba2bbef07e',
 'version': '0.1.33.dev1',
 'visibility': 'public'}

To update an object at the Coop:

[18]:
n = Notebook(path = "adding_metadata.ipynb")
[19]:
n.patch(uuid = "0837130c-5983-482b-ae1b-a6ba2bbef07e", value = n)
[19]:
{'status': 'success'}