Using data with surveys: FileStore

This notebook provides example EDSL code for using data with an EDSL survey. In the steps below we show how to use the FileStore module to upload, share and retrieve data files at the Coop, and then create Scenario objects from the data to use with a survey.

EDSL is an open-source library for simulating surveys, experiments and other research with AI agents and large language models. Before running the code below, please ensure that you have installed the EDSL library and either activated remote inference from your Coop account or stored API keys for the language models that you want to use with EDSL. Please also see our documentation page for tips and tutorials on getting started using EDSL.

What is a Scenario?

A Scenario is a dictionary of one or more key/value pairs representing data or content to be added to questions; a ScenarioList is a list of Scenario objects. Scenario keys are used as placeholders in question texts and are replaced with the corresponding values when the scenarios are added to the questions, allowing you to create variants of questions efficiently. Learn more about creating and working with scenarios in the documentation.
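As an illustration of what adding a scenario does to a question, here is a minimal pure-Python sketch that fills {{ key }} placeholders from a dictionary. It is not EDSL's templating code: the render helper is hypothetical and handles only simple {{ key }} names. It just shows the idea of one question text producing one variant per scenario.

```python
import re

# Matches placeholders such as "{{ song }}"; whitespace inside the braces is optional
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(question_text: str, scenario: dict) -> str:
    """Replace each {{ key }} placeholder with the scenario's value for that key."""
    def substitute(match):
        key = match.group(1)
        if key not in scenario:
            raise KeyError(f"scenario has no key {key!r}")
        return str(scenario[key])

    return PLACEHOLDER.sub(substitute, question_text)


template = "What is the topic of the song {{ song }} by {{ artists }}?"
scenarios = [
    {"song": '"Physical"', "artists": "Olivia Newton-John"},
    {"song": '"Call Me"', "artists": "Blondie"},
]

# One question variant per scenario
for scenario in scenarios:
    print(render(template, scenario))
```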

What is the Coop?

Coop is a platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL, allowing you to post, download and update objects directly from your workspace and at the Coop web app. The Coop also provides access to features for working with EDSL remotely at the Expected Parrot server. Learn more about these features in the remote inference and remote caching sections of the documentation page.

What is FileStore?

FileStore is a module for storing and sharing data files at the Coop to use in EDSL projects, such as survey data, PDFs, CSVs or images. In particular, it is designed for storing files to be used as scenarios, and it lets you include code for easily retrieving and processing the files in your EDSL project, as we do in the example below!
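The sketch below shows, in plain Python, the general idea behind storing a data file so that it can be shared and restored later: read the bytes, record some metadata, and encode the contents as text. The helper names and payload fields are hypothetical; this is not EDSL's FileStore implementation or its storage format.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def package_file(path: str) -> dict:
    """Hypothetical helper: bundle a file's bytes and metadata for upload."""
    p = Path(path)
    raw = p.read_bytes()
    mime_type, _ = mimetypes.guess_type(p.name)
    return {
        "name": p.name,
        "suffix": p.suffix.lstrip("."),
        "mime_type": mime_type or "application/octet-stream",
        # base64 turns arbitrary bytes (PDFs, images) into plain text that can travel in JSON
        "base64_string": base64.b64encode(raw).decode("ascii"),
    }


def unpack_file(payload: dict, dest_dir: str) -> Path:
    """Hypothetical inverse: decode the payload and write the original bytes back to disk."""
    out = Path(dest_dir) / payload["name"]
    out.write_bytes(base64.b64decode(payload["base64_string"]))
    return out


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "billboard.csv"
    src.write_text('song,artists\n"Lady",Kenny Rogers\n')
    payload = package_file(str(src))
    restored_dir = Path(tmp) / "restored"
    restored_dir.mkdir()
    restored = unpack_file(payload, str(restored_dir))
    print(payload["suffix"], restored.read_bytes() == src.read_bytes())  # csv True
```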

Example

In the example below we create scenarios for some data (a table on a Wikipedia page) and inspect them. Next we save the scenarios as a CSV file and post it to the Coop using the file store. Then we retrieve the file, recreate the scenarios and use them in a survey. Finally, we post this notebook to the Coop for reference.

We start by importing the tools that we will use:

[1]:
from edsl import ScenarioList, Scenario
from edsl.scenarios.FileStore import CSVFileStore

Creating a scenario list for a Wikipedia table

EDSL comes with many methods for automatically generating scenarios for various data sources, such as PDFs, CSVs, docs, images, lists, dicts, etc. Here we use a method to automatically create a scenario list for a Wikipedia table, passing the URL and the number of the table on the page:
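Conceptually, each row of the table becomes one scenario whose keys are the column headers, as the output below shows. A minimal stdlib sketch of that mapping, using a hand-copied excerpt of the table (table_to_records is a hypothetical helper; fetching and parsing the Wikipedia page, which from_wikipedia also does, is left out):

```python
from __future__ import annotations


def table_to_records(header: list[str], rows: list[list[str]]) -> list[dict]:
    """Pair each row's cells with the column headers, giving one dict per row."""
    records = []
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} cells, got {len(row)}: {row}")
        records.append(dict(zip(header, row)))
    return records


header = ["Song", "Artist(s)", "Weeks at number one"]
rows = [
    ['"Physical"', "Olivia Newton-John", "10"],
    ['"Bette Davis Eyes"', "Kim Carnes", "9"],
]
records = table_to_records(header, rows)
print(records[0]["Artist(s)"])  # Olivia Newton-John
```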

[2]:
s = ScenarioList.from_wikipedia("https://en.wikipedia.org/wiki/List_of_Billboard_Hot_100_number-one_singles_of_the_1980s",5)

We can inspect the scenario list that has been created:

[3]:
s
[3]:

ScenarioList scenarios: 14; keys: ['Song', 'Artist(s)', 'Weeks at number one'];

Weeks at number one | Song                           | Artist(s)
10                  | "Physical"                     | Olivia Newton-John
9                   | "Bette Davis Eyes"             | Kim Carnes
9                   | "Endless Love"                 | Diana Ross and Lionel Richie
8                   | "Every Breath You Take"        | The Police
7                   | "I Love Rock 'n' Roll"         | Joan Jett and the Blackhearts
7                   | "Ebony and Ivory"              | Paul McCartney and Stevie Wonder
7                   | "Billie Jean"                  | Michael Jackson
6                   | "Call Me"                      | Blondie
6                   | "Lady"                         | Kenny Rogers
6                   | "Centerfold"                   | The J. Geils Band
6                   | "Eye of the Tiger"             | Survivor
6                   | "Flashdance... What a Feeling" | Irene Cara
6                   | "Say, Say, Say"                | Paul McCartney and Michael Jackson
6                   | "Like a Virgin"                | Madonna

We can check the current keys and rename them for convenience:

[4]:
s.parameters
[4]:
{'Artist(s)', 'Song', 'Weeks at number one'}
[5]:
s = s.rename({'Artist(s)':"artists", 'Song':"song", 'Weeks at number one':"weeks"})
[6]:
s
[6]:

ScenarioList scenarios: 14; keys: ['artists', 'song', 'weeks'];

weeks | song                           | artists
10    | "Physical"                     | Olivia Newton-John
9     | "Bette Davis Eyes"             | Kim Carnes
9     | "Endless Love"                 | Diana Ross and Lionel Richie
8     | "Every Breath You Take"        | The Police
7     | "I Love Rock 'n' Roll"         | Joan Jett and the Blackhearts
7     | "Ebony and Ivory"              | Paul McCartney and Stevie Wonder
7     | "Billie Jean"                  | Michael Jackson
6     | "Call Me"                      | Blondie
6     | "Lady"                         | Kenny Rogers
6     | "Centerfold"                   | The J. Geils Band
6     | "Eye of the Tiger"             | Survivor
6     | "Flashdance... What a Feeling" | Irene Cara
6     | "Say, Say, Say"                | Paul McCartney and Michael Jackson
6     | "Like a Virgin"                | Madonna

We can save the scenarios to a CSV:

[7]:
s.to_csv("billboard_100_1980s.csv")

Storing data at the Coop using the file store

Here we create a CSVFileStore object for the CSV file that we just saved:

[8]:
fs = CSVFileStore("billboard_100_1980s.csv")

We can post a FileStore object to the Coop by calling its push() method. We can optionally pass a description and a visibility setting (public, unlisted or private; the default is unlisted):

[9]:
info = fs.push(description = "Wikipedia: List of Billboard Hot 100 number-one singles of the 1980s")
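If you build the push() arguments programmatically, a small check can catch a mistyped visibility value before anything is sent. The build_push_kwargs helper below is hypothetical: it only restates the three settings listed above and does not call the Coop.

```python
from typing import Optional

# The three visibility settings described above; "unlisted" is the default
VISIBILITY_OPTIONS = ("public", "unlisted", "private")


def build_push_kwargs(description: Optional[str] = None, visibility: str = "unlisted") -> dict:
    """Hypothetical helper: validate the optional arguments before calling push()."""
    if visibility not in VISIBILITY_OPTIONS:
        raise ValueError(f"visibility must be one of {VISIBILITY_OPTIONS}, got {visibility!r}")
    kwargs = {"visibility": visibility}
    if description is not None:
        kwargs["description"] = description
    return kwargs


kwargs = build_push_kwargs(description="Wikipedia: List of Billboard Hot 100 number-one singles of the 1980s")
print(kwargs["visibility"])  # unlisted
# info = fs.push(**kwargs)  # the actual call, as in the cell above
```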

We can inspect the details of the posted object, including the URL and the Coop UUID that we will need to retrieve it later:

[10]:
info
[10]:
{'description': 'Wikipedia: List of Billboard Hot 100 number-one singles of the 1980s',
 'object_type': 'scenario',
 'url': 'https://www.expectedparrot.com/content/5a95a759-c1b9-4db1-81d9-68fc5a360134',
 'uuid': '5a95a759-c1b9-4db1-81d9-68fc5a360134',
 'version': '0.1.39.dev1',
 'visibility': 'unlisted'}
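In this output the UUID is also the last segment of the URL. A small stdlib sketch that recovers and validates it from a URL you may have copied from the Coop web app, assuming (as here) that content URLs end with the object's UUID; uuid_from_url is a hypothetical helper:

```python
from uuid import UUID

# Values copied from the output above
example_info = {
    "url": "https://www.expectedparrot.com/content/5a95a759-c1b9-4db1-81d9-68fc5a360134",
    "uuid": "5a95a759-c1b9-4db1-81d9-68fc5a360134",
}


def uuid_from_url(url: str) -> str:
    """Hypothetical helper: take the last path segment of a content URL and validate it as a UUID."""
    last_segment = url.rstrip("/").rsplit("/", 1)[-1]
    # UUID() raises ValueError if the segment is not a well-formed UUID
    return str(UUID(last_segment))


print(uuid_from_url(example_info["url"]) == example_info["uuid"])  # True
```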

Retrieving a file and recreating scenarios

Here we pull the file from the Coop by its UUID, write it to a temporary file and recreate the scenarios from it:

[11]:
uuid = info["uuid"]
uuid
[11]:
'5a95a759-c1b9-4db1-81d9-68fc5a360134'
[12]:
csv_file = CSVFileStore.pull(uuid=uuid)
[13]:
s = ScenarioList.from_csv(csv_file.to_tempfile())
[14]:
s
[14]:

ScenarioList scenarios: 14; keys: ['artists', 'song', 'weeks'];

weeks | song                           | artists
10    | "Physical"                     | Olivia Newton-John
9     | "Bette Davis Eyes"             | Kim Carnes
9     | "Endless Love"                 | Diana Ross and Lionel Richie
8     | "Every Breath You Take"        | The Police
7     | "I Love Rock 'n' Roll"         | Joan Jett and the Blackhearts
7     | "Ebony and Ivory"              | Paul McCartney and Stevie Wonder
7     | "Billie Jean"                  | Michael Jackson
6     | "Call Me"                      | Blondie
6     | "Lady"                         | Kenny Rogers
6     | "Centerfold"                   | The J. Geils Band
6     | "Eye of the Tiger"             | Survivor
6     | "Flashdance... What a Feeling" | Irene Cara
6     | "Say, Say, Say"                | Paul McCartney and Michael Jackson
6     | "Like a Virgin"                | Madonna

Using scenarios in a survey

We can use the scenarios with a survey by creating placeholders in the questions for the scenario keys, and adding the scenarios to the survey when we run it:

[15]:
from edsl import QuestionFreeText, QuestionMultipleChoice, QuestionCheckBox, QuestionList, Survey

q1 = QuestionFreeText(
    question_name = "topic",
    question_text = "What is the topic of the song {{ song }} by {{ artists }}?"
)

q2 = QuestionMultipleChoice(
    question_name = "sentiment",
    question_text = "What is the sentiment of the song {{ song }} by {{ artists }}?",
    question_options = [
        "Happy",
        "Sad",
        "Angry",
        "Romantic",
        "Nostalgic",
        "Empowering",
        "Melancholic",
        "Hopeful"
    ]
)

q3 = QuestionCheckBox(
    question_name = "themes",
    question_text = "What themes are present in the song {{ song }} by {{ artists }}?",
    question_options = [
        "Love",
        "Loss",
        "Struggle",
        "Celebration",
        "Social issues",
        "Other"
    ]
)

q4 = QuestionList(
    question_name = "other_themes",
    question_text = "What other themes are present?"
)

survey = (
    Survey(questions = [q1, q2, q3, q4])
    .add_targeted_memory(q4, q3)
    .add_stop_rule(q3, "'Other' not in themes")
)

results = survey.by(s).run()
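To make the control flow explicit, the sketch below walks through one interview per scenario in plain Python, with canned answers (copied from the results in this notebook) standing in for the model. It is a simplified, hypothetical model of the two rules, not EDSL's implementation: after themes is answered, the stop rule 'Other' not in themes ends the interview, so other_themes is asked only when "Other" was selected; when it is asked, the earlier answer to themes is passed along as its targeted memory.

```python
# Hypothetical canned answers (copied from this notebook's results) standing in for a model
CANNED = {
    '"Lady"': {"topic": "(free-text answer)", "sentiment": "Romantic", "themes": ["Love"]},
    '"Billie Jean"': {
        "topic": "(free-text answer)",
        "sentiment": "Melancholic",
        "themes": ["Loss", "Struggle", "Other"],
        "other_themes": ["Deception", "Fame", "Paranoia", "Guilt", "Identity"],
    },
}

QUESTION_ORDER = ["topic", "sentiment", "themes", "other_themes"]

# Stop rules keyed by the question after which they are checked;
# this lambda mirrors the rule string "'Other' not in themes"
STOP_RULES = {"themes": lambda answers: "Other" not in answers["themes"]}

# Targeted memory: other_themes receives the earlier answer to themes
MEMORY = {"other_themes": ["themes"]}


def fake_answer(question_name, scenario, memory):
    """Look up a canned answer; a real agent would use the prompt and memory."""
    return CANNED[scenario["song"]][question_name]


def run_interview(scenario, answer_fn):
    """Ask the questions in order for one scenario, ending early when a stop rule fires."""
    answers = {name: None for name in QUESTION_ORDER}  # questions never reached stay None
    for name in QUESTION_ORDER:
        memory = {prior: answers[prior] for prior in MEMORY.get(name, [])}
        answers[name] = answer_fn(name, scenario, memory)
        rule = STOP_RULES.get(name)
        if rule is not None and rule(answers):
            break
    return answers


example_scenarios = [
    {"song": '"Lady"', "artists": "Kenny Rogers"},
    {"song": '"Billie Jean"', "artists": "Michael Jackson"},
]
sketch_results = [run_interview(sc, fake_answer) for sc in example_scenarios]
for sc, answers in zip(example_scenarios, sketch_results):
    print(sc["song"], answers["themes"], answers["other_themes"])
```

This is why rows whose themes do not include "Other" have no other_themes answer in the results.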

We can filter, sort, select and print any components of the results that are generated. Note that the results include columns for all scenario keys, whether used in question texts or not:

[16]:
results.sort_by("song").select("song", "artists", "topic")
[16]:
scenario.song | scenario.artists | answer.topic
"Bette Davis Eyes" | Kim Carnes | The song "Bette Davis Eyes" by Kim Carnes is about a woman who is captivating and alluring, with a mysterious and seductive charm reminiscent of the iconic actress Bette Davis. The lyrics describe how this woman has an intense and mesmerizing presence, with a certain allure that draws people in, much like Bette Davis was known for her distinctive eyes and strong screen presence. The song highlights her ability to enchant and manipulate those around her with her charisma and enigmatic personality.
"Billie Jean" | Michael Jackson | The song "Billie Jean" by Michael Jackson is about a man who is confronted by a woman named Billie Jean who claims that he is the father of her son. The lyrics describe his denial of her allegations and his insistence that she is lying. The song explores themes of false accusations, the burden of fame, and the impact of rumors on personal life.
"Call Me" | Blondie | The song "Call Me" by Blondie is about a person expressing their desire for a romantic connection and availability to a potential lover. The lyrics convey a sense of urgency and excitement, inviting the other person to reach out and make contact at any time. The song captures the thrill and anticipation of new love, emphasizing themes of passion and longing.
"Centerfold" | The J. Geils Band | The song "Centerfold" by The J. Geils Band is about a man who discovers that a former crush from his school days has become a model featured in a men's magazine. The lyrics express his surprise and mixed emotions as he grapples with the contrast between his innocent memories of her and her new, more provocative public image.
"Ebony and Ivory" | Paul McCartney and Stevie Wonder | The song "Ebony and Ivory" by Paul McCartney and Stevie Wonder addresses themes of racial harmony and unity. It uses the metaphor of piano keys—ebony (black) and ivory (white)—to symbolize how different races can coexist peacefully and complement each other, just as the keys work together to create beautiful music.
"Endless Love" | Diana Ross and Lionel Richie | The song "Endless Love" by Diana Ross and Lionel Richie is about a deep, romantic love between two people. The lyrics express a commitment to loving each other endlessly and highlight the emotional connection and devotion they share. It's often considered a classic love song and is frequently played at weddings and romantic occasions.
"Every Breath You Take" | The Police | The song "Every Breath You Take" by The Police is often interpreted as being about obsession and surveillance. While it is frequently perceived as a romantic song, the lyrics actually convey a sense of possessiveness and control, with the narrator closely watching and monitoring every move of the person they are addressing. This underlying theme of obsession contrasts with the song's soothing melody, creating an intriguing dynamic.
"Eye of the Tiger" | Survivor | The song "Eye of the Tiger" by Survivor is about perseverance, determination, and fighting spirit. It was famously used as the theme song for the movie "Rocky III." The lyrics emphasize staying focused, facing challenges head-on, and having the willpower to overcome obstacles, embodying the mindset of a fighter who is ready to face any opponent.
"Flashdance... What a Feeling" | Irene Cara | The song "Flashdance... What a Feeling" by Irene Cara is primarily about empowerment, self-expression, and the joy of pursuing one's dreams. It captures the emotions of determination and exhilaration that come with following one's passion and overcoming obstacles. The song was famously featured in the 1983 film "Flashdance," where it underscored themes of ambition and the pursuit of artistic fulfillment.
"I Love Rock 'n' Roll" | Joan Jett and the Blackhearts | The song "I Love Rock 'n' Roll" by Joan Jett and the Blackhearts is about the excitement and passion for rock and roll music. The lyrics describe a person encountering someone attractive at a jukebox, bonding over their shared love for rock music, and suggesting they spend more time together listening to it. The song captures the rebellious and energetic spirit of rock and roll.
"Lady" | Kenny Rogers | The song "Lady" by Kenny Rogers is a romantic ballad that expresses deep love and admiration. The lyrics convey a heartfelt message from a man to the woman he loves, emphasizing her importance in his life and his devotion to her. The song highlights themes of love, appreciation, and emotional connection.
"Like a Virgin" | Madonna | The song "Like a Virgin" by Madonna is primarily about the feeling of renewal and experiencing something as if it were for the first time. It conveys themes of love and emotional rebirth, using the metaphor of virginity to describe the fresh and transformative nature of a new romantic relationship. The song captures the excitement and vulnerability that can come with falling in love again after past experiences.
"Physical" | Olivia Newton-John | The song "Physical" by Olivia Newton-John, released in 1981, is primarily about physical attraction and desire. The lyrics suggest a playful and flirtatious approach to romance, emphasizing the singer's interest in taking a relationship to a more intimate, physical level. The song's upbeat tempo and catchy melody, along with its suggestive lyrics, contributed to its popularity and the somewhat controversial reception it received at the time of its release.
"Say, Say, Say" | Paul McCartney and Michael Jackson | The song "Say, Say, Say" by Paul McCartney and Michael Jackson is about a person pleading with their lover to reciprocate their feelings and to communicate openly. The lyrics express themes of love, longing, and the desire for emotional connection. The song captures the emotional struggle of wanting reassurance and clarity in a romantic relationship.
[17]:
results.sort_by("weeks", reverse=True).select("weeks", "song", "artists", "sentiment", "themes", "other_themes")
[17]:
scenario.weeks | scenario.song | scenario.artists | answer.sentiment | answer.themes | answer.other_themes
10 | "Physical" | Olivia Newton-John | Empowering | ['Love', 'Celebration', 'Other'] | ['Empowerment', 'Desire', 'Physical Attraction']
9 | "Bette Davis Eyes" | Kim Carnes | Nostalgic | ['Love', 'Other'] | ['Fame', 'Seduction', 'Mystery']
9 | "Endless Love" | Diana Ross and Lionel Richie | Romantic | ['Love', 'Celebration'] |
8 | "Every Breath You Take" | The Police | Melancholic | ['Love', 'Loss', 'Other'] | ['Obsession', 'Possessiveness', 'Surveillance']
7 | "I Love Rock 'n' Roll" | Joan Jett and the Blackhearts | Empowering | ['Love', 'Celebration', 'Other'] | ['Nostalgia', 'Rebellion', 'Youthful Energy']
7 | "Ebony and Ivory" | Paul McCartney and Stevie Wonder | Hopeful | ['Love', 'Social issues'] |
7 | "Billie Jean" | Michael Jackson | Melancholic | ['Loss', 'Struggle', 'Other'] | ['Deception', 'Fame', 'Paranoia', 'Guilt', 'Identity']
6 | "Call Me" | Blondie | Empowering | ['Love', 'Celebration'] |
6 | "Lady" | Kenny Rogers | Romantic | ['Love'] |
6 | "Centerfold" | The J. Geils Band | Nostalgic | ['Love', 'Loss', 'Other'] | ['Nostalgia', 'Innocence', 'Surprise', 'Adolescence']
6 | "Eye of the Tiger" | Survivor | Empowering | ['Struggle'] |
6 | "Flashdance... What a Feeling" | Irene Cara | Empowering | ['Struggle', 'Celebration'] |
6 | "Say, Say, Say" | Paul McCartney and Michael Jackson | Romantic | ['Love', 'Struggle'] |
6 | "Like a Virgin" | Madonna | Romantic | ['Love', 'Celebration'] |

Posting a notebook to the Coop

Here we post the contents of this notebook to the Coop for anyone to access:

[18]:
from edsl import Notebook
[19]:
n = Notebook(path = "scenarios_filestore_example.ipynb")
[20]:
n.push(description = "Example code for using data files for scenarios via file store and Coop", visibility = "public")
[20]:
{'description': 'Example code for using data files for scenarios via file store and Coop',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/c45b97a0-0e29-4d6b-9f9c-28fb58a810c8',
 'uuid': 'c45b97a0-0e29-4d6b-9f9c-28fb58a810c8',
 'version': '0.1.39.dev1',
 'visibility': 'public'}

To update an object that has already been posted, we can call patch() with the object's UUID and the new value:

[21]:
n = Notebook(path = "scenarios_filestore_example.ipynb") # resave
[22]:
n.patch(uuid = "0b0c86b4-7629-428c-8346-03d69a6a76f9", value = n)
[22]:
{'status': 'success'}
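patch() updates an object that already exists at the Coop, identified by its UUID, rather than creating a new one the way push() does. The in-memory class below is a hypothetical model of that contract only; it is not the Coop client, and the real service's error handling and versioning may differ. The stored values are toy placeholders.

```python
import uuid as uuid_lib


class InMemoryStore:
    """Hypothetical stand-in for a remote object store, illustrating push/pull/patch by UUID."""

    def __init__(self):
        self._objects = {}

    def push(self, value, description=None, visibility="unlisted"):
        """Store a new object under a freshly generated UUID and return its details."""
        object_id = str(uuid_lib.uuid4())
        self._objects[object_id] = {"value": value, "description": description, "visibility": visibility}
        return {"uuid": object_id, "description": description, "visibility": visibility}

    def pull(self, object_id):
        """Return the stored value for an existing UUID."""
        return self._objects[object_id]["value"]

    def patch(self, object_id, value=None, description=None, visibility=None):
        """Update an existing object in place; an unknown UUID is an error, not a new object."""
        if object_id not in self._objects:
            raise KeyError(f"no object with uuid {object_id}")
        entry = self._objects[object_id]
        # Only the fields that are passed are updated; the rest keep their current values
        for field, new_value in (("value", value), ("description", description), ("visibility", visibility)):
            if new_value is not None:
                entry[field] = new_value
        return {"status": "success"}


store = InMemoryStore()
posted = store.push({"cells": 20}, description="Example notebook", visibility="public")
print(store.patch(posted["uuid"], value={"cells": 22}))  # {'status': 'success'}
print(store.pull(posted["uuid"]))  # {'cells': 22}
```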