Using PDFs in a survey

This notebook provides sample EDSL code demonstrating the from_pdf() method, which imports a PDF and automatically creates Scenario objects for its pages to use as parameters of survey questions. This can be helpful for efficiently extracting qualitative information from a large text with EDSL.

EDSL is an open-source library for simulating surveys and experiments with AI agents and large language models. Please see our documentation page for tips and tutorials on getting started.

How it works

EDSL comes with a variety of question types that we can select from based on the desired form of the response (multiple choice, free text, etc.). We can also parameterize questions with textual content in order to ask questions about it. We do this by creating a {{ placeholder }} in a question text, e.g., What are the key themes of this text: {{ text }}, and then creating Scenario objects for the content to be inserted in the placeholder when we run the survey. This allows us to administer multiple versions of a question with different inputs all at once. A common use case is data labeling: designing questions about one or more pieces of textual data and inserting that data into the question texts. Learn more about using scenarios.
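
As a minimal sketch of this pattern (the question text, placeholder key, and example strings here are illustrative, not from this notebook), a parameterized question might be constructed like this:

from edsl import QuestionFreeText, Scenario

# A question with a custom placeholder key ({{ content }})
q = QuestionFreeText(
    question_name="themes",
    question_text="What are the key themes of this text: {{ content }}",
)

# One Scenario per piece of content; the dict key matches the placeholder
scenarios = [Scenario({"content": t}) for t in ["First text ...", "Second text ..."]]

# results = q.by(scenarios).run()  # administers both versions at once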

Example

For purposes of demonstration we use a PDF copy of the opening pages of the recent paper Automated Social Science: Language Models as Scientist and Subjects and conduct a survey consisting of several questions about its contents:

We have stored it at the Coop and can re-import it:

[1]:
from edsl.scenarios.FileStore import PDFFileStore
[2]:
ass_pdf = PDFFileStore.pull('65c1ca0c-35d8-4c57-9186-787522806a1f', expected_parrot_url='https://www.expectedparrot.com')
[3]:
# Code for posting a PDF to Coop file store:
#
# ass_pdf = PDFFileStore("automated_social_scientist.pdf")
# info = ass_pdf.push()
# print(info)

Here we create a survey of questions that we will administer for each page of the PDF. Note that the from_pdf() method requires the scenario placeholder to be {{ text }} (for regular Scenario objects, you can use any placeholder word you like, as in the sketch above):

[4]:
from edsl import QuestionFreeText, QuestionList, ScenarioList, Survey
[5]:
q_summary = QuestionFreeText(
    question_name="summary",
    question_text="Briefly summarize the abstract of this paper: {{ text }}",
)

q_authors = QuestionList(
    question_name="authors",
    question_text="List the names of all the authors of the following paper: {{ text }}",
)

q_thanks = QuestionList(
    question_name="thanks",
    question_text="List the names of the people thanked in the following paper: {{ text }}",
)

survey = Survey([q_summary, q_authors, q_thanks])

Next we create a ScenarioList for the PDF using the from_pdf() method, which automatically creates a Scenario object for each page of the PDF to be inserted in our questions:

[6]:
automated_social_scientist = ScenarioList.from_pdf(ass_pdf.to_tempfile())

Alternatively, the PDF can be imported from a local file:

[7]:
# automated_social_scientist = ScenarioList.from_pdf("automated_social_scientist.pdf")

We can inspect the scenarios:

[8]:
automated_social_scientist[0:2]
[8]:
{
    "scenarios": [
        {
            "filename": "tmptw7anub7.pdf",
            "page": 1,
            "text": "Automated Social Science:\nLanguage Models as Scientist and Subjects\u2217\nBenjamin S. Manning\u2020\nMIT\nKehang Zhu\u2020\nHarvard\nJohn J. Horton\nMIT & NBER\nApril 26, 2024\nAbstract\nWe present an approach for automatically generating and testing, in silico,\nsocial scientific hypotheses. This automation is made possible by recent ad-\nvances in large language models (LLM), but the key feature of the approach\nis the use of structural causal models. Structural causal models provide a lan-\nguage to state hypotheses, a blueprint for constructing LLM-based agents, an\nexperimental design, and a plan for data analysis. The fitted structural causal\nmodel becomes an object available for prediction or the planning of follow-on\nexperiments. We demonstrate the approach with several scenarios: a nego-\ntiation, a bail hearing, a job interview, and an auction. In each case, causal\nrelationships are both proposed and tested by the system, finding evidence\nfor some and not others. We provide evidence that the insights from these\nsimulations of social interactions are not available to the LLM purely through\ndirect elicitation. When given its proposed structural causal model for each\nscenario, the LLM is good at predicting the signs of estimated effects, but\nit cannot reliably predict the magnitudes of those estimates. In the auction\nexperiment, the in silico simulation results closely match the predictions of\nauction theory, but elicited predictions of the clearing prices from the LLM\nare inaccurate. However, the LLM\u2019s predictions are dramatically improved if\nthe model can condition on the fitted structural causal model. In short, the\nLLM knows more than it can (immediately) tell.\n\u2217Thanks to generous support from Drew Houston and his AI for Augmentation and Productivity\nseed grant. Thanks to Jordan Ellenberg, Benjamin Lira Luttges, David Holtz, Bruce Sacerdote,\nPaul R\u00a8ottger, Mohammed Alsobay, Ray Duch, Matt Schwartz, David Autor, and Dean Eckles\nfor their helpful feedback. Author\u2019s contact information, code, and data are currently or will be\navailable at http://www.benjaminmanning.io/.\n\u2020Both authors contributed equally to this work.\n1\narXiv:2404.11794v2  [econ.GN]  25 Apr 2024\n"
        },
        {
            "filename": "tmptw7anub7.pdf",
            "page": 2,
            "text": "1\nIntroduction\nThere is much work on efficiently estimating econometric models of human behavior\nbut comparatively little work on efficiently generating and testing those models to\nestimate. Previously, developing such models and hypotheses to test was exclusively\na human task. This is changing as researchers have begun to explore automated\nhypothesis generation through the use of machine learning.1 But even with novel\nmachine-generated hypotheses, there is still the problem of testing.\nA potential\nsolution is simulation. Researchers have shown that Large Language Models (LLM)\ncan simulate humans as experimental subjects with surprising degrees of realism.2\nTo the extent that these simulation results carry over to human subjects in out-of-\nsample tasks, they provide another option for testing (Horton, 2023). In this paper,\nwe combine these ideas\u2014automated hypothesis generation and automated in silico\nhypothesis testing\u2014by using LLMs for both purposes. We demonstrate that such\nautomation is possible. We evaluate the approach by comparing results to a setting\nwhere the real-world predictions are well known and test to see if an LLM can be\nused to generate information that it cannot access through direct elicitation.\nThe key innovation in our approach is the use of structural causal models to orga-\nnize the research process. Structural causal models are mathematical representations\nof cause and effect (Pearl, 2009b; Wright, 1934) and have long offered a language\nfor expressing hypotheses.3 What is novel in our paper is the use of these models\nas a blueprint for the design of agents and experiments. In short, each explanatory\nvariable describes something about a person or scenario that has to vary for the effect\nto be identified, so the system \u201cknows\u201d it needs to generate agents or scenarios that\n1A few examples include generative adversarial networks to formulate new hypotheses (Ludwig\nand Mullainathan, 2023), algorithms to find anomalies in formal theories (Mullainathan and Ram-\nbachan, 2023), reinforcement learning to propose tax policies (Zheng et al., 2022), random forests\nto identify heterogenous treatment effects (Wager and Athey, 2018), and several others (Buyalskaya\net al., 2023; Cai et al., 2023; Enke and Shubatt, 2023; Girotra et al., 2023; Peterson et al., 2021).\n2(Aher et al., 2023; Argyle et al., 2023; Bakker et al., 2022; Binz and Schulz, 2023b; Brand et\nal., 2023; Bubeck et al., 2023; Fish et al., 2023; Mei et al., 2024; Park et al., 2023)\n3In an unfortunate clash of naming conventions, some disciplines have alternative definitions\nfor the term \u201cstructural\u201d when discussing formal models. Here, structural does not refer to the\ndefinition traditionally used in economics. See Appendix B for a more detailed explanation.\n2\n"
        }
    ]
}

If we do not want to use all of the pages, we can select a subset. For example, here we filter the scenarios to just the first page to use with our survey:

[9]:
automated_social_scientist = automated_social_scientist.filter("page == 1")
automated_social_scientist
[9]:
{
    "scenarios": [
        {
            "filename": "tmptw7anub7.pdf",
            "page": 1,
            "text": "Automated Social Science:\nLanguage Models as Scientist and Subjects\u2217\nBenjamin S. Manning\u2020\nMIT\nKehang Zhu\u2020\nHarvard\nJohn J. Horton\nMIT & NBER\nApril 26, 2024\nAbstract\nWe present an approach for automatically generating and testing, in silico,\nsocial scientific hypotheses. This automation is made possible by recent ad-\nvances in large language models (LLM), but the key feature of the approach\nis the use of structural causal models. Structural causal models provide a lan-\nguage to state hypotheses, a blueprint for constructing LLM-based agents, an\nexperimental design, and a plan for data analysis. The fitted structural causal\nmodel becomes an object available for prediction or the planning of follow-on\nexperiments. We demonstrate the approach with several scenarios: a nego-\ntiation, a bail hearing, a job interview, and an auction. In each case, causal\nrelationships are both proposed and tested by the system, finding evidence\nfor some and not others. We provide evidence that the insights from these\nsimulations of social interactions are not available to the LLM purely through\ndirect elicitation. When given its proposed structural causal model for each\nscenario, the LLM is good at predicting the signs of estimated effects, but\nit cannot reliably predict the magnitudes of those estimates. In the auction\nexperiment, the in silico simulation results closely match the predictions of\nauction theory, but elicited predictions of the clearing prices from the LLM\nare inaccurate. However, the LLM\u2019s predictions are dramatically improved if\nthe model can condition on the fitted structural causal model. In short, the\nLLM knows more than it can (immediately) tell.\n\u2217Thanks to generous support from Drew Houston and his AI for Augmentation and Productivity\nseed grant. Thanks to Jordan Ellenberg, Benjamin Lira Luttges, David Holtz, Bruce Sacerdote,\nPaul R\u00a8ottger, Mohammed Alsobay, Ray Duch, Matt Schwartz, David Autor, and Dean Eckles\nfor their helpful feedback. Author\u2019s contact information, code, and data are currently or will be\navailable at http://www.benjaminmanning.io/.\n\u2020Both authors contributed equally to this work.\n1\narXiv:2404.11794v2  [econ.GN]  25 Apr 2024\n"
        }
    ]
}

Now we can add the scenarios to the survey and run it:

[10]:
results = survey.by(automated_social_scientist).run()
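
By default, run() uses the default language model. A minimal sketch of specifying a different model with the by() method (the model name here is illustrative):

# from edsl import Model
#
# model = Model("gpt-4o")  # illustrative model name
# results = survey.by(automated_social_scientist).by(model).run()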

We can see a list of all the components of results that are directly accessible:

[11]:
results.columns
[11]:
['agent.agent_instruction',
 'agent.agent_name',
 'answer.authors',
 'answer.summary',
 'answer.thanks',
 'comment.authors_comment',
 'comment.summary_comment',
 'comment.thanks_comment',
 'generated_tokens.authors_generated_tokens',
 'generated_tokens.summary_generated_tokens',
 'generated_tokens.thanks_generated_tokens',
 'iteration.iteration',
 'model.frequency_penalty',
 'model.logprobs',
 'model.max_tokens',
 'model.model',
 'model.presence_penalty',
 'model.temperature',
 'model.top_logprobs',
 'model.top_p',
 'prompt.authors_system_prompt',
 'prompt.authors_user_prompt',
 'prompt.summary_system_prompt',
 'prompt.summary_user_prompt',
 'prompt.thanks_system_prompt',
 'prompt.thanks_user_prompt',
 'question_options.authors_question_options',
 'question_options.summary_question_options',
 'question_options.thanks_question_options',
 'question_text.authors_question_text',
 'question_text.summary_question_text',
 'question_text.thanks_question_text',
 'question_type.authors_question_type',
 'question_type.summary_question_type',
 'question_type.thanks_question_type',
 'raw_model_response.authors_cost',
 'raw_model_response.authors_one_usd_buys',
 'raw_model_response.authors_raw_model_response',
 'raw_model_response.summary_cost',
 'raw_model_response.summary_one_usd_buys',
 'raw_model_response.summary_raw_model_response',
 'raw_model_response.thanks_cost',
 'raw_model_response.thanks_one_usd_buys',
 'raw_model_response.thanks_raw_model_response',
 'scenario.filename',
 'scenario.page',
 'scenario.text']

We can select components of the results to inspect and print:

[12]:
results.select("summary", "authors", "thanks").print(format="rich")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ answer                               answer                               answer                              ┃
┃ .summary                             .authors                             .thanks                             ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ The paper presents a method for      ['Benjamin S. Manning', 'Kehang      ['Drew Houston', 'Jordan            │
│ automatically generating and         Zhu', 'John J. Horton']              Ellenberg', 'Benjamin Lira          │
│ testing social science hypotheses                                         Luttges', 'David Holtz', 'Bruce     │
│ using large language models (LLMs)                                        Sacerdote', 'Paul Röttger',         │
│ and structural causal models. These                                       'Mohammed Alsobay', 'Ray Duch',     │
│ structural causal models help in                                          'Matt Schwartz', 'David Autor',     │
│ formulating hypotheses, designing                                         'Dean Eckles']                      │
│ LLM-based agents, conducting                                                                                  │
│ experiments, and analyzing data.                                                                              │
│ The fitted models can be used for                                                                             │
│ predictions and planning further                                                                              │
│ experiments. The authors                                                                                      │
│ demonstrate this approach through                                                                             │
│ scenarios like negotiations, bail                                                                             │
│ hearings, job interviews, and                                                                                 │
│ auctions, where causal                                                                                        │
│ relationships are tested. The                                                                                 │
│ results show that while LLMs can                                                                              │
│ predict the direction of effects,                                                                             │
│ they struggle with estimating                                                                                 │
│ magnitudes. However, when                                                                                     │
│ conditioned on the fitted                                                                                     │
│ structural causal models, the                                                                                 │
│ accuracy of LLM predictions                                                                                   │
│ improves significantly. The study                                                                             │
│ highlights that LLMs possess more                                                                             │
│ knowledge than they can directly                                                                              │
│ express.                                                                                                      │
└─────────────────────────────────────┴─────────────────────────────────────┴─────────────────────────────────────┘
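
The results can also be exported for further analysis, e.g., by converting them to a pandas DataFrame. A minimal sketch (the column selection here is illustrative):

# df = results.to_pandas()
# df[["answer.summary", "answer.authors", "answer.thanks"]]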

Posting to the Coop

The Coop is a platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL and accessible from your workspace or Coop account page. Learn more about creating an account and using the Coop.

Here we demonstrate how to post this notebook:

[13]:
from edsl import Notebook
[14]:
n = Notebook(path="scenario_from_pdf.ipynb")
[15]:
n.push(description="Example code for generating scenarios from PDFs", visibility="public")
[15]:
{'description': 'Example code for generating scenarios from PDFs',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/b9cb2a90-c3e3-4d80-8bb1-0e19b75b535d',
 'uuid': 'b9cb2a90-c3e3-4d80-8bb1-0e19b75b535d',
 'version': '0.1.33.dev1',
 'visibility': 'public'}