Extracting text

EDSL comes with a variety of question types that can be selected based on the form of response that you want. This notebook demonstrates how to use the QuestionExtract question type to return information extracted (or extrapolated) from a given text in the form of a Python dictionary. The required parameters are question_name, question_text and answer_template. The answer_template is a dictionary of example responses that the agent is prompted to use for reference (it appears in the prompts shown at the end of this notebook).

Please see the Questions page of the docs for details on other question types.

Question template

We start by importing the question type, and then use the .example() method to inspect the format of an example object:

[1]:
# ! pip install edsl
[2]:
from edsl.questions import QuestionExtract
[3]:
QuestionExtract.example()
{
  "question_name": "extract_name",
  "question_text": "My name is Moby Dick. I have a PhD in astrology, but I'm actually a truck driver",
  "answer_template": {
    "name": "John Doe",
    "profession": "Carpenter"
  },
  "question_type": "extract"
}

We can then run the example question and check that the agent’s response mirrors the answer_template that it was given:

[4]:
results = QuestionExtract.example().run()
results.select("extract_name").print(format="rich")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ answer                                              ┃
┃ .extract_name                                       ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ {'name': 'Moby Dick', 'profession': 'truck driver'} │
└─────────────────────────────────────────────────────┘
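As a rough sketch of what "mirrors the answer_template" means, the check below compares the keys of a response with the keys of the template, using the dictionaries from the example above. The helper name mirrors_template is hypothetical and is not part of EDSL; it only illustrates the idea in plain Python.

```python
def mirrors_template(answer: dict, template: dict) -> bool:
    """Return True if the answer has exactly the same keys as the template."""
    return set(answer) == set(template)


# Template and response copied from the example question above.
template = {"name": "John Doe", "profession": "Carpenter"}
answer = {"name": "Moby Dick", "profession": "truck driver"}

print(mirrors_template(answer, template))
# A response that is missing a key does not mirror the template.
print(mirrors_template({"name": "Moby Dick"}, template))
```

Only the keys are compared here; the values are expected to differ, since the template values are placeholders for the extracted information.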

Creating a question

Here we create a new question that prompts the agent to review a (longer) text and return information about it. Note that we use a {{placeholder}} in the question text so that we can parameterize it with different texts. This is useful for data labeling tasks, where we want to ask the same questions about many different pieces of data at once. The inputs to the question are supplied by creating Scenario objects.

Note also that our instructions to the agent are quite short; we could substitute a more detailed question text with context about the actual task we want performed.

Learn more about using Scenario objects in the docs.
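To make the idea of parameterizing one question with many texts concrete, here is a minimal conceptual sketch in plain Python. It is not EDSL's own implementation: the render helper, the regular expression and the two short texts are assumptions made for illustration. In EDSL, each dictionary below would instead be wrapped in a Scenario object.

```python
import re


def render(question_text: str, scenario: dict) -> str:
    """Replace each {{ key }} placeholder with the matching scenario value."""

    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return str(scenario[key])

    # Matches placeholders such as {{ content }} or {{content}}.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, question_text)


question_text = "Review the following text: {{ content }}"

# Hypothetical inputs: one dictionary per piece of data to label.
texts = ["First short text.", "Second short text."]
scenarios = [{"content": text} for text in texts]

# The same question text yields one prompt per input.
prompts = [render(question_text, scenario) for scenario in scenarios]
for prompt in prompts:
    print(prompt)
```

The point of the design is that the question is written once, and only the inputs vary.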

[5]:
simpsons = """
"The Simpsons" is an iconic American animated sitcom created by Matt Groening that debuted in 1989 on the Fox network.
The show is set in the fictional town of Springfield and centers on the Simpsons family, consisting of the bumbling but well-intentioned father Homer, the caring and patient mother Marge, and their three children: mischievous Bart, intelligent Lisa, and baby Maggie.
Renowned for its satirical take on the typical American family and society, the series delves into themes of politics, religion, and pop culture with a distinct blend of humor and wit.
Its longevity, marked by over thirty seasons, makes it one of the longest-running television series in history, influencing many other sitcoms and becoming deeply ingrained in popular culture.
"""
[6]:
from edsl.questions import QuestionExtract
from edsl import Scenario

q = QuestionExtract(
    question_name="example",
    question_text="Review the following text: {{ content }}",
    answer_template={
        "main_characters_list": ["name", "name"],
        "location": "location",
        "genre": "genre",
    },
)

scenario = Scenario({"content": simpsons})
results = q.by(scenario).run()
[7]:
results.select("example").print(format="rich")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ answer                                                                                                          ┃
┃ .example                                                                                                        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ {'main_characters_list': ['Homer', 'Marge', 'Bart', 'Lisa', 'Maggie'], 'location': 'Springfield', 'genre':      │
│ 'animated sitcom'}                                                                                              │
└─────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Show prompts

We can inspect the prompts that were used to generate the response:

[8]:
results.select("prompt.*").print(format="rich")
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ prompt                                                  prompt                                                 ┃
┃ .example_system_prompt                                  .example_user_prompt                                   ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ {'text': 'You are answering questions as if you were a  {'text': 'You are given the following input: "Review   │
│ human. Do not break character. You are an agent with    the following text: \n"The Simpsons" is an iconic      │
│ the following persona:\n{}', 'class_name':              American animated sitcom created by Matt Groening that │
│ 'AgentInstruction'}                                     debuted in 1989 on the Fox network. \nThe show is set  │
│                                                         in the fictional town of Springfield and centers on    │
│                                                         the Simpsons family, consisting of the bumbling but    │
│                                                         well-intentioned father Homer, the caring and patient  │
│                                                         mother Marge, and their three children: mischievous    │
│                                                         Bart, intelligent Lisa, and baby Maggie. \nRenowned    │
│                                                         for its satirical take on the typical American family  │
│                                                         and society, the series delves into themes of          │
│                                                         politics, religion, and pop culture with a distinct    │
│                                                         blend of humor and wit. \nIts longevity, marked by     │
│                                                         over thirty seasons, makes it one of the               │
│                                                         longest-running television series in history,          │
│                                                         influencing many other sitcoms and becoming deeply     │
│                                                         ingrained in popular culture.\n".\nCreate an ANSWER    │
│                                                         should be formatted like this:                         │
│                                                         "{\'main_characters_list\': [\'name\', \'name\'],      │
│                                                         \'location\': \'location\', \'genre\':                 │
│                                                         \'genre\'}",\nand it should have the same keys but     │
│                                                         values extracted from the input.\nIf the value of a    │
│                                                         key is not present in the input, fill with             │
│                                                         "null".\nReturn a valid JSON formatted like            │
│                                                         this:\n{"answer": <put your ANSWER here>}\nONLY RETURN │
│                                                         THE JSON, AND NOTHING ELSE.', 'class_name': 'Extract'} │
└────────────────────────────────────────────────────────┴────────────────────────────────────────────────────────┘
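The user prompt asks the model to return JSON of the form {"answer": ...} with the same keys as the template. The sketch below shows one way such a reply could be parsed in plain Python. It is an illustration only and not how EDSL processes responses internally; the parse_reply helper and the raw reply string are hypothetical.

```python
import json


def parse_reply(raw: str, template: dict) -> dict:
    """Parse a {"answer": {...}} reply and keep only the template's keys."""
    payload = json.loads(raw)
    answer = payload["answer"]
    # Keys absent from the reply become None, like a JSON null.
    return {key: answer.get(key) for key in template}


template = {
    "main_characters_list": ["name", "name"],
    "location": "location",
    "genre": "genre",
}

# Hypothetical model reply in the format requested by the prompt.
raw = (
    '{"answer": {"main_characters_list": ["Homer", "Marge"], '
    '"location": "Springfield", "genre": null}}'
)

parsed = parse_reply(raw, template)
print(parsed)
```

A JSON null in the reply is loaded as Python's None, so a value that the model could not find in the input stays distinguishable from an empty string.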