Data cleaning

This notebook provides sample EDSL code for using a language model to conduct a data cleaning task. We first use EDSL to prompt a language model to generate appropriate sense checks for a dataset. We then run the sense checks as a survey about the data and filter the results to find the data that fails the checks.

EDSL is an open-source library for simulating surveys, experiments and other research with AI agents and large language models. Before running the code below, please ensure that you have installed the EDSL library and either activated remote inference from your Coop account or stored API keys for the language models that you want to use with EDSL. Please also see our documentation page for tips and tutorials on getting started using EDSL.

Example data

Here we construct a dataset for our exercise: a list of randomly generated ages between 22 and 85 with some bad values mixed in. Our goal is to identify the bad values:

[1]:
ages = [84, 62, 79, 57, 59, 55, 68, 66, 47, 54, 76, 33, 74, 56, 47, 24, 23, 38, 38, 54, 51, 84, 71,
        46, 38, 26, 50, 56, 62, 39, 31, 52, 69, 84, 69, 48, 48, 23, 65, 54, 78, 51, 69, 77, 75, 76,
        26, 44, 61, 32, 70, 24, 74, 22, 32, 24, 80, 65, 36, 42, 84, 66, 40, 85, 28, 22, 67, 25, 70,
        77, 53, 69, 64, 27, 61, 68, 68, 78, 0.99, 83, 58, 33, 46, 43, 50, 85, 28, 82, 50, 61, 66, 32,
        45, 70, 56, 50, 43, 30, 43, 55, 33, 72, 43, 43, -5, 32, 43, 45, 67, 84, 37, 63, 52, 53, 58,
        79, 79, 80, 62, 75, 57, 60, 39, 79, 49, 60, 60, 37, 45, 36, 1050, 73, 70, 56, 39, 58, 69, 77,
        68, 84, 78, 48, 31, 74, 27, 55, 56, 66, 35, 39, 57, 47, 29, 24, 47, 60, 43, 37, 84, 64, 28,
        22, 37, 71, 77, 76, 84, 63, 76, 58, 41, 72, 22, 63, 78, 49, 82, 69, "old", 37, 27, 29, 54, 83,
        80, 74, 48, 76, 49, 26, 38, 35, 36, 25, 23, 71, 33, 39, 40, 35, 85, 24, 57, 85, 63, 53, 62,
        47, 69, 76, 71, 48, 62, 23, 25, 84, 32, 63, 75, 31, 25, 50, 85, 36, 58, 85, 34, 62, 43, 2,
        50, 83, 44, 73, 81, 44, 43, 82, 84, 30, 24, 63, 63, 59, 46, 30, 62, 25, 52, 23, 100, 1.3, 3]
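
For comparison with the model-based approach that follows, a rule-based baseline can be written in plain Python. This is only a sketch: the function name `is_plausible_age` and the bounds 0 and 120 are assumptions chosen for illustration, and it runs on a short excerpt rather than the full list. Whether small values such as 2 and 3 count as bad depends on what the dataset is supposed to contain.

```python
def is_plausible_age(value, low=0, high=120):
    """Return True if value is a whole number of years within [low, high]."""
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return low <= value <= high

# A short excerpt of the list above, including the kinds of bad values it contains
sample = [84, 62, 0.99, -5, 1050, "old", 2, 100, 1.3, 3]

flagged = [v for v in sample if not is_plausible_age(v)]
print(flagged)  # [0.99, -5, 1050, 'old', 1.3]
```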

Quick question

With a small dataset, we may be able to design the entire task as a single question where we prompt a model to review all the data at once and flag bad data:

[2]:
from edsl import QuestionList, Scenario

q = QuestionList(
    question_name = "bad_ages",
    question_text = """
    Review the following list of observations of human ages
    and return a list of all the problematic ages: {{ ages }}
    """
)

s = Scenario({"ages":ages})

results = q.by(s).run()

results.select("bad_ages", "bad_ages_comment")
Remote Job Log (2024-12-14 10:38:06)
Remote inference activated. Sending job to server...
Your survey is running at the Expected Parrot server...
Job sent to server. (Job uuid=eb3332e8-0c40-4fe2-9f8c-a0ee69227e12).
Job status: running - last update: 2024-12-14 10:37:53 AM
Job status: running - last update: 2024-12-14 10:37:56 AM
[2]:
answer.bad_ages               comment.bad_ages_comment
[0.99, -5, 1050, 'old', 1.3]  # The problematic ages include non-integer values ("0.99", "1.3"), a negative value ("-5"), an extremely high value ("1050"), and a non-numeric string ("old").
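
When a model returns a list like this, it can be worth confirming that every flagged value actually occurs in the input. The sketch below is plain Python with a hypothetical helper name (`split_flagged`); the value 999 is made up to show what an unexpected item would look like.

```python
def split_flagged(original, flagged):
    """Separate flagged values into those present in the original data and those not."""
    present = [v for v in flagged if v in original]
    unexpected = [v for v in flagged if v not in original]
    return present, unexpected

original = [84, 0.99, -5, 1050, "old", 1.3, 62]
model_flagged = [0.99, -5, 1050, "old", 1.3, 999]  # 999 is a made-up value for illustration

present, unexpected = split_flagged(original, model_flagged)
print(present)     # [0.99, -5, 1050, 'old', 1.3]
print(unexpected)  # [999]
```

Note that the membership test uses equality, so a number returned as a string (for example "0.99") would be reported as unexpected.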

This approach may be feasible for a small dataset that is easily checked. For larger datasets, we may encounter problems with input token limits, a model’s ability to accurately check a large volume of data at once, and responses that are not usefully formatted.
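
One way to keep each prompt small is to split the data into batches before sending it to a model. The helper below is a minimal sketch in plain Python; the name `chunk` and the batch size are assumptions, and the stand-in data is not the ages list.

```python
def chunk(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]

values = list(range(1, 11))  # stand-in data: 1..10
batches = list(chunk(values, 4))
print(batches)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```

Each batch could then be passed to the question as its own scenario, in the same way that the single scenario above carries the whole list.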

Below we demonstrate some ways of approaching the task in an iterative manner instead.

Constructing a question

We start by creating a question to prompt a model to draft sense check questions for our data. EDSL comes with a variety of question types that we can choose from based on the desired form of the response (multiple choice, free text, etc.). Here we use QuestionList in order to prompt the model to format its response as a list. We use a {{ placeholder }} for content that we will add to the question when we run it (a description of the data and a sample); this allows us to re-use the question with other contexts as desired:

[3]:
from edsl import QuestionList

q1 = QuestionList(
    question_name = "sense_check_questions",
    question_text = """
    You are being asked to suggest a list of sense checks for a dataset consisting of {{ data_description }}.
    Here is a sample of the data: {{ sample_data }}.
    Format the sense checks as a list of questions to be answered about each item in the dataset individually,
    using '<data>' as a placeholder for an item being reviewed in each question text.
    """,
    max_list_items = 4 # optional
)
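
To make the placeholder mechanism concrete, the sketch below mimics the substitution with a regular expression. It is an illustrative stand-in only: EDSL fills placeholders itself when the question is run, and `fill_placeholders` is a hypothetical helper, not part of the library.

```python
import re

def fill_placeholders(template, values):
    """Replace each {{ name }} in template with str(values[name])."""
    def substitute(match):
        return str(values[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

template = "Sense checks for {{ data_description }}. Sample: {{ sample_data }}."
rendered = fill_placeholders(template, {
    "data_description": "a list of human ages (in years)",
    "sample_data": [84, 38, 55],
})
print(rendered)
# Sense checks for a list of human ages (in years). Sample: [84, 38, 55].
```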

Adding context to the question

Next we create Scenario objects representing the content that we want to add to the question when we run it. Here we create a single scenario for our example data:

[4]:
import random

sample_data = random.sample(ages, 10)
[5]:
from edsl import Scenario

s = Scenario({
    "data_description": "a list of human ages (in years)",
    "sample_data": sample_data
})
s
[5]:

Scenario

key               value
data_description  a list of human ages (in years)
sample_data:0     84
sample_data:1     38
sample_data:2     55
sample_data:3     66
sample_data:4     41
sample_data:5     70
sample_data:6     82
sample_data:7     76
sample_data:8     62
sample_data:9     57
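
Because the sample is drawn at random, it changes on every run and may not contain any of the bad values (the ten values shown here are all valid ages). If a repeatable sample is wanted, a seeded generator is one option. This is a sketch with an arbitrary seed and a short stand-in list, not the notebook's own method.

```python
import random

data = [84, 62, 0.99, -5, 1050, "old", 47, 54, 76, 33, 74, 56]

rng = random.Random(42)          # fixed seed, chosen arbitrarily
sample_a = rng.sample(data, 5)

rng = random.Random(42)          # same seed again
sample_b = rng.sample(data, 5)

print(sample_a == sample_b)  # True
```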

Running the question

We administer the question to a model by adding the scenarios and calling the run method. This generates a formatted dataset of Results that we can access with built-in methods for analysis. Here we inspect the answer:

[6]:
results = q1.by(s).run()
Remote Job Log (2024-12-14 10:38:58)
Remote inference activated. Sending job to server...
Your survey is running at the Expected Parrot server...
Job sent to server. (Job uuid=6c979ce9-b7f8-4e1a-b7a8-d543a90762c4).
Job status: running - last update: 2024-12-14 10:38:41 AM
Job status: running - last update: 2024-12-14 10:38:50 AM
[7]:
results.select("sense_check_questions")
[7]:
answer.sense_check_questions
['Is <data> a non-negative integer?', 'Is <data> a reasonable human age (e.g., less than 130)?', 'Is <data> consistent with the expected range of ages for the dataset?', 'Does <data> have any obvious data entry errors (e.g., typos)?']
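
The generated questions use '<data>' as a placeholder for the item under review. The notebook keeps the placeholder as written and shows the item separately in the prompt; an alternative, sketched below with a hypothetical helper `fill_item`, is to substitute the item into the question text before asking it.

```python
questions = [
    "Is <data> a non-negative integer?",
    "Is <data> a reasonable human age (e.g., less than 130)?",
]

def fill_item(question, item):
    """Insert the item's repr in place of the '<data>' placeholder."""
    return question.replace("<data>", repr(item))

for text in questions:
    print(fill_item(text, "old"))
# Is 'old' a non-negative integer?
# Is 'old' a reasonable human age (e.g., less than 130)?
```

Using `repr` keeps strings quoted, so a model can tell the text 'old' apart from a number.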

Conducting the task

Next we want a model to answer each sense check question about each piece of data in the dataset. This can be done by using the sense check questions as scenarios of a new question explaining the task. We can use QuestionYesNo to easily filter the responses:

[8]:
from edsl import QuestionYesNo

q2 = QuestionYesNo(
    question_name = "check_data",
    question_text = """
    You are being asked to sense check a dataset consisting of {{ data_description }}.
    Consider the following item in the dataset: {{ age }}
    {{ sense_check_question }}
    """
)

We need to create a new set of scenarios for the question. We use ScenarioList objects to create all the combinations of values to add to the question (learn more about constructing scenarios from different data sources):

[9]:
from edsl import ScenarioList

sl = ScenarioList(
    Scenario({
        "data_description": "a list of human ages (in years)",
        "sample_data": sample_data,
        "age": age,
        "sense_check_question": sense_check_question
    }) for age in ages for sense_check_question in results.select("sense_check_questions").to_list()[0]
)
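
The comprehension above creates one scenario per age per question, so repeated ages produce repeated scenarios. If checking each distinct value once is enough, the list can be de-duplicated first. The helper below is a sketch in plain Python with a hypothetical name; note that values that compare equal, such as 1 and 1.0, collapse into one entry.

```python
def unique_in_order(items):
    """Return items with duplicates removed, keeping first occurrences in order."""
    return list(dict.fromkeys(items))

data = [84, 62, 84, "old", 62, 0.99, "old", 84]
print(unique_in_order(data))  # [84, 62, 'old', 0.99]
```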

We can inspect the scenarios that we created:

[10]:
sl.sample(3)
[10]:

ScenarioList scenarios: 3; keys: ['sense_check_question', 'age', 'sample_data', 'data_description'];

data_description                 sample_data                               age  sense_check_question
a list of human ages (in years)  [84, 38, 55, 66, 41, 70, 82, 76, 62, 57]  39   Is <data> a non-negative integer?
a list of human ages (in years)  [84, 38, 55, 66, 41, 70, 82, 76, 62, 57]  27   Is <data> a reasonable human age (e.g., less than 130)?
a list of human ages (in years)  [84, 38, 55, 66, 41, 70, 82, 76, 62, 57]  62   Does <data> have any obvious data entry errors (e.g., typos)?

As with a single scenario, we add all of the scenarios to the question at once when we run it:

[11]:
results = q2.by(sl).run()
Remote Job Log (2024-12-14 10:40:37)
Remote inference activated. Sending job to server...
Your survey is running at the Expected Parrot server...
Job sent to server. (Job uuid=d2f3d7f1-4d08-4002-a349-f1c46b7d246c).
Job status: running - last update: 2024-12-14 10:39:47 AM
Job status: running - last update: 2024-12-14 10:40:32 AM

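We can filter, sort, select and print any components of the results that are generated. Here we keep the rows where the answer was "No". Note that "No" signals a failed check only for questions phrased so that "Yes" means the value is valid; for the question about data entry errors, "No" means that no error was found, so most of the rows listed for that question are valid ages: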

[12]:
(
    results
    .filter("check_data == 'No'")
    .sort_by("sense_check_question")
    .select("sense_check_question", "age")
)
[12]:
scenario.sense_check_question scenario.age
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 84
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 62
Does <data> have any obvious data entry errors (e.g., typos)? 79
Does <data> have any obvious data entry errors (e.g., typos)? 79
Does <data> have any obvious data entry errors (e.g., typos)? 79
Does <data> have any obvious data entry errors (e.g., typos)? 79
Does <data> have any obvious data entry errors (e.g., typos)? 79
Does <data> have any obvious data entry errors (e.g., typos)? 57
Does <data> have any obvious data entry errors (e.g., typos)? 57
Does <data> have any obvious data entry errors (e.g., typos)? 57
Does <data> have any obvious data entry errors (e.g., typos)? 57
Does <data> have any obvious data entry errors (e.g., typos)? 57
Does <data> have any obvious data entry errors (e.g., typos)? 57
Does <data> have any obvious data entry errors (e.g., typos)? 59
Does <data> have any obvious data entry errors (e.g., typos)? 59
Does <data> have any obvious data entry errors (e.g., typos)? 59
Does <data> have any obvious data entry errors (e.g., typos)? 55
Does <data> have any obvious data entry errors (e.g., typos)? 55
Does <data> have any obvious data entry errors (e.g., typos)? 55
Does <data> have any obvious data entry errors (e.g., typos)? 55
Does <data> have any obvious data entry errors (e.g., typos)? 68
Does <data> have any obvious data entry errors (e.g., typos)? 68
Does <data> have any obvious data entry errors (e.g., typos)? 68
Does <data> have any obvious data entry errors (e.g., typos)? 68
Does <data> have any obvious data entry errors (e.g., typos)? 68
Does <data> have any obvious data entry errors (e.g., typos)? 68
Does <data> have any obvious data entry errors (e.g., typos)? 66
Does <data> have any obvious data entry errors (e.g., typos)? 66
Does <data> have any obvious data entry errors (e.g., typos)? 66
Does <data> have any obvious data entry errors (e.g., typos)? 66
Does <data> have any obvious data entry errors (e.g., typos)? 47
Does <data> have any obvious data entry errors (e.g., typos)? 47
Does <data> have any obvious data entry errors (e.g., typos)? 47
Does <data> have any obvious data entry errors (e.g., typos)? 47
Does <data> have any obvious data entry errors (e.g., typos)? 47
Does <data> have any obvious data entry errors (e.g., typos)? 47
Does <data> have any obvious data entry errors (e.g., typos)? 54
Does <data> have any obvious data entry errors (e.g., typos)? 54
Does <data> have any obvious data entry errors (e.g., typos)? 54
Does <data> have any obvious data entry errors (e.g., typos)? 54
Does <data> have any obvious data entry errors (e.g., typos)? 54
Does <data> have any obvious data entry errors (e.g., typos)? 54
Does <data> have any obvious data entry errors (e.g., typos)? 54
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 76
Does <data> have any obvious data entry errors (e.g., typos)? 33
Does <data> have any obvious data entry errors (e.g., typos)? 33
Does <data> have any obvious data entry errors (e.g., typos)? 33
Does <data> have any obvious data entry errors (e.g., typos)? 33
Does <data> have any obvious data entry errors (e.g., typos)? 33
Does <data> have any obvious data entry errors (e.g., typos)? 74
Does <data> have any obvious data entry errors (e.g., typos)? 74
Does <data> have any obvious data entry errors (e.g., typos)? 74
Does <data> have any obvious data entry errors (e.g., typos)? 74
Does <data> have any obvious data entry errors (e.g., typos)? 74
Does <data> have any obvious data entry errors (e.g., typos)? 74
Does <data> have any obvious data entry errors (e.g., typos)? 56
Does <data> have any obvious data entry errors (e.g., typos)? 56
Does <data> have any obvious data entry errors (e.g., typos)? 56
Does <data> have any obvious data entry errors (e.g., typos)? 56
Does <data> have any obvious data entry errors (e.g., typos)? 56
Does <data> have any obvious data entry errors (e.g., typos)? 56
Does <data> have any obvious data entry errors (e.g., typos)? 24
Does <data> have any obvious data entry errors (e.g., typos)? 24
Does <data> have any obvious data entry errors (e.g., typos)? 24
Does <data> have any obvious data entry errors (e.g., typos)? 24
Does <data> have any obvious data entry errors (e.g., typos)? 24
Does <data> have any obvious data entry errors (e.g., typos)? 24
Does <data> have any obvious data entry errors (e.g., typos)? 24
Does <data> have any obvious data entry errors (e.g., typos)? 24
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 23
Does <data> have any obvious data entry errors (e.g., typos)? 38
Does <data> have any obvious data entry errors (e.g., typos)? 38
Does <data> have any obvious data entry errors (e.g., typos)? 38
Does <data> have any obvious data entry errors (e.g., typos)? 38
Does <data> have any obvious data entry errors (e.g., typos)? 38
Does <data> have any obvious data entry errors (e.g., typos)? 51
Does <data> have any obvious data entry errors (e.g., typos)? 51
Does <data> have any obvious data entry errors (e.g., typos)? 51
Does <data> have any obvious data entry errors (e.g., typos)? 71
Does <data> have any obvious data entry errors (e.g., typos)? 71
Does <data> have any obvious data entry errors (e.g., typos)? 71
Does <data> have any obvious data entry errors (e.g., typos)? 71
Does <data> have any obvious data entry errors (e.g., typos)? 71
Does <data> have any obvious data entry errors (e.g., typos)? 71
Does <data> have any obvious data entry errors (e.g., typos)? 71
Does <data> have any obvious data entry errors (e.g., typos)? 46
Does <data> have any obvious data entry errors (e.g., typos)? 46
Does <data> have any obvious data entry errors (e.g., typos)? 46
Does <data> have any obvious data entry errors (e.g., typos)? 46
Does <data> have any obvious data entry errors (e.g., typos)? 46
Does <data> have any obvious data entry errors (e.g., typos)? 26
Does <data> have any obvious data entry errors (e.g., typos)? 26
Does <data> have any obvious data entry errors (e.g., typos)? 26
Does <data> have any obvious data entry errors (e.g., typos)? 26
Does <data> have any obvious data entry errors (e.g., typos)? 26
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 50
Does <data> have any obvious data entry errors (e.g., typos)? 39
Does <data> have any obvious data entry errors (e.g., typos)? 39
Does <data> have any obvious data entry errors (e.g., typos)? 39
Does <data> have any obvious data entry errors (e.g., typos)? 39
Does <data> have any obvious data entry errors (e.g., typos)? 39
Does <data> have any obvious data entry errors (e.g., typos)? 39
Does <data> have any obvious data entry errors (e.g., typos)? 39
Does <data> have any obvious data entry errors (e.g., typos)? 31
Does <data> have any obvious data entry errors (e.g., typos)? 31
Does <data> have any obvious data entry errors (e.g., typos)? 31
Does <data> have any obvious data entry errors (e.g., typos)? 31
Does <data> have any obvious data entry errors (e.g., typos)? 31
Does <data> have any obvious data entry errors (e.g., typos)? 52
Does <data> have any obvious data entry errors (e.g., typos)? 52
Does <data> have any obvious data entry errors (e.g., typos)? 52
Does <data> have any obvious data entry errors (e.g., typos)? 52
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 69
Does <data> have any obvious data entry errors (e.g., typos)? 48
Does <data> have any obvious data entry errors (e.g., typos)? 48
Does <data> have any obvious data entry errors (e.g., typos)? 48
Does <data> have any obvious data entry errors (e.g., typos)? 48
Does <data> have any obvious data entry errors (e.g., typos)? 48
Does <data> have any obvious data entry errors (e.g., typos)? 48
Does <data> have any obvious data entry errors (e.g., typos)? 48
Does <data> have any obvious data entry errors (e.g., typos)? 65
Does <data> have any obvious data entry errors (e.g., typos)? 65
Does <data> have any obvious data entry errors (e.g., typos)? 78
Does <data> have any obvious data entry errors (e.g., typos)? 78
Does <data> have any obvious data entry errors (e.g., typos)? 78
Does <data> have any obvious data entry errors (e.g., typos)? 78
Does <data> have any obvious data entry errors (e.g., typos)? 78
Does <data> have any obvious data entry errors (e.g., typos)? 78
Does <data> have any obvious data entry errors (e.g., typos)? 77
Does <data> have any obvious data entry errors (e.g., typos)? 77
Does <data> have any obvious data entry errors (e.g., typos)? 77
Does <data> have any obvious data entry errors (e.g., typos)? 77
Does <data> have any obvious data entry errors (e.g., typos)? 75
Does <data> have any obvious data entry errors (e.g., typos)? 75
Does <data> have any obvious data entry errors (e.g., typos)? 75
Does <data> have any obvious data entry errors (e.g., typos)? 75
Does <data> have any obvious data entry errors (e.g., typos)? 44
Does <data> have any obvious data entry errors (e.g., typos)? 44
Does <data> have any obvious data entry errors (e.g., typos)? 44
Does <data> have any obvious data entry errors (e.g., typos)? 44
Does <data> have any obvious data entry errors (e.g., typos)? 44
Does <data> have any obvious data entry errors (e.g., typos)? 61
Does <data> have any obvious data entry errors (e.g., typos)? 61
Does <data> have any obvious data entry errors (e.g., typos)? 61
Does <data> have any obvious data entry errors (e.g., typos)? 32
Does <data> have any obvious data entry errors (e.g., typos)? 32
Does <data> have any obvious data entry errors (e.g., typos)? 32
Does <data> have any obvious data entry errors (e.g., typos)? 32
Does <data> have any obvious data entry errors (e.g., typos)? 32
Does <data> have any obvious data entry errors (e.g., typos)? 32
Does <data> have any obvious data entry errors (e.g., typos)? 32
Does <data> have any obvious data entry errors (e.g., typos)? 70
Does <data> have any obvious data entry errors (e.g., typos)? 70
Does <data> have any obvious data entry errors (e.g., typos)? 70
Does <data> have any obvious data entry errors (e.g., typos)? 70
Does <data> have any obvious data entry errors (e.g., typos)? 70
Does <data> have any obvious data entry errors (e.g., typos)? 22
Does <data> have any obvious data entry errors (e.g., typos)? 22
Does <data> have any obvious data entry errors (e.g., typos)? 22
Does <data> have any obvious data entry errors (e.g., typos)? 22
Does <data> have any obvious data entry errors (e.g., typos)? 22
Does <data> have any obvious data entry errors (e.g., typos)? 22
Does <data> have any obvious data entry errors (e.g., typos)? 80
Does <data> have any obvious data entry errors (e.g., typos)? 80
Does <data> have any obvious data entry errors (e.g., typos)? 80
Does <data> have any obvious data entry errors (e.g., typos)? 80
Does <data> have any obvious data entry errors (e.g., typos)? 36
Does <data> have any obvious data entry errors (e.g., typos)? 36
Does <data> have any obvious data entry errors (e.g., typos)? 36
Does <data> have any obvious data entry errors (e.g., typos)? 36
Does <data> have any obvious data entry errors (e.g., typos)? 36
Does <data> have any obvious data entry errors (e.g., typos)? 36
Does <data> have any obvious data entry errors (e.g., typos)? 42
Does <data> have any obvious data entry errors (e.g., typos)? 40
Does <data> have any obvious data entry errors (e.g., typos)? 40
Does <data> have any obvious data entry errors (e.g., typos)? 40
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 85
Does <data> have any obvious data entry errors (e.g., typos)? 28
Does <data> have any obvious data entry errors (e.g., typos)? 28
Does <data> have any obvious data entry errors (e.g., typos)? 28
Does <data> have any obvious data entry errors (e.g., typos)? 28
Does <data> have any obvious data entry errors (e.g., typos)? 67
Does <data> have any obvious data entry errors (e.g., typos)? 67
Does <data> have any obvious data entry errors (e.g., typos)? 67
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 25
Does <data> have any obvious data entry errors (e.g., typos)? 53
Does <data> have any obvious data entry errors (e.g., typos)? 53
Does <data> have any obvious data entry errors (e.g., typos)? 53
Does <data> have any obvious data entry errors (e.g., typos)? 53
Does <data> have any obvious data entry errors (e.g., typos)? 53
Does <data> have any obvious data entry errors (e.g., typos)? 64
Does <data> have any obvious data entry errors (e.g., typos)? 64
Does <data> have any obvious data entry errors (e.g., typos)? 64
Does <data> have any obvious data entry errors (e.g., typos)? 27
Does <data> have any obvious data entry errors (e.g., typos)? 27
Does <data> have any obvious data entry errors (e.g., typos)? 27
Does <data> have any obvious data entry errors (e.g., typos)? 27
Does <data> have any obvious data entry errors (e.g., typos)? 27
Does <data> have any obvious data entry errors (e.g., typos)? 0.99
Does <data> have any obvious data entry errors (e.g., typos)? 83
Does <data> have any obvious data entry errors (e.g., typos)? 83
Does <data> have any obvious data entry errors (e.g., typos)? 83
Does <data> have any obvious data entry errors (e.g., typos)? 83
Does <data> have any obvious data entry errors (e.g., typos)? 83
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 58
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 43
Does <data> have any obvious data entry errors (e.g., typos)? 82
Does <data> have any obvious data entry errors (e.g., typos)? 82
Does <data> have any obvious data entry errors (e.g., typos)? 82
Does <data> have any obvious data entry errors (e.g., typos)? 82
Does <data> have any obvious data entry errors (e.g., typos)? 82
Does <data> have any obvious data entry errors (e.g., typos)? 45
Does <data> have any obvious data entry errors (e.g., typos)? 45
Does <data> have any obvious data entry errors (e.g., typos)? 45
Does <data> have any obvious data entry errors (e.g., typos)? 45
Does <data> have any obvious data entry errors (e.g., typos)? 30
Does <data> have any obvious data entry errors (e.g., typos)? 30
Does <data> have any obvious data entry errors (e.g., typos)? 30
Does <data> have any obvious data entry errors (e.g., typos)? 30
Does <data> have any obvious data entry errors (e.g., typos)? 30
Does <data> have any obvious data entry errors (e.g., typos)? 72
Does <data> have any obvious data entry errors (e.g., typos)? 72
Does <data> have any obvious data entry errors (e.g., typos)? 72
Does <data> have any obvious data entry errors (e.g., typos)? 37
Does <data> have any obvious data entry errors (e.g., typos)? 37
Does <data> have any obvious data entry errors (e.g., typos)? 37
Does <data> have any obvious data entry errors (e.g., typos)? 37
Does <data> have any obvious data entry errors (e.g., typos)? 37
Does <data> have any obvious data entry errors (e.g., typos)? 37
Does <data> have any obvious data entry errors (e.g., typos)? 37
Does <data> have any obvious data entry errors (e.g., typos)? 37
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 63
Does <data> have any obvious data entry errors (e.g., typos)? 60
Does <data> have any obvious data entry errors (e.g., typos)? 60
Does <data> have any obvious data entry errors (e.g., typos)? 60
Does <data> have any obvious data entry errors (e.g., typos)? 60
Does <data> have any obvious data entry errors (e.g., typos)? 60
Does <data> have any obvious data entry errors (e.g., typos)? 60
Does <data> have any obvious data entry errors (e.g., typos)? 60
Does <data> have any obvious data entry errors (e.g., typos)? 49
Does <data> have any obvious data entry errors (e.g., typos)? 49
Does <data> have any obvious data entry errors (e.g., typos)? 49
Does <data> have any obvious data entry errors (e.g., typos)? 49
Does <data> have any obvious data entry errors (e.g., typos)? 49
Does <data> have any obvious data entry errors (e.g., typos)? 49
Does <data> have any obvious data entry errors (e.g., typos)? 73
Does <data> have any obvious data entry errors (e.g., typos)? 73
Does <data> have any obvious data entry errors (e.g., typos)? 73
Does <data> have any obvious data entry errors (e.g., typos)? 35
Does <data> have any obvious data entry errors (e.g., typos)? 35
Does <data> have any obvious data entry errors (e.g., typos)? 35
Does <data> have any obvious data entry errors (e.g., typos)? 35
Does <data> have any obvious data entry errors (e.g., typos)? 35
Does <data> have any obvious data entry errors (e.g., typos)? 29
Does <data> have any obvious data entry errors (e.g., typos)? 29
Does <data> have any obvious data entry errors (e.g., typos)? 29
Does <data> have any obvious data entry errors (e.g., typos)? 41
Does <data> have any obvious data entry errors (e.g., typos)? 34
Does <data> have any obvious data entry errors (e.g., typos)? 2
Does <data> have any obvious data entry errors (e.g., typos)? 81
Does <data> have any obvious data entry errors (e.g., typos)? 81
Does <data> have any obvious data entry errors (e.g., typos)? 100
Does <data> have any obvious data entry errors (e.g., typos)? 3
Is <data> a non-negative integer? 0.99
Is <data> a non-negative integer? -5
Is <data> a non-negative integer? old
Is <data> a non-negative integer? old
Is <data> a non-negative integer? 1.3
Is <data> a non-negative integer? 1.3
Is <data> a reasonable human age (e.g., less than 130)? -5
Is <data> a reasonable human age (e.g., less than 130)? 1050
Is <data> a reasonable human age (e.g., less than 130)? old
Is <data> consistent with the expected range of ages for the dataset? -5
Is <data> consistent with the expected range of ages for the dataset? 1050
Is <data> consistent with the expected range of ages for the dataset? old
Is <data> consistent with the expected range of ages for the dataset? 1.3
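
The filter above keeps every "No" answer, but whether "No" means a failed check depends on how each question is phrased. One possible way to handle this is to record the failing answer per question and group the failures by item. The sketch below uses plain Python; the mapping is written by hand and the rows are hypothetical examples, not values taken from the run above.

```python
from collections import defaultdict

# Answer that signals a problem for each question (an assumption made by hand)
FAILING_ANSWER = {
    "Is <data> a non-negative integer?": "No",
    "Does <data> have any obvious data entry errors (e.g., typos)?": "Yes",
}

# Hypothetical (question, item, answer) rows, not taken from the run above
rows = [
    ("Is <data> a non-negative integer?", 84, "Yes"),
    ("Is <data> a non-negative integer?", -5, "No"),
    ("Does <data> have any obvious data entry errors (e.g., typos)?", 84, "No"),
    ("Does <data> have any obvious data entry errors (e.g., typos)?", 1050, "Yes"),
]

# Map each failing item to the list of questions it failed
failed = defaultdict(list)
for question, item, answer in rows:
    if answer == FAILING_ANSWER[question]:
        failed[item].append(question)

print(sorted(failed))  # [-5, 1050]
```

With this grouping, 84 is not reported even though it received a "No" for the data entry errors question.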

Further exploration

This notebook can be readily edited and expanded for other data cleaning and data labeling purposes, or to add personas for AI agents answering the questions with relevant background and expertise. Learn more about using AI agents for your EDSL surveys.

Please see our documentation page for examples of other methods and use cases and let us know if you have any questions!

Posting to the Coop

The Coop is a platform for creating, storing and sharing LLM-based research. It is fully integrated with EDSL and accessible from your workspace or Coop account page. Learn more about creating an account and using the Coop.

Here we post this notebook:

[13]:
from edsl import Notebook
[14]:
n = Notebook(path = "data_cleaning.ipynb")
[15]:
info = n.push(description = "Example code for data cleaning", visibility = "public")
info
[15]:
{'description': 'Example code for data cleaning',
 'object_type': 'notebook',
 'url': 'https://www.expectedparrot.com/content/b5da44a9-6187-4454-ab3b-0e9d0350e005',
 'uuid': 'b5da44a9-6187-4454-ab3b-0e9d0350e005',
 'version': '0.1.39.dev1',
 'visibility': 'public'}

To update an object at the Coop:

[16]:
n = Notebook(path = "data_cleaning.ipynb") # resave
[17]:
n.patch(uuid = info["uuid"], value = n)
[17]:
{'status': 'success'}