Caching LLM Calls
The Cache class is used to store responses from a language model so that they can easily be retrieved, reused and shared.
What is a cache? (Wikipedia)
Why caching?
Language model outputs are expensive to create, both in terms of time and money. As such, it is useful to store the outputs of a language model in a cache so that they can be re-used later.
Use cases:
- If a job fails partway through, avoid re-running the queries that already completed; only the new queries are sent to the language model.
- Share your cache with others so they can re-run your queries at no cost.
- Use a common remote cache to avoid re-running queries that others have already run.
- Build up training data to train or fine-tune a smaller model.
- Build up a public repository of queries and responses so others can learn from them.
How it works
A Cache is a dictionary-like object that stores the inputs and outputs of a language model. Specifically, a cache has an attribute, data, that is dictionary-like.
The keys of a cache are hashes of the unique inputs to a language model. The values are CacheEntry objects, which store the inputs and outputs of a language model.
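To make the idea of a hashed key concrete, here is a minimal sketch of how such a key could be derived from the inputs that fetch and store accept (model, parameters, system_prompt, user_prompt, iteration). It assumes an MD5 digest over a canonical JSON encoding; the helper make_cache_key is hypothetical, and this is not necessarily how EDSL computes its keys.

```python
import hashlib
import json


def make_cache_key(model: str, parameters: dict, system_prompt: str,
                   user_prompt: str, iteration: int) -> str:
    """Hypothetical helper: hash the inputs of an LLM call into a cache key.

    Assumption: MD5 over JSON with sorted keys, so that parameter dicts
    with the same contents always produce the same key.
    """
    payload = {
        "model": model,
        "parameters": parameters,
        "system_prompt": system_prompt,
        "user_prompt": user_prompt,
        "iteration": iteration,
    }
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


k1 = make_cache_key("test", {"temperature": 0.5, "max_tokens": 100},
                    "You are helpful.", "Hi", 0)
k2 = make_cache_key("test", {"max_tokens": 100, "temperature": 0.5},
                    "You are helpful.", "Hi", 0)
k3 = make_cache_key("test", {"temperature": 0.5, "max_tokens": 100},
                    "You are helpful.", "Hi", 1)
print(k1 == k2)  # True: parameter order does not matter
print(k1 == k3)  # False: a different iteration is a different entry
```

Sorting keys before hashing is the design choice that matters here: without it, two parameter dictionaries with the same contents but a different insertion order could hash to different keys and cause needless cache misses.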
A cache can be stored as either a Python in-memory dictionary or a dictionary connected to a SQLite3 database. By default, the constructor uses an in-memory dictionary. If a SQLite3 database is used, the cache persists automatically between sessions. You can also specify that a cache be used for a specific session, in which case it will not persist between sessions.
After a session, the cache will have new entries. These can be written to a local SQLite3 database, a JSONL file, or a remote server.
Instantiating a new cache
The following code instantiates a new cache object that uses a dictionary as its data attribute.
In-memory usage
from edsl import Cache
my_in_memory_cache = Cache()
It can then be passed as an object to a run method:
from edsl import QuestionFreeText
q = QuestionFreeText.example()
results = q.run(cache = my_in_memory_cache)
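Passing the same cache to later runs is what makes reuse possible: the inputs are looked up in the cache first, and the model is only called on a miss. The sketch below illustrates that lookup-then-store pattern with a plain dictionary and a stand-in model function; fake_model and cached_call are hypothetical names, not part of EDSL.

```python
import hashlib
import json

calls = 0  # counts how often the (fake) model is actually invoked


def fake_model(prompt: str) -> str:
    """Stand-in for an expensive language model call."""
    global calls
    calls += 1
    return prompt.upper()


def cached_call(cache: dict, model: str, prompt: str, iteration: int = 0) -> str:
    """Return a cached response if present; otherwise call the model and store it."""
    key = hashlib.md5(
        json.dumps([model, prompt, iteration]).encode("utf-8")
    ).hexdigest()
    if key in cache:                # cache hit: no model call
        return cache[key]
    response = fake_model(prompt)   # cache miss: call the model
    cache[key] = response           # store for next time
    return response


my_cache = {}
cached_call(my_cache, "test", "hello")
cached_call(my_cache, "test", "hello")    # served from the cache
cached_call(my_cache, "test", "goodbye")
print(calls, len(my_cache))  # 2 2
```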
If an in-memory cache is not stored explicitly, its data will be lost when the session ends, unless it is written to a file or remote caching is enabled. More on this below.
Local persistence for an in-memory cache
c = Cache()
# ... run jobs that add entries to the cache ...
c.write_sqlite_db("example.db")
# or
c.write_jsonl("example.jsonl")
You can then load the cache from the SQLite3 database or JSONL file using Cache methods.
c = Cache.from_sqlite_db("example.db")
# or
c = Cache.from_jsonl("example.jsonl")
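JSONL stores one JSON object per line, which makes a cache file easy to append to, diff, and share. As an illustration only, the sketch below writes a dictionary of entries to JSONL and reads it back with the standard library. The helpers dump_entries_jsonl and load_entries_jsonl are hypothetical, and the record layout (one {"key": ..., "entry": ...} object per line) is an assumption, not necessarily the format that write_jsonl produces.

```python
import json
import os
import tempfile


def dump_entries_jsonl(entries: dict, path: str) -> None:
    """Write each key/entry pair as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for key, entry in entries.items():
            f.write(json.dumps({"key": key, "entry": entry}) + "\n")


def load_entries_jsonl(path: str) -> dict:
    """Rebuild the dictionary from a JSONL file, skipping blank lines."""
    entries = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                record = json.loads(line)
                entries[record["key"]] = record["entry"]
    return entries


data = {
    "abc123": {"model": "test", "user_prompt": "Hi", "output": "Hello!"},
    "def456": {"model": "test", "user_prompt": "Bye", "output": "Goodbye!"},
}
path = os.path.join(tempfile.mkdtemp(), "example.jsonl")
dump_entries_jsonl(data, path)
print(load_entries_jsonl(path) == data)  # True
```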
SQLiteDict for transactions
Instead of using a dictionary as the data attribute, you can use a special dictionary-like object based on SQLite3. This persists the cache between sessions. This is the “normal” way that a cache is used for runs where no specific cache is passed.
from edsl import Cache
from edsl.data.SQLiteDict import SQLiteDict
my_sqlite_cache = Cache(data = SQLiteDict("example.db"))
This leaves a SQLite3 database file on the user’s machine, in this case example.db in the current directory. It persists between sessions and can be loaded using the from_sqlite_db method shown above.
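The idea behind a dictionary-like object backed by SQLite can be sketched with the standard library's sqlite3 module and collections.abc.MutableMapping. The class TinySQLiteDict below is hypothetical and much simpler than EDSL's SQLiteDict (for example, it stores values as JSON text in a single two-column table). It only shows why such an object persists between sessions: every assignment is committed to the database file.

```python
import json
import os
import sqlite3
import tempfile
from collections.abc import MutableMapping


class TinySQLiteDict(MutableMapping):
    """Hypothetical dict-like store that keeps JSON values in an SQLite table."""

    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS entries (key TEXT PRIMARY KEY, value TEXT)"
        )
        self.conn.commit()

    def __getitem__(self, key):
        row = self.conn.execute(
            "SELECT value FROM entries WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return json.loads(row[0])

    def __setitem__(self, key, value):
        # INSERT OR REPLACE overwrites an existing key; commit makes it durable.
        self.conn.execute(
            "INSERT OR REPLACE INTO entries (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.conn.commit()

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        self.conn.execute("DELETE FROM entries WHERE key = ?", (key,))
        self.conn.commit()

    def __iter__(self):
        # Materialize the keys so no cursor is held open during iteration.
        keys = [row[0] for row in self.conn.execute("SELECT key FROM entries")]
        return iter(keys)

    def __len__(self):
        return self.conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]

    def close(self):
        self.conn.close()


path = os.path.join(tempfile.mkdtemp(), "example.db")
store = TinySQLiteDict(path)
store["abc123"] = {"output": "Hello!"}
store.close()

reopened = TinySQLiteDict(path)  # simulates a new session
print(reopened["abc123"])        # {'output': 'Hello!'}
```

Committing on every assignment keeps the sketch simple and durable but is slow for bulk inserts; batching writes is one reason an implementation might offer delayed writing, as described under For developers.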
Default SQLite Cache: .edsl_cache/data.db
By default, the cache is stored in a SQLite3 database at the path .edsl_cache/data.db. You can inspect this database directly with the sqlite3 command-line shell, e.g.:
sqlite3 .edsl_cache/data.db
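The same file can also be opened from Python with the standard library's sqlite3 module, for example to list which tables it contains before querying them. The tables and schema inside .edsl_cache/data.db are not documented here, so the sketch below only lists table names; the list_tables helper is hypothetical, and the demo runs against a throwaway database instead of the real cache file.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path: str) -> list:
    """Return the names of the tables in an SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]
    finally:
        conn.close()


# Demo on a throwaway database; for EDSL's default cache the path
# would be ".edsl_cache/data.db".
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE entries (key TEXT PRIMARY KEY, value TEXT)")
conn.commit()
conn.close()

print(list_tables(demo_path))  # ['entries']
```

Note that sqlite3.connect creates a new, empty database if the path does not exist, so a typo in the path silently yields an empty table list rather than an error.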
Setting a session cache
The set_session_cache function is used to set the cache for a session:
from edsl import Cache, set_session_cache
set_session_cache(Cache())
Instead of a default Cache(), you can pass a Cache backed by a specific data store, either a SQLiteDict object or a plain dictionary:
from edsl import Cache, set_session_cache
from edsl.data import SQLiteDict
set_session_cache(Cache(data = SQLiteDict("example.db")))
# or
set_session_cache(Cache(data = {}))
This will set the cache for the current session, and you do not need to pass the cache object to the run method during the session.
The unset_session_cache function is used to unset the cache for a session:
from edsl import unset_session_cache
unset_session_cache()
This unsets the cache for the current session; after that, pass a cache object to the run method whenever you want a run to use a specific cache.
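Conceptually, a session cache is a module-level default that a run falls back to when no cache argument is given. The sketch below models that pattern with hypothetical names (set_default_cache, unset_default_cache, run_question, DEFAULT_STORE); it is not EDSL's implementation. The fallback to DEFAULT_STORE, which stands in for the default SQLite cache described above, is an assumption made for illustration.

```python
from typing import Optional

DEFAULT_STORE: dict = {}           # hypothetical stand-in for the default on-disk cache
_session_cache: Optional[dict] = None


def set_default_cache(cache: dict) -> None:
    """Make `cache` the one used by run_question when none is passed."""
    global _session_cache
    _session_cache = cache


def unset_default_cache() -> None:
    """Drop the session-wide cache."""
    global _session_cache
    _session_cache = None


def run_question(prompt: str, cache: Optional[dict] = None) -> str:
    """Pick a cache (explicit > session > default), then look up or 'compute'."""
    if cache is None:
        # Use `is not None`: an empty session dict is falsy but still valid.
        cache = _session_cache if _session_cache is not None else DEFAULT_STORE
    if prompt not in cache:
        cache[prompt] = f"answer to {prompt!r}"  # placeholder for a model call
    return cache[prompt]


session = {}
set_default_cache(session)
run_question("What is 2+2?")      # stored in `session`
unset_default_cache()
run_question("What is 3+3?")      # stored in DEFAULT_STORE
print(len(session), len(DEFAULT_STORE))  # 1 1
```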
Avoiding cache persistence
We can avoid cache persistence by passing cache=False to the run method:
from edsl import QuestionFreeText
q = QuestionFreeText.example()
results = q.run(cache = False)
For developers
Delayed cache-writing: Useful for remote caching
Separately from remote cache syncing, writes to the cache itself can be delayed. By default, a new entry is written to the cache immediately after it is stored. This can be changed by setting the immediate_write parameter to False:
c = Cache(immediate_write = False)
This is useful when you want entries to be written to the cache only after a block of code has finished executing. Writing can also be controlled by using the cache object as a context manager:
with c as cache:
    # reading / writing
    ...
# The cache will be written to the cache persistence layer after the block of code has been executed
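The interaction between immediate_write and the context manager can be sketched as follows. The class DeferredCache is hypothetical; it borrows the attribute names data, new_entries and new_entries_to_write_later from the description of store in the API reference below, but it is a simplified model, not EDSL's Cache. A plain dict stands in for the persistence layer, and this sketch flushes buffered entries whenever the with block exits, including when the block raises.

```python
from typing import Optional


class DeferredCache:
    """Hypothetical, simplified model of delayed cache writes."""

    def __init__(self, data: Optional[dict] = None, immediate_write: bool = True):
        self.data = {} if data is None else data   # the persistence layer
        self.immediate_write = immediate_write
        self.new_entries = {}                       # everything stored this session
        self.new_entries_to_write_later = {}        # buffered, not yet in data

    def store(self, key: str, response: str) -> None:
        self.new_entries[key] = response
        if self.immediate_write:
            self.data[key] = response
        else:
            self.new_entries_to_write_later[key] = response

    def flush(self) -> None:
        """Move buffered entries into the persistence layer."""
        self.data.update(self.new_entries_to_write_later)
        self.new_entries_to_write_later.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.flush()   # write buffered entries when the block ends
        return False   # do not suppress exceptions


persisted = {}
c = DeferredCache(data=persisted, immediate_write=False)
with c as cache:
    cache.store("k1", "response 1")
    cache.store("k2", "response 2")
    print(len(persisted))   # 0: still buffered inside the block
print(len(persisted))       # 2: flushed on exit
```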
Cache class
The Cache class is used to store responses from a language model.
- class edsl.data.Cache.Cache(*, filename: str | None = None, data: 'SQLiteDict' | dict | None = None, immediate_write: bool = True, method=None, verbose=False)
Bases: Base
A class that represents a cache of responses from a language model.
- Parameters:
data – The data to initialize the cache with.
immediate_write – Whether to write to the cache immediately after storing a new entry.
Deprecated:
- Parameters:
method – The method of storage to use for the cache.
- __init__(*, filename: str | None = None, data: 'SQLiteDict' | dict | None = None, immediate_write: bool = True, method=None, verbose=False)
Create two dictionaries to store the cache data.
- Parameters:
filename – The name of the file to read/write the cache from/to.
data – The data to initialize the cache with.
immediate_write – Whether to write to the cache immediately after storing a new entry.
method – The method of storage to use for the cache.
- add_from_dict(new_data: dict[str, CacheEntry], write_now: bool | None = True) -> None
Add entries to the cache from a dictionary.
- Parameters:
write_now – Whether to write to the cache immediately (similar to immediate_write).
- add_from_jsonl(filename: str, write_now: bool | None = True) -> None
Add entries to the cache from a JSONL file.
- Parameters:
write_now – Whether to write to the cache immediately (similar to immediate_write).
- add_from_sqlite(db_path: str, write_now: bool | None = True)
Add entries to the cache from an SQLite database.
- Parameters:
write_now – Whether to write to the cache immediately (similar to immediate_write).
- classmethod example(randomize: bool = False) -> Cache
Returns an example Cache instance.
- Parameters:
randomize – If True, uses CacheEntry’s randomize method.
- fetch(*, model: str, parameters: dict, system_prompt: str, user_prompt: str, iteration: int)
Fetch a value (LLM output) from the cache.
- Parameters:
model – The name of the language model.
parameters – The model parameters.
system_prompt – The system prompt.
user_prompt – The user prompt.
iteration – The iteration number.
Return None if the response is not found.
>>> c = Cache()
>>> c.fetch(model="gpt-3", parameters="default", system_prompt="Hello", user_prompt="Hi", iteration=1)[0] is None
True
- classmethod from_jsonl(jsonlfile: str, db_path: str | None = None) -> Cache
Construct a Cache from a JSONL file.
- Parameters:
jsonlfile – The path to the JSONL file of cache entries.
db_path – The path to the SQLite database used to store the cache.
If db_path is None, the cache will be stored in memory, as a dictionary.
If db_path is provided, the cache will be stored in an SQLite database.
- keys()
Return the keys of the cache as a list.
>>> from edsl import Cache
>>> Cache.example().keys()
['5513286eb6967abc0511211f0402587d']
- store(model: str, parameters: str, system_prompt: str, user_prompt: str, response: dict, iteration: int) -> str
Add a new key-value pair to the cache.
Key is a hash of the input parameters.
Output is the response from the language model.
How it works:
The key-value pair is added to self.new_entries
If immediate_write is True, the key-value pair is added to self.data
If immediate_write is False, the key-value pair is added to self.new_entries_to_write_later
>>> from edsl import Cache, Model, Question
>>> m = Model("test")
>>> c = Cache()
>>> len(c)
0
>>> results = Question.example("free_text").by(m).run(cache = c, disable_remote_cache = True, disable_remote_inference = True)
>>> len(c)
1