docs/data-store/README.md

Agent Data Store Documentation

Overview

The Agent Data Store provides persistent key-value storage for agents, enabling them to save and retrieve data across executions. Data is scoped per user, so agents run by the same user can share information through a common data layer. This is what makes multi-agent workflows possible.

Key Features:

Persistent storage across agent executions
User-level data isolation
Namespace organization for logical grouping
Namespace discovery for dynamic data access
Batch operations for efficiency
Automatic access tracking and metadata

Documentation Structure

This documentation is organized into several focused guides:

Core Documentation

Quick Start - Get started with the data store in 5 minutes
API Reference - Complete method specifications
Schema - Data model and document structure

Usage Guides

Namespaces - Organizing data with namespaces
Patterns - Common usage patterns and recipes
Cross-Agent Workflows - Sharing data between agents

Operations

Troubleshooting - Common issues and solutions
Administration - Database inspection and management

Quick Start

The data_store object is automatically available in every agent sandbox:

async def run(input_dict: dict, tools: dict) -> dict:
    # Store a value
    data_store.set("my-key", {"result": "some data"})
    
    # Retrieve a value
    value = data_store.get("my-key")
    
    # List all keys
    keys = data_store.list_keys()
    
    return {"stored_keys": keys}

No setup is required; just use data_store in your agent code.

Key Concepts

User Isolation

All data is scoped to the current user. Agents can only access data belonging to their user:

User A's agents → User A's data (isolated)
User B's agents → User B's data (isolated)

Namespaces

Namespaces organize data by purpose. The default namespace is "default":

# Default namespace
data_store.set("key", value)

# Custom namespace
cache = data_store.use_namespace("api-cache")
cache.set("user-123", user_data)

Common namespace patterns:

files:{repo} - Repository file contents
summary:{repo} - File summaries
cache:{service} - API response caching
results:{job} - Job results
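
The patterns above all follow a prefix:{identifier} shape. A minimal sketch of a helper that builds such names consistently is shown below. Note that `namespace_for` is a hypothetical helper, not part of the data store API, and the checks it applies are an assumption rather than a documented rule.

```python
def namespace_for(prefix: str, identifier: str) -> str:
    """Build a namespace name such as "files:my-repo" (hypothetical helper)."""
    prefix = prefix.strip()
    identifier = identifier.strip()
    if not prefix or not identifier:
        raise ValueError("prefix and identifier must be non-empty")
    if ":" in prefix:
        # Keep the prefix free of ':' so startswith("files:") style checks stay unambiguous.
        raise ValueError("prefix must not contain ':'")
    return f"{prefix}:{identifier}"


files_ns = namespace_for("files", "my-repo")
cache_ns = namespace_for("cache", "github")
```

In agent code, the resulting string would be passed to data_store.use_namespace(...).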

Namespace Discovery

Find what data exists before querying:

# Discover all namespaces with data
namespaces = data_store.list_namespaces()
# Returns: ["default", "files:my-repo", "cache:github", ...]

# Search for specific namespaces
for ns in namespaces:
    if ns.startswith("files:"):
        files = data_store.use_namespace(ns)
        print(f"{ns}: {len(files.list_keys())} files")

Data Model

Each stored record contains:

| Field | Type | Description |
|-------|------|-------------|
| `userId` | string | Owner of the data (automatic) |
| `namespace` | string | Logical grouping |
| `key` | string | Your key name |
| `value` | any | Your stored data (JSON-serializable) |
| `metadata` | object | Optional metadata dict |
| `createdByAgent` | string | Agent that created the record |
| `lastAccessedByAgent` | string | Last agent to access |
| `accessCount` | integer | Number of accesses |
| `createdAt` | datetime | Creation timestamp |
| `updatedAt` | datetime | Last update timestamp |
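
For orientation, a record with these fields might look like the following. This is an illustrative sketch only: the field names come from the table above, but every value is made up, and the timestamp encoding (shown here as ISO 8601 strings) is an assumption that may differ by backend.

```python
from typing import Any, Optional, TypedDict


class DataStoreRecord(TypedDict):
    """Illustrative shape of a stored record (not the actual schema class)."""
    userId: str
    namespace: str
    key: str
    value: Any
    metadata: Optional[dict]
    createdByAgent: str
    lastAccessedByAgent: str
    accessCount: int
    createdAt: str  # assumption: ISO 8601 string
    updatedAt: str  # assumption: ISO 8601 string


# All values below are hypothetical examples.
example_record: DataStoreRecord = {
    "userId": "user-a",
    "namespace": "files:my-repo",
    "key": "src/main.py",
    "value": {"language": "python", "lines": 120},
    "metadata": {"source": "example"},
    "createdByAgent": "indexer-agent",
    "lastAccessedByAgent": "summary-agent",
    "accessCount": 3,
    "createdAt": "2026-02-11T09:00:00Z",
    "updatedAt": "2026-02-11T09:30:00Z",
}
```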

Common Tasks

Cache Expensive Operations

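# Inside run(): return the cached value if a previous run stored one
cached = data_store.get("expensive-result")
if cached is not None:  # a falsy cached value (0, [], {}) still counts as a hit
    return {"result": cached, "source": "cache"}

# Otherwise compute the result and store it for later runs
result = await expensive_operation()
data_store.set("expensive-result", result)
return {"result": result, "source": "computed"}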
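
Because the data store has no TTL (see Limitations), a cached entry never expires on its own. One way to work around this is to store a timestamp next to the value and treat old entries as misses. The sketch below assumes that approach; `InMemoryStore`, `set_with_timestamp`, and `get_fresh` are hypothetical names introduced only so the example runs outside an agent sandbox.

```python
import time


class InMemoryStore:
    """Hypothetical stand-in for data_store; mirrors only get() and set()."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=None):
        return self._data.get(key, default)

    def set(self, key, value, metadata=None):
        self._data[key] = value


def set_with_timestamp(store, key, value, now=None):
    """Store the value wrapped with the time it was written."""
    now = time.time() if now is None else now
    store.set(key, {"value": value, "stored_at": now})


def get_fresh(store, key, max_age_seconds, now=None):
    """Return the cached value, or None if it is missing or too old."""
    now = time.time() if now is None else now
    entry = store.get(key)
    if entry is None or now - entry["stored_at"] > max_age_seconds:
        return None
    return entry["value"]


store = InMemoryStore()
set_with_timestamp(store, "expensive-result", [1, 2, 3], now=1000.0)
fresh = get_fresh(store, "expensive-result", max_age_seconds=60, now=1030.0)
stale = get_fresh(store, "expensive-result", max_age_seconds=60, now=1100.0)
```

The `now` parameter exists only to make the expiry logic easy to test; in agent code it can be left out.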

Build Results Across Runs

summaries = data_store.use_namespace("code-summaries")
summaries.set(input_dict["file"], analysis)

# Get accumulated results
all_keys = summaries.list_keys()
return {"total_analyzed": len(all_keys)}

Share Data Between Agents

Producer Agent:

shared = data_store.use_namespace("shared-reports")
shared.set("quarterly-report", report)

Consumer Agent:

shared = data_store.use_namespace("shared-reports")
report = shared.get("quarterly-report")

Discover and Search Data

query = input_dict["query"]  # search term; assumes the caller passes it under "query"
namespaces = data_store.list_namespaces()
results = []

for ns in namespaces:
    if "summary" in ns:
        store = data_store.use_namespace(ns)
        all_data = store.get_all()  # single indexed query per namespace
        for key, value in all_data.items():
            if query in str(value):
                results.append({"namespace": ns, "key": key})

API Summary

Basic Operations

| Method | Description |
|--------|-------------|
| `get(key, default=None)` | Retrieve a value |
| `set(key, value, metadata=None)` | Store a value |
| `delete(key)` | Delete a value |
| `list_keys(prefix=None)` | List keys in current namespace |

Namespace Operations

| Method | Description |
|--------|-------------|
| `list_namespaces()` | List all namespaces with data |
| `use_namespace(namespace)` | Get proxy for different namespace |
| `clear()` | Delete all data in current namespace |

Batch Operations

| Method | Description |
|--------|-------------|
| `get_all()` | Retrieve all key-value pairs in current namespace (single query) |
| `get_many(keys)` | Retrieve multiple values |
| `set_many(items, metadata=None)` | Store multiple values |

See API Reference for complete documentation.
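
As a rough illustration of how the batch methods fit together, the sketch below uses a hypothetical in-memory stand-in so it can run outside a sandbox. It assumes that set_many() takes a dict of key to value and that get_many() returns a dict containing only the keys that exist. Neither detail is confirmed here, so check the API Reference before relying on them.

```python
class InMemoryStore:
    """Hypothetical stand-in for data_store, covering only the batch methods."""

    def __init__(self):
        self._data = {}

    def set_many(self, items, metadata=None):
        # Assumption: items is a dict mapping key -> value.
        self._data.update(items)

    def get_many(self, keys):
        # Assumption: missing keys are simply left out of the result.
        return {k: self._data[k] for k in keys if k in self._data}

    def get_all(self):
        return dict(self._data)


store = InMemoryStore()  # in an agent this would be data_store or a namespace proxy

# Write several values in one call instead of looping over set().
store.set_many({"a.py": {"lines": 10}, "b.py": {"lines": 25}})

# Read back a subset, then everything in the namespace.
subset = store.get_many(["a.py", "missing.py"])
everything = store.get_all()
total_lines = sum(v["lines"] for v in everything.values())
```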

Storage Backend

The data store uses the same database backend as the rest of Gofannon (CouchDB, Firestore, or DynamoDB). Documents are stored in a dedicated agent_data_store collection/database.

Document IDs are constructed as: {user_id}:{namespace}:{base64_encoded_key}
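
A minimal sketch of how such an ID could be built and taken apart. The base64 variant is not specified above, so URL-safe base64 over the UTF-8 bytes of the key is an assumption, and `make_doc_id` and `key_from_doc_id` are hypothetical helper names.

```python
import base64


def make_doc_id(user_id: str, namespace: str, key: str) -> str:
    """Build {user_id}:{namespace}:{base64_encoded_key}."""
    # Assumption: URL-safe base64 of the key's UTF-8 bytes.
    encoded_key = base64.urlsafe_b64encode(key.encode("utf-8")).decode("ascii")
    return f"{user_id}:{namespace}:{encoded_key}"


def key_from_doc_id(doc_id: str) -> str:
    """Recover the original key from the last segment of the ID."""
    # The base64 alphabet has no ':', so splitting from the right is safe
    # even when the namespace itself contains colons.
    encoded_key = doc_id.rsplit(":", 1)[1]
    return base64.urlsafe_b64decode(encoded_key).decode("utf-8")


doc_id = make_doc_id("user-a", "files:my-repo", "src/main.py")
original_key = key_from_doc_id(doc_id)
```

Encoding the key keeps characters such as ":" or "/" in user-supplied keys from colliding with the ID separators.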

Queries like list_keys(), list_namespaces(), and get_all() use indexed find() calls instead of full table scans. On CouchDB, a Mango index on [userId, namespace] is created automatically at service startup and on the first write to a new namespace. Firestore and DynamoDB use their native query mechanisms. The in-memory backend falls back to Python-side filtering.
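
To make the CouchDB case concrete, the sketch below builds the request bodies for CouchDB's `_index` and `_find` endpoints. It only constructs the JSON payloads and sends nothing. The index name and the selected fields are assumptions for illustration, not necessarily what the service itself sends.

```python
import json

# Body for POST /{db}/_index: a JSON (Mango) index on userId and namespace.
index_request = {
    "index": {"fields": ["userId", "namespace"]},
    "name": "user-namespace-index",  # hypothetical index name
    "type": "json",
}


def find_request(user_id: str, namespace: str) -> dict:
    """Body for POST /{db}/_find that the index above can serve."""
    return {
        "selector": {"userId": user_id, "namespace": namespace},
        "fields": ["key", "value"],  # assumption: only these fields are needed
    }


print(json.dumps(find_request("user-a", "files:my-repo"), indent=2))
```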

Limitations

JSON only: Values must be JSON-serializable (no binary data)
User scoped: Agents cannot access other users' data
No TTL: Data persists until explicitly deleted
No transactions: Concurrent writes use last-write-wins
Size limits: Individual values should be kept under 1 MB
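
The JSON and size limits can be checked before calling set(). The sketch below is one way to do that. `check_storable` is a hypothetical helper, and reading "1 MB" as 1,000,000 bytes of UTF-8 encoded JSON is an assumption, since the exact way the size is measured is not stated here.

```python
import json

MAX_VALUE_BYTES = 1_000_000  # assumption: "1 MB" taken as 10**6 bytes


def check_storable(value) -> int:
    """Return the JSON-encoded size in bytes, or raise ValueError."""
    try:
        encoded = json.dumps(value).encode("utf-8")
    except (TypeError, ValueError) as exc:
        # json.dumps raises TypeError for unsupported types such as bytes,
        # and ValueError for circular references.
        raise ValueError(f"value is not JSON-serializable: {exc}") from exc
    if len(encoded) >= MAX_VALUE_BYTES:
        raise ValueError(
            f"value is {len(encoded)} bytes; keep it under {MAX_VALUE_BYTES}"
        )
    return len(encoded)


size = check_storable({"result": "some data"})
```

Binary data would need to be encoded first (for example as a base64 string) or stored elsewhere, with only a reference kept in the data store.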

Getting Help

If you encounter issues:

Check the Troubleshooting Guide
Verify data with Administration commands
Open an issue on GitHub

Documentation Version: 1.1
Last Updated: 2026-02-11
Maintainer: Gofannon Development Team