llm_service: opt-in Bedrock prompt caching via cache_prefix

Add a cache_prefix parameter to call_llm. When provided for a Claude
model, the stable text (system prompt, ASVS requirement text, shared
inventory) is prepended to the first message as its own content block
marked cache_control={"type":"ephemeral"}, ahead of the volatile
content. litellm translates this OpenAI-format marker to Bedrock's native
cachePoint; cached input bills at ~10% on a hit.

- Marker carries NO ttl (Bedrock 400s on ttl; litellm #17250/#15880).
- Injection builds a NEW message list; caller's messages are not mutated.
  Handles str content, existing block-form content, and empty lists.
- Gated to Claude models via _supports_prompt_caching; the OpenAI
  'responses' path builds its own input from the original messages and is
  untouched. stream_llm is intentionally not wired (the cached analysis
  path uses call_llm).
- Sub-2048-token prefixes are silently uncached (no error), so pass only a
  genuinely large prefix.

Rebuilt against the current gofannon tree (post bedrock-mythos STS work).
Unit-tested: block placement, no-ttl marker, no caller mutation,
block-form preservation, support gate.
1 file changed
tree: 1fdddfa99e5842a462e02beb394736fde0f4ed53
  1. .github/
  2. docs/
  3. webapp/
  4. website/
  5. .asf.yaml
  6. .dev-auth.yaml
  7. .gitignore
  8. CODE_OF_CONDUCT.md
  9. CONTRIBUTING.md
  10. dev-tail.sh
  11. LICENSE
  12. Makefile
  13. README.md
  14. run-all-tests.sh
README.md

Gofannon

PyPI License Issues

Gofannon is a provider- and model-agnostic toolkit and web application for prototyping AI agents and the lightweight web UIs that wrap them. Subject matter experts compose tools, data sources, and decision paths through a guided interface, preview agent interactions in real time, and hand off working agent-driven experiences without committing to a single AI framework or model provider.

What you can do

  • Prototype agents quickly. Compose tools, data sources, and decision paths through a guided interface, and iterate with real-time feedback.
  • Design lightweight web UIs. Pair agents with forms, chat surfaces, and dashboards to validate user journeys; export or embed prototypes to share with stakeholders.
  • Stay flexible. Gofannon supports multiple model providers (OpenAI, Anthropic, Gemini, and others via LiteLLM) and is designed to keep your work portable across them.

Quickstart

git clone https://github.com/The-AI-Alliance/gofannon.git
cd gofannon/webapp/infra/docker
docker-compose up --build

See the quickstart guide for details, including required environment configuration.

Documentation

Full documentation lives in docs/ and is published at https://the-ai-alliance.github.io/gofannon/. Highlights:

About the name

Gofannon is the Welsh god of smithcraft. See About the name for the story behind the choice.

Roadmap

Planned features and their current status are tracked in ROADMAP.md.

Community

Contributing

Contributions are welcome. See CONTRIBUTING.md for how to get started, including the “good first issue” label for newcomers and contribution guides for adding tools, integrating new agentic frameworks, and extending the web UI.

Acknowledgments

Thanks to the open-source community for contributions and support that have made this project possible.

Contributors

License

Gofannon is licensed under the Apache License, Version 2.0. See LICENSE for the full text.