The Gemini API context caching feature is designed to reduce the cost of
requests that repeatedly send the same large, high-token-count content.

When to use context caching

Context caching is particularly well suited to scenarios where a substantial
initial context is referenced repeatedly by shorter requests. Consider using
context caching for use cases such as:

  • Chatbots with extensive system instructions
  • Repetitive analysis of lengthy video files
  • Recurring queries against large document sets
  • Frequent code repository analysis or bug fixing
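Since the SDK surface for context caching has not shipped yet, the pattern behind all of these use cases can be illustrated with a small simulation: the large context (system instructions, a long document, a video transcript) is stored once, and each follow-up request sends only a short query plus a handle to the cached content. Everything below (the `FakeContextCache` class and its methods) is a hypothetical sketch, not the Gemini API.

```python
# Hypothetical sketch of the context-caching pattern -- NOT the Gemini SDK.
# The large context is uploaded once; later requests reference it by handle.
import hashlib

class FakeContextCache:
    """Stores a large context once and hands back a small handle."""

    def __init__(self):
        self._store = {}

    def create(self, context: str) -> str:
        # Derive a short, stable handle from the context's contents.
        handle = hashlib.sha256(context.encode()).hexdigest()[:12]
        self._store[handle] = context
        return handle

    def build_prompt(self, handle: str, query: str) -> str:
        # Each follow-up request sends only the short query; the cached
        # context is re-attached from storage rather than re-uploaded.
        return self._store[handle] + "\n\n" + query

cache = FakeContextCache()
handle = cache.create("...many tokens of system instructions or documents...")
prompt = cache.build_prompt(handle, "Summarize section 3.")
```

The key point the sketch makes: after the one-time `create` call, the per-request payload is just the handle plus the short query, which is what makes repeated analysis of the same large input cheaper.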

Cost-efficiency through caching

Context caching is a paid feature designed to reduce overall operational costs.
Billing is based on the following factors:

  1. Cache token count: The number of input tokens cached, billed at a
    reduced rate when included in subsequent prompts.
  2. Storage duration: The amount of time cached tokens are stored, billed
    hourly.
  3. Other factors: Other charges apply, such as for non-cached input tokens
    and output tokens.

For the most up-to-date pricing details, refer to the Gemini API
pricing page.
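The three billing factors above can be combined into a back-of-the-envelope cost model. All rates below are placeholder assumptions for illustration, not actual Gemini API prices; the real rates are on the pricing page.

```python
# Back-of-the-envelope cost model for the three billing factors above.
# All rates are placeholders, not real Gemini API pricing.

def request_cost(cached_tokens, fresh_input_tokens, output_tokens,
                 cached_rate=0.25, input_rate=1.0, output_rate=2.0):
    """Per-request cost, with rates expressed per million tokens.

    Cached tokens are billed at a reduced rate (factor 1); non-cached
    input and output tokens are billed at full rates (factor 3).
    """
    return (cached_tokens * cached_rate
            + fresh_input_tokens * input_rate
            + output_tokens * output_rate) / 1_000_000

def storage_cost(cached_tokens, hours, storage_rate=1.0):
    """Hourly charge for keeping tokens cached (factor 2)."""
    return cached_tokens / 1_000_000 * storage_rate * hours

# Example: a 100k-token cached context queried 50 times in one hour,
# each query adding 500 fresh input tokens and producing 300 output tokens.
n = 50
with_cache = (n * request_cost(100_000, 500, 300)
              + storage_cost(100_000, hours=1))
without_cache = n * request_cost(0, 100_500, 300)
# with_cache < without_cache: the discount on repeated tokens outweighs
# the hourly storage charge once the context is reused often enough.
```

Whether caching pays off depends on the reuse rate: a context queried only once or twice may cost more to store than it saves, which is why the feature targets the high-repetition workloads listed earlier.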

Get started with context caching soon

We’ll be launching context caching soon, along with technical documentation and
SDK support.


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2024-05-13 UTC.