Provider-Aware Prompt Caching

Superdav AI Agent v1.12.0 introduces provider-aware prompt caching, which lets you reduce API cost and latency by caching prompts across different LLM providers. Each provider has its own caching mechanism and configuration.

Overview

Prompt caching lets you:

Cache large, frequently used prompts
Reduce API costs by avoiding redundant processing
Improve latency for cached requests
Manage the cache lifecycle directly

Providers implement caching differently:

Google Gemini: cachedContents API
Azure OpenAI: Prompt caching with TTL (time to live)
OpenRouter: Provider-specific caching
Vertex Anthropic: Prompt caching with cache control

Google Gemini: cachedContents API

Google Gemini provides explicit cache management through the cachedContents API.

Configuration

$config = [
    'provider' => 'google-gemini',
    'model' => 'gemini-2.0-flash',
    'caching' => [
        'enabled' => true,
        'ttl' => 3600, // 1 hour in seconds
        'max_tokens' => 1000000, // Max tokens to cache
    ],
];

Creating a Cached Prompt

use Superdav\AI\Providers\GoogleGemini;

$gemini = new GoogleGemini( $config );

$cached_content = $gemini->create_cached_content(
    [
        'system_prompt' => 'You are a helpful assistant...',
        'context' => 'Large context document...',
        'ttl' => 3600,
    ]
);

// Returns: ['cache_id' => 'abc123', 'expires_at' => timestamp]

Using a Cached Prompt

$response = $gemini->generate(
    [
        'cache_id' => 'abc123',
        'prompt' => 'User question here',
    ]
);

Cache Lifecycle

// List cached contents
$caches = $gemini->list_cached_contents();

// Get cache details
$cache = $gemini->get_cached_content( 'abc123' );

// Extend cache TTL
$gemini->update_cached_content(
    'abc123',
    ['ttl' => 7200] // Extend to 2 hours
);

// Delete cache
$gemini->delete_cached_content( 'abc123' );

Best Practices for Gemini

Set an appropriate TTL: Balance cost savings against cache staleness.
Cache system prompts: Reuse the same system prompt across all requests.
Monitor cache usage: Track which caches are used most often.
Clean up expired caches: Regularly delete caches that are no longer in use.
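
To make the TTL and cleanup advice concrete, here is a minimal sketch that picks expired entries out of a list of cache records. It assumes each record carries the `cache_id` and `expires_at` (Unix timestamp) keys shown in the create example above. `find_expired_cache_ids()` is a hypothetical helper, not part of the Superdav API, and the record shape returned by `list_cached_contents()` is an assumption.

```php
<?php
/**
 * Return the IDs of cache records whose expiry time has passed.
 *
 * @param array $caches Records with 'cache_id' and 'expires_at' keys (assumed shape).
 * @param int   $now    Current Unix timestamp.
 * @return string[]
 */
function find_expired_cache_ids( array $caches, int $now ): array {
    $expired = [];
    foreach ( $caches as $cache ) {
        // A record counts as expired once its expiry timestamp is at or before "now".
        // A record with no expiry timestamp is treated as expired.
        if ( ( $cache['expires_at'] ?? 0 ) <= $now ) {
            $expired[] = $cache['cache_id'];
        }
    }
    return $expired;
}

// Sample records for illustration only.
$now    = 1700000000;
$caches = [
    [ 'cache_id' => 'abc123', 'expires_at' => $now - 60 ],
    [ 'cache_id' => 'def456', 'expires_at' => $now + 3600 ],
];

$expired_ids = find_expired_cache_ids( $caches, $now );
print_r( $expired_ids );
```

Each returned ID could then be passed to `$gemini->delete_cached_content()` as part of a scheduled cleanup job.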

Azure OpenAI: Prompt Caching

Azure OpenAI supports prompt caching with automatic TTL management.

Configuration

$config = [
    'provider' => 'azure-openai',
    'model' => 'gpt-4-turbo',
    'api_version' => '2024-08-01-preview',
    'caching' => [
        'enabled' => true,
        'cache_control' => 'max_age=3600',
    ],
];

Enabling Caching

use Superdav\AI\Providers\AzureOpenAI;

$azure = new AzureOpenAI( $config );

$response = $azure->generate(
    [
        'system_prompt' => 'You are a helpful assistant...',
        'context' => 'Large context document...',
        'prompt' => 'User question here',
        'cache_control' => 'max_age=3600',
    ]
);

// Response includes cache usage:
// [
//     'content' => '...',
//     'cache_creation_input_tokens' => 1000,
//     'cache_read_input_tokens' => 500,
// ]

Cache Headers

Azure OpenAI uses HTTP headers for cache control:

Cache-Control: max_age=3600

Supported values:

max_age=<seconds>: Cache for the specified duration.
no_cache: Do not cache this request.
no_store: Do not cache this request, and do not reuse it either.
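
The directive format above is easy to get wrong when it is assembled by hand, so a pair of small helpers can keep it consistent. The sketch below is illustrative only: `build_cache_control()` and `parse_cache_ttl()` are hypothetical names, and mapping a non-positive TTL to `no_cache` is an assumption of this example rather than documented behavior.

```php
<?php
/**
 * Build a cache-control value in the "max_age=<seconds>" form used in this guide.
 */
function build_cache_control( int $ttl_seconds ): string {
    // Assumption: a non-positive TTL is treated as "do not cache".
    return $ttl_seconds > 0 ? "max_age={$ttl_seconds}" : 'no_cache';
}

/**
 * Parse a cache-control value and return the TTL in seconds,
 * or null when the value disables caching or is not recognized.
 */
function parse_cache_ttl( string $cache_control ): ?int {
    if ( preg_match( '/^max_age=(\d+)$/', trim( $cache_control ), $matches ) === 1 ) {
        return (int) $matches[1];
    }
    return null; // no_cache, no_store, or an unknown directive
}

echo build_cache_control( 3600 ), "\n";        // max_age=3600
var_dump( parse_cache_ttl( 'max_age=3600' ) ); // int(3600)
var_dump( parse_cache_ttl( 'no_store' ) );     // NULL
```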

Monitoring Cache Usage

$response = $azure->generate( [...] );

$cache_tokens = $response['cache_creation_input_tokens'] ?? 0;
$cache_hits = $response['cache_read_input_tokens'] ?? 0;

echo "Cache creation: $cache_tokens tokens\n";
echo "Cache hits: $cache_hits tokens\n";
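
The two raw token counts are easier to track over time as a single ratio. The following sketch computes the share of cache-related input tokens that were read from the cache rather than written to it. `cache_hit_ratio()` is a hypothetical helper, and the response keys are the ones shown in the example above.

```php
<?php
/**
 * Fraction of cache-related input tokens that were served from the cache.
 * Returns 0.0 when the response reports no cache activity at all.
 */
function cache_hit_ratio( array $response ): float {
    $created = (int) ( $response['cache_creation_input_tokens'] ?? 0 );
    $read    = (int) ( $response['cache_read_input_tokens'] ?? 0 );
    $total   = $created + $read;

    return $total > 0 ? $read / $total : 0.0;
}

// Sample response using the token counts from the earlier example.
$response = [
    'content'                     => '...',
    'cache_creation_input_tokens' => 1000,
    'cache_read_input_tokens'     => 500,
];

printf( "Cache hit ratio: %.1f%%\n", cache_hit_ratio( $response ) * 100 );
// Prints: Cache hit ratio: 33.3%
```

A ratio that stays near zero across many requests suggests that prompts differ between calls or that entries expire before they are reused.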

Best Practices for Azure OpenAI

Use consistent prompts: Identical prompts benefit from caching.
Set a reasonable TTL: Balance cost against freshness.
Monitor cache metrics: Track cache creation and cache usage.
Batch similar requests: Group requests to maximize cache hits.

OpenRouter: Provider-Specific Caching

OpenRouter supports caching through its underlying providers (OpenAI, Anthropic, etc.).

Configuration

$config = [
    'provider' => 'openrouter',
    'model' => 'openai/gpt-4-turbo',
    'caching' => [
        'enabled' => true,
        'provider_cache' => 'openai', // Use OpenAI's caching
    ],
];

Using OpenRouter Caching

use Superdav\AI\Providers\OpenRouter;

$router = new OpenRouter( $config );

$response = $router->generate(
    [
        'system_prompt' => 'You are a helpful assistant...',
        'context' => 'Large context document...',
        'prompt' => 'User question here',
        'cache_control' => 'max_age=3600',
    ]
);

Provider-Specific Options

Caching mechanisms differ across providers:

// OpenAI-compatible caching
$response = $router->generate(
    [
        'model' => 'openai/gpt-4-turbo',
        'cache_control' => 'max_age=3600',
    ]
);

// Anthropic-compatible caching
$response = $router->generate(
    [
        'model' => 'anthropic/claude-3-opus',
        'cache_control' => [
            'type' => 'ephemeral',
            'max_tokens' => 1000000,
        ],
    ]
);

Best Practices for OpenRouter

Understand provider caching: Each provider uses a different mechanism.
Test caching behavior: Verify that caching works correctly with your chosen provider.
Monitor costs: Track the savings from caching.
Use consistent models: Switching models reduces cache hits.
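
Because the `cache_control` shape depends on the underlying provider, it can help to derive it from the model ID in one place. The sketch below does this under stated assumptions: `cache_control_for_model()` is a hypothetical helper, the provider is taken to be the part of the model ID before the first slash, and the two returned shapes simply mirror the OpenAI-compatible and Anthropic-compatible examples above.

```php
<?php
/**
 * Pick a cache_control value based on the provider prefix of an
 * OpenRouter-style model ID such as "openai/gpt-4-turbo".
 *
 * @return string|array|null Null when the provider is not recognized.
 */
function cache_control_for_model( string $model, int $ttl_seconds = 3600 ) {
    // The provider is the part of the model ID before the first slash.
    $provider = strtolower( explode( '/', $model, 2 )[0] );

    switch ( $provider ) {
        case 'openai':
            return "max_age={$ttl_seconds}";
        case 'anthropic':
            return [ 'type' => 'ephemeral', 'max_tokens' => 1000000 ];
        default:
            return null; // Unknown provider: send the request without cache hints.
    }
}

var_dump( cache_control_for_model( 'openai/gpt-4-turbo' ) );
var_dump( cache_control_for_model( 'anthropic/claude-3-opus' ) );
var_dump( cache_control_for_model( 'example/unknown-model' ) );
```

Returning `null` for unrecognized providers keeps the request valid while making it obvious in logs that no cache hint was sent.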

Vertex Anthropic: Prompt Caching with Cache Control

Vertex Anthropic (Google Cloud) supports prompt caching with explicit cache control.

Configuration

$config = [
    'provider' => 'vertex-anthropic',
    'model' => 'claude-3-opus',
    'project_id' => 'your-gcp-project',
    'region' => 'us-central1',
    'caching' => [
        'enabled' => true,
        'cache_control' => [
            'type' => 'ephemeral',
            'max_tokens' => 1000000,
        ],
    ],
];

Using Vertex Anthropic Caching

use Superdav\AI\Providers\VertexAnthropic;

$vertex = new VertexAnthropic( $config );

$response = $vertex->generate(
    [
        'system_prompt' => 'You are a helpful assistant...',
        'context' => 'Large context document...',
        'prompt' => 'User question here',
        'cache_control' => [
            'type' => 'ephemeral',
            'max_tokens' => 1000000,
        ],
    ]
);

// Response includes cache metrics:
// [
//     'content' => '...',
//     'usage' => [
//         'input_tokens' => 1000,
//         'cache_creation_input_tokens' => 500,
//         'cache_read_input_tokens' => 300,
//     ],
// ]

Cache Control Types

ephemeral: Cache for the duration of the request (default).
persistent: Cache across multiple requests (if supported).

Monitoring Cache Usage

$response = $vertex->generate( [...] );

$usage = $response['usage'] ?? []; // Guard against a response without a usage block
$cache_created = $usage['cache_creation_input_tokens'] ?? 0;
$cache_read = $usage['cache_read_input_tokens'] ?? 0;

echo "Cache created: $cache_created tokens\n";
echo "Cache read: $cache_read tokens\n";

Best Practices for Vertex Anthropic

Use ephemeral caching: It works well for single-session caching.
Set max_tokens appropriately: Balance cache size against cost.
Monitor cache metrics: Track cache effectiveness.
Test with your workload: Verify that caching benefits your actual use case.

Cross-Provider Caching Strategy

Unified Configuration

$config = [
    'caching' => [
        'enabled' => true,
        'default_ttl' => 3600,
        'providers' => [
            'google-gemini' => [
                'ttl' => 3600,
                'max_tokens' => 1000000,
            ],
            'azure-openai' => [
                'cache_control' => 'max_age=3600',
            ],
            'vertex-anthropic' => [
                'cache_control' => [
                    'type' => 'ephemeral',
                    'max_tokens' => 1000000,
                ],
            ],
        ],
    ],
];

Provider Detection

$provider = $config['provider'];

$cache_config = $config['caching']['providers'][ $provider ]
    ?? $config['caching'];

// Use provider-specific caching configuration
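
The lookup above falls back to the whole `caching` array, which still contains the `providers` map. One possible refinement, sketched here as an assumption rather than the library's behavior, is to merge shared defaults with the provider-specific keys so that callers always receive a flat array. `resolve_cache_config()` is a hypothetical helper.

```php
<?php
/**
 * Resolve the caching settings for one provider from a unified config.
 * Provider-specific keys win; shared defaults fill in the rest.
 */
function resolve_cache_config( array $config, string $provider ): array {
    $caching = $config['caching'] ?? [];

    // Shared settings are everything except the per-provider map.
    $shared = $caching;
    unset( $shared['providers'] );

    $specific = $caching['providers'][ $provider ] ?? [];

    return array_merge( $shared, $specific );
}

// Sample config for illustration only.
$config = [
    'caching' => [
        'enabled'     => true,
        'default_ttl' => 3600,
        'providers'   => [
            'google-gemini' => [ 'ttl' => 7200, 'max_tokens' => 1000000 ],
        ],
    ],
];

print_r( resolve_cache_config( $config, 'google-gemini' ) );
print_r( resolve_cache_config( $config, 'openrouter' ) );
```

A provider without its own entry, such as `openrouter` here, receives only the shared `enabled` and `default_ttl` settings.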

Fallback Strategy

try {
    // Try caching with primary provider
    $response = $primary_provider->generate( $request );
} catch ( CacheException $e ) {
    // Fall back to non-cached request
    $response = $primary_provider->generate(
        array_merge( $request, ['cache_control' => 'no_cache'] )
    );
}

Cost Optimization

Calculate Savings

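$cache_created_tokens = $response['cache_creation_input_tokens'] ?? 0;
$cache_read_tokens = $response['cache_read_input_tokens'] ?? 0;
$regular_tokens = $response['input_tokens'] ?? 0;

// Illustrative per-token rates only; actual pricing varies by provider and model.
$regular_rate = 0.00001;
$cache_creation_rate = 0.00001; // Assumed equal to the regular rate here
$cache_read_rate = 0.000001; // Assumed 10x cheaper than the regular rate

$cache_creation_cost = $cache_created_tokens * $cache_creation_rate;
$cache_read_cost = $cache_read_tokens * $cache_read_rate;
$regular_cost = $regular_tokens * $regular_rate;
$total_cost = $cache_creation_cost + $cache_read_cost + $regular_cost;

// Baseline: every input token billed at the regular rate (no caching).
// Assumes input_tokens does not already include the cached token counts.
$all_tokens = $regular_tokens + $cache_created_tokens + $cache_read_tokens;
$savings = ( $all_tokens * $regular_rate ) - $total_cost;

echo "Estimated savings: \$$savings\n";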

Optimization Tips

Cache large system prompts: This yields the largest cost savings.
Reuse context: Cache context documents that are used frequently.
Batch requests: Group similar requests to maximize cache hits.
Monitor cache effectiveness: Track the actual amount saved.
Adjust TTL: Balance cost against freshness.

Troubleshooting

Cache not being used

Verify that caching is enabled in the configuration.
Check that the prompts are identical (caching requires an exact match).
Verify that the cache has not expired.
Check the provider-specific cache limits.
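
Since caching requires an exact match, a stray space or newline is enough to cause a miss. One way to spot such differences is to log a fingerprint of the cacheable part of each request and compare the values. This is a local debugging aid under stated assumptions: `prompt_fingerprint()` is a hypothetical helper, and it does not reflect how any provider computes its own cache keys.

```php
<?php
/**
 * Fingerprint the cacheable part of a request so two requests can be
 * compared for an exact match.
 */
function prompt_fingerprint( string $system_prompt, string $context ): string {
    // A separator byte keeps ("ab", "c") distinct from ("a", "bc").
    return hash( 'sha256', $system_prompt . "\0" . $context );
}

$first  = prompt_fingerprint( 'You are a helpful assistant.', 'Large context document' );
$second = prompt_fingerprint( 'You are a helpful assistant.', 'Large context document' );
$third  = prompt_fingerprint( 'You are a helpful assistant. ', 'Large context document' );

var_dump( $first === $second ); // bool(true): identical text
var_dump( $first === $third );  // bool(false): one trailing space differs
```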

Cache creation failing

Verify that the cache size is within the provider's limits.
Check that the cache control syntax is correct.
Verify that the provider supports caching for your model.
Consult the provider's documentation for its limits.

Unexpected costs

Monitor cache creation and cache read tokens.
Verify that the cache is actually being used.
Check for cache misses caused by differences between prompts.
Consider adjusting the TTL or the caching strategy.

Provider Comparison

Feature	Gemini	Azure OpenAI	OpenRouter	Vertex Anthropic
Cache API	cachedContents	HTTP headers	Provider-specific	Cache control
TTL control	Explicit	Via headers	Provider-dependent	Ephemeral/persistent
Max cache size	1M tokens	Provider-dependent	Provider-dependent	1M tokens
Cost reduction	90%	90%	Provider-dependent	90%
Monitoring	Detailed	Via metrics	Provider-dependent	Via usage

Next Steps

Choose your provider: Select based on your requirements.
Configure caching: Set up provider-specific caching.
Test caching: Verify that it works with your prompts.
Monitor usage: Track cache hits and cost savings.
Optimize: Adjust TTL and cache strategy based on the results.
