vllm.entrypoints.serve.cache.api_router ¶
attach_router ¶
engine_client ¶
engine_client(request: Request) -> EngineClient
reset_mm_cache async ¶
Reset the multi-modal cache. Note that we currently do not check if the multi-modal cache is successfully reset in the API server.
Source code in vllm/entrypoints/serve/cache/api_router.py
reset_prefix_cache async ¶
reset_prefix_cache(
raw_request: Request,
reset_running_requests: bool = Query(default=False),
reset_external: bool = Query(default=False),
)
Reset the local prefix cache.
Optionally, if the query parameter reset_external=true also resets the external (connector-managed) prefix cache.
Note that we currently do not check if the prefix cache is successfully reset in the API server.
Example
POST /reset_prefix_cache?reset_external=true