How to read vLLM logs
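Every line vLLM logs is there for a reason: the startup lines record the configuration that was actually applied, and the runtime lines show how the engine is coping with the load.
The log below was obtained from https://www.kaggle.com/code/huikang/streaming-inference?scriptVersionId=282411196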
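Most lines in this log share one layout: an optional process tag with a pid, a level, a month-day timestamp, the source file and line number in brackets, and then the message. Below is a minimal sketch of a parser for that layout. It is inferred only from the lines in this log, and the regex and the `parse_line` helper are illustrative names, not part of vLLM.

```python
import re
from typing import Optional

# Layout inferred from lines such as:
# (APIServer pid=90) INFO 11-28 11:45:51 [api_server.py:1977] vLLM API server version 0.11.2
# The process tag is optional because the very first line of the log has none.
LOG_LINE = re.compile(
    r"^(?:\((?P<process>\w+) pid=(?P<pid>\d+)\) )?"
    r"(?P<level>DEBUG|INFO|WARNING|ERROR) "
    r"(?P<date>\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"\[(?P<file>[\w.]+):(?P<line>\d+)\] "
    r"(?P<message>.*)$"
)


def parse_line(line: str) -> Optional[dict]:
    """Return the fields of one vLLM-style log line, or None if it does not match."""
    match = LOG_LINE.match(line.rstrip("\n"))
    if match is None:
        return None  # progress bars, uvicorn access lines, Gloo/c10d output, etc.
    record = match.groupdict()
    record["pid"] = int(record["pid"]) if record["pid"] is not None else None
    record["line"] = int(record["line"])
    return record


if __name__ == "__main__":
    sample = (
        "(APIServer pid=90) INFO 11-28 11:45:51 "
        "[api_server.py:1977] vLLM API server version 0.11.2"
    )
    print(parse_line(sample))
```

Lines that return `None` come from other writers sharing the same stream (progress bars, the HTTP access log, Gloo and c10d messages), which is a quick way to tell the engine's own messages apart from everything else.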
INFO 11-28 11:45:51 [scheduler.py:216] Chunked prefill is enabled with max_num_batched_tokens=2048.
(APIServer pid=90) INFO 11-28 11:45:51 [api_server.py:1977] vLLM API server version 0.11.2
(APIServer pid=90) INFO 11-28 11:45:51 [utils.py:253] non-default args: {'host': '0.0.0.0', 'model': '/kaggle/input/gpt-oss-120b/transformers/default/1', 'max_model_len': 98304, 'served_model_name': ['vllm-model'], 'gpu_memory_utilization': 0.96, 'max_num_seqs': 6}
(APIServer pid=90) INFO 11-28 11:46:35 [model.py:631] Resolved architecture: GptOssForCausalLM
(APIServer pid=90) ERROR 11-28 11:46:35 [config.py:307] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/kaggle/input/gpt-oss-120b/transformers/default/1'. Use `repo_type` argument if needed., retrying 1 of 2
(APIServer pid=90) ERROR 11-28 11:46:37 [config.py:305] Error retrieving safetensors: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/kaggle/input/gpt-oss-120b/transformers/default/1'. Use `repo_type` argument if needed.
(APIServer pid=90) INFO 11-28 11:46:37 [model.py:1968] Downcasting torch.float32 to torch.bfloat16.
(APIServer pid=90) INFO 11-28 11:46:37 [model.py:1745] Using max model len 98304
(APIServer pid=90) INFO 11-28 11:46:45 [scheduler.py:216] Chunked prefill is enabled with max_num_batched_tokens=8192.
(APIServer pid=90) INFO 11-28 11:46:45 [config.py:272] Overriding max cuda graph capture size to 1024 for performance.
(EngineCore_DP0 pid=289) INFO 11-28 11:47:23 [core.py:93] Initializing a V1 LLM engine (v0.11.2) with config: model='/kaggle/input/gpt-oss-120b/transformers/default/1', speculative_config=None, tokenizer='/kaggle/input/gpt-oss-120b/transformers/default/1', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=98304, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=mxfp4, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='openai_gptoss', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=vllm-model, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer'], 'compile_mm_encoder': False, 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'combo_kernels': True, 'benchmark_combo_kernel': True}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 
'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512, 528, 544, 560, 576, 592, 608, 624, 640, 656, 672, 688, 704, 720, 736, 752, 768, 784, 800, 816, 832, 848, 864, 880, 896, 912, 928, 944, 960, 976, 992, 1008, 1024], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': 1024, 'local_cache_dir': None}
(EngineCore_DP0 pid=289) INFO 11-28 11:47:31 [parallel_state.py:1208] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://172.19.2.2:54751 backend=nccl
[W1128 11:47:31.402851082 socket.cpp:209] [c10d] The hostname of the client socket cannot be retrieved. err=-3
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=289) INFO 11-28 11:47:31 [parallel_state.py:1394] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=289) INFO 11-28 11:47:31 [gpu_model_runner.py:3259] Starting to load model /kaggle/input/gpt-oss-120b/transformers/default/1...
(EngineCore_DP0 pid=289) WARNING 11-28 11:47:32 [mxfp4.py:196] MXFP4 linear layer is not implemented - falling back to UnquantizedLinearMethod.
(EngineCore_DP0 pid=289) WARNING 11-28 11:47:32 [mxfp4.py:208] MXFP4 attention layer is not implemented. Skipping quantization for this layer.
(EngineCore_DP0 pid=289) INFO 11-28 11:47:32 [cuda.py:377] Using AttentionBackendEnum.TRITON_ATTN backend.
(EngineCore_DP0 pid=289) INFO 11-28 11:47:32 [layer.py:342] Enabled separate cuda stream for MoE shared_experts
(EngineCore_DP0 pid=289) INFO 11-28 11:47:32 [mxfp4.py:141] Using Marlin backend
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 0% Completed | 0/15 [00:00<?, ?it/s]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 7% Completed | 1/15 [00:33<07:54, 33.88s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 13% Completed | 2/15 [01:06<07:12, 33.25s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 20% Completed | 3/15 [01:46<07:14, 36.17s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 27% Completed | 4/15 [02:24<06:46, 37.00s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 33% Completed | 5/15 [02:56<05:52, 35.23s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 40% Completed | 6/15 [03:33<05:22, 35.88s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 47% Completed | 7/15 [04:08<04:43, 35.41s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 53% Completed | 8/15 [04:47<04:16, 36.62s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 60% Completed | 9/15 [05:22<03:36, 36.03s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 67% Completed | 10/15 [06:03<03:08, 37.62s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 73% Completed | 11/15 [06:36<02:25, 36.26s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 80% Completed | 12/15 [07:15<01:51, 37.10s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 87% Completed | 13/15 [07:47<01:11, 35.56s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 93% Completed | 14/15 [08:26<00:36, 36.54s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 100% Completed | 15/15 [09:05<00:00, 37.20s/it]
(EngineCore_DP0 pid=289)
Loading safetensors checkpoint shards: 100% Completed | 15/15 [09:05<00:00, 36.34s/it]
(EngineCore_DP0 pid=289)
(EngineCore_DP0 pid=289) INFO 11-28 11:57:30 [default_loader.py:314] Loading weights took 545.24 seconds
(EngineCore_DP0 pid=289) WARNING 11-28 11:57:30 [marlin_utils_fp4.py:204] Your GPU does not have native support for FP4 computation but FP4 quantization is being used. Weight-only FP4 compression will be used leveraging the Marlin kernel. This may degrade performance for compute-heavy workloads.
(EngineCore_DP0 pid=289) INFO 11-28 11:57:33 [gpu_model_runner.py:3338] Model loading took 65.9651 GiB memory and 600.408480 seconds
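The loading lines above are enough for a rough estimate of how fast the weights were read. This sketch uses only numbers from the log and assumes that the 65.9651 GiB of memory the model occupies is close to the amount of data read from disk, which the log does not state.

```python
# Values copied from the log lines above.
weights_gib = 65.9651   # "Model loading took 65.9651 GiB memory"
load_seconds = 545.24   # "Loading weights took 545.24 seconds"
num_shards = 15         # "15/15" in the safetensors progress bar

# Average time per checkpoint shard; compare with the ~36 s/it in the progress bar.
seconds_per_shard = load_seconds / num_shards

# Approximate read rate, assuming bytes in memory are roughly bytes read (an assumption).
mib_per_second = weights_gib * 1024 / load_seconds

print(f"{seconds_per_shard:.2f} s per shard")
print(f"{mib_per_second:.0f} MiB/s approximate read rate")
```

A read rate in this range suggests that most of the ten-minute startup is spent waiting on storage rather than on the GPU.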
(EngineCore_DP0 pid=289) INFO 11-28 11:57:52 [backends.py:631] Using cache directory: /root/.cache/vllm/torch_compile_cache/7fcbe477d2/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=289) INFO 11-28 11:57:52 [backends.py:647] Dynamo bytecode transform time: 19.07 s
(EngineCore_DP0 pid=289) INFO 11-28 11:57:58 [backends.py:251] Cache the graph for dynamic shape for later use
(EngineCore_DP0 pid=289) INFO 11-28 11:58:38 [backends.py:282] Compiling a graph for dynamic shape takes 44.90 s
(EngineCore_DP0 pid=289) INFO 11-28 11:58:39 [monitor.py:34] torch.compile takes 63.97 s in total
(EngineCore_DP0 pid=289) INFO 11-28 11:58:41 [gpu_worker.py:359] Available KV cache memory: 8.99 GiB
(EngineCore_DP0 pid=289) INFO 11-28 11:58:41 [kv_cache_utils.py:1229] GPU KV cache size: 130,960 tokens
(EngineCore_DP0 pid=289) INFO 11-28 11:58:41 [kv_cache_utils.py:1234] Maximum concurrency for 98,304 tokens per request: 2.46x
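The three KV cache lines can be cross-checked with a little arithmetic. The sketch below uses only the logged numbers. Note that the naive ratio of cache size to context length comes out lower than the logged 2.46x. One plausible explanation, which is an assumption and not something this log confirms, is that some of the model's layers use sliding-window attention and need cache for a bounded window only, so a full-length request costs less than 98,304 tokens of full-attention cache.

```python
GIB = 1024 ** 3

# Values copied from the log lines above.
kv_cache_gib = 8.99          # "Available KV cache memory: 8.99 GiB"
kv_cache_tokens = 130_960    # "GPU KV cache size: 130,960 tokens"
max_model_len = 98_304       # "Maximum concurrency for 98,304 tokens per request"
logged_concurrency = 2.46    # "... : 2.46x"

# Memory cost of caching one token across all layers.
bytes_per_token = kv_cache_gib * GIB / kv_cache_tokens

# Concurrency if every layer had to cache the full context of every request.
naive_concurrency = kv_cache_tokens / max_model_len

# Cache tokens a full-length request appears to cost, according to the logged figure.
implied_tokens_per_request = kv_cache_tokens / logged_concurrency

print(f"{bytes_per_token / 1024:.1f} KiB of KV cache per token")
print(f"naive concurrency: {naive_concurrency:.2f}x, logged: {logged_concurrency}x")
print(f"implied cache cost of a full-length request: {implied_tokens_per_request:.0f} tokens")
```

Either way, the practical reading is the same: with `max_num_seqs` set to 6, six requests can only coexist if their contexts stay well below the 98,304-token limit.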
(EngineCore_DP0 pid=289) 2025-11-28 11:58:41,951 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ...
(EngineCore_DP0 pid=289) 2025-11-28 11:58:41,973 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends
(EngineCore_DP0 pid=289)
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 0%| | 0/83 [00:00<?, ?it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 2%|▏ | 2/83 [00:00<00:07, 11.39it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 5%|▍ | 4/83 [00:00<00:06, 12.01it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 7%|▋ | 6/83 [00:00<00:06, 12.02it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 10%|▉ | 8/83 [00:00<00:06, 11.97it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 12%|█▏ | 10/83 [00:00<00:06, 11.85it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 14%|█▍ | 12/83 [00:01<00:05, 11.94it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 17%|█▋ | 14/83 [00:01<00:05, 11.94it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 19%|█▉ | 16/83 [00:01<00:05, 12.13it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 22%|██▏ | 18/83 [00:01<00:05, 12.19it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 24%|██▍ | 20/83 [00:01<00:05, 12.35it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 27%|██▋ | 22/83 [00:01<00:04, 12.59it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 29%|██▉ | 24/83 [00:01<00:04, 12.72it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 31%|███▏ | 26/83 [00:02<00:04, 12.87it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 34%|███▎ | 28/83 [00:02<00:04, 13.10it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 36%|███▌ | 30/83 [00:02<00:03, 13.27it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 39%|███▊ | 32/83 [00:02<00:04, 11.90it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 41%|████ | 34/83 [00:02<00:03, 12.53it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 43%|████▎ | 36/83 [00:02<00:03, 13.05it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 46%|████▌ | 38/83 [00:03<00:03, 13.24it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 48%|████▊ | 40/83 [00:03<00:03, 13.45it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 51%|█████ | 42/83 [00:03<00:03, 13.56it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 53%|█████▎ | 44/83 [00:03<00:02, 13.84it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 55%|█████▌ | 46/83 [00:03<00:02, 14.04it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 58%|█████▊ | 48/83 [00:03<00:02, 14.29it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 60%|██████ | 50/83 [00:03<00:02, 14.51it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 63%|██████▎ | 52/83 [00:03<00:02, 14.76it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 65%|██████▌ | 54/83 [00:04<00:02, 14.38it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 67%|██████▋ | 56/83 [00:04<00:01, 14.10it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 70%|██████▉ | 58/83 [00:04<00:01, 14.20it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 72%|███████▏ | 60/83 [00:04<00:01, 14.29it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 75%|███████▍ | 62/83 [00:04<00:01, 14.39it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 77%|███████▋ | 64/83 [00:04<00:01, 14.62it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 80%|███████▉ | 66/83 [00:04<00:01, 14.61it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 82%|████████▏ | 68/83 [00:05<00:01, 14.52it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 84%|████████▍ | 70/83 [00:05<00:00, 14.62it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 87%|████████▋ | 72/83 [00:05<00:00, 14.69it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 89%|████████▉ | 74/83 [00:05<00:00, 14.78it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 92%|█████████▏| 76/83 [00:05<00:00, 14.85it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 94%|█████████▍| 78/83 [00:05<00:00, 14.89it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 96%|█████████▋| 80/83 [00:05<00:00, 15.01it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 99%|█████████▉| 82/83 [00:06<00:00, 15.43it/s]
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|██████████| 83/83 [00:06<00:00, 13.62it/s]
(EngineCore_DP0 pid=289)
Capturing CUDA graphs (decode, FULL): 0%| | 0/3 [00:00<?, ?it/s]
Capturing CUDA graphs (decode, FULL): 33%|███▎ | 1/3 [00:01<00:02, 1.44s/it]
Capturing CUDA graphs (decode, FULL): 100%|██████████| 3/3 [00:02<00:00, 1.33it/s]
Capturing CUDA graphs (decode, FULL): 100%|██████████| 3/3 [00:02<00:00, 1.22it/s]
(EngineCore_DP0 pid=289) INFO 11-28 11:58:51 [gpu_model_runner.py:4244] Graph capturing finished in 9 secs, took 0.64 GiB
(EngineCore_DP0 pid=289) INFO 11-28 11:58:51 [core.py:250] init engine (profile, create kv cache, warmup model) took 78.26 seconds
(APIServer pid=90) INFO 11-28 11:58:53 [api_server.py:1725] Supported tasks: ['generate']
(APIServer pid=90) WARNING 11-28 11:58:53 [serving_responses.py:175] For gpt-oss, we ignore --enable-auto-tool-choice and always enable tool use.
(APIServer pid=90) INFO 11-28 11:58:53 [api_server.py:2052] Starting vLLM API server 0 on http://0.0.0.0:8000
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:38] Available routes are:
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /openapi.json, Methods: GET, HEAD
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /docs, Methods: GET, HEAD
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /docs/oauth2-redirect, Methods: GET, HEAD
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /redoc, Methods: GET, HEAD
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /health, Methods: GET
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /load, Methods: GET
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /tokenize, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /detokenize, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/models, Methods: GET
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /version, Methods: GET
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/responses, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/responses/{response_id}, Methods: GET
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/responses/{response_id}/cancel, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/messages, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/chat/completions, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/completions, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/embeddings, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /pooling, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /classify, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /score, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/score, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/audio/transcriptions, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/audio/translations, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /rerank, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v1/rerank, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /v2/rerank, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /scale_elastic_ep, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /is_scaling_elastic_ep, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /inference/v1/generate, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /ping, Methods: GET
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /ping, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /invocations, Methods: POST
(APIServer pid=90) INFO 11-28 11:58:53 [launcher.py:46] Route: /metrics, Methods: GET
(APIServer pid=90) INFO: Started server process [90]
(APIServer pid=90) INFO: Waiting for application startup.
(APIServer pid=90) INFO: Application startup complete.
(APIServer pid=90) INFO: 127.0.0.1:55440 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:55440 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:55446 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:55456 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:55464 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:55468 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:55484 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 11:59:04 [loggers.py:236] Engine 000: Avg prompt throughput: 250.9 tokens/s, Avg generation throughput: 395.5 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.3%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 11:59:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 544.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.3%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 11:59:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 536.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 6.4%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 11:59:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 526.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 8.4%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 11:59:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 514.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 10.4%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 11:59:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 504.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 12.3%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:00:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 495.5 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 14.2%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:00:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 486.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 16.0%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:00:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 480.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 17.9%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:00:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 474.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 19.7%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:00:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 468.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 21.5%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:00:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 457.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 23.2%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:01:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 453.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 25.0%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:01:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 447.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 26.6%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:01:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 435.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 28.3%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:01:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 429.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 29.9%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:01:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 426.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 31.6%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:01:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 424.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 33.2%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:02:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 417.5 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 34.8%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:02:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 410.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 36.4%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:02:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 405.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 37.9%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:02:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 403.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 39.4%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:02:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 396.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 41.0%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:02:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 393.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 42.5%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:03:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 385.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 43.9%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:03:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 381.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 45.4%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:03:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 382.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 46.8%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:03:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 376.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 48.3%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:03:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 372.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 49.7%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:03:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 367.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 51.1%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:04:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 365.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 52.5%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:04:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 360.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 53.9%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:04:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 358.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 55.3%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:04:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 347.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 56.6%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:04:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 352.1 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 57.9%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:04:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 348.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 59.3%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO: 127.0.0.1:44880 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:44840 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:44882 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:44850 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:44866 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:44892 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:05:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 331.8 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 81.6%
(APIServer pid=90) INFO 11-28 12:05:14 [loggers.py:236] Engine 000: Avg prompt throughput: 8011.6 tokens/s, Avg generation throughput: 1.8 tokens/s, Running: 4 reqs, Waiting: 2 reqs, GPU KV cache usage: 41.1%, Prefix cache hit rate: 3.4%
(APIServer pid=90) INFO 11-28 12:05:24 [loggers.py:236] Engine 000: Avg prompt throughput: 8011.4 tokens/s, Avg generation throughput: 114.7 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 61.3%, Prefix cache hit rate: 2.9%
(APIServer pid=90) INFO 11-28 12:05:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 337.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 62.5%, Prefix cache hit rate: 2.9%
(APIServer pid=90) INFO 11-28 12:05:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 334.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 63.8%, Prefix cache hit rate: 2.9%
(APIServer pid=90) INFO 11-28 12:05:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 333.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 65.1%, Prefix cache hit rate: 2.9%
(APIServer pid=90) INFO: 127.0.0.1:42728 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:42716 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:42742 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:42750 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:42758 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:42764 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:06:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 317.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.2%, Prefix cache hit rate: 2.9%
(APIServer pid=90) INFO 11-28 12:06:14 [loggers.py:236] Engine 000: Avg prompt throughput: 17485.7 tokens/s, Avg generation throughput: 253.1 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 67.2%, Prefix cache hit rate: 48.8%
(APIServer pid=90) INFO 11-28 12:06:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 325.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 68.4%, Prefix cache hit rate: 48.8%
(APIServer pid=90) INFO: 127.0.0.1:52524 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:52524 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:52526 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:52532 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:52546 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:52556 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:52562 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:06:34 [loggers.py:236] Engine 000: Avg prompt throughput: 126.0 tokens/s, Avg generation throughput: 479.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.3%, Prefix cache hit rate: 48.9%
(APIServer pid=90) INFO 11-28 12:06:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 543.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.3%, Prefix cache hit rate: 48.9%
(APIServer pid=90) INFO: 127.0.0.1:37306 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:06:54 [loggers.py:236] Engine 000: Avg prompt throughput: 262.3 tokens/s, Avg generation throughput: 523.1 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 6.3%, Prefix cache hit rate: 48.6%
(APIServer pid=90) INFO: 127.0.0.1:37310 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:56008 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:07:04 [loggers.py:236] Engine 000: Avg prompt throughput: 680.4 tokens/s, Avg generation throughput: 496.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 8.2%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO: 127.0.0.1:56012 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:07:14 [loggers.py:236] Engine 000: Avg prompt throughput: 381.2 tokens/s, Avg generation throughput: 487.6 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 47.3%
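
Note on reading the periodic stats lines: every `loggers.py` line above carries the same six fields (prompt throughput, generation throughput, running and waiting request counts, GPU KV cache usage, prefix cache hit rate), while the `INFO: 127.0.0.1:...` lines are HTTP access-log entries. The sketch below is one possible way to pull the stats fields into numbers so they can be compared over time. The regular expression is an assumption derived only from the line layout visible in this log (vLLM 0.11.2), and the names `STATS_RE` and `parse_stats_line` are hypothetical helpers, not part of vLLM; other versions may format the line differently.

```python
import re

# Assumption: this pattern mirrors the stats line as it appears in this log
# (vLLM 0.11.2). It is not an official format specification.
STATS_RE = re.compile(
    r"INFO (?P<date>\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[loggers\.py:\d+\] "
    r"Engine (?P<engine>\d+): "
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, Waiting: (?P<waiting>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv_pct>[\d.]+)%, "
    r"Prefix cache hit rate: (?P<prefix_hit_pct>[\d.]+)%"
)


def parse_stats_line(line):
    """Return a dict of metrics for a periodic stats line, or None otherwise."""
    m = STATS_RE.search(line)
    if m is None:
        return None  # e.g. an HTTP access-log line or a startup message
    d = m.groupdict()
    return {
        "time": d["time"],
        "prompt_tps": float(d["prompt_tps"]),
        "gen_tps": float(d["gen_tps"]),
        "running": int(d["running"]),
        "waiting": int(d["waiting"]),
        "kv_pct": float(d["kv_pct"]),
        "prefix_hit_pct": float(d["prefix_hit_pct"]),
    }


if __name__ == "__main__":
    # Two lines copied verbatim from the log above.
    sample = [
        '(APIServer pid=90) INFO: 127.0.0.1:52524 - "GET /v1/models HTTP/1.1" 200 OK',
        "(APIServer pid=90) INFO 11-28 12:06:44 [loggers.py:236] Engine 000: "
        "Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 543.6 tokens/s, "
        "Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.3%, "
        "Prefix cache hit rate: 48.9%",
    ]
    for line in sample:
        stats = parse_stats_line(line)
        if stats is None:
            continue  # skip lines that are not periodic stats
        # Rough per-request decode rate: aggregate throughput / running requests.
        per_req = stats["gen_tps"] / stats["running"] if stats["running"] else 0.0
        print(stats["time"], stats["kv_pct"], round(per_req, 1))
```

Dividing the generation throughput by the number of running requests gives a rough per-request decode speed, which is a convenient figure to track against GPU KV cache usage as a batch progresses.
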
(APIServer pid=90) INFO: 127.0.0.1:38682 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:38682 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:38690 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:38694 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:38708 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:38720 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:38728 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:07:24 [loggers.py:236] Engine 000: Avg prompt throughput: 142.8 tokens/s, Avg generation throughput: 505.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.4%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:07:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 549.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.5%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:07:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 532.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 6.5%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:07:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 522.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 8.5%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:08:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 510.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 10.5%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:08:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 500.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 12.4%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:08:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 493.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 14.3%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:08:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 481.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 16.1%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:08:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 472.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 17.9%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:08:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 465.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 19.7%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:09:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 457.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 21.4%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:09:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 451.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 23.1%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:09:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 445.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 24.9%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:09:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 436.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 26.5%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:09:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 431.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 28.2%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:09:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 427.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 29.8%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:10:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 421.1 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 31.4%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:10:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 417.5 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 33.0%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:10:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 411.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 34.6%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:10:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 406.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 36.1%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:10:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 403.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 37.7%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:10:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 397.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 39.2%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO 11-28 12:11:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 394.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 40.7%, Prefix cache hit rate: 47.5%
(APIServer pid=90) INFO: 127.0.0.1:58892 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:58892 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:58902 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:58906 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:58922 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:58928 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:58944 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:11:14 [loggers.py:236] Engine 000: Avg prompt throughput: 190.2 tokens/s, Avg generation throughput: 406.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.2%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:11:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 557.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 3.3%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:11:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 547.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:11:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 534.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 7.5%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:11:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 520.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.5%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO: 127.0.0.1:45574 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:12:04 [loggers.py:236] Engine 000: Avg prompt throughput: 494.7 tokens/s, Avg generation throughput: 495.5 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.4%, Prefix cache hit rate: 47.1%
(APIServer pid=90) INFO 11-28 12:12:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 500.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 13.3%, Prefix cache hit rate: 47.1%
(APIServer pid=90) INFO 11-28 12:12:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 493.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 15.1%, Prefix cache hit rate: 47.1%
(APIServer pid=90) INFO 11-28 12:12:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 483.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 17.0%, Prefix cache hit rate: 47.1%
(APIServer pid=90) INFO 11-28 12:12:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 474.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.8%, Prefix cache hit rate: 47.1%
(APIServer pid=90) INFO: 127.0.0.1:54832 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:54832 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:54848 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:54858 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:54868 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:54882 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:54886 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:12:54 [loggers.py:236] Engine 000: Avg prompt throughput: 111.0 tokens/s, Avg generation throughput: 434.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.8%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:13:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 553.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.9%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:13:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 541.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.9%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:13:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 529.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 7.0%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:13:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 517.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.0%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:13:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 505.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 10.9%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:13:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 496.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 12.8%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:14:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 488.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 14.7%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:14:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 479.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 16.5%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:14:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 472.1 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.3%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:14:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 468.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 20.1%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:14:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 459.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 21.8%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:14:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 453.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 23.6%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:15:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 446.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 25.2%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:15:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 438.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 26.9%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:15:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 433.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 28.6%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:15:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 427.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 30.2%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO 11-28 12:15:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 420.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 31.8%, Prefix cache hit rate: 47.2%
(APIServer pid=90) INFO: 127.0.0.1:34364 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:34364 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:34366 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:34382 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:34398 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:34414 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:34426 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:15:54 [loggers.py:236] Engine 000: Avg prompt throughput: 201.0 tokens/s, Avg generation throughput: 398.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.9%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:16:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 555.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 3.0%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:16:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 541.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.1%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:16:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 529.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 7.1%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:16:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 519.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.1%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:16:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 506.9 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.0%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:16:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 497.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 12.9%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:17:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 487.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 14.7%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:17:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 483.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 16.6%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:17:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 472.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.4%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:17:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 463.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 20.2%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:17:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 458.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 21.9%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:17:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 451.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 23.7%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:18:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 445.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 25.3%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO 11-28 12:18:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 439.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 27.0%, Prefix cache hit rate: 47.4%
(APIServer pid=90) INFO: 127.0.0.1:50154 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:50154 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:50168 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:50178 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:50192 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:50202 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:50214 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:18:24 [loggers.py:236] Engine 000: Avg prompt throughput: 202.2 tokens/s, Avg generation throughput: 432.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.2%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:18:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 557.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 3.3%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:18:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 540.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:18:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 528.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 7.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:19:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 520.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:19:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 502.1 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.3%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:19:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 492.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 13.2%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:19:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 491.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 15.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:19:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 481.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 16.9%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:19:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 472.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.7%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:20:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 467.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 20.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:20:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 459.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 22.2%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:20:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 450.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 24.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:20:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 444.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 25.6%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:20:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 436.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 27.3%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:20:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 432.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 29.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:21:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 426.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 30.6%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:21:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 423.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 32.2%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:21:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 415.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 33.8%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:21:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 411.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 35.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:21:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 406.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 36.9%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:21:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 400.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 38.5%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:22:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 394.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 40.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:22:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 391.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 41.5%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:22:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 388.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 42.9%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:22:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 382.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 44.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:22:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 377.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 45.8%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:22:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 374.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 47.3%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:23:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 372.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 48.7%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:23:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 368.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 50.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:23:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 365.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 51.5%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:23:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 363.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 52.9%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:23:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 359.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 54.3%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:23:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 355.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 55.6%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:24:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 355.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 57.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:24:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 349.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 58.3%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:24:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 346.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 59.6%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:24:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 343.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 60.9%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:24:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 339.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 62.2%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:24:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 338.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 63.5%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:25:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 332.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 64.8%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:25:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 330.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 66.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:25:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 327.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 67.3%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:25:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 325.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 68.6%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:25:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 324.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 69.8%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:25:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 321.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 71.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:26:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 319.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 72.2%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:26:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 316.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 73.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:26:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 314.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 74.6%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:26:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 312.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 75.8%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:26:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 306.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 77.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:26:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 309.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 78.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:27:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 308.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 79.4%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:27:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 303.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 80.5%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:27:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 291.5 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 68.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:27:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 263.0 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 69.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:27:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 261.5 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 70.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:27:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 260.0 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 71.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:28:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 258.5 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 72.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:28:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 258.0 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 73.1%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:28:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 256.5 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 74.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:28:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 253.5 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 75.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:28:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 252.5 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 76.0%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO 11-28 12:28:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 250.0 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 76.9%, Prefix cache hit rate: 47.7%
(APIServer pid=90) INFO: 127.0.0.1:47908 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:47908 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:47922 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:47928 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:47946 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:47938 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:47954 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:29:04 [loggers.py:236] Engine 000: Avg prompt throughput: 186.6 tokens/s, Avg generation throughput: 434.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.0%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:29:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 550.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.0%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:29:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 538.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 6.1%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:29:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 524.9 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 8.1%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:29:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 519.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 10.1%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:29:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 510.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 12.0%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:30:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 501.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 14.0%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:30:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 492.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 15.8%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:30:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 482.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 17.7%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:30:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 468.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 19.5%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:30:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 459.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 21.2%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:30:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 453.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 23.0%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:31:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 444.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 24.6%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:31:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 437.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 26.3%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:31:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 430.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 28.0%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:31:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 425.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 29.6%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:31:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 419.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 31.2%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:31:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 412.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 32.8%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:32:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 408.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 34.3%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:32:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 405.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 35.9%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:32:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 399.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 37.4%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:32:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 393.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 38.9%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:32:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 389.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 40.4%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:32:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 388.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 41.9%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:33:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 383.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 43.3%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:33:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 379.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 44.8%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:33:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 375.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 46.2%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:33:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 370.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 47.6%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:33:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 369.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 49.1%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:33:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 365.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 50.4%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO 11-28 12:34:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 361.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 51.8%, Prefix cache hit rate: 47.8%
(APIServer pid=90) INFO: 127.0.0.1:57058 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:57058 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:57074 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:57084 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:57078 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:57096 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:57094 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:34:14 [loggers.py:236] Engine 000: Avg prompt throughput: 287.4 tokens/s, Avg generation throughput: 382.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.2%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:34:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 558.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 3.4%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:34:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 547.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 5.5%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:34:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 537.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 7.5%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:34:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 522.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 9.5%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:35:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 510.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.4%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:35:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 501.5 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 13.4%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:35:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 490.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 15.2%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:35:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 479.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 17.1%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:35:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 470.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.9%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:35:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 460.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 20.6%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:36:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 452.9 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 22.4%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:36:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 448.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 24.1%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:36:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 438.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 25.7%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:36:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 435.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 27.4%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:36:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 426.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 29.0%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:36:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 421.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 30.6%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:37:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 415.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 32.2%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:37:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 408.5 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 33.8%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:37:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 405.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 35.3%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:37:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 399.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 36.9%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:37:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 394.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 38.4%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:37:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 389.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 39.8%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO 11-28 12:38:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 355.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 48.1%
(APIServer pid=90) INFO: 127.0.0.1:35994 - "GET /v1/models HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:35994 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:35998 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:36010 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:36020 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:36032 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO: 127.0.0.1:36048 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=90) INFO 11-28 12:38:14 [loggers.py:236] Engine 000: Avg prompt throughput: 154.8 tokens/s, Avg generation throughput: 553.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.6%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:38:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 547.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 4.6%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:38:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 535.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 6.7%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:38:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 523.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 8.7%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:38:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 511.7 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 10.7%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:39:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 504.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 12.6%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:39:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 495.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 14.5%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:39:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 484.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 16.3%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:39:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 476.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 18.1%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:39:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 465.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 19.9%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:39:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 458.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 21.6%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:40:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 452.9 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 23.3%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:40:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 447.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 25.1%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:40:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 442.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 26.8%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:40:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 432.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 28.4%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:40:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 427.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 30.1%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:40:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 421.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 31.7%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:41:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 417.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 33.3%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:41:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 412.2 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 34.8%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:41:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 403.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 36.4%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:41:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 393.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 37.9%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:41:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 398.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 39.4%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:41:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 396.6 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 40.9%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:42:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 388.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 42.4%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:42:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 383.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 43.9%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:42:24 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 379.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 45.3%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:42:34 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 375.0 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 46.8%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:42:44 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 371.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 48.2%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:42:54 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 367.8 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 49.6%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:43:04 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 365.4 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 51.0%, Prefix cache hit rate: 48.3%
(APIServer pid=90) INFO 11-28 12:43:14 [loggers.py:236] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 80.3 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 48.3%
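
The periodic `loggers.py` lines above all share one shape: prompt throughput, generation throughput, running and waiting request counts, GPU KV cache usage, and prefix cache hit rate. To read trends across many lines (for example, how generation throughput moves as KV cache usage grows), it can help to parse them into numbers. Below is a minimal sketch, assuming the line format is exactly the one shown in this log (vLLM 0.11.2); the regex, the function name `parse_stats_line`, and the dictionary keys are illustrative choices, not part of vLLM.

```python
import re

# Regex for the periodic engine stats line as it appears in this log.
# Assumption: the field order and wording match the lines above exactly;
# other vLLM versions may format this line differently.
STATS_RE = re.compile(
    r"INFO (?P<date>\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) \[loggers\.py:\d+\] "
    r"Engine (?P<engine>\d+): "
    r"Avg prompt throughput: (?P<prompt_tps>[\d.]+) tokens/s, "
    r"Avg generation throughput: (?P<gen_tps>[\d.]+) tokens/s, "
    r"Running: (?P<running>\d+) reqs, "
    r"Waiting: (?P<waiting>\d+) reqs, "
    r"GPU KV cache usage: (?P<kv_pct>[\d.]+)%, "
    r"Prefix cache hit rate: (?P<prefix_hit_pct>[\d.]+)%"
)


def parse_stats_line(line):
    """Return a dict of numeric fields, or None if this is not a stats line."""
    m = STATS_RE.search(line)
    if m is None:
        return None  # e.g. HTTP access lines such as "POST /v1/chat/completions"
    d = m.groupdict()
    return {
        "time": d["time"],
        "prompt_tps": float(d["prompt_tps"]),
        "gen_tps": float(d["gen_tps"]),
        "running": int(d["running"]),
        "waiting": int(d["waiting"]),
        "kv_pct": float(d["kv_pct"]),
        "prefix_hit_pct": float(d["prefix_hit_pct"]),
    }


if __name__ == "__main__":
    # One stats line and one access line, copied from the log above.
    lines = [
        "(APIServer pid=90) INFO 11-28 12:43:04 [loggers.py:236] Engine 000: "
        "Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 365.4 tokens/s, "
        "Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 51.0%, "
        "Prefix cache hit rate: 48.3%",
        '(APIServer pid=90) INFO: 127.0.0.1:36048 - "POST /v1/chat/completions HTTP/1.1" 200 OK',
    ]
    for line in lines:
        print(parse_stats_line(line))
```

Access-log lines return `None`, so the same function can be run over the whole file and the `None` results filtered out before plotting or summarising.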