[Bugfix][Metrics] Fix RayPrometheusMetric.labels() returning shared labeled child (#40840)
When vLLM runs with the Ray Prometheus backend, `vllm:request_success{finished_reason=...}` only ever increments the `repetition` bucket regardless of the request's actual finish reason; `stop`, `length`, `abort`, and `error` stay at zero. Root cause: `labels()` mutated the wrapped Ray metric's default tags in place and returned `self`, so every `.labels(...)` call on a given wrapper returned the same object.

Co-authored-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Signed-off-by: Marwan Sarieddine <sarieddine.marwan@gmail.com>
Signed-off-by: Seiji Eicher <seiji@anyscale.com>
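A minimal sketch of the failure mode and of the fix. The classes below are hypothetical stand-ins, not the actual vLLM/Ray metric wrappers; they only reproduce the shared-child behavior described above (all cached children end up pointing at whichever label set was applied last):

```python
from collections import Counter


class BuggyWrapper:
    """Hypothetical stand-in for the broken wrapper: labels() mutates the
    shared default tags in place and returns self."""

    def __init__(self):
        self._tags = {}          # default tags, shared by every "child"
        self.samples = Counter()  # (sorted tag tuple) -> count

    def labels(self, **tags):
        self._tags.update(tags)  # BUG: mutates shared state in place
        return self              # BUG: same object for every label set

    def inc(self, n=1):
        self.samples[tuple(sorted(self._tags.items()))] += n


class FixedWrapper(BuggyWrapper):
    """The fix: return a fresh child bound to its own copy of the tags."""

    def labels(self, **tags):
        child = FixedWrapper()
        child._tags = {**self._tags, **tags}  # copy instead of mutating
        child.samples = self.samples          # children share the sample store
        return child


if __name__ == "__main__":
    for cls in (BuggyWrapper, FixedWrapper):
        metric = cls()
        # Cache one labeled child per finish reason, as the metrics code does:
        children = {r: metric.labels(finished_reason=r)
                    for r in ("stop", "length", "abort")}
        children["stop"].inc()
        children["length"].inc()
        # Buggy: both increments land in the last-labeled bucket ("abort").
        # Fixed: "stop" and "length" each get their own count.
        print(cls.__name__, dict(metric.samples))
```

With `BuggyWrapper`, every entry of `children` is the same object, so both increments land in the last-applied bucket; `FixedWrapper` gives each label set its own child while sharing the underlying sample store.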
Easy, fast, and cheap LLM serving for everyone
| Documentation | Blog | Paper | Twitter/X | User Forum | Developer Slack |
🔥 We have built a vLLM website to help you get started with vLLM. Visit vllm.ai to learn more, and vllm.ai/events to join our events.
About
vLLM is a fast and easy-to-use library for LLM inference and serving.
Originally developed in the Sky Computing Lab at UC Berkeley, vLLM has grown into one of the most active open-source AI projects, built and maintained by a diverse community spanning dozens of academic institutions and companies, with over 2,000 contributors.
vLLM is fast with:
vLLM is flexible and easy to use with:
vLLM seamlessly supports 200+ model architectures on Hugging Face, including:
Find the full list of supported models here.
Getting Started
Install vLLM with `uv` (recommended) or `pip`, or build from source for development.
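The install commands themselves appear to have been stripped here; a sketch of the standard ones (exact flags can vary by platform and hardware):

```shell
# Install with uv (recommended):
uv pip install vllm

# Or with pip:
pip install vllm
```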
Visit our documentation to learn more.
Contributing
We welcome and value any contributions and collaborations. Please check out Contributing to vLLM for how to get involved.
Citation
If you use vLLM for your research, please cite our paper:
Contact Us
Media Kit