{{lowercase title}}
{{Short description|Open-source software for large language model inference}}
{{Use mdy dates|date=April 2026}}
{{Infobox software
| name = vLLM
| logo = vLLM.svg
| author = Sky Computing Lab ([[University of California, Berkeley]])
| developer = vLLM contributors
| released = 2023
| programming language = [[Python (programming language)|Python]], [[CUDA]], [[C++]]
| genre = [[Large language model]] [[inference engine]]
| license = [[Apache License 2.0]]
| website = {{URL|https://vllm.ai}}
| repo = {{URL|https://github.com/vllm-project/vllm}}
}}
'''vLLM''' is an open-source software framework for inference and serving of [[large language model]]s and related [[multimodal model]]s. Originally developed at the [[University of California, Berkeley]]'s Sky Computing Lab, the project is centered on ''PagedAttention'', a [[memory management|memory-management]] method for [[Transformer (deep learning)|transformer]] [[Transformer (deep learning)#KV caching|key–value cache]]s, and supports features such as continuous batching, [[distributed computing|distributed]] inference, [[Large language model#Quantization|quantization]], and [[OpenAI]]-compatible [[application programming interface|APIs]].<ref>{{cite web |title=GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for LLMs |url=https://github.com/vllm-project/vllm |website=GitHub |publisher=GitHub, Inc. |access-date=April 22, 2026}}</ref><ref>{{cite conference |last1=Kwon |first1=Woosuk |last2=Li |first2=Zhuohan |last3=Zhuang |first3=Siyuan |last4=Sheng |first4=Ying |last5=Zheng |first5=Lianmin |last6=Yu |first6=Cody Hao |last7=Gonzalez |first7=Joseph E. |last8=Zhang |first8=Hao |last9=Stoica |first9=Ion |title=Efficient Memory Management for Large Language Model Serving with PagedAttention |conference=Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles |year=2023 |url=https://arxiv.org/abs/2309.06180 |access-date=April 22, 2026}}</ref><ref>{{cite web |title=vLLM |url=https://pytorch.org/projects/vllm/ |website=PyTorch |publisher=PyTorch Foundation |access-date=April 22, 2026}}</ref> According to a project [[software maintainer|maintainer]], the "v" in vLLM originally referred to "virtual", inspired by [[virtual memory]].<ref>{{cite web |title=vLLM full name |url=https://github.com/vllm-project/vllm/issues/835 |website=GitHub |publisher=GitHub, Inc. |date=August 23, 2023 |access-date=April 22, 2026}}</ref>
== History ==
vLLM was introduced in 2023 by researchers affiliated with the Sky Computing Lab at UC Berkeley. Its core ideas were described in the 2023 paper ''Efficient Memory Management for Large Language Model Serving with PagedAttention'',<ref>{{cite arXiv |last1=Kwon |first1=Woosuk |last2=Li |first2=Zhuohan |last3=Zhuang |first3=Siyuan |last4=Sheng |first4=Ying |last5=Zheng |first5=Lianmin |last6=Yu |first6=Cody Hao |last7=Gonzalez |first7=Joseph E. |last8=Zhang |first8=Hao |last9=Stoica |first9=Ion |eprint=2309.06180 |title=Efficient Memory Management for Large Language Model Serving with PagedAttention |class=cs.LG |date=September 12, 2023}}</ref> which presented the system as a [[High-throughput computing|high-throughput]] and [[Memory (computer)|memory]]-efficient serving engine for [[large language model]]s.
In 2025, the [[PyTorch]] Foundation announced that vLLM had become a Foundation-hosted project. PyTorch's project page states that the [[University of California, Berkeley]] contributed vLLM to the [[Linux Foundation]] in July 2024.<ref>{{cite web |title=PyTorch Foundation Welcomes vLLM as a Hosted Project |url=https://pytorch.org/blog/pytorch-foundation-welcomes-vllm/ |website=PyTorch |publisher=PyTorch Foundation |date=May 7, 2025 |access-date=April 22, 2026}}</ref>
In January 2026, ''[[TechCrunch]]'' reported that the creators of vLLM had launched the startup Inferact to commercialize the project, raising $150 million in seed funding.<ref>{{cite web |last=Temkin |first=Marina |title=Inference startup Inferact lands $150M to commercialize vLLM |url=https://techcrunch.com/2026/01/22/inference-startup-inferact-lands-150m-to-commercialize-vllm/ |website=TechCrunch |date=January 22, 2026 |access-date=April 22, 2026}}</ref>
== Architecture ==
According to its 2023 paper, vLLM was designed to improve the efficiency of [[large language model]] serving by reducing memory waste in the [[Transformer (deep learning)#KV caching|key–value cache]] used during [[Transformer (deep learning)|transformer]] inference. The paper introduced ''PagedAttention'', an algorithm inspired by [[virtual memory]] and [[paging]] techniques in [[operating system]]s, and described vLLM as using block-level memory management and request scheduling to increase [[throughput]] while maintaining similar [[Latency (engineering)|latency]].
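The block-level allocation idea can be illustrated with a simplified sketch (hypothetical names and a toy block size; not vLLM's actual code): each request's cached tokens are stored in fixed-size blocks tracked by a per-request block table, so memory is reserved incrementally rather than as one contiguous maximum-length buffer.

```python
# Illustrative sketch of PagedAttention-style block-table allocation
# for a transformer KV cache (simplified; not vLLM's implementation).

BLOCK_SIZE = 4  # tokens per KV-cache block (a real system might use 16)

class PagedKVCache:
    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))  # pool of physical blocks
        self.block_tables = {}  # request id -> list of physical block ids
        self.lengths = {}       # request id -> number of cached tokens

    def append_token(self, req_id):
        """Reserve cache space for one more token of a request."""
        table = self.block_tables.setdefault(req_id, [])
        n = self.lengths.get(req_id, 0)
        if n % BLOCK_SIZE == 0:  # current block is full: map a new one
            if not self.free_blocks:
                raise MemoryError("cache exhausted; request must be preempted")
            table.append(self.free_blocks.pop())
        self.lengths[req_id] = n + 1

    def free(self, req_id):
        """Return a finished request's blocks to the pool."""
        self.free_blocks.extend(self.block_tables.pop(req_id, []))
        self.lengths.pop(req_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):  # a 6-token request occupies ceil(6/4) = 2 blocks
    cache.append_token("req-0")
print(len(cache.block_tables["req-0"]))  # 2
```

In this scheme, at most one partially filled block per request is wasted, in contrast to reserving a full maximum-sequence-length buffer up front.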
The project documentation and repository describe support for continuous batching, chunked prefill, [[speculative decoding]], prefix caching, [[Large language model#Quantization|quantization]], and multiple forms of [[distributed computing|distributed]] inference and serving. [[PyTorch]] has described vLLM as a high-throughput, memory-efficient inference and serving engine that supports a range of hardware back ends, including [[Nvidia|NVIDIA]] and [[Advanced Micro Devices|AMD]] [[graphics processing unit|GPUs]], [[Tensor Processing Unit|Google TPUs]], [[AWS]] Trainium, and [[Intel]] processors.
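Continuous batching can likewise be illustrated with a simplified sketch (hypothetical function names; not vLLM's scheduler): rather than waiting for an entire batch to finish, the server admits waiting requests into freed batch slots between decoding steps, so the batch stays full at every iteration.

```python
# Illustrative sketch of continuous (iteration-level) batching
# (simplified; not vLLM's actual scheduler).
from collections import deque

def continuous_batching(requests, max_batch=2):
    """requests: list of (request_id, tokens_to_generate) pairs."""
    waiting = deque(requests)
    running = {}          # request id -> tokens still to generate
    finished_order = []
    while waiting or running:
        # admit new requests into any free batch slots
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        # one decoding step: every running request emits one token
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:  # finished: its slot is freed immediately
                del running[rid]
                finished_order.append(rid)
    return finished_order

# "b" (1 token) finishes first and frees its slot for "c"
# while "a" is still decoding
print(continuous_batching([("a", 3), ("b", 1), ("c", 2)]))  # ['b', 'a', 'c']
```

With static batching, "c" could not start until both "a" and "b" had finished; iteration-level scheduling removes that idle time.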
== See also ==
* [[SGLang]]
* [[llama.cpp]]
* [[OpenVINO]]
* [[Open Neural Network Exchange]]
* [[Comparison of deep learning software]]
* [[Comparison of machine learning software]]
* [[Lists of open-source artificial intelligence software]]
== External links ==
* [https://catalog.ngc.nvidia.com/orgs/nvidia/containers/vllm?version=26.03.post1-py3 vLLM on NVIDIA NGC]
* [https://pytorch.org/projects/vllm/ vLLM project page at PyTorch]
== References ==
{{reflist}}