sgl-project/sglang

Fork: 635 Star: 6943 (updated 2025-01-01 14:17:01)

license: Apache-2.0

Language: Python

SGLang is a fast serving framework for large language models and vision language models.

Latest release: v0.3.0 (2024-09-04 19:50:29)

News

[2024/12] 🔥 SGLang v0.4: Zero-Overhead Batch Scheduler, Cache-Aware Load Balancer, Faster Structured Outputs (blog).
[2024/10] 🔥 The First SGLang Online Meetup (slides).
[2024/09] SGLang v0.3 Release: 7x Faster DeepSeek MLA, 1.5x Faster torch.compile, Multi-Image/Video LLaVA-OneVision (blog).
[2024/07] Faster Llama3 Serving with SGLang Runtime (vs. TensorRT-LLM, vLLM) (blog).

[2024/04] SGLang is used by the official LLaVA-NeXT (video) release (blog).
[2024/02] SGLang enables 3x faster JSON decoding with compressed finite state machine (blog).
[2024/01] SGLang provides up to 5x faster inference with RadixAttention (blog).
[2024/01] SGLang powers the serving of the official LLaVA v1.6 release demo (usage).

About

SGLang is a fast serving framework for large language models and vision language models. It makes your interaction with models faster and more controllable by co-designing the backend runtime and frontend language. The core features include:

Fast Backend Runtime: Provides efficient serving with RadixAttention for prefix caching, jump-forward constrained decoding, overhead-free CPU scheduler, continuous batching, token attention (paged attention), tensor parallelism, FlashInfer kernels, chunked prefill, and quantization (FP8/INT4/AWQ/GPTQ).
Flexible Frontend Language: Offers an intuitive interface for programming LLM applications, including chained generation calls, advanced prompting, control flow, multi-modal inputs, parallelism, and external interactions.
Extensive Model Support: Supports a wide range of generative models (Llama, Gemma, Mistral, Qwen, DeepSeek, LLaVA, etc.), embedding models (e5-mistral, gte, mcdse), and reward models (Skywork), with easy extensibility for integrating new models.
Active Community: SGLang is open-source and backed by an active community with industry adoption.
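
To give a feel for the prefix caching mentioned under Fast Backend Runtime, here is a minimal conceptual sketch. It is not SGLang's implementation: it uses a plain token-level trie rather than a radix tree, stores no KV tensors, and does no eviction. The names ToyPrefixCache and serve are hypothetical and exist only for this illustration. The idea it shows is that a request sharing a leading run of tokens with an earlier request (for example, the same system prompt) only needs to compute the tokens after the shared prefix.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Maps a token id to the child node reached by that token.
    children: dict = field(default_factory=dict)


class ToyPrefixCache:
    """Hypothetical token-level trie; illustration only, not SGLang code."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens are already in the cache."""
        node = self.root
        matched = 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node = child
            matched += 1
        return matched

    def insert(self, tokens):
        """Record a full token sequence so later requests can reuse it."""
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, Node())


def serve(cache, tokens):
    """Return (reused, computed) token counts for one request."""
    reused = cache.match_prefix(tokens)
    cache.insert(tokens)
    return reused, len(tokens) - reused


cache = ToyPrefixCache()
system_prompt = [1, 2, 3, 4]                      # shared across requests
first = serve(cache, system_prompt + [10, 11])    # nothing cached yet
second = serve(cache, system_prompt + [20])       # reuses the shared prefix
print(first, second)
```

In this toy run the first request computes all six tokens, while the second reuses the four system-prompt tokens and computes only one new token. A real serving runtime would attach cached attention state to the nodes and evict entries under memory pressure, which this sketch leaves out.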

Getting Started

Benchmark and Performance

Learn more in our release blogs: v0.2 blog, v0.3 blog, v0.4 blog

Roadmap

Development Roadmap (2024 Q4)

Adoption and Sponsorship

The project is supported by (alphabetically): AMD, Baseten, DataCrunch, Etched, Hyperbolic, Jam & Tea Studios, LinkedIn, LMSYS.org, Meituan, NVIDIA, RunPod, Stanford, UC Berkeley, UCLA, xAI, 01.AI.