v0.1.18
Release date: 2024-07-04 14:35:42
Latest sgl-project/sglang release: v0.3.0 (2024-09-04 19:50:29)
Highlight
- 2x faster large-batch prefill with the new flashinfer kernels #579
- Multi-node tensor parallelism #550
- New model support: ChatGLM #516
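Multi-node tensor parallelism (#550) lets the ranks that share one model's weights run on different machines. As a rough illustration of what each rank computes, the following sketch simulates the common column/row split of a linear layer `y = x @ W` in a single process with plain Python lists. The helper names (`split_columns`, `split_rows`, `column_parallel`, `row_parallel`) are hypothetical, the "all-gather" and "all-reduce" steps are simulated with list concatenation and summation, and this is not SGLang's implementation.

```python
# Hypothetical single-process sketch of tensor-parallel linear layers.
# "Ranks" are simulated by looping over weight shards; real TP runs each
# shard on its own GPU and uses collectives instead of list operations.


def matmul(x, w):
    """Multiply a row vector x (length k) by a matrix w (k rows x n cols)."""
    n = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(n)]


def split_columns(w, world_size):
    """Give each rank a contiguous block of columns of w."""
    n = len(w[0])
    assert n % world_size == 0, "columns must divide evenly across ranks"
    step = n // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def split_rows(w, world_size):
    """Give each rank a contiguous block of rows of w."""
    k = len(w)
    assert k % world_size == 0, "rows must divide evenly across ranks"
    step = k // world_size
    return [w[r * step:(r + 1) * step] for r in range(world_size)]


def column_parallel(x, w, world_size):
    # Every rank sees the full input and produces a slice of the output;
    # concatenating the slices stands in for an all-gather.
    out = []
    for shard in split_columns(w, world_size):
        out.extend(matmul(x, shard))
    return out


def row_parallel(x, w, world_size):
    # Each rank holds a slice of the input and the matching rows of w;
    # summing the partial outputs stands in for an all-reduce.
    step = len(x) // world_size
    partials = [
        matmul(x[r * step:(r + 1) * step], shard)
        for r, shard in enumerate(split_rows(w, world_size))
    ]
    return [sum(vals) for vals in zip(*partials)]


x = [1, 2, 3, 4]
w = [
    [1, 0, 2, 1],
    [0, 1, 1, 0],
    [1, 1, 0, 2],
    [2, 0, 1, 1],
]
print(matmul(x, w))              # [12, 5, 8, 11]
print(column_parallel(x, w, 2))  # [12, 5, 8, 11]
print(row_parallel(x, w, 2))     # [12, 5, 8, 11]
```

Both splits reproduce the unsharded result. In a multi-node deployment the concatenation and summation become collective operations that cross the network between machines, so inter-node bandwidth typically becomes a significant cost in this mode.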
What's Changed
- Fix missing numpy dependency in pyproject.toml by @fpreiss in https://github.com/sgl-project/sglang/pull/524
- Fix RAG nb, parea setup (parea -> parea-ai) by @fpreiss in https://github.com/sgl-project/sglang/pull/525
- [Minor] Correct Optional type hints in api by @fpreiss in https://github.com/sgl-project/sglang/pull/526
- Add ChatGLM Model Support by @Qubitium in https://github.com/sgl-project/sglang/pull/516
- Fix Regression: Disable p2p for 4090 by @ZX-ModelCloud in https://github.com/sgl-project/sglang/pull/531
- Decode Incrementally by @hnyls2002 in https://github.com/sgl-project/sglang/pull/517
- Fix dependency by @merrymercy in https://github.com/sgl-project/sglang/pull/538
- Fix dependency & crash issues by @Ying1123 in https://github.com/sgl-project/sglang/pull/539
- Higher priority for user input of max_prefill_tokens & format by @Ying1123 in https://github.com/sgl-project/sglang/pull/540
- Add disk cache for loading ShareGPT dataset. by @hnyls2002 in https://github.com/sgl-project/sglang/pull/542
- Fix tp worker only checking req[0] for stream by @Qubitium in https://github.com/sgl-project/sglang/pull/546
- Fix the Jump-Forward with Chinese by @hnyls2002 in https://github.com/sgl-project/sglang/pull/551
- Update fused_moe by @merrymercy in https://github.com/sgl-project/sglang/pull/553
- Multi-node Tensor Parallelism by @Ying1123 in https://github.com/sgl-project/sglang/pull/550
- Update flashinfer to 0.0.5 by @merrymercy in https://github.com/sgl-project/sglang/pull/554
- Follow-up fixes for flashinfer 0.0.5 by @merrymercy in https://github.com/sgl-project/sglang/pull/556
- Fix latency benchmark by @hnyls2002 in https://github.com/sgl-project/sglang/pull/557
- Clean up logits processor by @merrymercy in https://github.com/sgl-project/sglang/pull/558
- Update test_flashinfer by @hnyls2002 in https://github.com/sgl-project/sglang/pull/560
- Allow running with vllm==0.4.3 by @merrymercy in https://github.com/sgl-project/sglang/pull/561
- Add a new argument log_level_http to control the HTTP logging by @merrymercy in https://github.com/sgl-project/sglang/pull/563
- Add sglang.bench_latency for offline benchmark by @merrymercy in https://github.com/sgl-project/sglang/pull/564
- Warmup cublas by @merrymercy in https://github.com/sgl-project/sglang/pull/566
- Increase the thread limit for tp worker managers by @merrymercy in https://github.com/sgl-project/sglang/pull/567
- Update readme by @merrymercy in https://github.com/sgl-project/sglang/pull/568
- Expose dtype argument by @merrymercy in https://github.com/sgl-project/sglang/pull/569
- Update benchmark script by @Ying1123 in https://github.com/sgl-project/sglang/pull/571
- Minor fix in compiler & format by @ZackZeng999 in https://github.com/sgl-project/sglang/pull/545
- Update run_batch interface and max_prefill_tokens by @Ying1123 in https://github.com/sgl-project/sglang/pull/574
- Fix flashinfer version by @PanJason in https://github.com/sgl-project/sglang/pull/576
- [BugFix] gemma loading weights "lm_head.weight" key error by @dhgarcia in https://github.com/sgl-project/sglang/pull/577
- Turn on flashinfer by default by @Ying1123 in https://github.com/sgl-project/sglang/pull/578
- fix the broken server args by @hnyls2002 in https://github.com/sgl-project/sglang/pull/585
- 2x performance improvement for large prefill & Fix workspace conflicts by @Ying1123 in https://github.com/sgl-project/sglang/pull/579
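"Decode Incrementally" (#517) concerns turning token ids back into text while they stream out. One widely used approach to incremental detokenization keeps two offsets into the generated ids and only emits text once the newest bytes form complete characters; this matters for languages such as Chinese, where one character spans several UTF-8 bytes and can be split across tokens. The following is a minimal sketch of that general idea under a toy assumption: a hypothetical byte-level vocabulary (`VOCAB`) that maps token ids to raw bytes. The class and function names are illustrative, and this is not the code from #517.

```python
# Hypothetical sketch of incremental detokenization with two offsets.
# Assumes a toy byte-level vocabulary; real tokenizers are more complex.

REPLACEMENT = "\ufffd"  # what errors="replace" emits for incomplete UTF-8


class IncrementalDecoder:
    """Emit only newly completed text as token ids arrive one at a time."""

    def __init__(self, vocab):
        self.vocab = vocab      # assumed mapping: token id -> bytes
        self.ids = []           # all generated token ids so far
        self.prefix_offset = 0  # start of the window re-decoded for context
        self.read_offset = 0    # ids before this point are already emitted

    def _decode(self, ids):
        return b"".join(self.vocab[i] for i in ids).decode("utf-8", errors="replace")

    def step(self, token_id):
        """Append one token id and return the newly completed text (maybe "")."""
        self.ids.append(token_id)
        prefix_text = self._decode(self.ids[self.prefix_offset:self.read_offset])
        new_text = self._decode(self.ids[self.prefix_offset:])
        if len(new_text) > len(prefix_text) and not new_text.endswith(REPLACEMENT):
            # The tail decodes to whole characters: emit the difference.
            self.prefix_offset = self.read_offset
            self.read_offset = len(self.ids)
            return new_text[len(prefix_text):]
        return ""  # trailing bytes are an incomplete character; wait


def naive_decode_each(vocab, ids):
    """Decode every token on its own, which garbles split characters."""
    return [vocab[i].decode("utf-8", errors="replace") for i in ids]


# Toy vocabulary: "中" (bytes E4 B8 AD) is split across tokens 1 and 2.
VOCAB = {
    0: b"Hi ",
    1: "中".encode("utf-8")[:2],
    2: "中".encode("utf-8")[2:],
    3: "文".encode("utf-8"),
}

decoder = IncrementalDecoder(VOCAB)
pieces = [decoder.step(t) for t in [0, 1, 2, 3]]
print(pieces)           # ['Hi ', '', '中', '文']
print("".join(pieces))  # Hi 中文
# Decoding tokens independently yields two U+FFFD replacement characters
# in place of "中".
print(naive_decode_each(VOCAB, [0, 1, 2, 3]))
```

Re-decoding a small window (from `prefix_offset`) rather than single tokens keeps the per-step cost bounded while still letting bytes from earlier tokens combine with later ones.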
New Contributors
- @fpreiss made their first contribution in https://github.com/sgl-project/sglang/pull/524
- @ZackZeng999 made their first contribution in https://github.com/sgl-project/sglang/pull/545
- @PanJason made their first contribution in https://github.com/sgl-project/sglang/pull/576
- @dhgarcia made their first contribution in https://github.com/sgl-project/sglang/pull/577
Full Changelog: https://github.com/sgl-project/sglang/compare/v0.1.17...v0.1.18