
llamastack/llama-stack

Forks: 1296 · Stars: 8314 (updated 2026-04-08 23:26:26)

License: MIT

Language: Python

Composable building blocks to build LLM Apps

Latest release: v0.7.1 (2026-04-08 22:16:58)


Llama Stack


Quick Start | Documentation | OpenAI API Compatibility | Discord

Open-source agentic API server for building AI applications. OpenAI-compatible. Any model, any infrastructure.

Llama Stack Architecture

Llama Stack is a drop-in replacement for the OpenAI API that you can run anywhere — your laptop, your datacenter, or the cloud. Use any OpenAI-compatible client or agentic framework. Swap between Llama, GPT, Gemini, Mistral, or any model without changing your application code.

from openai import OpenAI

# Point any OpenAI client at the local Llama Stack server.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="fake")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)

What you get

  • Chat Completions & Embeddings — standard /v1/chat/completions, /v1/completions, and /v1/embeddings endpoints, compatible with any OpenAI client
  • Responses API — server-side agentic orchestration with tool calling, MCP server integration, and built-in file search (RAG) in a single API call (learn more)
  • Vector Stores & Files — /v1/vector_stores and /v1/files for managed document storage and search
  • Batches — /v1/batches for offline batch processing
  • Open Responses conformant — the Responses API implementation passes the Open Responses conformance test suite
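The endpoints above all accept OpenAI-shaped request bodies. As a rough sketch (field names follow the OpenAI API; the model name and vector store ID are placeholders, not values from this project):

```python
import json

# A Responses API call combining a model turn with built-in file search
# over a vector store, in a single request.
responses_request = {
    "model": "llama-3.3-70b",
    "input": "Summarize the uploaded report.",
    "tools": [{"type": "file_search", "vector_store_ids": ["vs_example"]}],
}

# One line of a /v1/batches input file (JSONL): a single chat completion
# request addressed by a caller-chosen custom_id.
batch_line = {
    "custom_id": "req-1",
    "method": "POST",
    "url": "/v1/chat/completions",
    "body": {
        "model": "llama-3.3-70b",
        "messages": [{"role": "user", "content": "Hello"}],
    },
}

print(json.dumps(batch_line, indent=2))
```

Each line of the batch input file is an independent request; results come back keyed by `custom_id`.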

Use any model, use any infrastructure

Llama Stack has a pluggable provider architecture. Develop locally with Ollama, deploy to production with vLLM, or connect to a managed service — the API stays the same.

See the provider documentation for the full list.
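In practice, provider portability means the application only ever holds an OpenAI-style base URL and a model name, so switching backends is a configuration change. A minimal sketch (the URLs below are illustrative assumptions, not shipped defaults):

```python
# Illustrative backend registry: moving from a local dev server to a
# production deployment changes only these values, never the calling code.
BACKENDS = {
    "dev":  {"base_url": "http://localhost:8321/v1", "model": "llama-3.3-70b"},
    "prod": {"base_url": "http://llama-stack.internal:8321/v1", "model": "llama-3.3-70b"},
}

def client_kwargs(env: str) -> dict:
    """Keyword arguments for constructing an OpenAI client for `env`."""
    return {"base_url": BACKENDS[env]["base_url"], "api_key": "fake"}

# Usage: OpenAI(**client_kwargs("dev")) vs OpenAI(**client_kwargs("prod"))
print(client_kwargs("dev")["base_url"])
```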

Get started

Install and run a Llama Stack server:

# One-line install
curl -LsSf https://github.com/llamastack/llama-stack/raw/main/scripts/install.sh | bash

# Or install via uv
uv pip install llama-stack

# Start the server (uses the starter distribution with Ollama)
llama stack run

Then connect with any OpenAI client — Python, TypeScript, curl, or any framework that speaks the OpenAI API.
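For environments without an SDK, the same call is a plain HTTP POST. A minimal sketch using only the Python standard library, targeting the local server started above (actually sending the request is left commented out):

```python
import json
import urllib.request

# Build a chat completion request for a local Llama Stack server.
body = json.dumps({
    "model": "llama-3.3-70b",
    "messages": [{"role": "user", "content": "Hello"}],
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8321/v1/chat/completions",
    data=body,
    headers={"Content-Type": "application/json", "Authorization": "Bearer fake"},
)

# with urllib.request.urlopen(req) as resp:  # uncomment with a running server
#     print(json.load(resp)["choices"][0]["message"]["content"])

print(req.get_method(), req.full_url)
```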

See the Quick Start guide for detailed setup.

Resources

Client SDKs:

Language     SDK
Python       llama-stack-client-python
TypeScript   llama-stack-client-typescript

Community

We hold regular community calls every Thursday at 09:00 AM PST — see the Community Event on Discord for details.

Star History Chart

Thanks to all our amazing contributors!


Recent releases (data updated 2026-04-16 04:04:52):

2026-04-08 22:16:58 v0.7.1

2026-04-02 04:52:28 v0.7.0

2026-03-30 21:09:35 v0.6.1

2026-03-11 23:01:41 v0.6.0

2026-03-06 21:21:59 v0.5.2

2026-02-20 03:01:18 v0.5.1

2026-02-20 02:55:49 v0.4.5

2026-02-06 01:20:42 v0.5.0

2026-01-31 00:25:49 v0.4.4

2026-01-27 05:51:10 v0.4.3
