v0.5.0
Released: 2024-12-05 05:35:06
New models
- Llama 3.3: a new state-of-the-art 70B model. Llama 3.3 70B offers performance similar to the Llama 3.1 405B model.
- Snowflake Arctic Embed 2: Snowflake's frontier embedding model. Arctic Embed 2.0 adds multilingual support without sacrificing English performance or scalability.
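
Once pulled, both models can be exercised from the Python library. A minimal sketch, assuming the registry tags `llama3.3` and `snowflake-arctic-embed2` and a local server on the default port:

```python
# Minimal sketch: chat with Llama 3.3 and embed text with Arctic Embed 2.
# Assumes `ollama pull llama3.3` and `ollama pull snowflake-arctic-embed2`
# have been run and the Ollama server is listening on the default port.
from ollama import chat, embed

# Chat completion with the new 70B model.
reply = chat(
    model='llama3.3',
    messages=[{'role': 'user', 'content': 'Summarize your capabilities in one sentence.'}],
)
print(reply.message.content)

# Multilingual embeddings with Arctic Embed 2.0.
vectors = embed(
    model='snowflake-arctic-embed2',
    input=['Hello, world!', 'Bonjour le monde!'],
)
print(len(vectors.embeddings), 'embeddings of dimension', len(vectors.embeddings[0]))
```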
Structured outputs
Ollama now supports structured outputs, making it possible to constrain a model's output to a specific format defined by a JSON schema. The Ollama Python and JavaScript libraries have been updated to support structured outputs, together with Ollama's OpenAI-compatible API endpoints.
REST API
To use structured outputs in Ollama's generate or chat APIs, provide a JSON schema object in the `format` parameter:
```shell
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
  "model": "llama3.1",
  "messages": [{"role": "user", "content": "Tell me about Canada."}],
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "name": { "type": "string" },
      "capital": { "type": "string" },
      "languages": {
        "type": "array",
        "items": { "type": "string" }
      }
    },
    "required": ["name", "capital", "languages"]
  }
}'
```
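
With the schema applied, `message.content` in the response is a JSON string conforming to the schema. An illustrative shape (exact values depend on the model):

```json
{
  "name": "Canada",
  "capital": "Ottawa",
  "languages": ["English", "French"]
}
```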
Python library
Using the Ollama Python library, pass the schema as a JSON object to the `format` parameter, either as a `dict` or, recommended, by using Pydantic to serialize the schema with `model_json_schema()`:
```python
from ollama import chat
from pydantic import BaseModel

class Country(BaseModel):
    name: str
    capital: str
    languages: list[str]

response = chat(
    messages=[
        {
            'role': 'user',
            'content': 'Tell me about Canada.',
        }
    ],
    model='llama3.1',
    format=Country.model_json_schema(),
)

country = Country.model_validate_json(response.message.content)
print(country)
```
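
The OpenAI-compatible endpoint mentioned above accepts structured outputs as well. A minimal sketch using the official `openai` Python client, whose `beta.chat.completions.parse()` helper serializes a Pydantic model into the request; the base URL and placeholder API key below are assumptions for a default local install:

```python
from openai import OpenAI
from pydantic import BaseModel

class Country(BaseModel):
    name: str
    capital: str
    languages: list[str]

# Point the OpenAI client at the local Ollama server. The API key is unused
# by Ollama but required by the client, so any placeholder value works.
client = OpenAI(base_url='http://localhost:11434/v1', api_key='ollama')

completion = client.beta.chat.completions.parse(
    model='llama3.1',
    messages=[{'role': 'user', 'content': 'Tell me about Canada.'}],
    response_format=Country,
)
print(completion.choices[0].message.parsed)
```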
JavaScript library
Using the Ollama JavaScript library, pass the schema as a JSON object to the `format` parameter, either as an `object` or, recommended, by using Zod to serialize the schema with `zodToJsonSchema()`:
```javascript
import ollama from 'ollama';
import { z } from 'zod';
import { zodToJsonSchema } from 'zod-to-json-schema';

const Country = z.object({
  name: z.string(),
  capital: z.string(),
  languages: z.array(z.string()),
});

const response = await ollama.chat({
  model: 'llama3.1',
  messages: [{ role: 'user', content: 'Tell me about Canada.' }],
  format: zodToJsonSchema(Country),
});

const country = Country.parse(JSON.parse(response.message.content));
console.log(country);
```
What's Changed
- Fixed error importing model vocabulary files
- Experimental: new flag to set KV cache quantization to 4-bit (`q4_0`), 8-bit (`q8_0`), or 16-bit (`f16`). This reduces VRAM requirements for longer context windows; a rough size estimate is sketched after this list.
  - To enable for all models, use `OLLAMA_FLASH_ATTENTION=1 OLLAMA_KV_CACHE_TYPE=q4_0 ollama serve`
  - Note: in the future, flash attention will be enabled by default where available, with KV cache quantization available on a per-model basis
  - Thank you @sammcj for the contribution in https://github.com/ollama/ollama/pull/7926
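
To see where the savings come from, here is a back-of-the-envelope estimate of KV cache size. The model dimensions (32 layers, 8 KV heads, head dimension 128, roughly Llama 3.1 8B) and the per-element byte counts for the quantized formats are assumptions based on llama.cpp's block layouts:

```python
# Rough KV cache size estimate: 2 tensors (K and V) per layer, each holding
# n_kv_heads * head_dim elements per token. Bytes per element approximate the
# llama.cpp storage formats: q8_0 and q4_0 store 32 elements per block plus a
# small per-block scale, hence the fractional byte counts.
BYTES_PER_ELEMENT = {'f16': 2.0, 'q8_0': 34 / 32, 'q4_0': 18 / 32}

def kv_cache_bytes(ctx_len, n_layers=32, n_kv_heads=8, head_dim=128, cache_type='f16'):
    per_token = 2 * n_layers * n_kv_heads * head_dim  # K and V elements per token
    return ctx_len * per_token * BYTES_PER_ELEMENT[cache_type]

for t in ('f16', 'q8_0', 'q4_0'):
    gib = kv_cache_bytes(ctx_len=32768, cache_type=t) / 2**30
    print(f'{t}: ~{gib:.1f} GiB for a 32k context')
# f16: ~4.0 GiB, q8_0: ~2.1 GiB, q4_0: ~1.1 GiB (for the assumed dimensions)
```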
New Contributors
- @dmayboroda made their first contribution in https://github.com/ollama/ollama/pull/7906
- @Geometrein made their first contribution in https://github.com/ollama/ollama/pull/7908
- @owboson made their first contribution in https://github.com/ollama/ollama/pull/7693
Full Changelog: https://github.com/ollama/ollama/compare/v0.4.7...v0.5.0
Downloads
- ollama-darwin 63.17MB
- Ollama-darwin.zip 185.42MB
- ollama-linux-amd64-rocm.tgz 1.13GB
- ollama-linux-amd64.tgz 1.71GB
- ollama-linux-arm64-jetpack5.tgz 451.56MB
- ollama-linux-arm64-jetpack6.tgz 414.59MB
- ollama-linux-arm64.tgz 1.44GB
- ollama-windows-amd64.zip 1.79GB
- ollama-windows-arm64.zip 22.56MB
- OllamaSetup.exe 746.8MB
- sha256sum.txt 916B