zurawiki/tiktoken-rs
Forks: 68 Stars: 381 (updated 2026-04-09 02:56:48)
License: MIT
Language: Rust
Ready-made tokenizer library for working with GPT and tiktoken
Latest release: v0.11.0 (2026-04-09 00:06:08)
tiktoken-rs
Rust library for tokenizing text with OpenAI models using tiktoken.
This library provides ready-made tokenizers for GPT, tiktoken, and related OpenAI models. Typical use cases are tokenizing text and counting the tokens in text inputs.
This library is built on top of the tiktoken library and includes some additional features and enhancements for ease of use with Rust code.
Supports all current OpenAI models including GPT-5.4, GPT-5, GPT-4.1, GPT-4o, o1, o3, o4-mini, and gpt-oss models.
Scope: This crate is focused on OpenAI tokenizers (tiktoken). For non-OpenAI models (Llama, Gemini, Mistral, etc.), use the HuggingFace tokenizers crate.
Examples
For full working examples for all supported features, see the examples directory in the repository.
Usage
Add the crate to your project with cargo:

```sh
cargo add tiktoken-rs
```

Then call the API from your Rust code.
Counting token length
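```rust
use tiktoken_rs::o200k_base;

fn main() {
    // Load the o200k_base encoding (used by GPT-4o, GPT-5 and o-series models).
    let bpe = o200k_base().unwrap();
    // Encode the text, treating any special tokens in it as special tokens.
    let tokens = bpe.encode_with_special_tokens("This is a sentence with spaces");
    println!("Token count: {}", tokens.len());
}
```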
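Roughly speaking, tiktoken-style tokenizers use byte-pair encoding (BPE): the text is split into pieces, and adjacent byte sequences inside each piece are merged according to a ranked merge table, so the token count is the number of parts left once no more merges apply. The toy sketch below shows only that merge loop in plain `std` Rust. `toy_bpe` and its rank table are hypothetical illustrations, not the tiktoken-rs API or the real o200k_base vocabulary, and the sketch skips the regex pre-splitting and rank-to-token-ID mapping that real encodings perform.

```rust
use std::collections::HashMap;

/// Toy byte-pair encoding (hypothetical helper, not the tiktoken-rs API).
/// Starts from single bytes and repeatedly merges the adjacent pair whose
/// concatenation has the lowest rank in `ranks`, until no pair is mergeable.
pub fn toy_bpe(text: &str, ranks: &HashMap<Vec<u8>, u32>) -> Vec<Vec<u8>> {
    let mut parts: Vec<Vec<u8>> = text.bytes().map(|b| vec![b]).collect();
    loop {
        // Find the leftmost adjacent pair with the lowest merge rank.
        let mut best: Option<(usize, u32)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let mut joined = parts[i].clone();
            joined.extend_from_slice(&parts[i + 1]);
            if let Some(&rank) = ranks.get(&joined) {
                if best.map_or(true, |(_, r)| rank < r) {
                    best = Some((i, rank));
                }
            }
        }
        match best {
            // Merge the chosen pair into a single part.
            Some((i, _)) => {
                let right = parts.remove(i + 1);
                parts[i].extend(right);
            }
            // Nothing left to merge: each remaining part is one token.
            None => return parts,
        }
    }
}
```

With a table where `ab` ranks 0 and `abab` ranks 1, the input `"abab"` collapses into a single part (one token), while `"abc"` ends up as `ab` + `c` (two tokens).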
For repeated calls, use the singleton to avoid re-initializing the tokenizer:
```rust
use tiktoken_rs::o200k_base_singleton;

fn main() {
    // The singleton is built on first use and shared by later calls.
    let bpe = o200k_base_singleton();
    let tokens = bpe.encode_with_special_tokens("This is a sentence with spaces");
    println!("Token count: {}", tokens.len());
}
```
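A singleton like this typically builds the tokenizer once, on first use, and hands out a shared reference afterwards. The sketch below illustrates that pattern with the standard library's `std::sync::OnceLock`. `ExpensiveTokenizer`, `tokenizer_singleton`, and the build counter are hypothetical stand-ins, and this is not necessarily how tiktoken-rs implements `o200k_base_singleton`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Hypothetical stand-in for an expensive-to-build tokenizer (not a tiktoken-rs type).
pub struct ExpensiveTokenizer {
    pub vocab_size: usize,
}

/// Counts how many times the tokenizer has actually been constructed.
static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

fn build_tokenizer() -> ExpensiveTokenizer {
    BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
    // Real code would load and parse a large vocabulary here; placeholder value.
    ExpensiveTokenizer { vocab_size: 200_000 }
}

/// Returns a shared tokenizer, building it only on the first call.
pub fn tokenizer_singleton() -> &'static ExpensiveTokenizer {
    static INSTANCE: OnceLock<ExpensiveTokenizer> = OnceLock::new();
    INSTANCE.get_or_init(build_tokenizer)
}

/// Number of times `build_tokenizer` has run so far.
pub fn build_count() -> usize {
    BUILD_COUNT.load(Ordering::SeqCst)
}
```

Calling `tokenizer_singleton()` repeatedly returns the same `&'static` reference, and `build_count()` stays at 1, which is the saving the singleton is meant to provide.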
Counting max_tokens parameter for a chat completion request
```rust
use tiktoken_rs::{get_chat_completion_max_tokens, ChatCompletionRequestMessage};

fn main() {
    let messages = vec![
        ChatCompletionRequestMessage {
            content: Some("You are a helpful assistant that only speaks French.".to_string()),
            role: "system".to_string(),
            ..Default::default()
        },
        ChatCompletionRequestMessage {
            content: Some("Hello, how are you?".to_string()),
            role: "user".to_string(),
            ..Default::default()
        },
        ChatCompletionRequestMessage {
            content: Some("Parlez-vous francais?".to_string()),
            role: "system".to_string(),
            ..Default::default()
        },
    ];
    // Tokens still available for the completion after these prompt messages.
    let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
    println!("max_tokens: {}", max_tokens);
}
```
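Conceptually, the value returned by `get_chat_completion_max_tokens` is the model's context window minus the tokens consumed by the prompt messages, including a small fixed overhead per message. The std-only sketch below shows that arithmetic, assuming the per-message overhead described in the OpenAI Cookbook's token-counting guide for recent chat models (3 tokens per message, 1 extra when a `name` is set, 3 to prime the reply). `MessageTokens`, `prompt_tokens`, and `max_completion_tokens` are hypothetical names, and tiktoken-rs may use different constants for some models.

```rust
// ASSUMPTION: overhead constants follow the OpenAI Cookbook's token-counting
// guide for recent chat models; tiktoken-rs may differ for some models.
const TOKENS_PER_MESSAGE: usize = 3; // per-message framing tokens
const TOKENS_PER_NAME: usize = 1; // extra cost when a `name` is present
const REPLY_PRIMING_TOKENS: usize = 3; // primes the assistant's reply

/// Hypothetical, already-tokenized view of one chat message.
pub struct MessageTokens {
    pub role: usize,         // tokens in the role string, e.g. "system"
    pub content: usize,      // tokens in the message content
    pub name: Option<usize>, // tokens in the optional name field
}

/// Total prompt tokens, including per-message and reply-priming overhead.
pub fn prompt_tokens(messages: &[MessageTokens]) -> usize {
    let per_message: usize = messages
        .iter()
        .map(|m| {
            TOKENS_PER_MESSAGE + m.role + m.content + m.name.map_or(0, |n| n + TOKENS_PER_NAME)
        })
        .sum();
    per_message + REPLY_PRIMING_TOKENS
}

/// Tokens left for the completion; saturates at 0 if the prompt is too long.
pub fn max_completion_tokens(context_window: usize, messages: &[MessageTokens]) -> usize {
    context_window.saturating_sub(prompt_tokens(messages))
}
```

For example, a 10-token system message and a 5-token user message with a 2-token name cost (3 + 1 + 10) + (3 + 1 + 5 + 2 + 1) + 3 = 29 prompt tokens, leaving 127,971 tokens in a 128,000-token window. Using `saturating_sub` reports 0 instead of underflowing when the prompt alone exceeds the window.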
Counting max_tokens parameter for a chat completion request with async-openai
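This requires enabling the async-openai feature of tiktoken-rs in your Cargo.toml (for example, `cargo add tiktoken-rs --features async-openai`), along with the async-openai crate itself for the message types.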
```rust
use tiktoken_rs::async_openai::get_chat_completion_max_tokens;
use async_openai::types::chat::{
    ChatCompletionRequestMessage, ChatCompletionRequestSystemMessage,
    ChatCompletionRequestSystemMessageContent, ChatCompletionRequestUserMessage,
    ChatCompletionRequestUserMessageContent,
};

fn main() {
    let messages = vec![
        ChatCompletionRequestMessage::System(ChatCompletionRequestSystemMessage {
            content: ChatCompletionRequestSystemMessageContent::Text(
                "You are a helpful assistant that only speaks French.".to_string(),
            ),
            name: None,
        }),
        ChatCompletionRequestMessage::User(ChatCompletionRequestUserMessage {
            content: ChatCompletionRequestUserMessageContent::Text(
                "Hello, how are you?".to_string(),
            ),
            name: None,
        }),
    ];
    // Same calculation as above, but taking async-openai message types directly.
    let max_tokens = get_chat_completion_max_tokens("o1-mini", &messages).unwrap();
    println!("max_tokens: {}", max_tokens);
}
```
tiktoken supports these encodings used by OpenAI models:
| Encoding name | OpenAI models |
|---|---|
| o200k_harmony | gpt-oss-20b, gpt-oss-120b |
| o200k_base | GPT-5 series, o1/o3/o4 series, gpt-4o, gpt-4.5, gpt-4.1, codex-* |
| cl100k_base | gpt-4, gpt-3.5-turbo, text-embedding-ada-002, text-embedding-3-* |
| p50k_base | Code models, text-davinci-002, text-davinci-003 |
| p50k_edit | Edit models like text-davinci-edit-001, code-davinci-edit-001 |
| r50k_base (or gpt2) | GPT-3 models like davinci |
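Choosing an encoding for a model name amounts to a prefix lookup over the encoding table, where more specific prefixes (such as `gpt-4o`) must be checked before shorter ones (such as `gpt-4`). The sketch below is a hypothetical `encoding_for_model` helper written against that table only. It is not the crate's own model-to-encoding mapping, and model names the table does not cover return `None`.

```rust
/// Hypothetical model-to-encoding lookup derived from the encoding table
/// (not the tiktoken-rs API). The first matching prefix wins, so more
/// specific prefixes must appear before shorter ones ("gpt-4o" before "gpt-4").
const MODEL_PREFIXES: &[(&str, &str)] = &[
    ("gpt-oss-", "o200k_harmony"),
    ("gpt-5", "o200k_base"),
    ("gpt-4.1", "o200k_base"),
    ("gpt-4.5", "o200k_base"),
    ("gpt-4o", "o200k_base"),
    ("o1", "o200k_base"),
    ("o3", "o200k_base"),
    ("o4", "o200k_base"),
    ("codex-", "o200k_base"),
    ("gpt-4", "cl100k_base"),
    ("gpt-3.5-turbo", "cl100k_base"),
    ("text-embedding-ada-002", "cl100k_base"),
    ("text-embedding-3-", "cl100k_base"),
    ("text-davinci-edit-", "p50k_edit"),
    ("code-davinci-edit-", "p50k_edit"),
    ("code-", "p50k_base"),
    ("text-davinci-002", "p50k_base"),
    ("text-davinci-003", "p50k_base"),
];

pub fn encoding_for_model(model: &str) -> Option<&'static str> {
    // The GPT-3 base model is matched by its exact name only.
    if model == "davinci" {
        return Some("r50k_base");
    }
    MODEL_PREFIXES
        .iter()
        .find(|(prefix, _)| model.starts_with(*prefix))
        .map(|&(_, encoding)| encoding)
}
```

An ordered slice keeps the precedence rules visible in one place; a `HashMap` would lose the ordering that makes `gpt-4o-mini` resolve to o200k_base rather than cl100k_base.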
Context sizes
| Model | Context window (tokens) |
|---|---|
| gpt-5.4, gpt-5.4-pro | 1,050,000 |
| gpt-4.1, gpt-4.1-mini, gpt-4.1-nano | 1,047,576 |
| gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.4-mini, gpt-5.4-nano | 400,000 |
| o1, o3, o3-mini, o3-pro, o4-mini | 200,000 |
| codex-mini | 200,000 |
| gpt-oss | 131,072 |
| gpt-4o, gpt-4o-mini | 128,000 |
| o1-mini, gpt-5.3-codex-spark | 128,000 |
| gpt-3.5-turbo | 16,385 |
| gpt-4 | 8,192 |
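A common use of the context-size table is checking whether a prompt plus the requested completion will fit before sending a request. The sketch below hard-codes the table as a hypothetical `context_window` lookup (not the tiktoken-rs API): it matches the exact model names listed, treats `gpt-oss` as a prefix for its variants, and returns `None` for anything else, including dated snapshot names. The hypothetical `fits` helper then compares the total against the window.

```rust
/// Hypothetical lookup built from the context-size table (not the tiktoken-rs API).
/// Exact model names only, except that any "gpt-oss*" name maps to the gpt-oss row.
pub fn context_window(model: &str) -> Option<usize> {
    let size = match model {
        "gpt-5.4" | "gpt-5.4-pro" => 1_050_000,
        "gpt-4.1" | "gpt-4.1-mini" | "gpt-4.1-nano" => 1_047_576,
        "gpt-5" | "gpt-5-mini" | "gpt-5-nano" | "gpt-5.4-mini" | "gpt-5.4-nano" => 400_000,
        "o1" | "o3" | "o3-mini" | "o3-pro" | "o4-mini" | "codex-mini" => 200_000,
        m if m.starts_with("gpt-oss") => 131_072,
        "gpt-4o" | "gpt-4o-mini" | "o1-mini" | "gpt-5.3-codex-spark" => 128_000,
        "gpt-3.5-turbo" => 16_385,
        "gpt-4" => 8_192,
        _ => return None,
    };
    Some(size)
}

/// True if prompt + completion tokens fit the model's window; false for unknown models.
pub fn fits(model: &str, prompt_tokens: usize, completion_tokens: usize) -> bool {
    context_window(model)
        .map_or(false, |window| prompt_tokens.saturating_add(completion_tokens) <= window)
}
```

Exact-name matching avoids silently assigning, say, gpt-4's 8,192-token window to an unlisted `gpt-4-*` variant; extending the table means adding names explicitly.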
See the examples in the repository for more use cases. For more context on the different tokenizers, see the OpenAI Cookbook.
Encountered any bugs?
If you encounter any bugs or have any suggestions for improvements, please open an issue on the repository.
Acknowledgements
Thanks to @spolu for the original code and the .tiktoken files.
License
This project is licensed under the MIT License.
Recent releases (data updated 2026-04-09 02:56:32):
2026-04-09 00:06:08 v0.11.0
2026-04-08 14:06:46 v0.10.0
2025-11-10 05:50:29 v0.9.1
2025-05-19 13:12:09 v0.7.0
2024-10-14 08:48:03 v0.6.0
2024-05-16 09:53:50 v0.5.9
2023-12-22 11:25:26 v0.5.8
2023-11-14 22:05:55 v0.5.7
2023-10-22 07:24:45 v0.5.6
2023-10-16 22:36:01 v0.5.5
Topics:
bpe, openai, rust, tokenizer
