v0.5.0
Release date: 2024-12-21 04:50:31
Highlights
We are releasing torchtune v0.5.0 with lots of exciting new features! This release includes Kaggle integration, a QAT + LoRA training recipe, improved integrations with Hugging Face and vLLM, Gemma 2 models, an early-exit finetuning recipe for LayerSkip, and support for Ascend NPU devices.
Kaggle integration (#2002)
torchtune is proud to announce our integration with Kaggle! You can now finetune all your favorite models with torchtune in Kaggle notebooks via the Kaggle model hub integration. Download a model from the Kaggle Hub, finetune it on your dataset with any torchtune recipe, then upload your best checkpoint back to the Kaggle Hub to share with the community. Check out our example Kaggle notebook here to get started!
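As a rough sketch (assuming the kaggle:// source prefix added in #2002; the model handle below is illustrative, not a real listing), downloading a Kaggle-hosted model could look like:
# Download a model hosted on the Kaggle Hub (handle is illustrative)
tune download kaggle://metaresearch/llama-3.2/pyTorch/3b-instruct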
QAT + LoRA training recipe (#1931)
If you've seen the Llama 3.2 quantized models, you may know that they were trained using quantization-aware training with LoRA adapters. This is an effective way to maintain good model performance when you need to quantize for on-device inference. Now you can train your own quant-friendly LoRA models in torchtune with our QAT + LoRA recipe!
To finetune Llama 3.2 3B with QAT + LoRA, you can run:
# Download Llama 3.2 3B
tune download meta-llama/Llama-3.2-3B-Instruct --ignore-patterns "original/consolidated.00.pth"
# Finetune on two devices
tune run --nproc_per_node 2 qat_lora_finetune_distributed --config llama3_2/3B_qat_lora
Improved Hugging Face and vLLM integration (#2074)
We heard your feedback, and we're happy to say that it's now easier than ever to load your torchtune models into Hugging Face or vLLM! It's as simple as:
from transformers import AutoModelForCausalLM
trained_model_path = "/path/to/my/torchtune/checkpoint"
model = AutoModelForCausalLM.from_pretrained(
pretrained_model_name_or_path=trained_model_path,
)
See the full examples in our docs: Hugging Face, vLLM.
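For vLLM, a minimal sketch (assuming vLLM is installed and the checkpoint directory is in Hugging Face format) could look like:
# Generate from the same checkpoint with vLLM's offline inference API
from vllm import LLM, SamplingParams
trained_model_path = "/path/to/my/torchtune/checkpoint"
llm = LLM(model=trained_model_path)
outputs = llm.generate(["Tell me a joke."], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)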
Gemma 2 models (#1835)
We now support models from the Gemma 2 family! This includes the 2B, 9B, and 27B sizes, with recipes for full, LoRA, and QLoRA finetuning on one or more devices. For example, you can finetune Gemma 2 27B with QLoRA by running:
# Download Gemma 2 27B
tune download google/gemma-2-27b --ignore-patterns "gemma-2-27b.gguf"
# Finetune on a single GPU
tune run lora_finetune_single_device --config gemma2/27B_qlora_single_device
A huge thanks to @Optimox for landing these models!
Early exit training recipe (#1076)
LayerSkip is an end-to-end solution for speeding up LLM inference. By combining layer dropout with an appropriate dropout schedule and applying an early exit loss during training, you can improve the accuracy of early exits at inference time. You can use our early exit config to reproduce experiments from LayerSkip, LayerDrop, and other papers.
You can try torchtune's early exit recipe by running the following:
# Download Llama2 7B
tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf
# Finetune with early exit on four devices
tune run --nnodes 1 --nproc_per_node 4 dev/early_exit_finetune_distributed --config recipes/dev/7B_full_early_exit.yaml
NPU support (#1826)
We are excited to share that torchtune can now be used on Ascend NPU devices! All your favorite single-device recipes can be run as-is, with support for distributed recipes coming later. A huge thanks to @noemotiovon for their work to enable this!
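As a sketch (assuming the usual key=value config override syntax; the config name is just one example, and the device=npu override is an assumption), running a single-device recipe on an Ascend NPU could look like:
# Assumption: override the config's device field to target the NPU
tune run lora_finetune_single_device --config llama3_2/3B_lora_single_device device=npu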
What's Changed
- nit: Correct compile_loss return type hint by @bradhilton in https://github.com/pytorch/torchtune/pull/1940
- Fix grad accum + FSDP CPU offload, pass None via CLI by @ebsmothers in https://github.com/pytorch/torchtune/pull/1941
- QAT tutorial nit by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1945
- A more encompassing fix for offloading + ac by @janeyx99 in https://github.com/pytorch/torchtune/pull/1936
- Add Qwen2.5 to live docs by @RdoubleA in https://github.com/pytorch/torchtune/pull/1949
- [Bug] model_type argument as str for checkpoints classes by @smujjiga in https://github.com/pytorch/torchtune/pull/1946
- llama3.2 90b config updates + nits by @RdoubleA in https://github.com/pytorch/torchtune/pull/1950
- Add Ascend NPU as a backend by @noemotiovon in https://github.com/pytorch/torchtune/pull/1826
- fix missing key by @felipemello1 in https://github.com/pytorch/torchtune/pull/1952
- update memory optimization tutorial by @felipemello1 in https://github.com/pytorch/torchtune/pull/1948
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/1954
- add expandable segment to integration tests by @felipemello1 in https://github.com/pytorch/torchtune/pull/1963
- Fix check in `load_from_full_state_dict` for modified state dicts by @RylanC24 in https://github.com/pytorch/torchtune/pull/1967
- Update torchtune generation to be more flexible by @RylanC24 in https://github.com/pytorch/torchtune/pull/1970
- feat: add gemma2b variants by @Optimox in https://github.com/pytorch/torchtune/pull/1835
- typo by @felipemello1 in https://github.com/pytorch/torchtune/pull/1972
- Update QAT: add grad clipping, torch.compile, collate fn by @andrewor14 in https://github.com/pytorch/torchtune/pull/1854
- VQA Documentation by @calvinpelletier in https://github.com/pytorch/torchtune/pull/1974
- Convert all non-rgb images to rgb by @vancoykendall in https://github.com/pytorch/torchtune/pull/1976
- Early fusion multimodal models by @RdoubleA in https://github.com/pytorch/torchtune/pull/1904
- Refactor Recipe State Dict Code by @pbontrager in https://github.com/pytorch/torchtune/pull/1964
- Update KV Cache to use num_kv_heads instead of num_heads by @mirceamironenco in https://github.com/pytorch/torchtune/pull/1961
- Migrate to `epochs: 1` in all configs by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1981
- Make sure CLIP resized pos_embed is contiguous by @gau-nernst in https://github.com/pytorch/torchtune/pull/1986
- Add `**quantization_kwargs` to `FrozenNF4Linear`, `LoRALinear`, and `DoRALinear` by @joecummings in https://github.com/pytorch/torchtune/pull/1987
- Enables Python 3.13 for nightly builds by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1988
- DOC Fixes custom message transform example by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1983
- Pass quantization_kwargs to CLIP builders by @joecummings in https://github.com/pytorch/torchtune/pull/1994
- Adding MM eval tests / attention bugfixes by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1989
- Update Qwen2.5 configs by @joecummings in https://github.com/pytorch/torchtune/pull/1999
- nit: Fix/add some type annotations by @bradhilton in https://github.com/pytorch/torchtune/pull/1982
- Fixing `special_tokens` arg in `Llama3VisionTransform` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2000
- Recent updates to the README by @joecummings in https://github.com/pytorch/torchtune/pull/1979
- Bump version to 0.5.0 by @joecummings in https://github.com/pytorch/torchtune/pull/2009
- gemma2 had wrong path to scheduler by @felipemello1 in https://github.com/pytorch/torchtune/pull/2013
- Create _export directory in torchtune by @Jack-Khuu in https://github.com/pytorch/torchtune/pull/2011
- torchrun defaults for concurrent distributed training jobs by @ebsmothers in https://github.com/pytorch/torchtune/pull/2015
- Remove unused FSDP components by @ebsmothers in https://github.com/pytorch/torchtune/pull/2016
- 2D RoPE + CLIP updates by @RdoubleA in https://github.com/pytorch/torchtune/pull/1973
- Some KD recipe cleanup by @ebsmothers in https://github.com/pytorch/torchtune/pull/2020
- Remove lr_scheduler requirement in lora_dpo_single_device by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1991
- chore: remove PyTorch 2.5.0 checks by @JP-sDEV in https://github.com/pytorch/torchtune/pull/1877
- Make tokenize tests readable by @krammnic in https://github.com/pytorch/torchtune/pull/1868
- add flags to readme by @felipemello1 in https://github.com/pytorch/torchtune/pull/2003
- Support for unsharded parameters in state_dict APIs by @RdoubleA in https://github.com/pytorch/torchtune/pull/2023
- [WIP] Reducing eval vision tests runtime by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2022
- log rank zero everywhere by @RdoubleA in https://github.com/pytorch/torchtune/pull/2030
- Add LR Scheduler to full finetune distributed by @parthsarthi03 in https://github.com/pytorch/torchtune/pull/2017
- Fix Qlora/lora for 3.2 vision by @felipemello1 in https://github.com/pytorch/torchtune/pull/2028
- CLIP Text Encoder by @calvinpelletier in https://github.com/pytorch/torchtune/pull/1969
- feat(cli): allow users to download models from Kaggle by @KeijiBranshi in https://github.com/pytorch/torchtune/pull/2002
- remove default to ignore safetensors by @felipemello1 in https://github.com/pytorch/torchtune/pull/2042
- Remove deprecated `TiedEmbeddingTransformerDecoder` by @EmilyIsCoding in https://github.com/pytorch/torchtune/pull/2047
- Use hf transfer as default by @felipemello1 in https://github.com/pytorch/torchtune/pull/2046
- Fix issue in loading mixed precision vocab pruned models during torchtune generation for evaluation by @ifed-ucsd in https://github.com/pytorch/torchtune/pull/2043
- [export] Add exportable attention and kv cache by @larryliu0820 in https://github.com/pytorch/torchtune/pull/2049
- Switch to PyTorch's built-in RMSNorm by @calvinpelletier in https://github.com/pytorch/torchtune/pull/2054
- [export] Add exportable position embedding by @larryliu0820 in https://github.com/pytorch/torchtune/pull/2068
- MM Docs nits by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2067
- Add support for QAT + LoRA by @andrewor14 in https://github.com/pytorch/torchtune/pull/1931
- Add ability to shard custom layers for DPO and LoRA distributed by @joecummings in https://github.com/pytorch/torchtune/pull/2072
- [ez] remove stale pytorch version check by @ebsmothers in https://github.com/pytorch/torchtune/pull/2075
- Fail early with `packed=True` on MM datasets by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2080
- Error message on `packed=True` for stack exchange dataset by @joecummings in https://github.com/pytorch/torchtune/pull/2079
- Fix nightly tests for qat_lora_fintune_distributed by @andrewor14 in https://github.com/pytorch/torchtune/pull/2085
- Update build_linux_wheels.yaml - Pass test-infra input params by @atalman in https://github.com/pytorch/torchtune/pull/2086
- DPO Activation Offloading by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2087
- Deprecate `SimpoLoss` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2063
- DPO Recipe Doc by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2091
- initial commit by @songhappy in https://github.com/pytorch/torchtune/pull/1953
- Vector Quantized Embeddings by @RdoubleA in https://github.com/pytorch/torchtune/pull/2040
- Fix bug in loading multimodal datasets and update tests accordingly by @joecummings in https://github.com/pytorch/torchtune/pull/2110
- Set gloo process group for FSDP with CPU offload by @ebsmothers in https://github.com/pytorch/torchtune/pull/2108
- Llama 3.3 70B by @pbontrager in https://github.com/pytorch/torchtune/pull/2124
- Llama 3.3 readme updates by @ebsmothers in https://github.com/pytorch/torchtune/pull/2125
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/2107
- Reduce logging output for distributed KD by @joecummings in https://github.com/pytorch/torchtune/pull/2120
- Support Early Exit Loss and/or Layer Dropout by @mostafaelhoushi in https://github.com/pytorch/torchtune/pull/1076
- Update checkpointing directory -> using vLLM and from_pretrained by @felipemello1 in https://github.com/pytorch/torchtune/pull/2074
- pass correct arg by @felipemello1 in https://github.com/pytorch/torchtune/pull/2127
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/2128
- fix qat_lora_test by @felipemello1 in https://github.com/pytorch/torchtune/pull/2131
- guard ckpt imports by @felipemello1 in https://github.com/pytorch/torchtune/pull/2133
- [bug fix] add parents=True by @felipemello1 in https://github.com/pytorch/torchtune/pull/2136
- [bug fix] re-add model by @felipemello1 in https://github.com/pytorch/torchtune/pull/2135
- Update save sizes into GiB by @joecummings in https://github.com/pytorch/torchtune/pull/2143
- [bug fix] remove config download when source is kaggle by @felipemello1 in https://github.com/pytorch/torchtune/pull/2144
- [fix] remove "with_suffix" by @felipemello1 in https://github.com/pytorch/torchtune/pull/2146
- DoRA fixes by @ebsmothers in https://github.com/pytorch/torchtune/pull/2139
- [Fix] Llama 3.2 Vision decoder_trainable flag fixed by @pbontrager in https://github.com/pytorch/torchtune/pull/2150
New Contributors
- @bradhilton made their first contribution in https://github.com/pytorch/torchtune/pull/1940
- @smujjiga made their first contribution in https://github.com/pytorch/torchtune/pull/1946
- @noemotiovon made their first contribution in https://github.com/pytorch/torchtune/pull/1826
- @RylanC24 made their first contribution in https://github.com/pytorch/torchtune/pull/1967
- @vancoykendall made their first contribution in https://github.com/pytorch/torchtune/pull/1976
- @Jack-Khuu made their first contribution in https://github.com/pytorch/torchtune/pull/2011
- @JP-sDEV made their first contribution in https://github.com/pytorch/torchtune/pull/1877
- @KeijiBranshi made their first contribution in https://github.com/pytorch/torchtune/pull/2002
- @EmilyIsCoding made their first contribution in https://github.com/pytorch/torchtune/pull/2047
- @ifed-ucsd made their first contribution in https://github.com/pytorch/torchtune/pull/2043
- @larryliu0820 made their first contribution in https://github.com/pytorch/torchtune/pull/2049
- @atalman made their first contribution in https://github.com/pytorch/torchtune/pull/2086
- @songhappy made their first contribution in https://github.com/pytorch/torchtune/pull/1953
- @mostafaelhoushi made their first contribution in https://github.com/pytorch/torchtune/pull/1076
Full Changelog: https://github.com/pytorch/torchtune/compare/v0.4.0...v0.5.0