v0.5.0
Release date: 2024-12-21 04:50:31
Highlights
We are releasing torchtune v0.5.0 with lots of exciting new features! This release includes Kaggle integration, a QAT + LoRA training recipe, improved integrations with Hugging Face and vLLM, Gemma 2 models, an early-exit finetuning recipe for LayerSkip, and support for Ascend NPU devices.
Kaggle integration (#2002)
torchtune is proud to announce our integration with Kaggle! You can now finetune all your favorite models with torchtune in Kaggle notebooks via the Kaggle model hub integration. Download a model from the Kaggle Hub, finetune it on your dataset with any torchtune recipe, then upload your best checkpoint back to the Kaggle Hub to share with the community. Check out our example Kaggle notebook here to get started!
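As a rough sketch (assuming the kaggle:// source prefix added in #2002; the model handle below is illustrative, not a real listing), downloading a Kaggle-hosted model could look like:
# Download a model hosted on the Kaggle Hub (handle is illustrative)
tune download kaggle://metaresearch/llama-3.2/pyTorch/3b-instruct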
QAT + LoRA training recipe (#1931)
If you've seen the Llama 3.2 quantized models, you may know that they were trained using quantization-aware training with LoRA adapters. This is an effective way to maintain good model performance when you need to quantize for on-device inference. Now you can train your own quant-friendly LoRA models in torchtune with our QAT + LoRA recipe!
To finetune Llama 3.2 3B with QAT + LoRA, you can run:
# Download Llama 3.2 3B
tune download meta-llama/Llama-3.2-3B-Instruct --ignore-patterns "original/consolidated.00.pth"
# Finetune on two devices
tune run --nproc_per_node 2 qat_lora_finetune_distributed --config llama3_2/3B_qat_lora
Improved Hugging Face and vLLM integration (#2074)
We heard your feedback, and we're happy to say that it's now easier than ever to load your torchtune models into Hugging Face or vLLM! It's as simple as:
from transformers import AutoModelForCausalLM
trained_model_path = "/path/to/my/torchtune/checkpoint"
model = AutoModelForCausalLM.from_pretrained(
pretrained_model_name_or_path=trained_model_path,
)
See the full examples in our docs: Hugging Face, vLLM.
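For vLLM, a minimal sketch (assuming vLLM is installed and the checkpoint directory is in Hugging Face format) could look like:
# Generate from the same checkpoint with vLLM's offline inference API
from vllm import LLM, SamplingParams
trained_model_path = "/path/to/my/torchtune/checkpoint"
llm = LLM(model=trained_model_path)
outputs = llm.generate(["Tell me a joke."], SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)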
Gemma 2 models (#1835)
We now support models from the Gemma 2 family! This includes the 2B, 9B, and 27B sizes, with recipes for full, LoRA, and QLoRA finetuning on one or more devices. For example, you can finetune Gemma 2 27B with QLoRA by running:
# Download Gemma 2 27B
tune download google/gemma-2-27b --ignore-patterns "gemma-2-27b.gguf"
# Finetune on a single GPU
tune run lora_finetune_single_device --config gemma2/27B_qlora_single_device
A huge thanks to @Optimox for landing these models!
Early exit training recipe (#1076)
LayerSkip is an end-to-end solution for speeding up LLM inference. By combining layer dropout with an appropriate dropout schedule and applying an early exit loss during training, you can improve the accuracy of early exits at inference time. You can use our early exit config to reproduce experiments from LayerSkip, LayerDrop, and other papers.
You can try torchtune's early exit recipe by running the following:
# Download Llama2 7B
tune download meta-llama/Llama-2-7b-hf --output-dir /tmp/Llama-2-7b-hf
# Finetune with early exit on four devices
tune run --nnodes 1 --nproc_per_node 4 dev/early_exit_finetune_distributed --config recipes/dev/7B_full_early_exit.yaml
NPU support (#1826)
We are excited to share that torchtune can now be used on Ascend NPU devices! All your favorite single-device recipes can be run as-is, with support for distributed recipes coming later. A huge thanks to @noemotiovon for their work to enable this!
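As a sketch (assuming the usual key=value config override syntax; the config name is just one example, and the device=npu override is an assumption), running a single-device recipe on an Ascend NPU could look like:
# Assumption: override the config's device field to target the NPU
tune run lora_finetune_single_device --config llama3_2/3B_lora_single_device device=npu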
What's Changed
- nit: Correct compile_loss return type hint by @bradhilton in https://github.com/pytorch/torchtune/pull/1940
- Fix grad accum + FSDP CPU offload, pass None via CLI by @ebsmothers in https://github.com/pytorch/torchtune/pull/1941
- QAT tutorial nit by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1945
- A more encompassing fix for offloading + ac by @janeyx99 in https://github.com/pytorch/torchtune/pull/1936
- Add Qwen2.5 to live docs by @RdoubleA in https://github.com/pytorch/torchtune/pull/1949
- [Bug] model_type argument as str for checkpoints classes by @smujjiga in https://github.com/pytorch/torchtune/pull/1946
- llama3.2 90b config updates + nits by @RdoubleA in https://github.com/pytorch/torchtune/pull/1950
- Add Ascend NPU as a backend by @noemotiovon in https://github.com/pytorch/torchtune/pull/1826
- fix missing key by @felipemello1 in https://github.com/pytorch/torchtune/pull/1952
- update memory optimization tutorial by @felipemello1 in https://github.com/pytorch/torchtune/pull/1948
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/1954
- add expandable segment to integration tests by @felipemello1 in https://github.com/pytorch/torchtune/pull/1963
- Fix check in `load_from_full_state_dict` for modified state dicts by @RylanC24 in https://github.com/pytorch/torchtune/pull/1967
- Update torchtune generation to be more flexible by @RylanC24 in https://github.com/pytorch/torchtune/pull/1970
- feat: add gemma2b variants by @Optimox in https://github.com/pytorch/torchtune/pull/1835
- typo by @felipemello1 in https://github.com/pytorch/torchtune/pull/1972
- Update QAT: add grad clipping, torch.compile, collate fn by @andrewor14 in https://github.com/pytorch/torchtune/pull/1854
- VQA Documentation by @calvinpelletier in https://github.com/pytorch/torchtune/pull/1974
- Convert all non-rgb images to rgb by @vancoykendall in https://github.com/pytorch/torchtune/pull/1976
- Early fusion multimodal models by @RdoubleA in https://github.com/pytorch/torchtune/pull/1904
- Refactor Recipe State Dict Code by @pbontrager in https://github.com/pytorch/torchtune/pull/1964
- Update KV Cache to use num_kv_heads instead of num_heads by @mirceamironenco in https://github.com/pytorch/torchtune/pull/1961
- Migrate to `epochs: 1` in all configs by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1981
- Make sure CLIP resized pos_embed is contiguous by @gau-nernst in https://github.com/pytorch/torchtune/pull/1986
- Add `**quantization_kwargs` to `FrozenNF4Linear`, `LoRALinear`, and `DoRALinear` by @joecummings in https://github.com/pytorch/torchtune/pull/1987
- Enables Python 3.13 for nightly builds by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1988
- DOC Fixes custom message transform example by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1983
- Pass quantization_kwargs to CLIP builders by @joecummings in https://github.com/pytorch/torchtune/pull/1994
- Adding MM eval tests / attention bugfixes by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/1989
- Update Qwen2.5 configs by @joecummings in https://github.com/pytorch/torchtune/pull/1999
- nit: Fix/add some type annotations by @bradhilton in https://github.com/pytorch/torchtune/pull/1982
- Fixing `special_tokens` arg in `Llama3VisionTransform` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2000
- Recent updates to the README by @joecummings in https://github.com/pytorch/torchtune/pull/1979
- Bump version to 0.5.0 by @joecummings in https://github.com/pytorch/torchtune/pull/2009
- gemma2 had wrong path to scheduler by @felipemello1 in https://github.com/pytorch/torchtune/pull/2013
- Create _export directory in torchtune by @Jack-Khuu in https://github.com/pytorch/torchtune/pull/2011
- torchrun defaults for concurrent distributed training jobs by @ebsmothers in https://github.com/pytorch/torchtune/pull/2015
- Remove unused FSDP components by @ebsmothers in https://github.com/pytorch/torchtune/pull/2016
- 2D RoPE + CLIP updates by @RdoubleA in https://github.com/pytorch/torchtune/pull/1973
- Some KD recipe cleanup by @ebsmothers in https://github.com/pytorch/torchtune/pull/2020
- Remove lr_scheduler requirement in lora_dpo_single_device by @thomasjpfan in https://github.com/pytorch/torchtune/pull/1991
- chore: remove PyTorch 2.5.0 checks by @JP-sDEV in https://github.com/pytorch/torchtune/pull/1877
- Make tokenize tests readable by @krammnic in https://github.com/pytorch/torchtune/pull/1868
- add flags to readme by @felipemello1 in https://github.com/pytorch/torchtune/pull/2003
- Support for unsharded parameters in state_dict APIs by @RdoubleA in https://github.com/pytorch/torchtune/pull/2023
- [WIP] Reducing eval vision tests runtime by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2022
- log rank zero everywhere by @RdoubleA in https://github.com/pytorch/torchtune/pull/2030
- Add LR Scheduler to full finetune distributed by @parthsarthi03 in https://github.com/pytorch/torchtune/pull/2017
- Fix Qlora/lora for 3.2 vision by @felipemello1 in https://github.com/pytorch/torchtune/pull/2028
- CLIP Text Encoder by @calvinpelletier in https://github.com/pytorch/torchtune/pull/1969
- feat(cli): allow users to download models from Kaggle by @KeijiBranshi in https://github.com/pytorch/torchtune/pull/2002
- remove default to ignore safetensors by @felipemello1 in https://github.com/pytorch/torchtune/pull/2042
- Remove deprecated `TiedEmbeddingTransformerDecoder` by @EmilyIsCoding in https://github.com/pytorch/torchtune/pull/2047
- Use hf transfer as default by @felipemello1 in https://github.com/pytorch/torchtune/pull/2046
- Fix issue in loading mixed precision vocab pruned models during torchtune generation for evaluation by @ifed-ucsd in https://github.com/pytorch/torchtune/pull/2043
- [export] Add exportable attention and kv cache by @larryliu0820 in https://github.com/pytorch/torchtune/pull/2049
- Switch to PyTorch's built-in RMSNorm by @calvinpelletier in https://github.com/pytorch/torchtune/pull/2054
- [export] Add exportable position embedding by @larryliu0820 in https://github.com/pytorch/torchtune/pull/2068
- MM Docs nits by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2067
- Add support for QAT + LoRA by @andrewor14 in https://github.com/pytorch/torchtune/pull/1931
- Add ability to shard custom layers for DPO and LoRA distributed by @joecummings in https://github.com/pytorch/torchtune/pull/2072
- [ez] remove stale pytorch version check by @ebsmothers in https://github.com/pytorch/torchtune/pull/2075
- Fail early with `packed=True` on MM datasets by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2080
- Error message on `packed=True` for stack exchange dataset by @joecummings in https://github.com/pytorch/torchtune/pull/2079
- Fix nightly tests for qat_lora_fintune_distributed by @andrewor14 in https://github.com/pytorch/torchtune/pull/2085
- Update build_linux_wheels.yaml - Pass test-infra input params by @atalman in https://github.com/pytorch/torchtune/pull/2086
- DPO Activation Offloading by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2087
- Deprecate `SimpoLoss` by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2063
- DPO Recipe Doc by @SalmanMohammadi in https://github.com/pytorch/torchtune/pull/2091
- initial commit by @songhappy in https://github.com/pytorch/torchtune/pull/1953
- Vector Quantized Embeddings by @RdoubleA in https://github.com/pytorch/torchtune/pull/2040
- Fix bug in loading multimodal datasets and update tests accordingly by @joecummings in https://github.com/pytorch/torchtune/pull/2110
- Set gloo process group for FSDP with CPU offload by @ebsmothers in https://github.com/pytorch/torchtune/pull/2108
- Llama 3.3 70B by @pbontrager in https://github.com/pytorch/torchtune/pull/2124
- Llama 3.3 readme updates by @ebsmothers in https://github.com/pytorch/torchtune/pull/2125
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/2107
- Reduce logging output for distributed KD by @joecummings in https://github.com/pytorch/torchtune/pull/2120
- Support Early Exit Loss and/or Layer Dropout by @mostafaelhoushi in https://github.com/pytorch/torchtune/pull/1076
- Update checkpointing directory -> using vLLM and from_pretrained by @felipemello1 in https://github.com/pytorch/torchtune/pull/2074
- pass correct arg by @felipemello1 in https://github.com/pytorch/torchtune/pull/2127
- update configs by @felipemello1 in https://github.com/pytorch/torchtune/pull/2128
- fix qat_lora_test by @felipemello1 in https://github.com/pytorch/torchtune/pull/2131
- guard ckpt imports by @felipemello1 in https://github.com/pytorch/torchtune/pull/2133
- [bug fix] add parents=True by @felipemello1 in https://github.com/pytorch/torchtune/pull/2136
- [bug fix] re-add model by @felipemello1 in https://github.com/pytorch/torchtune/pull/2135
- Update save sizes into GiB by @joecummings in https://github.com/pytorch/torchtune/pull/2143
- [bug fix] remove config download when source is kaggle by @felipemello1 in https://github.com/pytorch/torchtune/pull/2144
- [fix] remove "with_suffix" by @felipemello1 in https://github.com/pytorch/torchtune/pull/2146
- DoRA fixes by @ebsmothers in https://github.com/pytorch/torchtune/pull/2139
- [Fix] Llama 3.2 Vision decoder_trainable flag fixed by @pbontrager in https://github.com/pytorch/torchtune/pull/2150
New Contributors
- @bradhilton made their first contribution in https://github.com/pytorch/torchtune/pull/1940
- @smujjiga made their first contribution in https://github.com/pytorch/torchtune/pull/1946
- @noemotiovon made their first contribution in https://github.com/pytorch/torchtune/pull/1826
- @RylanC24 made their first contribution in https://github.com/pytorch/torchtune/pull/1967
- @vancoykendall made their first contribution in https://github.com/pytorch/torchtune/pull/1976
- @Jack-Khuu made their first contribution in https://github.com/pytorch/torchtune/pull/2011
- @JP-sDEV made their first contribution in https://github.com/pytorch/torchtune/pull/1877
- @KeijiBranshi made their first contribution in https://github.com/pytorch/torchtune/pull/2002
- @EmilyIsCoding made their first contribution in https://github.com/pytorch/torchtune/pull/2047
- @ifed-ucsd made their first contribution in https://github.com/pytorch/torchtune/pull/2043
- @larryliu0820 made their first contribution in https://github.com/pytorch/torchtune/pull/2049
- @atalman made their first contribution in https://github.com/pytorch/torchtune/pull/2086
- @songhappy made their first contribution in https://github.com/pytorch/torchtune/pull/1953
- @mostafaelhoushi made their first contribution in https://github.com/pytorch/torchtune/pull/1076
Full Changelog: https://github.com/pytorch/torchtune/compare/v0.4.0...v0.5.0