transformers/docker

Latest commit by Marc Sun (28de2f4de3): [Quantization] Quanto quantizer (#29023)
* start integration
* fix
* add and debug tests
* update tests
* make PyTorch serialization work
* compatible with device_map and offload
* fix tests
* make style
* add ref
* guard against safetensors
* add float8 and style
* fix is_serializable
* Fix shard_checkpoint compatibility with quanto
* more tests
* docs
* adjust memory
* better
* style
* pass tests
* Update src/transformers/modeling_utils.py (Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>)
* add is_safe_serialization instead
* Update src/transformers/quantizers/quantizer_quanto.py (Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>)
* add QbitsTensor tests
* fix tests
* simplify activation list
* Update docs/source/en/quantization.md (Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>)
* better comment
* Update tests/quantization/quanto_integration/test_quanto.py (Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>)
* Update tests/quantization/quanto_integration/test_quanto.py (Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>)
* find and fix edge case
* Update docs/source/en/quantization.md (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* pass weights_only_kwarg instead
* fix shard_checkpoint loading
* simplify update_missing_keys
* Update tests/quantization/quanto_integration/test_quanto.py (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* recursion to get all tensors
* block serialization
* skip serialization tests
* fix
* change to cuda:0 for now
* fix regression
* update device_map
* fix doc
* add notebook
* update torch_dtype
* update doc
* fix typo
* fix typo
* remove comment

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: David Corvoysier <david.corvoysier@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
2024-03-15 11:51:29 -04:00
| Directory | Latest commit | Date |
| --- | --- | --- |
| transformers-all-latest-gpu | Use torch 2.2 for daily CI (model tests) (#29208) | 2024-02-23 21:37:08 +08:00 |
| transformers-doc-builder | Use python 3.10 for docbuild (#28399) | 2024-01-11 14:39:49 +01:00 |
| transformers-gpu | TF: TF 2.10 unpin + related onnx test skips (#18995) | 2022-09-12 19:30:27 +01:00 |
| transformers-past-gpu | Byebye pytorch 1.9 (#24080) | 2023-06-16 16:38:23 +02:00 |
| transformers-pytorch-amd-gpu | Add deepspeed test to amd scheduled CI (#27633) | 2023-12-11 16:33:36 +01:00 |
| transformers-pytorch-deepspeed-amd-gpu | Add deepspeed test to amd scheduled CI (#27633) | 2023-12-11 16:33:36 +01:00 |
| transformers-pytorch-deepspeed-latest-gpu | Use torch 2.2 for deepspeed CI (#29246) | 2024-02-27 17:51:37 +08:00 |
| transformers-pytorch-deepspeed-nightly-gpu | Update CUDA versions for DeepSpeed (#27853) | 2023-12-05 16:15:21 -05:00 |
| transformers-pytorch-gpu | [SDPA] Make sure attn mask creation is always done on CPU (#28400) | 2024-01-09 11:05:19 +01:00 |
| transformers-pytorch-tpu | Rename master to main for notebooks links and leftovers (#16397) | 2022-03-25 09:12:23 -04:00 |
| transformers-quantization-latest-gpu | [Quantization] Quanto quantizer (#29023) | 2024-03-15 11:51:29 -04:00 |
| transformers-tensorflow-gpu | Update TF pin in docker image (#25343) | 2023-08-07 12:32:34 +02:00 |