Commit Graph

3665 Commits

Author SHA1 Message Date
Fanli Lin fa7f3cf336
[tests] enable test_pipeline_accelerate_top_p on XPU (#29309)
* use torch_device

* Update tests/pipelines/test_pipelines_text_generation.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix style

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-05 09:16:05 +01:00
Ilyas Moutawwakil 4fc708f98c
Exllama kernels support for AWQ models (#28634)
* added exllama kernels support for awq models

* doc

* style

* Update src/transformers/modeling_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* refactor

* moved exllama post init to after device dispatching

* bump autoawq version

* added exllama test

* style

* configurable exllama kernels

* copy exllama_config from gptq

* moved exllama version check to post init

* moved to quantization dockerfile

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-03-05 03:22:48 +01:00
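Below is a minimal sketch of how the new ExLlama kernels might be enabled for an AWQ checkpoint; the model id is an illustrative assumption, and autoawq must be installed at the bumped version.

```python
# Sketch: enable ExLlama kernels when loading an AWQ-quantized model.
# The checkpoint id is an assumption for illustration.
from transformers import AutoModelForCausalLM, AwqConfig

quantization_config = AwqConfig(version="exllama")  # switch from the default GEMM kernels

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
    quantization_config=quantization_config,
    device_map="auto",
)
```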
NielsRogge 836921fdeb
Add UDOP (#22940)
* First draft

* More improvements

* More improvements

* More fixes

* Fix copies

* More improvements

* More fixes

* More improvements

* Convert checkpoint

* More improvements, set up tests

* Fix more tests

* Add UdopModel

* More improvements

* Fix equivalence test

* More fixes

* Redesign model

* Extend conversion script

* Use real inputs for conversion script

* Add image processor

* Improve conversion script

* Add UdopTokenizer

* Add fast tokenizer

* Add converter

* Update README's

* Add processor

* Add fully fledged tokenizer

* Add fast tokenizer

* Use processor in conversion script

* Add tokenizer tests

* Fix one more test

* Fix more tests

* Fix tokenizer tests

* Enable fast tokenizer tests

* Fix more tests

* Fix additional_special_tokens of fast tokenizer

* Fix tokenizer tests

* Fix more tests

* Fix equivalence test

* Rename image to pixel_values

* Rename seg_data to bbox

* More renamings

* Remove vis_special_token

* More improvements

* Add docs

* Fix copied from

* Update slow tokenizer

* Update fast tokenizer design

* Make text input optional

* Add first draft of processor tests

* Fix more processor tests

* Fix decoder_start_token_id

* Fix test_initialization

* Add integration test

* More improvements

* Improve processor, add test

* Add more copied from

* Add more copied from

* Add more copied from

* Add more copied from

* Remove print statement

* Update README and auto mapping

* Delete files

* Delete another file

* Remove code

* Fix test

* Fix docs

* Remove asserts

* Add doc tests

* Include UDOP in exotic model tests

* Add expected tesseract decodings

* Add sentencepiece

* Use same design as T5

* Add UdopEncoderModel

* Add UdopEncoderModel to tests

* More fixes

* Fix fast tokenizer

* Fix one more test

* Remove parallelisable attribute

* Fix copies

* Remove legacy file

* Copy from T5Tokenizer

* Fix rebase

* More fixes, copy from T5

* More fixes

* Fix init

* Use ArthurZ/udop for tests

* Make all model tests pass

* Remove UdopForConditionalGeneration from auto mapping

* Fix more tests

* fixups

* more fixups

* fix the tokenizers

* remove un-necessary changes

* nits

* nits

* replace truncate_sequences_boxes with truncate_sequences for fix-copies

* nit current path

* add a test for input ids

* ids that we should get taken from c9f7a32f57

* nits converting

* nits

* apply ruff

* nits

* nits

* style

* fix slow order of addition

* fix udop fast range as well

* fixup

* nits

* Add docstrings

* Fix gradient checkpointing

* Update code examples

* Skip tests

* Update integration test

* Address comment

* Make fixup

* Remove extra ids from tokenizer

* Skip test

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update year

* Address comment

* Address more comments

* Address comments

* Add copied from

* Update CI

* Rename script

* Update model id

* Add AddedToken, skip tests

* Update CI

* Fix doc tests

* Do not use Tesseract for the doc tests

* Remove kwargs

* Add original inputs

* Update casting

* Fix doc test

* Update question

* Update question

* Use LayoutLMv3ImageProcessor

* Update organization

* Improve docs

* Update forward signature

* Make images optional

* Remove deprecated device argument

* Add comment, add add_prefix_space

* More improvements

* Remove kwargs

---------

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-04 18:49:02 +01:00
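A hedged sketch of using the newly added UDOP model for document question answering; the image path and prompt are illustrative assumptions.

```python
# Sketch: document question answering with the new UDOP model.
# The image path and prompt are assumptions for illustration.
from PIL import Image
from transformers import AutoProcessor, UdopForConditionalGeneration

processor = AutoProcessor.from_pretrained("microsoft/udop-large")
model = UdopForConditionalGeneration.from_pretrained("microsoft/udop-large")

image = Image.open("document.png").convert("RGB")  # any document image
encoding = processor(images=image, text="Question answering. What is the date?", return_tensors="pt")

outputs = model.generate(**encoding, max_new_tokens=20)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```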
Donggeun Yu ed74d97871
DeformableDETR support bfloat16 (#29232)
* Update ms_deform_attn_cuda.cu

* Update ms_deform_attn_cuda.cuh

* Update modeling_deformable_detr.py

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update modeling_deformable_detr.py

* python utils/check_copies.py --fix_and_overwrite

* Fix dtype mismatch error

* Update test_modeling_deformable_detr.py

* Update test_modeling_deformable_detr.py

* Update modeling_deformable_detr.py

* Update modeling_deformable_detr.py

* Support DeformableDETR with bfloat16

* Add test code

* Use AT_DISPATCH_FLOATING_TYPES_AND2

Use AT_DISPATCH_FLOATING_TYPES_AND2

* Update tests/models/deformable_detr/test_modeling_deformable_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/deformable_detr/test_modeling_deformable_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix not found require_torch_bf16 function

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-03-04 14:18:09 +00:00
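A minimal sketch exercising the new bfloat16 support; the dummy input shape is an assumption.

```python
# Sketch: run Deformable DETR in bfloat16, now supported by the custom CUDA kernel.
import torch
from transformers import DeformableDetrForObjectDetection

model = DeformableDetrForObjectDetection.from_pretrained(
    "SenseTime/deformable-detr", torch_dtype=torch.bfloat16
).to("cuda")

pixel_values = torch.randn(1, 3, 800, 800, dtype=torch.bfloat16, device="cuda")
with torch.no_grad():
    outputs = model(pixel_values=pixel_values)
print(outputs.logits.dtype)  # torch.bfloat16
```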
Zach Mueller 1681a6d452
🚨 Fully revert atomic checkpointing 🚨 (#29370)
Fully revert atomic checkpointing
2024-03-04 06:17:42 -05:00
Nick DeGroot 8ef9862864
Fix OneFormer `post_process_instance_segmentation` for panoptic tasks (#29304)
* 🐛 Fix oneformer instance post processing when using panoptic task type

* Add unit test for oneformer instance post processing panoptic bug

---------

Co-authored-by: Nick DeGroot <1966472+nickthegroot@users.noreply.github.com>
2024-03-04 11:04:49 +00:00
Fanli Lin aade711d1e
[tests] enable automatic speech recognition pipeline tests on XPU (#29308)
* use require_torch_gpu

* enable on XPU
2024-03-04 08:24:38 +01:00
Zach Mueller 1a7c117df9
Fix deprecated arg issue (#29372)
* Fix deprecated arg issue

* Trainer check too

* Check for dict or dataclass

* Simplify, make config always AcceleratorConfig

* Upstream to Trainer
2024-03-01 12:00:29 -05:00
Marc Sun cec773345a
Fix llama + gemma accelerate tests (#29380) 2024-03-01 10:32:36 -05:00
amyeroberts f1b1379f37
[`YOLOS`] Fix - return padded annotations (#29300)
* Fix yolos processing

* Add back slow marker - protects for pycocotools in slow

* Slow decorator goes above copied from header
2024-03-01 09:42:13 +00:00
Sanchit Gandhi 0a0a279e99
🚨🚨[Whisper Tok] Update integration test (#29368)
* [Whisper Tok] Update integration test

* make style
2024-03-01 09:22:31 +00:00
Younes Belkada 50db7ca4e8
FIX [`quantization` / `ESM`] Fix ESM 8bit / 4bit with bitsandbytes (#29329)
* fix ESM 8bit

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fixup

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-03-01 03:01:53 +01:00
Yih-Dar 44fe1a1cc4
Avoid using unnecessary `get_values(MODEL_MAPPING)` (#29362)
* more fixes

* more fixes

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-02-29 17:19:17 +08:00
Younes Belkada b647acdb53
FIX [`CI`] `require_read_token` in the llama FA2 test (#29361)
Update test_modeling_llama.py
2024-02-29 04:49:01 +01:00
Younes Belkada 8d8ac9c2df
FIX [`CI`]: Fix failing tests for peft integration (#29330)
fix failing tests for peft integration
2024-02-29 03:56:16 +01:00
Younes Belkada 1aee9afd1c
FIX [`CI` / `starcoder2`] Change starcoder2 path to correct one for slow tests (#29359)
change starcoder2 path to correct one
2024-02-29 03:52:13 +01:00
Arthur 8a8a0a4ae0
[`Llama ROPE`] Fix torch export but also slow downs in forward (#29198)
* remove control flow

* update gptneox

* update ....

* nits

* Actually let's just break. Otherwise we are silently failing which imo is not optimal

* version BC

* fix tests

* fix eager causal

* nit

* add a test

* style

* nits

* nits

* more nits for the test

* update and fix

* make sure cuda graphs are not skipped

* read token is needed for meta llama

* update!

* fixup

* compile test should be slow

* fix the fix-copies

* style 🫠
2024-02-28 10:45:53 +01:00
Younes Belkada ad00c482c7
FIX [`Gemma` / `CI`] Make sure our runners have access to the model (#29242)
* put hf token in gemma tests

* update suggestion

* add to flax

* revert

* fix

* fixup

* forward contrib credits from discussion

---------

Co-authored-by: ArthurZucker <ArthurZucker@users.noreply.github.com>
2024-02-28 06:25:23 +01:00
RaymondLi0 63caa370e6
Starcoder2 model - bis (#29215)
* Copy model

* changes

* misc

* fixes

* add embed and residual dropout (#30)

* misc

* remove rms norm and gated MLP

* remove copied mentions where it's not a copy anymore

* remove unused _shape

* copied from mistral instead

* fix copies

* fix copies

* add not doctested

* fix

* fix copyright

* Update docs/source/en/model_doc/starcoder2.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/starcoder2/configuration_starcoder2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/starcoder2/configuration_starcoder2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix doc

* revert some changes

* add fa2 tests

* fix styling nit

* fix

* push dummy docs

---------

Co-authored-by: Joel Lamy-Poirier <joel.lamy-poirier@servicenow.com>
Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-28 01:24:34 +01:00
Raushan Turganbay ddf7ac4237
Token level timestamps for long-form generation in Whisper (#29148) 2024-02-27 18:15:26 +00:00
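A minimal sketch of requesting token-level timestamps; the dummy audio array is an assumption, and the same flag now also works for long-form (>30 s) inputs.

```python
# Sketch: token-level timestamps from Whisper generation.
# The dummy audio array is an assumption for illustration.
import numpy as np
from transformers import WhisperForConditionalGeneration, WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-tiny")
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")

audio = np.random.randn(16_000 * 5).astype(np.float32)  # 5 s of placeholder audio
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

out = model.generate(inputs.input_features, return_token_timestamps=True)
print(out.token_timestamps)  # one timestamp per generated token
```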
Andrei Panferov e3fc90ae68
Cleaner Cache `dtype` and `device` extraction for CUDA graph generation for quantizers compatibility (#29079)
* input_layernorm as the beacon of hope

* cleaner dtype extraction

* AQLM + CUDA graph test

* is available check

* shorter text test
2024-02-27 09:32:39 +01:00
FredericOdermatt 871ba71dfa
GenerationConfig validate both constraints and force_words_ids (#29163)
GenerationConfig validate both options for constrained decoding: constraints and force_words_ids
2024-02-27 01:43:52 +01:00
Eduardo Pacheco 3fcfbe7549
Adding SegGPT (#27735)
* First commit

* Improvements

* More improvements

* Converted original checkpoint to HF checkpoint

* Fix style

* Fixed forward

* More improvements

* More improvements

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Remove asserts

* Remove unnecessary attributes

* Changed model name to camel case

* Improve forward doc

* Improve tests

* More improvements

* Fix copies

* Fix doc

* Make SegGptImageProcessor more flexible

* Added few-shot test

* Fix style

* Update READMEs and docs

* Update READMEs

* Make inputs required

* Add SegGptForImageSegmentation

* Make tests pass

* Rename to out_indicies

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Fixed naming convention

* Copying SegGptMlp from modeling_sam.py

* Some minor improvements

* Remove mlp_ratio

* Fix docstrings

* Fixed docstring match

* Objects defined before use

* Storing only patch_size and beta for SegGptLoss

* removed _prepare_inputs method

* Removed modified from headers

* Renamed to output_indicies

* Removed unnecessary einsums

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixing issues

* Raise error as soon as possible

* More fixes

* Fix merge

* Added palette to SegGptImageProcessor

* Fixed typo

* Fixed shape typo

* Added permute before doing palette to class mapping

* Fixed style

* Fixed and added tests

* Fixed docstrings

* Matching SegFormer API for post_processing_semantic_segmentation

* Fixed copies

* Fixed SegGptImageProcessor to handle both binary and RGB masks

* Updated docstrings of SegGptImageProcessor

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update docs/source/en/model_doc/seggpt.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/convert_seggpt_to_hf.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/seggpt/test_modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Object definitions above & fix style

* Renamed output_indices to intermediate_feature_indices

* Removed unnecessary check on bool_masked_pos

* Loss first in the outputs

* Added validation for do_normalize

* Improved SegGptImageProcessor and added new tests

* Added comment

* Added docstrings to SegGptLoss

* Reimplemented ensemble condition logic in SegGptEncoder

* Update src/transformers/models/seggpt/__init__.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/convert_seggpt_to_hf.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Updated docstrings to use post_process_semantic_segmentation

* Fixed typo on docstrings

* moved pixel values test to test_image_processing_seggpt

* Addressed comments

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/image_processing_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated docstrings for SegGptLoss

* Address comments

* Added SegGpt example to model docs

* Update src/transformers/models/seggpt/modeling_seggpt.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* moved patchify and unpatchify

* Rename checkpoint

* Renamed intermediate_features to intermediate_hidden_states for consistency

* Update src/transformers/models/seggpt/configuration_seggpt.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Replaced post_process_masks with post_process_semantic_segmentation in the docs

---------

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Niels <niels.rogge1@gmail.com>
Co-authored-by: Eduardo Pacheco <eduardo.pacheco@limehome.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-26 18:17:19 +00:00
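A hedged sketch of one-shot segmentation with the new SegGPT model; the image and mask paths are illustrative placeholders.

```python
# Sketch: one-shot image segmentation with SegGPT.
# Image/mask paths are assumptions for illustration.
import torch
from PIL import Image
from transformers import SegGptForImageSegmentation, SegGptImageProcessor

checkpoint = "BAAI/seggpt-vit-large"
processor = SegGptImageProcessor.from_pretrained(checkpoint)
model = SegGptForImageSegmentation.from_pretrained(checkpoint)

image = Image.open("input.png").convert("RGB")
prompt_image = Image.open("prompt.png").convert("RGB")
prompt_mask = Image.open("prompt_mask.png").convert("RGB")

inputs = processor(images=image, prompt_images=prompt_image, prompt_masks=prompt_mask, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

mask = processor.post_process_semantic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
```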
Raushan Turganbay 8f2f0f0f85
Track each row separately for stopping criteria (#29116) 2024-02-26 16:06:16 +00:00
Merve Noyan 7c4995f93d
Add feature extraction mapping for automatic metadata update (#28944)
* add feature extraction mapping

* added prefix

* ruff check

* minor fix

* Update modeling_auto.py

* fix typo

* remove prefix to make variable public/importable

* Update src/transformers/models/auto/modeling_auto.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fixes

* addressed comments

* nit

* fix-copies

* remove from tests

* this should fix

* Update tests/models/convnextv2/test_modeling_convnextv2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* nits

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-26 10:35:37 +00:00
Matt 371b572e55
Allow remote code repo names to contain "." (#29175)
* stash commit

* stash commit

* It works!

* Remove unnecessary change

* We don't actually need the cache_dir!

* Update docstring

* Add test

* Add test with custom cache dir too

* Update model repo path
2024-02-23 12:46:31 +00:00
Sanchit Gandhi 2a9b1f80c4
[Gemma] Fix eager attention (#29187)
* fix modelling code

* add tests

* fix tests

* add some logit tests

* style

* fix fix
2024-02-22 01:07:52 +01:00
Arthur 594c1277b2
[ `gemma`] Adds support for Gemma 💎 (#29167)
* initial commit

* update

* update conversion checkpoint

* update conversion script

* nits

* some fixes

* nits

* merge

* fix permute

* nits

* fix

* nits

* nits

* nits

* fix rope

* fix both rope

* nits

* style

* make sure flax works

* fix flax init code

* fix forward

* nits

* print flax generation out

* current code

* nits

* SIIIIIIIIIIIIIIIIIII

* update

* add new tokenizer

* correct fast tokenizer

* fix conversion

* more comments

* fix modeling and conversion

* nits and nits

* nits testing

* add some tokenization tests

* add some edge cases

* add slow tests and fix them

* fixup

* fix copies for modeling

* fix copies

* add 7B slow tests

* fix

* fix

* fix tests

* make tokenizer CIs go green

* styling

* last tokenizer nits

* update jax tests

* fix flax for 7b

* add jit testing 🤗

* cleanups

* isolated nit, inv_freq for rotary_emb.inv_freq

* propagate to jax

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adjust test

* fix conversion script

* change name

* correct file names

* update conversion script

* Fix bos and eos token ids in the model configuration (#3)

* update modelling

* update conversion script

* add static cache for gemma

* fix sdpa generate

* fix batched

* multiple fixes

* fix FA2

* final fix

* Rename a few missing strings and filenames (#4)

* merge with upstream main

* fix copies

* fix copies

* fix fixup

* fix fixup

* fix

* fix

* final tests

* fix fx gemma tests

* fix fx bf16/fp16 tests

* update slow fx tests

* fx slow tests: one logits, one generation

* move jit test standalone

* Apply suggestions from code review

* nits

* tokenizer updates

* more tokenization updates: custom GemmaSentencepieceExtrator

* style

* Update src/transformers/cache_utils.py

* Update src/transformers/models/gemma/__init__.py

* Update tests/models/gemma/test_modeling_flax_gemma.py

* small nits

* style

* update tokenization test

* fix the rotary embedding

* with style

* fix slow tests

* WARNING this commit might be very important for precisions

* Update tests/models/gemma/test_modeling_flax_gemma.py

* Update src/transformers/models/gemma/configuration_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/models/gemma/modeling_flax_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* small nits here and there!

* forgotten nit

* remove on the fly computation of inv_freq

* revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_flax_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* nit conversion script link

* fix some tests

* add not doctest and pr doctest

* repo consistency

* fix last CIs 🚀

* update all readmes

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-02-21 14:21:28 +01:00
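A minimal sketch of running the newly supported Gemma model; the checkpoint is gated, so an authenticated Hugging Face token with access is assumed.

```python
# Sketch: text generation with Gemma (gated checkpoint; access token assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-7b", torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```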
Ekaterina Aidova 1d0ea7abe0
support SDPA Attention in stablelm (#29106)
* support SDPA Attention in stablelm

* add integration test

* add fallback for output_attentions

* Update src/transformers/models/stablelm/modeling_stablelm.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/stablelm/test_modeling_stablelm.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/stablelm/modeling_stablelm.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* handle non-contiguous states

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-21 13:12:49 +01:00
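A minimal sketch of opting in to the new SDPA path for StableLM; the checkpoint id is illustrative.

```python
# Sketch: load StableLM with the new SDPA attention implementation.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "stabilityai/stablelm-3b-4e1t",
    attn_implementation="sdpa",  # torch.nn.functional.scaled_dot_product_attention
    torch_dtype=torch.float16,
)
```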
Joao Gante 3994fa5baf
🚨 Llama: update rope scaling to match static cache changes (#29143) 2024-02-21 09:47:41 +00:00
amyeroberts e770f0316d
[`pipeline`] Add pool option to image feature extraction pipeline (#28985)
* Add pool option

* PR comments - error message and exact outputs check
2024-02-20 20:22:08 +00:00
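A minimal sketch of the new pool option; the model id and image URL are illustrative assumptions.

```python
# Sketch: pooled embeddings from the image feature extraction pipeline.
# Model id and image URL are assumptions for illustration.
from transformers import pipeline

extractor = pipeline(task="image-feature-extraction", model="google/vit-base-patch16-224")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
features = extractor(url, pool=True)  # pooled output instead of per-patch hidden states
print(len(features[0]))  # hidden size of the pooled embedding
```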
Joao Gante 857fd8eaab
Generate: missing generation config eos token setting in encoder-decoder tests (#29146) 2024-02-20 16:17:51 +00:00
Pablo Montalvo 1c81132e80
Raise unused kwargs image processor (#29063)
* draft processor arg capture

* add missing vivit model

* add new common test for image preprocess signature

* fix quality

* fix up

* add back missing validations

* quality

* move info level to warning for unused kwargs
2024-02-20 16:20:20 +01:00
amyeroberts 0996a10077
Revert low cpu mem tie weights (#29135)
* Revert "Add tie_weights() to LM heads and set bias in set_output_embeddings() (#28948)"

This reverts commit 725f4ad1cc.

* Revert "Patch to skip failing `test_save_load_low_cpu_mem_usage` tests (#29043)"

This reverts commit 4156f517ce.
2024-02-20 12:06:46 +00:00
Arthur 15cfe38942
[`Core tokenization`] `add_dummy_prefix_space` option to help with latest issues (#28010)
* add add_dummy_prefix_space option to slow

* checking kwargs might be better. Should be there for all spm tokenizer IMO

* nits

* fix copies

* more copied

* nits

* add prefix space

* nit

* nits

* Update src/transformers/convert_slow_tokenizer.py

* fix init

* revert wrong styling

* fix

* nits

* style

* updates

* make sure we use slow tokenizer for conversion instead of looking for the decoder

* support llama as well

* update llama tokenizer fast

* nits

* nits nits nits

* update the doc

* update

* update to fix tests

* skip unrelated failing test

* Update src/transformers/convert_slow_tokenizer.py

* add proper testing

* test decode as well

* more testing

* format

* fix llama test

* Apply suggestions from code review
2024-02-20 12:50:31 +01:00
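A hedged sketch of the new option; the kwarg name add_prefix_space follows the commit bullets, and the checkpoint id is an illustrative assumption.

```python
# Sketch: disable the SPM dummy prefix space (kwarg name assumed from the commit messages).
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("huggyllama/llama-7b", add_prefix_space=False)
print(tok.tokenize("Hello"))  # no leading "▁" should be prepended
```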
Younes Belkada efdd436663
FIX [`PEFT` / `Trainer` ] Handle better peft + quantized compiled models (#29055)
* handle peft + compiled models

* add tests

* fixup

* adapt from suggestions

* clarify comment
2024-02-20 12:45:08 +01:00
Joao Gante a7755d2409
Generate: unset GenerationConfig parameters do not raise warning (#29119) 2024-02-20 11:34:31 +00:00
Joao Gante 7d312ad2e9
Llama: fix batched generation (#29109) 2024-02-20 10:23:17 +00:00
Younes Belkada ff76e7c212
FIX [`bnb` / `tests`] Propagate the changes from #29092 to 4-bit tests (#29122)
* forgot to push the changes for 4bit ..

* trigger CI
2024-02-20 11:11:15 +01:00
Younes Belkada f7ef7cec6c
FEAT [`Trainer` / `bnb`]: Add RMSProp from `bitsandbytes` to HF `Trainer` (#29082)
* add RMSProp to Trainer

* revert some change

* Update src/transformers/trainer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-20 02:43:02 +01:00
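A minimal sketch of selecting the new optimizer; the optim string values follow the PR's naming and are assumptions here.

```python
# Sketch: pick the bitsandbytes RMSProp optimizer via TrainingArguments.
# The "rmsprop_bnb*" values are assumed from the PR naming.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="rmsprop_bnb_8bit",  # assumed siblings: "rmsprop_bnb", "rmsprop_bnb_32bit"
)
```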
Titus 5ce90f3212
Bnb test fix for different hardwares (#29066)
* generated text on A10G

* generated text in CI

* Apply suggestions from code review

add explanatory comments

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-19 18:04:44 +00:00
Max Baak 08cd694ef0
ENH: added new output_logits option to generate function (#28667)
output_logits option behaves like output_scores, but returns the raw, unprocessed prediction logit scores,
i.e. the values before they undergo logit processing and/or warping. The latter happens by default for the
regular output scores.

It's useful to have the unprocessed logit scores in certain circumstances. For example, unprocessed logit scores
are very useful with causal LM models when one wants to determine the probability of a certain answer, e.g.
when asking a question with a yes/no answer. In that case, getting the next-token probabilities of both "yes" and
"no" (and/or their relative ratio) is of interest for classification. The reason for getting these _before_ logit
processing and/or warping is that (a) processing can change the probabilities, or (b) warping can reject the
tokens of interest / reduce the number of candidate tokens to just 1.

For an example use case, see the paper "TabLLM: Few-shot Classification of Tabular Data with Large Language Models"
by Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag.
https://arxiv.org/abs/2210.10723

In addition:
- added a dedicated unit test, tests/generation/test_utils/test_return_unprocessed_logit_scores,
  which tests the return of logits with output_logits=True in generation.
- set output_logits=True in all other generation unit tests that also have output_scores=True.

Implemented @gante's and @amyeroberts review feedback

Co-authored-by: kx79wq <max.baak@ing.com>
2024-02-19 17:34:17 +00:00
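A minimal sketch of the new flag in the yes/no setting described above; the model id and prompt are illustrative assumptions.

```python
# Sketch: read unprocessed next-token logits for a yes/no question.
# Model id and prompt are assumptions for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Is the sky blue? Answer:", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=1,
    output_logits=True,  # raw logits, before processing/warping
    return_dict_in_generate=True,
)

logits = out.logits[0][0]  # logits for the first generated token
yes_id, no_id = tokenizer(" yes").input_ids[0], tokenizer(" no").input_ids[0]
probs = torch.softmax(logits[[yes_id, no_id]], dim=-1)
print({"yes": probs[0].item(), "no": probs[1].item()})
```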
Lysandre Debut 9830858671
Fix the `bert-base-cased` tokenizer configuration test (#29105)
Fix test
2024-02-19 13:23:25 +01:00
Younes Belkada a75a6c9315
FIX [`bnb` / `tests`]: Fix currently failing bnb tests (#29092)
Update test_mixed_int8.py
2024-02-19 10:39:12 +01:00
Matt 2f1003be86
Add chat support to text generation pipeline (#28945)
* Add chat support to text generation pipeline

* Better handling of single elements

* Deprecate ConversationalPipeline

* stash commit

* Add missing add_special_tokens kwarg

* Update chat templating docs to refer to TextGenerationPipeline instead of ConversationalPipeline

* Add TF tests

* @require_tf

* Add type hint

* Add specific deprecation version

* Remove unnecessary do_sample

* Remove todo - the discrepancy has been resolved

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/pipelines/text_generation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 16:41:01 +00:00
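A minimal sketch of the new chat input format; the model id is an illustrative assumption (any model with a chat template works).

```python
# Sketch: pass a chat (list of message dicts) straight to the text-generation pipeline.
# The model id is an assumption for illustration.
from transformers import pipeline

pipe = pipeline("text-generation", model="HuggingFaceH4/zephyr-7b-beta")

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about unit tests."},
]
out = pipe(chat, max_new_tokens=50)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```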
Zach Mueller 636b03244c
Fix trainer test wrt DeepSpeed + auto_find_bs (#29061)
* FIx trainer test

* Update tests/trainer/test_trainer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 10:04:24 -05:00
Richard Lee be42c24d14
Honor trust_remote_code for custom tokenizers (#28854)
* pass through trust_remote_code for dynamically loading unregistered tokenizers specified by config
add test

* change directories back to previous directory after test

* fix ruff check

* Add a note to that block for future in case we want to remove it later

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2024-02-16 13:40:23 +00:00
Sourab Mangrulkar b262808656
fix failing trainer ds tests (#29057) 2024-02-16 17:18:45 +05:30
Jonathan Mamou 258da40efd
fix num_assistant_tokens with heuristic schedule (#28759)
* fix heuristic num_assistant_tokens_schedule

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update utils.py

check that candidate_generator.assistant_model exists since some speculation methods (like ngram and PLD) don't have an assistant_model attribute

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/generation/test_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* merge conflict

* fix docstring

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 11:44:58 +00:00
Raushan Turganbay aee11fe427
Fix max_length criteria when using inputs_embeds (#28994)
* fix max_length for inputs_embeds

* make style

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Static Cache: load models with MQA or GQA (#28975)

* fix

* fix tests

* fix tests

* Update src/transformers/generation/utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more fixes

* make style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 11:25:12 +00:00
Lysandre Debut f497f564bb
Update all references to canonical models (#29001)
* Script & Manual edition

* Update
2024-02-16 08:16:58 +01:00
amyeroberts 4156f517ce
Patch to skip failing `test_save_load_low_cpu_mem_usage` tests (#29043)
* Patch to skip currently failing tests

* Whoops - wrong place
2024-02-15 17:26:33 +00:00
Donggeun Yu 5b6fa2306a
DeformableDetrModel support fp16 (#29013)
* Update ms_deform_attn_cuda.cu

* Update ms_deform_attn_cuda.cuh

* Update modeling_deformable_detr.py

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update modeling_deformable_detr.py

* python utils/check_copies.py --fix_and_overwrite

* Fix dtype mismatch error

* Update test_modeling_deformable_detr.py

* Update test_modeling_deformable_detr.py

* Update modeling_deformable_detr.py

* Update modeling_deformable_detr.py

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-15 12:31:09 +00:00
Arthur f3788b09e1
Fix static generation when compiling! (#28937)
* wow I was scared!

* fix everything

* nits

* make it BC?

* add todo

* nits

* is_tracing should still be used to pass tracing tests

* nits

* some nits to make sure generation works with static cache uncompiled

* fix sdpa

* fix FA2 for both static and dynamic in a better way?

* style

* fix-copies

* fix fix copies

* fix sequential beam search

* style

* use `keys_to_ignore`

* nit

* correct dtype inference when init

* :( the fix for FA2 is still not optimal to investigate!

* styling

* nits

* nit

* this might work better

* add comment

* Update src/transformers/models/llama/modeling_llama.py

* "position_ids" -> "cache_position"

* style

* nit

* Remove changes that should not be propagated just yet

* Apply suggestions from code review

* Styling

* make sure we raise an error for static cache with FA2 enabled

* move to the bottom of the signature

* style

* Update src/transformers/models/llama/modeling_llama.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_llama.py

* nit in the name

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-15 06:27:40 +01:00
Younes Belkada 7a0fccc6eb
FIX [`Trainer` / tags]: Fix trainer + tags when users do not pass `"tags"` to `trainer.push_to_hub()` (#29009)
* fix trainer tags

* add test
2024-02-14 23:56:35 +01:00
amyeroberts 0199a484eb
Backbone kwargs in config (#28784)
* Enable instantiating model with pretrained backbone weights

* Clarify pretrained import

* Use load_backbone instead

* Add backbone_kwargs to config

* Pass kwargs to constructors

* Fix up

* Input verification

* Add tests

* Tidy up

* Update tests/utils/test_backbone_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-14 20:46:44 +00:00
JB (Don) 725f4ad1cc
Add tie_weights() to LM heads and set bias in set_output_embeddings() (#28948)
* Add tie_weights() to LM heads and set bias in set_output_embeddings()

The bias were not tied correctly in some LM heads, and this change should fix that.

* Moving test_save_and_load_low_cpu_mem_usage to ModelTesterMixin

* Adding _tie_weights() to MPNet and Vilt

* Skip test for low cpu mem usage for Deta/DeformableDetr since they cannot init on meta device

* Rename the test to save_load to match the convention
2024-02-14 20:39:01 +00:00
Raushan Turganbay 354775bc57
Fix flaky test vision encoder-decoder generate (#28923) 2024-02-14 15:40:57 +00:00
Zach Mueller 0507e69d34
Introduce AcceleratorConfig dataclass (#28664)
* Introduce acceleratorconfig dataclass

* Extra second warn

* Move import

* Try moving import under is_accelerate_available

* Quality

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Clean

* Remove to_kwargs

* Change version

* Improve tests by including dispatch and split batches

* Improve reliability

* Update tests/trainer/test_trainer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fixup tests and review nits

* Make tests pass

* protect import

* Protect import

* Empty-Commit

* Make training_args.to_dict handle the AcceleratorConfig

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-14 10:18:09 -05:00
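A hedged sketch of the new dataclass; the import path and field names follow the PR description and are assumptions here.

```python
# Sketch: configure Accelerate behavior through the Trainer.
# Import path and field names are assumed from the PR.
from transformers import TrainingArguments
from transformers.trainer_pt_utils import AcceleratorConfig

args = TrainingArguments(
    output_dir="out",
    accelerator_config=AcceleratorConfig(
        split_batches=True,      # split each fetched batch across devices
        dispatch_batches=False,  # let each process fetch its own batches
    ),
)
```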
Huazhong Ji 69ca640dd6
Set the dataset format used by `test_trainer` to float32 (#28920)
Co-authored-by: unit_test <test@unit.com>
2024-02-14 13:55:12 +00:00
Andrei Panferov 1ecf5f7c98
AQLM quantizer support (#28928)
* aqlm init

* calibration and dtypes

* docs

* Readme update

* is_aqlm_available

* Simpler link in docs

* Test TODO real reference

* init _import_structure fix

* AqlmConfig autodoc

* integration aqlm

* integrations in tests

* docstring fix

* legacy typing

* Less typings

* More kernels information

* Performance -> Accuracy

* correct tests

* removed multi-gpu test

* Update docs/source/en/quantization.md

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Brought back multi-gpu tests

* Update src/transformers/integrations/aqlm.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update tests/quantization/aqlm_integration/test_aqlm.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Andrei Panferov <blacksamorez@yandex-team.ru>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2024-02-14 09:25:41 +01:00
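A minimal sketch of loading an AQLM checkpoint; the model id is an illustrative assumption, and the aqlm package must be installed.

```python
# Sketch: load an AQLM-quantized model (model id assumed; requires `pip install aqlm`).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Mixtral-8x7b-AQLM-2Bit-1x16-hf"  # assumed example checkpoint
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
```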
NielsRogge 63ffd56d02
Add SiglipForImageClassification and CLIPForImageClassification (#28952)
* First draft

* Add CLIPForImageClassification

* Remove scripts

* Fix doctests
2024-02-14 08:41:31 +01:00
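A minimal sketch of the new classification head; the checkpoint id, label count, and image path are illustrative assumptions (the head is freshly initialized, so it needs fine-tuning before real use).

```python
# Sketch: attach the new image-classification head to a SigLIP backbone.
# Checkpoint id, num_labels, and image path are assumptions for illustration.
from PIL import Image
from transformers import AutoImageProcessor, SiglipForImageClassification

processor = AutoImageProcessor.from_pretrained("google/siglip-base-patch16-224")
model = SiglipForImageClassification.from_pretrained(
    "google/siglip-base-patch16-224", num_labels=2  # randomly initialized head
)

inputs = processor(images=Image.open("cat.png").convert("RGB"), return_tensors="pt")
print(model(**inputs).logits.argmax(-1))
```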
Jonathan Tow de6029a059
Add `StableLM` (#28810)
* Add `StableLM`

* fix(model): re-create from `huggingface-cli add-new-model-like persimmon`

* fix: re-add changes to address comments

* fix(readme): add links to paper

* fix(tokenization_auto): remove `GPTNeoXTokenizerFastFast` ref

* fix(tests): re-add `@slow` decorator to integration tests

* fix(tests): import slow...

* fix(readme_hd): remove whitespace edit

* fix(tokenizer): auto tokenizer tuple

* skip doctests for `modeling_stablelm`
2024-02-14 07:15:18 +01:00
amyeroberts bd4b83e1ba
[`DETR`] Update the processing to adapt masks & bboxes to reflect padding (#28363)
* Update the processing so bbox coords are adjusted for padding

* Just pad masks

* Tidy up, add tests

* Better tests

* Fix yolos and mark as slow for pycocotols

* Fix yolos - return_tensors

* Clarify padding and normalization behaviour
2024-02-13 18:27:06 +00:00
Joao Gante 3e70a207df
Static Cache: load models with MQA or GQA (#28975) 2024-02-13 09:58:19 +00:00
Hiroshi Matsuda da20209dbc
Add sudachi_projection option to BertJapaneseTokenizer (#28503)
* add sudachi_projection option

* Upgrade sudachipy>=0.6.8

* add a test case for sudachi_projection

* Compatible with older versions of SudachiPy

* make fixup

* make style

* error message for unidic download

* revert jumanpp test cases

* format options for sudachi_projection

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* format options for sudachi_split_mode and sudachi_dict_type

* comment

* add tests for full_tokenizer kwargs

* pass projection arg directly

* require_sudachi_projection

* make style

* revert upgrade sudachipy

* check is_sudachi_projection_available()

* revert dependency_version_table and bugfix

* style format

* simply raise ImportError

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* simply raise ImportError

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-13 04:47:20 +01:00
Arthur b44567538b
[`NllbTokenizer`] refactor with added tokens decoder (#27717)
* refactor with addedtokens decoder

* style

* get rid of lang code to id

* style

* keep some things for BC

* update tests

* add the mask token at the end of the vocab

* nits

* nits

* fix final tests

* style

* nits

* Update src/transformers/models/nllb/tokenization_nllb_fast.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* nits

* style?

* Update src/transformers/convert_slow_tokenizer.py

* make it a tad bit more custom

* ruff please stop

Co-authored-by: avidale <dale.david@mail.ru>

* Update

Co-authored-by: avidale <dale.david@mail.ru>

* Update

Co-authored-by: avidale <dale.david@mail.ru>

* oupts

* ouft

* nits

* test

* fix the remaining failing tests

* style

* fix failing test

* fix other test

* temp dir + test the raw init

* update test

* style

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-13 03:49:20 +01:00
Klaus Hipp fe3df9d5b3
[Docs] Add language identifiers to fenced code blocks (#28955)
Add language identifiers to code blocks
2024-02-12 10:48:31 -08:00
Joao Gante e30bbb2685
Tests: tag `test_save_load_fast_init_from_base` as flaky (#28930) 2024-02-12 14:43:34 +00:00
NielsRogge f278ef20ed
[Nougat] Fix pipeline (#28242)
* Fix pipeline

* Remove print statements

* Address comments

* Address issue

* Remove unused imports
2024-02-12 10:21:15 +01:00
Raushan Turganbay d628664688
Support batched input for decoder start ids (#28887)
* support batched input for decoder start ids

* Fix typos

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* minor changes

* fix: decoder_start_id as list

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

* empty commit

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-02-08 16:00:53 +00:00
Arthur 115ac94d06
[`Core generation`] Adds support for static KV cache (#27931)
Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-02-08 11:50:34 +01:00
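A minimal sketch of enabling the new static cache; the model id is an illustrative assumption (gated checkpoint, access token assumed).

```python
# Sketch: opt in to the static KV cache for generation.
# Model id is an assumption (gated; access token assumed).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf", torch_dtype=torch.float16)

model.generation_config.cache_implementation = "static"  # fixed-shape, compile-friendly cache
inputs = tokenizer("Hello", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```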
Daniel Korat abf8f54a01
⚠️ Raise `Exception` when trying to generate 0 tokens ⚠️ (#28621)
* change warning to exception

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* validate `max_new_tokens` > 0 in `GenerationConfig`

* fix truncation test parameterization in `TextGenerationPipelineTests`

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
2024-02-07 13:42:01 +01:00
Yih-Dar 40658be461
Hotfix - make `torchaudio` get the correct version in `torch_and_flax_job` (#28899)
* check

* check

* check

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-02-06 21:00:42 +01:00
Yih-Dar 76b4f666f5
Revert "[WIP] Hard error when ignoring tensors." (#28898)
Revert "[WIP] Hard error when ignoring tensors. (#27484)"

This reverts commit 2da28c4b41.
2024-02-06 17:18:30 +01:00
Yih-Dar 6529a5b5c1
Fix `FastSpeech2ConformerModelTest` and skip it on CPU (#28888)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-02-06 11:05:23 +01:00
nakranivaibhav 2e7c942c81
Adds LlamaForQuestionAnswering class in modeling_llama.py along with AutoModel Support (#28777)
* This is a test commit

* testing commit

* final commit with some changes

* Removed copy statement

* Fixed formatting issues

* Fixed error added past_key_values in the forward method

* Fixed a trailing whitespace. Damn the formatting rules are strict

* Added the copy statement
2024-02-06 03:41:42 +01:00
amyeroberts ba3264b4e8
Image Feature Extraction pipeline (#28216)
* Draft pipeline

* Fixup

* Fix docstrings

* Update doctest

* Update pipeline_model_mapping

* Update docstring

* Update tests

* Update src/transformers/pipelines/image_feature_extraction.py

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Fix docstrings - review comments

* Remove pipeline mapping for composite vision models

* Add to pipeline tests

* Remove for flava (multimodal)

* safe pil import

* Add requirements for pipeline run

* Account for super slow efficientnet

* Review comments

* Fix tests

* Swap order of kwargs

* Use build_pipeline_init_args

* Add back FE pipeline for Vilt

* Include image_processor_kwargs in docstring

* Mark test as flaky

* Update TODO

* Update tests/pipelines/test_pipelines_image_feature_extraction.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Add license header

---------

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-05 14:50:07 +00:00
Yoach Lacombe 7addc9346c
Correct wav2vec2-bert inputs_to_logits_ratio (#28821)
* Correct wav2vec2-bert inputs_to_logits_ratio

* correct ratio

* correct ratio, clean asr pipeline

* refactor on one line
2024-02-05 13:14:47 +00:00
Nicolas Patry 2da28c4b41
[WIP] Hard error when ignoring tensors. (#27484)
* [WIP] Hard error when ignoring tensors.

* Better selection/error when saving a checkpoint.

- Find all names we should normally drop (those are in the transformers
  config)
- Find all disjoint tensors (for those we can safely trigger a copy to
  get rid of the sharing before saving)
- Clone those disjoint tensors getting rid of the issue
- Find all identical names (those should be declared in the config
  but we try to find them all anyway.)
- For all identical names:
  - If they are in the config, just ignore them everything is fine
  - If they are not, warn about them.
- For all remaining tensors which are shared yet neither identical NOR
  disjoint, raise a hard error.

* Adding a failing test on `main` that passes here.

* We don't need to keep the subfolder logic in this test.

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-05 09:17:24 +01:00
amyeroberts 3d2900e829
Mark `test_encoder_decoder_model_generate` for `vision_encoder_deocder` as flaky (#28842)
Mark test as flaky
2024-02-02 16:57:08 +00:00
Yih-Dar a7cb92aa03
fix / skip (for now) some tests before switch to torch 2.2 (#28838)
* fix / skip some tests before we can switch to torch 2.2

* style

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-02-02 14:11:50 +01:00
Juri Ganitkevitch ec29d25d9f
Add missing None check for hf_quantizer (#28804)
* Add missing None check for hf_quantizer

* Add test, fix logic.

* make style

* Switch test model to Mistral

* Comment

* Update tests/test_modeling_utils.py

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-02 09:34:12 +01:00
Klaus Hipp 721ee783ca
[Docs] Fix spelling and grammar mistakes (#28825)
* Fix typos and grammar mistakes in docs and examples

* Fix typos in docstrings and comments

* Fix spelling of `tokenizer` in model tests

* Remove erroneous spaces in decorators

* Remove extra spaces in Markdown link texts
2024-02-02 08:45:00 +01:00
fxmarty 709dc43239
Fix symbolic_trace with kv cache (#28724)
* fix symbolic_trace with kv cache

* comment & better test
2024-02-01 09:45:02 +01:00
JB (Don) 0d26abdd3a
Adding [T5/MT5/UMT5]ForTokenClassification (#28443)
* Adding [T5/MT5/UMT5]ForTokenClassification

* Add auto mappings for T5ForTokenClassification and variants

* Adding ForTokenClassification to the list of models

* Adding attention_mask param to the T5ForTokenClassification test

* Remove outdated comment in test

* Adding EncoderOnly and Token Classification tests for MT5 and UMT5

* Fix typo in umt5 string

* Add tests for all the existing MT5 models

* Fix wrong comment in dependency_versions_table

* Reverting change to common test for _keys_to_ignore_on_load_missing

The test is correctly picking up redundant keys in _keys_to_ignore_on_load_missing.

* Removing _keys_to_ignore_on_missing from MT5 since the key is not used in the model

* Add fix-copies to MT5ModelTest
2024-02-01 03:53:49 +01:00
Joao Gante beb2a09687
DeepSpeed: hardcode `torch.arange` dtype on `float` usage to avoid incorrect initialization (#28760) 2024-01-31 14:39:07 +00:00
Kian Sierra McGettigan f7076cd346
Flax mistral (#26943)
* direct copy from llama work

* mistral modules forward pass working

* flax mistral forward pass with sliding window

* added tests

* added layer collection approach

* Revert "added layer collection approach"

This reverts commit 0e2905bf22.

* Revert "Revert "added layer collection approach""

This reverts commit fb17b6187a.

* fixed attention outputs

* added mistral to init and auto

* fixed import name

* fixed layernorm weight dtype

* freeze initialized weights

* make sure conversion considers bfloat16

* added backend

* added docstrings

* added cache

* fixed sliding window causal mask

* passes cache tests

* passed all tests

* applied make style

* removed commented out code

* applied fix-copies, ignored other model changes

* applied make fix-copies

* removed unused functions

* passed generation integration test

* slow tests pass

* fixed slow tests

* changed default dtype from jax.numpy.float32 to float32 for docstring check

* skip cache test for FlaxMistralForSequenceClassification since, if pad_token_id is in input_ids, it doesn't score previous input_ids

* updated checkpoint since from_pt not included

* applied black style

* removed unused args

* Applied styling and fixup

* changed checkpoint for doc back

* fixed rf after adding it to hf hub

* Add dummy ckpt

* applied styling

* added tokenizer to new ckpt

* fixed slice format

* fix init and slice

* changed ref for placeholder TODO

* added copies from Llama

* applied styling

* applied fix-copies

* fixed docs

* update weight dtype reconversion for sharded weights

* removed Nullable input ids

* Removed unnecessary output attentions in Module

* added embedding weight initialization

* removed unused past_key_values

* fixed deterministic

* Fixed RMS Norm and added copied from

* removed input_embeds

* applied make style

* removed nullable input ids from sequence classification model

* added copied from GPTJ

* added copied from Llama on FlaxMistralDecoderLayer

* added copied from to FlaxMistralPreTrainedModel methods

* fix test deprecation warning

* freeze gpt neox random_params and fix copies

* applied make style

* fixed doc issue

* skipped docstring test to align # copied from

* applied make style

* removed FlaxMistralForSequenceClassification

* removed unused padding_idx

* removed more sequence classification

* removed sequence classification

* applied styling and consistency

* added copied from in tests

* removed sequence classification test logic

* applied styling

* applied make style

* removed freeze and fixed copies

* undo test change

* changed repeat_kv to tile

* fixed to key value groups

* updated copyright year

* split causal_mask

* empty to rerun failed pt_flax_equivalence test FlaxWav2Vec2ModelTest

* went back to 2023 for tests_pr_documentation_tests

* went back to 2024

* changed tile to repeat

* applied make style

* empty for retry on Wav2Vec2
2024-01-31 14:19:02 +01:00
Patrick von Platen 65a926e82b
[Whisper] Refactor forced_decoder_ids & prompt ids (#28687)
* up

* Fix more

* Correct more

* Fix more tests

* fix fast tests

* Fix more

* fix more

* push all files

* finish all

* make style

* Fix timestamp wrap

* make style

* make style

* up

* up

* up

* Fix lang detection behavior

* Fix lang detection behavior

* Add lang detection test

* Fix lang detection behavior

* make style

* Update src/transformers/models/whisper/generation_whisper.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* better error message

* make style tests

* add warning

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-01-31 14:02:07 +02:00
tom-p-reichel ae0c27adfa
don't initialize the output embeddings if we're going to tie them to input embeddings (#28192)
* test that tied output embeddings aren't initialized on load

* don't initialize the output embeddings if we're going to tie them to the input embeddings
2024-01-31 02:19:18 +01:00
Matt 415e9a0980
Add tf_keras imports to prepare for Keras 3 (#28588)
* Port core files + ESM (because ESM code is odd)

* Search-replace in modelling code

* Fix up transfo_xl as well

* Fix other core files + tests (still need to add correct import to tests)

* Fix cookiecutter

* make fixup, fix imports in some more core files

* Auto-add imports to tests

* Cleanup, add imports to sagemaker tests

* Use correct exception for importing tf_keras

* Fixes in modeling_tf_utils

* make fixup

* Correct version parsing code

* Ensure the pipeline tests correctly revert to float32 after each test

* Ensure the pipeline tests correctly revert to float32 after each test

* More tf.keras -> keras

* Add dtype cast

* Better imports of tf_keras

* Add a cast for tf.assign, just in case

* Fix callback imports
2024-01-30 17:26:36 +00:00
amyeroberts 2fa1c808ae
[`Backbone`] Use `load_backbone` instead of `AutoBackbone.from_config` (#28661)
* Enable instantiating model with pretrained backbone weights

* Remove doc updates until changes made in modeling code

* Use load_backbone instead

* Add use_timm_backbone to the model configs

* Add missing imports and arguments

* Update docstrings

* Make sure test is properly configured

* Include recent DPT updates
2024-01-30 16:54:09 +00:00
Poedator d78e78a0e4
`HfQuantizer` class for quantization-related stuff in `modeling_utils.py` (#26610)
* squashed earlier commits for easier rebase

* rm rebase leftovers

* 4bit save enabled @quantizers

* TMP gptq test use exllama

* fix AwqConfigTest::test_wrong_backend for A100

* quantizers AWQ fixes

* _load_pretrained_model low_cpu_mem_usage branch

* quantizers style

* remove require_low_cpu_mem_usage attr

* rm dtype arg from process_model_before_weight_loading

* rm config_origin from Q-config

* rm inspect from q_config

* fixed docstrings in QuantizationConfigParser

* logger.warning fix

* mv is_loaded_in_4(8)bit to BnbHFQuantizer

* is_accelerate_available error msg fix in quantizer

* split is_model_trainable in bnb quantizer class

* rm llm_int8_skip_modules as separate var in Q

* Q rm todo

* fwd ref to HFQuantizer in type hint

* rm note re optimum.gptq.GPTQQuantizer

* quantization_config in __init__ simplified

* replaced NotImplemented with create_quantized_param

* rm load_in_4/8_bit deprecation warning

* QuantizationConfigParser refactoring

* awq-related minor changes

* awq-related changes

* awq config.modules_to_not_convert

* raise error if no q-method in q-config in args

* minor cleanup

* awq quantizer docstring

* combine common parts in bnb process_model_before_weight_loading

* revert test_gptq

* .process_model_ cleanup

* restore dict config warning

* removed typevars in quantizers.py

* cleanup post-rebase 16 jan

* QuantizationConfigParser classmethod refactor

* rework of handling of unexpected aux elements of bnb weights

* moved q-related stuff from save_pretrained to quantizers

* refactor v1

* more changes

* fix some tests

* remove it from main init

* ooops

* Apply suggestions from code review

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* fix awq issues

* fix

* fix

* fix

* fix

* fix

* fix

* add docs

* Apply suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/hf_quantizer.md

* address comments

* fix

* fixup

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* address final comment

* update

* Update src/transformers/quantizers/base.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/quantizers/auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

* add kwargs update

* fixup

* add `optimum_quantizer` attribute

* oops

* rm unneeded file

* fix doctests

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-30 02:48:25 +01:00
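A schematic of the plug-in surface this refactor introduces, using only the hook names mentioned in the commit messages above; treat it as a sketch of the shape, not the exact signatures:

```python
class MyQuantizer:
    """Schematic HfQuantizer-style plug-in (signatures are illustrative)."""

    def validate_environment(self, **kwargs):
        ...  # e.g. check that accelerate / backend packages are available

    def process_model_before_weight_loading(self, model, **kwargs):
        ...  # e.g. swap Linear layers for quantized counterparts

    def create_quantized_param(self, model, param_value, param_name, target_device):
        ...  # quantize a single parameter as the state dict streams in

    def process_model_after_weight_loading(self, model, **kwargs):
        ...  # post-init hooks (e.g. kernel setup after device dispatch)
```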
Omar Sanseviero a989c6c6eb
Don't allow passing `load_in_8bit` and `load_in_4bit` at the same time (#28266)
* Update quantization_config.py

* Style

* Protect from setting directly

* add tests

* Update tests/quantization/bnb/test_4bit.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-01-30 01:43:40 +01:00
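The guard boils down to something like this sketch; the real check lives in `BitsAndBytesConfig` inside `quantization_config.py`, and the property-based protection mentioned above is elided:

```python
def validate_bnb_flags(load_in_4bit: bool, load_in_8bit: bool) -> None:
    # 4-bit and 8-bit are mutually exclusive bitsandbytes schemes.
    if load_in_4bit and load_in_8bit:
        raise ValueError("load_in_4bit and load_in_8bit cannot both be True")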
amyeroberts 9e8f35fa28
Mark test_constrained_beam_search_generate as flaky (#28757)
* Mark test_constrained_beam_search_generate as flaky

* Update tests/generation/test_utils.py
2024-01-29 15:22:25 +00:00
Wesley Gifford f72c7c22d9
PatchTST and PatchTSMixer fixes (#28083)
* 🐛 fix .max bug

* remove prediction_length from regression output dimensions

* fix parameter names, fix output names, update tests

* ensure shape for PatchTST

* ensure output shape for PatchTSMixer

* update model, batch, and expected for regression distribution test

* update test expected

Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* standardize on patch_length

Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Make arguments more explicit

Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>

* adjust prepared inputs

Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>

---------

Signed-off-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: Wesley M. Gifford <wmgifford@us.ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-29 10:09:26 +00:00
Angela Yi 243e186efb
Add serialization logic to pytree types (#27871)
* Add serialized type name to pytrees

* Modify context

* add serde test
2024-01-29 10:41:20 +01:00
Sanchit Gandhi de13a951b3
[Flax] Update no init test for Flax v0.7.1 (#28735) 2024-01-26 18:20:39 +00:00
NielsRogge 963db81a5a
Add Depth Anything (#28654)
* First draft

* More improvements

* More improvements

* More improvements

* More improvements

* Add docs

* Remove file

* Add copied from

* Address comments

* Address comments

* Address comments

* Fix style

* Update docs

* Convert all checkpoints, add integration test

* Rename checkpoints

* Add pretrained backbone attributes

* Fix default config

* Address comment

* Add figure to docs

* Fix bug thanks to @xenova

* Update conversion script

* Fix integration test
2024-01-25 09:34:50 +01:00
Khai Mai c5c69096b3
Exclude the load balancing loss of padding tokens in Mixtral-8x7B (#28517)
* fix the function load_balancing_loss_func in Mixtral_Moe to include attention_mask

* format code using black and ruff

* skip computing mask if attention_mask=None

* add tests for load balancing loss Mixtral-Moe

* fix assert loss is different in mixtral_test

* fix pad_len

* use assertNotAlmostEqual and print to debug

* remove print for debug

* minor updates

* reduce rtol and atol
2024-01-24 10:12:14 +01:00
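In spirit, the fix masks out padded positions before averaging, roughly like this sketch; shapes and names are simplified, and the real `load_balancing_loss_func` also handles top-k routing across layers:

```python
import torch

def load_balancing_loss(router_probs, expert_mask, attention_mask=None):
    # Toy switch-style auxiliary loss. Assumed shapes (simplifications):
    #   router_probs:   (num_tokens, num_experts) softmax outputs
    #   expert_mask:    (num_tokens, num_experts) one-hot routing decisions
    #   attention_mask: (num_tokens,) with 1 for real tokens, 0 for padding
    if attention_mask is not None:
        keep = attention_mask.bool()
        router_probs = router_probs[keep]  # drop padded positions before
        expert_mask = expert_mask[keep]    # averaging, as in this fix
    tokens_per_expert = expert_mask.float().mean(dim=0)
    router_prob_per_expert = router_probs.mean(dim=0)
    num_experts = router_probs.shape[-1]
    return num_experts * torch.sum(tokens_per_expert * router_prob_per_expert)
```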
amyeroberts 27c79a0fb4
Enable instantiating model with pretrained backbone weights (#28214)
* Enable instantiating model with pretrained backbone weights

* Update tests so backbone checkpoint isn't passed in

* Remove doc updates until changes made in modeling code

* Clarify pretrained import

* Update configs - docs and validation check

* Update src/transformers/utils/backbone_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Clarify exception message

* Update config init in tests

* Add test for when use_timm_backbone=True

* Small test updates

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-23 11:01:50 +00:00
Lysandre Debut 008a6a2208
Enable safetensors conversion from PyTorch to other frameworks without the torch requirement (#27599)
* Initial commit

* Requirements & tests

* Tests

* Tests

* Rogue import

* Rogue torch import

* Cleanup

* Apply suggestions from code review

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>

* bfloat16 management

* Sanchit's comments

* Import shield

* apply suggestions from code review

* correct bf16

* rebase

---------

Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
2024-01-23 10:28:23 +01:00
Yih-Dar d336c56d94
Avoid root logger's level being changed (#28638)
* avoid root logger's level being changed

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-22 14:45:30 +01:00
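The pitfall being avoided, in two lines (sketch):

```python
import logging

logging.getLogger().setLevel(logging.DEBUG)                # root logger: leaks into every library
logging.getLogger("transformers").setLevel(logging.DEBUG)  # namespaced: scoped to one package
```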
Arthur 83f9196cc4
[`GPTNeoX`] Fix BC issue with 4.36 (#28602)
* fix dtype issue

* add a test

* update copied from mentions

* nits

* fixup

* fix copies

* Apply suggestions from code review
2024-01-21 17:01:19 +00:00
Sangbum Daniel Choi 3f69f415ad
Fix auxiliary loss related code in transformers (#28406)
* [DETA] fix freeze/unfreeze function

* Update src/transformers/models/deta/modeling_deta.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deta/modeling_deta.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add freeze/unfreeze test case in DETA

* fix type

* fix typo 2

* fix : enable aux and enc loss in training pipeline

* Add unsynced variables from original DETA for training

* modification for passing CI test

* make style

* make fix

* manual make fix

* change deta_modeling_test configuration 'two_stage' default to True and minor change of dist checking

* remove print

* divide configuration in DetaModel and DetaForObjectDetection

* images smaller than 224 will give a topk error

* pred_boxes and logits should be equivalent to two_stage_num_proposals

* add missing part in DetaConfig

* Update src/transformers/models/deta/modeling_deta.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add docstring in configure and prettify TO DO part

* change distribute related code to accelerate

* Update src/transformers/models/deta/configuration_deta.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/deta/test_modeling_deta.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* protect importing accelerate

* change variable name to specific value

* wrong import

* fix aux_loss in conditional_detr

* add test aux_loss

* add aux_loss test in deta and table_transformer

* fix yolos since it doesn't have auxiliary function

* fix maskformer auxiliary_loss related code

* make style

* change param 'auxiliary_loss' to 'use_auxiliary_loss'

* change param 'auxiliary_loss' to 'use_auxiliary_loss' in tests

* make style & fix-copies, also revert yolos related parameter

* revert variable name 'use_auxiliary_loss' to 'auxiliary_loss' due to DetrConfig

* revert variable name in yolos

* revert maskformer

* add aux_loss test in maskformer

* make style

* Update src/transformers/models/yolos/configuration_yolos.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-19 14:12:01 +00:00
Ofir Zafrir 9efec11400
Fix `_speculative_sampling` implementation (#28508) 2024-01-19 14:07:31 +00:00
Matt d15781597a
Allow add_tokens for ESM (#28535)
* Allow non-special tokens to be added

* Add test, fix token adding code

* Revert changes to id_to_token and token_to_id

* Update the ESM tokenizer to be a bit more standardized

* Update src/transformers/models/esm/tokenization_esm.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-19 12:32:05 +00:00
Patrick von Platen 690fe73f20
[Whisper] Finalize batched SOTA long-form generation (#27658)
* finalize

* make fix copies whisper

* [Tests] Make sure that we don't run tests multiple times

* Update src/transformers/models/whisper/modeling_whisper.py

* [Tests] Make sure that we don't run tests multiple times

* fix more

* improve

* improve

* improve further

* improve more

* improve

* fix more

* git commit and git push

* fix more

* fix more

* fix more

* New try

* Fix more whisper stuff

* Improve

* correct more

* correct more

* correct more

* Fix some tests

* Add more tests

* correct more

* correct more

* correct more

* push

* correct more

* Fix more

* Better

* without dec mask

* correct more

* clean

* save intermediate

* Fix more

* Fix VAD for large-v2

* Save new

* Correct more

* make cleaner

* correct tests

* correct src

* Finish

* Fix more

* Fix more

* finish

* Fix edge cases

* fix return_dict_in_generate

* fix all tests

* make style

* add docstrings

* add docstrings

* Fix logit processor

* make style

* fix pipeline test

* fix more style

* Apply suggestions from code review

* apply feedback Sanchit

* correct more

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* correct more

* correct more

* correct more

* Fix staticmethod

* correct more

* fix

* fix slow tests

* make style

* fix tokenizer test

* fix tokenizer test

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* finish

* finish

* revert kwargs change

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-19 14:04:17 +02:00
Saibo-creator d4fc1eb498
feat: Sequential beam search (#26304) 2024-01-19 11:36:54 +00:00
Yoach Lacombe 268fc1fdfa
Add w2v2bert to pipeline (#28585)
* generalize asr pipeline to fbank models

* change w2v2 pipeline output

* Update test_pipelines_automatic_speech_recognition.py
2024-01-19 11:25:01 +00:00
Yih-Dar db9a7e9d3d
Don't save `processor_config.json` if a processor has no extra attribute (#28584)
* not save if empty

* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-19 09:59:14 +00:00
Sanchit Gandhi 186aa6befe
[Whisper] Fix audio classification with weighted layer sum (#28563)
* fix

* tests

* fix test
2024-01-18 16:41:44 +00:00
Yih-Dar 0754217c82
Use `LoggingLevel` context manager in 3 tests (#28575)
* inside with LoggingLevel

* remove is_flaky

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-18 13:41:25 +00:00
Yoach Lacombe d2cdefb9ec
Add new meta w2v2-conformer BERT-like model (#28165)
* first commit

* correct default value non causal

* update config and modeling code

* update converting checkpoint

* clean modeling and fix tests

* make style

* add new config parameters to docstring

* fix copied from statements

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* make position_embeddings_type docstrings clearer

* clean converting script

* remove function not used

* clean modeling file

* apply suggestion for test file + add convert script to not_doctested

* modify tests according to review - cleaner logic and more tests

* Apply nit suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add checker of valid position embeddings type

* instantiate new layer norm layer with the right eps

* fix freeze_feature_encoder since it can be None in some cases

* add test same output in convert script

* restore wav2vec2conformer and add new model

* create processor and FE + clean

* add new model code

* fix convert script and set default config parameters

* correct model id paths

* make style

* make fix-copies and cleaning files

* fix copied from statements

* complete .md and fix copies

* clean convert script argument defaults

* fix config parameters docstrings

* fix config docstring

* add copied from and enrich FE tests

* fix copied from and repo-consistency

* add autotokenizer

* make test input length shorter and change docstring code

* fix docstrings and copied from

* add add_adapter to ASR training example

* make testing of adapters more robust

* adapt to multi adapter layers

* refactor input_values->input_features and remove w2v2-bert feature extractor

* remove pretraining model

* remove deprecated features and useless lines

* add copied from and ignore statements to modeling tests

* remove pretraining model #2

* change import in convert script

* change default in convert script

* update readme and remove useless line

* Update tests/models/wav2vec2_bert/test_processor_wav2vec2_bert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* refactor BERT to Bert for consistency

* remove useless ignore copy statement

* add persistent to buffer in rotary

* add eps in LayerNorm init and remove copied from

* add adapter activation parameters and add copied from statements

* Fix copied statements and add unitest.skip reasons

* add copied statement in test_processor

* refactor processor

* make style

* replace numpy random by torch rand

* remove expected output CTC

* improve converting script with processor class

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* remove gumbel class

* remove tests related to previously deleted class

* Update src/transformers/models/wav2vec2_bert/configuration_wav2vec2_bert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* correct typos

* remove unused parameters

* update processor to take both text and audio

* update checkpoints

* update expected output and add ctc expected output

* add label_attention_mask

* replace pt with np in processor tests

* fix typo

* revert to behaviour with labels_attention_mask

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-18 13:37:34 +00:00
Arthur 8189977885
[`Core Tokenization`] Support a fix for spm fast models (#26678)
* fix

* last attempt

* current work

* fix forward compatibility

* save all special tokens

* current state

* revert additional changes

* updates

* remove tokenizer.model

* add a test and the fix

* nit

* revert one more break

* fix typefield issue

* quality

* more tests

* fix fields for FC

* more nits?

* new additional changes

* how

* some updates

* the fix

* where do we stand

* nits

* nits

* revert unrelated changes

* nits nits nits

* styling

* don't break llama just yet

* revert llama changes

* safe arg check

* fixup

* Add a test for T5

* Necessary changes

* Tests passing; added tokens need to not be normalized. If the added tokens are normalized, stripping is triggered, which seems unwanted for normal functioning

* Add even more tests, when normalization is set to True (which does not work 😓 )

* Add even more tests, when normalization is set to True (which does not work 😓 )

* Update to main

* nits

* fmt

* more and more test

* comments

* revert change as tests are failing

* make the test more readable

* nits

* refactor the test

* nit

* updates

* simplify

* style

* style

* style convert slow

* Update src/transformers/convert_slow_tokenizer.py
2024-01-18 12:31:54 +01:00
Yih-Dar 3005f96552
Save `Processor` (#27761)
* save processor

* Update tests/models/auto/test_processor_auto.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/test_processing_common.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-18 10:21:45 +00:00
fxmarty 2c1eebc121
Fix SDPA tests (#28552)
* skip bf16 test if not supported by device

* fix

* fix bis

* use is_torch_bf16_available_on_device

* use is_torch_fp16_available_on_device

* fix & use public llama

* use 1b model

* fix flaky test

---------

Co-authored-by: Your Name <you@example.com>
2024-01-17 17:29:18 +01:00
Junyang Lin d6ffe74dfa
Add qwen2 (#28436)
* add config, modeling, and tokenization

* add auto and init

* update readme

* update readme

* update team name

* fixup

* fixup

* update config

* update code style

* update for fixup

* update for fixup

* update for fixup

* update for testing

* update for testing

* fix bug for config and tokenization

* fix bug for bos token

* not doctest

* debug tokenizer

* not doctest

* debug tokenization

* debug init for tokenizer

* fix style

* update init

* delete if in token auto

* add tokenizer doc

* add tokenizer in init

* Update dummy_tokenizers_objects.py

* update

* update

* debug

* Update tokenization_qwen2.py

* debug

* Update convert_slow_tokenizer.py

* add copies

* add copied from and make style

* update files map

* update test

* fix style

* fix merge reading and update tests

* fix tests

* fix tests

* fix style

* debug a variable in readme

* Update src/transformers/models/qwen2/configuration_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* update test and copied from

* fix style

* update qwen2 tokenization and tests

* Update tokenization_qwen2.py

* delete the copied from after property

* fix style

* update tests

* update tests

* add copied from

* fix bugs

* update doc

* add warning for sliding window attention

* update qwen2 tokenization

* fix style

* Update src/transformers/models/qwen2/modeling_qwen2.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix tokenizer fast

---------

Co-authored-by: Ren Xuancheng <jklj077@users.noreply.github.com>
Co-authored-by: renxuancheng.rxc <renxuancheng.rxc@alibaba-inc.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-17 16:02:22 +01:00
fxmarty a6adc05e6b
symbolic_trace: add past_key_values, llama, sdpa support (#28447)
* torch.fx: add pkv, llama, sdpa support

* Update src/transformers/models/opt/modeling_opt.py

* remove spaces

* trigger ci

* use explicit variable names
2024-01-17 11:50:53 +01:00
Joao Gante f4f57f9dfa
Config: warning when saving generation kwargs in the model config (#28514) 2024-01-16 18:31:01 +00:00
Arthur fe23256b73
[`SpeechT5Tokenization`] Add copied from and fix the `convert_tokens_to_string` to match the fast decoding scheme (#28522)
* Add copied from and fix the `convert_tokens_to_string` to match the fast decoding scheme

* fixup

* add a small test

* style test file

* nits
2024-01-16 16:50:02 +01:00
Arthur 96d0883103
[`TokenizationRoformerFast`] Fix the save and loading (#28527)
* cleanup

* add a test

* update the test

* style

* revert part that allows to pickle the tokenizer
2024-01-16 16:37:15 +01:00
Arthur 716df5fb7e
[ `TokenizationUtils`] Fix `add_special_tokens` when the token is already there (#28520)
* fix adding special tokens when the token is already there.

* add a test

* add a test

* nit

* fix the test: make sure the order is preserved

* Update tests/test_tokenization_common.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-16 16:36:29 +01:00
Nima Yaqmuri 07ae53e6e7
Fix/speecht5 bug (#28481)
* Fix bug in SpeechT5 speech decoder prenet's forward method

- Removed redundant `repeat` operation on speaker_embeddings in the forward method. This line was erroneously duplicating the embeddings, leading to incorrect input size for concatenation and performance issues.
- Maintained original functionality of the method, ensuring the integrity of the speech decoder prenet's forward pass remains intact.
- This change resolves a critical bug affecting the model's performance in handling speaker embeddings.

* Refactor SpeechT5 text to speech integration tests

- Updated SpeechT5ForTextToSpeechIntegrationTests to accommodate the variability in sequence lengths due to dropout in the speech decoder pre-net. This change ensures that our tests are robust against random variations in generated speech, enhancing the reliability of our test suite.
- Removed hardcoded dimensions in test assertions. Replaced with dynamic checks based on model configuration and seed settings, ensuring tests remain valid across different runs and configurations.
- Added new test cases to thoroughly validate the shapes of generated spectrograms and waveforms. These tests leverage seed settings to ensure consistent and predictable behavior in testing, addressing potential issues in speech generation and vocoder processing.
- Fixed existing test cases where incorrect assumptions about output shapes led to potential errors.

* Fix bug in SpeechT5 speech decoder prenet's forward method

- Removed redundant `repeat` operation on speaker_embeddings in the forward method. This line was erroneously duplicating the embeddings, leading to incorrect input size for concatenation and performance issues.
- Maintained original functionality of the method, ensuring the integrity of the speech decoder prenet's forward pass remains intact.
- This change resolves a critical bug affecting the model's performance in handling speaker embeddings.

* Refactor SpeechT5 text to speech integration tests

- Updated SpeechT5ForTextToSpeechIntegrationTests to accommodate the variability in sequence lengths due to dropout in the speech decoder pre-net. This change ensures that our tests are robust against random variations in generated speech, enhancing the reliability of our test suite.
- Removed hardcoded dimensions in test assertions. Replaced with dynamic checks based on model configuration and seed settings, ensuring tests remain valid across different runs and configurations.
- Added new test cases to thoroughly validate the shapes of generated spectrograms and waveforms. These tests leverage seed settings to ensure consistent and predictable behavior in testing, addressing potential issues in speech generation and vocoder processing.
- Fixed existing test cases where incorrect assumptions about output shapes led to potential errors.

* Enhance handling of speaker embeddings in SpeechT5

- Refined the generate and generate_speech functions in the SpeechT5 class to robustly handle two scenarios for speaker embeddings: matching the batch size (one embedding per sample) and one-to-many (a single embedding for all samples in the batch).
- The update includes logic to repeat the speaker embedding when a single embedding is provided for multiple samples, and a ValueError is raised for any mismatched dimensions.
- Also added corresponding test cases to validate both scenarios, ensuring complete coverage and functionality for diverse speaker embedding situations.

* Improve Test Robustness with Randomized Speaker Embeddings
2024-01-16 14:14:28 +00:00
fxmarty 66db33ddc8
Fix mismatching loading in from_pretrained with/without accelerate (#28414)
* fix mismatching behavior in from_pretrained with/without accelerate

* meaningful refactor

* remove added space

* add test

* fix model on the hub

* comment

* use tiny model

* style
2024-01-16 14:29:51 +01:00
Timothy Cronin ff86bc364d
improve dev setup comments and hints (#28495)
* improve dev setup comments and hints

* fix tests for new dev setup hints
2024-01-15 18:36:40 +00:00
Joao Gante 7e0ddf89f4
Generate: consolidate output classes (#28494) 2024-01-15 17:04:08 +00:00
Marc Sun 7c8dd88d13
[GPTQ] Fix test (#28018)
* fix test

* reduce length

* smaller model
2024-01-15 11:22:54 -05:00
thedamnedrhino 366c03271e
Tokenizer kwargs in textgeneration pipe (#28362)
* added args to the pipeline

* added test

* more sensical tests

* fixup

* docs

* typo

* docs

* made changes to support named args

* fixed test

* docs update

* styles

* docs

* docs
2024-01-15 16:52:18 +01:00
Younes Belkada 1b9a2e4c80
[`core`/ FEAT] Add the possibility to push custom tags using `PreTrainedModel` itself (#28405)
* v1 tags

* remove unneeded conversion

* v2

* rm unneeded warning

* add more utility methods

* Update src/transformers/utils/hub.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/hub.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/utils/hub.py

Co-authored-by: Lucain <lucainp@gmail.com>

* more enhancements

* oops

* merge tags

* clean up

* revert unneeded change

* add extensive docs

* more docs

* more kwargs

* add test

* oops

* fix test

* Update src/transformers/modeling_utils.py

Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>

* Update src/transformers/utils/hub.py

Co-authored-by: Lucain <lucainp@gmail.com>

* Update src/transformers/modeling_utils.py

* Update src/transformers/trainer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add more conditions

* more logic

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
2024-01-15 14:48:07 +01:00
Apoorv Saxena e304f9769c
Adding Prompt lookup decoding (#27775)
* MVP

* fix ci

* more ci

* remove redundant kwarg

* added and wired up PromptLookupCandidateGenerator

* rebased with main, working

* removed print

* style fixes

* fix test

* fixed tests

* added test for prompt lookup decoding

* fixed circleci

* fixed test issue

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-13 17:15:58 +00:00
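Usage is a single extra `generate` kwarg; a hedged sketch (the checkpoint is a stand-in — any causal LM works):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tok("The quick brown fox jumps over the lazy dog. The quick brown",
             return_tensors="pt")
# Candidate tokens are copied from earlier n-gram matches in the prompt
# itself, so no separate draft model is needed.
out = model.generate(**inputs, prompt_lookup_num_tokens=10, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
```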
Joao Gante afc45b13ca
Generate: refuse to save bad generation config files (#28477) 2024-01-12 16:01:17 +00:00
Younes Belkada 266c67b06a
[`Mixtral` / `Awq`] Add mixtral fused modules for Awq (#28240)
* add mixtral fused modules

* add changes from modeling utils

* add test

* fix test + rope theta issue

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add tests

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-12 14:29:35 +01:00
amyeroberts 666a6f078c
Update metadata loading for oneformer (#28398)
* Update metadata loading for oneformer

* Enable loading from a model repo

* Update docstrings

* Fix tests

* Update tests

* Clarify repo_path behaviour
2024-01-12 12:35:31 +00:00
amyeroberts 4e36a6cd00
Mark two logger tests as flaky (#28458)
* Mark two logger tests as flaky

* Add description to is_flaky
2024-01-12 11:58:59 +00:00
Younes Belkada 07bdbebb48
[`Awq`] Add llava fused modules support (#28239)
* add llava + fused modules

* Update src/transformers/models/llava/modeling_llava.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-01-12 06:55:54 +01:00
Yih-Dar 59cd9de39d
Byebye torch 1.10 (#28207)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-11 16:18:27 +01:00
liangxuZhang e768616afa
Fix load balancing loss func for mixtral (#28256)
* Correct the implementation of auxiliary loss of mixtral

* correct the implementation of auxiliary loss of mixtral

* Implement a simpler calculation method

---------

Co-authored-by: zhangliangxu3 <zhangliangxu3@jd.com>
2024-01-11 16:16:12 +01:00
Gustavo de Rosa 5509058561
[Phi] Extend implementation to use GQA/MQA. (#28163)
* chore(phi): Updates configuration_phi with missing keys.

* chore(phi): Adds first draft of combined modeling_phi.

* fix(phi): Fixes according to latest review.

* fix(phi): Removes pad_vocab_size_multiple to prevent inconsistencies.

* fix(phi): Fixes unit and integration tests.

* fix(phi): Ensures that everything works with microsoft/phi-1 for first integration.

* fix(phi): Fixes output of docstring generation.

* fix(phi): Fixes according to latest review.

* fix(phi): Fixes according to latest review.

* fix(tests): Re-enables Phi-1.5 test.

* fix(phi): Fixes attention overflow on PhiAttention (for Phi-2).

* fix(phi): Improves how queries and keys are upcast.

* fix(phi): Small updates on latest changes.
2024-01-11 15:58:02 +01:00
Harisankar Babu d560637885
Optionally preprocess segmentation maps for MobileViT (#28420)
* optionally preprocess segmentation maps for mobilevit

* changed pretrained model name to that of segmentation model

* removed voc-deeplabv3 from model archive list

* added preprocess_image and preprocess_mask methods for processing images and segmentation masks respectively

* added tests for segmentation masks based on segformer feature extractor

* use crop_size instead of size

* reverting to initial model
2024-01-11 14:52:14 +00:00
amyeroberts 66964c00f6
Enable multi-label image classification in pipeline (#28433)
Enable multi-label image classification
2024-01-11 10:29:38 +00:00
Patrick von Platen cbbe30749b
[Whisper] Fix slow test (#28407)
* [Whisper] Fix slow test

* update

* update

* update

* update

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-01-10 22:35:36 +01:00
Zach Mueller 6015d0ad6c
Support `DeepSpeed` when using auto find batch size (#28088)
Fixup test
2024-01-10 06:03:13 -05:00
Zach Mueller a777f52599
Skip now failing test in the Trainer tests (#28421)
* Fix test

* Skip
2024-01-10 06:02:31 -05:00
HanHui 4df1d69634
[BUG] BarkEosPrioritizerLogitsProcessor eos_token_id use list, tensor size mismatch (#28201)
fix(generation/logits_process.py): BarkEosPrioritizerLogitsProcessor eos_token_id use list, tensor size mismatch

Co-authored-by: chenhanhui <chenhanhui@kanzhun.com>
2024-01-10 11:46:49 +01:00
Weiming Zhao 701298d2d3
Use mmap option to load_state_dict (#28331)
Use mmap option to load_state_dict (#28331)
2024-01-10 09:57:30 +01:00
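What this enables, sketched (requires PyTorch >= 2.1; `model` and the checkpoint path are placeholders):

```python
import torch

# mmap=True maps the checkpoint file into memory lazily instead of reading
# it all up front, reducing peak host RAM for large state dicts.
state_dict = torch.load("pytorch_model.bin", map_location="cpu", mmap=True)
model.load_state_dict(state_dict)
```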
Victor SANH 0f2f0c634f
Fix `_merge_input_ids_with_image_features` for llava model (#28333)
* fix `_merge_input_ids_with_image_features` for llava model

* Update src/transformers/models/llava/modeling_llava.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* adress comments

* style and tests

* ooops

* test the backward too

* Apply suggestions from code review

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update tests/models/vipllava/test_modeling_vipllava.py

* style and quality

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-01-10 08:33:33 +01:00
Xuehai Pan 976189a6df
Fix initialization for missing parameters in `from_pretrained` under ZeRO-3 (#28245)
* Fix initialization for missing parameters in `from_pretrained` under ZeRO-3

* Test initialization for missing parameters under ZeRO-3

* Add more tests

* Only enable deepspeed context for per-module level parameters

* Enable deepspeed context only once

* Move class definition inside test case body
2024-01-09 14:58:21 +00:00
Sangbum Daniel Choi 357971ec36
fix auxiliary loss training in DetrSegmentation (#28354)
* fix auxiliary loss training in DetrSegmentation

* add auxiliary_loss testing
2024-01-09 10:17:07 +00:00
NielsRogge 3b742ea84c
Add SigLIP (#26522)
* Add first draft

* Use appropriate gelu function

* More improvements

* More improvements

* More improvements

* Convert checkpoint

* More improvements

* Improve docs, remove print statements

* More improvements

* Add link

* remove unused masking function

* begin tokenizer

* do_lower_case

* debug

* set split_special_tokens=True

* Remove script

* Fix style

* Fix rebase

* Use same design as CLIP

* Add fast tokenizer

* Add SiglipTokenizer to init, remove extra_ids

* Improve conversion script

* Use smaller inputs in conversion script

* Update conversion script

* More improvements

* Add processor to conversion script

* Add tests

* Remove print statements

* Add tokenizer tests

* Fix more tests

* More improvements related to weight initialization

* More improvements

* Make more tests pass

* More improvements

* More improvements

* Add copied from

* Add canonicalize_text

* Enable fast tokenizer tests

* More improvements

* Fix most slow tokenizer tests

* Address comments

* Fix style

* Remove script

* Address some comments

* Add copied from to tests

* Add more copied from

* Add more copied from

* Add more copied from

* Remove is_flax_available

* More updates

* Address comment

* Remove SiglipTokenizerFast for now

* Add caching

* Remove umt5 test

* Add canonicalize_text inside _tokenize, thanks Arthur

* Fix image processor tests

* Skip tests which are not applicable

* Skip test_initialization

* More improvements

* Compare pixel values

* Fix doc tests, add integration test

* Add do_normalize

* Remove causal mask and leverage ignore copy

* Fix attention_mask

* Fix remaining tests

* Fix dummies

* Rename temperature and bias

* Address comments

* Add copied from to tokenizer tests

* Add SiglipVisionModel to auto mapping

* Add copied from to image processor tests

* Improve doc

* Remove SiglipVisionModel from index

* Address comments

* Improve docs

* Simplify config

* Add first draft

* Make it like mistral

* More improvements

* Fix attention_mask

* Fix output_attentions

* Add note in docs

* Convert multilingual model

* Convert large checkpoint

* Convert more checkpoints

* Add pipeline support, correct image_mean and image_std

* Use padding=max_length by default

* Make processor like llava

* Add code snippet

* Convert more checkpoints

* Set keep_punctuation_string=None as in OpenCLIP

* Set normalized=False for special tokens

* Fix doc test

* Update integration test

* Add figure

* Update organization

* Happy new year

* Use AutoModel everywhere

---------

Co-authored-by: patil-suraj <surajp815@gmail.com>
2024-01-08 18:17:16 +01:00
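A hedged usage sketch via the zero-shot pipeline support added above (checkpoint name assumed from the released SigLIP conversions; the image path is a placeholder):

```python
from transformers import pipeline

clf = pipeline("zero-shot-image-classification",
               model="google/siglip-base-patch16-224")
print(clf("cat.jpg", candidate_labels=["a photo of a cat", "a photo of a dog"]))
```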
Rosie Wood 73c88012b7
Add segmentation map processing to SAM Image Processor (#27463)
* add segmentation map processing to sam image processor

* fixup

* add tests

* reshaped_input_size is shape before padding

* update tests for size/shape outputs

* fixup

* add code snippet to docs

* Update docs/source/en/model_doc/sam.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Add missing backticks

* add `segmentation_maps` as arg for SamProcessor.__call__()

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-08 16:40:36 +00:00
Mohamed Abu El-Nasr 0c2121f99b
Fix building alibi tensor when num_heads is not a power of 2 (#28380)
* Fix building alibi tensor when num_heads is not a power of 2

* Remove print function
2024-01-08 10:39:40 +01:00
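For reference, the standard ALiBi slope schedule, including the non-power-of-two branch this commit fixes — a self-contained sketch of the published recipe, not necessarily the exact transformers code:

```python
import math
import torch

def alibi_slopes(num_heads: int) -> torch.Tensor:
    closest_pow2 = 2 ** math.floor(math.log2(num_heads))
    base = 2.0 ** (-(2.0 ** -(math.log2(closest_pow2) - 3)))
    slopes = [base ** (i + 1) for i in range(closest_pow2)]
    if closest_pow2 != num_heads:
        # Non-power-of-two head counts take extra slopes, interleaved from
        # the schedule of the next power of two -- the case fixed here.
        extra_base = 2.0 ** (-(2.0 ** -(math.log2(2 * closest_pow2) - 3)))
        num_extra = num_heads - closest_pow2
        slopes += [extra_base ** i for i in range(1, 2 * num_extra, 2)]
    return torch.tensor(slopes)
```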
Susnato Dhar 3eddda1111
[Phi2] Add support for phi2 models (#28211)
* modified script and added test for phi2

* changes
2024-01-07 08:19:14 +01:00
Sangbum Daniel Choi 899d8351f9
[DETA] Improvement and Sync from DETA especially for training (#27990)
* [DETA] fix freeze/unfreeze function

* Update src/transformers/models/deta/modeling_deta.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deta/modeling_deta.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add freeze/unfreeze test case in DETA

* fix type

* fix typo 2

* fix : enable aux and enc loss in training pipeline

* Add unsynced variables from original DETA for training

* modification for passing CI test

* make style

* make fix

* manual make fix

* change deta_modeling_test configuration 'two_stage' default to True and minor change of dist checking

* remove print

* divide configuration in DetaModel and DetaForObjectDetection

* images smaller than 224 will give a topk error

* pred_boxes and logits should be equivalent to two_stage_num_proposals

* add missing part in DetaConfig

* Update src/transformers/models/deta/modeling_deta.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add docstring in configure and prettify TO DO part

* change distribute related code to accelerate

* Update src/transformers/models/deta/configuration_deta.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/deta/test_modeling_deta.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* protect importing accelerate

* change variable name to specific value

* wrong import

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-01-05 14:20:21 +00:00
Fernando Rodriguez Sanchez 57e9c83213
Fix pos_mask application and update tests accordingly (#27892)
* Fix pos_mask application and update tests accordingly

* Fix style

* Adding comments

---------

Co-authored-by: Fernando Rodriguez <fernando.rodriguez@nielseniq.com>
2024-01-05 12:36:10 +01:00
yuanwu2017 03b980990a
Don't check the device when device_map=auto (#28351)
When running on a multi-card server with device_map=auto, the model will not always be allocated to device 0,
because other processes may be using those cards. It will select the devices that can accommodate this model.

Signed-off-by: yuanwu <yuan.wu@intel.com>
2024-01-05 12:21:29 +01:00
Yoach Lacombe 35e9d2b223
Fix error in M4T feature extractor (#28340)
* fix M4T FE error when no attention mask

* modify logic

* add test

* go back to initial test situation + add other tests
2024-01-04 16:40:53 +00:00
Sangbum Daniel Choi 4a66c0d952
enable training mask2former and maskformer for transformers trainer (#28277)
* fix get_num_masks output from [int] to int

* fix loss size from torch.Size([1]) to torch.Size([])
2024-01-04 09:53:25 +01:00
Apsod 45b1dfa342
Remove token_type_ids from model_input_names (like #24788) (#28325)
* remove token_type_ids from model_input_names (like #24788)

* removed test that assumed token_type_ids should be present and updated a model reference so that it points to an available model)
2024-01-03 19:26:07 +01:00
Connor Henderson d83ff5eeff
Add FastSpeech2Conformer (#23439)
* start - docs, SpeechT5 copy and rename

* add relevant code from FastSpeech2 draft, have tests pass

* make it an actual conformer, demo ex.

* matching inference with original repo, includes debug code

* refactor nn.Sequentials, start more desc. var names

* more renaming

* more renaming

* vocoder scratchwork

* matching vocoder outputs

* hifigan vocoder conversion script

* convert model script, rename some config vars

* replace postnet with speecht5's implementation

* passing common tests, file cleanup

* expand testing, add output hidden states and attention

* tokenizer + passing tokenizer tests

* variety of updates and tests

* g2p_en pckg setup

* import structure edits

* docstrings and cleanup

* repo consistency

* deps

* small cleanup

* forward signature param order

* address comments except for masks and labels

* address comments on attention_mask and labels

* address second round of comments

* remove old unneeded line

* address comments part 1

* address comments pt 2

* rename auto mapping

* fixes for failing tests

* address comments part 3 (bart-like, train loss)

* make style

* pass config where possible

* add forward method + tests to WithHifiGan model

* make style

* address arg passing and generate_speech comments

* address Arthur comments

* address Arthur comments pt2

* lint  changes

* Sanchit comment

* add g2p-en to doctest deps

* move up self.encoder

* onnx compatible tensor method

* fix is symbolic

* fix paper url

* move models to espnet org

* make style

* make fix-copies

* update docstring

* Arthur comments

* update docstring w/ new updates

* add model architecture images

* header size

* md wording update

* make style
2024-01-03 18:01:06 +00:00
Younes Belkada fa21ead73d
[`Awq`] Enable the possibility to skip quantization for some target modules (#27950)
* v1

* add docstring

* add tests

* add awq 0.1.8

* oops

* fix test
2023-12-25 11:06:56 +01:00
Younes Belkada 29e7a1e183
[`Llava`] Fix llava index errors (#28032)
* fix llava index errors

* forward contrib credits from original implementation and fix

* better fix

* final fixes and fix all tests

* fix

* fix nit

* fix tests

* add regression tests

---------

Co-authored-by: gullalc <gullalc@users.noreply.github.com>
2023-12-22 17:47:38 +01:00
Yoach Lacombe 5da3db3fd5
[Whisper] Fix word-level timestamps with bs>1 or num_beams>1 (#28114)
* fix frames

* use smaller chunk length

* correct beam search + tentative stride

* fix whisper word timestamp in batch

* add test batch generation with return token timestamps

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* clean a test

* make style + correct typo

* write clearer comments

* explain test in comment

---------

Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-12-22 12:43:11 +00:00
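Hedged usage sketch of the fixed path (the audio path is a placeholder):

```python
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")
# Word-level timestamps now stay aligned when batching or beam search is used.
out = asr("sample.wav", return_timestamps="word",
          generate_kwargs={"num_beams": 2})
print(out["chunks"])  # [{"text": ..., "timestamp": (start, end)}, ...]
```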
Yih-Dar bb3bd44739
Fix the check of models supporting FA/SDPA not run (#28202)
* add check_support_list.py

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-22 12:56:11 +01:00
NielsRogge c9fb250a25
Add Swinv2 backbone (#27742)
* First draft

* More improvements

* More improvements

* Make all tests pass

* Remove script

* Update image processor

* Address comments

* Use new gradient checkpointing method

* Convert checkpoints, add integration test

* Do not keep aspect ratio for now

* Set keep_aspect_ratio=False for beit, add integration test

* Remove print statement
2023-12-22 11:12:56 +00:00
Nicholas Neo 1ef86c4f56
Fix: [SeamlessM4T - S2TT] Bug in batch loading of audio in torch.Tensor format in the SeamlessM4TFeatureExtractor class (#27914)
* fixes: update the is_batched condition to also check for batched audio data in torch.Tensor format instead of only checking for batched audio data in np.ndarray format

* Update src/transformers/models/seamless_m4t/feature_extraction_seamless_m4t.py

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>

* refactor: code refactoring to remove torch framework dependency

* docs: updated docstring to add torch tensor compatibility

* test: add test cases to incorporate torch tensor inputs

* test: ran make fix-copies for code conformity

* test: refactor test to separate the test_call into test_call_numpy and test_call_torch

---------

Co-authored-by: Yoach Lacombe <52246514+ylacombe@users.noreply.github.com>
2023-12-22 10:47:30 +00:00
amyeroberts 3657748b4d
Update YOLOS slow test values (#28187)
Update test values
2023-12-21 18:17:07 +00:00
amyeroberts cd1350ce9b
Fix slow backbone tests - out_indices must match stage name ordering (#28186)
Indices must match stage name ordering
2023-12-21 18:16:50 +00:00
Matt 260b9d2179
Even more TF test fixes (#28146)
* Fix vision text dual encoder

* Small cleanup for wav2vec2 (not fixed yet)

* Small fix for vision_encoder_decoder

* Fix SAM builds

* Update TFBertTokenizer test with modern exporting + tokenizer

* Fix DeBERTa

* Fix DeBERTav2

* Try RAG fix but it's impossible to test locally

* Actually fix RAG now that I got FAISS working somehow

* Fix Wav2Vec2, add sermon

* Fix Hubert
2023-12-21 15:14:46 +00:00
Arthur f9a98c476c
[`Mixtral` & `Mistral`] Add support for sdpa (#28133)
* some nits

* update test

* add support for sdpa

* remove some dummy inputs

* all good

* style

* nits

* fixes

* fix more copies

* nits

* styling

* fix

* Update src/transformers/models/mistral/modeling_mistral.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* add a slow test just to be sure

* fixup

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 12:38:22 +01:00
Sanchit Gandhi 814619f54f
[Whisper] Use torch for stft if available (#26119)
* [Whisper] Use torch for stft if available

* update docstring

* mock patch decorator

* fit on one line
2023-12-21 11:04:05 +00:00
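The availability-gated pattern, sketched in isolation (the numpy fallback is elided; defaults mirror Whisper's 400-sample window and 160-sample hop, which are assumptions here):

```python
import numpy as np

try:
    import torch
    _HAS_TORCH = True
except ImportError:
    _HAS_TORCH = False

def stft_magnitudes(waveform: np.ndarray, n_fft: int = 400, hop: int = 160) -> np.ndarray:
    # Illustrative pattern only: prefer the much faster torch.stft when torch
    # is installed; a pure-numpy fallback would run otherwise.
    if _HAS_TORCH:
        window = torch.hann_window(n_fft)
        spec = torch.stft(
            torch.from_numpy(waveform).float(), n_fft,
            hop_length=hop, window=window, return_complex=True,
        )
        return spec.abs().numpy()
    raise NotImplementedError("numpy fallback elided in this sketch")
```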
Poedator 4f7806ef7e
[bnb] Let's make serialization of 4bit models possible (#26037)
* updated bitsandbytes.py

* rm test_raise_* from test_4bit.py

* add test_4bit_serialization.py

* modeling_utils bulk edits

* bnb_ver 0.41.3 in integrations/bitsandbytes.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* @slow reinstated

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* bnb ver 0.41.3 in src/transformers/modeling_utils.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* rm bnb version todo in  integrations/bitsandbytes.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* moved 4b serialization tests to test_4bit

* tests upd for opt

* to torch_device

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* ruff fixes to tests

* rm redundant bnb version check in mod_utils

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* restore _hf_peft_config_loaded  modeling_utils.py::2188

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* restore _hf_peft_config_loaded  test in modeling_utils.py::2199

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* fixed NOT getattr(self, "is_8bit_serializable")

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* setting model.is_4bit_serializable

* rm separate fp16_statistics arg from set_module...

* rm else branch in integrations::bnb::set_module

* bnb 4bit dtype check

* upd comment on 4bit weights

* upd tests for FP4 safe

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2023-12-21 11:54:44 +01:00
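A hedged round-trip sketch of what this enables (checkpoint and paths are placeholders; needs bitsandbytes >= 0.41.3 per the bullets above):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m",
                                             quantization_config=bnb)
model.save_pretrained("opt-350m-4bit")  # weights stay quantized on disk
reloaded = AutoModelForCausalLM.from_pretrained("opt-350m-4bit")
```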
Dean Wyatte e268d7e5dc
disable test_retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest (#28169)
disable retain_grad_hidden_states_attentions on SeamlessM4TModelWithTextInputTest
2023-12-21 08:39:44 +01:00
amyeroberts 1d77735947
Fix yolos resizing (#27663)
* Fix yolos resizing

* Update tests

* Add a test
2023-12-20 20:55:51 +00:00
Joao Gante 45b70384a7
Generate: fix speculative decoding (#28166)
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
2023-12-20 18:55:35 +00:00
amyeroberts ee298a16a2
Align backbone stage selection with out_indices & out_features (#27606)
* Iterate over out_features instead of stage_names

* Update for all backbones

* Add tests

* Fix

* Align timm backbone behaviour with other backbones

* Fix tests

* Stricter checks on set out_features and out_indices

* Revert back stage selection logic

* Remove out-of-order logic

* Document restriction in docstrings
2023-12-20 18:33:17 +00:00
Yih-Dar 7938c8c836
Fix weights not properly initialized due to shape mismatch (#28122)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-20 14:20:02 +01:00
peter-sk 769a9542de
move code to Trainer.evaluate to enable use of that function with multiple datasets (#27844)
* move code to Trainer.evaluate to enable use of that function with multiple datasets

* test

* update doc string

* and a tip

* forgot the type

---------

Co-authored-by: Prof. Peter Schneider-Kamp <jps@ordbogen.com>
2023-12-20 10:55:56 +01:00
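Hedged usage sketch (names are placeholders, and the exact metric-key prefixing is assumed from the description above): `evaluate` can now take the same multi-dataset input the training loop already handled.

```python
# `trainer`, `wiki_ds` and `news_ds` are assumed to already exist.
metrics = trainer.evaluate(eval_dataset={"wiki": wiki_ds, "news": news_ds})
# Expected keys look like "eval_wiki_loss" / "eval_news_loss".
```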
Arthur 4a04b4ccca
[`Mixtral`] Fix loss + nits (#28115)
* default config should not use sliding window

* update the doc

* nits

* add a proper test

* update

* update

* update expected value

* Update src/transformers/tokenization_utils_fast.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* convert to float

* average then N**2

* comment

* revert nit

* good to go

* fixup

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert unrelated change

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-19 17:31:54 +01:00
amyeroberts bd7a356135
Update split string in doctest to reflect #28087 (#28135) 2023-12-19 13:55:09 +00:00
Matt 71d47f0ad4
More TF fixes (#28081)
* More build_in_name_scope()

* Make sure we set the save spec now we don't do it with dummies anymore

* make fixup
2023-12-18 15:26:03 +00:00
Poedator f85a1e82c1
4D `attention_mask` support (#27539)
* edits to _prepare_4d_causal_attention_mask()

* initial tests for 4d mask

* attention_mask_for_sdpa support

* added test for inner model hidden

* added autotest decorators

* test mask dtype to torch.int64

* torch.testing.assert_close

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* torch_device and @torch_gpu in tests

* upd tests

* +torch decorators

* torch decorators fixed

* more decorators!

* even more decorators

* fewer decorators

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-17 11:08:04 +01:00
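A hedged sketch of what this unlocks for model families whose masks flow through `_prepare_4d_causal_attention_mask` (the checkpoint is a placeholder for any Llama-style model; here the custom mask is just an ordinary causal mask, built by hand for illustration):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-org/llama-style-model")  # placeholder
seq = 8
input_ids = torch.randint(0, model.config.vocab_size, (1, seq))
# A custom (batch, 1, query_len, key_len) mask is now passed through to the
# attention layers instead of being rebuilt from a 2D padding mask.
mask_4d = torch.tril(torch.ones(seq, seq, dtype=torch.int64)).view(1, 1, seq, seq)
out = model(input_ids=input_ids, attention_mask=mask_4d)
```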
Sourab Mangrulkar 238d2e3c44
fix resuming from ckpt when using FSDP with FULL_STATE_DICT (#27891)
* fix resuming from ckpt when using FSDP with FULL_STATE_DICT

* update tests

* fix tests
2023-12-16 19:41:43 +05:30
Quentin Lhoest 26ea725bc0
Update fixtures-image-utils (#28080)
* fix hf-internal-testing/fixtures_image_utils

* fix test

* comments
2023-12-15 16:58:36 +00:00
Yoach Lacombe deb72cb6d9
Skip M4T `test_retain_grad_hidden_states_attentions` (#28060)
* skip test from SpeechInput

* refine description of skip
2023-12-15 13:39:16 +00:00
Younes Belkada 1e20931765
[`FA-2`] Fix fa-2 issue when passing `config` to `from_pretrained` (#28043)
* fix fa-2 issue

* fix test

* Update src/transformers/modeling_utils.py

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>

* cleaner fix

* up

* add more robust tests

* Update src/transformers/modeling_utils.py

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>

* fixup

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* pop

* add test

---------

Co-authored-by: fxmarty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-15 11:08:27 +01:00
Matt 3060899be5
Replace build() with build_in_name_scope() for some TF tests (#28046)
Replace build() with build_in_name_scope() for some tests
2023-12-14 17:42:25 +00:00
Matt 050e0b44f6
Proper build() methods for TF (#27794)
* Add a convenience method for building in your own name scope

* Second attempt at auto layer building

* Revert "Second attempt at auto layer building"

This reverts commit e03a3aaecf9ec41a805582b83cbdfe3290a631be.

* Attempt #3

* Revert "Attempt #3"

This reverts commit b9df7a0857560d29b5abbed6127d9e9eca77cf47.

* Add missing attributes that we're going to need later

* Add some attributes we're going to need later

* A fourth attempt! Feel the power flow through you!

* Revert "A fourth attempt! Feel the power flow through you!"

This reverts commit 6bf4aaf3875d6f28485f50187617a4c616c8aff7.

* Add more values we'll need later

* TF refactor that we'll need later

* Revert "TF refactor that we'll need later"

This reverts commit ca07202fb5b7b7436b893baa8d688b4f348ea7b9.

* Revert "Revert "TF refactor that we'll need later""

This reverts commit 1beb0f39f293ed9c27594575e1c849aadeb15c13.

* make fixup

* Attempt five!

* Revert "Attempt five!"

This reverts commit 3302207958dfd0374b0447a51c06eea51a506044.

* Attempt six - this time don't add empty methods

* Revert "Attempt six - this time don't add empty methods"

This reverts commit 67d60129be75416b6beb8f47c7d38d77b18d79bb.

* Attempt seven - better base model class detection!

* Revert "Attempt seven - better base model class detection!"

This reverts commit 5f14845e92ea0e87c598da933bfbfee10f553bc9.

* Another attribute we'll need later

* Try again with the missing attribute!

* Revert "Try again with the missing attribute!"

This reverts commit 760c6f30c5dffb3e04b0e73c34a77d1882a0fef7.

* This is the attempt that will pierce the heavens!

* Revert "This is the attempt that will pierce the heavens!"

This reverts commit c868bb657de057aca7a5260350a3f831fc4dfee6.

* Attempt seven - snag list is steadily decreasing

* Revert "Attempt seven - snag list is steadily decreasing"

This reverts commit 46fbd975deda64429bfb3e5fac4fc0370c00d316.

* Attempt eight - will an empty snag list do it?

* Revert "Attempt eight - will an empty snag list do it?"

This reverts commit 7c8a3c2b083253649569e9877e02054ae5cec67b.

* Fixes to Hubert issues that cause problems later

* Trying again with Conv1D/SeparableConv fixes

* Revert "Trying again with Conv1D/SeparableConv fixes"

This reverts commit 55092bca952bc0f750aa1ffe246a640bf1e2036e.

* Apply the build shape fixes to Wav2Vec2 as well

* One more attempt!

* Revert "One more attempt!"

This reverts commit 5ac3e4cb01b9458cc93312873725f9444ae7261c.

* Another attempt!

* Revert "Another attempt!"

This reverts commit ea16d890e019d7de8792a3b8e72f3b1c02adae50.

* Let's see how many failures we get without the internal build method

* Fix OpenAI

* Fix MobileBERT

* (Mostly) fix GroupVIT

* Fix BLIP

* One more BLIP fix

* One more BLIP fix!

* Fix Regnet

* Finally fully fix GroupViT

* Fix Data2Vec and add the new AdaptivePool

* Fix Segformer

* Fix Albert

* Fix Deberta/DebertaV2

* Fix XLM

* Actually fix XLM

* Fix Flaubert

* Fix lxmert

* Fix Resnet

* Fix ConvBERT

* Fix ESM

* Fix Convnext / ConvnextV2

* Fix SAM

* Fix Efficientformer

* Fix LayoutLMv3

* Fix speech_to_text

* Fix mpnet and mobilevit

* Fix Swin

* Fix CTRL

* Fix CVT

* Fix DPR

* Fix Wav2Vec2

* Fix T5

* Fix Hubert

* Fix GPT2

* Fix Whisper

* Fix DeiT

* Fix the encoder-decoder / dual-encoder classes

* make fix-copies

* build in name scope

* Fix summarization test

* Fix tied weight names for BART + Blenderbot

* Fix tied weight name building

* Fix to TFESM weight building

* Update TF SAM

* Expand all the shapes out into Big Boy Shapes
2023-12-14 15:17:30 +00:00
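
A sketch of what the explicit build() work enables: constructing a TF model from a config and materializing its weights without a dummy forward pass. `build_in_name_scope()` is the helper named throughout these commits; treat the exact call as an assumption of this sketch.

```python
from transformers import BertConfig, TFBertModel

config = BertConfig(hidden_size=128, num_hidden_layers=2, num_attention_heads=2)
model = TFBertModel(config)

# Builds all layers eagerly, preserving the historical weight names.
model.build_in_name_scope()
print(len(model.weights))
```
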
Yoach Lacombe bb1d0d0d9e
Fix languages covered by M4Tv2 (#28019)
* correct language assessment  + add tests

* Update src/transformers/models/seamless_m4t_v2/modeling_seamless_m4t_v2.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make style + simplify and enrich test

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-14 14:43:44 +00:00
Joao Gante e2b16485f3
SeamlessM4T: `test_retain_grad_hidden_states_attentions` is flaky (#28035) 2023-12-14 13:56:03 +00:00
Joao Gante 9e5c28c573
Generate: assisted decoding now uses `generate` for the assistant (#28030)
generate refactor
2023-12-14 13:31:13 +00:00
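
The refactor is internal (the assistant model now runs through `generate` itself); the user-facing API is unchanged. A sketch, with illustrative checkpoints that share a tokenizer:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-1.3b")
assistant = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")  # small draft model

inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, assistant_model=assistant, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
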
Zach Mueller 93766251cb
Fix bug with rotating checkpoints (#28009)
* Fix bug

* Write test

* Keep back old modification for grad accum steps

* Whitespace...

* Whitespace again

* Race condition

* Wait for everyone
2023-12-13 12:17:30 -05:00
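
The rotation logic in question sits behind `save_total_limit`; a minimal sketch of the arguments that exercise it (values illustrative):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    save_steps=500,
    save_total_limit=2,  # checkpoints beyond the 2 newest are deleted on rotation
)
```
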
Arthur ec43d6870a
[`CI slow`] Fix expected values (#27999)
* fix expected values

* style

* test is slow
2023-12-13 13:37:10 +01:00
Arindam Jati 749f94e460
Fix PatchTSMixer slow tests (#27997)
* fix slow tests

* revert formatting

---------

Co-authored-by: Arindam Jati <arindam.jati@ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
2023-12-13 13:34:25 +01:00
Younes Belkada c7f076a00e
Adds VIP-llava to transformers (#27932)
* v1

* add-new-model-like

* revert

* fix forward and conversion script

* revert

* fix copies

* fixup

* fix

* Update docs/source/en/index.md

* Apply suggestions from code review

* push

* fix

* fixes here and there

* up

* fixup and fix tests

* Apply suggestions from code review

* add docs

* fixup

* fixes

* docstring

* add docstring

* fixup

* docstring

* fixup

* nit

* docs

* more copies

* fix copies

* nit

* update test
2023-12-13 10:42:24 +01:00
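
A usage sketch for the newly added model. The checkpoint id and prompt template are assumptions based on the llava-hf conversions, not guaranteed by this PR:

```python
import requests
from PIL import Image
from transformers import AutoProcessor, VipLlavaForConditionalGeneration

model_id = "llava-hf/vip-llava-7b-hf"  # assumed conversion repo
processor = AutoProcessor.from_pretrained(model_id)
model = VipLlavaForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"  # assumed template

inputs = processor(text=prompt, images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```
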
rjenc29 7e35f37071
Fix a couple of typos and add an illustrative test (#26941)
* fix a typo and add an illustrative test

* appease black

* reduce code duplication and add Annotation type back with a pending deprecation warning

* remove unused code

* change warning type

* black formatting fix

* change enum deprecation approach to support 3.8 and earlier

* add stacklevel

* fix black issue

* fix ruff issues

* fix ruff issues

* move tests to own mixin

* include yolos

* fix black formatting issue

* fix black formatting issue

* use logger instead of warnings and include target version for deprecation
2023-12-11 15:51:51 +00:00
Ella Charlaix 39acfe84ba
Add deepspeed test to amd scheduled CI (#27633)
* add deepspeed scheduled test for amd

* fix image

* add dockerfile

* add comment

* enable tests

* trigger

* remove trigger for this branch

* trigger

* change runner env to trigger the docker build image test

* use new docker image

* remove test suffix from docker image tag

* replace test docker image with original image

* push new image

* Trigger

* add back amd tests

* fix typo

* add amd tests back

* fix

* comment until docker image build scheduled test fix

* remove deprecated deepspeed build option

* upgrade torch

* update docker & make tests pass

* Update docker/transformers-pytorch-deepspeed-amd-gpu/Dockerfile

* fix

* tmp disable test

* precompile deepspeed to avoid timeout during tests

* fix comment

* trigger deepspeed tests with new image

* comment tests

* trigger

* add sklearn dependency to fix slow tests

* enable back other tests

* final update

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: Félix Marty <9808326+fxmarty@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-11 16:33:36 +01:00
Zach Mueller 44127ec667
Fix test for auto_find_batch_size on multi-GPU (#27947)
* Fix test for multi-GPU

* WIth CPU handle
2023-12-11 09:57:41 -05:00
Arthur accccdd008
[`Add Mixtral`] Adds support for the Mixtral MoE (#27942)
* up

* up

* test

* logits ok

* up

* up

* few fixes

* conversion script

* up

* nits

* nits

* update

* nuke

* more updates

* nits

* fix many issues

* nit

* scatter

* nit

* nuke megablocks

* nits

* fix conversion script

* nit

* remove

* nits

* nit

* update

* oupsssss

* change

* nits device

* nits

* fixup

* update

* merge

* add copied from

* fix the copy mentions

* update tests

* more fixes

* nits

* conversion script

* add parts of the readme

* Update tests/models/mixtral/test_modeling_mixtral.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* new test + conversion script

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Apply suggestions from code review

* fix

* fix copies

* fix copies

* ooops

* fix config

* Apply suggestions from code review

* fix nits

* nit

* add copies

* add batched tests

* docs

* fix flash attention

* let's be more verbose

* add correct outputs

* support router outputs

* ignore copies where needed

* fix

* cat list if list is given for now

* nits

* Update docs/source/en/model_doc/mixtral.md

* finish router refactoring

* fix forward

* fix expected values

* nits

* fixup

* fix

* fix bug

* fix

* fix dtype mismatch

* fix

* grrr grrr I support item assignment

* fix CI

* docs

* fixup

* remove some copied form

* fix weird diff

* skip doctest fast on the config and modeling

* mark that is supports flash attention in the doc

* update

* Update src/transformers/models/mixtral/modeling_mixtral.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update docs/source/en/model_doc/mixtral.md

Co-authored-by: Lysandre Debut <hi@lysand.re>

* revert router logits config issue

* update doc accordingly

* Update src/transformers/models/mixtral/convert_mixtral_weights_to_hf.py

* nits

* use torch testing assert close

* fixup

* doc nits

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:50:27 +01:00
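
A usage sketch for the new MoE model; `output_router_logits` reflects the router-output support mentioned above, and the checkpoint is the public Mixtral release:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

inputs = tokenizer("My favourite condiment is", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Router logits can be requested on the forward pass:
fwd = model(**inputs, output_router_logits=True)
```
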
Arthur 0676d992a5
[`from_pretrained`] Make from_pretrained fast again (#27709)
* Skip nn.Module.reset_parameters

* Actually skip

* Check quality

* Maybe change all inits

* Fix init issues: only modify public functions

* Add a small test for now

* Style

* test updates

* style

* nice test

* style

* make it even faster

* one more second

* remove fx incompatible

* Update tests/test_modeling_common.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update tests/test_modeling_common.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* skip

* fix quality

* protect the import

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-11 12:38:17 +01:00
fxmarty 9f18cc6df0
Fix SDPA dispatch & make SDPA CI compatible with torch<2.1.1 (#27940)
fix sdpa dispatch
2023-12-11 18:56:38 +09:00
Yoach Lacombe 5e620a92cf
Fix `SeamlessM4Tv2ModelIntegrationTest` (#27911)
change dtype of some integration tests
2023-12-11 09:18:41 +01:00
Yih-Dar e96c1de191
Skip `UnivNetModelTest::test_multi_gpu_data_parallel_forward` (#27912)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-11 09:17:37 +01:00
Sangbum Daniel Choi 235be08569
[DETA] fix backbone freeze/unfreeze function (#27843)
* [DETA] fix freeze/unfreeze function

* Update src/transformers/models/deta/modeling_deta.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/deta/modeling_deta.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add freeze/unfreeze test case in DETA

* fix typo

* fix typo 2

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-11 07:57:30 +01:00
fxmarty 80377eb018
F.scaled_dot_product_attention support (#26572)
* add sdpa

* wip

* cleaning

* add ref

* yet more cleaning

* and more :)

* wip llama

* working llama

* add output_attentions=True support

* bigcode sdpa support

* fixes

* gpt-bigcode support, require torch>=2.1.1

* add falcon support

* fix conflicts falcon

* style

* fix attention_mask definition

* remove output_attentions from attnmaskconverter

* support whisper without removing any Copied from statement

* fix mbart default to eager renaming

* fix typo in falcon

* fix is_causal in SDPA

* check is_flash_attn_2_available in the models init as well in case the model is not initialized through from_pretrained

* add warnings when falling back on the manual implementation

* precise doc

* wip replace _flash_attn_enabled by config.attn_implementation

* fix typo

* add tests

* style

* add a copy.deepcopy on the config in from_pretrained, as we do not want to modify it inplace

* obey to config.attn_implementation if a config is passed in from_pretrained

* fix is_torch_sdpa_available when torch is not installed

* remove dead code

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bart/modeling_bart.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove duplicate pretraining_tp code

* add dropout in llama

* precise comment on attn_mask

* add fmt: off for _unmask_unattended docstring

* precise num_masks comment

* nuke pretraining_tp in LlamaSDPAAttention following Arthur's suggestion

* cleanup modeling_utils

* backward compatibility

* fix style as requested

* style

* improve documentation

* test pass

* style

* add _unmask_unattended tests

* skip meaningless tests for idefics

* hard_check SDPA requirements when specifically requested

* standardize the use of XXX_ATTENTION_CLASSES

* fix SDPA bug with mem-efficient backend on CUDA when using fp32

* fix test

* rely on SDPA is_causal parameter to handle the causal mask in some cases

* fix FALCON_ATTENTION_CLASSES

* remove _flash_attn_2_enabled occurrences

* fix test

* add OPT to the list of supported flash models

* improve test

* properly test on different SDPA backends, on different dtypes & properly handle separately the pad tokens in the test

* remove remaining _flash_attn_2_enabled occurrence

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/modeling_attn_mask_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update docs/source/en/perf_infer_gpu_one.md

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove use_attn_implementation

* fix docstring & slight bug

* make attn_implementation internal (_attn_implementation)

* typos

* fix tests

* deprecate use_flash_attention_2=True

* fix test

* add back llama that was removed by mistake

* fix tests

* remove _flash_attn_2_enabled occurrences bis

* add check & test that passed attn_implementation is valid

* fix falcon torchscript export

* fix device of mask in tests

* add tip about torch.jit.trace and move bt doc below sdpa

* fix parameterized.expand order

* move tests from test_modeling_attn_mask_utils to test_modeling_utils as a relevant test class is already there

* update sdpaattention class with the new cache

* Update src/transformers/configuration_utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/bark/modeling_bark.py

* address review comments

* WIP torch.jit.trace fix. left: test both eager & sdpa

* add test for torch.jit.trace for both eager/sdpa

* fix falcon with torch==2.0 that needs to use sdpa

* fix doc

* hopefully last fix

* fix key_value_length that has no default now in mask converter

* is it flaky?

* fix speculative decoding bug

* tests do pass

* fix following #27907

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-09 05:38:14 +09:00
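
A sketch of the new opt-in (and of the deprecation noted above): the attention backend is now selected explicitly via `attn_implementation` in `from_pretrained`. Checkpoint is illustrative; Llama is among the supported models.

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    attn_implementation="sdpa",  # "eager" and "flash_attention_2" are the other options
    torch_dtype=torch.float16,
)
# use_flash_attention_2=True still works but is deprecated in favour of
# attn_implementation="flash_attention_2".
```
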
Joao Gante ce0bbd5101
Generate: SinkCache can handle iterative prompts (#27907) 2023-12-08 20:02:20 +00:00
Zach Mueller 6757ed28ce
Allow `resume_from_checkpoint` to handle `auto_find_batch_size` (#27568)
* Fulfill request

* Add test

* Better test

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Better test

* Better test

* More comments

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-08 11:51:02 -05:00
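
A sketch of the combination this PR fixes: the batch size found by the automatic search is now reused when training resumes from a checkpoint. Values are illustrative.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    auto_find_batch_size=True,        # requires `accelerate`
    per_device_train_batch_size=64,   # starting point for the search
)
# ...build Trainer(model=model, args=args, ...) as usual, then:
# trainer.train(resume_from_checkpoint=True)
```
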
Yih-Dar e366937587
Fix 2 tests in `FillMaskPipelineTests` (#27889)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-08 14:55:29 +01:00
Yih-Dar 3b720ad9a5
mark `test_initialization` as flaky in 2 model tests (#27906)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-08 14:54:32 +01:00
Xin Qiu b31905d1f6
Fix remaining issues in beam score calculation (#27808)
* Fix issues in add and is_done for BeamHypotheses

* make newly added arguments optional for better compatibility

* Directly use cur_len as generated_len, add note for retrocompatibility

* update test expectation

* make cur_len represents the length of the entire sequence including the decoder prompt

* remove redundant if/else in testing
2023-12-08 14:14:16 +01:00
Jonathon Belotti 4c5ed1d0c9
fix: non-atomic checkpoint save (#27820) 2023-12-08 14:08:54 +01:00
Saibo-creator 56be5e80e6
Fix: Raise informative exception when `prefix_allowed_tokens_fn` return empty set of tokens (#27797)
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-08 10:25:49 +00:00
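
A sketch of the constrained-decoding hook in question: the callable must return a non-empty list of allowed token ids at every step, and returning an empty list now raises the informative exception instead of a cryptic failure.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

allowed = tokenizer(" Paris", add_special_tokens=False).input_ids

def prefix_allowed_tokens_fn(batch_id, input_ids):
    # Returning [] here would now trigger the informative exception.
    return allowed

inputs = tokenizer("The capital of France is", return_tensors="pt")
out = model.generate(**inputs, prefix_allowed_tokens_fn=prefix_allowed_tokens_fn, max_new_tokens=1)
```
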
fxmarty 307a7d0be8
[⚠️ removed a default argument] Make `AttentionMaskConverter` compatible with `torch.compile(..., fullgraph=True)` (#27868)
* remove bugged torch.float32 default

* add test

* fix tests

* fix test

* fix doc
2023-12-08 18:44:47 +09:00
Tom Aarsen 633215ba58
Generate: New `Cache` abstraction and Attention Sinks support (#26681)
* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.

* Implement the SinkCache through backward+forward rotations

* Integrate (Sink)Cache with Llama FA2

* Set use_legacy_cache=True as default, allows for test passes

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Remove copy utility from deprecated OpenLlama

* Match import style

* manual rebase with main

* Cache class working with generate (#1)

* Draft version of new KV Caching

This should allow Attention Sinks (https://github.com/tomaarsen/attention_sinks)
/ StreamingLLM (https://arxiv.org/abs/2309.17453) to be easily implemented
in a third-party or in transformers directly

* Address numerous PR suggestions

1. Move layer_idx from cache to ...Attention. Removes confusing set_layer_idx magic.
2. Always convert past_key_values to Cache instance at the start of ...Attention, removes all other isinstance calls.
3. Remove __bool__ and __getitem__ magic as they're confusing.
4. past_key_values.update(key, value, idx) now returns key, value.
5. Add use_legacy_cache flag, defaults to None, i.e. Falsey. This breaks generate for now, until 1) the cache is used in generate() or 2) use_legacy_cache is defaulted to True in generate() until we change it in another PR.
6. Separate key_cache and value_cache.

Some work is still needed to see if the SinkCache can conveniently be implemented with just one update method.

* Integrate (Sink)Cache with Llama FA2

* Move from/to_legacy_cache to ...Model class

* Undo unnecessary newline change

* Match import style

* working generate

* Add tests; Simplify code; Apply changes to Mistral and Persimmon

* fix rebase mess

* a few more manual fixes

* last manual fix

* propagate changes to phi

* upgrade test

* add use_legacy_cache docstring; beef up tests

* reintroduce unwanted deletes

---------

Co-authored-by: Tom Aarsen <Cubiegamedev@gmail.com>

* move import

* add default to model_kwargs.get('use_legacy_cache')

* correct failing test

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* apply PR suggestions

* fix failing test

* Apply suggestions from code review

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Tom Aarsen <37621491+tomaarsen@users.noreply.github.com>

* PR comments

* tmp commit

* add docstrings

* more tests, more docstrings, add to docs

* derp

* tmp commit

* tmp dbg

* more dbg

* fix beam search bug

* cache can be a list of tuples in some models

* fix group beam search

* all but sinkcache integration tests

* fix sink cache and add hard integration test

* now also compatible with input_embeds input

* PR comments

* add Cache support to Phi+FA2

* make fixup

---------

Co-authored-by: Joao Gante <joao@huggingface.co>
Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2023-12-08 09:00:17 +01:00
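
A sketch of the new cache API: instantiate a `Cache` subclass and hand it to `generate` via `past_key_values`. Window sizes and checkpoint are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, SinkCache

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Keeps 4 "sink" tokens plus a 256-token sliding window of KV states.
cache = SinkCache(window_length=256, num_sink_tokens=4)

inputs = tokenizer("Tell me a long story.", return_tensors="pt")
out = model.generate(**inputs, past_key_values=cache, max_new_tokens=64)
```
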
Matt 47500b1d72
Fix TF loading PT safetensors when weights are tied (#27490)
* Un-skip tests

* Add aliasing support to tf_to_pt_weight_rename

* Refactor tf-to-pt weight rename for simplicity

* Patch mobilebert

* Let us pray that the transfo-xl one works

* Add XGLM rename

* Expand the test to see if we can get more models to break

* Expand the test to see if we can get more models to break

* Fix MPNet (it was actually an unrelated bug)

* Fix MPNet (it was actually an unrelated bug)

* Add speech2text fix

* Update src/transformers/modeling_tf_pytorch_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/mobilebert/modeling_tf_mobilebert.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update to always return a tuple from tf_to_pt_weight_rename

* reformat

* Add a couple of missing tuples

* Remove the extra test for tie_word_embeddings since it didn't cause any unexpected failures anyway

* Revert changes to modeling_tf_mpnet.py

* Skip MPNet test and add explanation

* Add weight link for BART

* Add TODO to clean this up a bit

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-07 14:28:53 +00:00
fxmarty c99f254763
Fix device of masks in tests (#27887)
fix device of mask in tests
2023-12-07 21:34:43 +09:00
Yih-Dar 52746922b0
Allow `# Ignore copy` (#27328)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-07 10:00:08 +01:00
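
A sketch of the repo convention this enables; the exact placement of the marker is an assumption based on the PR title. Inside a `# Copied from` region, a section can now be carved out so `make fix-copies` leaves it alone:

```python
import torch.nn as nn

# Copied from transformers.models.llama.modeling_llama.LlamaMLP with Llama->MyModel
class MyModelMLP(nn.Module):
    def __init__(self, hidden_size=8):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    # Ignore copy
    def forward(self, x):
        # Deliberately diverges from the upstream implementation; the copy
        # checker skips this section instead of overwriting it.
        return self.proj(x)
```
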
Younes Belkada 44b5506d29
[`Llava`] Add Llava to transformers (#27662)
* add model like

* logits match

* minor fixes

* fixes

* up

* up

* add todo

* llava processor

* keep the processor simple

* add conversion script

* fixup

* fix copies

* up

* add to index

* fix config + logits

* fix

* refactor

* more refactor

* more refactor

* fix copies

* add authors

* v1 tests

* add `LlavaProcessor` in init

* remove unneeded import

* up

* up

* docs

* up

* fix CI

* fix CI

* add attention mask in test

* make fixup

* remove the vision model

* that's the dirty way to do it

* nits

* nits

* updates

* add more tests

* add input tests

* fixup

* more styling

* nits

* updates and cleanup

* fixup the generation expected results

* fix the testing script

* some cleanup and simplification which does not work yet but almost there!

* make correct dispatch operations

* vectorize works for batch of images and text

* last todos

* nits

* update test and modeling code

* remove useless function for now

* fix few issues

* fix generation

* some nits

* add bakllava

* nits

* remove duplicated code

* finish merge

* cleanup

* missed this line

* fill the todos

* add left padding offset

* add left and right padding logic

* bool to properly index

* make sure

* more cleanups

* batch is fixed 😉

* add correct device for tensor creation

* fix some dtype mismatch

* ruff

* update conversion script

* Update src/transformers/__init__.py

* fa 2 support + fix conversion script

* more

* correct reshaping

* fix test dict

* fix copies by ignoring

* fix nit

* skip clip vision model

* fixup

* fixup

* LlavaForVisionText2Text -> LlavaForCausalLM

* update

* fix

* raise correct errors

* fix

* docs

* nuke for now

* nits here and there

* fixup

* fix remaining tests

* update LlavaForConditionalGeneration instead of CausalLM

* fixups

* pipeline support

* slow and piepline tests

* supports batch

* nits

* cleanup

* fix first integration tests

* add pad token where needed

* correct tests

* fixups

* update pipeline tests

* fix quality

* nits

* revert unneeded change

* nit

* use BatchFeature

* from ...feature_extraction_utils import BatchFeature

* nits

* nits

* properly update

* more f*** nits

* fix copies

* comment

* keep slow test slow

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* add piepline example

* add pixel values in docstring

* update pr doctest

* fix

* fix slow tests

* remove hack

* fixup

* small note

* forward contrib credits from PR25789

* forward contrib credits from original implementation and work

* add arthur

* Update src/transformers/models/llava/processing_llava.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* update docstring

* nit

* move to not doctested because of timeout issues

* fixup

* add description

* more

* fix-copies

* fix docs

* add beam search

* add more comments

* add typehints on processor

* add speedup plot

* update slow tests and docs

* push test

* push batched test

* fix batched generation with different number of images

* remove benchmark due to a bug

* fix test

* fix copies

* add colab demo

---------

Co-authored-by: Arthur Zucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: shauray8 <shauray8@users.noreply.github.com>
Co-authored-by: haotian-liu <haotian-liu@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-12-07 09:30:47 +01:00
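
A usage sketch mirroring the processor/pipeline work above; the checkpoint id and prompt template follow the llava-hf conversion and are assumptions here:

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
prompt = "USER: <image>\nDescribe the image. ASSISTANT:"

inputs = processor(text=prompt, images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```
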
Susnato Dhar f84d85ba67
[`FA-2`] Add Flash Attention to `Phi` (#27661)
* add FA and modify doc file

* test_flash_attn_2_generate_padding_right test overwritten

* comment

* modify persimmon modeling file

* added speedup graph

* more changes
2023-12-07 07:57:48 +01:00
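
The overridden padding-right test reflects a general FA-2 constraint: batched generation should left-pad. A sketch below; the Phi checkpoint id is an assumption (the HF-format weights moved around while this was in review).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.padding_side = "left"          # required for FA-2 batched generation
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    model_id, attn_implementation="flash_attention_2", torch_dtype=torch.float16, device_map="auto"
)

batch = tokenizer(["def hello():", "x = [i for i in"], return_tensors="pt", padding=True).to(model.device)
out = model.generate(**batch, max_new_tokens=16)
```
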
Alex McKinney 75336c1794
Add Llama Flax Implementation (#24587)
* Copies `modeling_flax_gpt_neo.py` to start

* MLP Block. WIP Attention and Block

* Adds Flax implementation of `LlamaMLP`
Validated with in-file test.
Some slight numeric differences, but assuming it isn't an issue

* Adds `FlaxLlamaRMSNorm` layer
`flax.linen` includes `RMSNorm` layer but not necessarily in all
versions. Hence, we add in-file.

* Adds FlaxLlamaAttention
Copied from GPT-J as it has efficient caching implementation as well as
rotary embeddings.
Noticed it is numerically different, but not by a huge amount. Needs
investigating

* Adds `FlaxLlamaDecoderLayer`
numerically inaccurate, debugging..

* debugging rotary mismatch
gptj uses interleaved whilst llama uses contiguous
i think they match now but still final result is wrong.
maybe drop back to just debugging attention layer?

* fixes bug with decoder layer
still somewhat numerically inaccurate, but close enough for now

* adds markers for what to implement next
the structure here diverges a lot from the PT version.
not a big fan of it, but just get something working for now

* implements `FlaxLlamaBlockCollection`
tolerance must be higher than expected, kinda disconcerting

* Adds `FlaxLlamaModule`
equivalent PyTorch model is `LlamaModel`
yay! a language model🤗

* adds `FlaxLlamaForCausalLMModule`
equivalent to `LlamaForCausalLM`
still missing returning dict or tuple, will add later

* start porting pretrained wrappers
realised it probably needs return dict as a prereq

* cleanup, quality, style

* readds `return_dict` and model output named tuples

* (tentatively) pretrained wrappers work 🔥

* fixes numerical mismatch in `FlaxLlamaRMSNorm`
seems `jax.lax.rsqrt` does not match `torch.sqrt`.
manually computing `1 / jax.numpy.sqrt` results in matching values.

* [WIP] debugging numerics

* numerical match
I think issue was accidental change of backend. forcing CPU fixes test.
We expect some mismatch on GPU.

* adds in model and integration tests for Flax Llama
summary of failing:
- mul invalid combination of dimensions
- one numerical mismatch
- bf16 conversion (maybe my local backend issue)
- params are not FrozenDict

* adds missing TYPE_CHECKING import and `make fixup`

* adds back missing docstrings
needs review on quality of docstrings, not sure what is required.
Furthermore, need to check if `CHECKPOINT_FOR_DOC` is valid. See TODO

* commenting out equivalence test as can just use common

* debugging

* Fixes bug where mask and pos_ids were swapped in pretrained models
This results in all tests passing now 🔥

* cleanup of modeling file

* cleanup of test file

* Resolving simpler review comments

* addresses more minor review comments

* fixing introduced pytest errors from review

* wip additional slow tests

* wip tests
need to grab a GPU machine to get real logits for comparison
otherwise, slow tests should be okay

* `make quality`, `make style`

* adds slow integration tests
- checking logits
- checking hidden states
- checking generation outputs

* `make fix-copies`

* fix mangled function following `make fix-copies`

* adds missing type checking imports

* fixes missing parameter checkpoint warning

* more finegrained 'Copied from' tags
avoids issue of overwriting `LLAMA_INPUTS_DOCSTRING`

* swaps import guards
??? how did these get swapped initially?

* removing `inv_freq` again as pytorch version has now removed it

* attempting to get CI to pass

* adds doc entries for llama flax models

* fixes typo in __init__.py imports

* adds back special equivalence tests
these come from the gpt neo flax tests. there is special behaviour for these models that needs to override the common version

* overrides tests with dummy to see if CI passes
need to fill in these tests later

* adds my contribution to docs

* `make style; make quality`

* replaces random masking with fixed to work with flax version

* `make quality; make style`

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* updates `x`->`tensor` in `rotate_half`

* addresses smaller review comments

* Update docs/source/en/model_doc/llama.md

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adds integration test class

* adds `dtype` to rotary embedding to cast outputs

* adds type to flax llama rotary layer

* `make style`

* `make fix-copies`

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* applies suggestions from review

* Update modeling_flax_llama.py

* `make fix-copies`

* Update tests/models/llama/test_modeling_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* Update src/transformers/models/llama/modeling_flax_llama.py

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* fixes shape mismatch in FlaxLlamaMLP

* applies some suggestions from reviews

* casts attn output logits to f32 regardless of dtype

* adds attn bias using `LlamaConfig.attention_bias`

* adds Copied From comments to Flax Llama test

* mistral and persimmon test change -copy from llama

* updates docs index

* removes Copied from in tests

it was preventing `make fix-copies` from succeeding

* quality and style

* ignores FlaxLlama input docstring

* adds revision to `_CHECKPOINT_FOR_DOC`

* repo consistency and quality

* removes unused import

* removes copied from from Phi test

now diverges from llama tests following FlaxLlama changes

* adds `_REAL_CHECKPOINT_FOR_DOC`

* removes refs from pr tests

* reformat to make ruff happy

---------

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2023-12-07 07:05:00 +01:00
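
A usage sketch for the Flax port. The commits above mention a pinned checkpoint revision, so treat the repo id and the `from_pt` conversion as illustrative assumptions:

```python
from transformers import AutoTokenizer, FlaxLlamaForCausalLM

model_id = "openlm-research/open_llama_3b_v2"  # illustrative checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxLlamaForCausalLM.from_pretrained(model_id, from_pt=True)

inputs = tokenizer("The meaning of life is", return_tensors="np")
outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, seq_len, vocab_size)
```
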
Yih-Dar ac975074e6
Update `VitDetModelTester.get_config` to use `pretrain_image_size` (#27831)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-05 16:33:27 +01:00
Arindam Jati b242d0f297
[Time series] Add PatchTSMixer (#26247)
* patchtsmixer initial commit

* x,y->context_values,target_values, unittest added

* cleanup code

* minor

* return hidden states

* model tests, partial integration tests

* ettm notebook temporary

* minor

* config mask bug fix, tests updated

* final ETT notebooks

* add selfattn

* init

* added docstrings

* PatchTSMixerForPretraining -> PatchTSMixerForMaskPretraining

* functionality tests added

* add start and input docstrings

* docstring edits

* testcase edits

* minor changes

* docstring error fixed

* ran make fixup

* finalize integration tests and docs

* minor

* cleaned gitignore

* added dataclass decorator, ran black formatter

* ran ruff

* formatting

* add slow decorator

* renamed in_Channel to input_size and default to 1

* shorten dataclass names

* use smaller model for testing

* moved the 3 heads to the modeling file

* use scalers instead of revin

* support forecast_channel_indices

* fix regression scaling

* undo reg. scaling

* removed unneeded classes

* forgot missing

* add more layers

* add copied positional_encoding

* use patchmask from patchtst

* removed dependency on layers directory

* formatting

* set seed

* removed unused imports

* fixed forward signature test

* adding distributional head for PatchTSMixerForecasting

* add generate to forecast

* testcases for generate

* add generate and distributional head for regression

* raise Exception for negative values for neg binominal distribution

* formatting changes

* remove copied from patchtst and add TODO for test passing

* make copies

* doc edits

* minor changes

* format issues

* minor changes

* minor changes

* format docstring

* change some class names to PatchTSMixer + class name

Transpose to PatchTSMixerTranspose
GatedAttention to PatchTSMixerGatedAttention

* change NormLayer to PatchTSMixerNormLayer

* change MLP to PatchTSMixerMLP

* change PatchMixer to PatchMixerBlock, FeatureMixer to FeatureMixerBlock

* change ChannelFeatureMixer to ChannelFeatureMixerBlock

* change PatchMasking to PatchTSMixerMasking

* change Patchify to PatchTSMixerPatchify

* list to `list`

* fix docstrings

* formatting

* change bs to batch_size, edit forecast_masking

* edit random_masking

* change variable name and update docstring in PatchTSMixerMasking

* change variable name and update docstring in InjectScalerStatistics4D

* update forward call in PatchTSMixerTranspose

* change variable name and update docstring in PatchTSMixerNormLayer

* change variable name and update docstring in PatchTSMixerMLP

* change variable name and update docstring in ChannelFeatureMixerBlock

* formatting

* formatting issues

* docstring issue

* fixed observed_mask type in docstrings

* use FloatTensor type

* formatting

* fix rescaling issue in forecasting, fixed integration tests

* add docstring from decorator

* fix docstring

* Update README.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/configuration_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* PatchTSMixerChannelFeatureMixerBlock

* formatting

* ForPretraining

* use num_labels instead of n_classes

* remove commented out code

* docstring fixed

* nn.functional used instead of one letter F

* x_tmp renamed

* one letter variable x removed from forward calls

* one letter variable y removed

* remove commented code

* rename patch_size, in_channels, PatchTSMixerBackbone

* add config to heads

* add config to heads tests

* code reafactoring to use config instead of passing individual params

* docstring fixes part 1

* docstring fixes part 2

* removed logger.debug

* context_values -> past_values

* formatting changes

* pe -> positional_encoding

* removed unused target variable

* self.mode logic fixed

* formatting change

* edit docstring and var name

* change n_targets to num_targets

* rename input_size to num_input_channels

* add head names with prefix PatchTSMixer

* edit docstring in PatchTSMixerForRegression

* fix var name change in testcases

* add PatchTSMixerAttention

* return dict for all exposed classes, test cases added

* format

* move loss function to forward call

* make style

* adding return dict/tuple

* make repo-consistency

* remove flatten mode

* code refactoring

* rename data

* remove PatchTSMixer and keep only PatchTSMixerEncoder

* docstring fixes

* removed unused code

* format

* format

* remove contiguous and formatting changes

* remove model description from config

* replace asserts with ValueError

* remove nn.Sequential from PatchTSMixerNormLayer

* replace if-else with map

* remove all nn.Sequential

* format

* formatting

* fix gradient_checkpointing error after merge, and formatting

* make fix-copies

* remove comments

* reshape

* doesnt support gradient checkpointing

* correct Patchify

* masking updates

* batchnorm copy from

* format checks

* scaler edits

* remove comments

* format changes

* remove self.config

* correct class PatchTSMixerMLP(nn.Module):

* make fix

* doc updates

* fix-copies

* scaler class correction

* doc edits

* scaler edits

* update readme with links

* injectstatistics add

* fix-copies

* add norm_eps option to LayerNorm

* format changes

* fix copies

* correct make copies

* use parametrize

* fix doc string

* add docs to toctree

* make style

* doc segmenting

* docstring edit

* change forecast to prediction

* edit doc

* doc edits

* remove PatchTSMixerTranspose

* add PatchTSMixerPositionalEncoding and init position_enc

* remove positional_encoding

* edit forecast_masking, remove forecast_mask_ratios

* fix broken code

* var rename target_values -> future_values

* num_features -> d_model

* fix broken code after master merge

* repo consistency

* use positional embedding

* prediction_logits -> prediction_outputs, make fix-copies

* uncommented @slow

* minor changes

* loss first in tuple

* tuple and dict same ordering

* style edits

* minor changes

* dict/tuple consistent enablement

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/patchtsmixer/test_modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update src/transformers/models/patchtsmixer/modeling_patchtsmixer.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix formatting

* formatting

* usage tip

* test on cpu only

* add sample usage

* change PatchTSMixerForClassification to PatchTSMixerForTimeSeriesClassification

* push changes

* fix copies

* set std scaling to True by default

* minor changes

* style changes

---------

Co-authored-by: Arindam Jati <arindam.jati@ibm.com>
Co-authored-by: vijaye12 <vijaye12@in.ibm.com>
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: vijaye12 <vijaykr.e@gmail.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-05 15:31:35 +01:00
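
A minimal forward-pass sketch for the new model. Config values are illustrative; the argument names follow the renames recorded above (`past_values`, `num_input_channels`, `prediction_length`), and the output field name reflects the final API as an assumption.

```python
import torch
from transformers import PatchTSMixerConfig, PatchTSMixerForPrediction

config = PatchTSMixerConfig(
    context_length=32,
    prediction_length=8,
    num_input_channels=3,
    patch_length=8,
)
model = PatchTSMixerForPrediction(config)

# (batch, context_length, num_input_channels)
past_values = torch.randn(2, config.context_length, config.num_input_channels)
out = model(past_values=past_values)
print(out.prediction_outputs.shape)  # expected (2, prediction_length, num_input_channels)
```
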
Joao Gante b7e6d120c1
Generate: Update VisionEncoderDecoder test value (#27850)
update test result, due to bug fix in decoder-only beam search
2023-12-05 11:26:59 +00:00
Younes Belkada fdb85be40f
Faster generation using AWQ + Fused modules (#27411)
* v1 fusing modules

* add fused mlp support

* up

* fix CI

* block save_pretrained

* fixup

* small fix

* add new condition

* add v1 docs

* add some comments

* style

* fix nit

* adapt from suggestion

* add check

* change arg names

* change variables name

* Update src/transformers/integrations/awq.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* style

* split up into 3 different private methods

* more conditions

* more checks

* add fused tests for custom models

* fix

* fix tests

* final update docs

* final fixes

* fix importlib metadata

* Update src/transformers/utils/quantization_config.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* change it to `do_fuse`

* nit

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* Update src/transformers/utils/quantization_config.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* few fixes

* revert

* fix test

* fix copies

* raise error if model is not quantized

* add test

* use quantization_config.config when fusing

* Update src/transformers/modeling_utils.py

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
2023-12-05 12:14:45 +01:00
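
A sketch of the fused-module opt-in. The checkpoint must already be AWQ-quantized (the error path added above fires otherwise); the knob names (`do_fuse`, `fuse_max_seq_len`) come from the review thread, and the checkpoint id is illustrative. Requires `autoawq`.

```python
import torch
from transformers import AutoModelForCausalLM, AwqConfig

quantization_config = AwqConfig(do_fuse=True, fuse_max_seq_len=512)

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-OpenOrca-AWQ",  # illustrative AWQ checkpoint
    quantization_config=quantization_config,
    torch_dtype=torch.float16,
    device_map="auto",
)
```
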
NielsRogge df40edfb00
Make image processors more general (#27690)
* Make image processors more general

* Add backwards compatibility for KOSMOS-2

* Remove use_square_size everywhere

* Remove script
2023-12-05 10:45:39 +01:00
Sanchit Gandhi 3c15fd1990
[Seamless v2] Add FE to auto mapping (#27829) 2023-12-04 16:34:13 +00:00
Yih-Dar 1d63b0ec36
Disallow `pickle.load` unless `TRUST_REMOTE_CODE=True` (#27776)
* fix

* fix

* Use TRUST_REMOTE_CODE

* fix doc

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-04 16:48:37 +01:00
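
A sketch of the new gate: code paths that would `pickle.load` a downloaded file now refuse unless the environment opts in. The env-var name comes from the commits above; which loaders hit this path, and the exact accepted values, are version-dependent assumptions.

```python
import os

# Explicit opt-in; only do this for files from sources you trust.
os.environ["TRUST_REMOTE_CODE"] = "True"
# ...then call the loader that needs to unpickle (e.g. a legacy
# framework-specific checkpoint or test fixture).
```
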
fxmarty 1da1302ec8
Flash Attention 2 support for RoCm (#27611)
* support FA2

* fix typo

* fix broken tests

* fix more test errors

* left/right

* fix bug

* more test

* typo

* fix layout flash attention falcon

* do not support this case

* use allclose instead of equal

* fix various bugs with flash attention

* bump

* fix test

* fix mistral

* use skiptest instead of return that may be misleading

* add fix causal arg flash attention

* fix copies

* more explicit comment

* still use self.is_causal

* fix causal argument

* comment

* fixes

* update documentation

* add link

* wrong test

* simplify FA2 RoCm requirements

* update opt

* make flash_attn_uses_top_left_mask attribute private and precise comment

* better error handling

* fix copy & mistral

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/modeling_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/utils/import_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* use is_flash_attn_greater_or_equal_2_10 instead of is_flash_attn_greater_or_equal_210

* fix merge

* simplify

* inline args

---------

Co-authored-by: Felix Marty <felix@hf.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-12-04 21:52:17 +09:00
Nilesh 4d4febb7aa
Added test cases for rembert refering to albert and reformer test_tok… (#27637)
* Added test cases for rembert referring to albert and reformer test_tokenization

* removed CURL_CA_BUNDLE=''

* Set flags test_sentencepiece_ignore_case and space_between_special_tokens to True

* Overrode test_added_tokens_serialization

* As slow->fast conversion failed due to the different initialization of [MASK] for slow and fast tokenizers, the [MASK] initialization had to be made uniform between the fast and slow tokenizers

* Added a few more test cases in test_encode_decode_round_trip and modified the slow token (mask_token) to have an AddedToken instance with lstrip=True

* Added a few test cases in test_encoder_decoder round trip and also modified the slow tokenizer of rembert to have mask_token as an AddedToken with lstrip=True

* Cleaned the code and added fmt: skip to avoid line breaks after make style + added comments to indicate the copied test cases

* Corrected few comments

* Fixed quality issue

* Ran fix-copies

* Fixed a few minor issues, as (make fix-copies) broke a few test cases while stripping the text

* Reverted the changes made by repo-consistency

---------

Co-authored-by: Kokane <kokanen@apac.corpdir.net>
2023-12-04 13:36:57 +01:00
Yih-Dar 73893df864
Fix `Owlv2ModelIntegrationTest::test_inference_object_detection` (#27793)
* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-04 09:45:22 +01:00
Yih-Dar 5a551df92b
Fix `TvpModelIntegrationTests` (#27792)
* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-12-04 09:40:42 +01:00
Arthur c0b9db0914
[`ModelOnTheFlyConversionTester`] Mark as slow for now (#27823)
* mark test as slow for now

* style
2023-12-04 08:33:15 +01:00
NielsRogge 7edf8bfafd
Improve forward signature test (#27729)
* First draft

* Extend test_forward_signature

* Update tests/test_modeling_common.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Revert suggestion

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-12-04 07:38:22 +01:00
Nicolas Patry 7b6324e18e
Make using safetensors files automated. (#27571)
* [WIP] Make using safetensors files automated.

If `use_safetensors=True` is used, and it doesn't exist:

- Don't crash just yet
- Lookup for an open PR containing it.
- If yes, use that instead
- If not, touch the space to convert, wait for conversion to be finished
  and the PR to be opened
- Use that new PR
- Profit.

* Remove the token.

* [Auto Safetensors] Websocket -> SSE (#27656)

* Websocket -> SSE

* Support sharded + tests + cleanup

a

* env var

* Apply suggestions from code review

* Thanks Simon

* Thanks Wauplin

Co-authored-by: Wauplin <lucainp@gmail.com>

* Cleanup

* Update tests

* Tests should pass

* Apply to other tests

* Extend extension

* relax requirement on latest hfh

* Revert

* Correct private handling & debug statements

* Skip gated repos as of now

* Address review comments

Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>

---------

Co-authored-by: Lysandre Debut <hi@lysand.re>
Co-authored-by: Lysandre <lysandre@huggingface.co>
Co-authored-by: Wauplin <lucainp@gmail.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
2023-12-01 15:51:10 +01:00
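
A sketch of the flow described above: with `use_safetensors=True`, a repo that only ships PyTorch weights no longer errors immediately; transformers looks for (or triggers) a conversion PR on the Hub and loads from it. Gated and private repos are skipped for now, as noted. The repo id below is a placeholder.

```python
from transformers import AutoModel

# "some-org/pt-only-model" stands in for a repo with only pytorch_model.bin;
# the safetensors conversion happens server-side via the bot-opened PR.
model = AutoModel.from_pretrained("some-org/pt-only-model", use_safetensors=True)
```
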
Yoach Lacombe 29f1aee3b6
Add SeamlessM4T v2 (#27779)
* add working conversion script

* first non-working version of modeling code

* update modeling code (working)

* make style

* make fix-copies

* add config docstrings

* add config to ignore docstring formatting due to unconventional markdown

* fix copies

* fix generation num_return_sequences

* enrich docs

* add and fix tests beside integration tests

* update integration tests

* update repo id

* add tie weights and make style

* correct naming in .md

* fix imports and so on

* correct docstrings

* fix fp16 speech forward

* fix speechencoder attention

* make style

* fix copied from

* rename SeamlessM4T-v2 to SeamlessM4Tv2

* Apply suggestions on configuration

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* remove useless public models

* fix private models + better naming for T2U models

* clean speech encoder relative position embeddings

* refactor chunk attention

* add docstrings to chunk attention method

* improve naming and docstrings

* rename some attention variables + add temperature sampling in T2U model

* rename DOCSTRINGS variable names

* make style + remove 2 useless config parameters

* enrich model card

* remove any attention_head reference + fix temperature in T2U

* new fmt and make style

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* rename spkr_id->speaker_id and change docstrings of get_char_input_ids

* simplify v2attention

* make style

* Update seamless_m4t_v2.md

* update code and tests with last update

* update repo ids

* fill article name, abstract and authors

* update not_doctested and slow_doc tests

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2023-11-30 20:24:43 +01:00
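
A usage sketch for the v2 model; the public checkpoint id is an assumption based on the repo-id updates mentioned above.

```python
from transformers import AutoProcessor, SeamlessM4Tv2Model

model_id = "facebook/seamless-m4t-v2-large"  # assumed public checkpoint
processor = AutoProcessor.from_pretrained(model_id)
model = SeamlessM4Tv2Model.from_pretrained(model_id)

inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
# Text-to-speech translation into French; the first element is the waveform.
audio = model.generate(**inputs, tgt_lang="fra")[0]
```
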
Joao Gante 510270af34
Generate: `GenerationConfig` throws an exception when `generate` args are passed (#27757) 2023-11-30 14:16:31 +00:00
Kashif Rasul af8acc4760
[Time series] Add patchtst (#27581)
* add distribution head to forecasting

* formatting

* Add generate function for forecasting

* Add generate function to prediction task

* formatting

* use argsort

* add past_observed_mask ordering

* fix arguments

* docs

* add back test_model_outputs_equivalence test

* formatting

* cleanup

* formatting

* use ACT2CLS

* formatting

* fix add_start_docstrings decorator

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput, PatchTSTForRegressionOutput.

* add distribution head and generate function to regression task

add distribution head and generate function to regression task. Also added PatchTSTForForecastingOutput, PatchTSTForRegressionOutput.

* fix typos

* add forecast_masking

* fixed tests

* use set_seed

* fix doc test

* formatting

* Update docs/source/en/model_doc/patchtst.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* better var names

* rename PatchTSTTranspose

* fix argument names and docs string

* remove compute_num_patches and unused class

* remove assert

* renamed to PatchTSTMasking

* use num_labels for classification

* use num_labels

* use default num_labels from super class

* move model_type after docstring

* renamed PatchTSTForMaskPretraining

* bs -> batch_size

* more review fixes

* use hidden_state

* rename encoder layer and block class

* remove commented seed_number

* edit docstring

* Add docstring

* formatting

* use past_observed_mask

* doc suggestion

* make fix-copies

* use Args:

* add docstring

* add docstring

* change some variable names and add PatchTST before some class names

* formatting

* fix argument types

* fix tests

* change x variable to patch_input

* format

* formatting

* fix-copies

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* move loss to forward

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* formatting

* fix a bug when pre_norm is set to True

* output_hidden_states is set to False as default

* set pre_norm=True as default

* format docstring

* format

* output_hidden_states is None by default

* add missing docs

* better var names

* docstring: remove default to False in output_hidden_states

* change labels name to target_values in regression task

* format

* fix tests

* change to forecast_mask_ratios and random_mask_ratio

* change mask names

* change future_values to target_values param in the prediction class

* remove nn.Sequential and make PatchTSTBatchNorm class

* black

* fix argument name for prediction

* add output_attentions option

* add output_attentions to PatchTSTEncoder

* formatting

* Add attention output option to all classes

* Remove PatchTSTEncoderBlock

* create PatchTSTEmbedding class

* use config in PatchTSTPatchify

* Use config in PatchTSTMasking class

* add channel_attn_weights

* Add PatchTSTScaler class

* add output_attentions arg to test function

* format

* Update doc with image patchtst.md

* fix-copies

* rename Forecast <-> Prediction

* change name of a few parameters to match with PatchTSMixer.

* Remove *ForForecasting class to match with other time series models.

* make style

* Remove PatchTSTForForecasting in the test

* remove PatchTSTForForecastingOutput class

* change test_forecast_head to test_prediction_head

* style

* fix docs

* fix tests

* change num_labels to num_targets

* Remove PatchTSTTranspose

* remove arguments in PatchTSTMeanScaler

* remove arguments in PatchTSTStdScaler

* add config as an argument to all the scaler classes

* reformat

* Add norm_eps for batchnorm and layernorm

* reformat.

* reformat

* edit docstring

* update docstring

* change variable name pooling to pooling_type

* fix output_hidden_states as tuple

* fix bug when calling PatchTSTBatchNorm

* change stride to patch_stride

* create PatchTSTPositionalEncoding class and restructure the PatchTSTEncoder

* formatting

* initialize scalers with configs

* edit output_hidden_states

* style

* fix forecast_mask_patches doc string

* doc improvements

* move summary to the start

* typo

* fix docstring

* turn off masking when using prediction, regression, classification

* return scaled output

* adjust output when using distribution head

* remove _num_patches function in the config

* get config.num_patches from patchifier init

* add output_attentions docstring, remove tuple in output_hidden_states

* change SamplePatchTSTPredictionOutput and SamplePatchTSTRegressionOutput to SamplePatchTSTOutput

* remove print("model_class: ", model_class)

* change encoder_attention_heads to num_attention_heads

* change norm to norm_layer

* change encoder_layers to num_hidden_layers

* change shared_embedding to share_embedding, shared_projection to share_projection

* add output_attentions

* more robust check of norm_type

* change dropout_path to path_dropout

* edit docstring

* remove positional_encoding function and add _init_pe in PatchTSTPositionalEncoding

* edit shape of cls_token and initialize it

* add a check on the num_input_channels.

* edit head_dim in the Prediction class to allow the use of cls_token

* remove some positional_encoding_type options, remove learn_pe arg, initalize pe

* change Exception to ValueError

* format

* norm_type is "batchnorm"

* make style

* change cls_token shape

* Change forecast_mask_patches to num_mask_patches. Remove forecast_mask_ratios.

* Move PatchTSTClassificationHead above PatchTSTForClassification

* change encoder_ffn_dim to ffn_dim and edit the docstring.

* update variable names to match with the config

* add generation tests

* change num_mask_patches to num_forecast_mask_patches

* Add examples explaining the use of these models

* make style

* Revert "Revert "[time series] Add PatchTST (#25927)" (#27486)"

This reverts commit 78f6ed6c70.

* make style

* fix default std scaler's minimum_scale

* fix docstring

* close code blocks

* Update docs/source/en/model_doc/patchtst.md

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update tests/models/patchtst/test_modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/configuration_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/patchtst/modeling_patchtst.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix tests

* add add_start_docstrings

* move examples to the forward's docstrings

* update prepare_batch

* update test

* fix test_prediction_head

* fix generation test

* use seed to create generator

* add output_hidden_states and config.num_patches

* add loc and scale args in PatchTSTForPredictionOutput

* edit outputs if not return_dict

* use self.share_embedding to check instead of checking type.

* remove seed

* make style

* seed is an optional int

* fix test

* generator device

* Fix assertTrue test

* swap order of items in outputs when return_dict=False.

* add mask_type and random_mask_ratio to unittest

* Update modeling_patchtst.py

* add add_start_docstrings for regression model

* make style

* update model path

* Edit the ValueError comment in forecast_masking

* update examples

* make style

* fix commented code

* update examples: remove config from from_pretrained call

* Edit example outputs

* Set default target_values to None

* remove config setting in regression example

* Update configuration_patchtst.py

* Update configuration_patchtst.py

* remove config from examples

* change default d_model and ffn_dim

* norm_eps default

* set has_attentions to True and define self.seq_length = self.num_patches

* update docstring

* change variable mask_input to do_mask_input

* fix blank space.

* change logger.debug to logger.warning.

* remove unused PATCHTST_INPUTS_DOCSTRING

* remove all_generative_model_classes

* set test_missing_keys=True

* remove undefined params in the docstring.

---------

Co-authored-by: nnguyen <nnguyen@us.ibm.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Nam Nguyen <namctin@gmail.com>
Co-authored-by: Wesley Gifford <79663411+wgifford@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-29 13:36:38 +01:00
Susnato Dhar dfbd209c25
CLVP Fixes (#27547)
* fixes

* more fixes

* style fix

* more fix

* comments
2023-11-28 17:40:01 +01:00
Yih-Dar 30e92ea323
Trigger corresponding pipeline tests if `tests/utils/tiny_model_summary.json` is modified (#27693)
* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-28 17:21:21 +01:00
NielsRogge 1fb3c23b41
Add BeitBackbone (#25952)
* First draft

* Add backwards compatibility

* More improvements

* More improvements

* Improve error message

* Address comment

* Add conversion script

* Fix style

* Update code snippet

* Address comment

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2023-11-28 08:38:32 +00:00
Charbel Abi Daher 2ca73e5ee3
Fixed passing scheduler-specific kwargs via TrainingArguments lr_scheduler_kwargs (#27595)
* Fix passing scheduler-specific kwargs through TrainingArguments `lr_scheduler_kwargs`

* Added test for lr_scheduler_kwargs
2023-11-28 08:33:45 +01:00
NielsRogge 59499bbe8b
Update forward signature test for vision models (#27681)
* Update forward signature

* Empty-Commit
2023-11-27 15:48:17 +01:00
jiqing-feng 1d7f406e19
fix assisted decoding assistant model inputs (#27503)
* fix assisted decoding attention_cat

* fix attention_mask for assisted decoding

* fix attention_mask len

* fix attn len

* Use a cleaner way to prepare assistant model inputs

* fix param meaning

* fix param name

* fix assistant model inputs

* update token type ids

* fix assistant kwargs copy

* add encoder-decoder tests of assisted decoding

* check if assistant kwargs contains updated keys

* revert test

* fix whisper tests

* fix assistant kwargs

* revert whisper test

* delete _extend funcs
2023-11-27 14:23:54 +00:00
Yanan Xie b09912c8f4
Fix mistral generate for long prompt / response (#27548)
* Fix mistral generate for long prompt / response

* Add unit test

* fix linter

* fix linter

* fix test

* add assisted generation test for mistral and load the model in 4 bit + fa2
2023-11-27 10:18:41 +01:00
Yih-Dar 35551f9a0f
Fix `TVPModelTest` (#27695)
* fix

* fix

* fix

* fix

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-24 19:47:50 +01:00
Yih-Dar 7293fdc5b9
Deprecate `TransfoXL` (#27607)
* fix

* fix

* trigger

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <hi@lysand.re>

* tic

* revert

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2023-11-24 11:48:02 +01:00
Yih-Dar 623432dcc9
Skip pipeline tests for 2 models for now (#27687)
fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-24 09:43:20 +01:00
Yih-Dar b8db265bc6
Update tiny model summary file (#27388)
* update

* fix

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2023-11-23 21:00:39 +01:00
dg845 7f6a804d30
Add UnivNet Vocoder Model for Tortoise TTS Diffusers Integration (#24799)
* initial commit

* Add initial testing files and modify __init__ files to add UnivNet imports.

* Fix some bugs

* Add checkpoint conversion script and add references to transformers pre-trained model.

* Add UnivNet entries for auto.

* Add initial docs for UnivNet.

* Handle input and output shapes in UnivNetGan.forward and add initial docstrings.

* Write tests and make them pass.

* Write docs.

* Add UnivNet doc to _toctree.yml and improve docs.

* fix typo

* make fixup

* make fix-copies

* Add upsample_rates parameter to config and improve config documentation.

* make fixup

* make fix-copies

* Remove unused upsample_rates config parameter.

* apply suggestions from review

* make style

* Verify and add reason for skipped tests inherited from ModelTesterMixin.

* Add initial UnivNetGan integration tests

* make style

* Remove noise_length input to UnivNetGan and improve integration tests.

* Fix bug and make style

* Make UnivNet integration tests pass

* Add initial code for UnivNetFeatureExtractor.

* make style

* Add initial tests for UnivNetFeatureExtractor.

* make style

* Properly initialize weights for UnivNetGan

* Get feature extractor fast tests passing

* make style

* Get feature extractor integration tests passing

* Get UnivNet integration tests passing

* make style

* Add UnivNetGan usage example

* make style and use feature extractor from hub in integration tests

* Update tips in docs

* apply suggestions from review

* make style

* Calculate padding directly instead of using get_padding methods.

* Update UnivNetFeatureExtractor.to_dict to be UnivNet-specific.

* Update feature extractor to support using model(**inputs) and add the ability to generate noise and pad the end of the spectrogram in __call__.

* Perform padding before generating noise to ensure the shapes are correct.

* Rename UnivNetGan.forward's noise_waveform argument to noise_sequence.

* make style

* Add tests to test generating noise and padding the end for UnivNetFeatureExtractor.__call__.

* Add tests for checking batched vs unbatched inputs for UnivNet feature extractor and model.

* Add expected mean and stddev checks to the integration tests and make them pass.

* make style

* Make it possible to use model(**inputs), where inputs is the output of the feature extractor.

* fix typo in UnivNetGanConfig example

* Calculate spectrogram_zero from other config values.

* apply suggestions from review

* make style

* Refactor UnivNet conversion script to use load_state_dict (following persimmon).

* Rename UnivNetFeatureExtractor to UnivNetGanFeatureExtractor.

* make style

* Switch to using torch.tensor and torch.testing.assert_close for testing expected values/slices.

* make style

* Use config in UnivNetGan modeling blocks.

* make style

* Rename the spectrogram argument of UnivNetGan.forward to input_features, following Whisper.

* make style

* Improving padding documentation.

* Add UnivNet usage example to the docs.

* apply suggestions from review

* Move dynamic_range_compression computation into the mel_spectrogram method of the feature extractor.

* Improve UnivNetGan.forward return docstring.

* Update table in docs/source/en/index.md.

* make fix-copies

* Rename UnivNet components to have pattern UnivNet*.

* make style

* make fix-copies

* Update docs

* make style

* Increase tolerance on flaky unbatched integration test.

* Remove torch.no_grad decorators from UnivNet integration tests to try to avoid Flax/TensorFlow test errors.

* Add padding_mask argument to UnivNetModel.forward and add batch_decode feature extractor method to remove padding.

* Update documentation and clean up padding code.

* make style

* make style

* Remove torch dependency from UnivNetFeatureExtractor.

* make style

* Fix UnivNetModel usage example

* Clean up feature extractor code/docstrings.

* apply suggestions from review

* make style

* Add comments for tests skipped via ModelTesterMixin flags.

* Add comment for model parallel tests skipped via the test_model_parallel ModelTesterMixin flag.

* Add # Copied from statements to copied UnivNetFeatureExtractionTest tests.

* Simplify UnivNetFeatureExtractorTest.test_batch_decode.

* Add support for unbatched padding_masks in UnivNetModel.forward.

* Refactor unbatched padding_mask support.

* make style
2023-11-22 17:21:36 +01:00
Patrick von Platen 4151fbb49c
[Whisper] Add sequential longform decoding (#27492)
* [Whisper] Add seq gen

* [Whisper] Add seq gen

* more debug

* Fix whisper logit processor

* Improve whisper code further

* Fix more

* more debug

* more debug

* Improve further

* Add tests

* Prep for batch size > 1

* Get batch_size>1 working

* Correct more

* Add extensive tests

* more debug

* more debug

* more debug

* add more tests

* more debug

* Apply suggestions from code review

* more debug

* add comments to explain the code better

* add comments to explain the code better

* add comments to explain the code better

* Add more examples

* add comments to explain the code better

* fix more

* add comments to explain the code better

* add comments to explain the code better

* correct

* correct

* finalize

* Apply suggestions from code review

* Apply suggestions from code review
2023-11-22 13:27:34 +01:00