ArthurZucker
ab0f050b42
Release: v4.41.2
2024-05-30 13:28:00 -04:00
Matt
57f5553d2e
Fix faulty rstrip in module loading ( #31108 )
2024-05-30 13:25:10 -04:00
oOraph
73b180c2be
fix from_pretrained in offline mode when model is preloaded in cache ( #31010 )
...
* Unit test to verify fix
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
* fix from_pretrained in offline mode when model is preloaded in cache
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
* minor: fmt
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
---------
Signed-off-by: Raphael Glon <oOraph@users.noreply.github.com>
Co-authored-by: Raphael Glon <oOraph@users.noreply.github.com>
2024-05-30 13:25:10 -04:00
Aymeric Roucher
a6325a77b2
Redirect transformers_agents doc to agents ( #31054 )
2024-05-30 13:25:10 -04:00
Pablo Montalvo
9ccdc84cb2
Paligemma- fix devices and dtype assignments ( #31008 )
...
* fix devices and dtype assignments
* [run-slow]paligemma
2024-05-30 13:25:09 -04:00
Lucain
12aa3167b4
Do not trigger autoconversion if local_files_only ( #31004 )
2024-05-24 05:02:39 -04:00
ArthurZucker
75f15f39a0
Release: v4.41.1
2024-05-22 13:40:40 -04:00
Pablo Montalvo
8282db5cc9
Paligemma causal attention mask ( #30967 )
...
* PaliGemma working causal attention
* Formatting
* Style
* Docstrings + remove commented code
* Update docstring for PaliGemma Config
* PaliGemma - add separator ind to model/labels
* Refactor + docstring paligemma processor method
* Style
* return token type ids when tokenizing labels
* use token type ids when building causal mask
* add token type ids to tester
* remove separator from config
* fix style
* don't ignore separator
* add processor documentation
* simplify tokenization
* fix causal mask
* style
* fix label propagation, revert suffix naming
* fix style
* fix labels tokenization
* [run-slow]paligemma
* add eos if suffixes are present
* [run-slow]paligemma
* [run-slow]paligemma
* add missing tokens to fast version
* Apply suggestions from code review
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix style
* [run-slow]paligemma
---------
Co-authored-by: Peter Robicheaux <peter@roboflow.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-05-22 13:39:52 -04:00
ArthurZucker
e5b788ade3
Revert "feat: Upgrade Weights & Biases callback ( #30135 )"
...
This reverts commit 4ab7a28216.
2024-05-22 12:39:27 -04:00
Raushan Turganbay
9d054596e7
Generation: get special tokens from model config ( #30899 )
...
* fix
* let's do this way?
* codestyle
* update
* add tests
2024-05-22 12:37:27 -04:00
hoshi-hiyouga
e5d174f12a
PaliGemma - fix processor with no input text ( #30916 )
...
Update processing_paligemma.py
2024-05-22 12:37:15 -04:00
Arthur
04141855bd
legacy to init the slow tokenizer when converting from slow was wrong ( #30972 )
2024-05-22 12:37:07 -04:00
Arthur
6d2439a126
`tokenizer_class = "AutoTokenizer"` Llava Family ( #30912 )
...
propagate changes to more models
2024-05-22 12:36:58 -04:00
ArthurZucker
4c6c45ba13
Release: v4.41.0
2024-05-17 11:11:44 -04:00
Arthur
e9a8041d1c
update release script ( #30880 )
...
* update release script
* update release script
2024-05-17 17:09:30 +02:00
Arthur
0a9300f474
Support arbitrary processor ( #30875 )
...
* Support arbitrary processor
* fix
* nit
* update
* nit
* nit
* fix and revert
* add a small test
* better check
* fixup
* bug so let's just use class for now
* oups
* .
2024-05-17 16:51:31 +02:00
Sanchit Gandhi
57edd84bdb
[whisper] fix multilingual fine-tuning ( #30865 )
...
* [whisper] fix multilingual fine-tuning
* config ids as well
2024-05-17 15:12:44 +01:00
Jacky Lee
977ce58a78
Fix dependencies for image classification example ( #30842 )
...
* fix: missing dependencies
* fix: image classification dependencies
2024-05-17 13:57:47 +01:00
Darshana S
3802e786ef
Enable device map ( #30870 )
...
* added_no_split_modules
* added LlavaNextVisionAttention to _no_split_modules
2024-05-17 12:50:24 +01:00
amyeroberts
57c965a8f1
Remove deprecated logic and warnings ( #30743 )
...
* Remove deprecated logic and warnings
* Add back some code that seems to be important...
* Let's just add all the nllb stuff back; removing it is a bit more involved
* Remove kwargs
* Remove more kwargs
2024-05-17 12:15:59 +01:00
Younes Belkada
3d7d3a87a0
TEST: Add llama logits tests ( #30835 )
...
* add llama logits test
* fix
* fix tests
"
"
* fix for a10
* format
* format
* fix
* [run-slow] remove fmt: skip
* Your commit message
* test commit
* Revert "test commit"
This reverts commit b66e01e55f.
* [run-slow]llama
* Update tests/models/llama/test_modeling_llama.py
* [run-slow]llama
* empty commit
2024-05-17 12:23:00 +02:00
amyeroberts
15c74a2829
Fix VideoLlava imports ( #30867 )
...
* Fix VideoLlava imports
* Update dummy objects
2024-05-16 17:06:21 +01:00
Younes Belkada
4e17e7dcf8
TST / Quantization: Reverting to torch==2.2.1 ( #30866 )
...
Reverting to 2.2.1
2024-05-16 17:30:02 +02:00
Joao Gante
f4014e75db
Docs: update example with assisted generation + sample ( #30853 )
2024-05-16 14:32:21 +01:00
Raushan Turganbay
95b3c3814d
Video-LLaVa: Fix docs ( #30855 )
...
fix model id in docs
2024-05-16 17:23:01 +05:00
Yih-Dar
1b3dba9417
Make `Gemma` work with `torch.compile` ( #30775 )
...
* fix
* [run-slow] gemma
* add test
* add `test_compile_static_cache`
* fix
* style
* remove subprocess
* use attribute
* fix
* style
* update
* [run-slow] dbrx,gemma,jetmoe,phi3,recurrent_gemma
---------
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-16 13:41:33 +02:00
Mohit Sharma
0753134f4d
Disable the FA backend for SDPA on AMD GPUs ( #30850 )
...
* disable fa
* disable fa
* update warning
* update warning
2024-05-16 13:31:14 +02:00
Joao Gante
9d889f870e
Cache: add new flag to distinguish models that `Cache` but not static cache ( #30800 )
...
* jamba cache
* new flag
* generate exception
2024-05-16 12:08:35 +01:00
NielsRogge
17cc71e149
[Idefics2] Improve docs, add resources ( #30717 )
...
* Add resources
* Address comment
* Address comments
* Update docs/source/en/model_doc/idefics2.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update figure
---------
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-16 12:22:13 +02:00
hyenal
1c21f48a50
add sdpa to ViT [follow up of #29325 ] ( #30555 )
...
remove blank line (+1 squashed commit)
Squashed commits:
[24ccd2061] [run-slow]vit_msn,vision_encoder_decoder (+24 squashed commits)
Squashed commits:
[08bd27e7a] [run-slow]vit_msn,vision_encoder_decoder
[ec96a8db3] [run-slow]vit_msn
[ead817eca] fix vit msn multi gpu
[d12cdc8fd] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[3fdbfa88f] doc
[a3ff33e4a] finish implementation
[e20b7b7fb] Update test_modeling_common.py
[e290c5810] Update test_modeling_flax_common.py
[d3af86f46] comment
[ff7dd32d8] more comments
[59b137889] suggestion
[7e2ba6d67] attn_implementation as attribute of the class
[fe66ab71f] minor
[38642b568] Apply suggestions from code review
Accept comments
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[22cde7d52] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[48e137cc6] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[99f4c679f] Update tests/test_modeling_common.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[96cf20a6d] Update src/transformers/models/vit_msn/modeling_vit_msn.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[c59377d23] Update src/transformers/models/vit_mae/modeling_vit_mae.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[b70a47259] Update tests/models/vision_text_dual_encoder/test_modeling_vision_text_dual_encoder.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
[00c84d216] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[61f00ebb0] all tests are passing locally
[e9e0b82b7] vision encoder/decoder
[4d5076b56] test-vision (+20 squashed commits)
Squashed commits:
[d1add8db9] yolo
[9fde65716] fix flax
[986566c28] minor
[ca2f21d1f] vit
[3333efd7a] easy models change
[ebfc21402] [run-slow]audio_spectrogram_transformer,deit,vision_encoder_decoder,vision_text_dual_encoder,vit,vit_hybrid,vit_mae,vit_msn,videomae,yolos
[b8b8603ed] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
[48ecc7e26] all tests are passing locally
[bff7fc366] minor
[62f88306f] fix yolo and text_encoder tests
[121507555] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[1064cae0a] [run-slow]vision_encoder_decoder,vision_text_dual_encoder,yolos
[b7f52ff3a] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[cffaa10dd] fix-copies
[ef6c511c4] test vit hybrid
[7d4ba8644] vit hybrid
[66f919033] [run-slow]audio_spectrogram_transformer,deit,vit,vit_hybrid,vit_mae,vit_msn,videomae
[1fcc0a031] fixes
[cfde6eb21] fixup
[e77df1ed3] all except yolo end encoder decoder (+17 squashed commits)
Squashed commits:
[602913e22] vit + vit_mae are working
[547f6c4cc] RUN_SLOW=1 pytest tests/models/audio_spectrogram_transformer/ tests/models/deit/ tests/models/videomae/ passes
[61a97dfa9] it's the complete opposite...
[aefab37d4] fix more tests
[71802a1b9] fix all torch tests
[40b12eb58] encoder - decoder tests
[941552b69] slow decorator where appropriate
[14d055d80] has_attentions to yolo and msn
[3381fa19f] add correct name
[e261316a7] repo consistency
[31c6d0c08] fixup
[9d214276c] minor fix
[11ed2e1b7] chore
[eca6644c4] add sdpa to vit-based models
[cffbf390b] make fix-copies result
[6468319b0] fix style
[d324cd02a] add sdpa for vit
Co-authored-by: Liubov Yaronskaya <luba.yaronskaya@gmail.com>
2024-05-16 10:56:11 +01:00
NielsRogge
9fd606dbdb
[LLaVa-NeXT] Small fixes ( #30841 )
...
* First draft
* Update docstring
2024-05-16 08:19:15 +02:00
Edoardo Cetin
4b3eb19fa7
Fix llama model sdpa attention forward function masking bug when output_attentions=True ( #30652 )
...
* Fix llama model forward function with attention=True, same-length encoded sequence.
* Fix style
* propagate fix to modeling_cohere, gemma, dbrx, and olmo (which copy the same sdpa masking logic from llama)
* Fix style
* ignore unnecessary sdpa mask converter when output_attentions=True
* add tests checking sdpa and eager outputs match when output_attentions=True
* Split if statements in two lines
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Fix formatting
* Add fix to new jetmoe model
* Add missing output_attentions argument to jetmoe mask creation
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-05-15 19:48:19 +02:00
Yih-Dar
2d83324ecf
Use `torch 2.3` for CI ( #30837 )
...
2.3
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-05-15 19:31:52 +02:00
Younes Belkada
3f435823e0
FEAT / Bitsandbytes: Add `dequantize` API for bitsandbytes quantized models ( #30806 )
...
* add method
* change method name
* more comments
* Apply suggestions from code review
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fixup
* add docstrings and fix comment
* warn users on the de-quantized dtype
* Update src/transformers/quantizers/base.py
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Update src/transformers/integrations/bitsandbytes.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* final suggestion - use private method
---------
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 17:17:09 +02:00
amyeroberts
58faa7b824
Deprecate models script - correctly set the model name for the doc file ( #30785 )
...
* Correctly set the model name for the doc file
* Fix up
2024-05-15 15:14:11 +01:00
Xuan-Phi Nguyen
5ca085b882
Better llava next. ( #29850 )
...
* Better llava next.
- Batched forward with multiple images of different sizes (number of patches).
- Support training, for cases without any image.
- Support multi-image in same sequence. e.g: ["<image> <image> the first image is a dog while the second is a cat", "<image> <image> <image> <image> these 4 image are..."]
Current limitation:
- Haven't done testing
- Only support right padding (for training)
- left padding (batched generation) is not ready yet.
- PR not ready.
* fix bugs in batched generation
* add tests
* fix batch-gen bugs, left-padding positions and incorrect attention mask
* remove better modeling llava
* fix formatting
* fix test
* fix test
* fix testing
* fix test
* fix formatting
* Update src/transformers/models/llava_next/modeling_llava_next.py
add clarity
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update modeling_llava_next.py
remove assert
* fix bug modeling_llava_next.py
* update modeling
* fix bugs
* fix format
* fix error
* fix new_token_positions
* Update modeling_llava_next.py
* update formatting
* add args
* remove comments
* add slow tests for batched inference
* failing tf/flax tests
* this one is correct
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix docs
* make fixup
* more fixup
* add test for batch equivalence
* Update tests/models/llava_next/test_modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/image_processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/image_processing_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/llava_next/modeling_llava_next.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* pr comments
* hardcode padding side for bs=1
* update
* [run-slow] llava_next
* [run-slow] llava_next
* make fix-copies
---------
Co-authored-by: NGUYEN, Xuan Phi <x.nguyen@alibaba-inc.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: raushan <raushan@huggingface.co>
Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>
2024-05-15 19:02:56 +05:00
Sourab Mangrulkar
bdfefbadaf
Update ds_config_zero3.json ( #30829 )
2024-05-15 10:02:31 -04:00
xkszltl
92544cb8f3
Missing `Optional` in typing. ( #30821 )
...
The function checks for None in its first line.
2024-05-15 15:00:43 +01:00
amyeroberts
64c06df325
Jamba - Skip 4d custom attention mask test ( #30826 )
...
* Jamba - Skip 4d custom attention mask test
* Skip assistant greedy test
2024-05-15 13:57:28 +01:00
Lysandre Debut
a42844955f
Loading GGUF files support ( #30391 )
...
* Adds support for loading GGUF files
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>
* add q2_k q3_k q5_k support from @99991
* fix tests
* Update doc
* Style
* Docs
* fix CI
* Update docs/source/en/gguf.md
* Update docs/source/en/gguf.md
* Compute merges
* change logic
* add comment for clarity
* add comment for clarity
* Update src/transformers/models/auto/tokenization_auto.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change logic
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* change
* Apply suggestions from code review
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/modeling_gguf_pytorch_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* put back comment
* add comment about mistral
* comments and added tests
* fix inconsistent type
* more
* fix tokenizer
* Update src/transformers/modeling_utils.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address comments about tests and tokenizer + add added_tokens
* from_gguf -> gguf_file
* replace on docs too
---------
Co-authored-by: Younes Belkada <younesbelkada@gmail.com>
Co-authored-by: 99991 <99991@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 14:28:20 +02:00
Raushan Turganbay
bd9f4d7951
Add Video Llava ( #29733 )
...
* add model draft
* update docstring
* add tests
* support image and video as input
* update for better handling of mixed input and clean-up a bit
* bug when mixed inputs & add tests
* Update README.md
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* Merge remote-tracking branch 'upstream/main' into video_llava
* link to abstract of paper in README
* fix test
* fix-copies
* make tests happy
* skip doctest for now
* do not run doctest for now
* Update src/transformers/models/video_llava/processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* address review comments
* failing tests
* Fix vocab_size in common tests for VLMs
* codestyle
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/image_processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/video_llava.md
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/processing_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/video_llava/test_modeling_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* PR suggestions
* fix-copies
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/video_llava/configuration_video_llava.py
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* add full example in docs
* clean-up with new model-id
* [run-slow] video_llava
* update docstring
* [run-slow] video_llava
* remove all archive maps
* fix some tests
* test was supposed to be skipped for llava :)
---------
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-05-15 16:42:29 +05:00
David
b8aee2e918
Remove unused module DETR based models ( #30823 )
...
* removing heads for classification from DETR models.
* quality fix
2024-05-15 11:19:43 +01:00
Ondřej Cífka
be3aa43e5f
Support mixed-language batches in `WhisperGenerationMixin` ( #29688 )
...
* Add support for mixing languages in a single batch
* Update docstring
* Enable different detected languages in batch
* Do not require input_features
* Test list of languages
* Fix comment
* Make init_tokens length-1 if possible, broadcast at the end
* Test for ValueError with language list of incorrect length
* Slow test for batched multilingual transcription
* fixup
* Apply suggestions from code review
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
* Address review, refactor
* Second attempt to move this line where it was originally
* Split test, fix a bug
---------
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
2024-05-15 09:53:17 +02:00
Jacky Lee
37543bad3c
Add missing dependencies in image classification example ( #30820 )
...
fix: missing dependencies
2024-05-15 08:38:30 +02:00
Jacky Lee
99e16120ab
Add support for custom checkpoints in MusicGen ( #30011 )
...
* feat: support custom checkpoint
* update: revert changes and add TODO
* update: docs and exception handling
* fix: ah, extra space
2024-05-15 08:30:33 +02:00
Pablo Montalvo
1360801a69
Add PaliGemma ( #30814 )
...
* add new model like
* add state dict slicing + new model config
* update palma config and weights, passes vision activations
* fix
* update
* reorder loading/unpacking
* clean up
* add debug statements
* change device
* fix
* debugging
* fix noncausal mask
* fixup sdpa + causal mask
* fix activation function
* remove debug before changing modeling file
* add variants
* debug attention mask in generate
* revert to non-debug sdpa
* revert gemma modifications
* add custom language modeling
* use Processor
* add language modeling file to init
* try thin wrapper around generate
* Update
* update mask
* breakpoints galore
* remove conflict
* switch to left-padding
* add incomplete model doc
* add paligemma global files
* batch rename paligemma
* make generation match outputs and captioning
* style
* style
* remove copied from + doc
* remove more copied from
* remove copy from projector
* minor fix
* update config and style
* add readme - dummy
* CORRECT image captioning
* moving to args
* add siglip proper + fix merging image + text features
* take update_causal_mask from upstream
* remove breakpoint
* leverage AutoModel
* fix input_ids slicing
* make siglip head conditional
* remove encoder_decoder value
* remove unneeded modeling file
* add commented 4d attention mask
* FIXED generation with 4D mask
* Update src/transformers/models/siglip/modeling_siglip.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* fix left padding detection
* shuffle order of verifications
* fix missing labels for training
* fix
* vectorize merging of features, improve slicing
* improve testing before conversion
* handle merging in processor
* image token index depends on checkpoint
* add variants, save processor too
* save processors, base tokenizer off spm file
* expand model embeddings due to additional image token
* pass image processing args
* add convert rgb to siglip processor
* add \n token separately
* fix tokenizer and prompts
* fix docstrings
* change to camel
* fix casing
* debug pos_ids and sdpa
* pass and use cache_position
* add flag for newline tokenization
* Update src/transformers/models/paligemma/processing_paligemma.py
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
* simplify conversion script
* add copied from
* add precision to conversion script
* Update src/transformers/models/paligemma/modeling_paligemma.py
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* clean up
* Shift attention mask from `1:`
After discussion with @molbap
* add docs, fix quality
* quality, tied weights inheritance, and logits/label alignment
* fix more tests
* pass attn_implementation to language model correctly
* add SiglipVisionTransformer to no split modules
* skip paligemma test for sdpa dispatch to flash
* skip incompatible tests
* quality
* [broken archive maps]
* Apply suggestions
- remove archive lists
- style
- take shape of inputs_embeds for batch
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/utils/dummy_pt_objects.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* simplify conversion script
* add suggestions
* add suggestions
* add copied from
* fix
* move labels out
* revert
* fix
* remove placeholder labels if None
* use cache_position
* fix quality + docstrings
* fix quality
* fix paligemma 4d gemma mask incompatibility
* fix config docstring
* fix query and attn_mask dtype
---------
Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Merve Noyan <merveenoyan@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
2024-05-14 22:07:15 +02:00
Ankur Singh
c96aca3a8d
Added the necessary import of module ( #30804 )
2024-05-14 18:45:06 +01:00
Yikang Shen
ccdabc5642
Add JetMoE model ( #30005 )
...
* init jetmoe code
* update archive maps
* remove flax import
* fix import error
* update README
* ruff fix
* update readme
* fix
* update config
* fix issue
* merge files
* fix model bug
* fix test
* auto fix
* model size
* add comments
* fix form
* add flash attention support
* fix attention head number
* fix init
* fix support list
* sort auto mapping
* fix test
* fix docs
* update test
* fix test
* fix test
* change variable name
* fix config
* fix init
* update format
* clean code
* fix config
* fix config
* change default config
* update config
* fix issues
* update format
* update config argument
* update format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* change to mixtral aux loss
* change to cache_position
* debug
* fix bugs
* debug
* fix format
* fix format
* fix copy
* fix format
* fix format
* fix sort
* fix sort
* fix sort
* add copy comment
* add copy from
* remove debug code
* revert readme update
* add copy
* debug
* remove debug code
* fix flash attention
* add comments
* clean code
* clean format
* fix format
* fix format
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
* change variable name
* add copied from
* fix variable name
* remove deprecated functions
* sync to llama implementation
* fix format
* fix copy
* fix format
* update format
* remove repr
* add comment for moe weight
* fix copy
* Update src/transformers/models/jetmoe/configuration_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* add comments and reformat config
* fix format
* fix format
* fix format
* update test
* update doc string in config
* Update src/transformers/models/jetmoe/modeling_jetmoe.py
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* update config doc
* update attention cache
* fix format
* fix copy
---------
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-05-14 16:32:01 +02:00
Masahiro Suzuki
d84f34ad77
[T5] Adding `model_parallel = False` to `T5ForTokenClassification` and `MT5ForTokenClassification` ( #30763 )
...
* Adding model_parallel = False
* Revert "Adding model_parallel = False"
This reverts commit ba1d99976a.
* Trainer: circumvent error for model in which is_parallelizable is True but does not have model_parallel attribute
2024-05-14 14:39:25 +01:00
Matt
9ef3884046
Deprecate TF weight conversion since we have full Safetensors support now ( #30786 )
2024-05-14 13:48:17 +01:00