transformers

Commit Graph

Author	SHA1	Message	Date
Xuan-Phi Nguyen	5ca085b882	Better llava next. (#29850 ) * Better llava next. - Batched forward with multiple image of different sizes (number of patches). - Support training, for cases without any image. - Support multi-image in same sequence. e.g: ["<image> <image> the first image is a dog while the second is a cat", "<image> <image> <image> <image> these 4 image are..."] Current limitation: - Haven't done testing - Only support right padding (for training) - left padding (batched generation) is not ready yet. - PR not ready. * fix bugs in batched generation * add tests * fix batch-gen bugs, left-padding positions and incorrect attention mask * remove better modeling llava * fix formatting * fix test * fix test * fix testing * fix test * fix formatting * Update src/transformers/models/llava_next/modeling_llava_next.py add clarity Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update modeling_llava_next.py remove assert * fix bug modeling_llava_next.py * update modeling * fix bugs * fix format * fix error * fix new_token_positions * Update modeling_llava_next.py * update formatting * add args * removecomments * add slow tests for batched inference * failing tf/flax tests * this one ic correct * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix docs * make fixup * more fixup * add test for batch equivalence * Update tests/models/llava_next/test_modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/image_processing_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/image_processing_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/llava_next/modeling_llava_next.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * pr comments * hardcode padding side for bs=1 * update * [run-slow] llava_next * [run-slow] llava_next * make fix-copies --------- Co-authored-by: NGUYEN, Xuan Phi <x.nguyen@alibaba-inc.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> Co-authored-by: raushan <raushan@huggingface.co> Co-authored-by: Raushan Turganbay <raushan.turganbay@alumni.nu.edu.kz>	2024-05-15 19:02:56 +05:00
Sourab Mangrulkar	bdfefbadaf	Update ds_config_zero3.json (#30829 )	2024-05-15 10:02:31 -04:00
xkszltl	92544cb8f3	Missing `Optional` in typing. (#30821 ) The function checks for None in its first line.	2024-05-15 15:00:43 +01:00
amyeroberts	64c06df325	Jamba - Skip 4d custom attention mask test (#30826 ) * Jamba - Skip 4d custom attention mask test * Skip assistant greedy test	2024-05-15 13:57:28 +01:00
Lysandre Debut	a42844955f	Loading GGUF files support (#30391 ) * Adds support for loading GGUF files Co-authored-by: Younes Belkada <younesbelkada@gmail.com> Co-authored-by: 99991 <99991@users.noreply.github.com> * add q2_k q3_k q5_k support from @99991 * fix tests * Update doc * Style * Docs * fix CI * Update docs/source/en/gguf.md * Update docs/source/en/gguf.md * Compute merges * change logic * add comment for clarity * add comment for clarity * Update src/transformers/models/auto/tokenization_auto.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change logic * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * change * Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/modeling_gguf_pytorch_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * put back comment * add comment about mistral * comments and added tests * fix unconsistent type * more * fix tokenizer * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address comments about tests and tokenizer + add added_tokens * from_gguf -> gguf_file * replace on docs too --------- Co-authored-by: Younes Belkada <younesbelkada@gmail.com> Co-authored-by: 99991 <99991@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-15 14:28:20 +02:00
Raushan Turganbay	bd9f4d7951	Add Video Llava (#29733 ) * add model draft * update docstring * add tests * support image and video as input * update for better handling of mixed input and clean-up a bit * bug when mixed inputs & add tests * Update README.md Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> * Merge remote-tracking branch 'upstream/main' into video_llava * link to abstract of paper in README * fix test * fix-copies * make tests happy * skip docstest for now * do not run doctest for now * Update src/transformers/models/video_llava/processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * address review comments * failing tests * Fix vocab_size in common tests for VLMs * codestyle * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/image_processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update docs/source/en/model_doc/video_llava.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/processing_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/video_llava/test_modeling_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * PR suggestions * fix-copies * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/video_llava/configuration_video_llava.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * add full example in docs * clean-up with new model-id * [run-slow] video_llava * update docstring * [run-slow] video_llava * remove all achive maps * fix some tests * test was supposed to be skipped for llava :) --------- Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-15 16:42:29 +05:00
David	b8aee2e918	Remove unused module DETR based models (#30823 ) * removing heads for classification from DETR models. * quality fix	2024-05-15 11:19:43 +01:00
Ondřej Cífka	be3aa43e5f	Support mixed-language batches in `WhisperGenerationMixin` (#29688 ) * Add support for mixing languages in a single batch * Update docstring * Enable different detected languages in batch * Do not require input_features * Test list of languages * Fix comment * Make init_tokens length-1 if possible, broadcast at the end * Test for ValueError with language list of incorrect length * Slow test for batched multilingual transcription * fixup * Apply suggestions from code review Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com> * Address review, refactor * Second attempt to move this line where it was originally * Split test, fix a bug --------- Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>	2024-05-15 09:53:17 +02:00
Jacky Lee	37543bad3c	Add missing dependencies in image classification example (#30820 ) fix: missing dependencies	2024-05-15 08:38:30 +02:00
Jacky Lee	99e16120ab	Add support for custom checkpoints in MusicGen (#30011 ) * feat: support custom checkpoint * update: revert changes and add TODO * update: docs and exception handling * fix: ah, extra space	2024-05-15 08:30:33 +02:00
Pablo Montalvo	1360801a69	Add PaliGemma (#30814 ) * add new model like * add state dict slicing + new model config * update palma config and weights, passes vision activations * fix * update * reorder loading/unpacking * clean up * add debug statements * change device * fix * debugging * fix noncausal mask * fixup sdpa + causal mask * fix activation function * remove debug before changing modeling file * add variants * debug attention mask in generate * revert to non-debug sdpa * revert gemma modifications * add custom language modeling * use Processor * add language modeling file to init * try thin wrapper around generate * Update * update mask * breakpoints galore * remove conflict * switch to left-padding * add incomplete model doc * add paligemma global files * batch rename paligemma * make generation match outputs and captioning * style * style * remove copied from + doc * remove more copied from * remove copy from projector * minor fix * update config and style * add readme - dummy * CORRECT image captioning * moving to args * add siglip proper + fix merging image + text features * take update_causal_mask from upstream * remove breakpoint * leverage AutoModel * fix input_ids slicing * make siglip head conditional * remove encoder_decoder value * remove unneeded modeling file * add commented 4d attention mask * FIXED generation with 4D mask * Update src/transformers/models/siglip/modeling_siglip.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fix left padding detection * shuffle order of verifications * fix missing labels for training * fix * vectorize merging of features, improve slicing * improve testing before conversion * handle merging in processor * image token index depends on checkpoint * add variants, save processor too * save processors, base tokenizer off spm file * expand model embeddings due to additional image token * pass image processing args * add convert rgb to siglip processor * add \n token separately * fix tokenizer and prompts * fix docstrings * change to camel * fix casing * debug pos_ids and sdpa * pass and use cache_position * add flag for newline tokenization * Update src/transformers/models/paligemma/processing_paligemma.py Co-authored-by: Merve Noyan <merveenoyan@gmail.com> * simplify conversion script * add copied from * add precision to conversion script * Update src/transformers/models/paligemma/modeling_paligemma.py Co-authored-by: Pedro Cuenca <pedro@huggingface.co> * clean up * Shift attention mask from `1:` After discussion with @molbap * add docs, fix quality * quality, tied weights inheritance, and logits/label alignment * fix more tests * pass attn_implementation to language model correctly * add SiglipVisionTransformer to no split modules * skip paligemma test for sdpa dispatch to flash * skip incompatible tests * quality * [broken archive maps] * Apply suggestions - remove archive lists - style - take shape of inputs_embeds for batch Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/utils/dummy_pt_objects.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * simplify conversion script * add suggestions * add suggestions * add copied from * fix * move labels out * revert * fix * remove placeholder labels if None * use cache_position * fix quality + docstrings * fix quality * fix paligemma 4d gemma mask incompatibility * fix config docstring * fix query and attn_mask dtype --------- Co-authored-by: ArthurZucker <arthur.zucker@gmail.com> Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Merve Noyan <merveenoyan@gmail.com> Co-authored-by: Pedro Cuenca <pedro@huggingface.co>	2024-05-14 22:07:15 +02:00
Ankur Singh	c96aca3a8d	Added the necessay import of module (#30804 )	2024-05-14 18:45:06 +01:00
Yikang Shen	ccdabc5642	Add JetMoE model (#30005 ) * init jetmoe code * update archive maps * remove flax import * fix import error * update README * ruff fix * update readme * fix * update config * fix issue * merge files * fix model bug * fix test * auto fix * model size * add comments * fix form * add flash attention support * fix attention head number * fix init * fix support list * sort auto mapping * fix test * fix docs * update test * fix test * fix test * change variable name * fix config * fix init * update format * clean code * fix config * fix config * change default config * update config * fix issues * update formate * update config argument * update format * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * change to mixtral aux loss * change to cache_position * debug * fix bugs * debug * fix format * fix format * fix copy * fix format * fix format * fix sort * fix sort * fix sort * add copy comment * add copy from * remove debug code * revert readme update * add copy * debug * remove debug code * fix flash attention * add comments * clean code * clean format * fix format * fix format * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com> * change variable name * add copied from * fix variable name * remove deprecated functinos * sync to llama implementation * fix format * fix copy * fix format * update format * remove repr * add comment for moe weight * fix copy * Update src/transformers/models/jetmoe/configuration_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * add comments and reformat config * fix format * fix format * fix format * update test * update doc string in config * Update src/transformers/models/jetmoe/modeling_jetmoe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * update config doc * update attention cache * fix format * fix copy --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>	2024-05-14 16:32:01 +02:00
Masahiro Suzuki	d84f34ad77	[T5] Adding `model_parallel = False` to `T5ForTokenClassification` and `MT5ForTokenClassification` (#30763 ) * Adding model_parallel = False * Revert "Adding model_parallel = False" This reverts commit `ba1d99976a`. * Trainer: circumvent error for model in which is_parallelizable is True but does not have model_parallel attribute	2024-05-14 14:39:25 +01:00
Matt	9ef3884046	Deprecate TF weight conversion since we have full Safetensors support now (#30786 )	2024-05-14 13:48:17 +01:00
Joao Gante	d8f8a9cd61	CI: more models wo cache support (#30780 )	2024-05-14 10:43:03 +01:00
Raushan Turganbay	5ad960f1f4	Add Watermarking LogitsProcessor and WatermarkDetector (#29676 ) * add watermarking processor * remove the other hashing (context width=1 always) * make style * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * update watermarking process * add detector * update tests to use detector * fix failing tests * rename `input_seq` * make style * doc for processor * minor fixes * docs * make quality * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/logits_process.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * add PR suggestions * let's use lru_cache's default max size (128) * import processor if torch available * maybe like this * lets move the config to torch independet file * add docs * tiny docs fix to make the test happy * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/watermarking.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * PR suggestions * add docs * fix test * fix docs * address pr comments * style * Revert "style" This reverts commit `7f33cc34ff`. * correct style * make doctest green --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>	2024-05-14 13:31:39 +05:00
Pashmina Cameron	65ea1904ff	PEFT: Access active_adapters as a property in Trainer (#30790 ) Access active_adapters as a property	2024-05-14 09:31:18 +01:00
Raushan Turganbay	c02d302e6b	Fix cache type in Idefics2 (#30729 ) standardize cache in idefics2	2024-05-14 13:30:53 +05:00
Jacky Lee	449894d2e5	Fix OWLv2 Doc (#30794 ) fix: owlv2 doc	2024-05-14 08:36:11 +02:00
fxmarty	37bba2a32d	CI: update to ROCm 6.0.2 and test MI300 (#30266 ) * update to ROCm 6.0.2 and test MI300 * add callers for mi300 * update dockerfile * fix trainer tests * remove apex * style * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * Update tests/trainer/test_trainer_seq2seq.py * update to torch 2.3 * add workflow dispatch target * we may need branches: mi300-ci after all * nit * fix docker build * nit * add check runner * remove docker-gpu * fix issues * fix --------- Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com> Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>	2024-05-13 18:14:36 +02:00
Marc Sun	539ed75d50	skip low_cpu_mem_usage tests (#30782 )	2024-05-13 18:00:43 +02:00
amyeroberts	0f8fefd481	Deprecate models script (#30184 ) * Add utility for finding candidate models for deprecation * Update model init * Make into configurable script * Fix path * Add sorting of base object alphabetically * Tidy * Refactor __init__ alpha ordering * Update script with logging * fix import * Fix logger * Fix logger * Get config file before moving files * Take models from CLI * Split models into lines to make easier to feed to deprecate_models script * Update * Use posix path * Print instead * Add example in module docstring * Fix up * Add clarifying comments; add models to DEPRECATE_MODELS * Address PR comments * Don't update relative paths on the same level	2024-05-13 16:30:55 +01:00
Yih-Dar	82c1625ec3	Save other CI jobs' result (torch/tf pipeline, example, deepspeed etc) (#30699 ) * update * update * update * update * update * update * update * update * Update utils/notification_service.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: ydshieh <ydshieh@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-13 17:27:44 +02:00
Joao Gante	2e27291ce4	Generate: assistant should be greedy in assisted decoding (#30778 ) * assistant should be greedy * better comment * Update src/transformers/generation/candidate_generator.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-13 16:08:45 +01:00
Alazar	94306352f4	Port IDEFICS to tensorflow (#26870 ) * Initial commit * Just a copy of modeling_idefics.py that will be ported to TF * - Prepend TF to the name of all classes - Convert pytorch ops to TF (not all operations are converted yet) * Add TF imports * Add autotranslated files * Add TF classes to model_tf_auto.py * Add the TF classes in model_doc * include auto-translated code * Adopted from auto-translated version * Add a forgotten super().build * Add test code for TF version. * Fix indentation and load pytorch weights for now * Some fixes. Many tests are still failing but some are passing now. - I have added TODO's for some of the hacks I made to unblock me and I will address them soon - I have the processing_idefics.py hacked in my view to support TF temporarily * Add ALL_LAYERNORM_LAYERS to match pytorch * Revert "Add ALL_LAYERNORM_LAYERS to match pytorch" This reverts commit 7e0a35119b4d7a6284d04d8c543fba1b29e573c9 as it is not needed in the tf implementation. * Fix freeze_relevant_params() * Some more fixes * Fix test_attention_outputs * Add tf stuff to processing_idefics.py processing_idefics.py supports both pytorch and tf now. test_processor_idefics.py for pytorch is passing, so i didn't break anything but still some issues with tf. I also need to add tf tests in test_processor_idefics.py. * Pass return_tensors to image processing code and fix test * Pass return_tensors to the image processor __init__ * Fix several test cases - Make input to some of the forward pass of type `TFModelInputType` - Decorate main layer forward pass with `@unpack_inputs` - Decorate main layer with `@keras_serializable` - Pass `inputs` to TFIdeficsModel * Some more fixes forgotten in last commit * Fix processing code and vision_tf.py * Fix perceiver bug * Import from * Auto-add build() methods + style pass * Fix build() errors due to `None` being passed as shape to some layers * Change name in TFIdeficsForVisionText2Text to attribute in IdeficsForVisionText2Text * Fix pytorch weights load for tf2 There were a lot of `name=` missing in weight initialization code. * Attempt to fix CI * Add back accidently removed line * Remove torch-specific stuff from the TF test file * make fix-copies, make style, remove autotranslated files * Fixes to imports/docstrings * Let's try the from future import in desperation * Fix the core random_attention_mask fn to match the torch/flax behaviour * Clean random_attention_mask up correctly * Remove torch-only test * Fix loss shape, couple of nits * make style * Don't test for OOB embeddings because IDEFICS uses those deliberately * Fix loss computation to handle masking * Fix test failures when flattening * Fix some test failures - Add cross attention gate which was missing and wasn't being passed arround - Fix overwriting of image_attention_mask due to hack I had for dummy inputs * Add a proper stateless scaled_dot_product_attention * make style * Adding missing attribute from the PyTorch version * Small cleanups to decoupledlinearlayer in case that helps * Pass epsilon to LayerNormalization * Attemp to fix pytorch weight cross-loading for TFIdeficsEmbedding * Fix a bug in TFIdeficsGatedCrossAttentionLayer * Patching up build() methods * Constant self.inv_freq * Constant self.inv_freq * First working version The TF implementation works now, there was a bug in the TFIdeficsDecoupledLinear where the weights were mis-intialized (in_features,out_features) when it should be: (out_features, in_features) I have tested this so far with tiny-random and idefics-9b-instruct and gives correct output. I also dumped the final outputs for both pytorch and TF and they are identical. * Fix some test failures * remove print statement * Fix return_tensors * Fix CI test failure check_code_quality * Attempt to fix CI failures by running `make fixup` The hardcoded IDs in test_modeling_tf_idefics.py are for the integration test and makes that file unreadable and should probably be moved to a seperate file. * Attempt to fix tests_pr_documentation_tests * Fix a test failure in test_image_processing_idefics.py * Fix test test_pt_tf_model_equivalence * Fix a few failures * Tiny fix * Some minor fixes * Remove a duplicate test * Override a few test failures for IDEFICS - `test_keras_save_load` is passing now - `test_compile_tf_model` is still failing * Fix processing_idefics.py after rebase * Guard import keras with is_tf_available * fix check code quality * fix check code quality * Minor fixes * Skip test_save_load temporarily This test passed on my local box but fails on the CI, skipping for now to see if there are other remaining failures on the CI. * Run `ruff format tests src utils` * Fix last failing test, `test_compile_tf_model` * Add fixes for vision_tf.py I forgot to add this file in last commit. * Minor fixes * Replace "<<<" with "<<" for doc tests IDEFICS-9B is too big for doctest runner, so don't run it there * Make code more readable * Fix bug after code review I added a layer_norm_eps to IdeficsConfig but I don't even need it since the vision config has a layer_norm_eps. * Fix after code review Use original code tokenizer.convert_tokens_to_ids * Keep PyTorch as the default return_tensors * Fixes to modeling_tf after code review * Fixes from code review - Remove all references of `TF_IDEFICS_PRETRAINED_MODEL_ARCHIVE_LIST` - Pass 1e-5 to LayerNormalization in perceiver * Run ruff * Undo a change * Refactor processing code after Matt's suggestion * Remove TODO's that aren't needed anymore * For pytorch, Use original pytorch processing code from main Since this PR is a TF port it shouldn't make any modifications to pytorch IDEFICS code. This changes undo's the pytorch processing modifications I made and uses original code from main. * Update tests/models/idefics/test_modeling_idefics.py * Update tests/models/idefics/test_modeling_tf_idefics.py * Add missing imports for is_pt_tf_cross_test * [DO NOT MERGE]: This is a commit for debugging and will be reverted The cross test `test_pt_tf_model_equivalence` passes locally but fails when running on the CI. This commit is to help debug that and will be reverted. * Revert "[DO NOT MERGE]: This is a commit for debugging and will be reverted" This reverts commit 8f0d709ec5bd46685fb0b4259d914ffee794875b. * [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted * [DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted * Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted" This reverts commit 998cc38b8c3d313bf5e5eb55a7f5b7b881897b89. * Revert "[DO NOT MERGE]: This commit is for debugging a CI failure and will be reverted" This reverts commit 1c695ac4219c4ae4d39b330b01744dc27deb7dd4. * Don't skip test_save_load IIRC test_save_load was also failing on the CI but not on my local box, it might be easier to debug that on the CI first than the cross tests * Debugging commit, will be reverted * Revert "Debugging commit, will be reverted" This reverts commit 8eafc8e41e20c4e95a3a90834f06a6e9f445e2d5. * Override `test_save_load` and push model to save Maybe this will help me repro this weird bug * pass my repo_id * add endpoint * Pass a temp (write) token just for this CI * Undo last few commits, still pushing to hub for model debugging The issue seems to be with save_pretrained(), when I looked at the model saved from the CI test failure it is basically empty and has no weights. `self.save_weights(..)` seems to be failing in save_pretrained but needs more debugging * Add logging to modeling tf utils, will be reverted just for debugging * Debugging, will revert * Revert "Debugging, will revert" This reverts commit 9d0d3075fb7c82d8cde3a5c76bc8f3876c5c55d3. * Revert "Add logging to modeling tf utils, will be reverted just for debugging" This reverts commit 774b6b7b1c17b3ce5d7634ade768f2f686cee617. * Remove `test_save_load` The CI failures are gone after my latest rebase, no idea why but I was still saving the model to my hub on HF and the tf_model.h5 file now has everything. * Run make fix-copies * Run ruff format tests src utils * Debugging commit, will be reverted * Run ruff, also trigger CI run * Run ruff again * Undo debugging commit --------- Co-authored-by: Matt <rocketknight1@gmail.com> Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>	2024-05-13 15:59:46 +01:00
Joao Gante	de2f722172	Generate: remove near-duplicate sample/greedy copy (#30773 )	2024-05-13 15:48:20 +01:00
NielsRogge	ce87dca1d7	[Object detection pipeline] Lower threshold (#30710 ) * Lower threshold * Address comment	2024-05-13 16:47:58 +02:00
Fanli Lin	69d9bca55a	enable Pipeline to get device from model (#30534 ) * check model.device * fix * style fix * move model device * remove print * add comment * fix * add unit test * optimize * change test names and add more cases * Update tests/pipelines/test_pipelines_common.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-13 15:00:39 +01:00
Joao Gante	f4dc26d466	Qwen: incorrect setup flag (#30776 ) qwen does not support the new cache classes	2024-05-13 14:12:58 +01:00
Younes Belkada	f823fec53e	Generation / FIX: Fix multi-device generation (#30746 ) * attempt to fix multi-device generation * fix * final fix * final fix * fix * fix * fix * fix * add joao suggestion * fix	2024-05-13 14:35:45 +02:00
Poedator	a0779b9e19	Llama: fix custom 4D masks, v2 (#30348 ) * 4d mask fixes * Update custom 4D mask logic * test moved to mixin * extra tests 4d mask * upd 4d mask and StaticCache handling * added Mask4DTestHard to mistral tests * post-rebase fixes * test fixes for StaticCache * make fix-copies * upd 1 after #30476 * fix common tests * rm elif attention_mask.dim() == 4: * tests combined, fixed, mixtral supported * bigbird style chg reverted * rm if attention_mask.dim() == 2 * modeling_llama formatting chg --------- Co-authored-by: Joao Gante <joao@huggingface.co>	2024-05-13 13:46:06 +02:00
Eduardo Pacheco	453893ed15	[GroundingDino] Adding ms_deform_attn kernels (#30768 ) * Adding ms_deform_attn kernels to GroundingDino * Pointing to deformable detr kernels	2024-05-13 12:34:45 +01:00
Nilabhra Roy Chowdhury	e52741f601	Support for Falcon2-11B (#30771 ) * remove unrelated changes * remove unrelated changes on phi and stable LM * add: Test for Falcon 10B * fix: formatting * fix: loading the falcon 10B in 8 bit precision using bitsanbytes. * fix: device placement * fix: broken tests. * fix: backwards compatibility for falcon 1B architecture. * chore: updated test. * chore: test_modeling_falcon.py to use the 11B model. * chore: minor edit * chore: formating. --------- Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com> Co-authored-by: ArthurZucker <arthur.zucker@gmail.com>	2024-05-13 13:32:43 +02:00
Zafir Stojanovski	f63d822242	Blip dynamic input resolution (#30722 ) * blip with interpolated pos encoding * feat: Add interpolate_pos_encoding option to other models from `BLIP` family. * include check for textual generated content in tests	2024-05-13 12:20:16 +01:00
Younes Belkada	a4e530e3c8	Workflow: Replace `actions/post-slack` with centrally defined workflow (#30737 ) * Remove commit details * remove old workflow	2024-05-13 12:08:48 +02:00
Marc Sun	de6e0db184	[awq] replace scale when we have GELU (#30074 ) * fix awq test * style * add log * new fix * style * only modifying impacted model in the end * rename function	2024-05-13 11:41:03 +02:00
mobicham	e0c3cee170	hqq - fix weight check in check_quantized_param (#30748 ) * hqq - fix weight check in check_quantized_param * ruff format	2024-05-10 19:29:35 +02:00
Aaron Jimenez	8ce4fefc52	[docs] Update link in es/pipeline_webserver.md (#30745 ) * update link * run make style	2024-05-10 09:29:26 -07:00
Younes Belkada	2d1602aef7	PEFT / Trainer: Make use of `model.active_adapters()` instead of deprecated `model.active_adapter` whenever possible (#30738 ) * Update trainer.py * Update src/transformers/trainer.py * Update src/transformers/trainer.py * Update src/transformers/trainer.py * style * Update src/transformers/trainer.py * Update src/transformers/trainer.py	2024-05-10 15:16:44 +02:00
eigen2017	1c52cb7b3b	mlp_only_layers is more flexible than decoder_sparse_step (#30552 ) * force back to commit ba40a21 and fix workflow errors * match the review suggestions * fix ci errors * fix CI * fix ci, format code * fix ci, ruff format * fix ci, ruff format again * Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * Update src/transformers/models/qwen2_moe/configuration_qwen2_moe.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * solve this warning: Default Argument Value is mutable --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-05-10 14:00:46 +02:00
Sungkyun Chang	73fcfb2861	Update llama3.md, fix typo (#30739 ) Update llama3.md fix typo again	2024-05-10 12:40:57 +01:00
Aaron Jimenez	47735f5f0f	[docs] Update es/pipeline_tutorial.md (#30684 ) * copy en/ contect to es/ * translate first section * translate the doc * fix typos * run make style	2024-05-09 16:42:01 -07:00
Omar Sanseviero	c99d88e520	Update CodeLlama references (#30218 ) * Update CodeLlama references * Update slow_documentation_tests.txt * Update slow_documentation_tests.txt	2024-05-09 22:57:52 +02:00
Joao Gante	7130a22db9	Generate: consistently handle special tokens as tensors (#30624 ) * tmp commit * [test_all] mvp * missing not * [test_all] final test fixes * fix musicgen_melody and rag * [test_all] empty commit * PR comments * Update src/transformers/generation/utils.py Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>	2024-05-09 18:01:57 +01:00
Raushan Turganbay	5413b8986d	KV cache is no longer a model attribute (#30730 ) kv_cache is no longer a model attribute	2024-05-09 17:59:29 +01:00
Jacky Lee	218f44135f	Fix image post-processing for OWLv2 (#30686 ) * feat: add note about owlv2 * fix: post processing coordinates * remove: workaround document * fix: extra quotes * update: owlv2 docstrings * fix: copies check * feat: add unit test for resize * Update tests/models/owlv2/test_image_processor_owlv2.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-09 17:02:03 +01:00
Joao Gante	df53c6e5d9	Generate: add `min_p` sampling (#30639 ) * min_p * more relaxed test to avoid numerical issues * Update src/transformers/generation/logits_process.py Co-authored-by: menhguin <minh1228@gmail.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: menhguin <minh1228@gmail.com> * docstring clarifications * PR comments * Update tests/generation/test_logits_process.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * make fixup --------- Co-authored-by: menhguin <minh1228@gmail.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-09 14:36:53 +01:00
Lysandre Debut	297b732bdf	Removal of deprecated maps (#30576 ) * [test_all] Remove all imports Remove remaining ARCHIVE MAPS Remove remaining PRETRAINED maps * review comments * [test_all] empty commit to trigger tests	2024-05-09 14:15:56 +02:00
Jacky Lee	8c5b3c19cf	Enable dynamic resolution for vivit (#30630 ) * feat: enable dynamic resolution for vivit * fix: formatting * remove: print statement for testing * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vivit/test_modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vivit/test_modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update tests/models/vivit/test_modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * Update src/transformers/models/vivit/modeling_vivit.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> * fix: style check --------- Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>	2024-05-09 11:23:39 +01:00

... 2 3 4 5 6 ...

16035 Commits All Branches Search

16035 Commits

All Branches