Commit Graph

15180 Commits

Author SHA1 Message Date
Sanchit Gandhi 2a9b1f80c4
[Gemma] Fix eager attention (#29187)
* fix modelling code

* add tests

* fix tests

* add some logit tests

* style

* fix fix
2024-02-22 01:07:52 +01:00
Andrei Panferov fc37f38915
Add training version check for AQLM quantizer. (#29142)
* training version check

* warn old aqlm

* aqlm 1.0.2 real

* docs
2024-02-21 17:09:36 +01:00
Younes Belkada ae49b218c3
FIX [`Gemma`] Fix bad rebase with transformers main (#29170)
fix bad rebase
2024-02-21 14:56:34 +01:00
Arthur 594c1277b2
[ `gemma`] Adds support for Gemma 💎 (#29167)
* inital commit

* update

* update conversion checkpoint

* update conversion script

* nits

* some fixes

* nits

* merge

* fix permute

* nits

* fix

* nits

* nits

* nits

* fix rope

* fix both rope

* nites

* style

* make sure flax works

* fix flax init code

* fix foward

* nits

* print flax generation out

* current code

* nits

* SIIIIIIIIIIIIIIIIIII

* update

* add new tokenizer

* correct fast tokenizer

* fix conversion

* more comments

* fix modeling and conversion

* nits and nits

* nits testing

* add some tokenization tests

* add some edge cases

* add slow tests and fix them

* fixup

* fix copies for modeling

* fix copies

* add 7B slow tests

* fix

* fix

* fix tests

* make tokenizer cis go green

* styling

* last tokenizer nits

* update jax tests

* fix flax for 7b

* add jit testing 🤗

* cleanups

* isolated nit, inv_freq for rotary_emb.inv_freq

* propagate to jax

* Apply suggestions from code review

Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>

* adjust test

* fix conversion script

* change name

* correct file names

* update conversion script

* Fix bos and eos token ids in the model configuration (#3)

* update modelling

* update conversion script

* add static cache for gemma

* fix sdpa generate

* fix batched

* multiple fixes

* fix FA2

* final fix

* Rename a few missing strings and filenames (#4)

* merge with upstream main

* fix copies

* fix copies

* fix fixup

* fix fixup

* fix

* fix

* final tests

* fix fx gemma tests

* fix fx bf16/fp16 tests

* update slow fx tests

* fx slow tests: one logits, one generation

* move jit test standalone

* Apply suggestions from code review

* nits

* tokenizer updates

* more tokenization updates: custom GemmaSentencepieceExtrator

* style

* Update src/transformers/cache_utils.py

* Update src/transformers/models/gemma/__init__.py

* Update tests/models/gemma/test_modeling_flax_gemma.py

* small nits

* style

* update tokenization test

* fix the rotary embedding

* with style

* fix slow tests

* WARNING this commit might be very important for precisions

* Update tests/models/gemma/test_modeling_flax_gemma.py

* Update src/transformers/models/gemma/configuration_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* Update src/transformers/models/gemma/modeling_flax_gemma.py

Co-authored-by: Lysandre Debut <hi@lysand.re>

* small nits here and there!

* forgotten nit

* remove on the fly computation of inv_freq

* revert previous change, let's be safe and for now re-compute freq cis to make sure it's in float

* Apply suggestions from code review

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update src/transformers/models/gemma/convert_gemma_weights_to_hf.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_flax_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_tokenization_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* Update tests/models/gemma/test_modeling_gemma.py

Co-authored-by: Pedro Cuenca <pedro@huggingface.co>

* nit conversion script link

* fix some tests

* add not doctest and pr doctest

* repo consistency

* fix last CIs 🚀

* update all readmes

---------

Co-authored-by: younesbelkada <younesbelkada@gmail.com>
Co-authored-by: Sanchit Gandhi <93869735+sanchit-gandhi@users.noreply.github.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Co-authored-by: sanchit-gandhi <sanchit@huggingface.co>
Co-authored-by: Lysandre Debut <hi@lysand.re>
2024-02-21 14:21:28 +01:00
amyeroberts 58245ba6fb
[`Maskformer`] safely get backbone config (#29166)
Safe getattr
2024-02-21 13:51:15 +01:00
Ekaterina Aidova 1d0ea7abe0
support SDPA Attention in stablelm (#29106)
* support SDPA Attention in stablelm

* add integration test

* add fallback for output_attentions

* Update src/transformers/models/stablelm/modeling_stablelm.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/models/stablelm/test_modeling_stablelm.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* Update src/transformers/models/stablelm/modeling_stablelm.py

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

* handle non-contiguous states

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-21 13:12:49 +01:00
fxmarty cc4a664baa
`torch.compile` compatibility with `generate` + static cache (#29114)
* fix compatibility

* working version

* cleanup

* sanity checks

* more sanity

* working version WITH refactor

* working without API change

* cleanup & tests pass

* more cleaning

* fix test

* fix tests

* Update src/transformers/generation/utils.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* smaller comment

* update comment

* update comment

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
2024-02-21 12:19:30 +01:00
Joao Gante 3994fa5baf
🚨 Llama: update rope scaling to match static cache changes (#29143) 2024-02-21 09:47:41 +00:00
Arthur Zucker 1a77f07f65 v4.39.dev.0 2024-02-21 15:23:22 +09:00
amyeroberts e770f0316d
[`pipeline`] Add pool option to image feature extraction pipeline (#28985)
* Add pool option

* PR comments - error message and exact outputs check
2024-02-20 20:22:08 +00:00
Fernando Pérez-García c47576ca6e
Fix drop path being ignored in DINOv2 (#29147)
Fix drop path not being used
2024-02-20 17:31:59 +00:00
Gustavo Isturiz 3c00b885b9
Added image_captioning version in es and included in toctree file (#29104)
added image_captioning version in es and included in toctree file
2024-02-20 09:13:15 -08:00
Joao Gante 857fd8eaab
Generate: missing generation config eos token setting in encoder-decoder tests (#29146) 2024-02-20 16:17:51 +00:00
Pablo Montalvo 1c81132e80
Raise unused kwargs image processor (#29063)
* draft processor arg capture

* add missing vivit model

* add new common test for image preprocess signature

* fix quality

* fix up

* add back missing validations

* quality

* move info level to warning for unused kwargs
2024-02-20 16:20:20 +01:00
JB (Don) b8b16475d4
[Phi] Add support for sdpa (#29108) 2024-02-20 14:33:12 +01:00
Yih-Dar 7688d8df84
Save (circleci) cache at the end of a job (#29141)
nice job

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-02-20 21:31:36 +08:00
Taylor Jackle Spriggs ee3af60be0
Add support for fine-tuning CLIP-like models using contrastive-image-text example (#29070)
* add support for siglip and chinese-clip model training with contrastive-image-text example

* codebase fixups
2024-02-20 12:08:31 +00:00
amyeroberts 0996a10077
Revert low cpu mem tie weights (#29135)
* Revert "Add tie_weights() to LM heads and set bias in set_output_embeddings() (#28948)"

This reverts commit 725f4ad1cc.

* Revert "Patch to skip failing `test_save_load_low_cpu_mem_usage` tests (#29043)"

This reverts commit 4156f517ce.
2024-02-20 12:06:46 +00:00
Arthur 15cfe38942
[`Core tokenization`] `add_dummy_prefix_space` option to help with latest issues (#28010)
* add add_dummy_prefix_space option to slow

* checking kwargs might be better. Should be there for all spm tokenizer IMO

* nits

* fix copies

* more copied

* nits

* add prefix space

* nit

* nits

* Update src/transformers/convert_slow_tokenizer.py

* fix inti

* revert wrong styling

* fix

* nits

* style

* updates

* make sure we use slow tokenizer for conversion instead of looking for the decoder

* support llama ast well

* update llama tokenizer fast

* nits

* nits nits nits

* update the doc

* update

* update to fix tests

* skip unrelated tailing test

* Update src/transformers/convert_slow_tokenizer.py

* add proper testing

* test decode as well

* more testing

* format

* fix llama test

* Apply suggestions from code review
2024-02-20 12:50:31 +01:00
Younes Belkada efdd436663
FIX [`PEFT` / `Trainer` ] Handle better peft + quantized compiled models (#29055)
* handle peft + compiled models

* add tests

* fixup

* adapt from suggestions

* clarify comment
2024-02-20 12:45:08 +01:00
Arthur 5e95dcabe1
[`cuda kernels`] only compile them when initializing (#29133)
* only compile when needed

* fix mra as well

* fix yoso as well

* update

* rempve comment

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

* Update src/transformers/models/deformable_detr/modeling_deformable_detr.py

* opps

* Update src/transformers/models/deta/modeling_deta.py

* nit
2024-02-20 12:38:59 +01:00
Joao Gante a7755d2409
Generate: unset GenerationConfig parameters do not raise warning (#29119) 2024-02-20 11:34:31 +00:00
Joao Gante 7d312ad2e9
Llama: fix batched generation (#29109) 2024-02-20 10:23:17 +00:00
Younes Belkada ff76e7c212
FIX [`bnb` / `tests`] Propagate the changes from #29092 to 4-bit tests (#29122)
* forgot to push the changes for 4bit ..

* trigger CI
2024-02-20 11:11:15 +01:00
Pablo Montalvo 1c9134f004
Abstract image processor arg checks. (#28843)
* abstract image processor arg checks.

* fix signatures and quality

* add validate_ method to rescale-prone processors

* add more validations

* quality

* quality

* fix formatting

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix formatting

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix formatting

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix formatting mishap

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix crop_size compatibility

* fix default mutable arg

* fix segmentation map + image arg validity

* remove segmentation check from arg validation

* fix quality

* fix missing segmap

* protect PILImageResampling type

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* add back segmentation maps check

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-20 11:05:46 +01:00
Younes Belkada f7ef7cec6c
FEAT [`Trainer` / `bnb`]: Add RMSProp from `bitsandbytes` to HF `Trainer` (#29082)
* add RMSProp to Trainer

* revert some change

* Update src/transformers/trainer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-20 02:43:02 +01:00
Erich Schubert a7ff2f23a0
Move misplaced line (#29117)
Move misplaced line, improve code comment
2024-02-20 02:24:48 +01:00
Arthur 9094abe8dc
[`gradient_checkpointing`] default to use it for torch 2.3 (#28538)
* default to use it

* style
2024-02-20 02:23:25 +01:00
Nilesh 49c0b293d2
Fixed nll with label_smoothing to just nll (#28708)
* Fixed nll with label_smoothing to nll

* Resolved conflict by rebase

* Fixed nll with label_smoothing to nll

* Resolved conflict by rebase

* Added label_smoothing to config file

* Fixed nits
2024-02-20 01:52:15 +01:00
Shijie Wu 4f09d0fd88
storing & logging gradient norm in trainer (#27326)
* report grad_norm during training

* support getting grad_norm from deepspeed
2024-02-19 19:07:41 +00:00
Sadra Barikbin a4851d9477
Fix two tiny typos in `pipelines/base.py::Pipeline::_sanitize_parameters()`'s docstring (#29102)
* Update base.py

* Fix a typo
2024-02-19 18:50:28 +00:00
Titus 5ce90f3212
Bnb test fix for different hardwares (#29066)
* generated text on A10G

* generated text in CI

* Apply suggestions from code review

add explanatory comments

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>

---------

Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
2024-02-19 18:04:44 +00:00
Max Baak 08cd694ef0
ENH: added new output_logits option to generate function (#28667)
output_logits option behaves like output_scores, but returns the raw, unprocessed prediction logit scores,
ie. the values before they undergo logit processing and/or warping. The latter happens by default for the
regular output scores.

It's useful to have the unprocessed logit scores in certain circumstances. For example, unprocessed logit scores
are very useful with causallm models when one wants to determine the probability of a certain answer, e.g.
when asking a question with a yes/no answer. In that case getting the next-token probabilities of both "yes" and
"no" (and/or their relative ratio) is of interest for classification. The reason for getting these _before_ logit
processing and/or warping is b/c a) that can change the probabilities or b) reject the tokens of interest / reduce
the number of tokens to just 1.

For an example use-case see paper TabLLM: Few-shot Classification of Tabular Data with Large Language Models
by Stefan Hegselmann, Alejandro Buendia, Hunter Lang, Monica Agrawal, Xiaoyi Jiang, and David Sontag.
https://arxiv.org/abs/2210.10723

In addition:
- added dedicated unit test: tests/generation/test_utils/test_return_unprocessed_logit_scores
  which tests return of logics with output_logits=True in generation.
- set output_logits=True in all other generation unit tests, that also have output_scores=True.

Implemented @gante's and @amyeroberts review feedback

Co-authored-by: kx79wq <max.baak@ing.com>
2024-02-19 17:34:17 +00:00
NielsRogge 07e3454f03
[Docs] Add resources (#28705)
* Add resource

* Add more resources

* Add resources

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Remove mention

* Remove pipeline tags

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-19 15:22:29 +01:00
Arthur b2724d7b4c
change version (#29097)
* change version

* nuke

* this doesn't make sense

* update some requirements.py

* revert + no main

* nits

* change cache number

* more pin

* revert

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
2024-02-19 22:08:44 +08:00
Jay Zhou 79132d4cfe
Fix a typo in `examples/pytorch/text-classification/run_classification.py` (#29072) 2024-02-19 13:01:15 +00:00
Lysandre Debut 9830858671
Fix the `bert-base-cased` tokenizer configuration test (#29105)
Fix test
2024-02-19 13:23:25 +01:00
Winton Davies 593230f0a1
fix the post-processing link (#29091)
The link in evaluation was missing a hyphen between post and processing. I fixed this, for English only. Someone with the ability to do a global search/replace should fix the other languages (if indeed they have this issue)/
2024-02-19 10:15:58 +00:00
Younes Belkada a75a6c9315
FIX [`bnb` / `tests`]: Fix currently failing bnb tests (#29092)
Update test_mixed_int8.py
2024-02-19 10:39:12 +01:00
Younes Belkada 864c8e6ea3
[`Awq`] Add peft support for AWQ (#28987)
* add peft support for AWQ

* Update src/transformers/quantizers/quantizer_awq.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-19 01:31:39 +01:00
Aaron Jimenez ce4fff0be7
[Docs] Spanish translation of task_summary.md (#28844)
* Add task_summary to es/_toctree.yml

* Add task_summary.md to docs/es

* Change title of task_summary.md

* Translate firsts paragraphs

* Translate middle paragraphs

* Translte the rest of the doc

* Edit firts paragraph
2024-02-16 15:50:06 -08:00
Matt 2f1003be86
Add chat support to text generation pipeline (#28945)
* Add chat support to text generation pipeline

* Better handling of single elements

* Deprecate ConversationalPipeline

* stash commit

* Add missing add_special_tokens kwarg

* Update chat templating docs to refer to TextGenerationPipeline instead of ConversationalPipeline

* Add TF tests

* @require_tf

* Add type hint

* Add specific deprecation version

* Remove unnecessary do_sample

* Remove todo - the discrepancy has been resolved

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/pipelines/text_generation.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 16:41:01 +00:00
Zach Mueller 636b03244c
Fix trainer test wrt DeepSpeed + auto_find_bs (#29061)
* FIx trainer test

* Update tests/trainer/test_trainer.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 10:04:24 -05:00
Sean (Seok-Won) Yi 161fe425c9
Feature: Option to set the tracking URI for MLflowCallback. (#29032)
* Added option to set tracking URI for MLflowCallback.

* Added option to set tracking URI for MLflowCallback.

* Changed  to  in docstring.
2024-02-16 14:47:18 +00:00
Richard Lee be42c24d14
Honor trust_remote_code for custom tokenizers (#28854)
* pass through trust_remote_code for dynamically loading unregistered tokenizers specified by config
add test

* change directories back to previous directory after test

* fix ruff check

* Add a note to that block for future in case we want to remove it later

---------

Co-authored-by: Matt <rocketknight1@gmail.com>
2024-02-16 13:40:23 +00:00
Sourab Mangrulkar 4c18ddb5cf
`auto_find_batch_size` isn't yet supported with DeepSpeed/FSDP. Raise error accrodingly. (#29058)
Update trainer.py
2024-02-16 18:11:09 +05:30
Sourab Mangrulkar b262808656
fix failing trainer ds tests (#29057) 2024-02-16 17:18:45 +05:30
Jonathan Mamou 258da40efd
fix num_assistant_tokens with heuristic schedule (#28759)
* fix heuristic num_assistant_tokens_schedule

* Update src/transformers/generation/configuration_utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Update utils.py

check that candidate_generator.assistant_model exists since some some speculations (like ngram and PLD) don't have assistant_model attribute

* Update src/transformers/generation/candidate_generator.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* Update tests/generation/test_utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* make fixup

* merge conflict

* fix docstring

* make fixup

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 11:44:58 +00:00
Tanmay patil 0eb408551c
Support : Leverage Accelerate for object detection/segmentation models (#28312)
* made changes for object detection models

* added support for segmentation models.

* Made changes for segmentaion models

* Changed import statements

* solving conflicts

* removed conflicts

* Resolving commits

* Removed conflicts

* Fix : Pixel_mask_value set to False
2024-02-16 11:38:59 +00:00
Raushan Turganbay aee11fe427
Fix max_length criteria when using inputs_embeds (#28994)
* fix max_length for inputs_embeds

* make style

* Update src/transformers/generation/utils.py

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>

* Static Cache: load models with MQA or GQA (#28975)

* fix

* fix tests

* fix tests

* Update src/transformers/generation/utils.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* more fixes

* make style

---------

Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
2024-02-16 11:25:12 +00:00