Commit Graph

401 Commits

Author SHA1 Message Date
NielsRogge 6e68597877
Add CANINE (#12024)
* First pass

* More progress

* Add support for local attention

* More improvements

* More improvements

* Conversion script working

* Add CanineTokenizer

* Make style & quality

* First draft of integration test

* Remove decoder test

* Improve tests

* Add documentation

* Mostly docs improvements

* Add CanineTokenizer tests

* Fix most tests on GPU, improve upsampling projection

* Address most comments by @dhgarrette

* Remove decoder logic

* Improve Canine tests, improve docs of CanineConfig

* All tokenizer tests passing

* Make fix-copies and fix tokenizer tests

* Fix test_model_outputs_equivalence test

* Apply suggestions from @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address some more comments

* Add support for hidden_states and attentions of shallow encoders

* Define custom CanineModelOutputWithPooling, tests pass

* First pass

* More progress

* Add support for local attention

* More improvements

* More improvements

* Conversion script working

* Add CanineTokenizer

* Make style & quality

* First draft of integration test

* Remove decoder test

* Improve tests

* Add documentation

* Mostly docs improvements

* Add CanineTokenizer tests

* Fix most tests on GPU, improve upsampling projection

* Address most comments by @dhgarrette

* Remove decoder logic

* Improve Canine tests, improve docs of CanineConfig

* All tokenizer tests passing

* Make fix-copies and fix tokenizer tests

* Fix test_model_outputs_equivalence test

* Apply suggestions from @sgugger's review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address some more comments

* Make conversion script work for Canine-c too

* Fix tokenizer tests

* Remove file

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-30 08:05:44 -04:00
Patrick von Platen ccca510276
Hubert (#11889)
* fix_torch_device_generate_test

* remove @

* add hubert

* add first test file

* more docs

* fix bugs

* fix bug

* finish

* finish

* finish docstring

* fix

* fix

* finalize

* add to ignored

* finish

* Apply suggestions from code review

* correct naming

* finish

* fix auto config

* finish

* correct convert script

* Apply suggestions from code review

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>

* apply suggestions lysandre & suraj

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
2021-06-16 12:14:12 +01:00
Sylvain Gugger 60b1d6b45b
Add course banner (#12157)
* Add course banner

* Update course banner
2021-06-15 09:25:49 -04:00
NielsRogge d3eacbb829
Add DETR (#11653)
* Squash all commits of modeling_detr_v7 branch into one

* Improve docs

* Fix tests

* Style

* Improve docs some more and fix most tests

* Fix slow tests of ViT, DeiT and DETR

* Improve replacement of batch norm

* Restructure timm backbone forward

* Make DetrForSegmentation support any timm backbone

* Fix name of output

* Address most comments by @LysandreJik

* Give better names for variables

* Conditional imports + timm in setup.py

* Address additional comments by @sgugger

* Make style, add require_timm and require_vision to testsé

* Remove train_backbone attribute of DetrConfig, add methods to freeze/unfreeze backbone

* Add png files to fixtures

* Fix type hint

* Add timm to workflows

* Add `BatchNorm2d` to the weight initialization

* Fix retain_grad test

* Replace model checkpoints by Facebook namespace

* Fix name of checkpoint in test

* Add user-friendly message when scipy is not available

* Address most comments by @patrickvonplaten

* Remove return_intermediate_layers attribute of DetrConfig and simplify Joiner

* Better initialization

* Scipy is necessary to get sklearn metrics

* Rename TimmBackbone to DetrTimmConvEncoder and rename DetrJoiner to DetrConvModel

* Make style

* Improve docs and add 2 community notebooks

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-06-09 11:51:13 -04:00
Gunjan Chhablani 88ca6a231d
VisualBERT (#10534)
* Init VisualBERT

* Add cookie-cutter, Config, and Embeddings

* Add preliminary Model

* Add Bert analogous classes

* Add basic code for NLVR, VQA, Flickr

* Update Init

* Fix VisualBert Downstream Models

* Rename classifier to cls

* Comment position_ids buffer

* Remove sentence image predictor output

* Update output dicts

* Remove unnecessary files

* Fix Auto Modeling

* Fix transformers init

* Add conversion script

* Add conversion script

* Fix docs

* Update visualbert modelling

* Update configuration

* Style fixes

* Add model and integration tests

* Add all tests

* Update model mapping

* Add simple detector from original repository

* Update docs and configs

* Fix style

* Fix style

* Update docs

* Fix style

* Fix import issues in style

* Fix style

* Add changes from review

* Fix style

* Fix style

* Update docs

* Fix style

* Fix style

* Update docs/source/model_doc/visual_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes from review

* Remove convert run script

* Add changes from review

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/visual_bert/modeling_visual_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add changes from review

* Add changes from review

* Add visual embedding example in docs

* Fix "copied from" comments

* Add changes from review

* Fix error, style, checkpoints

* Update docs

* Fix integration tests

* Fix style

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-02 18:13:08 +05:30
Patrick von Platen 47a98fc4cb
ByT5 model (#11971)
* allow tf to use uneven num of layers

* add tokenizer

* finish docs

* finish docs

* Apply suggestions from code review

* include in index

* finish

* Update docs/source/model_doc/byt5.rst

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* apply sylvais suggestions

* make style

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
2021-06-01 19:07:37 +01:00
yujun 206f06f2dd
Add new model RoFormer (use rotary position embedding ) (#11684)
* add roformer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Suraj Patil <surajp815@gmail.com>

* update

* add TFRoFormerSinusoidalPositionalEmbedding and fix TFMarianSinusoidalPositionalEmbedding

* update docs

* make style and make quality

* roback

* unchanged

* rm copies from , this is a error in TFMarianSinusoidalPositionalEmbedding

* update Copyright year

* move # Add modeling imports here to the correct position

* max_position_embeddings can be set to 1536

* # Copied from transformers.models.bert.modeling_bert.BertOutput with Bert->RoFormer

* # Copied from transformers.models.bert.modeling_bert.BertLayer.__init__ with Bert->RoFormer

* update tokenization_roformer

* make style

* add staticmethod apply_rotary_position_embeddings

* add TF staticmethod apply_rotary_position_embeddings

* update torch apply_rotary_position_embeddings

* fix tf apply_rotary_position_embeddings error

* make style

* add pytorch RoFormerSelfAttentionRotaryPositionEmbeddingTest

* add TF rotary_position_embeddings test

* update test_modeling_rofomer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/convert_roformer_original_tf_checkpoint_to_pytorch.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/roformer/modeling_tf_roformer.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* refact roformer tokenizer

* add RoFormerTokenizerFast

* add RoFormerTokenizationTest

* add require_jieba

* update Copyright

* update tokenizer & add copy from

* add option rotary_value

* use rust jieba

* use rjieba

* use rust jieba

* fix test_alignement_methods

* slice normalized_string is too slow

* add config.embedding_size when embedding_size!=hidden_size

* fix pickle tokenizer

* Update docs/source/model_doc/roformer.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style and make quality

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-05-20 08:00:34 -04:00
Albert Villanova del Moral 2582e59a57
Add DOI badge to README (#11771) 2021-05-19 09:48:56 -04:00
Patrick von Platen cebb96f53a
Add more subsections to main doc (#11758)
* add headers to main doc

* Apply suggestions from code review

* update

* upload
2021-05-18 14:38:56 +01:00
Julien Chaumond 0fc56df5fb
Add visual + link to Premium Support webpage (#11740)
* Update README.md

* Update index.rst
2021-05-17 05:28:56 -04:00
Suraj Patil f063c56d94
Fix clip docs (#11694)
* fix doc url

* fix example
2021-05-12 15:28:30 +05:30
Suraj Patil 8719afa1ad
CLIP (#11445)
* begin second draft

* fix import, style

* add loss

* fix embeds, logits_scale, and projection

* fix imports

* add conversion script

* add feature_extractor and processor

* style

* add tests for tokenizer, extractor and processor

* add vision model tests

* add weight init

* add more tests

* fix save_load  test

* model output, dosstrings, causal mask

* config doc

* add clip model tests

* return dict

* bigin integration test

* add integration tests

* fix-copies

* fix init

* Clip => CLIP

* fix module name

* docs

* fix doc

* output_dim => projection_dim

* fix checkpoint names

* remoe fast tokenizer file

* fix conversion script

* fix tests, quality

* put causal mask on device

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix attribute test

* style

* address sylvains comments

* style

* fix docstrings

* add qucik_gelu in activations, docstrings

* clean-up attention test

* fix act fun

* fix config

* fix torchscript tests

* even batch_size

* remove comment

* fix ouput tu_tuple

* fix save load tests

* fix add tokens test

* add fast tokenizer

* update copyright

* new processor API

* fix docs

* docstrings

* docs

* fix doc

* fix doc

* fix tokenizer

* fix import in doc example

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* check types of config

* valhalla => openai

* load image using url

* fix test

* typo

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-12 13:48:15 +05:30
Matt b3429ab678
Grammar and style edits for the frontpage README (#11679)
* Grammar and style edits for the frontpage README

* Going all-in on em-dashes because you only live once

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-11 15:49:34 +01:00
Vasudev Gupta dc3f6758cf
Add BigBirdPegasus (#10991)
* init bigbird pegasus

* add debugging nb ; update config

* init conversion

* update conversion script

* complete conversion script

* init forward()

* complete forward()

* add tokenizer

* add some slow tests

* commit current

* fix copies

* add docs

* add conversion script for bigbird-roberta-summarization

* remove TODO

* small fixups

* correct tokenizer

* add bigbird core for now

* fix config

* fix more

* revert pegasus-tokenizer back

* make style

* everything working for pubmed; yayygit status

* complete tests finally

* remove bigbird pegasus tok

* correct tokenizer

* correct tests

* add tokenizer files

* finish make style

* fix test

* update

* make style

* fix tok utils base file

* make fix-copies

* clean a bit

* small update

* fix some suggestions

* add to readme

* fix a bit, clean tests

* fix more tests

* Update src/transformers/__init__.py

* Update src/transformers/__init__.py

* make fix-copies

* complete attn switching, auto-padding left

* make style

* fix auto-padding test

* make style

* fix batched attention tests

* put tolerance at 1e-1 for stand-alone decoder test

* fix docs

* fix tests

* correct slow tokenizer conversion

* Apply suggestions from code review

Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* complete remaining suggestions

* fix test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-05-07 09:27:43 +02:00
NielsRogge f3cf8ae7b3
Add LUKE (#11223)
* Rebase with master

* Minor bug fix in docs

* Copy files from adding_luke_v2 and improve docs

* change the default value of use_entity_aware_attention to True

* remove word_hidden_states

* fix head models

* fix tests

* fix the conversion script

* add integration tests for the pretrained large model

* improve docstring

* Improve docs, make style

* fix _init_weights for pytorch 1.8

* improve docs

* fix tokenizer to construct entity sequence with [MASK] entity when entities=None

* Make fix-copies

* Make style & quality

* Bug fixes

* Add LukeTokenizer to init

* Address most comments by @patil-suraj and @LysandreJik

* rename _compute_extended_attention_mask to get_extended_attention_mask

* add comments to LukeSelfAttention

* fix the documentation of the tokenizer

* address comments by @patil-suraj, @LysandreJik, and @sgugger

* improve docs

* Make style, quality and fix-copies

* Improve docs

* fix docs

* add "entity_span_classification" task

* update example code for LukeForEntitySpanClassification

* improve docs

* improve docs

* improve the code example in luke.rst

* rename the classification layer in LukeForEntityClassification from typing to classifier

* add bias to the classifier in LukeForEntitySpanClassification

* update docs to use fine-tuned hub models in code examples of the head models

* update the example sentences

* Make style & quality

* Add require_torch to tokenizer tests

* Add require_torch to tokenizer tests

* Address comments by @sgugger and add community notebooks

* Make fix-copies

Co-authored-by: Ikuya Yamada <ikuya@ikuya.net>
2021-05-03 09:07:29 -04:00
Sylvain Gugger 2d27900b5d
Update min versions in README and add Flax (#11472)
* Update min versions in README and add Flax

* Adapt index
2021-04-28 09:10:06 -04:00
NielsRogge 9f1260971f
Add DeiT (PyTorch) (#11056)
* First draft of deit

* More improvements

* Remove DeiTTokenizerFast from init

* Conversion script works

* Add DeiT to ViT conversion script

* Add tests, add head model, add support for deit in vit conversion script

* Update model checkpoint names

* Update image_mean and image_std, set resample to bicubic

* Improve docs

* Docs improvements

* Add DeiTForImageClassificationWithTeacher to init

* Address comments by @sgugger

* Improve feature extractors

* Make fix-copies

* Minor fixes

* Address comments by @patil-suraj

* All models uploaded

* Fix tests

* Remove labels argument from DeiTForImageClassificationWithTeacher

* Fix-copies, style and quality

* Fix tests

* Fix typo

* Multiple docs improvements

* More docs fixes
2021-04-12 18:07:10 -04:00
Kevin Canwen Xu fb41f9f50c
Add a special tokenizer for CPM model (#11068)
* Add a special tokenizer for CPM model

* make style

* fix

* Add docs

* styles

* cpm doc

* fix ci

* fix the overview

* add test

* make style

* typo

* Custom tokenizer flag

* Add REAMDE.md

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-10 02:07:47 +08:00
Julien Demouth 02ec02d6d3
Add nvidia megatron models (#10911)
* Add support for NVIDIA Megatron models

* Add support for NVIDIA Megatron GPT2 and BERT

Add the megatron_gpt2 model. That model reuses the existing GPT2 model. This
commit includes a script to convert a Megatron-GPT2 checkpoint downloaded
from NVIDIA GPU Cloud. See examples/megatron-models/README.md for details.

Add the megatron_bert model. That model is implemented as a modification of
the existing BERT model in Transformers. This commit includes a script to
convert a Megatron-BERT checkpoint downloaded from NVIDIA GPU Cloud. See
examples/megatron-models/README.md for details.

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Remove model.half in tests + add "# Copied ..."

Remove the model.half() instruction which makes tests fail on the CPU.

Add a comment "# Copied ..." before many classes in the model to enable automatic
tracking in CI between the new Megatron classes and the original Bert ones.

* Fix issues

* Fix Flax/TF tests

* Fix copyright

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/configuration_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/megatron_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/megatron_gpt2.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/__init__.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_gpt2/convert_megatron_gpt2_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/convert_megatron_bert_checkpoint.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/megatron_bert/modeling_megatron_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Resolve most of 'sgugger' comments

* Fix conversion issue + Run make fix-copies/quality/docs

* Apply suggestions from code review

* Causal LM & merge

* Fix init

* Add CausalLM to last auto class

Co-authored-by: Julien Demouth <jdemouth@nvidia.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
2021-04-08 14:09:11 -04:00
NielsRogge 30677dc743
Add Vision Transformer and ViTFeatureExtractor (#10950)
* Squash all commits into one

* Update ViTFeatureExtractor to use image_utils instead of torchvision

* Remove torchvision and add Pillow

* Small docs improvement

* Address most comments by @sgugger

* Fix tests

* Clean up conversion script

* Pooler first draft

* Fix quality

* Improve conversion script

* Make style and quality

* Make fix-copies

* Minor docs improvements

* Should use fix-copies instead of manual handling

* Revert "Should use fix-copies instead of manual handling"

This reverts commit fd4e591bce.

* Place ViT in alphabetical order

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-01 11:16:05 -04:00
Suraj Patil 860264379f
GPT Neo (#10848)
* lets begin

* boom boom

* fix out proj in attn

* fix attention

* fix local attention

* add tokenizer

* fix imports

* autotokenizer

* fix checkpoint name

* cleanup

* more clean-up

* more cleanup

* output attentions

* fix attn mask creation

* fix imports

* config doc

* add tests

* add slow tests

* quality

* add conversion script

* copyright

* typo

* another bites the dust

* fix attention tests

* doc

* add embed init in convert function

* fix copies

* remove tokenizer

* enable caching

* address review comments

* improve config and create attn layer list internally

* more consistent naming

* init hf config from mesh-tf config json file

* remove neo tokenizer from doc

* handle attention_mask in local attn layer

* attn_layers => attention_layers

* add tokenizer_class in config

* fix docstring

* raise if len of attention_layers is not same as num_layers

* remove tokenizer_class from config

* more consistent naming

* fix doc

* fix checkpoint names

* fp16 compat

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-30 09:42:30 -04:00
Vasudev Gupta 6dfd027279
BigBird (#10183)
* init bigbird

* model.__init__ working, conversion script ready, config updated

* add conversion script

* BigBirdEmbeddings working :)

* slightly update conversion script

* BigBirdAttention working :) ; some bug in layer.output.dense

* add debugger-notebook

* forward() working for BigBirdModel :) ; replaced gelu with gelu_fast

* tf code adapted to torch till rand_attn in bigbird_block_sparse_attention ; till now everything working :)

* BigBirdModel working in block-sparse attention mode :)

* add BigBirdForPreTraining

* small fix

* add tokenizer for BigBirdModel

* fix config & hence modeling

* fix base prefix

* init testing

* init tokenizer test

* pos_embed must be absolute, attn_type=original_full when add_cross_attn=True , nsp loss is optional in BigBirdForPreTraining, add assert statements

* remove position_embedding_type arg

* complete normal tests

* add comments to block sparse attention

* add attn_probs for sliding & global tokens

* create fn for block sparse attn mask creation

* add special tests

* restore pos embed arg

* minor fix

* attn probs update

* make big bird fully gpu friendly

* fix tests

* remove pruning

* correct tokenzier & minor fixes

* update conversion script , remove norm_type

* tokenizer-inference test add

* remove extra comments

* add docs

* save intermediate

* finish trivia_qa conversion

* small update to forward

* correct qa and layer

* better error message

* BigBird QA ready

* fix rebased

* add triva-qa debugger notebook

* qa setup

* fixed till embeddings

* some issue in q/k/v_layer

* fix bug in conversion-script

* fixed till self-attn

* qa fixed except layer norm

* add qa end2end test

* fix gradient ckpting ; other qa test

* speed-up big bird a bit

* hub_id=google

* clean up

* make quality

* speed up einsum with bmm

* finish perf improvements for big bird

* remove wav2vec2 tok

* fix tokenizer

* include docs

* correct docs

* add helper to auto pad block size

* make style

* remove fast tokenizer for now

* fix some

* add pad test

* finish

* fix some bugs

* fix another bug

* fix buffer tokens

* fix comment and merge from master

* add comments

* make style

* commit some suggestions

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Fix typos

* fix some more suggestions

* add another patch

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* fix copies

* another path

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* update

* update nit suggestions

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-03-30 08:51:34 +03:00
Lysandre c988db5af2 Release v4.4.0 2021-03-16 11:33:35 -04:00
Patrick von Platen 602d63f05c
[XLSR-Wav2Vec2] Add multi-lingual Wav2Vec2 models (#10648)
* add conversion script

* add wav2vec2 xslr models

* finish

* Update docs/source/model_doc/xlsr_wav2vec2.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-11 17:44:18 +03:00
Suraj Patil d26b37e744
Speech2TextTransformer (#10175)
* s2t

* fix config

* conversion script

* fix import

* add tokenizer

* fix tok init

* fix tokenizer

* first version working

* fix embeds

* fix lm head

* remove extra heads

* fix convert script

* handle encoder attn mask

* style

* better enc attn mask

* override _prepare_attention_mask_for_generation

* handle attn_maks in encoder and decoder

* input_ids => input_features

* enable use_cache

* remove old code

* expand embeddings if needed

* remove logits bias

* masked_lm_loss => loss

* hack tokenizer to support feature processing

* fix model_input_names

* style

* fix error message

* doc

* remove inputs_embeds

* remove input_embeds

* remove unnecessary docstring

* quality

* SpeechToText => Speech2Text

* style

* remove shared_embeds

* subsample => conv

* remove Speech2TextTransformerDecoderWrapper

* update output_lengths formula

* fix table

* remove max_position_embeddings

* update conversion scripts

* add possibility to do upper case for now

* add FeatureExtractor and Processor

* add tests for extractor

* require_torch_audio => require_torchaudio

* add processor test

* update import

* remove classification head

* attention mask is now 1D

* update docstrings

* attention mask should be of type long

* handle attention mask from generate

* alwyas return attention_mask

* fix test

* style

* doc

* Speech2TextTransformer => Speech2Text

* Speech2TextTransformerConfig => Speech2TextConfig

* remove dummy_inputs

* nit

* style

* multilinguial tok

* fix tokenizer

* add tgt_lang setter

* save lang_codes

* fix tokenizer

* add forced_bos_token_id to tokenizer

* apply review suggestions

* add torchaudio to extra deps

* add speech deps to CI

* fix dep

* add libsndfile to ci

* libsndfile1

* add speech to extras all

* libsndfile1 -> libsndfile1

* libsndfile

* libsndfile1-dev

* apt update

* add sudo to install

* update deps table

* install libsndfile1-dev on CI

* tuple to list

* init conv layer

* add model tests

* quality

* add integration tests

* skip_special_tokens

* add speech_to_text_transformer in toctree

* fix tokenizer

* fix fp16 tests

* add tokenizer tests

* fix copyright

* input_values => input_features

* doc

* add model in readme

* doc

* change checkpoint names

* fix copyright

* fix code example

* add max_model_input_sizes in tokenizer

* fix integration tests

* add do_lower_case to tokenizer

* remove clamp trick

* fix "Add modeling imports here"

* fix copyrights

* fix tests

* SpeechToTextTransformer => SpeechToText

* fix naming

* fix table formatting

* fix typo

* style

* fix typos

* remove speech dep from extras[testing]

* fix copies

* rename doc file,

* put imports under is_torch_available

* run feat extract tests when torch is available

* dummy objects for processor and extractor

* fix imports in tests

* fix import in modeling test

* fxi imports

* fix torch import

* fix imports again

* fix positional embeddings

* fix typo in import

* adapt new extractor refactor

* style

* fix torchscript test

* doc

* doc

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* fix docs, copied from, style

* fix docstring

* handle imports

* remove speech from all extra deps

* remove s2t from seq2seq lm mapping

* better names

* skip training tests

* add install instructions

* List => Tuple

* doc

* fix conversion script

* fix urls

* add instruction for libsndfile

* fix fp16 test

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-03-10 21:42:04 +05:30
Suraj Patil f6e74a63ca
Add m2m100 (#10236)
* m2m_100

* no layernorm_embedding

* sinusoidal positional embeddings

* update pos embeddings

* add default config values

* tokenizer

* add conversion script

* fix config

* fix pos embed

* remove _float_tensor

* update tokenizer

* update lang codes

* handle lang codes

* fix pos embeds

* fix spm key

* put embedding weights on device

* remove qa and seq classification heads

* fix convert script

* lang codes pn one line

* fix embeds

* fix tokenizer

* fix tokenizer

* add fast tokenizer

* style

* M2M100MT => M2M100

* fix copyright, style

* tokenizer converter

* vocab file

* remove fast tokenizer

* fix embeds

* fix tokenizer

* fix tests

* add tokenizer tests

* add integration test

* quality

* fix model name

* fix test

* doc

* doc

* fix doc

* add copied from statements

* fix tokenizer tests

* apply review suggestions

* fix urls

* fix shift_tokens_right

* apply review suggestions

* fix

* fix doc

* add lang code to id

* remove unused function

* update checkpoint names

* fix copy

* fix tokenizer

* fix checkpoint names

* fix merge issue

* style
2021-03-06 22:14:16 +05:30
Lysandre Debut 0c2325198f
Add I-BERT to README (#10462) 2021-03-01 12:12:31 -05:00
Lysandre Debut cd8c4c3fc2
DeBERTa-v2 fixes (#10328)
Co-authored-by: Pengcheng He <penhe@microsoft.com>

Co-authored-by: Pengcheng He <penhe@microsoft.com>
2021-02-22 07:45:18 -05:00
Pengcheng He 9a7e63729f
Integrate DeBERTa v2(the 1.5B model surpassed human performance on Su… (#10018)
* Integrate DeBERTa v2(the 1.5B model surpassed human performance on SuperGLUE); Add DeBERTa v2 900M,1.5B models;

* DeBERTa-v2

* Fix v2 model loading issue (#10129)

* Doc members

* Update src/transformers/models/deberta/modeling_deberta.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Address Sylvain's comments

* Address Patrick's comments

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* Style

Co-authored-by: Lysandre <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-02-19 18:34:44 -05:00
Suraj Patil 6fc940ed09
Add mBART-50 (#10154)
* add tokenizer for mBART-50

* update tokenizers

* make src_lang and tgt_lang optional

* update tokenizer test

* add setter

* update docs

* update conversion script

* update docs

* update conversion script

* update tokenizer

* update test

* update docs

* doc

* address Sylvain's suggestions

* fix test

* fix formatting

* nits
2021-02-15 20:58:54 +05:30
Sylvain Gugger 6710d1d5ef Typo fix 2021-02-11 15:12:35 -05:00
yylun 5442a11f5f
fix steps_in_epoch variable in trainer when using max_steps (#9969)
* fix steps_in_epoch variable when using max_steps

* redundant sentence

* Revert "redundant sentence"

This reverts commit ad5c0e9b6e.

* remove redundant sentence

Co-authored-by: wujindou <wujindou@sogou-inc.com>
2021-02-03 09:30:37 -05:00
Patrick von Platen d6217fb30c
Wav2Vec2 (#9659)
* add raw scaffold

* implement feat extract layers

* make style

* remove +

* correctly convert weights

* make feat extractor work

* make feature extraction proj work

* run forward pass

* finish forward pass

* Succesful decoding example

* remove unused files

* more changes

* add wav2vec tokenizer

* add new structure

* fix run forward

* add other layer norm architecture

* finish 2nd structure

* add model tests

* finish tests for tok and model

* clean-up

* make style

* finish docstring for model and config

* make style

* correct docstring

* correct tests

* change checkpoints to fairseq

* fix examples

* finish wav2vec2

* make style

* apply sylvains suggestions

* apply lysandres suggestions

* change print to log.info

* re-add assert statement

* add input_values as required input name

* finish wav2vec2 tokenizer

* Update tests/test_tokenization_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* apply sylvains suggestions

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-02-02 15:52:10 +03:00
Stas Bekman 15e4ce353a
[docs] expand install instructions (#9817)
* expand install instructions

* fix

* white space

* rewrite as discussed in the PR

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* change the wording to encourage issue report

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-28 09:36:46 -08:00
Stefan Schweter 5ed5a54684
ADD BORT (#9813)
* tests: add integration tests for new Bort model

* bort: add conversion script from Gluonnlp to Transformers 🚀

* bort: minor cleanup (BORT -> Bort)

* add docs

* make fix-copies

* clean doc a bit

* correct docs

* Update docs/source/model_doc/bort.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/model_doc/bort.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* correct dialogpt doc

* correct link

* Update docs/source/model_doc/bort.rst

* Update docs/source/model_doc/dialogpt.rst

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* make style

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-01-27 21:25:11 +03:00
abhishek thakur f617490e71
ConvBERT Model (#9717)
* finalize convbert

* finalize convbert

* fix

* fix

* fix

* push

* fix

* tf image patches

* fix torch model

* tf tests

* conversion

* everything aligned

* remove print

* tf tests

* fix tf

* make tf tests pass

* everything works

* fix init

* fix

* special treatment for sepconv1d

* style

* 🙏🏽

* add doc and cleanup

* add electra test again

* fix doc

* fix doc again

* fix doc again

* Update src/transformers/modeling_tf_pytorch_utils.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update src/transformers/models/conv_bert/configuration_conv_bert.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Update docs/source/model_doc/conv_bert.rst

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/auto/configuration_auto.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/models/conv_bert/configuration_conv_bert.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* conv_bert -> convbert

* more fixes from review

* add conversion script

* dont use pretrained embed

* unused config

* suggestions from julien

* some more fixes

* p -> param

* fix copyright

* fix doc

* Update src/transformers/models/convbert/configuration_convbert.py

Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>

* comments from reviews

* fix-copies

* fix style

* revert shape_list

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-01-27 03:20:09 -05:00
Lysandre 7d9a9d0c72 Release: v4.2.0 2021-01-13 16:01:51 +01:00
Patrick von Platen 9e1ea846bc
[README] Add new models (#9465)
* add new models

* make fix-copies
2021-01-08 05:49:43 -05:00
Clement 4eec5d0cf6
improve readme text to private models/versioning/api (#9424) 2021-01-05 15:02:46 -05:00
Sylvain Gugger 6d2e864db7
Put all models in the constants (#9170)
* Put all models in the constants

* Add Google AI mention in the main README
2020-12-17 11:23:21 -05:00
NielsRogge 1551e2dc6d
[WIP] Tapas v4 (tres) (#9117)
* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Test PyTorch scatter

* Set to slow + minify

* Calm flake8 down

* First commit: adding all files from tapas_v3

* Fix multiple bugs including soft dependency and new structure of the library

* Improve testing by adding torch_device to inputs and adding dependency on scatter

* Use Python 3 inheritance rather than Python 2

* First draft model cards of base sized models

* Remove model cards as they are already on the hub

* Fix multiple bugs with integration tests

* All model integration tests pass

* Remove print statement

* Add test for convert_logits_to_predictions method of TapasTokenizer

* Incorporate suggestions by Google authors

* Fix remaining tests

* Change position embeddings sizes to 512 instead of 1024

* Comment out positional embedding sizes

* Update PRETRAINED_VOCAB_FILES_MAP and PRETRAINED_POSITIONAL_EMBEDDINGS_SIZES

* Added more model names

* Fix truncation when no max length is specified

* Disable torchscript test

* Make style & make quality

* Quality

* Address CI needs

* Test the Masked LM model

* Fix the masked LM model

* Truncate when overflowing

* More much needed docs improvements

* Fix some URLs

* Some more docs improvements

* Add add_pooling_layer argument to TapasModel

Fix comments by @sgugger and @patrickvonplaten

* Fix issue in docs + fix style and quality

* Clean up conversion script and add task parameter to TapasConfig

* Revert the task parameter of TapasConfig

Some minor fixes

* Improve conversion script and add test for absolute position embeddings

* Improve conversion script and add test for absolute position embeddings

* Fix bug with reset_position_index_per_cell arg of the conversion cli

* Add notebooks to the examples directory and fix style and quality

* Apply suggestions from code review

* Move from `nielsr/` to `google/` namespace

* Apply Sylvain's comments

Co-authored-by: sgugger <sylvain.gugger@gmail.com>

Co-authored-by: Rogge Niels <niels.rogge@howest.be>
Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: sgugger <sylvain.gugger@gmail.com>
2020-12-15 17:08:49 -05:00
StillKeepTry df2af6d8b8 Add MP Net 2 (#9004) 2020-12-09 10:32:43 -05:00
Sylvain Gugger 00aa9dbca2
Copyright (#8970)
* Add copyright everywhere missing

* Style
2020-12-07 18:36:34 -05:00
Clement de6befd41f
Remove sourcerer (#8965) 2020-12-07 11:15:29 -05:00
Lysandre Debut 0c5615af66
Put Transformers on Conda (#8918)
* conda

* Guide

* correct tag

* Update README.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update docs/source/installation.md

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Sylvain's comments

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-12-03 14:28:49 -05:00
Julien Chaumond 9ad6194318
Tweak wording + Add badge w/ number of models on the hub (#8914)
* Add badge w/ number of models on the hub

* try to apease @sgugger 😇

* not sure what this `c` was about [ci skip]

* Fix script and move stuff around

* Fix doc styling error

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
2020-12-03 10:56:55 -05:00
Devangi Purkayastha e52f9c0ade
Update README.md (#8906) 2020-12-02 09:28:44 -08:00
Sylvain Gugger 75f8100fc7
Add a direct link to the big table (#8850) 2020-11-30 10:29:23 -05:00
Moussa Kamal Eddine 81fe0bf085
Add barthez model (#8393)
* Add init barthez

* Add barthez model, tokenizer and docs

BARThez is a pre-trained french seq2seq model that uses BART objective.

* Apply suggestions from code review docs typos

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Add license

* Change URLs scheme

* Remove barthez model keep tokenizer

* Fix style

* Fix quality

* Update tokenizer

* Add fast tokenizer

* Add fast tokenizer test

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-11-27 12:31:42 -05:00
Kevin Canwen Xu 94caaa93c2
Update the bibtex with EMNLP demo (#8678)
* Update the bibtex with EMNLP demo

* Update README.md

* Update README.md
2020-11-20 13:26:33 +08:00