Commit Graph

22 Commits

Author SHA1 Message Date
Sylvain Gugger 16edf4d9fd
Doc checks (#25408)
* Document check_dummies

* Type hints and doc in other files

* Document check inits

* Add documentation to

* Address review comments
2023-08-10 10:53:22 +02:00
Sylvain Gugger 67d074874d
Cleanup quality (#21493)
* Remove mentions of flake8/isort

* Clean up inits

* Deall with all other inits

* Last special rule for dummy files
2023-02-07 12:27:31 -05:00
Sylvain Gugger 3c7b965bcd
Add some tests for check_dummies (#19146) 2022-09-21 14:54:09 -04:00
Sylvain Gugger 451df725d6
Fix dummy creation for multi-frameworks objects (#19144) 2022-09-21 11:41:45 -04:00
Matt ee0d001de7
Add a TF in-graph tokenizer for BERT (#17701)
* Add a TF in-graph tokenizer for BERT

* Add from_pretrained

* Add proper truncation, option handling to match other tokenizers

* Add proper imports and guards

* Add test, fix all the bugs exposed by said test

* Fix truncation of paired texts in graph mode, more test updates

* Small fixes, add a (very careful) test for savedmodel

* Add tensorflow-text dependency, make fixup

* Update documentation

* Update documentation

* make fixup

* Slight changes to tests

* Add some docstring examples

* Update tests

* Update tests and add proper lowercasing/normalization

* make fixup

* Add docstring for padding!

* Mark slow tests

* make fixup

* Fall back to BertTokenizerFast if BertTokenizer is unavailable

* Fall back to BertTokenizerFast if BertTokenizer is unavailable

* make fixup

* Properly handle tensorflow-text dummies
2022-06-27 12:06:21 +01:00
Sylvain Gugger 032d63b976
Fix dummy creation script (#17304) 2022-05-17 12:56:24 -04:00
Sylvain Gugger 4975002df5
Reorganize file utils (#16264)
* Split file_utils in several submodules

* Fixes

* Add back more objects

* More fixes

* Who exactly decided to import that from there?

* Second suggestion to code with code review

* Revert wront move

* Fix imports

* Adapt all imports

* Adapt all imports everywhere

* Revert this import, will fix in a separate commit
2022-03-23 10:26:33 -04:00
Sylvain Gugger 1b730c3d11
Better dummies (#15148)
* Better dummies

* See if this fixes the issue

* Fix quality

* Style

* Add doc for DummyObject
2022-01-14 10:59:41 -05:00
Sylvain Gugger 1a92bc5788
Fix dummy objects for quantization (#14478)
* Fix dummy objects for quantization

* Add more models
2021-11-21 17:39:20 -05:00
Sylvain Gugger 3e8d17e66d
Add forward method to dummy models (#14419)
* Add forward method to dummy models

* Fix quality
2021-11-16 09:24:40 -05:00
Lysandre Debut d07b540a37
Have dummy processors have a `from_pretrained` method (#12145) 2021-06-15 08:39:05 -04:00
Lysandre Debut 3b1f5caff2
Add from_pretrained to dummy timm objects (#12097)
* Add from_pretrained to dummy timm

* Fix at the source

* Update utils/check_dummies.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>

* Missing pretrained dummies

* Style

Co-authored-by: Sylvain Gugger <sylvain.gugger@gmail.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-06-11 12:27:10 -04:00
Patrick von Platen 32dbb2d954
make style (#11442) 2021-04-26 13:50:34 +02:00
Sylvain Gugger 11505fa139
Dummies multi backend (#11100)
* Replaces requires_xxx by one generic method

* Quality and update check_dummies

* Fix inits check

* Post-merge cleanup
2021-04-07 09:56:40 -04:00
Sylvain Gugger 403d530eec
Auto feature extractor (#11097)
* AutoFeatureExtractor

* Init and first tests

* Tests

* Damn you gitignore

* Quality

* Defensive test for when not all backends are here

* Use pattern for Speech2Text models
2021-04-06 19:20:08 -04:00
Sylvain Gugger b0595d33c1
Add ImageFeatureExtractionMixin (#10905)
* Add ImageFeatureExtractionMixin

* Add dummy vision objects

* Add require_vision

* Add tests

* Fix test
2021-03-26 11:23:56 -04:00
Sylvain Gugger 758ed3332b
Transformers fast import part 2 (#9446)
* Main init work

* Add version

* Change from absolute to relative imports

* Fix imports

* One more typo

* More typos

* Styling

* Make quality script pass

* Add necessary replace in template

* Fix typos

* Spaces are ignored in replace for some reason

* Forgot one models.

* Fixes for import

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>

* Add documentation

* Styling

Co-authored-by: LysandreJik <lysandre.debut@reseau.eseo.fr>
2021-01-07 09:36:14 -05:00
Julien Plu 91a67b7506
Use LF instead of os.linesep (#8491) 2020-11-12 13:52:40 -05:00
Julien Chaumond 70f622fab4
Model versioning (#8324)
* fix typo

* rm use_cdn & references, and implement new hf_bucket_url

* I'm pretty sure we don't need to `read` this file

* same here

* [BIG] file_utils.networking: do not gobble up errors anymore

* Fix CI 😇

* Apply suggestions from code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Tiny doc tweak

* Add doc + pass kwarg everywhere

* Add more tests and explain

cc @sshleifer let me know if better

Co-Authored-By: Sam Shleifer <sshleifer@gmail.com>

* Also implement revision in pipelines

In the case where we're passing a task name or a string model identifier

* Fix CI 😇

* Fix CI

* [hf_api] new methods + command line implem

* make style

* Final endpoints post-migration

* Fix post-migration

* Py3.6 compat

cc @stefan-it

Thank you @stas00

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
2020-11-10 07:11:02 -05:00
Sylvain Gugger 6d4f8bd02a
Add Flax dummy objects (#7918) 2020-10-20 07:45:48 -04:00
Thomas Wolf ba8c4d0ac0
[Dependencies|tokenizers] Make both SentencePiece and Tokenizers optional dependencies (#7659)
* splitting fast and slow tokenizers [WIP]

* [WIP] splitting sentencepiece and tokenizers dependencies

* update dummy objects

* add name_or_path to models and tokenizers

* prefix added to file names

* prefix

* styling + quality

* spliting all the tokenizer files - sorting sentencepiece based ones

* update tokenizer version up to 0.9.0

* remove hard dependency on sentencepiece 🎉

* and removed hard dependency on tokenizers 🎉

* update conversion script

* update missing models

* fixing tests

* move test_tokenization_fast to main tokenization tests - fix bugs

* bump up tokenizers

* fix bert_generation

* update ad fix several tokenizers

* keep sentencepiece in deps for now

* fix funnel and deberta tests

* fix fsmt

* fix marian tests

* fix layoutlm

* fix squeezebert and gpt2

* fix T5 tokenization

* fix xlnet tests

* style

* fix mbart

* bump up tokenizers to 0.9.2

* fix model tests

* fix tf models

* fix seq2seq examples

* fix tests without sentencepiece

* fix slow => fast  conversion without sentencepiece

* update auto and bert generation tests

* fix mbart tests

* fix auto and common test without tokenizers

* fix tests without tokenizers

* clean up tests lighten up when tokenizers + sentencepiece are both off

* style quality and tests fixing

* add sentencepiece to doc/examples reqs

* leave sentencepiece on for now

* style quality split hebert and fix pegasus

* WIP Herbert fast

* add sample_text_no_unicode and fix hebert tokenization

* skip FSMT example test for now

* fix style

* fix fsmt in example tests

* update following Lysandre and Sylvain's comments

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/testing_utils.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update src/transformers/tokenization_utils_base.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2020-10-18 20:51:24 +02:00
Sylvain Gugger 28d183c90c
Allow soft dependencies in the namespace with ImportErrors at use (#7537)
* PoC on RAG

* Format class name/obj name

* Better name in message

* PoC on one TF model

* Add PyTorch and TF dummy objects + script

* Treat scikit-learn

* Bad copy pastes

* Typo
2020-10-05 09:12:04 -04:00