transformers


Ariel Ekgren 5f94855dc3 Add gpt-sw3 model to transformers (#20209)		2022-12-12 13:12:13 -05:00

* Add templates for gpt-sw3
* Add templates for gpt-sw3
* Added sentencepiece tokenizer
* intermediate commit with many changes
* fixed conflicts
* Init commit for tokenization port
* Tokenization progress
* Remove fast tokenizer
* Clean up and rename spm.model -> spiece.model
* Remove TF -> PT conversion script template, Clean up Megatron -> PT script
* Optimize encode & decode performance
* added new attention
* added new attention
* attention for gpt-sw3 working
* attention good
* Cache is now working
* fixed attention mask so that it works with causal attention
* fixed badbmm bug for cpu and caching
* updated config with correct parameters
* Refactor and leave optimizations as separate functions to avoid breaking expected functionality
* Fix special tokens mapping for both tokenizers
* cleaning up of code and comments
* HF compatible attention outputs
* Tokenizer now passing tests, add documentation
* Update documentation
* reverted back to base implementation after checking that it is identical to pretrained model
* updated gpt-sw3 config
* updated conversion script
* aligned parameters with gpt-sw3 config
* changed default scale_attn_by_inverse_layer_idx to true
* removed flag from conversion script
* added temporary model path
* reverted back to functioning convert script
* small changes to default config
* updated tests for gpt-sw3
* make style, make quality, minor cleanup
* Change local paths to testing online repository
* Change name: GptSw3 -> GPTSw3
* Remove GPTSw3TokenizerFast references
* Use official model repository and add more model sizes
* Added reference to 6.7b model
* Add GPTSw3DoubleHeadsModel to IGNORE_NON_AUTO_CONFIGURED, like GPT2DoubleHeadsModel
* Remove pointers to non-existing TFGPTSw3
* Add GPTSw3 to docs/_toctree.yml
* Remove TF artifacts from GPTSw3 in __init__ files
* Update README:s with 'make fix-copies'
* Add 20b model to archive list
* Add documentation for GPT-Sw3
* Fix typo in documentation for GPT-Sw3
* Do 'make fix-copies' again after having updated docs
* Fix some typos in docs
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/configuration_gpt_sw3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update tests/models/gpt_sw3/test_tokenization_gpt_sw3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/modeling_gpt_sw3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Resolve comments from PR feedback
* Resolve more comments from PR feedback, also set use_cache=True in convert script
* Add '# Copied from' comments for GPTSw3 modeling
* Set 'is_parallelizable = False'
* Remove '# Copied from' where code was modified and add 'with x->y' when appropriate
* Remove parallelize in mdx
* make style, make quality
* Update GPTSw3Config default values and corresponding documentation
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/__init__.py
  Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Clean up and protect GPTSw3Tokenizer imports with is_sentencepiece_available
* Make style, make quality
* Add dummy object for GPTSw3Tokenizer via 'make fix-copies'
* make fix-copies
* Remove GPTSw3 modeling classes
* make style, make quality
* Add GPTSw3 auto-mappings for other GPT2 heads
* Update docs/source/en/model_doc/gpt-sw3.mdx
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/convert_megatron_to_pytorch.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Update src/transformers/models/gpt_sw3/tokenization_gpt_sw3.py
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* Remove old TODO-comment
* Add example usage to GPTSw3Tokenizer docstring
* make style, make quality
* Add implementation details and example usage to gpt-sw3.mdx

Co-authored-by: JoeyOhman <joeyoh@kth.se>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
test_module	AutoImageProcessor (#20111)	2022-11-08 19:54:41 +00:00
tf_ops	Check TF ops for ONNX compliance (#10025)	2021-02-15 07:55:10 -05:00
check_config_docstrings.py	Create dummy models (#19901)	2022-10-28 13:05:41 +02:00
check_copies.py	README in Hindi 🇮🇳 (#20097)	2022-12-06 01:04:40 +05:30
check_doc_toc.py	Split model list on modality (#18328)	2022-08-01 11:10:20 -05:00
check_dummies.py	Add some tests for check_dummies (#19146)	2022-09-21 14:54:09 -04:00
check_inits.py	Add ESMFold (#19977)	2022-10-31 21:32:58 -04:00
check_repo.py	Add gpt-sw3 model to transformers (#20209)	2022-12-12 13:12:13 -05:00
check_self_hosted_runner.py	Add offline runners info in the Slack report (#19169)	2022-09-23 19:23:05 +02:00
check_table.py	Fix some typos. (#17560)	2022-07-11 05:00:13 -04:00
check_tf_ops.py	Check TF ops for ONNX compliance (#10025)	2021-02-15 07:55:10 -05:00
create_dummy_models.py	[Tiny model creation] deal with `ImageProcessor` (#20298)	2022-11-17 20:49:46 +01:00
custom_init_isort.py	Fix init import_structure sorting (#20477)	2022-11-29 09:46:10 -05:00
documentation_tests.txt	[Doctest] Add configuration_fsmt.py (#19936)	2022-11-28 09:47:45 -05:00
download_glue_data.py	Raise exceptions instead of asserts (#13907)	2021-10-07 12:44:23 +05:30
extract_warnings.py	Update some GH action versions (#20537)	2022-12-06 16:54:40 +01:00
get_ci_error_statistics.py	Update Past CI report script (#19228)	2022-09-29 19:22:23 +02:00
get_github_job_time.py	add a script to get time info. from GA workflow jobs (#18822)	2022-09-01 12:02:52 +02:00
get_modified_files.py	Updates the default branch from master to main (#16326)	2022-03-23 03:46:59 -04:00
notification_service.py	extract warnings in GH workflows (#20487)	2022-11-29 15:58:54 +01:00
notification_service_doc_tests.py	fix missing block when there is no failure (#18775)	2022-08-29 09:10:13 +02:00
past_ci_versions.py	Add PyTorch 1.11 to past CI (#18302)	2022-07-26 15:47:23 +02:00
prepare_for_doc_test.py	Add a check regarding the number of occurrences of ``` (#18389)	2022-08-01 14:23:02 +02:00
print_env.py	Print more library versions in CI (#17384)	2022-06-02 10:24:16 +02:00
release.py	Clean README in post release job as well. (#17519)	2022-06-02 07:44:03 -04:00
sort_auto_mappings.py	Automatically sort auto mappings (#17250)	2022-05-16 13:24:20 -04:00
tests_fetcher.py	Add `dpt-hybrid` support (#20645)	2022-12-07 17:01:55 +01:00
update_metadata.py	Add video classification pipeline (#20151)	2022-12-08 16:22:43 -05:00