Suraj Patil
ce91bf9a34
[GPTJ] Enable common tests and a few fixes (#14190)
* enable common tests, small fixes
* don't tie word embeds
* don't ignore lm_head
2021-11-01 22:38:52 +05:30
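The "don't tie word embeds" fix above reflects that GPT-J keeps its `lm_head` separate from the input embedding matrix rather than sharing storage. A minimal numpy sketch of the difference (shapes and names here are hypothetical illustrations, not the actual GPTJ code):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 8, 4

wte = rng.standard_normal((vocab, hidden))  # input embedding matrix

# Tied head: the output projection reuses the embedding storage.
tied_head = wte

# Untied head (GPT-J): an independently initialized projection that
# can diverge from the embeddings during training.
lm_head = rng.standard_normal((vocab, hidden))

h = rng.standard_normal(hidden)             # one hidden state
tied_logits = tied_head @ h
untied_logits = lm_head @ h
assert tied_logits.shape == untied_logits.shape == (vocab,)
assert not np.allclose(tied_logits, untied_logits)
```

Because the two matrices are distinct, the checkpoint must save `lm_head` explicitly, which is why `lm_head` can no longer be ignored when loading.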
Thomas Wang
5b45422b58
Remove n_ctx from configs (#14165)
* Remove n_ctx from configs
* Fix GPTJ and OpenAIGPT; both are acceptable breaking changes, since no existing configs are affected
* Remove unnecessary n_positions from TFOpenAIGPT
2021-10-29 11:50:25 +02:00
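After this change, the maximum context length is expressed by a single config field, `n_positions`, instead of being duplicated in `n_ctx`. A toy stand-in config (not the real `GPTJConfig`) sketches the resulting shape:

```python
from dataclasses import dataclass

# Hypothetical stand-in for a GPT-style config after the n_ctx removal:
# n_positions is the single source of truth for the context length.
@dataclass
class TinyGPTConfig:
    vocab_size: int = 50400
    n_positions: int = 2048  # maximum sequence length
    n_embd: int = 4096

cfg = TinyGPTConfig()
assert cfg.n_positions == 2048
assert not hasattr(cfg, "n_ctx")
```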
Suraj Patil
8bbb53e20b
skip gptj slow generate tests for now (#13809)
2021-09-30 15:44:33 -04:00
Anton Lozhkov
7c7d2ec952
[GPT-J] Use the `float16` checkpoints in integration tests (#13676)
* Use fp16 checkpoints
* Style
* Fix outputs and disable OOM tests
* Correct another output
* Use a random smaller model for generation tests
* repo quickfix
* fix gradient checkpointing
2021-09-22 23:17:57 +03:00
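The switch to `float16` checkpoints (and the disabled OOM tests) comes down to memory arithmetic. A rough back-of-envelope, assuming approximately 6.05B parameters for GPT-J-6B:

```python
# Rough memory footprint of GPT-J-6B weights alone (optimizer state and
# activations excluded). The parameter count is an approximation.
params = 6_050_000_000

fp32_gb = params * 4 / 1e9  # 4 bytes per float32 weight
fp16_gb = params * 2 / 1e9  # 2 bytes per float16 weight

assert round(fp32_gb) == 24
assert round(fp16_gb) == 12
```

Halving the checkpoint from roughly 24 GB to roughly 12 GB is what makes the integration tests feasible on a single GPU; generation tests were additionally pointed at a smaller random model.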
Sylvain Gugger
27d4639779
Make gradient_checkpointing a training argument (#13657)
* Make gradient_checkpointing a training argument
* Update src/transformers/modeling_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Update src/transformers/configuration_utils.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Fix tests
* Style
* document Gradient Checkpointing as a performance feature
* Small rename
* PoC for not using the config
* Adapt BC to new PoC
* Forgot to save
* Rollout changes to all other models
* Fix typo
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
2021-09-22 07:51:38 -04:00
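Gradient checkpointing, the feature this commit moves out of the model config and into the training arguments, trades compute for memory: only some activations are cached in the forward pass, and the rest are recomputed during backward. A minimal conceptual sketch (toy "layers", not the transformers implementation):

```python
def forward_with_checkpoints(x, layers, every=2):
    """Run `layers` over x, caching activations only at checkpoint layers."""
    saved = {0: x}
    for i, f in enumerate(layers, start=1):
        x = f(x)
        if i % every == 0:
            saved[i] = x  # checkpoint: kept for the backward pass
        # non-checkpointed activations would be recomputed from the
        # nearest earlier checkpoint when gradients are needed
    return x, saved

layers = [lambda v, k=k: v + k for k in range(4)]  # toy "layers"
out, saved = forward_with_checkpoints(0, layers)
assert out == 6                 # 0 + 0 + 1 + 2 + 3
assert set(saved) == {0, 2, 4}  # only half the activations are cached
```

Treating this as a training-time switch rather than a config field makes sense because it changes how the model runs, not what the model is.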
Stella Biderman
c02cd95c56
GPT-J-6B (#13022)
* Test GPTJ implementation
* Fixed conflicts
* Update __init__.py
* Update __init__.py
* change GPT_J to GPTJ
* fix missing imports and typos
* use einops for now
(need to change to torch ops later)
* Use torch ops instead of einsum
* remove einops deps
* Update configuration_auto.py
* Added GPT J
* Update gptj.rst
* Update __init__.py
* Update test_modeling_gptj.py
* Added GPT J
* Changed configs to match GPT2 instead of GPT Neo
* Removed non-existent sequence model
* Update configuration_auto.py
* Update configuration_auto.py
* Update configuration_auto.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Progress on updating configs to agree with GPT2
* Update modeling_gptj.py
* num_layers -> n_layer
* layer_norm_eps -> layer_norm_epsilon
* attention_layers -> num_hidden_layers
* Update modeling_gptj.py
* attention_pdrop -> attn_pdrop
* hidden_act -> activation_function
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update configuration_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* Update modeling_gptj.py
* fix layernorm and lm_head size
delete attn_type
* Update docs/source/model_doc/gptj.rst
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* removed claim that GPT J uses local attention
* Removed GPTJForSequenceClassification
* Update src/transformers/models/gptj/configuration_gptj.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Removed unsupported boilerplate
* Update tests/test_modeling_gptj.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Update tests/test_modeling_gptj.py
Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update tests/test_modeling_gptj.py
Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update tests/test_modeling_gptj.py
Co-authored-by: Eric Hallahan <eric@hallahans.name>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Update __init__.py
* Update configuration_gptj.py
* Update modeling_gptj.py
* Corrected indentation
* Remove stray backslash
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Delete .DS_Store
* Update docs to match
* Remove tf loading
* Remove config.jax
* Remove stray `else:` statement
* Remove references to `load_tf_weights_in_gptj`
* Adapt tests to match output from GPT-J 6B
* Apply suggestions from code review
Co-authored-by: Suraj Patil <surajp815@gmail.com>
* Default `activation_function` to `gelu_new`
- Specify the approximate formulation of GELU to ensure parity with the default setting of `jax.nn.gelu()`
* Fix part of the config documentation
* Revert "Update configuration_auto.py"
This reverts commit e9860e9c04.
* Revert "Update configuration_auto.py"
This reverts commit cfaaae4c4d.
* Revert "Update configuration_auto.py"
This reverts commit 687788954f.
* Revert "Update configuration_auto.py"
This reverts commit 194d024ea8.
* Hyphenate GPT-J
* Undid sorting of the models alphabetically
* Reverting previous commit
* fix style and quality issues
* Update docs/source/model_doc/gptj.rst
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update tests/test_modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/__init__.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/configuration_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Replaced GPTJ-specific code with generic code
* Update src/transformers/models/gptj/modeling_gptj.py
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* Made the code always use rotary positional encodings
* Update index.rst
* Fix documentation
* Combine attention classes
- Condense all attention operations into `GPTJAttention`
- Replicate GPT-2 and improve code clarity by renaming `GPTJAttention.attn_pdrop` and `GPTJAttention.resid_pdrop` to `GPTJAttention.attn_dropout` and `GPTJAttention.resid_dropout`
* Removed `config.rotary_dim` from tests
* Update test_modeling_gptj.py
* Update test_modeling_gptj.py
* Fix formatting
* Removed deprecated argument `layer_id` from `GPTJAttention`
* Update modeling_gptj.py
* Update modeling_gptj.py
* Fix code quality
* Restore model functionality
* Save `lm_head.weight` in checkpoints
* Fix crashes when loading with reduced precision
* refactor `self._attn(...)` and rename layer weights
* make sure logits are in fp32 for sampling
* improve docs
* Add `GPTJForCausalLM` to `TextGenerationPipeline` whitelist
* Added GPT-J to the README
* Fix doc/readme consistency
* Add rough parallelization support
- Remove unused imports and variables
- Clean up docstrings
- Port experimental parallelization code from GPT-2 into GPT-J
* Clean up loose ends
* Fix index.rst
Co-authored-by: kurumuz <kurumuz1@gmail.com>
Co-authored-by: Suraj Patil <surajp815@gmail.com>
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
Co-authored-by: Eric Hallahan <eric@hallahans.name>
Co-authored-by: Leo Gao <54557097+leogao2@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Patrick von Platen <patrick.v.platen@gmail.com>
2021-08-31 17:53:02 +02:00
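One of the defining choices in this port, noted above ("Made the code always use rotary positional encodings"), is that GPT-J applies rotary position embeddings (RoPE) to a leading slice (`rotary_dim`) of each attention head. A minimal numpy sketch of the idea, not the actual `modeling_gptj.py` code: pairs of features are rotated by a position-dependent angle, so relative position shows up as a phase difference in the query-key dot product.

```python
import numpy as np

def rotate(x, pos, base=10000.0):
    """Apply a RoPE-style rotation to features x (even-length) at position pos."""
    dim = x.shape[0]
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)  # per-pair frequencies
    theta = pos * inv_freq
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * np.cos(theta) - x2 * np.sin(theta)
    out[1::2] = x1 * np.sin(theta) + x2 * np.cos(theta)
    return out

q = np.array([1.0, 0.0, 1.0, 0.0])
k = np.array([1.0, 0.0, 1.0, 0.0])

# The attention score depends only on the relative offset (here 2),
# not on the absolute positions.
s1 = rotate(q, 3) @ rotate(k, 1)
s2 = rotate(q, 7) @ rotate(k, 5)
assert np.isclose(s1, s2)
```

This relative-position property is why the rotary scheme needs no learned position table and why the implementation can always apply it unconditionally.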