* init PR
* optimize top p and add edge case
* styling
* style
* revert tf and flax test
* add edge case test for FLAX and TF
* update doc with smallest set sampling for top p
* make style
* Override save() to use the serving signature as the default
* Replace int32 with int64 in all our serving signatures
* Remember one very important line so as not to break every test at once
* Dtype fix for TFLED
* dtype fix for shift_tokens_right in general
* Dtype fixes in mBART and RAG
* Fix dtypes for test_unpack_inputs
* More dtype fixes
* Yet more mBART + RAG dtype fixes
* Yet more mBART + RAG dtype fixes
* Add a check that the model actually has a serving method
* Updated test values
The image segmentation pipeline tests - tests/pipelines/test_pipelines_image_segmentation.py - were failing after the merging of #1849 (49e44b216b). This was due to the difference in rescaling. Previously the images were rescaled by `image = image / 255`. In the new commit, a `rescale` method was added, and images rescaled using `image = image * scale`. This was known to cause small differences in the processed images (see
[PR comment](https://github.com/huggingface/transformers/pull/18499#discussion_r940347575)).
Testing locally, changing the `rescale` method to divide by a scale factor (255) resulted in the tests passing. It was therefore decided the test values could be updated, as there was no logic difference between the commits.
* Use double quotes, like previous example
* Fix up
* add gpt-neox-japanese model and tokenizer as new model
* Correction to PR's comment for GPT NeoX Japanese
- Fix to be able to use gpu
- Add comment # Copied... at the top of RotaryEmbedding
- Implement nn.Linear instead of original linear class
- Add generation test under @slow
* fix bias treatment for gpt-neox-japanese
* Modidy gpt-neox-japanese following PR
- add doc for bias_dropout_add
- style change following a PR comment
* add document for gpt-neox-japanese
* remove unused import from gpt-neox-japanese
* fix README for gpt-neox-japanese
* First draft
* More improvements
* Improve model, add custom CUDA code
* Import torch before
* Add script that imports custom layer
* Add everything in new ops directory
* Import custom layer in modeling file
* Fix ARCHIVE_MAP typo
* Creating the custom kernel on the fly.
* Import custom layer in modeling file
* More improvements
* Fix CUDA loading
* More improvements
* Improve conversion script
* Improve conversion script
* Make it work until encoder_outputs
* Make forward pass work
* More improvements
* Make logits match original implementation
* Make implementation also support single_scale model
* Add support for single_scale and dilation checkpoint
* Add support for with_box_refine model
* Support also two stage model
* Improve tests
* Fix more tests
* Make more tests pass
* Upload all models to the hub
* Clean up some code
* Improve decoder outputs
* Rename intermediate hidden states and reference points
* Improve model outputs
* Move tests to dedicated folder
* Improve model outputs
* Fix retain_grad test
* Improve docs
* Clean up and make test_initialization pass
* Improve variable names
* Add copied from statements
* Improve docs
* Fix style
* Improve docs
* Improve docs, move tests to model folder
* Fix rebase
* Remove DetrForSegmentation from auto mapping
* Apply suggestions from code review
* Improve variable names and docstrings
* Apply some more suggestions from code review
* Apply suggestion from code review
* better docs and variables names
* hint to num_queries and two_stage confusion
* remove asserts and code refactor
* add exception if two_stage is True and with_box_refine is False
* use f-strings
* Improve docs and variable names
* Fix code quality
* Fix rebase
* Add require_torch_gpu decorator
* Add pip install ninja to CI jobs
* Apply suggestion of @sgugger
* Remove DeformableDetrForObjectDetection from auto mapping
* Remove DeformableDetrModel from auto mapping
* Add model to toctree
* Add model back to mappings, skip model in pipeline tests
* Apply @sgugger's suggestion
* Fix imports in the init
* Fix copies
* Add CPU implementation
* Comment out GPU function
* Undo previous change
* Apply more suggestions
* Remove require_torch_gpu annotator
* Fix quality
* Add logger.info
* Fix logger
* Fix variable names
* Fix initializaztion
* Add missing initialization
* Update checkpoint name
* Add model to doc tests
* Add CPU/GPU equivalence test
* Add Deformable DETR to pipeline tests
* Skip model for object detection pipeline
Co-authored-by: Nicolas Patry <patry.nicolas@protonmail.com>
Co-authored-by: Nouamane Tazi <nouamane98@gmail.com>
Co-authored-by: Sylvain Gugger <Sylvain.gugger@gmail.com>
FlaxLongT5PreTrainedModel is missing "enable_gradient_checkpointing" function. This gives an error if someone tries to enable gradient checkpointing for longt5.
This pull request fixes it.
only main_process will have HPO, and pass argument to other process
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
* Use int64 throughout TFLongFormer
* make style
* Do some more fixed casting in TFLongFormer
* Fix some wonky "is None" conditionals
* Cast all the dtypes, salt the earth
* Fix copies to TFLED as well and do some casting there
* dtype fix in TFLongformer test
* Make fixup
* Expand tolerances on the LED tests too (I think this is a TF32 thing)
* Expand test tolerances for LED a tiny bit (probably a Tensorfloat thing again)