Stas Bekman
619200cc42
[cuda ext tests] fixing tests (#11619)
* fixing tests
* cleanup
2021-05-06 13:35:28 -07:00
Stas Bekman
4e7bf94e72
[DeepSpeed] fp32 support (#11499)
* prep for deepspeed==0.3.16
* new version
* too soon
* support and test fp32 mode
* troubleshooting doc start
* workaround no longer needed
* add fp32 doc
* style
* cleanup, add tf32 note
* clarify
* release was made
2021-04-30 12:51:48 -07:00
Stas Bekman
bc2571e61c
[Deepspeed] ZeRO-Infinity integration plus config revamp (#11418)
* adding Z-inf
* revamp config process
* up version requirement
* wip
* massive rewrite
* cleanup
* cleanup
* Apply suggestions from code review
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
* consistent json commas
* act on suggestions
* leave this feature for 0.3.16
* style
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
2021-04-26 10:40:32 -07:00
Bhadresh Savani
1d30ec95c7
[Examples] Fixes inconsistency around eval vs val and predict vs test (#11380)
* added changes for uniformity
* modified files
* corrected typo
* fixed qa scripts
* fix typos
* fixed predict typo in qa no trainer
* fixed test file
* reverted trainer changes
* reverted trainer changes in custom examples
* updated readme
* added changes in deepspeed test
* added changes for predict and eval
2021-04-26 09:24:31 -07:00
Patrick von Platen
32dbb2d954
make style (#11442)
2021-04-26 13:50:34 +02:00
Sylvain Gugger
dabeb15292
Examples reorg (#11350)
* Base move
* Examples reorganization
* Update references
* Put back test data
* Move conftest
* More fixes
* Move test data to test fixtures
* Update path
* Apply suggestions from code review
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
* Address review comments and clean
Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
2021-04-21 11:11:20 -04:00
Stas Bekman
83206ca6a8
[deepspeed] test on one node 2 gpus max (#11237)
* test on one node 2 gpus max
* fix the other place
* refactor
* fix
* cleanup
* more exact version
2021-04-14 11:06:59 -07:00
Stas Bekman
3d339ee659
[Deepspeed] zero3 tests band aid (#11235)
* temp band-aid
* style
2021-04-13 17:58:09 -04:00
Stas Bekman
66446909b2
[tests] relocate core integration tests (#11146)
* relocate core integration tests
* add sys.path context manager
* cleanup
* try
* try2
* fix path
* doc
* style
* add dep
* add 2 more deps
2021-04-08 13:13:17 -07:00