* add a multi-gpu job for all example tests
* run only ported tests
* rename
* explain why env is re-activated on each step
* mark all unported/checked tests with @require_torch_non_multigpu_but_fix_me
* style
* Apply suggestions from code review
Co-authored-by: Sam Shleifer <sshleifer@gmail.com>
* [testing] switch to a new TestCasePlus + get_auto_remove_tmp_dir() for auto-removal of tmp dirs
* respect after=True for tempfile, simplify code
* comments
* comment fix
* put `before` last in args, so debugging can be made even faster
* Add BERT Loses Patience (Patience-based Early Exit)
* update model archive
* update format
* sort imports
* flake8
* Add results
* full results
* align the table
* refactor to inherit
* default per gpu eval = 1
* Formatting
* Formatting
* isort
* modify readme
* Add check
* Fix format
* Fix format
* Doc strings
* ALBERT & BERT for sequence classification don't inherit from the original anymore
* Remove incorrect comments
* Remove incorrect comments
* Remove incorrect comments
* Sync up with new code
* Sync up with new code
* Add a test
* Add a test
* Add a test
* Add a test
* Add a test
* Add a test
* Finishing up!