[docs] post-PR merge fix (#15355)
* [docs] post-PR merge fix

* Update docs/source/main_classes/deepspeed.mdx

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
parent 99a2771189
commit fc8fc400e3
@@ -31,7 +31,7 @@ won't be possible on a single GPU.
 
 🤗 Transformers integrates [DeepSpeed](https://github.com/microsoft/DeepSpeed) via 2 options:
 
-1. Integration of the core DeepSpeed features via [`Trainer`]. This is everything done for your type
+1. Integration of the core DeepSpeed features via [`Trainer`]. This is an everything-done-for-you type
    of integration - just supply your custom config file or use our template and you have nothing else to do. Most of
    this document is focused on this feature.
 2. If you don't use [`Trainer`] and want to use your own Trainer where you integrated DeepSpeed
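Note: for the "everything-done-for-you" option touched by this hunk, the config is simply handed to [`Trainer`] through `TrainingArguments`. A minimal sketch, assuming a config file named `ds_config_zero2.json` (the file name and `output_dir` are placeholders, not from this commit):

```python
from transformers import TrainingArguments

# Point Trainer at a DeepSpeed config file; Trainer then takes care of the
# DeepSpeed initialization itself. "ds_config_zero2.json" and "output_dir"
# are placeholder names for this sketch.
training_args = TrainingArguments(
    output_dir="output_dir",
    deepspeed="ds_config_zero2.json",
)
```

The `deepspeed` argument also accepts an already-loaded config dict instead of a file path.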
@@ -604,7 +604,7 @@ The following is an example of configuration for ZeRO stage 2:
 **Performance tuning:**
 
 - enabling `offload_optimizer` should reduce GPU RAM usage (it requires `"stage": 2`)
-- `"overlap_comm": true` trade offs increased GPU RAM usage to lower all-reduce latency. `overlap_comm` uses 4.5x
+- `"overlap_comm": true` trades off increased GPU RAM usage to lower all-reduce latency. `overlap_comm` uses 4.5x
   the `allgather_bucket_size` and `reduce_bucket_size` values. So if they are set to 5e8, this requires a 9GB
   footprint (`5e8 x 2Bytes x 2 x 4.5`). Therefore, if you have a GPU with 8GB or less RAM, to avoid getting
   OOM-errors you will need to reduce those parameters to about `2e8`, which would require 3.6GB. You will want to do
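Note: the bucket-size arithmetic in this hunk maps directly onto the ZeRO stage 2 section of the config. A minimal sketch of just those settings, expressed as a Python dict (values are illustrative; a real config also carries batch size, precision, optimizer entries, etc.):

```python
from transformers import TrainingArguments

# Only the ZeRO stage-2 keys discussed above; 2e8 buckets keep the
# overlap_comm footprint around 3.6GB instead of the 9GB needed at 5e8.
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # offload optimizer states to CPU RAM
        "overlap_comm": True,
        "allgather_bucket_size": 2e8,
        "reduce_bucket_size": 2e8,
    },
}

# TrainingArguments accepts the config as a dict rather than a file path.
training_args = TrainingArguments(output_dir="output_dir", deepspeed=ds_config)
```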