# Introduction
MMEngine is a foundational library for training deep learning models based on
PyTorch. It supports running on Linux, Windows, and macOS. It has the
following three features:
1. **Universal and powerful executor**:
   - Supports training different tasks with minimal code, such as training
     ImageNet with just 80 lines of code (the original PyTorch example requires
     400 lines).
   - Easily compatible with models from popular algorithm libraries such as
     TIMM, TorchVision, and Detectron2.
2. **Open architecture with unified interfaces**:
   - Handles different tasks with a unified API: you can implement a method
     once and apply it to all compatible models.
   - Supports various backend devices through a simple, high-level
     abstraction. Currently, MMEngine supports model training on Nvidia CUDA,
     Mac MPS, AMD, MLU, and other devices.
3. **Customizable training process**:
   - Defines a highly modular training engine with "Lego"-like composability.
   - Offers a rich set of components and strategies.
   - Provides total control over the training process through different levels
     of APIs.
## Architecture
![openmmlab-2.0-arch](https://user-images.githubusercontent.com/40779233/187065730-1e9af236-37dc-4dbd-b448-cce3b72b0109.png)
The above diagram illustrates the hierarchy of MMEngine in OpenMMLab 2.0.
MMEngine implements a next-generation training architecture for the OpenMMLab
algorithm library, providing a unified execution foundation for over 30
algorithm libraries within OpenMMLab. Its core components include the training
engine, evaluation engine, and module management.
## Module Introduction
MMEngine abstracts the components involved in the training process and their
relationships. Components of the same type in different algorithm libraries
share the same interface definition.
### Core Modules and Related Components
The core module of the training engine is the
[`Runner`](../tutorials/runner.md). The `Runner` is responsible for executing
training, testing, and inference tasks and managing the various components
required during these processes. At specific points in the execution of
training, testing, and inference, the `Runner` triggers Hooks, allowing users
to insert and execute custom logic. The `Runner`
primarily invokes the following components to complete the training and
inference loops:
- [Dataset](../tutorials/dataset.md): Responsible for constructing datasets in
training, testing, and inference tasks, and feeding the data to the model.
In practice, it is wrapped by a PyTorch DataLoader, which launches multiple
worker subprocesses to load the data.
- [Model](../tutorials/model.md): Accepts data and outputs the loss during the
training process; accepts data and performs predictions during testing and
inference tasks. In a distributed environment, the model is wrapped by a
Model Wrapper (e.g., `MMDistributedDataParallel`).
- [Optimizer Wrapper](../tutorials/optim_wrapper.md): The optimizer wrapper
performs backpropagation to optimize the model during the training process
and supports mixed-precision training and gradient accumulation through a
unified interface.
- [Parameter Scheduler](../tutorials/param_scheduler.md): Dynamically adjusts
optimizer hyperparameters such as learning rate and momentum during the
training process.
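The division of labor among these components can be pictured with a toy loop.
Everything below is an illustrative stand-in, not MMEngine's actual classes or
API: a model that returns a loss from `train_step`, an optimizer wrapper that
applies updates, a parameter scheduler that decays the learning rate, and a
runner that wires them together and fires hooks at predefined points.

```python
# Illustrative stand-ins only -- none of these are MMEngine's actual classes.

class ToyOptimWrapper:
    """Applies parameter updates; the real wrapper also hides mixed
    precision and gradient accumulation behind the same interface."""
    def __init__(self, lr):
        self.lr = lr

    def update(self, model, grad):
        model.w -= self.lr * grad


class ToyParamScheduler:
    """Decays the learning rate once per epoch."""
    def __init__(self, optim_wrapper, gamma):
        self.optim_wrapper = optim_wrapper
        self.gamma = gamma

    def step(self):
        self.optim_wrapper.lr *= self.gamma


class ToyModel:
    """A single scalar weight fitted to the batch targets."""
    def __init__(self):
        self.w = 0.0

    def train_step(self, batch, optim_wrapper):
        loss = sum((self.w - y) ** 2 for y in batch) / len(batch)
        grad = sum(2 * (self.w - y) for y in batch) / len(batch)
        optim_wrapper.update(self, grad)
        return loss


class ToyRunner:
    """Wires the components together and fires hooks at fixed points."""
    def __init__(self, model, dataloader, optim_wrapper, scheduler, hooks=()):
        self.model = model
        self.dataloader = dataloader
        self.optim_wrapper = optim_wrapper
        self.scheduler = scheduler
        self.hooks = list(hooks)
        self.losses = []

    def call_hook(self, name):
        for hook in self.hooks:
            getattr(hook, name, lambda runner: None)(self)

    def train(self, max_epochs):
        self.call_hook('before_train')
        for _ in range(max_epochs):
            for batch in self.dataloader:
                self.losses.append(
                    self.model.train_step(batch, self.optim_wrapper))
            self.scheduler.step()
            self.call_hook('after_train_epoch')
        self.call_hook('after_train')


class PrintLossHook:
    """Custom logic injected at a predefined point, hook-style."""
    def after_train_epoch(self, runner):
        print(f'epoch done, last loss: {runner.losses[-1]:.4f}')


optim_wrapper = ToyOptimWrapper(lr=0.5)
runner = ToyRunner(
    model=ToyModel(),
    dataloader=[[0.9, 1.1], [1.0, 1.0]],  # two tiny "batches" of targets
    optim_wrapper=optim_wrapper,
    scheduler=ToyParamScheduler(optim_wrapper, gamma=0.9),
    hooks=[PrintLossHook()],
)
runner.train(max_epochs=2)
print(round(runner.model.w, 4))  # converges to the target mean, 1.0
```

The point of the sketch is the shape, not the arithmetic: the runner owns the
loop, and every other concern lives behind a small interface it calls into.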
At evaluation intervals during training, and in the testing phase, the
[Metrics & Evaluator](../tutorials/evaluation.md) are responsible for assessing
the performance of the model. The `Evaluator` evaluates the model's predictions
against the dataset. Within the `Evaluator`, there is a `Metrics` abstraction
that calculates metrics such as recall and accuracy.
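This evaluator/metric split follows an accumulate-then-compute pattern: a
metric stores per-batch results while data streams through, and the final
score is computed once at the end. A toy sketch of the idea (the names below
are illustrative, not MMEngine's actual Metric/Evaluator API):

```python
# Illustrative names only -- not MMEngine's actual Metric/Evaluator API.

class ToyAccuracy:
    """Stores per-batch comparisons, then computes the final score."""
    def __init__(self):
        self.results = []

    def process(self, predictions, labels):
        # keep only what the final computation needs
        self.results.extend(int(p == l) for p, l in zip(predictions, labels))

    def compute(self):
        return {'accuracy': sum(self.results) / len(self.results)}


class ToyEvaluator:
    """Feeds model outputs through one or more metrics."""
    def __init__(self, metrics):
        self.metrics = metrics

    def process(self, predictions, labels):
        for metric in self.metrics:
            metric.process(predictions, labels)

    def evaluate(self):
        scores = {}
        for metric in self.metrics:
            scores.update(metric.compute())
        return scores


evaluator = ToyEvaluator([ToyAccuracy()])
evaluator.process([1, 0], [1, 0])   # batch 1: both correct
evaluator.process([1, 1], [1, 0])   # batch 2: one correct
print(evaluator.evaluate())  # {'accuracy': 0.75}
```

Accumulating per-batch rather than computing per-batch is what lets metrics be
aggregated correctly across ranks in distributed evaluation.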
To ensure a unified interface, the communication interfaces between the
evaluators, models, and data in various algorithm libraries within OpenMMLab
2.0 are encapsulated using
[Data Elements](../advanced_tutorials/data_element.md).
During training and inference execution, the components above can use the
logging management module and the visualizer to store and display structured
and unstructured logs.
- [Logging Modules](../advanced_tutorials/logging.md): Responsible for managing
the log information generated during the execution of the `Runner`. The
Message Hub shares data between components, runners, and log processors, while
the Log Processor processes the log information. The processed logs are then
sent to the `Logger` and `Visualizer` for management and display.
- [`Visualizer`](../advanced_tutorials/visualization.md): Responsible for
visualizing the model's feature maps, prediction results, and structured logs
generated during training. It supports multiple visualization backends such as
TensorBoard and WandB.
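The Message Hub's role, passing values from producers (the training loop) to
consumers (the log processor), can be sketched as a shared windowed buffer.
This is a toy illustration of the concept only, not MMEngine's actual
`MessageHub` API:

```python
from collections import defaultdict, deque

class ToyMessageHub:
    """A shared store: components push scalars, a log-processor-like
    consumer reads them back, smoothed over a sliding window."""
    def __init__(self, window_size=10):
        self.scalars = defaultdict(lambda: deque(maxlen=window_size))

    def update_scalar(self, key, value):
        self.scalars[key].append(value)

    def mean(self, key):
        buf = self.scalars[key]
        return sum(buf) / len(buf)


hub = ToyMessageHub(window_size=4)
# the training loop pushes raw per-step values...
for loss in [0.9, 0.7, 0.6, 0.5, 0.4]:
    hub.update_scalar('loss', loss)
# ...and the consumer reads a smoothed value for display
print(f"smoothed loss: {hub.mean('loss'):.3f}")  # mean of the last 4 values
```

Decoupling producers from consumers this way is what lets any component log a
value without knowing which logger or visualizer backend will display it.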
### Common Base Modules
MMEngine also implements various common base modules required during the
execution of algorithmic models, including:
- [Config](../advanced_tutorials/config.md): In the OpenMMLab algorithm
libraries, users can configure the training and testing process, as well as
related components, by writing a configuration file (config).
- [Registry](../advanced_tutorials/registry.md): Responsible for managing
modules with similar functionality within an algorithm library. Based on its
abstraction of algorithm library modules, MMEngine defines a set of root
registries. A registry in an algorithm library can inherit from these root
registries, enabling modules to be invoked and shared across algorithm
libraries within the OpenMMLab framework.
- [File I/O](../advanced_tutorials/fileio.md): Provides a unified interface
for file read/write operations in various modules, supporting multiple file
backend systems and formats in a consistent manner, with extensibility.
- [Distributed Communication Primitives](../advanced_tutorials/distributed.md):
Handles communication between different processes during distributed program
execution. This interface abstracts the differences between distributed and
non-distributed environments and automatically handles data devices and
communication backends.
- [Other Utilities](../advanced_tutorials/manager_mixin.md): Additional
utility modules such as `ManagerMixin`, which implements a way to create and
access global variables; many globally accessible objects within the `Runner`
inherit from it.
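Of these base modules, the registry pattern benefits most from a concrete
picture: a registry maps string names from a config to classes, and a child
registry falls back to its parent when a name is missing, which is how
downstream libraries reuse each other's modules. The sketch below is a toy
version of the idea, not MMEngine's actual `Registry` implementation:

```python
class ToyRegistry:
    """Maps string names to classes; a child falls back to its parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._module_dict = {}

    def register_module(self, cls):
        self._module_dict[cls.__name__] = cls
        return cls  # usable as a decorator

    def get(self, key):
        if key in self._module_dict:
            return self._module_dict[key]
        if self.parent is not None:
            return self.parent.get(key)  # cross-library fallback
        raise KeyError(key)

    def build(self, cfg):
        cfg = dict(cfg)  # copy so the caller's config is untouched
        cls = self.get(cfg.pop('type'))
        return cls(**cfg)


# a root registry and a downstream library's child registry
ROOT_MODELS = ToyRegistry('model')
DET_MODELS = ToyRegistry('det_model', parent=ROOT_MODELS)

@ROOT_MODELS.register_module
class Linear:
    def __init__(self, width):
        self.width = width

# the child registry resolves 'Linear' through its parent
model = DET_MODELS.build({'type': 'Linear', 'width': 8})
print(type(model).__name__, model.width)  # Linear 8
```

Because configs only name modules as strings, swapping a component is a config
edit rather than a code change, which is what makes the config and registry
modules work as a pair.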
Users can read the [tutorials](../tutorials/runner.md) to learn the advanced
usage of these modules, or refer to the [design documents](../design/hook.md)
to understand their design principles and details.