# Introduction

MMEngine is a foundational library for training deep learning models based on PyTorch. It supports running on Linux, Windows, and macOS. It has the following three features:

1. **Universal and powerful executor**:

   - Supports training different tasks with minimal code, such as training ImageNet with just 80 lines of code (the original PyTorch example requires 400 lines).
   - Easily compatible with models from popular algorithm libraries such as TIMM, TorchVision, and Detectron2.

2. **Open architecture with unified interfaces**:

   - Handles different tasks with a unified API: you can implement a method once and apply it to all compatible models.
   - Supports various backend devices through a simple, high-level abstraction. Currently, MMEngine supports model training on NVIDIA CUDA, Mac MPS, AMD, MLU, and other devices.

3. **Customizable training process**:

   - Defines a highly modular training engine with "Lego"-like composability.
   - Offers a rich set of components and strategies.
   - Provides total control over the training process through different levels of APIs.

## Architecture



The above diagram illustrates the hierarchy of MMEngine in OpenMMLab 2.0. MMEngine implements a next-generation training architecture for the OpenMMLab algorithm libraries, providing a unified execution foundation for over 30 algorithm libraries within OpenMMLab. Its core components include the training engine, evaluation engine, and module management.

## Module Introduction

MMEngine abstracts the components involved in the training process and their relationships. Components of the same type in different algorithm libraries share the same interface definition.

### Core Modules and Related Components

The core module of the training engine is the [`Runner`](../tutorials/runner.md). The `Runner` is responsible for executing training, testing, and inference tasks and for managing the components required during these processes. At specific points during the execution of these tasks, the `Runner` sets up hooks that allow users to extend, insert, and execute custom logic. The `Runner` primarily invokes the following components to complete the training and inference loops:

- [Dataset](../tutorials/dataset.md): Constructs datasets for training, testing, and inference tasks and feeds the data to the model. In practice, it is wrapped by a PyTorch `DataLoader`, which launches multiple subprocesses to load the data.
- [Model](../tutorials/model.md): Accepts data and outputs the loss during training; accepts data and performs predictions during testing and inference. In a distributed environment, the model is wrapped by a model wrapper (e.g., `MMDistributedDataParallel`).
- [Optimizer Wrapper](../tutorials/optim_wrapper.md): Performs backpropagation to optimize the model during training, and provides mixed-precision training and gradient accumulation through a unified interface.
- [Parameter Scheduler](../tutorials/param_scheduler.md): Dynamically adjusts optimizer hyperparameters such as learning rate and momentum during training.

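The division of labor above can be sketched as a toy training loop. The `ToyModel`, `ToyOptimWrapper`, and `ToyScheduler` classes below are hypothetical stand-ins, not MMEngine APIs; the real `Runner` wires the same pieces together around a PyTorch `DataLoader` and calls registered hooks at fixed points such as before/after each iteration and epoch.

```python
class ToyModel:
    """Stand-in for a Model: consumes a batch and produces a loss."""
    def __init__(self):
        self.w = 0.0

    def loss(self, x, y):
        return (self.w * x - y) ** 2

    def grad(self, x, y):
        return 2 * (self.w * x - y) * x  # d(loss)/dw

class ToyOptimWrapper:
    """Stand-in for the Optimizer Wrapper: a single update_params() call that
    could also hide mixed precision or gradient accumulation behind it."""
    def __init__(self, model, lr=0.1):
        self.model, self.lr = model, lr

    def update_params(self, grad):
        self.model.w -= self.lr * grad

class ToyScheduler:
    """Stand-in for a Parameter Scheduler: decays the learning rate per epoch."""
    def __init__(self, optim, gamma=0.5):
        self.optim, self.gamma = optim, gamma

    def step(self):
        self.optim.lr *= self.gamma

def run_train(dataset, model, optim, scheduler, hooks, epochs=2):
    """Roughly the loop a runner orchestrates, with hook call points."""
    for epoch in range(epochs):
        for hook in hooks:
            hook('before_train_epoch', epoch=epoch)
        for batch in dataset:  # in MMEngine the data comes from a DataLoader
            loss = model.loss(*batch)
            optim.update_params(model.grad(*batch))
            for hook in hooks:
                hook('after_train_iter', loss=loss)
        scheduler.step()

events = []
model = ToyModel()
optim = ToyOptimWrapper(model)
run_train([(1.0, 2.0)], model, optim, ToyScheduler(optim),
          hooks=[lambda point, **kw: events.append(point)])
print(round(model.w, 2), events[0])  # the weight moves toward y/x = 2.0
```

The key design point the sketch illustrates is that the loop itself stays generic: swapping the model, optimizer wrapper, scheduler, or hooks changes behavior without touching the loop.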
During training intervals or testing phases, the [Metrics & Evaluator](../tutorials/evaluation.md) are responsible for evaluating the performance of the model. The `Evaluator` evaluates the model's predictions based on the dataset. Within the `Evaluator`, there is an abstraction called `Metrics`, which calculates various metrics such as recall, accuracy, and others.

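A minimal sketch of that split, using hypothetical `Toy*` classes rather than MMEngine's implementation: per-batch results are accumulated in `process()` and reduced once in `compute_metrics()` (the same separation MMEngine's `BaseMetric` uses), while the evaluator fans data out to its metrics and merges their reports.

```python
class ToyAccuracy:
    """Hypothetical metric: accumulate per-batch results, reduce at the end."""
    def __init__(self):
        self.results = []

    def process(self, preds, labels):
        self.results.extend(int(p == l) for p, l in zip(preds, labels))

    def compute_metrics(self):
        return {'accuracy': sum(self.results) / len(self.results)}

class ToyEvaluator:
    """Hypothetical evaluator: dispatches data to metrics, merges results."""
    def __init__(self, metrics):
        self.metrics = metrics

    def process(self, preds, labels):
        for metric in self.metrics:
            metric.process(preds, labels)

    def evaluate(self):
        report = {}
        for metric in self.metrics:
            report.update(metric.compute_metrics())
        return report

evaluator = ToyEvaluator([ToyAccuracy()])
evaluator.process(preds=[0, 1, 1], labels=[0, 1, 0])  # one batch
evaluator.process(preds=[1], labels=[1])              # another batch
print(evaluator.evaluate())  # {'accuracy': 0.75}
```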
To ensure a unified interface, the communication interfaces between the evaluators, models, and data in the various algorithm libraries within OpenMMLab 2.0 are encapsulated using [Data Elements](../advanced_tutorials/data_element.md).

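The idea can be illustrated with a hypothetical data-element-style container (in the spirit of MMEngine's `BaseDataElement`, not its implementation): one object carries ground truth and predictions, so the dataset, model, and evaluator all talk through the same interface.

```python
class ToyDataSample:
    """Hypothetical data-element-style container: arbitrary named fields
    attached to a single object that travels between components."""
    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)

sample = ToyDataSample(gt_label=3)           # dataset side fills in ground truth
sample.pred_label = 3                        # model side attaches its prediction
print(sample.pred_label == sample.gt_label)  # evaluator reads both from one object
```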
During training and inference, the components above can use the logging management module and the visualizer to store and visualize structured and unstructured logs. The [Logging Modules](../advanced_tutorials/logging.md) manage the log information generated while the `Runner` executes: the Message Hub shares data between components, runners, and log processors, while the Log Processor processes the log information. The processed logs are then sent to the `Logger` and `Visualizer` for management and display. The [`Visualizer`](../advanced_tutorials/visualization.md) visualizes the model's feature maps, prediction results, and the structured logs generated during training, and supports multiple visualization backends such as TensorBoard and WandB.

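The Message Hub / Log Processor relationship can be sketched with hypothetical stand-ins (not MMEngine's real classes): components push raw scalars into a shared hub, and a processor smooths a bounded window of recent values before they reach a logger or visualizer.

```python
from collections import defaultdict, deque

class ToyMessageHub:
    """Hypothetical message hub: components push scalars keyed by name;
    consumers read a bounded window of recent values."""
    def __init__(self, window=2):
        self._scalars = defaultdict(lambda: deque(maxlen=window))

    def update_scalar(self, key, value):
        self._scalars[key].append(value)

    def get_scalar(self, key):
        return list(self._scalars[key])

def process_log(hub, key):
    """Hypothetical log processor: smooth raw values before display."""
    values = hub.get_scalar(key)
    return sum(values) / len(values)

hub = ToyMessageHub(window=2)
for loss in (4.0, 3.0, 2.0):     # e.g. per-iteration losses pushed by the loop
    hub.update_scalar('loss', loss)
print(process_log(hub, 'loss'))  # window of 2 -> (3.0 + 2.0) / 2 = 2.5
```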
### Common Base Modules

MMEngine also implements various common base modules required during the execution of algorithmic models, including:

- [Config](../advanced_tutorials/config.md): In the OpenMMLab algorithm libraries, users can configure the training and testing processes and the related components by writing a configuration file (config).
- [Registry](../advanced_tutorials/registry.md): Manages modules with similar functionality within an algorithm library. Based on its abstraction of algorithm library modules, MMEngine defines a set of root registries; registries within an algorithm library can inherit from these root registries, enabling modules to be invoked and shared across algorithm libraries.
- [File I/O](../advanced_tutorials/fileio.md): Provides a unified interface for file read/write operations in various modules, supporting multiple file backends and formats in a consistent, extensible manner.
- [Distributed Communication Primitives](../advanced_tutorials/distributed.md): Handle communication between processes during distributed execution. This interface abstracts away the differences between distributed and non-distributed environments and automatically handles data devices and communication backends.
- [Other Utilities](../advanced_tutorials/manager_mixin.md): Utility modules such as `ManagerMixin`, which implements a way to create and access global variables; many globally accessible objects within the `Runner` derive from `ManagerMixin`.
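
The root/child registry inheritance described above can be sketched as follows. `ToyRegistry` is a hypothetical stand-in, not MMEngine's `Registry` implementation: a child registry falls back to its parent on lookup, so a downstream library can build modules registered in the root.

```python
class ToyRegistry:
    """Hypothetical registry sketch: maps type names to classes and falls
    back to a parent registry when a name is not found locally."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self._module_dict = {}

    def register_module(self, cls):
        self._module_dict[cls.__name__] = cls
        return cls

    def get(self, key):
        if key in self._module_dict:
            return self._module_dict[key]
        if self.parent is not None:
            return self.parent.get(key)  # cross-library fallback
        raise KeyError(f'{key} is not registered')

    def build(self, cfg):
        cfg = dict(cfg)
        return self.get(cfg.pop('type'))(**cfg)

ROOT_MODELS = ToyRegistry('model')                     # "root" registry
DET_MODELS = ToyRegistry('model', parent=ROOT_MODELS)  # downstream library

@ROOT_MODELS.register_module
class Backbone:
    def __init__(self, depth=50):
        self.depth = depth

# The child registry builds a class registered only in the root registry,
# configured by a plain dict of the kind a config file would produce.
model = DET_MODELS.build(dict(type='Backbone', depth=101))
print(type(model).__name__, model.depth)  # Backbone 101
```

Building from a `dict(type=..., **kwargs)` is also how a config file drives module construction: the config names the type, the registry resolves it to a class.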

Users can further read the [tutorials](../tutorials/runner.md) to learn the advanced usage of these modules, or refer to the [design documents](../design/hook.md) to understand their design principles and details.