Book/guide (#632)

Louis Fortier-Dubois 2023-08-14 11:58:27 -04:00 committed by GitHub
parent 8770368892
commit aebd359e66
46 changed files with 825 additions and 197 deletions

View File

@ -1,5 +1,5 @@
[book]
authors = ["Wouter Doppenberg"]
authors = ["Wouter Doppenberg", "Nathaniel Simard", "Louis Fortier-Dubois"]
language = "en"
multilingual = false
src = "src"

View File

@ -1,33 +1,9 @@
[Overview](../../README.md)
[Motivation](./motivation.md)
- [Usage](./usage/README.md)
  - [Installation](./usage/installation.md)
  - [Quick Start](./usage/quick-start.md)
- [Concepts](./concepts/README.md)
  - [Tensors](./concepts/tensors.md)
  - [Modules](./concepts/modules.md)
  - [Backend](./concepts/backend.md)
  - [Configs](./concepts/configs.md)
  - [Learner](./concepts/learner.md)
  - [Datasets](./concepts/datasets.md)
  - [Metrics](./concepts/metrics.md)
  - [Losses](./concepts/losses.md)
  - [Optimizers](./concepts/optimizers.md)
  - [`no_std`](./concepts/no-std.md)
  - [Advanced](./concepts/advanced/README.md)
    - [Records](./concepts/advanced/records.md)
    - [Macros](./concepts/advanced/macros.md)
- [Examples](./examples/README.md)
  - [MNIST](./examples/mnist.md)
  - [MNIST Inference on Web](./examples/mnist-inference-on-web.md)
  - [Named Tensors](./examples/named-tensors.md)
  - [ONNX Inference](./examples/onnx-inference.md)
  - [Text Classification](./examples/text-classification.md)
  - [Text Generation](./examples/text-generation.md)
[Help](./help.md)
[Contributing](./contributing.md)
- [Overview](./overview.md)
- [Why Burn?](./motivation.md)
- [Guide](./guide/README.md)
  - [Model](./guide/model.md)
  - [Data](./guide/data.md)
  - [Training](./guide/training.md)
  - [Backend](./guide/backend.md)
  - [Inference](./guide/inference.md)
  - [Conclusion](./guide/conclusion.md)

View File

@ -1 +0,0 @@
# Concepts

View File

@ -1 +0,0 @@
# Advanced

View File

@ -1 +0,0 @@
# Macros

View File

@ -1 +0,0 @@
# Records

View File

@ -1,9 +0,0 @@
# Backend
Nearly everything in Burn is based on the `Backend` trait, which enables you to run tensor
operations with different implementations without having to modify your code. While a backend may
not necessarily have autodiff capabilities, the `ADBackend` trait specifies which backends do.
This trait not only abstracts operations but also tensor, device and element types, giving each
backend the flexibility it needs. It's worth noting that the trait assumes eager mode, since Burn
fully supports dynamic graphs. However, we may create another API to assist with integrating
graph-based backends, without requiring any changes to the user's code.

View File

@ -1,29 +0,0 @@
# Configs
The `Config` derive lets you define serializable and deserializable configurations or
hyper-parameters for your [modules](#module) or any other components.
```rust
use burn::config::Config;

#[derive(Config)]
pub struct PositionWiseFeedForwardConfig {
    pub d_model: usize,
    pub d_ff: usize,
    #[config(default = 0.1)]
    pub dropout: f64,
}
```
The derive macro also adds useful methods to your config, following a builder pattern.
```rust
fn main() {
    let config = PositionWiseFeedForwardConfig::new(512, 2048);
    println!("{}", config.d_model); // 512
    println!("{}", config.d_ff); // 2048
    println!("{}", config.dropout); // 0.1

    let config = config.with_dropout(0.2);
    println!("{}", config.dropout); // 0.2
}
```

View File

@ -1 +0,0 @@
# Datasets

View File

@ -1,31 +0,0 @@
# Learner
The `Learner` is the main struct that lets you train a neural network, with support for logging,
metrics, checkpointing and more. To create a learner, use the `LearnerBuilder`.
```rust,ignore
use burn::train::LearnerBuilder;
use burn::train::metric::{AccuracyMetric, LossMetric};
use burn::record::DefaultRecordSettings;

fn main() {
    let dataloader_train = ...;
    let dataloader_valid = ...;
    let model = ...;
    let optim = ...;

    let learner = LearnerBuilder::new("/tmp/artifact_dir")
        .metric_train_plot(AccuracyMetric::new())
        .metric_valid_plot(AccuracyMetric::new())
        .metric_train(LossMetric::new())
        .metric_valid(LossMetric::new())
        .with_file_checkpointer::<DefaultRecordSettings>(2)
        .num_epochs(10)
        .build(model, optim);

    let _model_trained = learner.fit(dataloader_train, dataloader_valid);
}
```
See this [example](https://github.com/burn-rs/burn/tree/main/examples/mnist) for real-world usage.

View File

@ -1 +0,0 @@
# Losses

View File

@ -1 +0,0 @@
# Metrics

View File

@ -1 +0,0 @@
# Models

View File

@ -1,32 +0,0 @@
# Modules
The `Module` derive allows you to create your own neural network modules, similar to PyTorch. The
derive only generates the methods necessary for your type to essentially act as a parameter
container; it makes no assumptions about how the forward pass is declared.
```rust
use burn::module::Module;
use burn::nn::{Dropout, Linear, GELU};
use burn::tensor::{backend::Backend, Tensor};

#[derive(Module, Debug)]
pub struct PositionWiseFeedForward<B: Backend> {
    linear_inner: Linear<B>,
    linear_outer: Linear<B>,
    dropout: Dropout,
    gelu: GELU,
}

impl<B: Backend> PositionWiseFeedForward<B> {
    pub fn forward<const D: usize>(&self, input: Tensor<B, D>) -> Tensor<B, D> {
        let x = self.linear_inner.forward(input);
        let x = self.gelu.forward(x);
        let x = self.dropout.forward(x);

        self.linear_outer.forward(x)
    }
}
```
Note that all fields declared in the struct must also implement the `Module` trait. The `Tensor`
struct doesn't implement `Module`, but `Param<Tensor<B, D>>` does.

View File

@ -1,15 +0,0 @@
## Support for `no_std`
Burn, including its `burn-ndarray` backend, can work in a `no_std` environment for inference,
provided `alloc` is available. To accomplish this, simply turn off the default features of `burn`
and `burn-ndarray` (which is the minimum requirement for running inference). You can find a
reference example in
[burn-no-std-tests](https://github.com/burn-rs/burn/tree/main/burn-no-std-tests).
The `burn-core` and `burn-tensor` crates also support `no_std` with `alloc`. These crates can be
added directly as dependencies if necessary, as they are re-exported by the `burn` crate.
Please be aware that in `no_std` mode, a random seed is generated at build time if
one hasn't been set using the `Backend::seed` method. Also,
[spin::mutex::Mutex](https://docs.rs/spin/latest/spin/mutex/struct.Mutex.html) is used instead of
[std::sync::Mutex](https://doc.rust-lang.org/std/sync/struct.Mutex.html) in this mode.

View File

@ -1 +0,0 @@
# Optimizers

View File

@ -1,25 +0,0 @@
# Tensors
At the core of Burn lies the `Tensor` struct, which encompasses multiple types of tensors, including
`Float`, `Int`, and `Bool`. The element types of these tensors are specified by the backend and are
usually designated as a generic argument (e.g., `NdArrayBackend<f32>`). Although the same struct is
used for all tensors, the available methods differ depending on the tensor kind. You can specify the
desired tensor kind by setting the third generic argument, which defaults to `Float`. The first
generic argument specifies the backend, while the second specifies the number of dimensions.
```rust
use burn::tensor::backend::Backend;
use burn::tensor::{Tensor, Int};

fn function<B: Backend>(tensor_float: Tensor<B, 2>) {
    let _tensor_bool = tensor_float.clone().equal_elem(2.0); // Tensor<B, 2, Bool>
    let _tensor_int = tensor_float.argmax(1); // Tensor<B, 2, Int>
}
```
As demonstrated in the previous example, nearly all operations require owned tensors as parameters,
which means that calling `clone()` explicitly is necessary when reusing the same tensor multiple
times. However, there's no need to worry: the tensor's data won't be copied; it will be flagged
as read-only when multiple tensors use the same allocated memory. This enables backends to reuse
tensor data when possible, similar to a copy-on-write pattern, while remaining completely
transparent to the user.

View File

@ -1 +0,0 @@
# Contributing

View File

@ -1 +0,0 @@
# Examples

View File

@ -1 +0,0 @@
# MNIST Inference on Web

View File

@ -1 +0,0 @@
# MNIST

View File

@ -1 +0,0 @@
# Named Tensors

View File

@ -1 +0,0 @@
# ONNX Inference

View File

@ -1 +0,0 @@
# Text Classification

View File

@ -1 +0,0 @@
# Text Generation

View File

@ -0,0 +1,15 @@
# Guide
This guide will walk you through the process of creating a custom model built with Burn.
We will train a simple convolutional neural network model on the MNIST dataset and prepare it for inference.
For clarity, we sometimes omit imports in our code snippets. For more details, please refer to the corresponding code in the `examples/guide` directory.
## Key Learnings
* Creating a project
* Creating neural network models
* Importing and preparing datasets
* Training models on data
* Choosing a backend
* Using a model for inference

View File

@ -0,0 +1,31 @@
# Backend
We have now written most of the code needed to train our model. However, we have not yet explicitly designated the backend to be used at any point.
Indeed, only the `main` function remains.
```rust , ignore
use burn::optim::AdamConfig;
use guide::model::ModelConfig;

fn main() {
    type MyBackend = burn_wgpu::WgpuBackend<burn_wgpu::AutoGraphicsApi, f32, i32>;
    type MyAutodiffBackend = burn_autodiff::ADBackendDecorator<MyBackend>;

    let device = burn_wgpu::WgpuDevice::default();
    guide::training::train::<MyAutodiffBackend>(
        "/tmp/guide",
        guide::training::TrainingConfig::new(ModelConfig::new(10, 512), AdamConfig::new()),
        device,
    );
}
```
In this example, we use the `WgpuBackend`, which is compatible with any operating system and will use the GPU. For other options, see the Burn README.
This backend type takes the graphics API, the float type and the int type as generic arguments to be used during training. By leaving the graphics API as `burn_wgpu::AutoGraphicsApi`, it should automatically select an API available on your machine.
The autodiff backend is simply the same backend, wrapped within the `ADBackendDecorator` struct, which imparts differentiability to any backend.
We call the `train` function defined earlier with a directory for artifacts, the configuration of the model (the number of digit classes is 10 and the hidden dimension is 512), the optimizer configuration (in our case, the default Adam configuration), and the device, which can be obtained from the backend.
When running the example, we can see the training progression through a basic CLI dashboard:
<img alt="Burn training CLI dashboard" src="./training-output.png">

View File

@ -0,0 +1,3 @@
# Conclusion
In this short guide, we've introduced you to the fundamental building blocks for getting started with Burn. While there's still plenty to explore, our goal has been to provide you with the essential knowledge to kickstart your productivity within the framework.

View File

@ -0,0 +1,71 @@
# Data
Typically, one trains a model on some dataset.
Burn provides a library of very useful dataset sources and transformations.
In particular, there are Hugging Face dataset utilities that allow you to download and store data from Hugging Face into an SQLite database for extremely efficient data streaming and storage. For this guide, we will use the MNIST dataset provided by Hugging Face.
To iterate over a dataset efficiently, we will define a struct which will implement the `Batcher` trait. The goal of a batcher is to map individual dataset items into a batched tensor that can be used as input to our previously defined model.
```rust , ignore
use burn::{
    data::{dataloader::batcher::Batcher, dataset::source::huggingface::MNISTItem},
    tensor::{backend::Backend, Data, ElementConversion, Int, Tensor},
};

pub struct MNISTBatcher<B: Backend> {
    device: B::Device,
}

impl<B: Backend> MNISTBatcher<B> {
    pub fn new(device: B::Device) -> Self {
        Self { device }
    }
}
```
This code block defines a batcher struct holding the device on which tensors will be placed before being passed to the model.
Note that the device is an associated type of the `Backend` trait, since not all backends expose the same devices.
As an example, the Libtorch-based backend exposes `Cuda(gpu_index)`, `Cpu`, `Vulkan` and `Metal` devices, while the ndarray backend only exposes the `Cpu` device, as sketched below.
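As a minimal sketch of what those device values look like (the `burn-tch` and `burn-ndarray` types below are assumptions for illustration, as neither crate is used in this guide):
```rust , ignore
// Device values are plain enum variants provided by each backend crate.
let tch_device = burn_tch::TchDevice::Cuda(0); // first CUDA GPU
let ndarray_device = burn_ndarray::NdArrayDevice::Cpu; // only option for ndarray
```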
Next, we need to actually implement the batching logic.
```rust , ignore
#[derive(Clone, Debug)]
pub struct MNISTBatch<B: Backend> {
    pub images: Tensor<B, 3>,
    pub targets: Tensor<B, 1, Int>,
}

impl<B: Backend> Batcher<MNISTItem, MNISTBatch<B>> for MNISTBatcher<B> {
    fn batch(&self, items: Vec<MNISTItem>) -> MNISTBatch<B> {
        let images = items
            .iter()
            .map(|item| Data::<f32, 2>::from(item.image))
            .map(|data| Tensor::<B, 2>::from_data(data.convert()))
            .map(|tensor| tensor.reshape([1, 28, 28]))
            // Normalize: scale to [0,1] and make the mean=0 and std=1
            // values mean=0.1307, std=0.3081 are from the PyTorch MNIST example
            // https://github.com/pytorch/examples/blob/54f4572509891883a947411fd7239237dd2a39c3/mnist/main.py#L122
            .map(|tensor| ((tensor / 255) - 0.1307) / 0.3081)
            .collect();

        let targets = items
            .iter()
            .map(|item| Tensor::<B, 1, Int>::from_data(Data::from([(item.label as i64).elem()])))
            .collect();

        let images = Tensor::cat(images, 0).to_device(&self.device);
        let targets = Tensor::cat(targets, 0).to_device(&self.device);

        MNISTBatch { images, targets }
    }
}
```
In the previous example, we implement the `Batcher` trait with a list of `MNISTItem` as input and a single `MNISTBatch` as output.
The batch contains the images in the form of a 3D tensor, along with a targets tensor containing the indexes of the correct digit classes.
The first step is to parse the image array into a `Data` struct.
Burn provides the `Data` struct to encapsulate tensor storage information without being specific to a backend.
When creating a tensor from data, we often need to convert the data precision to that of the backend in use.
This can be done with the `.convert()` method. By importing the `burn::tensor::ElementConversion` trait, you can also call `.elem()` on a specific number to convert it to the element type of the backend in use, as in the sketch below.
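A small, self-contained sketch of these two conversions (the function and the `B::FloatElem` associated type are used here for illustration and are assumptions, not code from this guide):
```rust , ignore
use burn::tensor::{backend::Backend, Data, ElementConversion, Tensor};

fn convert_example<B: Backend>() {
    let data = Data::<f32, 1>::from([1.0, 2.0, 3.0]);
    // `convert()` re-encodes the values into the backend's float element type.
    let _tensor = Tensor::<B, 1>::from_data(data.convert());
    // `elem()` converts a single number into the backend element type.
    let _two: B::FloatElem = 2.0_f32.elem();
}
```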

View File

@ -0,0 +1,56 @@
# Inference
Now that we have trained our model, the next natural step is to use it for inference.
For loading a model primed for inference, it is of course more efficient to load the weights directly into the model, bypassing the need to first initialize it with arbitrary weights (or worse, with weights computed from a Xavier normal initialization) only to promptly replace them with the stored weights.
With that in mind, let's create a new initialization function that receives the record as input.
```rust , ignore
impl ModelConfig {
    /// Returns the initialized model using the recorded weights.
    pub fn init_with<B: Backend>(&self, record: ModelRecord<B>) -> Model<B> {
        Model {
            conv1: Conv2dConfig::new([1, 8], [3, 3]).init_with(record.conv1),
            conv2: Conv2dConfig::new([8, 16], [3, 3]).init_with(record.conv2),
            pool: AdaptiveAvgPool2dConfig::new([8, 8]).init(),
            activation: ReLU::new(),
            linear1: LinearConfig::new(16 * 8 * 8, self.hidden_size).init_with(record.linear1),
            linear2: LinearConfig::new(self.hidden_size, self.num_classes)
                .init_with(record.linear2),
            dropout: DropoutConfig::new(self.dropout).init(),
        }
    }
}
```
It is important to note that the `ModelRecord` was automatically generated thanks to the `Module` trait. It allows us to load the module state without having to deal with fetching the correct type manually.
Everything is validated when loading the model with the record.
Now let's create a simple `infer` method in which we will load our trained model.
```rust , ignore
pub fn infer<B: Backend>(artifact_dir: &str, device: B::Device, item: MNISTItem) {
    let config =
        TrainingConfig::load(&format!("{artifact_dir}/config.json")).expect("A config exists");
    let record = CompactRecorder::new()
        .load(format!("{artifact_dir}/model").into())
        .expect("Trained model should exist");

    let model = config.model.init_with::<B>(record).to_device(&device);

    let label = item.label;
    let batcher = MNISTBatcher::new(device);
    let batch = batcher.batch(vec![item]);
    let output = model.forward(batch.images);
    let predicted = output.argmax(1).flatten::<1>(0, 1).into_scalar();

    println!("Predicted {} Expected {}", predicted, label);
}
```
The first step is to load the configuration of the training run to fetch the correct model configuration.
Then, we can fetch the record using the same recorder we used during training.
Finally, we can initialize the model with the configuration and the record, before sending it to the desired device for inference.
For simplicity, we can use the same batcher used during training to go from a `MNISTItem` to a tensor.
By running the infer function, you should see the predictions of your model!
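For reference, a call site might look like the following, mirroring the `main` function of the `examples/guide` directory (the `MyBackend` alias and the artifact directory come from the Backend chapter):
```rust , ignore
use burn_dataset::Dataset;

fn main() {
    type MyBackend = burn_wgpu::WgpuBackend<burn_wgpu::AutoGraphicsApi, f32, i32>;
    let device = burn_wgpu::WgpuDevice::default();

    // Run inference on the 43rd item of the MNIST test set.
    guide::inference::infer::<MyBackend>(
        "/tmp/guide",
        device,
        burn_dataset::source::huggingface::MNISTDataset::test()
            .get(42)
            .unwrap(),
    );
}
```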

View File

@ -0,0 +1,150 @@
# Model
The first step is to create a project and add the various Burn dependencies.
In a `Cargo.toml` file, add the `burn`, `burn-wgpu`, `burn-dataset`, `burn-autodiff` and `burn-train` crates.
Note that the `serde` dependency is necessary for serialization and is mandatory for the time being.
```toml
[package]
edition = "2021"
name = "guide"
version = "0.1.0"

[dependencies]
burn = "0.8"
burn-wgpu = "0.8"
burn-dataset = "0.8"
burn-autodiff = "0.8"
burn-train = "0.8"

# Serialization
serde = "1"
```
Our goal will be to create a basic convolutional neural network used for image classification. We will keep the model simple by using two convolution layers followed by two linear layers, some pooling and ReLU activations.
We will also use dropout to improve training performance.
Let us start by defining the model in a new file, `model.rs`.
```rust , ignore
// Imports required for the model.rs file
use burn::{
    config::Config,
    module::Module,
    nn::{
        conv::{Conv2d, Conv2dConfig},
        pool::{AdaptiveAvgPool2d, AdaptiveAvgPool2dConfig},
        Dropout, DropoutConfig, Linear, LinearConfig, ReLU,
    },
    tensor::{backend::Backend, Tensor},
};

#[derive(Module, Debug)]
pub struct Model<B: Backend> {
    conv1: Conv2d<B>,
    conv2: Conv2d<B>,
    pool: AdaptiveAvgPool2d,
    dropout: Dropout,
    linear1: Linear<B>,
    linear2: Linear<B>,
    activation: ReLU,
}
```
There are two major things going on in this code sample.

1. You can create a deep learning module with the `#[derive(Module)]` attribute on top of a struct.
   This will generate the necessary code so that the struct implements the `Module` trait.
   This trait will make your module both trainable and (de)serializable, while adding related functionalities.
   Like other derives often used in Rust, such as `Clone`, `PartialEq` or `Debug`, each field within the struct must also implement the `Module` trait.
2. Note that the struct is generic over the `Backend` trait.
   The backend trait abstracts the underlying low-level implementations of tensor operations, allowing your new model to run on any backend.
   Contrary to other frameworks, the backend abstraction isn't determined by a compilation flag or a device type.
   This is important because you can extend the functionalities of a specific backend (which will be covered in the more advanced sections of this book), and it allows for an innovative autodiff system.
   You can also change the backend at runtime, for instance to compute training metrics on a CPU backend while using a GPU one to train the model.
   In our example, the backend in use will be determined later on.

Next, we need to instantiate the model for training.
Next, we need to instantiate the model for training.
```rust , ignore
#[derive(Config, Debug)]
pub struct ModelConfig {
    num_classes: usize,
    hidden_size: usize,
    #[config(default = "0.5")]
    dropout: f64,
}

impl ModelConfig {
    /// Returns the initialized model.
    pub fn init<B: Backend>(&self) -> Model<B> {
        Model {
            conv1: Conv2dConfig::new([1, 8], [3, 3]).init(),
            conv2: Conv2dConfig::new([8, 16], [3, 3]).init(),
            pool: AdaptiveAvgPool2dConfig::new([8, 8]).init(),
            activation: ReLU::new(),
            linear1: LinearConfig::new(16 * 8 * 8, self.hidden_size).init(),
            linear2: LinearConfig::new(self.hidden_size, self.num_classes).init(),
            dropout: DropoutConfig::new(self.dropout).init(),
        }
    }
}
```
When creating a custom neural network module, it is often a good idea to create a config alongside the model struct.
This allows you to define default values for your network, thanks to the `Config` attribute.
The benefit of this attribute is that it makes the configuration serializable, enabling you to painlessly save your model hyperparameters, enhancing your experimentation process.
Note that a constructor will automatically be generated for your configuration, which takes as input the values of the parameters that do not have default values: `let config = ModelConfig::new(num_classes, hidden_size);`.
The default values can easily be overridden with builder-like methods (e.g., `config.with_dropout(0.2)`), as sketched below.
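A minimal sketch of the generated constructor and builder methods for the `ModelConfig` defined above:
```rust , ignore
// Constructor generated by the `Config` derive, taking the two
// parameters without default values.
let config = ModelConfig::new(10, 512); // num_classes = 10, hidden_size = 512

// Builder-like setter generated for the defaulted `dropout` field.
let config = config.with_dropout(0.2);
```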
The first implementation block is related to the initialization method.
As we can see, all fields are set using the configuration of the corresponding underlying neural network module.
In this specific case, we have chosen to expand the tensor channels from 1 to 8 with the first layer, then from 8 to 16 with the second layer, using a kernel size of 3 on all dimensions.
We also use the adaptive average pooling module to reduce the dimensionality of the images to an 8 by 8 matrix, which we will flatten in the forward pass to obtain a resulting tensor of 1024 (16 * 8 * 8) elements.
Now let's see how the forward pass is defined.
```rust , ignore
impl<B: Backend> Model<B> {
    /// # Shapes
    ///   - Images [batch_size, height, width]
    ///   - Output [batch_size, num_classes]
    pub fn forward(&self, images: Tensor<B, 3>) -> Tensor<B, 2> {
        let [batch_size, height, width] = images.dims();

        // Create a channel at the second dimension.
        let x = images.reshape([batch_size, 1, height, width]);

        let x = self.conv1.forward(x); // [batch_size, 8, _, _]
        let x = self.dropout.forward(x);
        let x = self.conv2.forward(x); // [batch_size, 16, _, _]
        let x = self.dropout.forward(x);
        let x = self.activation.forward(x);

        let x = self.pool.forward(x); // [batch_size, 16, 8, 8]
        let x = x.reshape([batch_size, 16 * 8 * 8]);
        let x = self.linear1.forward(x);
        let x = self.dropout.forward(x);
        let x = self.activation.forward(x);

        self.linear2.forward(x) // [batch_size, num_classes]
    }
}
```
For former PyTorch users, this might feel very intuitive, as each module is directly incorporated into the code using an eager API.
Note that no abstraction is imposed for the forward method. You are free to define multiple forward functions with names of your liking.
Most of the neural network modules already built with Burn use the `forward` nomenclature, simply because it is standard in the field.
Similar to neural network modules, the `Tensor` struct given as a parameter also takes the `Backend` trait as a generic argument, alongside its rank.
Even though it is not used in this specific example, it is possible to add the kind of the tensor as a third generic argument.
```rust , ignore
Tensor<B, 3>        // Float tensor (default)
Tensor<B, 3, Float> // Float tensor (explicit)
Tensor<B, 3, Int>   // Int tensor
Tensor<B, 3, Bool>  // Bool tensor
```
Note that the specific element type, such as `f16`, `f32` and the likes, will be defined later with the backend, as sketched below.
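As a minimal sketch, the type alias below (taken from the Backend chapter of this guide) fixes `f32` floats and `i32` ints for the wgpu backend:
```rust , ignore
type MyBackend = burn_wgpu::WgpuBackend<burn_wgpu::AutoGraphicsApi, f32, i32>;
```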

Binary file not shown: training-output.png (new image, 52 KiB)

View File

@ -0,0 +1,147 @@
# Training
We are now ready to write the necessary code to train our model on the MNIST dataset.
Instead of a simple tensor, the model should output an item that can be understood by the learner, a struct whose responsibility is to apply an optimizer to the model.
The output struct is used for all metrics calculated during training.
Therefore, it should include all the information necessary to calculate any metric that you want for a task.
Burn provides two basic output types: `ClassificationOutput` and `RegressionOutput`. They implement the necessary traits to be used with metrics. It is possible to create your own item, but that is beyond the scope of this guide.
Since the MNIST task is a classification problem, we will use the `ClassificationOutput` type.
```rust , ignore
impl<B: Backend> Model<B> {
    pub fn forward_classification(
        &self,
        images: Tensor<B, 3>,
        targets: Tensor<B, 1, Int>,
    ) -> ClassificationOutput<B> {
        let output = self.forward(images);
        let loss = CrossEntropyLoss::new(None).forward(output.clone(), targets.clone());

        ClassificationOutput::new(loss, output, targets)
    }
}
```
As evident from the preceding code block, we employ the cross-entropy loss module for loss calculation, without the inclusion of any padding token.
We then return the classification output containing the loss, the output tensor with all logits, and the targets.
Please take note that tensor operations receive owned tensors as input.
To reuse a tensor multiple times, you need to use the `clone()` function.
There's no need to worry: this process won't copy the actual tensor data; it will simply indicate that the tensor is used in multiple instances, implying that certain operations won't be performed in place.
In summary, our API has been designed with owned tensors to optimize performance.
Moving forward, we will proceed with the implementation of both the training and validation steps for our model.
```rust , ignore
impl<B: ADBackend> TrainStep<MNISTBatch<B>, ClassificationOutput<B>> for Model<B> {
    fn step(&self, batch: MNISTBatch<B>) -> TrainOutput<ClassificationOutput<B>> {
        let item = self.forward_classification(batch.images, batch.targets);

        TrainOutput::new(self, item.loss.backward(), item)
    }
}

impl<B: Backend> ValidStep<MNISTBatch<B>, ClassificationOutput<B>> for Model<B> {
    fn step(&self, batch: MNISTBatch<B>) -> ClassificationOutput<B> {
        self.forward_classification(batch.images, batch.targets)
    }
}
```
Here we define the input and output types as generic arguments in `TrainStep` and `ValidStep`; in our case, `MNISTBatch` and `ClassificationOutput`.
In the training step, the computation of gradients is straightforward, necessitating a simple invocation of `backward()` on the loss.
Note that, contrary to PyTorch, gradients are not stored alongside each tensor parameter, but are rather returned by the backward pass, as such: `let gradients = loss.backward();`.
The gradient of a parameter can be obtained with the `grad` function: `let grad = tensor.grad(&gradients);`.
Although it is not necessary when using the learner struct and the optimizers, it can prove quite useful when debugging or writing custom training loops, as sketched below.
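A minimal sketch of such gradient inspection, assuming `B: ADBackend` (the `weight` field access and the `val()` call are illustrative assumptions, not code from this guide):
```rust , ignore
let gradients = loss.backward();

// `grad` returns an `Option`, since a parameter may not have participated
// in the computation of the loss.
let weight_grad = model.linear1.weight.val().grad(&gradients);
```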
One of the differences between the training and the validation steps is that the former requires the backend to implement `ADBackend` and not just `Backend`.
Otherwise, the `backward` function is not available, as the backend does not support autodiff.
We will see later how to create a backend with autodiff support.
Let us move on to establishing the practical training configuration.
```rust , ignore
#[derive(Config)]
pub struct TrainingConfig {
    pub model: ModelConfig,
    pub optimizer: AdamConfig,
    #[config(default = 10)]
    pub num_epochs: usize,
    #[config(default = 64)]
    pub batch_size: usize,
    #[config(default = 4)]
    pub num_workers: usize,
    #[config(default = 42)]
    pub seed: u64,
    #[config(default = 1.0e-4)]
    pub learning_rate: f64,
}

pub fn train<B: ADBackend>(artifact_dir: &str, config: TrainingConfig, device: B::Device) {
    std::fs::create_dir_all(artifact_dir).ok();
    config
        .save(&format!("{artifact_dir}/config.json"))
        .expect("Save without error");

    B::seed(config.seed);

    let batcher_train = MNISTBatcher::<B>::new(device.clone());
    let batcher_valid = MNISTBatcher::<B::InnerBackend>::new(device.clone());

    let dataloader_train = DataLoaderBuilder::new(batcher_train)
        .batch_size(config.batch_size)
        .shuffle(config.seed)
        .num_workers(config.num_workers)
        .build(MNISTDataset::train());

    let dataloader_test = DataLoaderBuilder::new(batcher_valid)
        .batch_size(config.batch_size)
        .shuffle(config.seed)
        .num_workers(config.num_workers)
        .build(MNISTDataset::test());

    let learner = LearnerBuilder::new(artifact_dir)
        .metric_train_plot(AccuracyMetric::new())
        .metric_valid_plot(AccuracyMetric::new())
        .metric_train_plot(LossMetric::new())
        .metric_valid_plot(LossMetric::new())
        .with_file_checkpointer(1, CompactRecorder::new())
        .devices(vec![device])
        .num_epochs(config.num_epochs)
        .build(
            config.model.init::<B>(),
            config.optimizer.init(),
            config.learning_rate,
        );

    let model_trained = learner.fit(dataloader_train, dataloader_test);

    CompactRecorder::new()
        .record(
            model_trained.into_record(),
            format!("{artifact_dir}/model").into(),
        )
        .expect("Failed to save trained model");
}
```
It is a good practice to use the `Config` derive to create the experiment configuration.
In the `train` function, the first thing we are doing is making sure the `artifact_dir` exists, using the standard Rust library for file manipulation.
All checkpoints, logging and metrics will be stored under this directory.
We then initialize our dataloaders using our previously created batcher.
Since no automatic differentiation is needed during the validation phase, the backend used for the corresponding batcher is `B::InnerBackend` (see [Backend](./backend.md)).
The autodiff capabilities are available through the type system, making it nearly impossible to forget to deactivate gradient calculation.
Next, we create our learner with the accuracy and loss metrics on both training and validation steps, along with the device and the number of epochs.
We also configure the checkpointer using the `CompactRecorder` to indicate how weights should be stored.
This struct implements the `Recorder` trait, which makes it capable of saving records for persistency.
We then build the learner with the model, the optimizer and the learning rate.
Notably, the third argument of the build function should actually be a learning rate _scheduler_. When provided with a float, as in our example, it is automatically transformed into a _constant_ learning rate scheduler.
The learning rate is not part of the optimizer config, as is often done in other frameworks, but is rather passed as a parameter when executing the optimizer step.
This avoids having to mutate the state of the optimizer and is therefore more functional.
It makes no difference when using the learner struct, but it will be an essential nuance to grasp if you implement your own training loop, as sketched below.
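As a hedged sketch of that nuance, a manual optimizer step might look like the following (assuming the `Optimizer` and `GradientsParams` APIs; `model`, `optim` and `loss` are placeholders for your own values):
```rust , ignore
use burn::optim::{GradientsParams, Optimizer};

let gradients = loss.backward();
let gradients = GradientsParams::from_grads(gradients, &model);
// The learning rate is an argument of `step`, not part of the optimizer state.
let model = optim.step(config.learning_rate, model, gradients);
```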
Once the learner is created, we can simply call `fit` and provide the training and validation dataloaders.
For the sake of simplicity in this example, we employ the test set as the validation set; however, we do not recommend this practice for actual usage.
Finally, the trained model is returned by the `fit` method, and the only remaining task is saving the trained weights using the `CompactRecorder`.
This recorder employs the `MessagePack` format with `gzip` compression, `f16` for floats and `i32` for integers. Other recorders are available, offering support for various formats, such as `BinCode` and `JSON`, with or without compression. Any backend, regardless of precision, can load recorded data of any kind.
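For instance, a hedged sketch of saving with a human-readable, full-precision recorder instead (the `PrettyJsonFileRecorder` and `FullPrecisionSettings` names follow the record API and should be treated as assumptions):
```rust , ignore
use burn::record::{FullPrecisionSettings, PrettyJsonFileRecorder, Recorder};

PrettyJsonFileRecorder::<FullPrecisionSettings>::new()
    .record(
        model_trained.into_record(),
        format!("{artifact_dir}/model").into(),
    )
    .expect("Failed to save trained model");
```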

View File

@ -1 +0,0 @@
# Help

View File

@ -1 +1,23 @@
# Motivation
# Why Burn?
Why bother with the effort of creating an entirely new deep learning framework from scratch when PyTorch, TensorFlow, and other frameworks already exist?
Spoiler alert: Burn isn't merely a replication of PyTorch or TensorFlow in Rust.
It represents a novel approach, placing significant emphasis on making the right compromises in the right areas to facilitate exceptional flexibility, high performance, and a seamless developer experience.
Burn isn't a framework specialized for only one type of application; it is designed to serve as a versatile framework suitable for a wide range of research and production uses.
The foundation of Burn's design revolves around three key user profiles:
**Machine Learning Researchers** require tools to construct and execute experiments efficiently.
It's essential for them to iterate quickly on their ideas and design testable experiments which can help them discover new findings.
The framework should facilitate the swift implementation of cutting-edge research while ensuring fast execution for testing.
**Machine Learning Engineers** are another important demographic to keep in mind.
Their focus leans less on swift implementation and more on establishing robustness, seamless deployment, and cost-effective operations.
They seek dependable, economical models capable of achieving objectives without excessive expense.
The whole machine learning workflow, from training to inference, must be as efficient as possible, with minimal unpredictable behavior.
**Low-level Software Engineers** working with hardware vendors want their processing units to run models as fast as possible to gain a competitive advantage.
This endeavor involves harnessing hardware-specific features such as Tensor Cores on Nvidia hardware.
Since they are mostly working at a system level, they want to have absolute control over how the computation will be executed.
The goal of Burn is to satisfy all of those personas!

View File

@ -0,0 +1,7 @@
# Overview
This book will help you get started with the Burn deep learning framework.
Although it is still in progress, this book is planned to be divided into multiple sections, catering to complete beginners and advanced users alike. For the time being, we offer an introductory guide in which we perform training and inference on a simple model, while highlighting certain peculiarities of Burn that set it apart from other frameworks.
Throughout the book, we assume a basic understanding of Rust and deep learning concepts.

View File

@ -1 +0,0 @@
# Usage

View File

@ -1 +0,0 @@
# Installation

View File

@ -1 +0,0 @@
# Quick Start

examples/guide/Cargo.toml (new file)
View File

@ -0,0 +1,18 @@
[package]
authors = ["nathanielsimard <nathaniel.simard.42@gmail.com>"]
edition = "2021"
license = "MIT OR Apache-2.0"
name = "guide"
publish = false
version = "0.9.0"

[dependencies]
burn = {path = "../../burn"}
burn-autodiff = {path = "../../burn-autodiff"}
burn-wgpu = {path = "../../burn-wgpu"}
burn-train = {path = "../../burn-train"}
burn-dataset = {path = "../../burn-dataset"}
log = {workspace = true}

# Serialization
serde = {workspace = true, features = ["std", "derive"]}

View File

@ -0,0 +1,23 @@
use burn::optim::AdamConfig;
use burn_dataset::Dataset;
use guide::{model::ModelConfig, training::TrainingConfig};

fn main() {
    type MyBackend = burn_wgpu::WgpuBackend<burn_wgpu::AutoGraphicsApi, f32, i32>;
    type MyAutodiffBackend = burn_autodiff::ADBackendDecorator<MyBackend>;

    let device = burn_wgpu::WgpuDevice::default();
    let artifact_dir = "/tmp/guide";

    guide::training::train::<MyAutodiffBackend>(
        artifact_dir,
        TrainingConfig::new(ModelConfig::new(10, 512), AdamConfig::new()),
        device.clone(),
    );
    guide::inference::infer::<MyBackend>(
        artifact_dir,
        device,
        burn_dataset::source::huggingface::MNISTDataset::test()
            .get(42)
            .unwrap(),
    );
}

View File

@ -0,0 +1,45 @@
use burn::{
    data::{dataloader::batcher::Batcher, dataset::source::huggingface::MNISTItem},
    tensor::{backend::Backend, Data, ElementConversion, Int, Tensor},
};

pub struct MNISTBatcher<B: Backend> {
    device: B::Device,
}

impl<B: Backend> MNISTBatcher<B> {
    pub fn new(device: B::Device) -> Self {
        Self { device }
    }
}

#[derive(Clone, Debug)]
pub struct MNISTBatch<B: Backend> {
    pub images: Tensor<B, 3>,
    pub targets: Tensor<B, 1, Int>,
}

impl<B: Backend> Batcher<MNISTItem, MNISTBatch<B>> for MNISTBatcher<B> {
    fn batch(&self, items: Vec<MNISTItem>) -> MNISTBatch<B> {
        let images = items
            .iter()
            .map(|item| Data::<f32, 2>::from(item.image))
            .map(|data| Tensor::<B, 2>::from_data(data.convert()))
            .map(|tensor| tensor.reshape([1, 28, 28]))
            // Normalize: scale to [0,1] and make the mean = 0 and std = 1
            // values mean=0.1307, std=0.3081 were copied from the PyTorch MNIST example
            // https://github.com/pytorch/examples/blob/54f4572509891883a947411fd7239237dd2a39c3/mnist/main.py#L122
            .map(|tensor| ((tensor / 255) - 0.1307) / 0.3081)
            .collect();

        let targets = items
            .iter()
            .map(|item| Tensor::<B, 1, Int>::from_data([(item.label as i64).elem()]))
            .collect();

        let images = Tensor::cat(images, 0).to_device(&self.device);
        let targets = Tensor::cat(targets, 0).to_device(&self.device);

        MNISTBatch { images, targets }
    }
}

View File

@ -0,0 +1,27 @@
use crate::{data::MNISTBatcher, training::TrainingConfig};
use burn::{
    config::Config,
    data::dataloader::batcher::Batcher,
    module::Module,
    record::{CompactRecorder, Recorder},
    tensor::backend::Backend,
};
use burn_dataset::source::huggingface::MNISTItem;

pub fn infer<B: Backend>(artifact_dir: &str, device: B::Device, item: MNISTItem) {
    let config =
        TrainingConfig::load(&format!("{artifact_dir}/config.json")).expect("A config exists");
    let record = CompactRecorder::new()
        .load(format!("{artifact_dir}/model").into())
        .expect("Trained model should exist");

    let model = config.model.init_with::<B>(record).to_device(&device);

    let label = item.label;
    let batcher = MNISTBatcher::new(device);
    let batch = batcher.batch(vec![item]);
    let output = model.forward(batch.images);
    let predicted = output.argmax(1).flatten::<1>(0, 1).into_scalar();

    println!("Predicted {} Expected {}", predicted, label);
}

View File

@ -0,0 +1,4 @@
pub mod data;
pub mod inference;
pub mod model;
pub mod training;

View File

@ -0,0 +1,83 @@
use burn::{
    config::Config,
    module::Module,
    nn::{
        conv::{Conv2d, Conv2dConfig},
        pool::{AdaptiveAvgPool2d, AdaptiveAvgPool2dConfig},
        Dropout, DropoutConfig, Linear, LinearConfig, ReLU,
    },
    tensor::{backend::Backend, Tensor},
};

#[derive(Module, Debug)]
pub struct Model<B: Backend> {
    conv1: Conv2d<B>,
    conv2: Conv2d<B>,
    pool: AdaptiveAvgPool2d,
    dropout: Dropout,
    linear1: Linear<B>,
    linear2: Linear<B>,
    activation: ReLU,
}

#[derive(Config, Debug)]
pub struct ModelConfig {
    num_classes: usize,
    hidden_size: usize,
    #[config(default = "0.5")]
    dropout: f64,
}

impl ModelConfig {
    /// Returns the initialized model.
    pub fn init<B: Backend>(&self) -> Model<B> {
        Model {
            conv1: Conv2dConfig::new([1, 8], [3, 3]).init(),
            conv2: Conv2dConfig::new([8, 16], [3, 3]).init(),
            pool: AdaptiveAvgPool2dConfig::new([8, 8]).init(),
            activation: ReLU::new(),
            linear1: LinearConfig::new(16 * 8 * 8, self.hidden_size).init(),
            linear2: LinearConfig::new(self.hidden_size, self.num_classes).init(),
            dropout: DropoutConfig::new(self.dropout).init(),
        }
    }

    /// Returns the initialized model using the recorded weights.
    pub fn init_with<B: Backend>(&self, record: ModelRecord<B>) -> Model<B> {
        Model {
            conv1: Conv2dConfig::new([1, 8], [3, 3]).init_with(record.conv1),
            conv2: Conv2dConfig::new([8, 16], [3, 3]).init_with(record.conv2),
            pool: AdaptiveAvgPool2dConfig::new([8, 8]).init(),
            activation: ReLU::new(),
            linear1: LinearConfig::new(16 * 8 * 8, self.hidden_size).init_with(record.linear1),
            linear2: LinearConfig::new(self.hidden_size, self.num_classes)
                .init_with(record.linear2),
            dropout: DropoutConfig::new(self.dropout).init(),
        }
    }
}

impl<B: Backend> Model<B> {
    /// # Shapes
    ///   - Images [batch_size, height, width]
    ///   - Output [batch_size, num_classes]
    pub fn forward(&self, images: Tensor<B, 3>) -> Tensor<B, 2> {
        let [batch_size, height, width] = images.dims();

        // Create a channel at the second dimension.
        let x = images.reshape([batch_size, 1, height, width]);

        let x = self.conv1.forward(x); // [batch_size, 8, _, _]
        let x = self.dropout.forward(x);
        let x = self.conv2.forward(x); // [batch_size, 16, _, _]
        let x = self.dropout.forward(x);
        let x = self.activation.forward(x);

        let x = self.pool.forward(x); // [batch_size, 16, 8, 8]
        let x = x.reshape([batch_size, 16 * 8 * 8]);
        let x = self.linear1.forward(x);
        let x = self.dropout.forward(x);
        let x = self.activation.forward(x);

        self.linear2.forward(x) // [batch_size, num_classes]
    }
}

View File

@ -0,0 +1,112 @@
use crate::{
    data::{MNISTBatch, MNISTBatcher},
    model::{Model, ModelConfig},
};
use burn::{
    self,
    config::Config,
    data::dataloader::DataLoaderBuilder,
    module::Module,
    nn::loss::CrossEntropyLoss,
    optim::AdamConfig,
    record::{CompactRecorder, Recorder},
    tensor::{
        backend::{ADBackend, Backend},
        Int, Tensor,
    },
};
use burn_dataset::source::huggingface::MNISTDataset;
use burn_train::{
    metric::{AccuracyMetric, LossMetric},
    ClassificationOutput, LearnerBuilder, TrainOutput, TrainStep, ValidStep,
};

impl<B: Backend> Model<B> {
    pub fn forward_classification(
        &self,
        images: Tensor<B, 3>,
        targets: Tensor<B, 1, Int>,
    ) -> ClassificationOutput<B> {
        let output = self.forward(images);
        let loss = CrossEntropyLoss::new(None).forward(output.clone(), targets.clone());

        ClassificationOutput::new(loss, output, targets)
    }
}

impl<B: ADBackend> TrainStep<MNISTBatch<B>, ClassificationOutput<B>> for Model<B> {
    fn step(&self, batch: MNISTBatch<B>) -> TrainOutput<ClassificationOutput<B>> {
        let item = self.forward_classification(batch.images, batch.targets);

        TrainOutput::new(self, item.loss.backward(), item)
    }
}

impl<B: Backend> ValidStep<MNISTBatch<B>, ClassificationOutput<B>> for Model<B> {
    fn step(&self, batch: MNISTBatch<B>) -> ClassificationOutput<B> {
        self.forward_classification(batch.images, batch.targets)
    }
}

#[derive(Config)]
pub struct TrainingConfig {
    pub model: ModelConfig,
    pub optimizer: AdamConfig,
    #[config(default = 10)]
    pub num_epochs: usize,
    #[config(default = 64)]
    pub batch_size: usize,
    #[config(default = 4)]
    pub num_workers: usize,
    #[config(default = 42)]
    pub seed: u64,
    #[config(default = 1.0e-4)]
    pub learning_rate: f64,
}

pub fn train<B: ADBackend>(artifact_dir: &str, config: TrainingConfig, device: B::Device) {
    std::fs::create_dir_all(artifact_dir).ok();
    config
        .save(&format!("{artifact_dir}/config.json"))
        .expect("Save without error");

    B::seed(config.seed);

    let batcher_train = MNISTBatcher::<B>::new(device.clone());
    let batcher_valid = MNISTBatcher::<B::InnerBackend>::new(device.clone());

    let dataloader_train = DataLoaderBuilder::new(batcher_train)
        .batch_size(config.batch_size)
        .shuffle(config.seed)
        .num_workers(config.num_workers)
        .build(MNISTDataset::train());

    let dataloader_test = DataLoaderBuilder::new(batcher_valid)
        .batch_size(config.batch_size)
        .shuffle(config.seed)
        .num_workers(config.num_workers)
        .build(MNISTDataset::test());

    let learner = LearnerBuilder::new(artifact_dir)
        .metric_train_plot(AccuracyMetric::new())
        .metric_valid_plot(AccuracyMetric::new())
        .metric_train_plot(LossMetric::new())
        .metric_valid_plot(LossMetric::new())
        .with_file_checkpointer(1, CompactRecorder::new())
        .devices(vec![device])
        .num_epochs(config.num_epochs)
        .build(
            config.model.init::<B>(),
            config.optimizer.init(),
            config.learning_rate,
        );

    let model_trained = learner.fit(dataloader_train, dataloader_test);

    CompactRecorder::new()
        .record(
            model_trained.into_record(),
            format!("{artifact_dir}/model").into(),
        )
        .expect("Failed to save trained model");
}