<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->

# Pipelines

The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of
the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
[task summary](../task_summary) for examples of use.

There are two categories of pipeline abstractions to be aware of:

- The [`pipeline`], which is the most powerful object encapsulating all other pipelines (it instantiates the matching task-specific pipeline for you, as sketched below).
- Task-specific pipelines are available for [audio](#audio), [computer vision](#computer-vision), [natural language processing](#natural-language-processing), and [multimodal](#multimodal) tasks.

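The two categories meet in practice: the [`pipeline`] factory simply builds an instance of the matching task-specific class. A minimal sketch of that relationship (only public `transformers` imports are used):

```python
from transformers import TextClassificationPipeline, pipeline

pipe = pipeline("text-classification")
# The factory returned an instance of the matching task-specific pipeline class
print(isinstance(pipe, TextClassificationPipeline))  # True
```
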
## The pipeline abstraction

The *pipeline* abstraction is a wrapper around all the other available pipelines. It is instantiated like any other
pipeline but provides additional quality-of-life features.

Simple call on one item:

```python
>>> from transformers import pipeline

>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
```

If you want to use a specific model from the [hub](https://huggingface.co), you can omit the task if the model on
the hub already defines it:

```python
>>> pipe = pipeline(model="FacebookAI/roberta-large-mnli")
>>> pipe("This restaurant is awesome")
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
```

To call a pipeline on many items, you can call it with a *list*.

```python
>>> pipe = pipeline("text-classification")
>>> pipe(["This restaurant is awesome", "This restaurant is awful"])
[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
```

To iterate over full datasets, it is recommended to use a `dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.

```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only *pt*) will simply return the requested key from each dict yielded by the dataset,
# as we're not interested in the *target* part of the dataset. For sentence pairs, use KeyPairDataset (see below).
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
```
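
For sentence-pair tasks, `KeyPairDataset` plays the same role as `KeyDataset` but picks two keys per item and feeds them as `{"text", "text_pair"}`. A minimal sketch, assuming the GLUE MNLI dataset and an NLI model purely for illustration:

```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyPairDataset
from tqdm.auto import tqdm

pipe = pipeline(model="FacebookAI/roberta-large-mnli", device=0)
dataset = datasets.load_dataset("glue", "mnli", split="validation_matched")

# Each item is fed to the pipeline as {"text": item["premise"], "text_pair": item["hypothesis"]}
for out in tqdm(pipe(KeyPairDataset(dataset, "premise", "hypothesis"))):
    print(out)
    # {'label': 'ENTAILMENT', 'score': ...}
    # ....
```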

For ease of use, a generator is also possible:

```python
from transformers import pipeline

pipe = pipeline("text-classification")


def data():
    while True:
        # This could come from a dataset, a database, a queue or an HTTP request
        # in a server.
        # Caveat: because this is iterative, you cannot use `num_workers > 1`
        # to preprocess the data with multiple threads. You can still have 1 thread
        # doing the preprocessing while the main thread runs the big inference.
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # {'label': 'POSITIVE', 'score': ...}
    # {'label': 'POSITIVE', 'score': ...}
    # ....
```

[[autodoc]] pipeline

## Pipeline batching

All pipelines can use batching. This will work
whenever the pipeline uses its streaming ability (so when passing lists or `Dataset` or `generator`).

```python
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
    # [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
    # Exactly the same output as before, but the contents are passed
    # as batches to the model
```

<Tip warning={true}>

However, this is not automatically a win for performance. It can be either a 10x speedup or a 5x slowdown depending
on hardware, data and the actual model being used.

</Tip>

Example where it's mostly a speedup:

```python
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```

```
# On GTX 970
------------------------------
Streaming no batching
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
------------------------------
Streaming batch_size=64
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
------------------------------
Streaming batch_size=256
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
```

Example where it's mostly a slowdown:

```python
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```

This dataset yields an occasional very long sentence compared to the others. In that case, the **whole** batch will need to be 400
tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to the high slowdown. Even worse, on
bigger batches, the program simply crashes.

```
------------------------------
Streaming no batching
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]
------------------------------
Streaming batch_size=64
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]
------------------------------
Streaming batch_size=256
  0%|                                                                      | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
    ....
    q = q / math.sqrt(dim_per_head)  # (bs, n_heads, q_length, dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
```
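
The cost comes from padding: when batching, every sequence in a batch is padded up to the longest one. You can reproduce the effect outside of the pipeline by tokenizing such a batch yourself; a minimal sketch, assuming a `distilbert-base-uncased` tokenizer purely for illustration:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

# 63 short sentences plus 1 very long one, like the dataset above
batch = ["This is a test"] * 63 + ["This is a test " * 100]
encoded = tokenizer(batch, padding=True, return_tensors="pt")
print(encoded["input_ids"].shape)
# Roughly [64, 400]: every short sentence is padded up to the longest one in the
# batch, so the model does far more work than a [64, ~6] batch would require.
```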

There are no good (general) solutions for this problem, and your mileage may vary depending on your use case. A rule of
thumb for users is:

- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
  only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data) on GPU, then:

  - If you have no clue about the size of the sequence_length ("natural" data), by default don't batch; measure and
    tentatively try to add it, and add OOM checks to recover when it fails (and it will, at some point, if you don't
    control the sequence_length).
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting; measure and push
    it until you get OOMs.
  - The larger the GPU, the more likely batching is going to be interesting.
  - As soon as you enable batching, make sure you can handle OOMs nicely, for instance along the lines sketched below.
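
There is no built-in OOM recovery in the pipelines; a minimal, illustrative sketch of a batch-size backoff (the `run_with_backoff` helper and the halving strategy are assumptions, not part of `transformers`):

```python
import torch
from transformers import pipeline

pipe = pipeline("text-classification", device=0)


def run_with_backoff(items, batch_size=64):
    # Illustrative only: restart with a smaller batch size whenever CUDA runs out of memory.
    # Note that a failed attempt reprocesses the items from the start.
    while batch_size >= 1:
        try:
            return list(pipe(items, batch_size=batch_size))
        except RuntimeError as e:
            if "out of memory" not in str(e):
                raise
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError("Even batch_size=1 does not fit on this GPU")


results = run_with_backoff(["This is a test"] * 1000, batch_size=256)
```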

## Pipeline chunk batching

`zero-shot-classification` and `question-answering` are slightly specific in the sense that a single input might yield
multiple forward passes of a model. Under normal circumstances, this would cause issues with the `batch_size` argument.

To circumvent this issue, both of these pipelines are a bit specific: they are `ChunkPipeline` instead of
regular `Pipeline`. In short:

```python
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
outputs = pipe.postprocess(model_outputs)
```

Now becomes:

```python
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
```

This should be very transparent to your code because the pipelines are used in
the same way.

This is a simplified view, since the pipeline can handle the batching automatically! This means you don't have to care
about how many forward passes your inputs are actually going to trigger; you can optimize the `batch_size`
independently of the inputs. The caveats from the previous section still apply.
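
For instance, a `question-answering` pipeline accepts a `batch_size` even though each question/context pair may internally be split into several chunks (the checkpoint below is just an illustrative choice):

```python
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

questions = [
    {"question": "Where do I live?", "context": "My name is Wolfgang and I live in Berlin."},
    {"question": "What is my name?", "context": "My name is Wolfgang and I live in Berlin."},
]

# batch_size controls how many chunks are sent through the model per forward pass,
# independently of how many chunks each input produces
for answer in qa(questions, batch_size=2):
    print(answer)
    # {'score': ..., 'start': ..., 'end': ..., 'answer': 'Berlin'}
    # {'score': ..., 'start': ..., 'end': ..., 'answer': 'Wolfgang'}
```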

## Pipeline custom code

If you want to override a specific pipeline, don't hesitate to create an issue for your task at hand; the goal of the
pipelines is to be easy to use and support most cases, so `transformers` could maybe support your use case.

If you simply want to try it out, you can:

- Subclass your pipeline of choice

```python
from transformers import TextClassificationPipeline, pipeline


class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        scores = super().postprocess(model_outputs, **kwargs)
        # Your code goes here
        # And here
        return scores


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
```

That should enable you to do all the custom code you want.

## Implementing a pipeline

[Implementing a new pipeline](../add_new_pipeline)

## Audio

Pipelines available for audio tasks include the following.

### AudioClassificationPipeline

[[autodoc]] AudioClassificationPipeline
    - __call__
    - all

### AutomaticSpeechRecognitionPipeline

[[autodoc]] AutomaticSpeechRecognitionPipeline
    - __call__
    - all

### TextToAudioPipeline

[[autodoc]] TextToAudioPipeline
    - __call__
    - all

### ZeroShotAudioClassificationPipeline

[[autodoc]] ZeroShotAudioClassificationPipeline
    - __call__
    - all

## Computer vision

Pipelines available for computer vision tasks include the following.

### DepthEstimationPipeline

[[autodoc]] DepthEstimationPipeline
    - __call__
    - all

### ImageClassificationPipeline

[[autodoc]] ImageClassificationPipeline
    - __call__
    - all

### ImageSegmentationPipeline

[[autodoc]] ImageSegmentationPipeline
    - __call__
    - all

### ImageToImagePipeline

[[autodoc]] ImageToImagePipeline
    - __call__
    - all

### ObjectDetectionPipeline

[[autodoc]] ObjectDetectionPipeline
    - __call__
    - all

### VideoClassificationPipeline

[[autodoc]] VideoClassificationPipeline
    - __call__
    - all

### ZeroShotImageClassificationPipeline

[[autodoc]] ZeroShotImageClassificationPipeline
    - __call__
    - all

### ZeroShotObjectDetectionPipeline

[[autodoc]] ZeroShotObjectDetectionPipeline
    - __call__
    - all

## Natural Language Processing

Pipelines available for natural language processing tasks include the following.

### ConversationalPipeline

[[autodoc]] Conversation

[[autodoc]] ConversationalPipeline
    - __call__
    - all

### FillMaskPipeline

[[autodoc]] FillMaskPipeline
    - __call__
    - all

### QuestionAnsweringPipeline

[[autodoc]] QuestionAnsweringPipeline
    - __call__
    - all

### SummarizationPipeline

[[autodoc]] SummarizationPipeline
    - __call__
    - all

### TableQuestionAnsweringPipeline

[[autodoc]] TableQuestionAnsweringPipeline
    - __call__

### TextClassificationPipeline

[[autodoc]] TextClassificationPipeline
    - __call__
    - all

### TextGenerationPipeline

[[autodoc]] TextGenerationPipeline
    - __call__
    - all

### Text2TextGenerationPipeline

[[autodoc]] Text2TextGenerationPipeline
    - __call__
    - all

### TokenClassificationPipeline

[[autodoc]] TokenClassificationPipeline
    - __call__
    - all

### TranslationPipeline

[[autodoc]] TranslationPipeline
    - __call__
    - all

### ZeroShotClassificationPipeline

[[autodoc]] ZeroShotClassificationPipeline
    - __call__
    - all

## Multimodal

Pipelines available for multimodal tasks include the following.

### DocumentQuestionAnsweringPipeline

[[autodoc]] DocumentQuestionAnsweringPipeline
    - __call__
    - all

### FeatureExtractionPipeline

[[autodoc]] FeatureExtractionPipeline
    - __call__
    - all

### ImageFeatureExtractionPipeline

[[autodoc]] ImageFeatureExtractionPipeline
    - __call__
    - all

### ImageToTextPipeline

[[autodoc]] ImageToTextPipeline
    - __call__
    - all

### MaskGenerationPipeline

[[autodoc]] MaskGenerationPipeline
    - __call__
    - all

### VisualQuestionAnsweringPipeline

[[autodoc]] VisualQuestionAnsweringPipeline
    - __call__
    - all

## Parent class: `Pipeline`

[[autodoc]] Pipeline