Pipelines
Pipelines are a simple way to use models for inference. They are objects that abstract away most of the complex code in the library, offering a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the task summary for examples of use.
There are two categories of pipeline abstractions to be aware of:
- The pipeline, which is the most powerful object encapsulating all other pipelines.
- Task-specific pipelines, which are available for audio, computer vision, natural language processing, and multimodal tasks.
The pipeline abstraction
The pipeline abstraction is a wrapper around all the other available pipelines. It is instantiated like any other pipeline but adds some quality-of-life conveniences.
Simple call on one item:
>>> from transformers import pipeline
>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
If you want to use a specific model from the hub, you can omit the task if the model on the hub already defines it:
>>> pipe = pipeline(model="FacebookAI/roberta-large-mnli")
>>> pipe("This restaurant is awesome")
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
To call a pipeline on many items, you can call it with a list.
>>> pipe = pipeline("text-classification")
>>> pipe(["This restaurant is awesome", "This restaurant is awful"])
[{'label': 'POSITIVE', 'score': 0.9998743534088135},
{'label': 'NEGATIVE', 'score': 0.9996669292449951}]
To iterate over full datasets, it is recommended to use a dataset directly. This means you don't need to allocate the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on GPU. If it doesn't, don't hesitate to create an issue.
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm
pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")
# KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item
# as we're not interested in the *target* part of the dataset. For sentence pair use KeyPairDataset
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
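To make the comment about KeyDataset concrete, here is a minimal plain-Python stand-in. `KeyView` and the sample rows are hypothetical names used only for illustration, not the library class; the sketch assumes nothing more than that the wrapper exposes a single field of each dict item.

```python
class KeyView:
    """Hypothetical stand-in illustrating what a key-selecting dataset wrapper does."""

    def __init__(self, dataset, key):
        self.dataset = dataset
        self.key = key

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, i):
        # Return only the requested field, dropping the rest of the row
        return self.dataset[i][self.key]


# Made-up rows shaped like dataset items: an input field plus a target field
rows = [
    {"file": "a.wav", "text": "HELLO"},
    {"file": "b.wav", "text": "WORLD"},
]
files = KeyView(rows, "file")
print(len(files), files[0], files[1])
```

The pipeline then only ever sees the selected field (here the file names), never the target column.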
For ease of use, a generator is also possible:
from transformers import pipeline
pipe = pipeline("text-classification")
def data():
    while True:
        # This could come from a dataset, a database, a queue or HTTP request
        # in a server
        # Caveat: because this is iterative, you cannot use `num_workers > 1` variable
        # to use multiple threads to preprocess data. You can still have 1 thread that
        # does the preprocessing while the main runs the big inference
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # {"label": ...., "score": ....}
    # ....
autodoc pipeline
Pipeline batching
All pipelines can use batching. This will work whenever the pipeline uses its streaming ability (so when passing lists or Dataset or generator).
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets
dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
    # [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
    # Exactly the same output as before, but the contents are passed
    # as batches to the model
However, this is not automatically a win for performance. It can be either a 10x speedup or 5x slowdown depending on hardware, data and the actual model being used.
Example where it's mostly a speedup:
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm
pipe = pipeline("text-classification", device=0)
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()
for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
# On GTX 970
------------------------------
Streaming no batching
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
------------------------------
Streaming batch_size=64
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
------------------------------
Streaming batch_size=256
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
Example where it's mostly a slowdown:
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
This dataset contains an occasional very long sentence compared to the others. In that case, the whole batch needs to be 400 tokens long, so the batch will be [64, 400] instead of [64, 4], leading to the large slowdown. Even worse, on bigger batches, the program simply crashes.
------------------------------
Streaming no batching
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]
------------------------------
Streaming batch_size=64
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]
------------------------------
Streaming batch_size=256
0%| | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
....
    q = q / math.sqrt(dim_per_head)  # (bs, n_heads, q_length, dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
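The slowdown follows from simple arithmetic on padding. The sketch below counts how many token slots a padded batch occupies versus how many carry real tokens, assuming (as in the text) 4 tokens for the short sentence and 400 for the long one. `padded_slots` is a hypothetical helper, and special tokens are ignored.

```python
def padded_slots(lengths):
    """Token slots used when every sequence is padded to the longest one."""
    return len(lengths) * max(lengths)


# One long sequence (400 tokens) among 63 short ones (4 tokens each)
lengths = [400] + [4] * 63
real_tokens = sum(lengths)      # 400 + 63 * 4 = 652
slots = padded_slots(lengths)   # 64 * 400 = 25600
print(real_tokens, slots)
```

With uniform lengths the same batch would occupy only 64 * 4 = 256 slots, so a single long outlier makes the model process 100 times more slots for that batch.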
There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases. For users, a rule of thumb is:
- Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the only way to go.
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data), on GPU, then:
  - If you have no clue about the size of the sequence_length ("natural" data), don't batch by default. Measure, tentatively try adding batching, and add OOM checks to recover when it fails (and it will at some point if you don't control the sequence_length).
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting. Measure and push it until you get OOMs.
  - The larger the GPU, the more likely batching is going to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely.
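One way to handle OOMs nicely is to retry with a smaller batch size. The following is a minimal sketch, not a library feature: `run_with_oom_fallback` and `fake_pipe` are hypothetical names, and the sketch assumes that an out-of-memory failure surfaces as a RuntimeError whose message contains "out of memory", as in the traceback shown earlier.

```python
def run_with_oom_fallback(pipe, inputs, batch_size, min_batch_size=1):
    """Call `pipe` and halve `batch_size` each time an out-of-memory error occurs."""
    while True:
        try:
            # Materialize results so a failure mid-stream triggers a full retry
            return list(pipe(inputs, batch_size=batch_size)), batch_size
        except RuntimeError as err:
            # Assumption: OOM errors are RuntimeErrors mentioning "out of memory"
            if "out of memory" not in str(err).lower() or batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)


def fake_pipe(inputs, batch_size=1):
    """Stand-in for a real pipeline: fails whenever the batch is larger than 8."""
    if batch_size > 8:
        raise RuntimeError("CUDA out of memory (simulated)")
    return [{"label": "POSITIVE", "score": 1.0} for _ in inputs]


results, used = run_with_oom_fallback(fake_pipe, ["This is a test"] * 5, batch_size=64)
print(used, len(results))
```

Any callable that accepts `batch_size` can be passed in place of `fake_pipe`. With a real pipeline on GPU you may also need to release cached memory before retrying, and restarting from scratch is wasteful on large datasets, so treat this only as a starting point.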
Pipeline chunk batching
zero-shot-classification and question-answering are slightly specific in the sense that a single input might yield multiple forward passes of a model. Under normal circumstances, this would cause issues with the batch_size argument.
In order to circumvent this issue, both of these pipelines are implemented as ChunkPipeline instead of regular Pipeline. In short:
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
outputs = pipe.postprocess(model_outputs)
Now becomes:
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
This should be very transparent to your code because the pipelines are used in the same way.
This is a simplified view, since the pipeline can handle batching automatically too! This means you don't have to care about how many forward passes your inputs are actually going to trigger, and you can optimize the batch_size independently of the inputs. The caveats from the previous section still apply.
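The decoupling between inputs and batch_size can be illustrated with a toy, pure-Python model of chunking. This is only a sketch under stated assumptions, not the library's implementation: the functions mirror the preprocess/forward/postprocess names used above, while `run`, the word-based chunking, and the `is_last` flag used to regroup chunks are choices made for this example.

```python
def preprocess(text, chunk_size=3):
    """Yield one chunk per `chunk_size` words, flagging the last chunk of the input."""
    words = text.split()
    starts = range(0, len(words), chunk_size)
    for n, start in enumerate(starts):
        yield {"words": words[start:start + chunk_size], "is_last": n == len(starts) - 1}


def forward(batch):
    """Toy 'model': one forward pass over a batch of chunks, returning word counts."""
    return [{"count": len(chunk["words"]), "is_last": chunk["is_last"]} for chunk in batch]


def postprocess(all_model_outputs):
    """Combine the outputs of every chunk belonging to one input."""
    return sum(out["count"] for out in all_model_outputs)


def run(inputs, batch_size):
    # Flatten chunks from every input into one stream, ignoring input boundaries
    chunks = [chunk for text in inputs for chunk in preprocess(text)]
    results, pending, forward_passes = [], [], 0
    for i in range(0, len(chunks), batch_size):
        forward_passes += 1
        for out in forward(chunks[i:i + batch_size]):
            pending.append(out)
            if out["is_last"]:
                # All chunks of the current input have been seen: finalize it
                results.append(postprocess(pending))
                pending = []
    return results, forward_passes


results, passes = run(["a b c d e f g", "h i"], batch_size=2)
print(results, passes)
```

Here the first input produces three chunks and the second produces one, yet they are served by two forward passes of two chunks each, and each input still gets exactly one result.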
Pipeline custom code
If you want to override a specific pipeline, don't hesitate to create an issue for your task at hand. The goal of the pipeline is to be easy to use and support most cases, so transformers could maybe support your use case.
If you simply want to try it yourself, you can:
- Subclass your pipeline of choice
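from transformers import TextClassificationPipeline, pipeline


class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here
        outputs = super().postprocess(model_outputs, **kwargs)
        # The parent returns one dict or a list of dicts depending on its arguments
        items = outputs if isinstance(outputs, list) else [outputs]
        for item in items:
            item["score"] = item["score"] * 100
        # And here
        return outputs


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...)
# or if you use the *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)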
That should enable you to do all the custom code you want.
Implementing a pipeline
Audio
Pipelines available for audio tasks include the following.
AudioClassificationPipeline
autodoc AudioClassificationPipeline - call - all
AutomaticSpeechRecognitionPipeline
autodoc AutomaticSpeechRecognitionPipeline - call - all
TextToAudioPipeline
autodoc TextToAudioPipeline - call - all
ZeroShotAudioClassificationPipeline
autodoc ZeroShotAudioClassificationPipeline - call - all
Computer vision
Pipelines available for computer vision tasks include the following.
DepthEstimationPipeline
autodoc DepthEstimationPipeline - call - all
ImageClassificationPipeline
autodoc ImageClassificationPipeline - call - all
ImageSegmentationPipeline
autodoc ImageSegmentationPipeline - call - all
ImageToImagePipeline
autodoc ImageToImagePipeline - call - all
ObjectDetectionPipeline
autodoc ObjectDetectionPipeline - call - all
VideoClassificationPipeline
autodoc VideoClassificationPipeline - call - all
ZeroShotImageClassificationPipeline
autodoc ZeroShotImageClassificationPipeline - call - all
ZeroShotObjectDetectionPipeline
autodoc ZeroShotObjectDetectionPipeline - call - all
Natural Language Processing
Pipelines available for natural language processing tasks include the following.
ConversationalPipeline
autodoc Conversation
autodoc ConversationalPipeline - call - all
FillMaskPipeline
autodoc FillMaskPipeline - call - all
QuestionAnsweringPipeline
autodoc QuestionAnsweringPipeline - call - all
SummarizationPipeline
autodoc SummarizationPipeline - call - all
TableQuestionAnsweringPipeline
autodoc TableQuestionAnsweringPipeline - call
TextClassificationPipeline
autodoc TextClassificationPipeline - call - all
TextGenerationPipeline
autodoc TextGenerationPipeline - call - all
Text2TextGenerationPipeline
autodoc Text2TextGenerationPipeline - call - all
TokenClassificationPipeline
autodoc TokenClassificationPipeline - call - all
TranslationPipeline
autodoc TranslationPipeline - call - all
ZeroShotClassificationPipeline
autodoc ZeroShotClassificationPipeline - call - all
Multimodal
Pipelines available for multimodal tasks include the following.
DocumentQuestionAnsweringPipeline
autodoc DocumentQuestionAnsweringPipeline - call - all
FeatureExtractionPipeline
autodoc FeatureExtractionPipeline - call - all
ImageFeatureExtractionPipeline
autodoc ImageFeatureExtractionPipeline - call - all
ImageToTextPipeline
autodoc ImageToTextPipeline - call - all
MaskGenerationPipeline
autodoc MaskGenerationPipeline - call - all
VisualQuestionAnsweringPipeline
autodoc VisualQuestionAnsweringPipeline - call - all
Parent class: Pipeline
autodoc Pipeline