<!--Copyright 2020 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Pipelines
The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of
the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the
[task summary](../task_summary) for examples of use.
There are two categories of pipeline abstractions to be aware of:

- The [`pipeline`], which is the most powerful object encapsulating all other pipelines.
- Task-specific pipelines, available for [audio](#audio), [computer vision](#computer-vision),
  [natural language processing](#natural-language-processing), and [multimodal](#multimodal) tasks. Both are shown
  in the sketch below.
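For instance, a minimal sketch contrasting the two categories; the `distilbert-base-uncased-finetuned-sst-2-english`
checkpoint is only an assumed example, any sequence-classification model works:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TextClassificationPipeline, pipeline

# Generic entry point: pipeline() resolves the task to the right task-specific class.
generic = pipeline("text-classification")

# Task-specific pipeline, instantiated directly from a model and tokenizer.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"  # assumed example checkpoint
specific = TextClassificationPipeline(
    model=AutoModelForSequenceClassification.from_pretrained(model_id),
    tokenizer=AutoTokenizer.from_pretrained(model_id),
)
```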
## The pipeline abstraction

The *pipeline* abstraction is a wrapper around all the other available pipelines. It is instantiated like any other
pipeline but can provide additional quality of life.
Simple call on one item:

```python
>>> from transformers import pipeline

>>> pipe = pipeline("text-classification")
>>> pipe("This restaurant is awesome")
[{'label': 'POSITIVE', 'score': 0.9998743534088135}]
```
If you want to use a specific model from the [hub](https://huggingface.co), you can omit the task if the model on
the hub already defines it:

```python
>>> pipe = pipeline(model="roberta-large-mnli")
>>> pipe("This restaurant is awesome")
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}]
```
To call a pipeline on many items, you can call it with a *list*:

```python
>>> pipe = pipeline("text-classification")
>>> pipe(["This restaurant is awesome", "This restaurant is awful"])
[{'label': 'POSITIVE', 'score': 0.9998743534088135},
 {'label': 'NEGATIVE', 'score': 0.9996669292449951}]
```
To iterate over full datasets it is recommended to use a `dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.
```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
from tqdm.auto import tqdm

pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0)
dataset = datasets.load_dataset("superb", name="asr", split="test")

# KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item
# as we're not interested in the *target* part of the dataset. For sentence pairs, use KeyPairDataset.
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
```
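For sentence pairs, `KeyPairDataset` plays the same role. A minimal sketch, assuming the `glue`/`mrpc` validation
split with its `sentence1`/`sentence2` columns and an example paraphrase checkpoint (`textattack/bert-base-uncased-MRPC`
is only an assumption, any pair-classification model works):

```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyPairDataset
from tqdm.auto import tqdm

# Assumed example checkpoint and dataset columns; adapt to your own sentence-pair task.
pipe = pipeline("text-classification", model="textattack/bert-base-uncased-MRPC", device=0)
dataset = datasets.load_dataset("glue", "mrpc", split="validation")

# KeyPairDataset feeds each row as a {"text": ..., "text_pair": ...} dict to the pipeline.
for out in tqdm(pipe(KeyPairDataset(dataset, "sentence1", "sentence2"))):
    print(out)
```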
For ease of use, a generator is also possible:

```python
from transformers import pipeline

pipe = pipeline("text-classification")


def data():
    while True:
        # This could come from a dataset, a database, a queue or HTTP request
        # in a server
        # Caveat: because this is iterative, you cannot use `num_workers > 1`
        # to use multiple threads to preprocess data. You can still have 1 thread that
        # does the preprocessing while the main one runs the big inference
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # {'label': 'POSITIVE', 'score': ...}
```
[[autodoc]] pipeline
## Pipeline batching

All pipelines can use batching. This will work
whenever the pipeline uses its streaming ability (so when passing lists or `Dataset` or `generator`).
```python
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyDataset
import datasets

dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised")
pipe = pipeline("text-classification", device=0)
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
```
<Tip warning={true}>

However, this is not automatically a win for performance. It can be either a 10x speedup or 5x slowdown depending
on hardware, data and the actual model being used.

Example where it's mostly a speedup:

</Tip>
```python
from transformers import pipeline
from torch.utils.data import Dataset
from tqdm.auto import tqdm

pipe = pipeline("text-classification", device=0)


class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
```
```
# On GTX 970
------------------------------
Streaming no batching
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s]
------------------------------
Streaming batch_size=64
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s]
------------------------------
Streaming batch_size=256
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s]
(diminishing returns, saturated the GPU)
```
Example where it's mostly a slowdown:
```python
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
```
This is an occasional very long sentence compared to the others. In that case, the **whole** batch will need to be 400
tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to a high slowdown. Even worse, on
bigger batches, the program simply crashes.
```
------------------------------
Streaming no batching
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s]
------------------------------
Streaming batch_size=8
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s]
------------------------------
Streaming batch_size=64
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s]
------------------------------
Streaming batch_size=256
  0%|                                                                                | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
....
    q = q / math.sqrt(dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
```
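The slowdown comes from padding: every sequence in a batch is padded to the longest one. A minimal sketch illustrating
this, assuming a `bert-base-uncased` tokenizer (any tokenizer works):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed example tokenizer

# 63 short items plus one long outlier, mimicking the dataset above
texts = ["This is a test"] * 63 + ["This is a test" * 100]

# With padding, the single long item inflates the whole [64, seq_len] tensor
batch = tokenizer(texts, padding=True, return_tensors="pt")
print(batch["input_ids"].shape)  # roughly [64, ~400] instead of [64, ~6]
```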
There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases. For
users, a rule of thumb is:

- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
  only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data) on GPU, then:
  - If you have no clue about the size of the sequence_length ("natural" data), by default don't batch, measure and
    try tentatively to add it, and add OOM checks to recover when it will fail (and it will at some point if you don't
    control the sequence_length), as in the sketch after this list.
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting; measure and push
    it until you get OOMs.
  - The larger the GPU, the more likely batching is going to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely.
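One possible shape for such an OOM check; `run_with_oom_fallback` is a hypothetical helper written for this sketch,
not a `transformers` API, and it assumes the inputs are a list (a generator would be partially consumed by a failed
attempt):

```python
import torch
from transformers import pipeline

pipe = pipeline("text-classification", device=0)


def run_with_oom_fallback(pipe, inputs, batch_size=64):
    # Hypothetical helper: halve the batch size until the batch fits on the GPU.
    while batch_size >= 1:
        try:
            return list(pipe(inputs, batch_size=batch_size))
        except RuntimeError as e:  # CUDA OOM surfaces as a RuntimeError
            if "out of memory" not in str(e):
                raise
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError("Even batch_size=1 does not fit on this GPU")


outputs = run_with_oom_fallback(pipe, ["This is a test"] * 1000, batch_size=256)
```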
## Pipeline chunk batching

`zero-shot-classification` and `question-answering` are slightly specific in the sense that a single input might yield
multiple forward passes of a model. Under normal circumstances, this would cause issues with the `batch_size` argument.

In order to circumvent this issue, both of these pipelines are a bit specific: they are `ChunkPipeline` instead of
regular `Pipeline`. In short:
```python
preprocessed = pipe.preprocess(inputs)
model_outputs = pipe.forward(preprocessed)
outputs = pipe.postprocess(model_outputs)
```
Now becomes:
```python
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
```
This should be very transparent to your code because the pipelines are used in
the same way.

This is a simplified view, since the pipeline can handle the batching automatically. This means you don't have to care
about how many forward passes your inputs are actually going to trigger; you can optimize the `batch_size`
independently of the inputs. The caveats from the previous section still apply.
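For instance, a minimal sketch with `question-answering`; the `distilbert-base-cased-distilled-squad` checkpoint is
only an assumed example:

```python
from transformers import pipeline

# question-answering is a ChunkPipeline: a long context is split into several chunks,
# each triggering its own forward pass.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

# batch_size groups the chunks fed to the model, independently of how many chunks
# each (question, context) pair produces.
outputs = qa(
    question=["Where does Wolfgang live?", "What is his name?"],
    context=["My name is Wolfgang and I live in Berlin."] * 2,
    batch_size=8,
)
print(outputs)
```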
## Pipeline custom code

If you want to override a specific pipeline, don't hesitate to create an issue for your task at hand: the goal of the
pipelines is to be easy to use and support most cases, so `transformers` could maybe support your use case.

If you simply want to try it out, you can:

- Subclass your pipeline of choice
```python
from transformers import TextClassificationPipeline, pipeline


class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here
        scores = super().postprocess(model_outputs, **kwargs)
        # And here
        return scores


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer)
# or if you use the *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
```
That should enable you to do all the custom code you want.
## Implementing a pipeline

[Implementing a new pipeline](../add_new_pipeline)
## Audio

Pipelines available for audio tasks include the following.

### AudioClassificationPipeline

[[autodoc]] AudioClassificationPipeline
    - __call__
    - all

### AutomaticSpeechRecognitionPipeline

[[autodoc]] AutomaticSpeechRecognitionPipeline
    - __call__
    - all

### ZeroShotAudioClassificationPipeline

[[autodoc]] ZeroShotAudioClassificationPipeline
    - __call__
    - all
## Computer vision

Pipelines available for computer vision tasks include the following.

### DepthEstimationPipeline

[[autodoc]] DepthEstimationPipeline
    - __call__
    - all

### ImageClassificationPipeline

[[autodoc]] ImageClassificationPipeline
    - __call__
    - all

### ImageSegmentationPipeline

[[autodoc]] ImageSegmentationPipeline
    - __call__
    - all

### ObjectDetectionPipeline

[[autodoc]] ObjectDetectionPipeline
    - __call__
    - all

### VideoClassificationPipeline

[[autodoc]] VideoClassificationPipeline
    - __call__
    - all

### ZeroShotImageClassificationPipeline

[[autodoc]] ZeroShotImageClassificationPipeline
    - __call__
    - all

### ZeroShotObjectDetectionPipeline

[[autodoc]] ZeroShotObjectDetectionPipeline
    - __call__
    - all
## Natural Language Processing

Pipelines available for natural language processing tasks include the following.

### ConversationalPipeline

[[autodoc]] Conversation

[[autodoc]] ConversationalPipeline
    - __call__
    - all

### FillMaskPipeline

[[autodoc]] FillMaskPipeline
    - __call__
    - all

### NerPipeline

[[autodoc]] NerPipeline

See [`TokenClassificationPipeline`] for all details.

### QuestionAnsweringPipeline

[[autodoc]] QuestionAnsweringPipeline
    - __call__
    - all

### SummarizationPipeline

[[autodoc]] SummarizationPipeline
    - __call__
    - all

### TableQuestionAnsweringPipeline

[[autodoc]] TableQuestionAnsweringPipeline
    - __call__

### TextClassificationPipeline

[[autodoc]] TextClassificationPipeline
    - __call__
    - all

### TextGenerationPipeline

[[autodoc]] TextGenerationPipeline
    - __call__
    - all

### Text2TextGenerationPipeline

[[autodoc]] Text2TextGenerationPipeline
    - __call__
    - all

### TokenClassificationPipeline

[[autodoc]] TokenClassificationPipeline
    - __call__
    - all

### TranslationPipeline

[[autodoc]] TranslationPipeline
    - __call__
    - all

### ZeroShotClassificationPipeline

[[autodoc]] ZeroShotClassificationPipeline
    - __call__
    - all
## Multimodal

Pipelines available for multimodal tasks include the following.

### DocumentQuestionAnsweringPipeline

[[autodoc]] DocumentQuestionAnsweringPipeline
    - __call__
    - all

### FeatureExtractionPipeline

[[autodoc]] FeatureExtractionPipeline
    - __call__
    - all

### ImageToTextPipeline

[[autodoc]] ImageToTextPipeline
    - __call__
    - all

### VisualQuestionAnsweringPipeline

[[autodoc]] VisualQuestionAnsweringPipeline
    - __call__
    - all
## Parent class: `Pipeline`

[[autodoc]] Pipeline