# Pipelines The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the [task summary](../task_summary) for examples of use. There are two categories of pipeline abstractions to be aware about: - The [`pipeline`] which is the most powerful object encapsulating all other pipelines. - Task-specific pipelines are available for [audio](#audio), [computer vision](#computer-vision), [natural language processing](#natural-language-processing), and [multimodal](#multimodal) tasks. ## The pipeline abstraction The *pipeline* abstraction is a wrapper around all the other available pipelines. It is instantiated as any other pipeline but can provide additional quality of life. Simple call on one item: ```python >>> pipe = pipeline("text-classification") >>> pipe("This restaurant is awesome") [{'label': 'POSITIVE', 'score': 0.9998743534088135}] ``` If you want to use a specific model from the [hub](https://huggingface.co) you can ignore the task if the model on the hub already defines it: ```python >>> pipe = pipeline(model="roberta-large-mnli") >>> pipe("This restaurant is awesome") [{'label': 'NEUTRAL', 'score': 0.7313136458396912}] ``` To call a pipeline on many items, you can call it with a *list*. ```python >>> pipe = pipeline("text-classification") >>> pipe(["This restaurant is awesome", "This restaurant is awful"]) [{'label': 'POSITIVE', 'score': 0.9998743534088135}, {'label': 'NEGATIVE', 'score': 0.9996669292449951}] ``` To iterate over full datasets it is recommended to use a `dataset` directly. This means you don't need to allocate the whole dataset at once, nor do you need to do batching yourself. This should work just as fast as custom loops on GPU. If it doesn't don't hesitate to create an issue. ```python import datasets from transformers import pipeline from transformers.pipelines.pt_utils import KeyDataset from tqdm.auto import tqdm pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0) dataset = datasets.load_dataset("superb", name="asr", split="test") # KeyDataset (only *pt*) will simply return the item in the dict returned by the dataset item # as we're not interested in the *target* part of the dataset. For sentence pair use KeyPairDataset for out in tqdm(pipe(KeyDataset(dataset, "file"))): print(out) # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"} # {"text": ....} # .... ``` For ease of use, a generator is also possible: ```python from transformers import pipeline pipe = pipeline("text-classification") def data(): while True: # This could come from a dataset, a database, a queue or HTTP request # in a server # Caveat: because this is iterative, you cannot use `num_workers > 1` variable # to use multiple threads to preprocess data. You can still have 1 thread that # does the preprocessing while the main runs the big inference yield "This is a test" for out in pipe(data()): print(out) # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"} # {"text": ....} # .... ``` [[autodoc]] pipeline ## Pipeline batching All pipelines can use batching. This will work whenever the pipeline uses its streaming ability (so when passing lists or `Dataset` or `generator`). ```python from transformers import pipeline from transformers.pipelines.pt_utils import KeyDataset import datasets dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised") pipe = pipeline("text-classification", device=0) for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"): print(out) # [{'label': 'POSITIVE', 'score': 0.9998743534088135}] # Exactly the same output as before, but the content are passed # as batches to the model ``` However, this is not automatically a win for performance. It can be either a 10x speedup or 5x slowdown depending on hardware, data and the actual model being used. Example where it's mostly a speedup: ```python from transformers import pipeline from torch.utils.data import Dataset from tqdm.auto import tqdm pipe = pipeline("text-classification", device=0) class MyDataset(Dataset): def __len__(self): return 5000 def __getitem__(self, i): return "This is a test" dataset = MyDataset() for batch_size in [1, 8, 64, 256]: print("-" * 30) print(f"Streaming batch_size={batch_size}") for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)): pass ``` ``` # On GTX 970 ------------------------------ Streaming no batching 100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s] ------------------------------ Streaming batch_size=8 100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s] ------------------------------ Streaming batch_size=64 100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s] ------------------------------ Streaming batch_size=256 100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s] (diminishing returns, saturated the GPU) ``` Example where it's most a slowdown: ```python class MyDataset(Dataset): def __len__(self): return 5000 def __getitem__(self, i): if i % 64 == 0: n = 100 else: n = 1 return "This is a test" * n ``` This is a occasional very long sentence compared to the other. In that case, the **whole** batch will need to be 400 tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to the high slowdown. Even worse, on bigger batches, the program simply crashes. ``` ------------------------------ Streaming no batching 100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s] ------------------------------ Streaming batch_size=8 100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s] ------------------------------ Streaming batch_size=64 100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s] ------------------------------ Streaming batch_size=256 0%| | 0/1000 [00:00 for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)): .... q = q / math.sqrt(dim_per_head) # (bs, n_heads, q_length, dim_per_head) RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch) ``` There are no good (general) solutions for this problem, and your mileage may vary depending on your use cases. Rule of thumb: For users, a rule of thumb is: - **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the only way to go.** - If you are latency constrained (live product doing inference), don't batch - If you are using CPU, don't batch. - If you are using throughput (you want to run your model on a bunch of static data), on GPU, then: - If you have no clue about the size of the sequence_length ("natural" data), by default don't batch, measure and try tentatively to add it, add OOM checks to recover when it will fail (and it will at some point if you don't control the sequence_length.) - If your sequence_length is super regular, then batching is more likely to be VERY interesting, measure and push it until you get OOMs. - The larger the GPU the more likely batching is going to be more interesting - As soon as you enable batching, make sure you can handle OOMs nicely. ## Pipeline chunk batching `zero-shot-classification` and `question-answering` are slightly specific in the sense, that a single input might yield multiple forward pass of a model. Under normal circumstances, this would yield issues with `batch_size` argument. In order to circumvent this issue, both of these pipelines are a bit specific, they are `ChunkPipeline` instead of regular `Pipeline`. In short: ```python preprocessed = pipe.preprocess(inputs) model_outputs = pipe.forward(preprocessed) outputs = pipe.postprocess(model_outputs) ``` Now becomes: ```python all_model_outputs = [] for preprocessed in pipe.preprocess(inputs): model_outputs = pipe.forward(preprocessed) all_model_outputs.append(model_outputs) outputs = pipe.postprocess(all_model_outputs) ``` This should be very transparent to your code because the pipelines are used in the same way. This is a simplified view, since the pipeline can handle automatically the batch to ! Meaning you don't have to care about how many forward passes you inputs are actually going to trigger, you can optimize the `batch_size` independently of the inputs. The caveats from the previous section still apply. ## Pipeline custom code If you want to override a specific pipeline. Don't hesitate to create an issue for your task at hand, the goal of the pipeline is to be easy to use and support most cases, so `transformers` could maybe support your use case. If you want to try simply you can: - Subclass your pipeline of choice ```python class MyPipeline(TextClassificationPipeline): def postprocess(): # Your code goes here scores = scores * 100 # And here my_pipeline = MyPipeline(model=model, tokenizer=tokenizer, ...) # or if you use *pipeline* function, then: my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline) ``` That should enable you to do all the custom code you want. ## Implementing a pipeline [Implementing a new pipeline](../add_new_pipeline) ## Audio Pipelines available for audio tasks include the following. ### AudioClassificationPipeline [[autodoc]] AudioClassificationPipeline - __call__ - all ### AutomaticSpeechRecognitionPipeline [[autodoc]] AutomaticSpeechRecognitionPipeline - __call__ - all ### ZeroShotAudioClassificationPipeline [[autodoc]] ZeroShotAudioClassificationPipeline - __call__ - all ## Computer vision Pipelines available for computer vision tasks include the following. ### DepthEstimationPipeline [[autodoc]] DepthEstimationPipeline - __call__ - all ### ImageClassificationPipeline [[autodoc]] ImageClassificationPipeline - __call__ - all ### ImageSegmentationPipeline [[autodoc]] ImageSegmentationPipeline - __call__ - all ### ObjectDetectionPipeline [[autodoc]] ObjectDetectionPipeline - __call__ - all ### VideoClassificationPipeline [[autodoc]] VideoClassificationPipeline - __call__ - all ### ZeroShotImageClassificationPipeline [[autodoc]] ZeroShotImageClassificationPipeline - __call__ - all ### ZeroShotObjectDetectionPipeline [[autodoc]] ZeroShotObjectDetectionPipeline - __call__ - all ## Natural Language Processing Pipelines available for natural language processing tasks include the following. ### ConversationalPipeline [[autodoc]] Conversation [[autodoc]] ConversationalPipeline - __call__ - all ### FillMaskPipeline [[autodoc]] FillMaskPipeline - __call__ - all ### NerPipeline [[autodoc]] NerPipeline See [`TokenClassificationPipeline`] for all details. ### QuestionAnsweringPipeline [[autodoc]] QuestionAnsweringPipeline - __call__ - all ### SummarizationPipeline [[autodoc]] SummarizationPipeline - __call__ - all ### TableQuestionAnsweringPipeline [[autodoc]] TableQuestionAnsweringPipeline - __call__ ### TextClassificationPipeline [[autodoc]] TextClassificationPipeline - __call__ - all ### TextGenerationPipeline [[autodoc]] TextGenerationPipeline - __call__ - all ### Text2TextGenerationPipeline [[autodoc]] Text2TextGenerationPipeline - __call__ - all ### TokenClassificationPipeline [[autodoc]] TokenClassificationPipeline - __call__ - all ### TranslationPipeline [[autodoc]] TranslationPipeline - __call__ - all ### ZeroShotClassificationPipeline [[autodoc]] ZeroShotClassificationPipeline - __call__ - all ## Multimodal Pipelines available for multimodal tasks include the following. ### DocumentQuestionAnsweringPipeline [[autodoc]] DocumentQuestionAnsweringPipeline - __call__ - all ### FeatureExtractionPipeline [[autodoc]] FeatureExtractionPipeline - __call__ - all ### ImageToTextPipeline [[autodoc]] ImageToTextPipeline - __call__ - all ### VisualQuestionAnsweringPipeline [[autodoc]] VisualQuestionAnsweringPipeline - __call__ - all ## Parent class: `Pipeline` [[autodoc]] Pipeline