Queuing
Every Gradio app comes with a built-in queuing system that can scale to thousands of concurrent users. Because many of your event listeners may involve heavy processing, Gradio automatically creates a queue in the backend for every event listener, so incoming requests are processed in order rather than all at once.
Configuring the Queue
By default, each event listener has its own queue, which handles one request at a time. This can be configured via two arguments:
concurrency_limit
: this sets the maximum number of requests that will be processed simultaneously for this event. Take a look at the example below:
import gradio as gr

def image_gen(prompt):
    ...  # heavy image-generation code that returns an image

with gr.Blocks() as demo:
    prompt = gr.Textbox()
    image = gr.Image()
    generate_btn = gr.Button("Generate Image")
    generate_btn.click(image_gen, prompt, image, concurrency_limit=5)

demo.launch()
In the code above, up to 5 requests are processed simultaneously for this event; any additional requests are queued until a slot frees up.
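To see what this limit means in practice, here is a rough sketch of the semantics using a semaphore. This is only an illustrative model of queue behavior, not Gradio's actual implementation; the names (handle_request, peak, and so on) are hypothetical:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

limit = threading.Semaphore(5)  # models concurrency_limit=5
active = 0
peak = 0
lock = threading.Lock()

def handle_request(i):
    """A stand-in for one queued event; excess requests block on the semaphore."""
    global active, peak
    with limit:
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for heavy image generation
        with lock:
            active -= 1

# Fire 20 requests at once; only 5 ever run at the same time.
with ThreadPoolExecutor(max_workers=20) as pool:
    list(pool.map(handle_request, range(20)))

print(peak)  # never exceeds 5
```

Requests beyond the limit simply wait on the semaphore, which mirrors how queued events wait for a free worker.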
concurrency_id
: this allows event listeners to share a queue by having the same id. Imagine that your setup has only 2 GPUs, and your app has three functions, all of which require GPUs. Between your three functions, you'd want to have a single queue that has 2 workers. This is what that would look like:
import gradio as gr

def image_gen_1(prompt): ...  # inference with model 1 (stub)
def image_gen_2(prompt): ...  # inference with model 2 (stub)
def image_gen_3(prompt): ...  # inference with model 3 (stub)

with gr.Blocks() as demo:
    prompt = gr.Textbox()
    image = gr.Image()
    generate_btn_1 = gr.Button("Generate Image via model 1")
    generate_btn_2 = gr.Button("Generate Image via model 2")
    generate_btn_3 = gr.Button("Generate Image via model 3")
    generate_btn_1.click(image_gen_1, prompt, image, concurrency_limit=2, concurrency_id="gpu_queue")
    generate_btn_2.click(image_gen_2, prompt, image, concurrency_id="gpu_queue")
    generate_btn_3.click(image_gen_3, prompt, image, concurrency_id="gpu_queue")

demo.launch()
By setting the same string for concurrency_id, all three event listeners now share a single queue, and concurrency_limit=2 caps that shared queue at 2 simultaneous workers, one per GPU. These two arguments make it easy to manage the queue.
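The shared-queue behavior can be modeled the same way: three separate handlers contending for one semaphore of size 2, standing in for the two GPUs. Again, this is a hypothetical sketch of the semantics, not Gradio's internals:

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

gpu_queue = threading.Semaphore(2)  # models concurrency_id="gpu_queue" with limit 2
active = 0
peak = 0
lock = threading.Lock()

def make_handler(name):
    def handler(_):
        global active, peak
        with gpu_queue:  # all three handlers contend for the same 2 slots
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.03)  # stand-in for a GPU inference call
            with lock:
                active -= 1
    return handler

image_gen_1, image_gen_2, image_gen_3 = (make_handler(n) for n in "123")

# Submit a burst of requests across all three handlers at once.
with ThreadPoolExecutor(max_workers=12) as pool:
    for fn in (image_gen_1, image_gen_2, image_gen_3):
        for i in range(4):
            pool.submit(fn, i)

print(peak)  # across all three handlers, never more than 2 at once
```

Because every handler waits on the same semaphore, no more than two requests, regardless of which button triggered them, ever run at the same time.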
If you want an unlimited number of requests processed simultaneously for an event, you can instead set concurrency_limit=None.