Hub documentation

Datasets Overview

Datasets on the Hub

The Hugging Face Hub hosts a large number of community-curated datasets for a diverse range of tasks such as translation, automatic speech recognition, and image classification. In addition to the information in the dataset card, many datasets, such as GLUE, include a Dataset Viewer for browsing the data.

Each dataset is a Git repository that contains the data required to generate splits for training, evaluation, and testing. For information on how a dataset repository is structured, refer to the Data files Configuration page. Following the supported repository structure ensures that the dataset page on the Hub has a Viewer.
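
As a rough illustration of the repository model, the sketch below lists the files in a dataset repository and groups them by the split their names suggest. It assumes the `huggingface_hub` Python package and its `HfApi.list_repo_files` method. The `guess_split` and `group_by_split` helpers and the keyword table are hypothetical simplifications written for this example; they are not the Hub's actual resolution rules, which are described on the Data files Configuration page.

```python
import re
from collections import defaultdict

# Hypothetical, simplified keyword table (an assumption for illustration).
# The Hub's real file-to-split rules are richer than this.
SPLIT_KEYWORDS = {
    "train": "train",
    "validation": "validation",
    "dev": "validation",
    "test": "test",
}


def guess_split(path):
    """Return the split a file path suggests, or None if no keyword matches."""
    # Break the lowercased path into alphabetic tokens,
    # e.g. "data/train-00000.parquet" -> ["data", "train", "parquet"].
    for token in re.split(r"[^a-z]+", path.lower()):
        if token in SPLIT_KEYWORDS:
            return SPLIT_KEYWORDS[token]
    return None


def group_by_split(paths):
    """Group file paths by guessed split, skipping files with no match."""
    groups = defaultdict(list)
    for path in paths:
        split = guess_split(path)
        if split is not None:
            groups[split].append(path)
    return dict(groups)


def list_dataset_files(repo_id):
    """List the files of a dataset repository (requires network access)."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import HfApi

    return HfApi().list_repo_files(repo_id, repo_type="dataset")
```

With network access, `group_by_split(list_dataset_files(repo_id))` gives a quick view of which files in a repository would likely feed each split, where `repo_id` is the identifier of any public dataset repository.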

Search for datasets

Like models and Spaces, datasets can be searched on the Hub using the search bar in the top navigation or on the main datasets page. You can filter the results by language, task, and license, among many other options, to find a dataset that’s right for you.
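
The same kind of search can also be run programmatically. The sketch below assumes the `huggingface_hub` Python package and its `HfApi.list_datasets` method with the `search`, `filter`, and `limit` parameters. The `build_filter_tags` and `search_datasets` helpers are hypothetical names introduced for this example, and the `task_categories:`, `language:`, and `license:` tag prefixes are assumed to match the tags shown on dataset pages.

```python
def build_filter_tags(task=None, language=None, license_id=None):
    """Build Hub-style filter tags from optional facets.

    The tag prefixes are an assumption based on the tags shown on dataset pages.
    """
    tags = []
    if task:
        tags.append(f"task_categories:{task}")
    if language:
        tags.append(f"language:{language}")
    if license_id:
        tags.append(f"license:{license_id}")
    return tags


def search_datasets(query, limit=5, **facets):
    """Return the ids of datasets matching a query (requires network access)."""
    # Imported lazily so build_filter_tags works without the package installed.
    from huggingface_hub import HfApi

    tags = build_filter_tags(**facets)
    results = HfApi().list_datasets(
        search=query,
        filter=tags or None,  # pass None when no facet was requested
        limit=limit,
    )
    return [dataset.id for dataset in results]
```

For example, `search_datasets("news", task="translation", language="fr")` would request up to five French translation datasets whose names match "news".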

Privacy

Since datasets are repositories, you can toggle their visibility between private and public through the Settings tab. If a dataset is owned by an organization, the privacy settings apply to all members of the organization.
