Filesystem API

You are viewing the main version, which requires installation from source. If you'd like a regular pip install, check out the latest stable version (v0.26.2).

The HfFileSystem class provides a pythonic file interface to the Hugging Face Hub based on fsspec.

HfFileSystem

HfFileSystem is based on fsspec, so it is compatible with most of the APIs that it offers. For more details, check out our guide and fsspec’s API Reference.

class huggingface_hub.HfFileSystem

( *args **kwargs )

Access a remote Hugging Face Hub repository as if it were a local file system.

HfFileSystem provides fsspec compatibility, which is useful for libraries that require it (e.g., reading Hugging Face datasets directly with pandas). However, it introduces additional overhead due to this compatibility layer. For better performance and reliability, it’s recommended to use HfApi methods when possible.

Usage:

>>> from huggingface_hub import HfFileSystem

>>> fs = HfFileSystem()

>>> # List files
>>> fs.glob("my-username/my-model/*.bin")
['my-username/my-model/pytorch_model.bin']
>>> fs.ls("datasets/my-username/my-dataset", detail=False)
['datasets/my-username/my-dataset/.gitattributes', 'datasets/my-username/my-dataset/README.md', 'datasets/my-username/my-dataset/data.json']

>>> # Read/write files
>>> with fs.open("my-username/my-model/pytorch_model.bin") as f:
...     data = f.read()
>>> with fs.open("my-username/my-model/pytorch_model.bin", "wb") as f:
...     f.write(data)

__init__

( *args endpoint: Optional[str] = None token: Union[bool, str, None] = None **storage_options )

cp_file

( path1: str path2: str revision: Optional[str] = None **kwargs )

Parameters

  • path1 (str) — Source path to copy from.
  • path2 (str) — Destination path to copy to.
  • revision (str, optional) — The git revision to copy from.

Copy a file within or between repositories.

Note: When possible, use HfApi.upload_file() for better performance.

exists

( path **kwargs ) bool

Parameters

  • path (str) — Path to check.

Returns

bool

True if file exists, False otherwise.

Check if a file exists.

For more details, refer to fsspec documentation.

Note: When possible, use HfApi.file_exists() for better performance.
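
One typical use of exists is picking the first available file among several candidate paths. The following is a minimal sketch of that pattern, not part of huggingface_hub: first_existing is a hypothetical helper name, and fs is assumed to be any fsspec-style object with an exists(path) method, such as an HfFileSystem instance.

```python
from typing import Iterable, Optional


def first_existing(fs, candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path for which fs.exists() is true, else None.

    `fs` is assumed to expose an fsspec-style `exists(path)` method,
    for example an `HfFileSystem` instance.
    """
    for path in candidates:
        if fs.exists(path):
            return path
    return None


# Example call (needs network access and a real repository):
# from huggingface_hub import HfFileSystem
# first_existing(HfFileSystem(), [
#     "my-username/my-model/pytorch_model.bin",
#     "my-username/my-model/README.md",
# ])
```

Each candidate costs one exists call, so for long candidate lists a single listing of the directory may be cheaper.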

find

( path: str maxdepth: Optional[int] = None withdirs: bool = False detail: bool = False refresh: bool = False revision: Optional[str] = None **kwargs ) Union[List[str], Dict[str, Dict[str, Any]]]

Parameters

  • path (str) — Root path to list files from.
  • maxdepth (int, optional) — Maximum depth to descend into subdirectories.
  • withdirs (bool, optional) — Include directory paths in the output. Defaults to False.
  • detail (bool, optional) — If True, returns a dict mapping paths to file information. Defaults to False.
  • refresh (bool, optional) — If True, bypass the cache and fetch the latest data. Defaults to False.
  • revision (str, optional) — The git revision to list from.

Returns

Union[List[str], Dict[str, Dict[str, Any]]]

List of paths or dict of file information.

List all files below path.

For more details, refer to fsspec documentation.
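
Because find returns a dict keyed by path when detail=True, it can be used to aggregate file metadata in one pass. The sketch below sums file sizes under a root path. total_size is a hypothetical helper, and the "size" key is an assumption based on the usual fsspec info-dict layout.

```python
from typing import Any, Dict


def total_size(fs, root: str) -> int:
    """Sum the sizes, in bytes, of all files below `root`.

    With detail=True, find() returns a dict mapping each path to its info
    dict. This sketch assumes every info dict has a "size" entry and treats
    a missing or None size as 0.
    """
    entries: Dict[str, Dict[str, Any]] = fs.find(root, detail=True)
    return sum(info.get("size") or 0 for info in entries.values())


# Example call (needs network access and a real repository):
# from huggingface_hub import HfFileSystem
# total_size(HfFileSystem(), "datasets/my-username/my-dataset")
```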

get_file

( rpath lpath callback = <fsspec.callbacks.NoOpCallback object> outfile = None **kwargs )

Parameters

  • rpath (str) — Remote path to download from.
  • lpath (str) — Local path to download to.
  • callback (Callback, optional) — Optional callback to track download progress. Defaults to no callback.
  • outfile (IO, optional) — Optional file-like object to write to. If provided, lpath is ignored.

Copy single remote file to local.

Note: When possible, use HfApi.hf_hub_download() for better performance.

glob

( path: str **kwargs ) List[str]

Parameters

  • path (str) — Path pattern to match.

Returns

List[str]

List of paths matching the pattern.

Find files by glob-matching.

For more details, refer to fsspec documentation.
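
Since glob returns a flat list of paths, post-processing is left to the caller. As an illustration, the sketch below groups the matches by file extension. group_by_extension is a hypothetical helper, and fs is assumed to be any object with an fsspec-style glob(pattern) method.

```python
import posixpath
from collections import defaultdict
from typing import Dict, List


def group_by_extension(fs, pattern: str) -> Dict[str, List[str]]:
    """Group the paths returned by fs.glob(pattern) by their file extension.

    Paths on the Hub use "/" separators, hence posixpath rather than os.path.
    Files without an extension are grouped under the empty string.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for path in fs.glob(pattern):
        extension = posixpath.splitext(path)[1]
        grouped[extension].append(path)
    return dict(grouped)


# Example call (needs network access and a real repository):
# from huggingface_hub import HfFileSystem
# group_by_extension(HfFileSystem(), "datasets/my-username/my-dataset/*")
```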

info

( path: str refresh: bool = False revision: Optional[str] = None **kwargs ) Dict[str, Any]

Parameters

  • path (str) — Path to get info for.
  • refresh (bool, optional) — If True, bypass the cache and fetch the latest data. Defaults to False.
  • revision (str, optional) — The git revision to get info from.

Returns

Dict[str, Any]

Dictionary containing file information (type, size, commit info, etc.).

Get information about a file or directory.

For more details, refer to fsspec documentation.

Note: When possible, use HfApi.get_paths_info() or HfApi.repo_info() for better performance.

invalidate_cache

( path: Optional[str] = None )

Parameters

  • path (str, optional) — Path to clear from cache. If not provided, clear the entire cache.

Clear the cache for a given path.

For more details, refer to fsspec documentation.

isdir

( path ) bool

Parameters

  • path (str) — Path to check.

Returns

bool

True if path is a directory, False otherwise.

Check if a path is a directory.

For more details, refer to fsspec documentation.

isfile

( path ) bool

Parameters

  • path (str) — Path to check.

Returns

bool

True if path is a file, False otherwise.

Check if a path is a file.

For more details, refer to fsspec documentation.

ls

( path: str detail: bool = True refresh: bool = False revision: Optional[str] = None **kwargs ) List[Union[str, Dict[str, Any]]]

Parameters

  • path (str) — Path to the directory.
  • detail (bool, optional) — If True, returns a list of dictionaries containing file information. If False, returns a list of file paths. Defaults to True.
  • refresh (bool, optional) — If True, bypass the cache and fetch the latest data. Defaults to False.
  • revision (str, optional) — The git revision to list from.

Returns

List[Union[str, Dict[str, Any]]]

List of file paths (if detail=False) or list of file information dictionaries (if detail=True).

List the contents of a directory.

For more details, refer to fsspec documentation.

Note: When possible, use HfApi.list_repo_tree() for better performance.
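
With detail=True, each entry returned by ls is a dictionary rather than a plain path. The sketch below separates directories from files in one listing. split_listing is a hypothetical helper, and the "name" and "type" keys (with "directory" as the directory type) are assumed from the usual fsspec info-dict layout.

```python
from typing import List, Tuple


def split_listing(fs, path: str) -> Tuple[List[str], List[str]]:
    """Return (directories, files) found directly under `path`, each sorted.

    Assumes every entry of fs.ls(path, detail=True) is a dict with a "name"
    key and a "type" key equal to "directory" or "file".
    """
    directories: List[str] = []
    files: List[str] = []
    for entry in fs.ls(path, detail=True):
        target = directories if entry["type"] == "directory" else files
        target.append(entry["name"])
    return sorted(directories), sorted(files)


# Example call (needs network access and a real repository):
# from huggingface_hub import HfFileSystem
# dirs, files = split_listing(HfFileSystem(), "datasets/my-username/my-dataset")
```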

modified

( path: str **kwargs ) datetime

Parameters

  • path (str) — Path to the file.

Returns

datetime

Last commit date of the file.

Get the last modified time of a file.

For more details, refer to fsspec documentation.
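
Since modified returns a datetime, its results can be compared directly. The sketch below picks the most recently committed file from a list of paths. most_recently_modified is a hypothetical helper, and fs is assumed to be any object with an fsspec-style modified(path) method.

```python
from datetime import datetime
from typing import Iterable, Tuple


def most_recently_modified(fs, paths: Iterable[str]) -> Tuple[str, datetime]:
    """Return (path, modification time) for the newest of the given paths.

    Calls fs.modified() once per path, so the cost grows with the number
    of paths.
    """
    dated = [(fs.modified(path), path) for path in paths]
    if not dated:
        raise ValueError("paths must not be empty")
    when, path = max(dated)
    return path, when


# Example call (needs network access and a real repository):
# from huggingface_hub import HfFileSystem
# fs = HfFileSystem()
# most_recently_modified(fs, fs.glob("my-username/my-model/*.bin"))
```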

resolve_path

( path: str revision: Optional[str] = None ) HfFileSystemResolvedPath

Parameters

  • path (str) — Path to resolve.
  • revision (str, optional) — The revision of the repo to resolve. Defaults to the revision specified in the path.

Returns

HfFileSystemResolvedPath

Resolved path information containing repo_type, repo_id, revision and path_in_repo.

Raises

ValueError or NotImplementedError

  • ValueError — If path contains conflicting revision information.
  • NotImplementedError — If trying to list repositories.

Resolve a Hugging Face file system path into its components.
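
To make the four components concrete, here is a simplified, offline sketch of how such a path could be split. It is not the library's implementation. It assumes paths of the form [datasets/ or spaces/]namespace/name[@revision]/path, assumes "main" as the default revision, and does not handle revisions that contain slashes or repositories without a namespace. parse_hf_path and ResolvedPath are hypothetical names.

```python
from typing import NamedTuple, Optional

# Assumed prefixes; a path without one is treated as a model repository.
_REPO_TYPE_PREFIXES = {"datasets": "dataset", "spaces": "space"}


class ResolvedPath(NamedTuple):
    repo_type: str
    repo_id: str
    revision: str
    path_in_repo: str


def parse_hf_path(path: str, revision: Optional[str] = None) -> ResolvedPath:
    """Split a simplified Hub path into repo_type, repo_id, revision, path_in_repo."""
    if path.startswith("hf://"):
        path = path[len("hf://"):]
    parts = path.split("/")
    repo_type = "model"
    if parts[0] in _REPO_TYPE_PREFIXES:
        repo_type = _REPO_TYPE_PREFIXES[parts[0]]
        parts = parts[1:]
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"Expected '<namespace>/<name>' in path: {path!r}")
    namespace, name_and_revision = parts[0], parts[1]
    name, _, revision_in_path = name_and_revision.partition("@")
    # Mirror the documented ValueError for conflicting revision information.
    if revision_in_path and revision is not None and revision_in_path != revision:
        raise ValueError(
            f"Conflicting revisions: {revision_in_path!r} in path, {revision!r} as argument"
        )
    return ResolvedPath(
        repo_type=repo_type,
        repo_id=f"{namespace}/{name}",
        revision=revision_in_path or revision or "main",
        path_in_repo="/".join(parts[2:]),
    )


resolved = parse_hf_path("datasets/my-username/my-dataset@dev/data.json")
print(resolved.repo_type, resolved.repo_id, resolved.revision, resolved.path_in_repo)
```

For real use, call resolve_path on an HfFileSystem instance, which covers the cases this sketch skips.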

rm

( path: str recursive: bool = False maxdepth: Optional[int] = None revision: Optional[str] = None **kwargs )

Parameters

  • path (str) — Path to delete.
  • recursive (bool, optional) — If True, delete directory and all its contents. Defaults to False.
  • maxdepth (int, optional) — Maximum number of subdirectories to visit when deleting recursively.
  • revision (str, optional) — The git revision to delete from.

Delete files from a repository.

For more details, refer to fsspec documentation.

Note: When possible, use HfApi.delete_file() for better performance.

url

( path: str ) str

Parameters

  • path (str) — Path to get URL for.

Returns

str

HTTP URL to access the file or directory on the Hub.

Get the HTTP URL of the given path.

walk

( path: str *args **kwargs ) Iterator[Tuple[str, List[str], List[str]]]

Parameters

  • path (str) — Root path to list files from.

Returns

Iterator[Tuple[str, List[str], List[str]]]

An iterator of (path, list of directory names, list of file names) tuples.

Return all files below the given path.

For more details, refer to fsspec documentation.
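
The (path, directory names, file names) tuples have the same shape as those produced by os.walk, so the same iteration patterns apply. The sketch below counts files per directory. files_per_directory is a hypothetical helper, and fs is assumed to be any object with an fsspec-style walk(path) method.

```python
from typing import Dict


def files_per_directory(fs, root: str) -> Dict[str, int]:
    """Map each directory at or below `root` to the number of files directly in it."""
    return {
        dirpath: len(filenames)
        for dirpath, _dirnames, filenames in fs.walk(root)
    }


# Example call (needs network access and a real repository):
# from huggingface_hub import HfFileSystem
# files_per_directory(HfFileSystem(), "datasets/my-username/my-dataset")
```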
