	# Datasets Format

	Amphion supports the following academic datasets (sorted alphabetically):

	- [Datasets Format](#datasets-format)
	- [AudioCaps](#audiocaps)
	- [CSD](#csd)
	- [CustomSVCDataset](#customsvcdataset)
	- [Hi-Fi TTS](#hi-fi-tts)
	- [KiSing](#kising)
	- [LibriLight](#librilight)
	- [LibriTTS](#libritts)
	- [LJSpeech](#ljspeech)
	- [M4Singer](#m4singer)
	- [NUS-48E](#nus-48e)
	- [Opencpop](#opencpop)
	- [OpenSinger](#opensinger)
	- [Opera](#opera)
	- [PopBuTFy](#popbutfy)
	- [PopCS](#popcs)
	- [PJS](#pjs)
	- [SVCC](#svcc)
	- [VCTK](#vctk)

	The download link and file structure of each dataset are shown below.

	> Note: When running Amphion in Docker, you must mount the dataset into the container after downloading it. See [Mount dataset in Docker container](./docker.md) for more details.

	## AudioCaps

	AudioCaps is a dataset of around 44K audio-caption pairs, where each audio clip corresponds to a caption with rich semantic information.

	Download the AudioCaps dataset [here](https://github.com/cdjkim/audiocaps). The file structure looks like below:

	```plaintext
	[AudioCaps dataset path]
	┣ AudioCaps
	┃ ┣ wav
	┃ ┃ ┣ ---1_cCGK4M_0_10000.wav
	┃ ┃ ┣ ---lTs1dxhU_30000_40000.wav
	┃ ┃ ┣ ...
	```
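The two clip names in the tree above look like `{clip_id}_{start_ms}_{end_ms}.wav`. That reading is an assumption drawn from the examples, not something stated by the dataset. Under that assumption, a minimal sketch for splitting a filename into its parts (the helper name `parse_audiocaps_name` is hypothetical) could be:

```python
from pathlib import Path
from typing import NamedTuple


class ClipInfo(NamedTuple):
    clip_id: str
    start_ms: int
    end_ms: int


def parse_audiocaps_name(filename: str) -> ClipInfo:
    """Parse '<clip_id>_<start>_<end>.wav' (assumed naming scheme)."""
    stem = Path(filename).stem
    # Split from the right: the clip ID itself may contain underscores.
    clip_id, start, end = stem.rsplit("_", 2)
    return ClipInfo(clip_id, int(start), int(end))
```

Splitting from the right keeps IDs such as `---1_cCGK4M` intact even though they contain an underscore.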

	## CSD

	Download the official CSD dataset [here](https://zenodo.org/records/4785016). The file structure looks like below:

	```plaintext
	[CSD dataset path]
	┣ english
	┣ korean
	┣ utterances
	┃ ┣ en001a
	┃ ┃ ┣ {UtteranceID}.wav
	┃ ┣ en001b
	┃ ┣ en002a
	┃ ┣ en002b
	┃ ┣ ...
	┣ README
	```

	## CustomSVCDataset

	We support custom datasets for singing voice conversion. To build your own dataset, organize your data in the following structure:

	```plaintext
	[Your Custom Dataset Path]
	┣ singer1
	┃ ┣ song1
	┃ ┃ ┣ utterance1.wav
	┃ ┃ ┣ utterance2.wav
	┃ ┃ ┣ ...
	┃ ┣ song2
	┃ ┣ ...
	┣ singer2
	┣ ...
	```
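Before training on a custom dataset, it can help to confirm that the directory really follows the singer/song/utterance layout above. The sketch below only illustrates that layout; the function name `iter_custom_svc_utterances` is hypothetical and is not part of Amphion.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_custom_svc_utterances(root: str) -> Iterator[Tuple[str, str, Path]]:
    """Yield (singer, song, wav_path) for every <root>/<singer>/<song>/*.wav file."""
    for wav_path in sorted(Path(root).glob("*/*/*.wav")):
        song_dir = wav_path.parent
        yield song_dir.parent.name, song_dir.name, wav_path
```

Files placed at any other depth (for example directly under a singer directory) are not matched by the glob, so an unexpectedly short result usually points to a misplaced file.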


	## Hi-Fi TTS

	Download the official Hi-Fi TTS dataset [here](https://www.openslr.org/109/). The file structure looks like below:

	```plaintext
	[Hi-Fi TTS dataset path]
	┣ audio
	┃ ┣ 11614_other {Speaker_ID}_{SNR_subset}
	┃ ┃ ┣ 10547 {Book_ID}
	┃ ┃ ┃ ┣ thousandnights8_04_anonymous_0001.flac
	┃ ┃ ┃ ┣ thousandnights8_04_anonymous_0003.flac
	┃ ┃ ┃ ┣ thousandnights8_04_anonymous_0004.flac
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ 92_manifest_clean_dev.json
	┣ 92_manifest_clean_test.json
	┣ 92_manifest_clean_train.json
	┣ ...
	┣ {Speaker_ID}_manifest_{SNR_subset}_{dataset_split}.json
	┣ ...
	┣ books_bandwidth.tsv
	┣ LICENSE.txt
	┣ readers_books_clean.txt
	┣ readers_books_other.txt
	┣ README.txt

	```
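The manifest files follow the `{Speaker_ID}_manifest_{SNR_subset}_{dataset_split}.json` pattern shown in the tree. As a rough sketch, the three fields can be recovered with a regular expression. The helper name is hypothetical, and the pattern assumes a numeric speaker ID and lowercase subset and split names, as in the `92_manifest_clean_dev.json` example.

```python
import re
from typing import Optional, Tuple

# Assumed shape: digits, "_manifest_", lowercase subset, "_", lowercase split, ".json".
_MANIFEST_RE = re.compile(
    r"^(?P<speaker>\d+)_manifest_(?P<snr>[a-z]+)_(?P<split>[a-z]+)\.json$"
)


def parse_hifitts_manifest_name(filename: str) -> Optional[Tuple[str, str, str]]:
    """Return (speaker_id, snr_subset, split), or None if the name does not match."""
    match = _MANIFEST_RE.match(filename)
    if match is None:
        return None
    return match.group("speaker"), match.group("snr"), match.group("split")
```

Returning `None` for non-matching names makes it easy to skip the other top-level files such as `books_bandwidth.tsv`.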

	## KiSing

	Download the official KiSing dataset [here](http://shijt.site/index.php/2021/05/16/kising-the-first-open-source-mandarin-singing-voice-synthesis-corpus/). The file structure looks like below:

	```plaintext
	[KiSing dataset path]
	┣ clean
	┃ ┣ 421
	┃ ┣ 422
	┃ ┣ ...
	```

	## LibriLight

	Download the official LibriLight dataset [here](https://github.com/facebookresearch/libri-light). The file structure looks like below:

	```plaintext
	[LibriLight dataset path]
	┣ small (Subset)
	┃ ┣ 100 {Speaker_ID}
	┃ ┃ ┣ sea_fairies_0812_librivox_64kb_mp3 {Chapter_ID}
	┃ ┃ ┃ ┣ 01_baum_sea_fairies_64kb.flac
	┃ ┃ ┃ ┣ 02_baum_sea_fairies_64kb.flac
	┃ ┃ ┃ ┣ 03_baum_sea_fairies_64kb.flac
	┃ ┃ ┃ ┣ 22_baum_sea_fairies_64kb.flac
	┃ ┃ ┃ ┣ 01_baum_sea_fairies_64kb.json
	┃ ┃ ┃ ┣ 02_baum_sea_fairies_64kb.json
	┃ ┃ ┃ ┣ 03_baum_sea_fairies_64kb.json
	┃ ┃ ┃ ┣ 22_baum_sea_fairies_64kb.json
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ medium (Subset)
	┣ ...
	```
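In the tree above, each `.flac` file sits next to a `.json` file with the same stem. A minimal sketch for pairing them within one subset directory follows. The function name is hypothetical, and nothing is assumed about the JSON contents beyond being valid JSON.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_librilight_pairs(subset_dir: str) -> Iterator[Tuple[Path, Any]]:
    """Yield (flac_path, metadata) for each <speaker>/<chapter>/*.flac with a sibling .json."""
    for flac_path in sorted(Path(subset_dir).glob("*/*/*.flac")):
        json_path = flac_path.with_suffix(".json")
        if not json_path.is_file():
            continue  # skip audio without a metadata file
        with open(json_path, encoding="utf-8") as f:
            yield flac_path, json.load(f)
```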

	## LibriTTS

	Download the official LibriTTS dataset [here](https://www.openslr.org/60/). The file structure looks like below:

	```plaintext
	[LibriTTS dataset path]
	┣ BOOKS.txt
	┣ CHAPTERS.txt
	┣ eval_sentences10.tsv
	┣ LICENSE.txt
	┣ NOTE.txt
	┣ reader_book.tsv
	┣ README_librispeech.txt
	┣ README_libritts.txt
	┣ speakers.tsv
	┣ SPEAKERS.txt
	┣ dev-clean (Subset)
	┃ ┣ 1272 {Speaker_ID}
	┃ ┃ ┣ 128104 {Chapter_ID}
	┃ ┃ ┃ ┣ 1272_128104_000001_000000.normalized.txt
	┃ ┃ ┃ ┣ 1272_128104_000001_000000.original.txt
	┃ ┃ ┃ ┣ 1272_128104_000001_000000.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ 1272_128104.book.tsv
	┃ ┃ ┃ ┣ 1272_128104.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ dev-other (Subset)
	┃ ┣ 116 {Speaker_ID}
	┃ ┃ ┣ 288045 {Chapter_ID}
	┃ ┃ ┃ ┣ 116_288045_000003_000000.normalized.txt
	┃ ┃ ┃ ┣ 116_288045_000003_000000.original.txt
	┃ ┃ ┃ ┣ 116_288045_000003_000000.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ 116_288045.book.tsv
	┃ ┃ ┃ ┣ 116_288045.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ test-clean (Subset)
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ test-other
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ train-clean-100
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ train-clean-360
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	┣ train-other-500
	┃ ┣ {Speaker_ID}
	┃ ┃ ┣ {Chapter_ID}
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.normalized.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.original.txt
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}_{Utterance_ID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.book.tsv
	┃ ┃ ┃ ┣ {Speaker_ID}_{Chapter_ID}.trans.tsv
	┃ ┃ ┣ ...
	┃ ┣ ...
	```
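Because every utterance file is named `{Speaker_ID}_{Chapter_ID}_{Utterance_ID}` and stored under `{Speaker_ID}/{Chapter_ID}/`, the paths for an utterance can be derived from its ID alone. The sketch below illustrates this. The function name is hypothetical, and the utterance part is taken to be everything after the second underscore, as in `1272_128104_000001_000000`.

```python
from pathlib import Path
from typing import Dict


def libritts_utterance_files(subset_dir: str, utterance_id: str) -> Dict[str, Path]:
    """Map an ID like '1272_128104_000001_000000' to its wav and transcript paths."""
    # Only the first two fields are needed to locate the chapter directory.
    speaker_id, chapter_id, _ = utterance_id.split("_", 2)
    chapter_dir = Path(subset_dir) / speaker_id / chapter_id
    return {
        "wav": chapter_dir / f"{utterance_id}.wav",
        "normalized": chapter_dir / f"{utterance_id}.normalized.txt",
        "original": chapter_dir / f"{utterance_id}.original.txt",
    }
```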

	## LJSpeech

	Download the official LJSpeech dataset [here](https://keithito.com/LJ-Speech-Dataset/). The file structure looks like below:

	```plaintext
	[LJSpeech dataset path]
	┣ metadata.csv
	┣ wavs
	┃ ┣ LJ001-0001.wav
	┃ ┣ LJ001-0002.wav
	┃ ┣ ...
	┣ README
	```
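The sketch below shows one way to connect `metadata.csv` to the files under `wavs`. It assumes the commonly distributed LJSpeech format, which this README does not spell out: one pipe-separated row per utterance, with the ID first and the normalized transcript last. The function name is hypothetical.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def load_ljspeech_metadata(dataset_dir: str) -> List[Tuple[Path, str]]:
    """Return (wav_path, transcript) pairs read from <dataset_dir>/metadata.csv."""
    root = Path(dataset_dir)
    entries = []
    with open(root / "metadata.csv", encoding="utf-8", newline="") as f:
        # Assumption: fields are separated by '|' and quote characters are literal text.
        for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
            if not row:
                continue
            utt_id, text = row[0], row[-1]
            entries.append((root / "wavs" / f"{utt_id}.wav", text))
    return entries
```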

	## M4Singer

	Download the official M4Singer dataset [here](https://drive.google.com/file/d/1xC37E59EWRRFFLdG3aJkVqwtLDgtFNqW/view). The file structure looks like below:

	```plaintext
	[M4Singer dataset path]
	┣ {Singer_1}#{Song_1}
	┃ ┣ 0000.mid
	┃ ┣ 0000.TextGrid
	┃ ┣ 0000.wav
	┃ ┣ ...
	┣ {Singer_1}#{Song_2}
	┣ ...
	┣ {Singer_2}#{Song_1}
	┣ {Singer_2}#{Song_2}
	┣ ...
	┗ meta.json
	```
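Each M4Singer directory name encodes the singer and the song, separated by `#`. A minimal sketch for splitting such a name (the helper `split_m4singer_dir` is hypothetical):

```python
from typing import Tuple


def split_m4singer_dir(dirname: str) -> Tuple[str, str]:
    """Split '<singer>#<song>' into (singer, song)."""
    singer, sep, song = dirname.partition("#")
    if not sep:
        raise ValueError(f"expected '<singer>#<song>', got {dirname!r}")
    return singer, song
```

Raising on names without `#` keeps top-level files such as `meta.json` from being mistaken for a song directory.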

	## NUS-48E

	Download the official NUS-48E dataset [here](https://drive.google.com/drive/folders/12pP9uUl0HTVANU3IPLnumTJiRjPtVUMx). The file structure looks like below:

	```plaintext
	[NUS-48E dataset path]
	┣ {SpeakerID}
	┃ ┣ read
	┃ ┃ ┣ {SongID}.txt
	┃ ┃ ┣ {SongID}.wav
	┃ ┃ ┣ ...
	┃ ┣ sing
	┃ ┃ ┣ {SongID}.txt
	┃ ┃ ┣ {SongID}.wav
	┃ ┃ ┣ ...
	┣ ...
	┣ README.txt

	```

	## Opencpop

	Download the official Opencpop dataset [here](https://wenet.org.cn/opencpop/). The file structure looks like below:

	```plaintext
	[Opencpop dataset path]
	┣ midis
	┃ ┣ 2001.midi
	┃ ┣ 2002.midi
	┃ ┣ 2003.midi
	┃ ┣ ...
	┣ segments
	┃ ┣ wavs
	┃ ┃ ┣ 2001000001.wav
	┃ ┃ ┣ 2001000002.wav
	┃ ┃ ┣ 2001000003.wav
	┃ ┃ ┣ ...
	┃ ┣ test.txt
	┃ ┣ train.txt
	┃ ┗ transcriptions.txt
	┣ textgrids
	┃ ┣ 2001.TextGrid
	┃ ┣ 2002.TextGrid
	┃ ┣ 2003.TextGrid
	┃ ┣ ...
	┣ wavs
	┃ ┣ 2001.wav
	┃ ┣ 2002.wav
	┃ ┣ 2003.wav
	┃ ┣ ...
	┣ TERMS_OF_ACCESS
	┗ readme.md
	```

	## OpenSinger

	Download the official OpenSinger dataset [here](https://drive.google.com/file/d/1EofoZxvalgMjZqzUEuEdleHIZ6SHtNuK/view). The file structure looks like below:

	```plaintext
	[OpenSinger dataset path]
	┣ ManRaw
	┃ ┣ {Singer_1}_{Song_1}
	┃ ┃ ┣ {Singer_1}_{Song_1}_0.lab
	┃ ┃ ┣ {Singer_1}_{Song_1}_0.txt
	┃ ┃ ┣ {Singer_1}_{Song_1}_0.wav
	┃ ┃ ┣ ...
	┃ ┣ {Singer_1}_{Song_2}
	┃ ┣ ...
	┣ WomanRaw
	┣ LICENSE
	┗ README.md
	```

	## Opera

	Download the official Opera dataset [here](http://isophonics.net/SingingVoiceDataset). The file structure looks like below:

	```plaintext
	[Opera dataset path]
	┣ monophonic
	┃ ┣ chinese
	┃ ┃ ┣ {Gender}_{SingerID}
	┃ ┃ ┃ ┣ {Emotion}_{SongID}.wav
	┃ ┃ ┃ ┣ ...
	┃ ┃ ┣ ...
	┃ ┣ western
	┣ polyphonic
	┃ ┣ chinese
	┃ ┣ western
	┣ CrossculturalDataSet.xlsx
	```

	## PopBuTFy

	Download the official PopBuTFy dataset [here](https://github.com/MoonInTheRiver/NeuralSVB). The file structure looks like below:

	```plaintext
	[PopBuTFy dataset path]
	┣ data
	┃ ┣ {SingerID}#singing#{SongName}_Amateur
	┃ ┃ ┣ {SingerID}#singing#{SongName}_Amateur_{UtteranceID}.mp3
	┃ ┃ ┣ ...
	┃ ┣ {SingerID}#singing#{SongName}_Professional
	┃ ┃ ┣ {SingerID}#singing#{SongName}_Professional_{UtteranceID}.mp3
	┃ ┃ ┣ ...
	┣ text_labels
	┗ TERMS_OF_ACCESS
	```
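The PopBuTFy file names pack four fields into one string: singer, song name, skill level (`Amateur` or `Professional`), and utterance ID. The sketch below recovers them with a regular expression. The helper name is hypothetical, and the pattern assumes the utterance ID contains no underscore.

```python
import re
from typing import Dict, Optional

_POPBUTFY_RE = re.compile(
    r"^(?P<singer>[^#]+)#singing#(?P<song>.+)"
    r"_(?P<level>Amateur|Professional)_(?P<utterance>[^_]+)\.mp3$"
)


def parse_popbutfy_name(filename: str) -> Optional[Dict[str, str]]:
    """Return the singer, song, level, and utterance fields, or None if no match."""
    match = _POPBUTFY_RE.match(filename)
    return match.groupdict() if match else None
```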

	## PopCS

	Download the official PopCS dataset [here](https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/apply_form.md). The file structure looks like below:

	```plaintext
	[PopCS dataset path]
	┣ popcs
	┃ ┣ popcs-{SongName}
	┃ ┃ ┣ {UtteranceID}_ph.txt
	┃ ┃ ┣ {UtteranceID}_wf0.wav
	┃ ┃ ┣ {UtteranceID}.TextGrid
	┃ ┃ ┣ {UtteranceID}.txt
	┃ ┃ ┣ ...
	┃ ┣ ...
	┗ TERMS_OF_ACCESS
	```

	## PJS

	Download the official PJS dataset [here](https://sites.google.com/site/shinnosuketakamichi/research-topics/pjs_corpus). The file structure looks like below:

	```plaintext
	[PJS dataset path]
	┣ PJS_corpus_ver1.1
	┃ ┣ background_noise
	┃ ┣ pjs{SongID}
	┃ ┃ ┣ pjs{SongID}_song.wav
	┃ ┃ ┣ pjs{SongID}_speech.wav
	┃ ┃ ┣ pjs{SongID}.lab
	┃ ┃ ┣ pjs{SongID}.mid
	┃ ┃ ┣ pjs{SongID}.musicxml
	┃ ┃ ┣ pjs{SongID}.txt
	┃ ┣ ...
	```

	## SVCC

	Download the official SVCC dataset [here](https://github.com/lesterphillip/SVCC23_FastSVC/tree/main/egs/generate_dataset). The file structure looks like below:

	```plaintext
	[SVCC dataset path]
	┣ Data
	┃ ┣ CDF1
	┃ ┃ ┣ 10001.wav
	┃ ┃ ┣ 10002.wav
	┃ ┃ ┣ ...
	┃ ┣ CDM1
	┃ ┣ IDF1
	┃ ┣ IDM1
	┗ README.md
	```

	## VCTK

	Download the official VCTK dataset [here](https://datashare.ed.ac.uk/handle/10283/3443). The file structure looks like below:

	```plaintext
	[VCTK dataset path]
	┣ txt
	┃ ┣ {Speaker_1}
	┃ ┃ ┣ {Speaker_1}_001.txt
	┃ ┃ ┣ {Speaker_1}_002.txt
	┃ ┃ ┣ ...
	┃ ┣ {Speaker_2}
	┃ ┣ ...
	┣ wav48_silence_trimmed
	┃ ┣ {Speaker_1}
	┃ ┃ ┣ {Speaker_1}_001_mic1.flac
	┃ ┃ ┣ {Speaker_1}_001_mic2.flac
	┃ ┃ ┣ {Speaker_1}_002_mic1.flac
	┃ ┃ ┣ ...
	┃ ┣ {Speaker_2}
	┃ ┣ ...
	┣ speaker-info.txt
	┗ update.txt
	```
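In the VCTK layout, each transcript `txt/{Speaker}/{Speaker}_001.txt` corresponds to the recordings `wav48_silence_trimmed/{Speaker}/{Speaker}_001_mic1.flac` and `..._mic2.flac`. A minimal sketch for pairing audio with transcripts for one microphone follows. The function name and the `mic` parameter are hypothetical.

```python
from pathlib import Path
from typing import Iterator, Tuple


def iter_vctk_pairs(root: str, mic: str = "mic1") -> Iterator[Tuple[Path, Path]]:
    """Yield (flac_path, txt_path) for one microphone, skipping audio without a transcript."""
    base = Path(root)
    suffix = f"_{mic}.flac"
    for flac_path in sorted(base.glob(f"wav48_silence_trimmed/*/*{suffix}")):
        speaker = flac_path.parent.name
        # Strip the '_micN.flac' suffix to recover the utterance ID.
        utt_id = flac_path.name[: -len(suffix)]
        txt_path = base / "txt" / speaker / f"{utt_id}.txt"
        if txt_path.is_file():
            yield flac_path, txt_path
```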