# VOICEVOX ENGINE
[![build](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build.yml/badge.svg)](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build.yml)
[![releases](https://img.shields.io/github/v/release/VOICEVOX/voicevox_engine)](https://github.com/VOICEVOX/voicevox_engine/releases)
[![discord](https://img.shields.io/discord/879570910208733277?color=5865f2&label=&logo=discord&logoColor=ffffff)](https://discord.gg/WMwWetrzuh)
[![test](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/test.yml/badge.svg)](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/test.yml)
[![Coverage Status](https://coveralls.io/repos/github/VOICEVOX/voicevox_engine/badge.svg)](https://coveralls.io/github/VOICEVOX/voicevox_engine)
[![build-docker](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-docker.yml/badge.svg)](https://github.com/VOICEVOX/voicevox_engine/actions/workflows/build-docker.yml)
[![docker](https://img.shields.io/docker/pulls/voicevox/voicevox_engine)](https://hub.docker.com/r/voicevox/voicevox_engine)
This is the engine for [VOICEVOX](https://voicevox.hiroshiba.jp/).
It is essentially an HTTP server, so you can perform text-to-speech synthesis by sending it requests.
(The editor is [VOICEVOX](https://github.com/VOICEVOX/voicevox/), the core is [VOICEVOX CORE](https://github.com/VOICEVOX/voicevox_core/), and the overall architecture is described in detail [here](https://github.com/VOICEVOX/voicevox/blob/main/docs/%E5%85%A8%E4%BD%93%E6%A7%8B%E6%88%90.md).)
## Download
Download the engine for your environment from [here](https://github.com/VOICEVOX/voicevox_engine/releases/latest).
## API documentation
Please refer to the [API documentation](https://voicevox.github.io/voicevox_engine/api/).
While the VOICEVOX engine or editor is running, you can also open http://127.0.0.1:50021/docs to view the documentation of the running engine.
For future plans and direction, [Integration with the VOICEVOX speech synthesis engine](./docs/VOICEVOX音声合成エンジンとの連携.md) may also be useful.
All requests and responses use the UTF-8 character encoding.
### Sample code for speech synthesis via HTTP requests
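```bash
# Write the text to synthesize ("Hello, welcome to the world of speech synthesis") to text.txt
echo -n "こんにちは、音声合成の世界へようこそ" >text.txt

# Create an AudioQuery from the text
curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query?speaker=1" \
    --get --data-urlencode text@text.txt \
    > query.json

# Synthesize speech from the query
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis?speaker=1" \
    > audio.wav
```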
The generated audio has a slightly unusual sampling rate of 24000 Hz, so some audio players may not be able to play it.
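
If a player refuses the file, it can help to confirm what the WAV header actually says. The following minimal Python sketch uses only the standard-library `wave` module; the helper name `wav_sample_rate` is illustrative and not part of the engine.

```python
import io
import wave


def wav_sample_rate(data: bytes) -> int:
    """Return the sampling rate recorded in the header of in-memory WAV data."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return wav.getframerate()
```

For example, passing the bytes of the `audio.wav` produced by the sample above is expected to report 24000.
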
The value passed as `speaker` is a `style_id` obtained from the `/speakers` endpoint. It is named `speaker` for backward compatibility.
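
To find the value to pass as `speaker`, the `/speakers` response can be flattened into a name-to-ID table. This Python sketch assumes the response is a JSON array of speakers, each with a `name` and a `styles` list whose entries carry `name` and `id`; check the API documentation for the exact schema. The function name `style_ids` is hypothetical.

```python
import json


def style_ids(speakers_json: str) -> dict:
    """Map "speaker name/style name" to the style id accepted as the `speaker` parameter."""
    table = {}
    for speaker in json.loads(speakers_json):
        for style in speaker["styles"]:
            table[f"{speaker['name']}/{style['name']}"] = style["id"]
    return table
```

The JSON text can come from `curl -s 127.0.0.1:50021/speakers` or from `urllib.request.urlopen`.
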
### Sample code for getting and correcting the reading in AquesTalk notation
The `/audio_query` response records the reading chosen by the engine in an AquesTalk-like notation (which differs in part from the [original notation](https://www.a-quest.com/archive/manual/siyo_onseikigou.pdf)).
The notation follows these rules:
- All kana are written in katakana.
- Accent phrases are separated by `/` or `、`. A silent interval is inserted only when `、` is used as the separator.
- Placing `_` before a kana devoices that kana.
- The accent position is marked with `'`. Every accent phrase must have exactly one accent position.
- Adding a full-width `？` at the end of an accent phrase makes it pronounced as a question.
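
The rules above can be checked on the client side before sending text to `/accent_phrases?is_kana=true`. This is a hypothetical, simplified validator written for illustration, not the engine's own parser: it splits the string into accent phrases, records whether each is followed by a pause (`、`) and whether it is a question (`？`), and requires exactly one `'` per phrase. It does not check where `_` is placed.

```python
import re

# Katakana block (U+30A0-U+30FF, which includes ー) plus the accent mark and the devoicing marker.
ALLOWED = re.compile(r"[\u30A0-\u30FF_']+")


def parse_kana(text: str) -> list:
    """Split AquesTalk-like text into accent phrases and check the basic rules."""
    phrases = []
    # Each match is (phrase body, the delimiter that follows it: "/", "、" or "").
    for body, delimiter in re.findall(r"([^/、]+)([/、]?)", text):
        question = body.endswith("？")  # full-width question mark
        if question:
            body = body[:-1]
        if not ALLOWED.fullmatch(body):
            raise ValueError(f"unexpected characters in {body!r}")
        if body.count("'") != 1:
            raise ValueError(f"{body!r} must contain exactly one accent mark")
        phrases.append({"text": body, "pause_after": delimiter == "、", "question": question})
    return phrases
```
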
```bash
# Write the text to be read ("Deep learning is not a panacea") to text.txt in UTF-8
echo -n "ディープラーニングは万能薬ではありません" >text.txt

curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query?speaker=1" \
    --get --data-urlencode text@text.txt \
    > query.json

cat query.json | grep -o -E "\"kana\":\".*\""
# Result... "kana":"ディ'イプ/ラ'アニングワ/バンノオヤクデワアリマセ'ン"

# We want it read as "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン",
# so pass is_kana=true to get the intonation and save it to newphrases.json
echo -n "ディイプラ'アニングワ/バンノ'オヤクデワ/アリマセ'ン" > kana.txt
curl -s \
    -X POST \
    "127.0.0.1:50021/accent_phrases?speaker=1&is_kana=true" \
    --get --data-urlencode text@kana.txt \
    > newphrases.json

# Replace the "accent_phrases" contents of query.json with the contents of newphrases.json
cat query.json | sed -e "s/\[{.*}\]/$(cat newphrases.json)/g" > newquery.json

curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @newquery.json \
    "127.0.0.1:50021/synthesis?speaker=1" \
    > audio.wav
```
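
The `sed` substitution above depends on the exact layout of `query.json` and can break if the inserted JSON contains characters that are special to `sed`. Parsing the JSON is more robust. The following Python sketch (hypothetical helper name) swaps in the new `accent_phrases` and leaves every other field of the query untouched.

```python
import json


def replace_accent_phrases(query_json: str, phrases_json: str) -> str:
    """Return a copy of an AudioQuery JSON with its accent_phrases replaced."""
    query = json.loads(query_json)
    query["accent_phrases"] = json.loads(phrases_json)
    # ensure_ascii=False keeps the kana as-is; write the result out as UTF-8.
    return json.dumps(query, ensure_ascii=False)
```

Read `query.json` and `newphrases.json` with `encoding="utf-8"`, write the result to `newquery.json`, and post it to `/synthesis` as before.
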
### User dictionary
Through the API you can view the user dictionary and add, modify, and delete words.
#### Viewing
Send a GET request to `/user_dict` to get a list of the words in the user dictionary.
```bash
curl -s -X GET "127.0.0.1:50021/user_dict"
```
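
The word UUIDs needed for modifying and deleting words can be picked out of this response. The sketch below assumes the response is a JSON object that maps each word UUID to a word entry with a `surface` field. Because the engine may store the surface in a normalized form (for example, full-width characters), both sides are NFKC-normalized before comparison. The helper name is hypothetical.

```python
import json
import unicodedata


def find_word_uuids(user_dict_json: str, surface: str) -> list:
    """Return the UUIDs of all words whose surface matches `surface` after NFKC normalization."""
    target = unicodedata.normalize("NFKC", surface)
    return [
        word_uuid
        for word_uuid, word in json.loads(user_dict_json).items()
        if unicodedata.normalize("NFKC", word["surface"]) == target
    ]
```
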
#### Adding a word
Send a POST request to `/user_dict_word` to add a word to the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading, in katakana)
- accent_type (the accent nucleus position, an integer)

For the accent nucleus position, the following page should be helpful.
The number written in place of 〇 in the "〇型" (type N) labels is the accent nucleus position.
https://tdmelodic.readthedocs.io/ja/latest/pages/introduction.html
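
As a rough guide, the accent nucleus position counts morae from the start of the word, with 0 meaning a flat (heiban) accent, so a sensible value lies between 0 and the number of morae in `pronunciation`. The Python sketch below counts morae in a katakana reading under the usual convention that small ャ/ュ/ョ/ァ/ィ/ゥ/ェ/ォ/ヮ merge with the preceding kana while ッ, ン and ー each count as one mora. Whether the engine applies exactly this range check is an assumption; the function names are hypothetical.

```python
# Small kana that combine with the preceding kana into a single mora.
SMALL_KANA = set("ァィゥェォャュョヮ")


def mora_count(pronunciation: str) -> int:
    """Count morae in a katakana reading; ッ, ン and ー each count as one."""
    return sum(1 for ch in pronunciation if ch not in SMALL_KANA)


def check_accent_type(pronunciation: str, accent_type: int) -> None:
    """Raise ValueError unless 0 <= accent_type <= number of morae (assumed rule)."""
    morae = mora_count(pronunciation)
    if not 0 <= accent_type <= morae:
        raise ValueError(f"accent_type must be between 0 and {morae}, got {accent_type}")
```
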
On success, the return value is the UUID string assigned to the word.
```bash
surface="test"
pronunciation="ãã¹ã"
accent_type="1"
curl -s -X POST "127.0.0.1:50021/user_dict_word" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
```
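
The same request can be sent from Python with only the standard library. This sketch builds the `urllib.request.Request` for `POST /user_dict_word`; as in the curl example, the parameters travel in the query string rather than the body. `BASE_URL` assumes the default local address used throughout this README, and `add_word_request` is a hypothetical name.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:50021"  # default address used in this README (assumption)


def add_word_request(surface: str, pronunciation: str, accent_type: int) -> Request:
    """Build a POST /user_dict_word request with URL-encoded query parameters."""
    params = urlencode(
        {"surface": surface, "pronunciation": pronunciation, "accent_type": accent_type}
    )
    return Request(f"{BASE_URL}/user_dict_word?{params}", method="POST")
```

Sending it with `urllib.request.urlopen(add_word_request("test", "テスト", 1))` should return a response whose body contains the new word's UUID.
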
#### Modifying a word
Send a PUT request to `/user_dict_word/{word_uuid}` to modify a word in the user dictionary.
The following URL parameters are required:
- surface (the word to register in the dictionary)
- pronunciation (the reading, in katakana)
- accent_type (the accent nucleus position, an integer)

The word_uuid is returned when the word is added, and can also be found by viewing the user dictionary.
On success, the response is `204 No Content`.
```bash
surface="test2"
pronunciation="ãã¹ãããŒ"
accent_type="2"
# Replace word_uuid to match your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X PUT "127.0.0.1:50021/user_dict_word/$word_uuid" \
--get \
--data-urlencode "surface=$surface" \
--data-urlencode "pronunciation=$pronunciation" \
--data-urlencode "accent_type=$accent_type"
```
#### Deleting a word
Send a DELETE request to `/user_dict_word/{word_uuid}` to delete a word from the user dictionary.
The word_uuid is returned when the word is added, and can also be found by viewing the user dictionary.
On success, the response is `204 No Content`.
```bash
# Replace word_uuid to match your environment
word_uuid="cce59b5f-86ab-42b9-bb75-9fd3407f1e2d"
curl -s -X DELETE "127.0.0.1:50021/user_dict_word/$word_uuid"
```
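
Modifying and deleting both target `/user_dict_word/{word_uuid}` and succeed with `204 No Content`, so one small Python helper can build either request. This is an illustrative sketch with hypothetical names; `BASE_URL` assumes the default local address.

```python
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:50021"  # default address used in this README (assumption)


def word_request(method: str, word_uuid: str, params: Optional[dict] = None) -> Request:
    """Build a PUT (modify) or DELETE request for /user_dict_word/{word_uuid}."""
    if method not in ("PUT", "DELETE"):
        raise ValueError("method must be PUT or DELETE")
    url = f"{BASE_URL}/user_dict_word/{quote(word_uuid)}"
    if params:
        # PUT takes surface, pronunciation and accent_type in the query string.
        url += "?" + urlencode(params)
    return Request(url, method=method)
```

`urllib.request.urlopen(req).status == 204` indicates success; an HTTP error status raises `urllib.error.HTTPError` instead.
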
### Presets
By editing `presets.yaml`, you can use presets for the speaker, speaking speed, and other settings.
```bash
# "If you make good use of presets, the same settings can be shared across third-party apps"
echo -n "プリセットをうまく活用すれば、サードパーティ間で同じ設定を使うことができます" >text.txt

# Get the preset information
curl -s -X GET "127.0.0.1:50021/presets" > presets.json

# Extract a preset id and its style_id from presets.json
preset_id=$(cat presets.json | sed -r 's/^.+"id"\:\s?([0-9]+?).+$/\1/g')
style_id=$(cat presets.json | sed -r 's/^.+"style_id"\:\s?([0-9]+?).+$/\1/g')

# Get an AudioQuery
curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query_from_preset?preset_id=$preset_id" \
    --get --data-urlencode text@text.txt \
    > query.json

# Synthesize speech
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis?speaker=$style_id" \
    > audio.wav
```
- `speaker_uuid` can be looked up via `/speakers`.
- `id` values must not be duplicated.
- Changes made to the file after the engine has started are reflected in the engine.
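
The `sed` expressions in the example depend on how `presets.json` happens to be laid out. Parsing the JSON instead makes it explicit which preset is used and lets you check the "ids must be unique" rule. This Python sketch assumes the response is a JSON array of preset objects with at least `id` and `style_id`; `pick_preset` is a hypothetical name.

```python
import json
from typing import Optional, Tuple


def pick_preset(presets_json: str, preset_id: Optional[int] = None) -> Tuple[int, int]:
    """Return (id, style_id) of the requested preset, or of the first one if preset_id is None."""
    presets = json.loads(presets_json)
    ids = [preset["id"] for preset in presets]
    if len(ids) != len(set(ids)):
        raise ValueError("preset ids must be unique")
    for preset in presets:
        if preset_id is None or preset["id"] == preset_id:
            return preset["id"], preset["style_id"]
    raise LookupError(f"no preset with id {preset_id}")
```

The returned `id` goes to `/audio_query_from_preset?preset_id=...` and the `style_id` to `/synthesis?speaker=...`.
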
### Sample code for morphing between two speakers
`/synthesis_morphing` generates morphed speech from audio synthesized separately for each of the two speakers.
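
The morphing request takes the two style IDs and a blend ratio as query parameters. The Python sketch below only builds the URL. It assumes `morph_rate` ranges from 0.0 (base speaker only) to 1.0 (target speaker only), which you should confirm in the API documentation; the names are illustrative.

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:50021"  # default address used in this README (assumption)


def morphing_url(base_speaker: int, target_speaker: int, morph_rate: float) -> str:
    """Build the /synthesis_morphing URL; the AudioQuery JSON is posted as the body."""
    # Assumed valid range: 0.0 (base speaker) to 1.0 (target speaker).
    if not 0.0 <= morph_rate <= 1.0:
        raise ValueError("morph_rate must be between 0.0 and 1.0")
    params = urlencode(
        {"base_speaker": base_speaker, "target_speaker": target_speaker, "morph_rate": morph_rate}
    )
    return f"{BASE_URL}/synthesis_morphing?{params}"
```
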
```bash
# "With morphing, you can blend two voices."
echo -n "モーフィングを利用することで、２つの声を混ぜることができます。" > text.txt

curl -s \
    -X POST \
    "127.0.0.1:50021/audio_query?speaker=0" \
    --get --data-urlencode text@text.txt \
    > query.json

# Synthesis result with the original speaker
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis?speaker=0" \
    > audio.wav

export MORPH_RATE=0.5

# Note: this takes a while, because it synthesizes speech for both speakers and analyzes it with WORLD
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis_morphing?base_speaker=0&target_speaker=1&morph_rate=$MORPH_RATE" \
    > audio.wav

export MORPH_RATE=0.9

# When query, base_speaker, and target_speaker are unchanged, a cache is used, so generation is relatively fast
curl -s \
    -H "Content-Type: application/json" \
    -X POST \
    -d @query.json \
    "127.0.0.1:50021/synthesis_morphing?base_speaker=0&target_speaker=1&morph_rate=$MORPH_RATE" \
    > audio.wav
```
### Sample code for getting additional speaker information
This code retrieves portrait.png from the additional information.
(It uses [jq](https://stedolan.github.io/jq/) to parse the JSON.)
```bash
curl -s -X GET "127.0.0.1:50021/speaker_info?speaker_uuid=7ffcb7ce-00ec-4bdc-82cd-45a8889e43ff" \
| jq -r ".portrait" \
| base64 -d \
> portrait.png
```
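
Where jq or `base64` is unavailable, the same decoding can be done in Python. This sketch assumes, as the jq filter above does, that the `/speaker_info` response has a top-level `portrait` field holding base64-encoded PNG data; the function name is illustrative.

```python
import base64
import json


def portrait_bytes(speaker_info_json: str) -> bytes:
    """Decode the base64-encoded `portrait` field of a /speaker_info response."""
    return base64.b64decode(json.loads(speaker_info_json)["portrait"])
```

Writing the result with `open("portrait.png", "wb").write(...)` produces the same file as the curl pipeline.
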
### Cancellable speech synthesis
With `/cancellable_synthesis`, computing resources are released immediately when the connection is closed.
(With `/synthesis`, the synthesis computation runs to completion even if the connection is closed.)
This API is an experimental feature and is only enabled when the engine is started with the `--enable_cancellable_synthesis` argument.
It takes the same synthesis parameters as `/synthesis`.
### CORS settings
For security, VOICEVOX only accepts requests from the origins `localhost`, `127.0.0.1`, and `app://`, or from requests without an Origin.
As a result, some third-party apps may be unable to receive responses.
To work around this, the engine provides a settings UI.
#### How to configure
1. Open <http://127.0.0.1:50021/setting>.
2. Change or add settings to suit the app you use.
3. Press the save button to confirm the changes.
4. The engine must be restarted for the settings to take effect. Restart it as needed.
## Updating
Delete all files in the engine directory and replace them with the new ones.
## Docker images
### CPU
```bash
docker pull voicevox/voicevox_engine:cpu-ubuntu20.04-latest
docker run --rm -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:cpu-ubuntu20.04-latest
```
### GPU
```bash
docker pull voicevox/voicevox_engine:nvidia-ubuntu20.04-latest
docker run --rm --gpus all -p '127.0.0.1:50021:50021' voicevox/voicevox_engine:nvidia-ubuntu20.04-latest
```
#### Troubleshooting
When using the GPU version, errors may occur depending on your environment. In that case, adding `--runtime=nvidia` to `docker run` may resolve the problem.
## For contributors
When you create a pull request that resolves an issue, please either announce on the issue that you have started working on it, or open a Draft pull request first, so that you don't end up working on the same issue as someone else.
Development discussions and casual chat take place on the [unofficial VOICEVOX Discord server](https://discord.gg/WMwWetrzuh). Feel free to join.
## Setting up the environment
Development uses `Python 3.11.3`.
Installation requires a C/C++ compiler and CMake for your OS.
```bash
# Install the libraries needed for development
python -m pip install -r requirements-dev.txt -r requirements-test.txt

# If you just want to run the engine, use this instead
python -m pip install -r requirements.txt
```
## Running
See the details of the command-line arguments with the following command.
```bash
python run.py --help
```
```bash
# Start the server with the product version of VOICEVOX
VOICEVOX_DIR="C:/path/to/voicevox" # path to the product version VOICEVOX directory
python run.py --voicevox_dir="$VOICEVOX_DIR"
```
<!-- Uncomment once swappable voice libraries or their specification are published
```bash
# Swap the voice library
VOICELIB_DIR="C:/path/to/your/tts-model"
python run.py --voicevox_dir=$VOICEVOX_DIR --voicelib_dir=$VOICELIB_DIR
```
-->
```bash
# Start the server with a mock
python run.py --enable_mock
```
```bash
# Write logs in UTF-8
python run.py --output_log_utf8
# or: VV_OUTPUT_LOG_UTF8=1 python run.py
```
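### Specifying the number of CPU threads
If the number of CPU threads is not specified, half the number of logical cores or the number of physical cores is used. (On most CPUs, this is half of the total processing capacity.)
If you want to adjust how much processing capacity the engine uses, for example when running on IaaS or on a dedicated server, you can do so by specifying the number of CPU threads.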
- Specify it as a command-line argument
```bash
python run.py --voicevox_dir=$VOICEVOX_DIR --cpu_num_threads=4
```
- Specify it with an environment variable
```bash
export VV_CPU_NUM_THREADS=4
python run.py --voicevox_dir=$VOICEVOX_DIR
```
### Using cores from older versions
Cores from VOICEVOX Core 0.5.4 onward can be used.
The libtorch version of the core is not supported on Mac.
#### Specifying older binaries
If you pass the directory of the product version of VOICEVOX, or of a compiled engine, with the `--voicevox_dir` argument, the core of that version is used.
```bash
python run.py --voicevox_dir="/path/to/voicevox"
```
On Mac, `DYLD_LIBRARY_PATH` must be specified.
```bash
DYLD_LIBRARY_PATH="/path/to/voicevox" python run.py --voicevox_dir="/path/to/voicevox"
```
#### Specifying the voice library directly
Pass the directory into which you extracted the [VOICEVOX Core zip file](https://github.com/VOICEVOX/voicevox_core/releases) with the `--voicelib_dir` argument.
Also pass the [libtorch](https://pytorch.org/) or [onnxruntime](https://github.com/microsoft/onnxruntime) directory that matches the core version with the `--runtime_dir` argument.
However, if libtorch or onnxruntime is on the system's search path, `--runtime_dir` is unnecessary.
The `--voicelib_dir` and `--runtime_dir` arguments can each be given multiple times.
To select a core version at an API endpoint, specify the `core_version` argument. (If it is not specified, the latest core is used.)
```bash
python run.py --voicelib_dir="/path/to/voicevox_core" --runtime_dir="/path/to/libtorch_or_onnx"
```
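
When several cores are loaded, "the latest core" is decided by version order, and dotted version strings must be compared numerically rather than as text (as text, `"0.9.0"` would sort after `"0.14.3"`). A minimal Python sketch of that comparison, assuming plain `X.Y.Z` versions and ignoring any suffix after `-`; it is an illustration, not the engine's own code.

```python
def version_key(version: str) -> tuple:
    """Turn "0.14.3" (or "0.14.3-preview.1") into (0, 14, 3) for numeric comparison."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def latest_version(versions: list) -> str:
    """Return the highest version in the list."""
    return max(versions, key=version_key)
```
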
On Mac, `DYLD_LIBRARY_PATH` must be specified instead of the `--runtime_dir` argument.
```bash
DYLD_LIBRARY_PATH="/path/to/onnx" python run.py --voicelib_dir="/path/to/voicevox_core"
```
## Code formatting
This software provides a mechanism (static analysis tools) that checks code formatting before you push to the remote.
To use it, install the libraries needed for development and then run the following command.
We recommend using it when you create a pull request.
```bash
pre-commit install -t pre-push
```
If errors are reported, you can fix them with the following command. Note, however, that not every problem can be fixed automatically.
```bash
pysen run format lint
```
## Typo checking
[typos](https://github.com/crate-ci/typos) is used to check for typos.
After [installing typos](https://github.com/crate-ci/typos#install), run
```bash
typos
```
to check for typos.
If there are false positives, or files that should be excluded from the check, edit `_typos.toml` as described in the [configuration file documentation](https://github.com/crate-ci/typos#false-positives).
## Checking the API documentation
The [API documentation](https://voicevox.github.io/voicevox_engine/api/) (the actual file is `docs/api/index.html`) is updated automatically.
You can also generate the API documentation manually with the following command.
```bash
python make_docs.py
```
## Building
A build made this way differs from the one published in the releases.
In addition, using a GPU requires additional libraries such as cuDNN, CUDA, or DirectML.
```bash
python -m pip install -r requirements-dev.txt
OUTPUT_LICENSE_JSON_PATH=licenses.json \
bash build_util/create_venv_and_generate_licenses.bash
# The build itself also works without specifying LIBCORE_PATH and LIBONNXRUNTIME_PATH
LIBCORE_PATH="/path/to/libcore" \
LIBONNXRUNTIME_PATH="/path/to/libonnxruntime" \
pyinstaller --noconfirm run.spec
```
## Dependencies
### Updating
[Poetry](https://python-poetry.org/) is used to pin the versions of dependency libraries.
You can manage them with the following commands:
```bash
# To add a package
poetry add <package-name>
poetry add --group dev <package-name> # add a development dependency
poetry add --group test <package-name> # add a test dependency

# To upgrade a package
poetry update <package-name>
poetry update # update everything

# Update requirements.txt
poetry export --without-hashes -o requirements.txt # if you update this one, the three below must be updated too
poetry export --without-hashes --with dev -o requirements-dev.txt
poetry export --without-hashes --with test -o requirements-test.txt
poetry export --without-hashes --with license -o requirements-license.txt
```
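### Licenses
Dependency libraries must have licenses that allow the core code to remain closed source even when the library is linked into and combined with the core at build time.
The major licenses are treated as follows:
- MIT/Apache/BSD-3: OK
- LGPL: OK (because it is dynamically separated from the core)
- GPL: NG (because all related code would have to be published)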
## Updating the user dictionary
The following command compiles an OpenJTalk user dictionary.
```bash
python -c "import pyopenjtalk; pyopenjtalk.create_user_dict('default.csv','user.dic')"
```
## Multi-engine feature
The VOICEVOX editor can run multiple engines at the same time.
With this feature, you can run your own speech synthesis engine, or an existing one, inside the VOICEVOX editor.
<img src="./docs/res/マルチエンジン概念図.svg" width="320">
<details>
### How the multi-engine feature works
The multi-engine feature works by launching the Web APIs of multiple VOICEVOX API-compliant engines on separate ports and handling them uniformly.
The editor launches each engine through its executable binary and manages each engine's settings and state separately, keyed by its EngineID.
### Supporting the multi-engine feature
You can support it by creating an executable binary that launches a VOICEVOX API-compliant engine.
The easiest way is to fork the VOICEVOX ENGINE repository and modify some of its features.
Three things need to be modified: engine information, character information, and speech synthesis.
Engine information is managed in the engine manifest (`engine_manifest.json`).
Review the information in the manifest file and change it as appropriate.
Depending on the synthesis method, your engine may not be able to offer the same features as VOICEVOX, such as morphing.
In that case, change the information under `supported_features` in the manifest file as appropriate.
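
A tool or test that reads the manifest can then check a feature flag before calling the corresponding endpoint. The Python sketch below is hypothetical; it assumes each entry in `supported_features` is either a plain boolean or an object with a boolean `value` field, so check the `engine_manifest.json` in the repository for the actual format.

```python
import json


def feature_enabled(manifest_json: str, feature: str) -> bool:
    """Return whether engine_manifest.json declares `feature` as supported."""
    entry = json.loads(manifest_json).get("supported_features", {}).get(feature)
    # Assumed formats: a plain boolean, or an object carrying a "value" field.
    if isinstance(entry, dict):
        return bool(entry.get("value", False))
    return bool(entry)
```
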
Character information is managed in the files in the `speaker_info` directory.
Dummy icons and similar files are provided, so replace them as appropriate.
Speech synthesis is performed in `voicevox_engine/synthesis_engine/synthesis_engine.py`.
In the VOICEVOX API, speech synthesis works in three steps: the engine creates an initial speech synthesis query, `AudioQuery`, and returns it to the user; the user edits the query as needed; and the engine then synthesizes speech according to the query.
Queries are created at the `/audio_query` endpoint and speech is synthesized at the `/synthesis` endpoint; an engine that supports at least these two endpoints is compliant with the VOICEVOX API.
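
The middle step, where the user edits the query, is plain JSON manipulation. As an illustration, this Python sketch takes the text returned by `/audio_query`, changes the speaking speed and/or pitch, and returns JSON ready to post to `/synthesis`. It assumes the AudioQuery fields are named `speedScale` and `pitchScale` (see the API documentation for the full schema); the helper is hypothetical.

```python
import json
from typing import Optional


def edit_query(
    query_json: str,
    speed_scale: Optional[float] = None,
    pitch_scale: Optional[float] = None,
) -> str:
    """Return a copy of an AudioQuery with only the given fields changed."""
    query = json.loads(query_json)
    if speed_scale is not None:
        query["speedScale"] = speed_scale
    if pitch_scale is not None:
        query["pitchScale"] = pitch_scale
    return json.dumps(query, ensure_ascii=False)
```
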
### Distributing an engine that supports the multi-engine feature
We recommend distributing it as a VVPP file.
VVPP stands for "VOICEVOX Plugin Package"; it is a Zip file of a directory containing the built engine and related files.
If you give it the `.vvpp` extension, it can be installed into the VOICEVOX editor by double-clicking it.
The editor extracts the received VVPP file onto the local disk and then locates the files according to the `engine_manifest.json` directly under the root.
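
Because the editor looks for `engine_manifest.json` directly under the root of the extracted archive, the contents of the engine directory (not the directory itself) should sit at the top level of the Zip. A minimal Python sketch using the standard `zipfile` module; the function name and paths are hypothetical.

```python
import zipfile
from pathlib import Path


def make_vvpp(engine_dir: Path, out_path: Path) -> None:
    """Zip engine_dir so that its contents, including engine_manifest.json, are at the root.

    out_path should be outside engine_dir, or the archive would try to include itself.
    """
    if not (engine_dir / "engine_manifest.json").is_file():
        raise FileNotFoundError("engine_manifest.json must be directly under engine_dir")
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(engine_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to engine_dir, with forward slashes.
                zf.write(path, path.relative_to(engine_dir).as_posix())
```
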
If the VOICEVOX editor fails to load your engine, check the editor's error log.
A `xxx.vvpp` file can also be distributed split into sequentially numbered files such as `xxx.0.vvppp`.
This is useful when the file is too large to distribute easily.
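
Splitting can be done with any tool that cuts a file into byte ranges. This Python sketch writes `xxx.0.vvppp`, `xxx.1.vvppp`, ... next to `xxx.vvpp`. It assumes the editor reassembles the parts by concatenating them in numerical order, which you should verify against the editor before relying on it; the function name is hypothetical.

```python
from pathlib import Path


def split_vvpp(vvpp_path: Path, chunk_size: int) -> list:
    """Split xxx.vvpp into numbered xxx.N.vvppp parts of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    stem = vvpp_path.with_suffix("")  # ".../xxx.vvpp" -> ".../xxx"
    parts = []
    with vvpp_path.open("rb") as src:
        index = 0
        # Read one chunk at a time so large packages never need to fit in memory.
        while chunk := src.read(chunk_size):
            part = stem.parent / f"{stem.name}.{index}.vvppp"
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts
```
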
</details>
## GitHub Actions
### Variables
| name | description |
| :----------------- | :---------------------------------------------------------------------- |
| DOCKERHUB_USERNAME | Docker Hub username                                                     |
### Secrets
| name | description |
| :----------------- | :---------------------------------------------------------------------- |
| DOCKERHUB_TOKEN    | [Docker Hub access token](https://hub.docker.com/settings/security)     |
## Showcase
**[voicevox-client](https://github.com/tuna2134/voicevox-client) [@tuna2134](https://github.com/tuna2134)**: a Python wrapper for VOICEVOX ENGINE
## License
Dual-licensed under LGPL v3 and a separate license that does not require publishing the source code.
If you would like to obtain the separate license, please contact Hiho (twitter: @hiho_karuta).