Per IANA registry, iw was deprecated as the code for Hebrew in 1989 and the preferred code is he
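Per the [IANA registry](https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry), `iw` was deprecated as the code for Hebrew in 1989, and the preferred code is `he`. This change renames the `<|iw|>` token to `<|he|>` in `added_tokens.json` and keeps its token ID (50279) unchanged.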
The original PR was merged into whisper: https://github.com/openai/whisper/pull/401
The corresponding HuggingFace transformers PR: https://github.com/huggingface/transformers/pull/21310
The correct subtag:
```
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
```
And the deprecated subtag:
```
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
```
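The two records above can be read mechanically to map a deprecated subtag to its preferred value. The following is a minimal sketch, not an official parser: `parse_records` and `preferred_subtag` are hypothetical helper names, and the sketch assumes simple one-line `Key: value` fields (it ignores the folded continuation lines and repeated fields that the full registry file can contain).

```python
# Excerpt copied from the two registry records quoted above.
REGISTRY_EXCERPT = """\
%%
Type: language
Subtag: he
Description: Hebrew
Added: 2005-10-16
Suppress-Script: Hebr
%%
Type: language
Subtag: iw
Description: Hebrew
Added: 2005-10-16
Deprecated: 1989-01-01
Preferred-Value: he
Suppress-Script: Hebr
%%
"""


def parse_records(text):
    """Split '%%'-separated records into dicts (hypothetical helper)."""
    records = []
    for chunk in text.split("%%"):
        fields = {}
        for line in chunk.strip().splitlines():
            # Split on the first colon only, so values may contain colons.
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if fields:
            records.append(fields)
    return records


def preferred_subtag(subtag, records):
    """Return Preferred-Value if the subtag is deprecated, else the subtag."""
    for rec in records:
        if rec.get("Type") == "language" and rec.get("Subtag") == subtag:
            if "Deprecated" in rec:
                return rec.get("Preferred-Value", subtag)
            return subtag
    # Unknown subtags are passed through unchanged in this sketch.
    return subtag


RECORDS = parse_records(REGISTRY_EXCERPT)
print(preferred_subtag("iw", RECORDS))
print(preferred_subtag("he", RECORDS))
```

With the excerpt above, `iw` resolves to `he`, while `he` is returned as-is because its record has no `Deprecated` field.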
added_tokens.json CHANGED (+1 -1)

@@ -30,6 +30,7 @@
 "<|gu|>": 50333,
 "<|haw|>": 50352,
 "<|ha|>": 50354,
+"<|he|>": 50279,
 "<|hi|>": 50276,
 "<|hr|>": 50291,
 "<|ht|>": 50339,
@@ -38,7 +39,6 @@
 "<|id|>": 50275,
 "<|is|>": 50311,
 "<|it|>": 50274,
-"<|iw|>": 50279,
 "<|ja|>": 50266,
 "<|jw|>": 50356,
 "<|ka|>": 50329,
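A rename like this one could be scripted rather than edited by hand. The sketch below is an illustration only, not the method used for this change: `rename_token` is a hypothetical helper, the dictionary is a small excerpt of the entries shown in the diff, and writing the result with sorted keys is an assumption based on the ordering visible in the diff (the new `<|he|>` entry sits between `<|ha|>` and `<|hi|>`).

```python
import json


def rename_token(vocab, old, new):
    """Return a copy of vocab with key `old` renamed to `new`, same ID."""
    if old not in vocab:
        raise KeyError(f"token {old!r} not found")
    if new in vocab:
        raise ValueError(f"token {new!r} already exists")
    renamed = {tok: idx for tok, idx in vocab.items() if tok != old}
    renamed[new] = vocab[old]  # the token ID itself is left untouched
    return renamed


# Small excerpt of added_tokens.json, taken from the diff.
excerpt = {
    "<|ha|>": 50354,
    "<|hi|>": 50276,
    "<|it|>": 50274,
    "<|iw|>": 50279,
    "<|ja|>": 50266,
}

renamed = rename_token(excerpt, "<|iw|>", "<|he|>")

# Assumption: keys are serialized in sorted order, as the diff suggests.
print(json.dumps(renamed, indent=2, sort_keys=True))
```

Because only the key changes, every other token keeps its ID, and the model's embedding for ID 50279 is unaffected.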