expand abbreviations of TLD.
README.md
CHANGED
@@ -40,7 +40,7 @@ The training data used is
 #### Preprocessing
 The following filtering is done
 - Remove documents that do not use a single hiragana character. This removes English-only documents and documents in Chinese.
-- Whitelist-style filtering using
+- Whitelist-style filtering using the top level domain of URL to remove affiliate sites.
 
 #### Training Hyperparameters
 
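
The two preprocessing filters described in the README can be sketched as below. This is a minimal illustration, not the repository's actual pipeline: the function names, the hiragana code point range (U+3041 to U+3096), and the contents of `ALLOWED_TLDS` are assumptions, since the README does not list the whitelisted domains or show the filtering code.

```python
import re
from urllib.parse import urlparse

# Assumption: "hiragana character" means the letters U+3041..U+3096.
HIRAGANA = re.compile(r"[\u3041-\u3096]")

# Hypothetical whitelist; the README does not say which TLDs are allowed.
ALLOWED_TLDS = {"jp"}


def has_hiragana(text: str) -> bool:
    """Return True if the text contains at least one hiragana letter."""
    return HIRAGANA.search(text) is not None


def top_level_domain(url: str) -> str:
    """Return the last label of the URL's hostname, or "" if there is none."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1] if "." in host else ""


def keep_document(text: str, url: str, allowed_tlds=ALLOWED_TLDS) -> bool:
    """Keep a document only if it passes both filters."""
    return has_hiragana(text) and top_level_domain(url) in allowed_tlds
```

With this sketch, `keep_document("これはテストです", "https://example.jp/page")` keeps the document, while an English-only or Chinese-only text, or a URL whose top level domain is not in the whitelist, is dropped. Note that a text written only in katakana or kanji would also be dropped by the hiragana check.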