---
license: mit
language:
- en
pipeline_tag: text-classification
tags:
- url
- urls
- classification
new_version: CrabInHoney/urlbert-tiny-base-v2
---

This is a very small BERT model, intended as a base for later fine-tuning on URL analysis tasks.

Model size: 6.53M params

Tensor type: F32

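As a rough check on what these two figures imply, the sketch below estimates the size of the weights in memory. It assumes every parameter is stored as a 4-byte F32 value and ignores activations, optimizer state, and file-format overhead. The names `NUM_PARAMS` and `weights_size_bytes` are illustrative and not part of the model.

```python
# Rough size of the weights, assuming all 6.53M parameters are stored as F32.
NUM_PARAMS = 6_530_000      # "6.53M params" from the model card (rounded)
BYTES_PER_F32 = 4           # a 32-bit float occupies 4 bytes


def weights_size_bytes(num_params: int, bytes_per_param: int) -> int:
    """Return the raw storage needed for the weights alone."""
    return num_params * bytes_per_param


size = weights_size_bytes(NUM_PARAMS, BYTES_PER_F32)
print(f"{size / 1e6:.2f} MB")      # decimal megabytes -> 26.12 MB
print(f"{size / 2**20:.2f} MiB")   # binary mebibytes  -> 24.91 MiB
```

Under these assumptions the weights come to roughly 26 MB, which is why the model is practical to run on a CPU.
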
Test example:

    from transformers import BertTokenizerFast, BertForMaskedLM, pipeline
    import torch

    # Use the GPU when one is available, otherwise fall back to the CPU
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")

    # Local directory containing the model and tokenizer files
    model_path = "./urlbertV1"

    tokenizer = BertTokenizerFast.from_pretrained(model_path)
    model = BertForMaskedLM.from_pretrained(model_path)
    model.to(device)

    # The pipeline takes a device index: 0 for the first GPU, -1 for the CPU
    fill_mask = pipeline(
        "fill-mask",
        model=model,
        tokenizer=tokenizer,
        device=0 if torch.cuda.is_available() else -1
    )

    # URLs in which the token to predict is replaced by [MASK]
    sentences = [
        "http://helloworld.[MASK]/events/"
    ]

    for sentence in sentences:
        print(f"\nInput sentence: {sentence}")
        results = fill_mask(sentence)
        for result in results:
            token_str = result['token_str']
            score = result['score']
            print(f"Predicted token: {token_str}, probability: {score:.4f}")

Output:

    Input sentence: http://helloworld.[MASK]/events/
    Predicted token: com, probability: 0.7575
    Predicted token: org, probability: 0.0884
    Predicted token: nl, probability: 0.0294
    Predicted token: net, probability: 0.0198
    Predicted token: ca, probability: 0.0153

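To turn a prediction back into a full URL, one option is to substitute the predicted token into the masked template directly. The sketch below assumes results shaped like the `fill-mask` pipeline output (a list of dicts with `token_str` and `score` keys). The helper `fill_url`, its `min_score` parameter, and the hard-coded `results` list, which copies the first two predictions from the output above, are illustrative and not part of the model or of `transformers`.

```python
from typing import Dict, List

MASK = "[MASK]"


def fill_url(template: str, results: List[Dict], min_score: float = 0.0) -> List[str]:
    """Substitute each sufficiently likely predicted token into the masked URL."""
    return [
        template.replace(MASK, r["token_str"].strip(), 1)
        for r in results
        if r["score"] >= min_score
    ]


# Hard-coded copy of the top two predictions from the example output
results = [
    {"token_str": "com", "score": 0.7575},
    {"token_str": "org", "score": 0.0884},
]

urls = fill_url("http://helloworld.[MASK]/events/", results, min_score=0.05)
print(urls)  # ['http://helloworld.com/events/', 'http://helloworld.org/events/']
```

Substituting into the original string keeps the rest of the URL exactly as typed, and the threshold gives a simple way to drop low-probability candidates.
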
## License

[MIT](https://choosealicense.com/licenses/mit/)