ahmedabdelali/bert-base-qarib_far


	---
	language: ar
	tags:
	- pytorch
	- tf
	- QARiB
	- qarib
	datasets:
	- arabic_billion_words
	- open_subtitles
	- twitter
	- Farasa
	metrics:
	- f1
	widget:
	- text: "و+قام ال+مدير [MASK]"
	---
	# QARiB: QCRI Arabic and Dialectal BERT
	## About QARiB Farasa
	The QCRI Arabic and Dialectal BERT (QARiB) model was trained on a collection of ~420 million tweets and ~180 million sentences of text.
	The tweets were collected through the Twitter API using the language filter `lang:ar`. The text data is a combination of
	Arabic GigaWord, the Abulkhair Arabic Corpus and [OPUS](http://opus.nlpl.eu/).
	QARiB is the Arabic word for "boat".
	## Model and Parameters
	- Data size: 14B tokens
	- Vocabulary: 64k
	- Iterations: 10M
	- Number of Layers: 12
	## Training QARiB
	See details in [Training QARiB](https://github.com/qcri/QARIB/Training_QARiB.md)
	## Using QARiB
	You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to be fine-tuned on a downstream task. See the model hub to look for fine-tuned versions on a task that interests you. For more details, see [Using QARiB](https://github.com/qcri/QARIB/Using_QARiB.md)

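	This model expects segmented input, in which each word is split into its segments joined by `+` (for example `و+قام ال+مدير`). You may use the [Farasa Segmenter](https://farasa-api.qcri.org/segmentation/) API to produce it.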
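
To make the expected input format concrete, here is a minimal sketch of two hypothetical helpers (they are not part of QARiB or Farasa). They assume `+` appears only as a segment boundary, as in the examples in this card. They do not perform segmentation themselves; they only inspect or undo it.

```python
from typing import List


def split_segments(segmented: str) -> List[List[str]]:
    """Split a Farasa-style segmented string into per-word segment lists."""
    # Words are separated by whitespace, segments within a word by "+".
    return [word.split("+") for word in segmented.split()]


def desegment(segmented: str) -> str:
    """Join segments back into surface words by dropping the "+" markers."""
    # Assumption: "+" never occurs as a literal character in the text.
    return " ".join(word.replace("+", "") for word in segmented.split())
```

Under these assumptions, `desegment` can be applied to a predicted sequence to read it as ordinary text, and `split_segments` gives a quick check that an input was actually segmented before it is passed to the model.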

	### How to use
	You can use this model directly with a pipeline for masked language modeling:
	```python
	>>> from transformers import pipeline
	>>> # The model argument is a path to a local copy of the model files
	>>> fill_mask = pipeline("fill-mask", model="./models/bert-base-qarib_far")
	>>> fill_mask("و+قام ال+مدير [MASK]")

	>>> fill_mask("و+قام+ت ال+مدير+ة [MASK]")

	>>> fill_mask("قللي وشفيييك يرحم [MASK]")

	```
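
For use outside an interactive session, the following sketch wraps the same pipeline call in a function and reduces its output to token and score pairs. The names `summarize` and `predict` are hypothetical. The sketch assumes a single `[MASK]` in the input, in which case the `fill-mask` pipeline returns a list of dictionaries with `score`, `token`, `token_str` and `sequence` keys, and it reuses the local model path from the example above.

```python
from typing import Dict, List, Tuple


def summarize(predictions: List[Dict], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (token_str, score) pairs."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    return [(p["token_str"], round(float(p["score"]), 4)) for p in ranked[:k]]


def predict(text: str,
            model_path: str = "./models/bert-base-qarib_far",
            k: int = 3) -> List[Tuple[str, float]]:
    """Run fill-mask on already segmented text containing one [MASK]."""
    # Imported lazily so summarize() can be used without transformers installed.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_path)
    # top_k limits how many candidate tokens the pipeline returns.
    return summarize(fill_mask(text, top_k=k), k)
```

Calling `predict("و+قام ال+مدير [MASK]")` would then return up to three candidate tokens with their scores. Loading the pipeline inside `predict` keeps the sketch short; for repeated calls it is cheaper to create the pipeline once and reuse it.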
	## Evaluations


	## Model Weights and Vocab Download
	From the Hugging Face site: https://huggingface.co/qarib/bert-base-qarib_far
	## Contacts
	Ahmed Abdelali, Sabit Hassan, Hamdy Mubarak, Kareem Darwish and Younes Samih
	## Reference
	```
	@article{abdelali2021pretraining,
	title={Pre-Training BERT on Arabic Tweets: Practical Considerations},
	author={Ahmed Abdelali and Sabit Hassan and Hamdy Mubarak and Kareem Darwish and Younes Samih},
	year={2021},
	eprint={2102.10684},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```