ccdv committed on
Commit eaaafbb
1 Parent(s): 073319f
Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -1,6 +1,7 @@
 ---
 language: fr
 tags:
+ - camembert
 - long context
 pipeline_tag: fill-mask
 ---
@@ -10,6 +11,8 @@ pipeline_tag: fill-mask
 **This model relies on a custom modeling file, you need to add trust_remote_code=True**\
 **See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
 
+ Conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
+
 * [Usage](#usage)
 * [Parameters](#parameters)
 * [Sparse selection type](#sparse-selection-type)
@@ -18,10 +21,9 @@ pipeline_tag: fill-mask
 
 This model is adapted from [CamemBERT-base](https://huggingface.co/camembert-base) without additional pretraining yet. It uses the same number of parameters/layers and the same tokenizer.
 
-
 This model can handle long sequences but faster and more efficiently than Longformer or BigBird (from Transformers) and relies on Local + Sparse + Global attention (LSG).
 
- The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...). \
+ The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...).
 
 Support encoder-decoder but I didnt test it extensively.\
 Implemented in PyTorch.
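
A minimal usage sketch of the recommendation in the diff above (trust_remote_code=True, truncation, and padding to a multiple of the block size), assuming the Hugging Face transformers AutoModel API; the repository id and the block size of 256 are placeholders, not values stated in this commit.

```python
# Minimal sketch, assuming the transformers AutoModel API.
# The repository id and block size are placeholders; check the model card/config.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "ccdv/lsg-camembert-base-4096"  # placeholder repo id

# trust_remote_code=True is required because the model uses a custom modeling file.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Truncate the input and pad it to a multiple of the block size (256 assumed here).
inputs = tokenizer(
    "Paris est la <mask> de la France.",
    truncation=True,
    padding=True,
    pad_to_multiple_of=256,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits
```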