ccdv committed on
Commit eaaafbb
1 Parent(s): 073319f
Files changed (1)
  1. README.md +4 -2
README.md CHANGED
@@ -1,6 +1,7 @@
 ---
 language: fr
 tags:
+ - camembert
 - long context
 pipeline_tag: fill-mask
 ---
@@ -10,6 +11,8 @@ pipeline_tag: fill-mask
 **This model relies on a custom modeling file, you need to add trust_remote_code=True**\
 **See [\#13467](https://github.com/huggingface/transformers/pull/13467)**
 
+ Conversion script is available at this [link](https://github.com/ccdv-ai/convert_checkpoint_to_lsg).
+
 * [Usage](#usage)
 * [Parameters](#parameters)
 * [Sparse selection type](#sparse-selection-type)
@@ -18,10 +21,9 @@ pipeline_tag: fill-mask
 
 This model is adapted from [CamemBERT-base](https://huggingface.co/camembert-base) without additional pretraining yet. It uses the same number of parameters/layers and the same tokenizer.
 
-
 This model can handle long sequences but faster and more efficiently than Longformer or BigBird (from Transformers) and relies on Local + Sparse + Global attention (LSG).
 
- The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...). \
+ The model requires sequences whose length is a multiple of the block size. The model is "adaptive" and automatically pads the sequences if needed (adaptive=True in config). It is however recommended, thanks to the tokenizer, to truncate the inputs (truncation=True) and optionally to pad with a multiple of the block size (pad_to_multiple_of=...).
 
 Support encoder-decoder but I didnt test it extensively.\
 Implemented in PyTorch.
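
A minimal usage sketch of the recommendation in the diff above (trust_remote_code=True, truncation, and padding to a multiple of the block size), assuming the Hugging Face transformers AutoModel API; the repository id and the block size of 256 are placeholders, not values stated in this commit.

```python
# Minimal sketch, assuming the transformers AutoModel API.
# The repository id and block size are placeholders; check the model card/config.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_id = "ccdv/lsg-camembert-base-4096"  # placeholder repo id

# trust_remote_code=True is required because the model uses a custom modeling file.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained(model_id, trust_remote_code=True)

# Truncate the input and pad it to a multiple of the block size (256 assumed here).
inputs = tokenizer(
    "Paris est la <mask> de la France.",
    truncation=True,
    padding=True,
    pad_to_multiple_of=256,
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits
```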