lysandre (HF staff) committed on
Commit
a74dd20
1 Parent(s): 9fcd16f

Large -> Xlarge

Files changed (1): README.md (+10 −10)
README.md CHANGED
@@ -6,7 +6,7 @@ datasets:
 - wikipedia
 ---
 
-# ALBERT Large v1
+# ALBERT XLarge v1
 
 Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in
 [this paper](https://arxiv.org/abs/1909.11942) and first released in
@@ -36,15 +36,15 @@ classifier using the features produced by the ALBERT model as inputs.
 
 ALBERT is particular in that it shares its layers across its Transformer. Therefore, all layers have the same weights. Using repeating layers results in a small memory footprint, however, the computational cost remains similar to a BERT-like architecture with the same number of hidden layers as it has to iterate through the same number of (repeating) layers.
 
-This is the first version of the large model. Version 2 is different from version 1 due to different dropout rates, additional training data, and longer training. It has better results in nearly all downstream tasks.
+This is the first version of the xlarge model. Version 2 is different from version 1 due to different dropout rates, additional training data, and longer training. It has better results in nearly all downstream tasks.
 
 This model has the following configuration:
 
 - 24 repeating layers
 - 128 embedding dimension
-- 1024 hidden dimension
+- 2048 hidden dimension
 - 16 attention heads
-- 17M parameters
+- 58M parameters
 
 ## Intended uses & limitations
 
@@ -62,7 +62,7 @@ You can use this model directly with a pipeline for masked language modeling:
 
 ```python
 >>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='albert-large-v1')
+>>> unmasker = pipeline('fill-mask', model='albert-xlarge-v1')
 >>> unmasker("Hello I'm a [MASK] model.")
 [
    {
@@ -102,8 +102,8 @@ Here is how to use this model to get the features of a given text in PyTorch:
 
 ```python
 from transformers import AlbertTokenizer, AlbertModel
-tokenizer = AlbertTokenizer.from_pretrained('albert-large-v1')
-model = AlbertModel.from_pretrained("albert-large-v1")
+tokenizer = AlbertTokenizer.from_pretrained('albert-xlarge-v1')
+model = AlbertModel.from_pretrained("albert-xlarge-v1")
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='pt')
 output = model(**encoded_input)
@@ -113,8 +113,8 @@ and in TensorFlow:
 
 ```python
 from transformers import AlbertTokenizer, TFAlbertModel
-tokenizer = AlbertTokenizer.from_pretrained('albert-large-v1')
-model = TFAlbertModel.from_pretrained("albert-large-v1")
+tokenizer = AlbertTokenizer.from_pretrained('albert-xlarge-v1')
+model = TFAlbertModel.from_pretrained("albert-xlarge-v1")
 text = "Replace me by any text you'd like."
 encoded_input = tokenizer(text, return_tensors='tf')
 output = model(encoded_input)
@@ -127,7 +127,7 @@ predictions:
 
 ```python
 >>> from transformers import pipeline
->>> unmasker = pipeline('fill-mask', model='albert-large-v1')
+>>> unmasker = pipeline('fill-mask', model='albert-xlarge-v1')
 >>> unmasker("The man worked as a [MASK].")
 
 [
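As a sanity check on the numbers changed in this commit (2048 hidden dimension, 58M parameters), here is a rough back-of-the-envelope parameter count. This is my own sketch, not from the model card: it assumes the standard 30k SentencePiece vocabulary and ALBERT's 4x feed-forward expansion, and ignores biases, layer norms, and the pooler, so it is a lower bound.

```python
# Rough parameter-count estimate for ALBERT xlarge v1 (weight matrices only).
vocab_size = 30000          # assumed: standard ALBERT SentencePiece vocab
embedding_dim = 128         # from the configuration list
hidden_dim = 2048           # from the configuration list (xlarge)
intermediate_dim = 4 * hidden_dim  # assumed: ALBERT's 4x FFN expansion

embedding = vocab_size * embedding_dim        # factorized embedding matrix
projection = embedding_dim * hidden_dim       # embedding -> hidden projection
attention = 4 * hidden_dim * hidden_dim       # Q, K, V, and output projections
feed_forward = 2 * hidden_dim * intermediate_dim

# Layers are shared across the Transformer, so the block is counted once,
# not 24 times -- this is why xlarge stays near 58M despite 24 layers.
shared_layer = attention + feed_forward
total = embedding + projection + shared_layer
print(f"{total / 1e6:.1f}M")  # prints "54.4M", close to the stated 58M
```

The gap to the stated 58M is plausibly the omitted biases, layer norms, position/token-type embeddings, and pooler; the point is that the shared-layer design keeps the count far below 24 independent layers.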