---
license: mit
---

This model was trained on a new dataset composed of the poems by Anne Bradstreet available from [Public Domain Poetry](https://www.public-domain-poetry.com/anne-bradstreet). Specifically, I downloaded all 40 poems and fine-tuned a bert-base-uncased text classification model on Amazon SageMaker. For the negative class, I generated GPT-2 samples of length 70; that is to say, for each line of Bradstreet I generated a generic GPT-2 response, and I treated these responses as the negative class.

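A minimal sketch of how the negative class could be generated, assuming the `transformers` text-generation pipeline with the stock `gpt2` checkpoint; the file names, prompt format, and sampling settings below are illustrative rather than the exact configuration used.

```python
# Sketch: generate one generic GPT-2 continuation per Bradstreet line to serve
# as the negative class. File names and sampling settings are placeholders.
from transformers import pipeline, set_seed

set_seed(42)
generator = pipeline("text-generation", model="gpt2")

with open("bradstreet_lines.txt") as f:   # hypothetical file: one poem line per row
    bradstreet_lines = [line.strip() for line in f if line.strip()]

negative_lines = []
for prompt in bradstreet_lines:
    out = generator(
        prompt,
        max_length=70,            # "samples of length 70" as described above
        do_sample=True,
        num_return_sequences=1,
        pad_token_id=50256,       # GPT-2 has no pad token; reuse EOS to avoid warnings
    )
    # Keep only the continuation, i.e. the generic GPT-2 response to the line.
    negative_lines.append(out[0]["generated_text"][len(prompt):].strip())

with open("gpt2_negative_lines.txt", "w") as f:
    f.write("\n".join(negative_lines))
```
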
In the classifier, I had a total of 6,947 positive lines written by Anne Bradstreet and 5,219 lines generated by GPT-2 in response, for a total dataset of 12,166 labeled lines. The GPT-2 responses were used only as negative samples, while the actual Bradstreet lines alone supplied the positive samples.

I split the data into train and test sets 80/20, leaving a total of 9,732 labeled samples in training and 2,435 samples in test.

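A rough sketch of the labeling and 80/20 split described above, assuming the `datasets` library with label 1 for Bradstreet lines and label 0 for GPT-2 responses (consistent with the Label0 behaviour noted below); the file paths are placeholders.

```python
# Sketch: build the labeled dataset (1 = Bradstreet, 0 = GPT-2 response) and
# split it 80/20. Paths are placeholders.
from datasets import Dataset

def read_lines(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

bradstreet = read_lines("bradstreet_lines.txt")          # ~6,947 positive lines
gpt2_responses = read_lines("gpt2_negative_lines.txt")   # ~5,219 negative lines

dataset = Dataset.from_dict({
    "text": bradstreet + gpt2_responses,
    "label": [1] * len(bradstreet) + [0] * len(gpt2_responses),
})

# 80/20 split, roughly matching the 9,732 train / 2,435 test figures above.
splits = dataset.train_test_split(test_size=0.2, seed=42)
train_ds, test_ds = splits["train"], splits["test"]
```
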
I trained these on SageMaker, using the Hugging Face deep learning container. I also used SageMaker Training Compiler, which allowed a batch size of 64 samples on an ml.p3.2xlarge. After 42 minutes of training over only 5 epochs, I reached a train loss of 0.0714. Test loss is forthcoming.

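For reference, a hedged sketch of what the training job could look like with the SageMaker Hugging Face estimator and Training Compiler enabled; the entry-point script, container versions, S3 paths, and hyperparameters are illustrative and must match a Training Compiler-enabled DLC rather than being the exact configuration used.

```python
# Sketch: launch fine-tuning on an ml.p3.2xlarge with the Hugging Face DLC and
# SageMaker Training Compiler. Script name, versions, and S3 URIs are placeholders.
import sagemaker
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

role = sagemaker.get_execution_role()

estimator = HuggingFace(
    entry_point="train.py",              # hypothetical fine-tuning script
    source_dir="scripts",
    role=role,
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.11.0",       # versions must be a Training Compiler-supported combo
    pytorch_version="1.9.0",
    py_version="py38",
    compiler_config=TrainingCompilerConfig(),
    hyperparameters={
        "model_name_or_path": "bert-base-uncased",
        "epochs": 5,
        "train_batch_size": 64,          # per-device batch size reached with the compiler
    },
)

estimator.fit({
    "train": "s3://my-bucket/bradstreet/train",   # placeholder S3 locations
    "test": "s3://my-bucket/bradstreet/test",
})
```
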
In my own tests, the model is almost always very confident; it routinely gives a confidence score of at least 99.8%. All prediction requests should be single lines only, as this is how the model was fine-tuned. Sending multiple lines in one prediction request will always result in a Label0 response, i.e. not written by Anne Bradstreet, even if the text is pulled directly from her works.

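To illustrate the single-line behaviour, here is one way to query the model locally with the `transformers` text-classification pipeline; the model identifier is a placeholder for wherever the fine-tuned weights are stored, and the default LABEL_0/LABEL_1 names are assumed.

```python
# Sketch: score one line at a time, since the model was fine-tuned on single lines.
from transformers import pipeline

# Placeholder ID/path for the fine-tuned checkpoint.
classifier = pipeline("text-classification", model="path/to/bradstreet-classifier")

print(classifier("By night when others soundly slept"))
# e.g. [{'label': 'LABEL_1', 'score': 0.998}] -- LABEL_1 = written by Bradstreet

# A multi-line request will come back as LABEL_0 (not Bradstreet), so split
# longer passages and classify each line separately.
text = "By night when others soundly slept\nAnd hath at once both ease and rest"
for line in text.splitlines():
    print(classifier(line))
```
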
In short, the model seems to know the difference between generic GPT-2 text responding to a Bradstreet prompt and the output of a model fine-tuned on Bradstreet text and generating from Bradstreet lines.

This was developed exclusively for use at an upcoming workshop.