Classical Chinese Poetry (古诗词)

Model description

AI generation of classical Chinese poetry.

How to use

Call the model through a pipeline:

from transformers import AutoTokenizer, GPT2LMHeadModel, TextGenerationPipeline

# Load the tokenizer and the GPT-2 weights for this checkpoint.
model_checkpoint = "supermy/poetry"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = GPT2LMHeadModel.from_pretrained(model_checkpoint)
text_generator = TextGenerationPipeline(model, tokenizer)
# Reuse the end-of-sequence token as the padding token.
text_generator.model.config.pad_token_id = text_generator.model.config.eos_token_id

print(text_generator("举头 望 明月，", max_length=100, do_sample=True))
print(text_generator("物换 星移 几度 秋，", max_length=100, do_sample=True))

>>> print(text_generator("举头 望 明月，", max_length=100, do_sample=True))
[{'generated_text': '举头 望 明月， 何以 喻 无言 。 顾影 若为 舞 ， 啸 风清 独 伤 。 四时 别有 意 ， 千古 得 从容 。 赏音 我非 此 ， 何如 鸥鹭 群 。 崎 山有 佳色 ， 落落 样 相宜 。 不嫌 雪霜 温 ， 宁 受 四时 肥 。 老 态 如 偷 面 ， 冬 心 似 相知 。 春风 不可 恃 ， 触 动 春 何为 。 岁晚 忽然 老 ， 花前 岁月深 。 可笑 一场 梦 ， 婵娟 乍 自 心 。 列 名 多 岁月 ， 森 列 尽 林峦 。 试问 影 非 笑'}]
>>> print(text_generator("物换 星移 几度 秋，", max_length=100, do_sample=True))
[{'generated_text': '物换 星移 几度 秋， 消长 随时 向 一丘 。 渔者 下 逢 勾漏 令 ， 漏声 高出 景阳 丘 。 天津 大尹 昔 从游 ， 大尹 来时 春复 秋 。 旗鼓 日 严 宣 使 从 ， 联镳 歌笑 又 风流 。 冈峦 比 并 瑶 溪 水 ， 叠嶂 高 盘 黼黻 洲 。 花木 芳菲 三月 天 ， 莺花 暖 翠 几 流年 。 一从 别后 多 携手 ， 肠断 酒阑 怀 凛然 。 北阙 人称 似梦中 ， 西山 别样 梦魂 香 。 多君 观国 亲 圭璧 ， 能 预 陇西 称 巨 良 。 刷羽 刷羽'}]
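The sample outputs above are word-segmented, with spaces between tokens. If plain verse is wanted for display, the spaces can be removed after generation. The following is a minimal sketch of that post-processing step. The helper names `strip_segmentation_spaces` and `split_into_lines` are hypothetical and are not part of the model or of transformers, and breaking lines at the full stop "。" is only an assumption about a convenient layout.

```python
from typing import List


def strip_segmentation_spaces(text: str) -> str:
    """Remove the whitespace that separates segmented words."""
    return "".join(text.split())


def split_into_lines(text: str) -> List[str]:
    """Split de-spaced text into lines, each ending at a full stop '。'."""
    lines = []
    current = ""
    for ch in strip_segmentation_spaces(text):
        current += ch
        if ch == "。":
            lines.append(current)
            current = ""
    if current:  # keep a trailing line that has no closing full stop
        lines.append(current)
    return lines


# Opening of the first sample output shown above.
sample = "举头 望 明月， 何以 喻 无言 。 顾影 若为 舞 ， 啸 风清 独 伤 。"
for line in split_into_lines(sample):
    print(line)
```

The same helpers can be applied to the `generated_text` field of each dictionary returned by the pipeline.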

Here is how to load the tokenizer and model directly in PyTorch:

from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("supermy/poetry")
model = AutoModelForCausalLM.from_pretrained("supermy/poetry")
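
Once loaded this way, text can be sampled without the pipeline wrapper. The sketch below is one possible way to do so and has not been checked against this checkpoint. It assumes the config defines `eos_token_id` (the pipeline example relies on the same field) and that the tokenizer returns an attention mask. The function name `generate_poem` is hypothetical. The imports sit inside the function so that defining it downloads nothing.

```python
def generate_poem(prompt, max_length=100, checkpoint="supermy/poetry"):
    """Sample one continuation of `prompt` (hypothetical helper)."""
    # Imported lazily: the model is only downloaded when the function is called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            max_length=max_length,  # total length, prompt included
            do_sample=True,         # sampling, as in the pipeline example
            # Assumption: eos_token_id is set in the checkpoint's config.
            pad_token_id=model.config.eos_token_id,
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # Prompts are written with spaces between words, as in the examples above.
    print(generate_poem("举头 望 明月，"))
```

Because `do_sample=True`, each call returns a different continuation.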

Training data

A very comprehensive corpus of classical Chinese poetry: more than 850,000 poems spanning the pre-Qin period to the present day.

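Statistics

Dynasty	Poems	Authors
Song	287114	9446
Ming	236957	4439
Qing	90089	8872
Tang	49195	2736
Yuan	37375	1209
Modern	28419	790
Contemporary	28219	177
Late Ming / early Qing	17700	176
Late Yuan / early Ming	15736	79
Late Qing / early Republic	15367	99
Late Qing / early modern	12464	48
Late Song / early Yuan	12058	41
Northern and Southern Dynasties	4586	434
Late modern / early contemporary	3426	23
Wei-Jin	3020	251
Late Jin / early Yuan	3019	17
Jin (金)	2741	253
Late Republic / early contemporary	1948	9
Sui	1170	84
Late Tang / early Song	1118	44
Pre-Qin	570	8
Late Sui / early Tang	472	40
Han	363	83
Late Song / early Jin	234	9
Liao	22	7
Qin	2	2
Late Wei-Jin / early Northern and Southern Dynasties	1	1
Total	853385	29377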
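
The totals row can be checked against the per-period rows, and the same data shows how concentrated the corpus is. The sketch below copies the counts from the table. The English period labels are translations added for readability, not identifiers from the dataset.

```python
# (period, poems, authors), copied row by row from the statistics table.
rows = [
    ("Song", 287114, 9446),
    ("Ming", 236957, 4439),
    ("Qing", 90089, 8872),
    ("Tang", 49195, 2736),
    ("Yuan", 37375, 1209),
    ("Modern", 28419, 790),
    ("Contemporary", 28219, 177),
    ("Late Ming / early Qing", 17700, 176),
    ("Late Yuan / early Ming", 15736, 79),
    ("Late Qing / early Republic", 15367, 99),
    ("Late Qing / early modern", 12464, 48),
    ("Late Song / early Yuan", 12058, 41),
    ("Northern and Southern Dynasties", 4586, 434),
    ("Late modern / early contemporary", 3426, 23),
    ("Wei-Jin", 3020, 251),
    ("Late Jin / early Yuan", 3019, 17),
    ("Jin", 2741, 253),
    ("Late Republic / early contemporary", 1948, 9),
    ("Sui", 1170, 84),
    ("Late Tang / early Song", 1118, 44),
    ("Pre-Qin", 570, 8),
    ("Late Sui / early Tang", 472, 40),
    ("Han", 363, 83),
    ("Late Song / early Jin", 234, 9),
    ("Liao", 22, 7),
    ("Qin", 2, 2),
    ("Late Wei-Jin / early Northern and Southern Dynasties", 1, 1),
]

total_poems = sum(poems for _, poems, _ in rows)
total_authors = sum(authors for _, _, authors in rows)
print(total_poems, total_authors)  # 853385 29377, matching the totals row

# Share of the corpus contributed by the two largest periods (Song and Ming).
top_two = sorted(rows, key=lambda r: r[1], reverse=True)[:2]
top_two_share = sum(poems for _, poems, _ in top_two) / total_poems
print(f"{top_two_share:.1%}")  # 61.4%
```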

Training procedure

Model: GPT-2. Training environment: an NVIDIA GPU with 16 GB of memory.

BPE tokenization: "vocab_size" = 50000


***** Running training *****
  Num examples = 16431
  Num Epochs = 680
  Instantaneous batch size per device = 24
  Total train batch size (w. parallel, distributed & accumulation) = 192
  Gradient Accumulation steps = 8
  Total optimization steps = 57800
  Number of trainable parameters = 124242432
GPT-2 size: 124.2M parameters
  0%|          | 0/57800 [00:00<?, ?it/s]You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
    9%|▊         | 5000/57800 [6:58:57<72:53:18,  4.97s/it]***** Running Evaluation *****
  Num examples = 1755
  Batch size = 24
{'loss': 3.1345, 'learning_rate': 0.0004939065828881268, 'epoch': 58.82}
  9%|▊         | 5000/57800 [6:59:14<72:53:18, Saving model checkpoint to poetry-trainer/checkpoint-5000
Configuration saved in poetry-trainer/checkpoint-5000/config.json
Model weights saved in poetry-trainer/checkpoint-5000/pytorch_model.bin
tokenizer config file saved in poetry-trainer/checkpoint-5000/tokenizer_config.json
Special tokens file saved in poetry-trainer/checkpoint-5000/special_tokens_map.json
 17%|█▋        | 10000/57800 [13:55:32<65:40:41,  4.95s/it]***** Running Evaluation *****
  Num examples = 1755
  Batch size = 24
{'eval_loss': 11.14090633392334, 'eval_runtime': 16.8326, 'eval_samples_per_second': 104.262, 'eval_steps_per_second': 4.396, 'epoch': 58.82}
{'loss': 0.2511, 'learning_rate': 0.00046966687938531824, 'epoch': 117.64}
 17%|█▋        | 10000/57800 [13:55:48<65:40:41Saving model checkpoint to poetry-trainer/checkpoint-10000
..........
 95%|█████████▌| 55000/57800 [76:06:46<3:59:33,  5.13s/it]***** Running Evaluation *****
  Num examples = 1755
  Batch size = 24
{'eval_loss': 14.860174179077148, 'eval_runtime': 16.7826, 'eval_samples_per_second': 104.572, 'eval_steps_per_second': 4.409, 'epoch': 588.23}
{'loss': 0.0083, 'learning_rate': 3.0262183266589473e-06, 'epoch': 647.06}
 95%|█████████▌| 55000/57800 [76:07:03<3:59:33,Saving model checkpoint to poetry-trainer/checkpoint-55000

{'eval_loss': 14.830656051635742, 'eval_runtime': 16.7365, 'eval_samples_per_second': 104.86, 'eval_steps_per_second': 4.421, 'epoch': 647.06}
{'train_runtime': 287920.5857, 'train_samples_per_second': 38.806, 'train_steps_per_second': 0.201, 'train_loss': 0.33751299874592816, 'epoch': 679.99}

100%|██████████| 57800/57800 [79:58:40<00:00,  4.93s/it]
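
The step counts and throughput figures in the log follow from the logged batch settings. The sketch below reproduces them. It assumes a single device, since 24 × 8 already equals the logged total batch size of 192. It also assumes that optimizer steps per epoch are counted by floor-dividing the number of batches by the accumulation steps. That rule is inferred from the logged numbers, not stated in the source.

```python
import math

# Values copied from the training log.
num_examples = 16431
per_device_batch = 24
grad_accum_steps = 8
num_epochs = 680
train_runtime_s = 287920.5857

num_devices = 1  # assumption: 24 * 8 * 1 matches the logged total of 192

total_batch = per_device_batch * grad_accum_steps * num_devices
batches_per_epoch = math.ceil(num_examples / per_device_batch)
# Assumption: a trailing partial accumulation window is not counted as a step.
steps_per_epoch = batches_per_epoch // grad_accum_steps
total_steps = steps_per_epoch * num_epochs

samples_per_second = num_examples * num_epochs / train_runtime_s
steps_per_second = total_steps / train_runtime_s

print(total_batch)                   # 192
print(batches_per_epoch)             # 685
print(steps_per_epoch)               # 85
print(total_steps)                   # 57800
print(round(samples_per_second, 3))  # 38.806
print(round(steps_per_second, 3))    # 0.201
```

The runtime of 287920.5857 seconds is just under 80 hours, which agrees with the 79:58:40 shown on the final progress bar.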

### Entry and citation info
