{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": { "provenance": [] },
    "kernelspec": { "name": "python3", "display_name": "Python 3" },
    "language_info": { "name": "python" }
  },
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": { "id": "47fPyWltjSqE" },
      "outputs": [],
      "source": [
        "!pip install transformers sentencepiece"
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "import torch\n",
        "from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer\n",
        "\n",
        "hi_text = \"जीवन एक चॉकलेट बॉक्स की तरह है।\"  # Hindi: \"Life is like a box of chocolates.\"\n",
        "chinese_text = \"生活就像一盒巧克力。\"  # Chinese: \"Life is like a box of chocolates.\"\n",
        "\n",
        "model = M2M100ForConditionalGeneration.from_pretrained(\"facebook/m2m100_1.2B\")\n",
        "model.eval()\n",
        "\"\"\"\n",
        "In PyTorch, `model.eval()` switches the model into evaluation mode. Some layers\n",
        "behave differently during training and evaluation, so this call matters:\n",
        "\n",
        "1. **Batch Normalization and Dropout:**\n",
        "   During training, `BatchNorm` normalizes inputs using per-batch statistics and\n",
        "   `Dropout` randomly zeroes activations. At evaluation time we want `BatchNorm`\n",
        "   to use its accumulated running statistics instead, and `Dropout` to be\n",
        "   disabled. Calling `model.eval()` switches these layers to that behavior.\n",
        "\n",
        "2. **Gradient computation is NOT disabled:**\n",
        "   `model.eval()` does not turn off autograd. To skip building the computation\n",
        "   graph during inference (saving time and memory), additionally wrap the\n",
        "   forward pass in `torch.no_grad()`.\n",
        "\n",
        "In short, `model.eval()` ensures layer behavior is correct at evaluation time;\n",
        "combine it with `torch.no_grad()` for efficient inference.\n",
        "\"\"\"\n",
        "tokenizer = M2M100Tokenizer.from_pretrained(\"facebook/m2m100_1.2B\")"
      ],
      "metadata": { "id": "ziPisPX_jXNC" },
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "source": [
        "# translate Hindi to French\n",
        "tokenizer.src_lang = \"hi\"\n",
        "encoded_hi = tokenizer(hi_text, return_tensors=\"pt\")\n",
        "with torch.no_grad():  # no gradients needed for inference\n",
        "    generated_tokens = model.generate(\n",
        "        **encoded_hi, forced_bos_token_id=tokenizer.get_lang_id(\"fr\")\n",
        "    )\n",
        "tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)"
      ],
      "metadata": {
        "colab": { "base_uri": "https://localhost:8080/" },
        "id": "00h7PwrOjehw",
        "outputId": "eb4e92ec-5e00-452d-8ead-d06d2e23b78e"
      },
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['La vie est comme une boîte de chocolat.']"
            ]
          },
          "metadata": {},
          "execution_count": 3
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [
        "# translate Chinese to English\n",
        "tokenizer.src_lang = \"zh\"\n",
        "encoded_zh = tokenizer(chinese_text, return_tensors=\"pt\")\n",
        "with torch.no_grad():  # no gradients needed for inference\n",
        "    generated_tokens = model.generate(\n",
        "        **encoded_zh, forced_bos_token_id=tokenizer.get_lang_id(\"en\")\n",
        "    )\n",
        "tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)"
      ],
      "metadata": {
        "colab": { "base_uri": "https://localhost:8080/" },
        "id": "ifzvH6Ezj62j",
        "outputId": "c5c6307d-5811-4978-f565-709e22d4a16b"
      },
      "execution_count": 4,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "['Life is like a box of chocolate.']"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ]
    },
    {
      "cell_type": "code",
      "source": [],
      "metadata": { "id": "YwHxXY-RkDPH" },
      "execution_count": null,
      "outputs": []
    }
  ]
}