# BUOD: Text Summarization Model for the Filipino Language Directory
Authors: James Esguerra, Julia Avila, Hazielle Bugayong
Foreword: This research was done in two parts: gathering the data and running the transformer models, namely distilBART and bert2bert. Below is the step-by-step process of the experimentation in this study:
## Steps
- Gathering the data
- Initializing the transformer models; fine-tuning of the models:
  - via Google Colab
  - via Google Colab (Local runtime)
  - via Jupyter Notebook
## Gathering data
An article scraper was used in this experimentation to gather bodies of text from various news sites. The data gathered was used to pre-train and fine-tune the models in the next step. Instructions on how to use the article scraper are also included.
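The scraper itself is not reproduced here, but its core extraction step can be sketched with Python's standard library alone. This is a minimal sketch that assumes article bodies live in plain `<p>` tags; real news sites need site-specific selectors and polite crawling, and the sample HTML below is invented for illustration:

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text inside <p> tags -- a stand-in for an article scraper."""

    def __init__(self):
        super().__init__()
        self._in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def extract_article_text(html):
    """Return the article body as newline-joined paragraphs."""
    parser = ParagraphExtractor()
    parser.feed(html)
    return "\n".join(p.strip() for p in parser.paragraphs if p.strip())


sample = "<html><body><h1>Balita</h1><p>Unang talata.</p><p>Ikalawang talata.</p></body></html>"
print(extract_article_text(sample))
```

Texts extracted this way can then be cleaned and pooled into the pre-training and fine-tuning corpus described below.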
## Initialization of transformer models
### via Google Colab
Two models, distilBART and bert2bert, were used to compare abstractive text summarization performance. They can be found here:
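For reference, loading a summarization checkpoint with the Hugging Face `transformers` library might look like the following. This is a sketch only: the default checkpoint name below is a public distilBART example, not necessarily the fine-tuned model from this study, and the 512-word budget is an arbitrary illustration. The heavy model download only happens when `summarize` is actually called:

```python
def truncate_words(text, max_words=512):
    """Trim an article to a word budget before it is fed to the model."""
    words = text.split()
    return " ".join(words[:max_words])


def summarize(text, checkpoint="sshleifer/distilbart-cnn-12-6"):
    """Run abstractive summarization; requires `pip install transformers torch`."""
    from transformers import pipeline  # imported lazily: heavy dependency

    summarizer = pipeline("summarization", model=checkpoint)
    result = summarizer(truncate_words(text), max_length=130, min_length=30)
    return result[0]["summary_text"]
```

Swapping `checkpoint` for a bert2bert encoder-decoder checkpoint uses the same call, since `pipeline("summarization")` handles both architectures.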
### via Google Colab (Local Runtime)
Dependencies
- Jupyter Notebook
- Anaconda
- Optional: CUDA Toolkit (for Nvidia GPUs; requires an Nvidia developer account to install)
- Tensorflow
Installing dependencies
Create an Anaconda environment. This environment can also be used for TensorFlow, which links your GPU to Google Colab's local runtime:
conda create -n tf-gpu
conda activate tf-gpu
Optional Step: GPU Utilization (if you are using an external GPU)
Next, install the CUDA toolkit. The version below is the one used in this experiment; you may find a more compatible version for your hardware:
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
Then, upgrade pip and install tensorflow:
pip install --upgrade pip
pip install "tensorflow<2.11" --user
Now, check if TensorFlow has been configured to use the GPU. Type in the terminal:
python
Next, type the following to verify:
import tensorflow as tf
tf.test.is_built_with_cuda()
If it returns `True`, you have successfully initialized the environment with your external GPU. If not, you may follow the tutorials found here:
- CUDA Toolkit Tutorial here
- Creating an Anaconda environment step-by-step
- Installing Tensorflow locally using this tutorial
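Note that `tf.test.is_built_with_cuda()` only reports whether the TensorFlow build was compiled with CUDA support. To confirm a GPU is actually visible at runtime, you can also list the physical devices. The helper below is a small sketch; it returns an empty list when TensorFlow is missing or no GPU is detected:

```python
def gpu_device_names():
    """Return the names of GPUs TensorFlow can see, or [] if none/unavailable."""
    try:
        import tensorflow as tf  # heavy import, kept inside the helper
    except ImportError:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


print(gpu_device_names())
```

A non-empty list here means the local runtime will be able to use the GPU once connected to Colab.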
### Connecting to a Google Colab Local Runtime

To connect to a Google Colab local runtime, this tutorial was used.
First, install Jupyter notebook (if you haven't) and enable server permissions:
pip install jupyter_http_over_ws
jupyter serverextension enable --py jupyter_http_over_ws
Next, start and authenticate the server:
jupyter notebook --NotebookApp.allow_origin='https://colab.research.google.com' --port=8888 --NotebookApp.port_retries=0
You can now copy the token URL and paste it into your Google Colab.
### Running the notebook using Jupyter Notebook
Dependencies
- Jupyter Notebook
- Anaconda
- Optional: CUDA Toolkit (for Nvidia GPUs; requires an Nvidia developer account to install)
- Tensorflow
Download the notebooks and save them in your chosen directory. Create an environment in which you can run the notebook via Anaconda:
conda create -n env
conda activate env
You may also opt to install the CUDA toolkit and TensorFlow in this environment. Next, run the notebooks via Jupyter Notebook:
jupyter notebook
### After you're done

Deactivate the environment and disable the server using the following commands in your console:
conda deactivate
jupyter serverextension disable --py jupyter_http_over_ws
## Additional Links / Directory

Here are some links to resources and/or references.
| Name | Link |
| --- | --- |
| Ateneo Social Computing Lab | https://huggingface.co/ateneoscsl |