{ "cells": [ { "cell_type": "markdown", "id": "5ede69a1", "metadata": {}, "source": [ "# Classification Challenge using CatBoost\n", "\n", "## INF2179 Fall 2021\n", "### Hamid Yuksel\n", "\n", "This submission uses [CatBoost](https://catboost.ai/).\n", "CatBoost was chosen mainly for its advertised benefits: it requires less hyperparameter tuning and less preprocessing of categorical and text features. It is also fast and fairly easy to set up." ] },
{ "cell_type": "code", "execution_count": 119, "id": "ee82451e", "metadata": {}, "outputs": [], "source": [ "# Install and import the required libraries\n", "# (originally run with catboost 1.0.3 on Python 3.8)\n", "! pip3 install --user catboost\n", "! pip3 install --user ipywidgets\n", "! jupyter nbextension enable --py widgetsnbextension\n", "\n", "import pandas as pd\n", "import numpy as np\n", "from sklearn.neural_network import MLPClassifier\n", "from sklearn.metrics import accuracy_score\n", "from catboost import Pool, CatBoostClassifier" ] },
{ "cell_type": "code", "execution_count": 121, "id": "1853f85d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Rock 24486\n", "Pop 16251\n", "Hip Hop 9263\n", "unknown 5000\n", "Name: Genre, dtype: int64" ] }, "execution_count": 121, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Read the data\n", "df = pd.read_csv('data.csv')\n", "\n", "# Split: first 50,000 rows for training, last 5,000 rows as the test set\n", "training = df.head(50000)\n", "holdout_set = training.sample(5000, random_state=1)  # hold out 5,000 random training rows for validation\n", "training = training.drop(holdout_set.index)  # remove the holdout rows from the training data\n", "testing = df.tail(5000)\n", "\n", "# Check the class balance: number of songs per genre\n", "df['Genre'].value_counts()" ] },
{ "cell_type": "code", "execution_count": 122, "id": "f0a20f7a", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
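<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Lyric</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>50000</th>\n", "      <td>Feels so good,. Feels so good,. Feels so good ...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>50001</th>\n", "      <td>Shadow of a doubt. I heard your heart,. you he...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>50002</th>\n", "      <td>Slaves. Hebrews born to serve to the pharaoh. ...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>50003</th>\n", "      <td>You've been picked and it's over. What's the c...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>50004</th>\n", "      <td>Magic happens. But only if you are open to the...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>...</th>\n", "      <td>...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>54995</th>\n", "      <td>I can't believe what you did to me. Down on my...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>54996</th>\n", "      <td>Have all the songs been written?. Have all the...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>54997</th>\n", "      <td>Everything you do you do so right. The clothes...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>54998</th>\n", "      <td>(trecho). (Rule Number Two. Understanding what...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>54999</th>\n", "      <td>As fall rides off in the Sunset. I sweep the S...</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "<p>5000 rows × 1 columns</p>\n", "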