--- language: en tags: - token-classification - named-entity-recognition - bert - transformers license: mit datasets: - conll2003 --- # Token Classification Model ## Description This project involves developing a machine learning model for token classification, specifically for Named Entity Recognition (NER). Using a fine-tuned BERT model from the Hugging Face library, this system classifies tokens in text into predefined categories like names, locations, and dates. The model is trained on a dataset annotated with entity labels to accurately classify each token. This token classification system is useful for information extraction, document processing, and conversational AI applications. ## Technologies Used ### Dataset - **Source:** Kaggle: conll2003 - **Purpose:** Contains text data with annotated entities for token classification. ### Model - **Base Model:** BERT (bert-base-uncased) - **Library:** Hugging Face transformers - **Task:** Token Classification (Named Entity Recognition) ### Approach #### Preprocessing: - Load and preprocess the dataset. - Tokenize the text data and align labels with tokens. #### Fine-Tuning: - Fine-tune the BERT model on the token classification dataset. #### Training: - Train the model to classify each token into predefined entity labels. #### Inference: - Use the trained model to predict entity labels for new text inputs. ### Key Technologies - **Deep Learning (BERT):** For advanced token classification and contextual understanding. - **Natural Language Processing (NLP):** For text preprocessing, tokenization, and entity recognition. - **Machine Learning Algorithms:** For model training and prediction tasks. ## Streamlit App You can view and interact with the Streamlit app for token classification [here](https://huggingface.co/spaces/AdilHayat173/token_classifcation). ## Examples Here are some examples of outputs from the model: ![example1](https://github.com/user-attachments/assets/9e9dd85c-1447-4229-b691-febec17439cf) ![example2](https://github.com/user-attachments/assets/97dfc391-bda9-4614-93f7-a5f45d64dd03) ## Google Colab Notebook You can view and run the Google Colab notebook for this project [here](https://colab.research.google.com/drive/1GYVlIToQ_lnT8XEjGrR2WFkUQWpWXgQi#scrollTo=ZlyX1Lgn8gjj). ## Acknowledgements - Hugging Face for transformer models and libraries. - Streamlit for creating the interactive web interface. - [Your Dataset Provider] for the token classification dataset. ## Author - AdilHayat - [Hugging Face Profile](https://huggingface.co/AdilHayat173) - [GitHub Profile](https://github.com/AdilHayat21173) ## Feedback If you have any feedback, please reach out to us at hayatadil300@gmail.com.