Dataset: Use the Data Science Salaries 2023 dataset available on Kaggle.

Tasks and Requirements:
- Data Exploration and Preprocessing:
  - Load the dataset and perform exploratory data analysis (EDA).
  - Clean the data, handle missing values, and encode categorical variables.
  - Split the data into training and testing sets.
- Model Training:
  - Train multiple machine learning models (e.g., Linear Regression, Decision Trees, Random Forest, Gradient Boosting).
  - Use MLflow to track experiments, including parameters, metrics, and artifacts.
  - Evaluate the models using appropriate metrics (e.g., RMSE, MAE, R²).
- Model Selection and Optimization:
  - Compare the performance of different models.
  - Optimize the best-performing model using hyperparameter tuning.
  - Record all experiments and their results using MLflow.
- Streamlit Application:
  - Create a Streamlit app to interact with the trained model.
  - The app should allow users to input features and get salary predictions.
  - Display relevant model performance metrics and visualizations in the app.
- Model Registration and Deployment:
  - Register the best model in the MLflow Model Registry.
  - Deploy the model using Hugging Face Spaces.
  - Ensure the deployed model is accessible via an API for inference.
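
The evaluation metrics named under Model Training (RMSE, MAE, R²) can be sketched in plain Python to make their definitions explicit. This is a minimal illustration, not the project's implementation: the function names and the salary figures below are made up for the example and do not come from the dataset. In practice the equivalent functions in scikit-learn would likely be used instead.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        # R² is undefined when every actual value is identical.
        raise ValueError("R² is undefined for a constant target")
    return 1 - ss_res / ss_tot


# Hypothetical salaries in USD, chosen only to illustrate the arithmetic.
actual = [100_000, 150_000, 200_000]
predicted = [110_000, 140_000, 190_000]

metrics = {
    "mae": mae(actual, predicted),
    "rmse": rmse(actual, predicted),
    "r2": r2(actual, predicted),
}
print(metrics)
```

Each value in a dictionary like `metrics` is a single float, so it could be recorded during an experiment run with a call such as `mlflow.log_metric("rmse", metrics["rmse"])` and later compared across models.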