Example prompt for creating a good classification model

#8
by OREO5ITY - opened

😃: This is the training dataset: train_df = pd.read_csv('/content/train.csv')
This is the dataset without y (the target): test_df = pd.read_csv('/content/test.csv')
Build a model to predict activity from accelerometer data.

  1. Use tag_identification, x, y, z to predict activity
  2. Train the classification model with AutoGluon (TabularPredictor), label='activity', time_budget=600
  3. Show the evaluation result
  4. Use the model to predict on test_df
  5. Export the predictions to a CSV file with id and activity columns

These are the columns in train_df:
tag_identification : identification number of the wearable device
x : accelerometer data on the x axis
y : accelerometer data on the y axis
z : accelerometer data on the z axis
activity : the activity being performed {walking,falling,'lying down',lying,'sitting down',sitting,'standing up from lying','on all fours','sitting on the ground','standing up from sitting', etc.}
🤖: To accomplish your request, you need to follow these steps:

Firstly, let's load the data and check the format:

import pandas as pd
from autogluon.tabular import TabularPredictor

train_df = pd.read_csv('/content/train.csv')
test_df = pd.read_csv('/content/test.csv')

print(train_df.head())
print(test_df.head())
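Before training, it can help to confirm that the columns named in the request are actually present. The following is a minimal sketch: `missing_columns` is a hypothetical helper (not part of pandas or AutoGluon), and the column lists are stand-ins for `list(train_df.columns)` and `list(test_df.columns)`. The `id` column in the test list is an assumption based on the requested output format.

```python
# Hypothetical helper (not part of pandas or AutoGluon): report which
# expected columns are absent from a table's column list.
def missing_columns(columns, required):
    present = set(columns)
    return [name for name in required if name not in present]


FEATURES = ["tag_identification", "x", "y", "z"]
LABEL = "activity"

# Stand-in column lists; in the notebook these would be
# list(train_df.columns) and list(test_df.columns).
# The 'id' column in the test set is an assumption.
train_columns = ["tag_identification", "x", "y", "z", "activity"]
test_columns = ["id", "tag_identification", "x", "y", "z"]

print(missing_columns(train_columns, FEATURES + [LABEL]))  # []
print(missing_columns(test_columns, FEATURES))             # []
print(LABEL in test_columns)                               # False
```

A non-empty list from either call means a column name in the later steps needs to be adjusted before fitting.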

Then we will prepare the predictor:

label = 'activity'
time_budget = 600 # Time in seconds

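# Train only on the requested features plus the label
features = ['tag_identification', 'x', 'y', 'z']

predictor = TabularPredictor(label=label).fit(
    train_df[features + [label]],
    time_limit=time_budget,
)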

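After training, we can evaluate the model. Because the predictions below are made on the same rows the model was trained on, the scores will be optimistic: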

performance = predictor.evaluate_predictions(
    y_true=train_df[label],
    y_pred=predictor.predict(train_df),
    auxiliary_metrics=True
)
print(performance)
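For a more honest estimate, one option is to hold out part of the training rows and score the model only on those. The sketch below is one possible way to do that, not the only one: `stratified_holdout` and `evaluate_on_holdout` are hypothetical helper names, the 20% hold-out fraction is an assumption, and the split is done per class so that rare activities such as falling appear on both sides.

```python
import random
from collections import defaultdict


def stratified_holdout(labels, holdout_frac=0.2, seed=0):
    """Return (train_idx, holdout_idx): row positions split per class."""
    by_class = defaultdict(list)
    for position, label in enumerate(labels):
        by_class[label].append(position)

    rng = random.Random(seed)
    train_idx, holdout_idx = [], []
    for positions in by_class.values():
        rng.shuffle(positions)
        if len(positions) > 1:
            # Hold out at least one row, but always leave one for training.
            n_holdout = max(1, round(len(positions) * holdout_frac))
            n_holdout = min(n_holdout, len(positions) - 1)
        else:
            n_holdout = 0  # a single-row class stays in the training part
        holdout_idx.extend(positions[:n_holdout])
        train_idx.extend(positions[n_holdout:])
    return sorted(train_idx), sorted(holdout_idx)


def evaluate_on_holdout(train_df, label="activity", time_limit=600):
    """Fit on the training part only and score on rows the model never saw."""
    # Imported lazily so the split helper above works without AutoGluon.
    from autogluon.tabular import TabularPredictor

    train_idx, holdout_idx = stratified_holdout(train_df[label].tolist())
    predictor = TabularPredictor(label=label).fit(
        train_df.iloc[train_idx], time_limit=time_limit
    )
    return predictor.evaluate(train_df.iloc[holdout_idx])
```

Calling `evaluate_on_holdout(train_df)` trains a separate predictor, so it spends its own time limit in addition to the fit above.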

Next, we use the model to predict on test_df:

predictions = predictor.predict(test_df)

Finally, we export predictions to a CSV file:

output_df = pd.DataFrame({'id': test_df['tag_identification'], 'activity': predictions})
output_df.to_csv('submission.csv', index=False)

Please note that this code uses 'tag_identification' as the id column of the submission. Since tag_identification is the wearable tag number and is also used as a feature, it may repeat across rows; if your test set has a dedicated row ID column, use that column instead. Also make sure that the 'activity' label is present in your training set and absent from your test set, as you mentioned.
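Once the file is written, a quick sanity check can catch a wrong header or a wrong row count. This is a minimal sketch using only the standard library: `check_submission` is a hypothetical helper, and the rule that id values must be unique is an assumption about the expected submission format.

```python
import csv


def check_submission(path, expected_rows, columns=("id", "activity")):
    """Return a list of problems found in the CSV at `path` (empty if none)."""
    problems = []
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        rows = list(reader)

    if header != list(columns):
        problems.append(f"header is {header}, expected {list(columns)}")
    if len(rows) != expected_rows:
        problems.append(f"{len(rows)} rows, expected {expected_rows}")

    # Assumption: each id should appear once in the submission.
    ids = [row[0] for row in rows if row]
    if len(set(ids)) != len(ids):
        problems.append("id values are not unique")
    return problems
```

In the notebook this could be called as `check_submission('submission.csv', expected_rows=len(test_df))`. A report that the id values are not unique is a hint that the column used as id is really a feature.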

Remember to install AutoGluon if you haven't already, by running !pip install autogluon.tabular. AutoGluon can be demanding on memory and CPU, so make sure your environment can handle it. The time_limit argument (set here from time_budget, in seconds) caps the training time; a larger limit lets AutoGluon train more models and may improve the result, but it does not guarantee it.
