Previous post
Kaggle Courses - Random Forests
Exercises
Train a model for the competition
The code cell above trains a Random Forest model on train_X and train_y.
Use the code cell below to build a Random Forest model and train it on all of X and y.
# To improve accuracy, create a new Random Forest model which you will train on all training data
rf_model_on_full_data = ____
# fit rf_model_on_full_data on all data from the training data
____
Now, read the file of “test” data, and apply your model to make predictions.
# path to file you will use for predictions
test_data_path = '../input/test.csv'
# read test data file using pandas
test_data = ____
# create test_X which comes from test_data but includes only the columns you used for prediction.
# The list of columns is stored in a variable called features
test_X = ____
# make predictions which we will submit.
test_preds = ____
Before submitting, run a check to make sure your test_preds have the right format.
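The check that the course notebook runs is not reproduced here. As a rough stand-in, the sketch below tests the properties a prediction array would usually need: one dimension, one value per test row, and only finite numbers. The helper name check_predictions and the demo values are hypothetical, not part of the exercise.

```python
import numpy as np

def check_predictions(preds, n_test_rows):
    """Return a list of problems found in preds; an empty list means the basic checks passed."""
    arr = np.asarray(preds)
    # A submission needs exactly one value per test row, so the array must be 1-D.
    if arr.ndim != 1:
        return [f"expected a 1-D array, got shape {arr.shape}"]
    problems = []
    if arr.shape[0] != n_test_rows:
        problems.append(f"expected {n_test_rows} predictions, got {arr.shape[0]}")
    if not np.issubdtype(arr.dtype, np.number):
        problems.append(f"expected numeric values, got dtype {arr.dtype}")
    elif not np.all(np.isfinite(arr)):
        problems.append("predictions contain NaN or infinite values")
    return problems

# Made-up predictions, only to show the call.
demo_preds = np.array([208500.0, 181500.0, 223500.0])
print(check_predictions(demo_preds, 3))  # prints []
```

In the notebook, the equivalent call would be check_predictions(test_preds, len(test_X)).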
Answer
# To improve accuracy, create a new Random Forest model which you will train on all training data
rf_model_on_full_data = RandomForestRegressor(random_state=1)
# fit rf_model_on_full_data on all data from the training data
rf_model_on_full_data.fit(X, y)
# path to file you will use for predictions
test_data_path = '../input/test.csv'
# read test data file using pandas
test_data = pd.read_csv(test_data_path)
# create test_X which comes from test_data but includes only the columns you used for prediction.
# The list of columns is stored in a variable called features
test_X = test_data[features]
# make predictions which we will submit.
test_preds = rf_model_on_full_data.predict(test_X)
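The predictions still have to be written to a file before they can be submitted. The sketch below pairs each test-row identifier with its prediction. The column names Id and SalePrice and the helper name build_submission are assumptions here; the competition's sample submission file defines the names it actually expects.

```python
import pandas as pd

def build_submission(ids, preds, id_col="Id", target_col="SalePrice"):
    """Pair each test-row identifier with its prediction, one row per test example.

    id_col and target_col are assumed names; replace them with the
    headers from the competition's sample submission file.
    """
    if len(ids) != len(preds):
        raise ValueError("ids and preds must have the same length")
    return pd.DataFrame({id_col: list(ids), target_col: list(preds)})

# Made-up identifiers and predictions, only to show the shape of the result.
submission = build_submission([1461, 1462], [120000.0, 155000.0])
print(submission.shape)  # prints (2, 2)
```

Assuming test_data has an Id column, the notebook version would be build_submission(test_data.Id, test_preds).to_csv('submission.csv', index=False).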
Following the exercise as written gives a score of about 21488. Looking at code that achieves lower scores shows that it applies several preprocessing steps.
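The preprocessing steps themselves are not listed above. As an illustration only, the sketch below shows two common ones: filling missing numeric values with the training median, and one-hot encoding text columns so that train and test end up with identical columns. The function name preprocess, the column names, and the demo values are all hypothetical.

```python
import pandas as pd

def preprocess(train_df, test_df):
    """Impute missing numeric values and one-hot encode text columns consistently."""
    train_df, test_df = train_df.copy(), test_df.copy()
    numeric_cols = train_df.select_dtypes(include="number").columns
    # Compute medians on the training data only, then apply them to both sets.
    medians = train_df[numeric_cols].median()
    train_df[numeric_cols] = train_df[numeric_cols].fillna(medians)
    test_df[numeric_cols] = test_df[numeric_cols].fillna(medians)
    # One-hot encode the remaining text columns.
    train_enc = pd.get_dummies(train_df)
    test_enc = pd.get_dummies(test_df)
    # Give the test set exactly the training columns: categories unseen in
    # training are dropped, categories absent from the test set become zeros.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc

# Tiny made-up frames, only to show the call.
demo_train = pd.DataFrame({"LotArea": [8450.0, None, 11250.0],
                           "Street": ["Pave", "Grvl", "Pave"]})
demo_test = pd.DataFrame({"LotArea": [None, 9600.0],
                          "Street": ["Pave", "Pave"]})
train_enc, test_enc = preprocess(demo_train, demo_test)
print(list(train_enc.columns))
```

The encoded frames can then be passed to RandomForestRegressor in place of the raw feature columns. Whether this lowers the score depends on the data and is not something this sketch demonstrates.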