AITC Wiki

Practice: Predict (Solution)

Reference answers for the prediction exercise

Weather Forecast: Linear Regression and Naive Bayes

🎯 Task Description

You are provided with a synthetic weather forecast dataset containing the following features:

  • Temperature (°C)
  • Humidity (%)
  • WindSpeed (km/h)
  • Pressure (hPa)
  • RainToday (Yes/No)
  • RainTomorrow (0 = No, 1 = Yes) — This is the target variable.

Your objectives are:

  1. Preprocess the data:
  • Convert categorical variables to numeric.
  • Split the dataset into training and testing sets.
  • Apply feature scaling to numerical columns.
  2. Train two predictive models:
  • Linear Regression
  • Naive Bayes Classifier
  3. Evaluate the models:
  • Make predictions on the test set.
  • Calculate accuracy scores.
  • Compare the performance.

Try to predict ‘Temperature’ from ‘Humidity’, ‘WindSpeed’, and ‘Pressure’

1. Preprocess the data

a. Use pd.read_csv() to load the data into df

b. Use df.sample() and df.drop() to randomly split the dataset into a training set (70%) and a testing set (30%)

c. Define features (X) and target variable (y) for both training and test sets

d. Standardize the numerical features

import pandas as pd
from sklearn.preprocessing import StandardScaler
 
# Load the data and encode the categorical feature
df = pd.read_csv("weather_data.csv")
df['RainToday'] = df['RainToday'].map({'Yes': 1, 'No': 0})  # Yes/No -> 1/0
# Shuffle and split manually using sample(); drop() keeps the remaining 30%
df_train = df.sample(frac=0.7, random_state=42)
df_test = df.drop(df_train.index)
# Feature and target separation
X_train = df_train[['Humidity', 'WindSpeed', 'Pressure']]
y_train = df_train['Temperature']
X_test = df_test[['Humidity', 'WindSpeed', 'Pressure']]
y_test = df_test['Temperature']
# Feature scaling
scaler = StandardScaler()
# Use fit_transform() on the training set to learn and apply the scaling.
X_train_scaled = scaler.fit_transform(X_train)
# Use transform() only on the test set, to avoid data leakage.
X_test_scaled = scaler.transform(X_test)
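The sample()/drop() split above can be sanity-checked on a tiny synthetic frame (the columns here are placeholders, not the real weather_data.csv schema): the two resulting index sets are disjoint and together cover every row.

```python
import pandas as pd

# Tiny synthetic frame (10 rows) to illustrate the sample()/drop() split
df = pd.DataFrame({"Humidity": range(10), "Temperature": range(10)})

df_train = df.sample(frac=0.7, random_state=42)  # 70% of rows, shuffled
df_test = df.drop(df_train.index)                # the remaining 30%

print(len(df_train), len(df_test))  # 7 3
# The two index sets are disjoint and jointly cover the whole frame
print(df_train.index.intersection(df_test.index).empty)  # True
```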

2. Train Linear Regression

from sklearn.linear_model import LinearRegression
 
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)
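After fit(), the learned weights are available as lr.coef_ and lr.intercept_. A minimal sketch on synthetic data with a known rule (y = 2x + 5, not the weather data) shows the model recovering them:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free synthetic data following y = 2x + 5
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() + 5

lr = LinearRegression().fit(X, y)
print(lr.coef_)       # close to [2.]
print(lr.intercept_)  # close to 5.0
```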

3. Predict ‘Temperature’ for the test dataset

y_pred_lr = lr.predict(X_test_scaled)
print(y_pred_lr)

4. Evaluate the model by checking the average distance between predictions and the ground truth

from sklearn.metrics import mean_absolute_error
 
mae = mean_absolute_error(y_test, y_pred_lr)
print("MAE:", mae)
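MAE is simply the mean of the absolute prediction errors, so it is in the same unit as the target (°C here). A hand-computed example (made-up numbers) matches sklearn's result:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([20.0, 22.0, 19.0])  # illustrative ground-truth temperatures
y_hat = np.array([21.0, 20.0, 19.5])   # illustrative predictions

# |errors| = 1.0, 2.0, 0.5  ->  MAE = 3.5 / 3
manual = np.mean(np.abs(y_true - y_hat))
print(manual)
print(mean_absolute_error(y_true, y_hat))  # same value
```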

Try to predict whether it will rain tomorrow (‘RainTomorrow’) from ‘Temperature’, ‘Humidity’, ‘WindSpeed’, ‘Pressure’, and ‘RainToday’

  • 0 = No
  • 1 = Yes

1. Preprocessing

Define features (X) and target variable (y) for both training and test sets

Standardize the numerical features

# Feature and target separation
X_train = df_train[['Temperature', 'Humidity', 'WindSpeed', 'Pressure', 'RainToday']]
y_train = df_train['RainTomorrow']
X_test = df_test[['Temperature', 'Humidity', 'WindSpeed', 'Pressure', 'RainToday']]
y_test = df_test['RainTomorrow']
 
# Feature scaling (kept for scale-sensitive models; the Naive Bayes
# classifier below is trained on the raw, unscaled features)
scaler = StandardScaler()
# Use fit_transform() on the training set to learn and apply the scaling.
X_train_scaled = scaler.fit_transform(X_train)
# Use transform() only on the test set, to avoid data leakage.
X_test_scaled = scaler.transform(X_test)
 
print(y_test)
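Because RainToday is stored as Yes/No strings (see the dataset description), it must be mapped to numbers before it can be used as a feature. A minimal sketch of that mapping with pandas map() (the values here are illustrative):

```python
import pandas as pd

# Illustrative Yes/No column, not the real weather data
s = pd.Series(["Yes", "No", "No", "Yes"])
encoded = s.map({"Yes": 1, "No": 0})  # Yes -> 1, No -> 0
print(encoded.tolist())  # [1, 0, 0, 1]
```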

2. Train Naive Bayes

from sklearn.naive_bayes import GaussianNB
 
nb = GaussianNB()
nb.fit(X_train, y_train)  # Gaussian Naive Bayes works well on raw features (no scaling needed)

3. Predict

y_pred_nb = nb.predict(X_test)
print(y_pred_nb)
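Besides hard 0/1 predictions, GaussianNB also exposes predict_proba() for per-class probabilities. A self-contained sketch on toy 1-D data (two well-separated clusters, not the weather features):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Class 0 clusters around 0.5, class 1 around 10
X = np.array([[0.0], [1.0], [0.5], [9.0], [10.0], [11.0]])
y = np.array([0, 0, 0, 1, 1, 1])

nb = GaussianNB().fit(X, y)
print(nb.predict([[0.2], [10.5]]))  # [0 1]
print(nb.predict_proba([[5.0]]))    # per-class probabilities for a point between the clusters
```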

4. Evaluate the model

from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
 
acc_nb = accuracy_score(y_test, y_pred_nb)
 
print("Naive Bayes Accuracy:", acc_nb)
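Accuracy is simply the fraction of test rows predicted correctly; a worked toy example (made-up labels):

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]  # one mistake out of five
print(accuracy_score(y_true, y_pred))  # 0.8
```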

5. Visualize the results with a confusion matrix (optional)

cm_nb = confusion_matrix(y_test, y_pred_nb)
 
sns.heatmap(cm_nb, annot=True, fmt="d", cmap="Greens")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
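To follow up on "Compare the performance", Naive Bayes can be compared against a logistic regression classifier trained on the same split. A sketch on synthetic data from make_classification (a stand-in, not the real weather dataset):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic 5-feature binary problem as a stand-in for the weather features
X, y = make_classification(n_samples=300, n_features=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

logr = LogisticRegression().fit(X_tr, y_tr)
nb = GaussianNB().fit(X_tr, y_tr)

print("Logistic Regression accuracy:", accuracy_score(y_te, logr.predict(X_te)))
print("Naive Bayes accuracy:", accuracy_score(y_te, nb.predict(X_te)))
```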