Prediction Practice: Reference Solution
Weather Forecast: Linear Regression and Naive Bayes
🎯 Task Description
You are provided with a synthetic weather forecast dataset containing the following features:
- Temperature (°C)
- Humidity (%)
- WindSpeed (km/h)
- Pressure (hPa)
- RainToday (Yes/No)
- RainTomorrow (0 = No, 1 = Yes) — This is the target variable.
Your objectives are:
- Preprocess the data:
- Convert categorical variables to numeric.
- Split the dataset into training and testing sets.
- Apply feature scaling to numerical columns.
- Train two predictive models:
- Linear Regression
- Naive Bayes Classifier
- Evaluate the models:
- Make predictions on the test set.
- Compute evaluation metrics (MAE for the regression, accuracy for the classifier).
- Compare the performance.
Try to predict ‘Temperature’ from ‘Humidity’, ‘WindSpeed’ and ‘Pressure’
1. Preprocess the data
a. use pd.read_csv() to load the data into df
b. use df.sample() and df.drop() to randomly split the dataset into a training set (70%) and a testing set (30%)
c. define features (X) and target variable (y) for both training and test sets
d. standardization and normalization
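Standardization rescales each numeric column to zero mean and unit variance, which is exactly the arithmetic StandardScaler performs (note it uses the population standard deviation, NumPy's default). A minimal sketch on a toy array:

```python
import numpy as np

# Toy feature column: standardization subtracts the mean and
# divides by the standard deviation.
x = np.array([10.0, 20.0, 30.0, 40.0])
z = (x - x.mean()) / x.std()

print(z.mean())  # ~0.0
print(z.std())   # ~1.0
```

After scaling, every feature contributes on a comparable scale, which matters for models that are sensitive to feature magnitudes.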
import pandas as pd
from sklearn.preprocessing import StandardScaler, LabelEncoder
# Load the data and encode the categorical feature
df = pd.read_csv("weather_data.csv")
# RainToday is a Yes/No string column; encode it as 0/1 so it can be
# used as a numeric feature later (LabelEncoder maps No -> 0, Yes -> 1)
df['RainToday'] = LabelEncoder().fit_transform(df['RainToday'])

# Shuffle and split manually using sample()
df_train = df.sample(frac=0.7, random_state=42)
df_test = df.drop(df_train.index)

# Feature and target separation
X_train = df_train[['Humidity', 'WindSpeed', 'Pressure']]
y_train = df_train['Temperature']
X_test = df_test[['Humidity', 'WindSpeed', 'Pressure']]
y_test = df_test['Temperature']

# Feature scaling
scaler = StandardScaler()
# Use fit_transform() on the training set to learn and apply scaling.
X_train_scaled = scaler.fit_transform(X_train)
# Use transform() (only) on the test set to avoid data leakage.
X_test_scaled = scaler.transform(X_test)

2. Train Linear Regression
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)

3. Predict the ‘Temperature’ for the test dataset
y_pred_lr = lr.predict(X_test_scaled)
print(y_pred_lr)

4. Evaluate the model by checking the distance between the predictions and the ground truth
from sklearn.metrics import mean_absolute_error
mae = mean_absolute_error(y_test, y_pred_lr)
print("MAE:", mae)

Try to predict whether it will rain (‘RainTomorrow’) from ‘Temperature’, ‘Humidity’, ‘WindSpeed’, ‘Pressure’ and ‘RainToday’
- 0 ⇒ No
- 1 ⇒ Yes
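‘RainToday’ is a Yes/No string column, so it has to be converted to numbers before it can be fed to the classifier. A minimal sketch of what LabelEncoder does on a toy list (it assigns codes in alphabetical order of the classes, so ‘No’ → 0 and ‘Yes’ → 1):

```python
from sklearn.preprocessing import LabelEncoder

# LabelEncoder assigns integer codes in alphabetical order of the classes
le = LabelEncoder()
codes = le.fit_transform(['Yes', 'No', 'Yes', 'No'])

print(list(le.classes_))  # ['No', 'Yes']
print(list(codes))        # [1, 0, 1, 0]
```

The same 0/1 convention used for the ‘RainTomorrow’ target therefore applies to the encoded ‘RainToday’ feature.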
1. Preprocessing
Define features (X) and target variable (y) for both training and test sets
Standardization and normalization
# Feature and target separation
X_train = df_train[['Temperature', 'Humidity', 'WindSpeed', 'Pressure', 'RainToday']]
y_train = df_train['RainTomorrow']
X_test = df_test[['Temperature', 'Humidity', 'WindSpeed', 'Pressure', 'RainToday']]
y_test = df_test['RainTomorrow']
# Feature scaling
scaler = StandardScaler()
# Use fit_transform() on the training set to learn and apply scaling.
X_train_scaled = scaler.fit_transform(X_train)
# Use transform() (only) on the test set to avoid data leakage.
X_test_scaled = scaler.transform(X_test)
print(y_test)

2. Train Naive Bayes
from sklearn.naive_bayes import GaussianNB
nb = GaussianNB()
nb.fit(X_train, y_train)  # Naive Bayes works well with raw features (no feature scaling needed)

3. Predict
y_pred_nb = nb.predict(X_test)
print(y_pred_nb)

4. Evaluate the model
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt
acc_nb = accuracy_score(y_test, y_pred_nb)
print("Naive Bayes Accuracy:", acc_nb)

5* Visualize the results using a confusion matrix (additional)
cm_nb = confusion_matrix(y_test, y_pred_nb)
sns.heatmap(cm_nb, annot=True, fmt="d", cmap="Greens")
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
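The four cells of the heatmap can also be read off directly: rows are the actual class and columns the predicted class, so `ravel()` on the 2×2 matrix yields the counts in the order TN, FP, FN, TP. A small sketch using toy 0/1 labels in place of the real test set:

```python
from sklearn.metrics import confusion_matrix

# Toy labels standing in for y_test / y_pred_nb (0 = no rain, 1 = rain)
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

# ravel() flattens the 2x2 matrix in row-major order: TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 1 1 1 2
```

From these four counts you can recover accuracy ((TN + TP) / total) as well as precision and recall, which are often more informative than accuracy when one class is rare.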