Final Practice 2
Chinese version: Final Practice 2
Task 2
Weather Forecast: Linear Regression
🎯 Task Description
You are provided with a synthetic weather forecast dataset containing the following features:
- Temperature (°C)
- Humidity (%)
- WindSpeed (km/h)
- Pressure (hPa)
- RainToday (Yes/No)
- RainTomorrow (0 = No, 1 = Yes): this is the target variable.
Your objectives are:
- Preprocess the data:
  - Convert categorical variables to numeric.
  - Split the dataset into training and testing sets.
  - Apply feature scaling to numerical columns.
- Train one predictive model:
  - Linear Regression
- Evaluate the predictions:
  - Make predictions on the test set.
  - Calculate percentage error.
For this task, use ‘Humidity’, ‘WindSpeed’, and ‘Pressure’ to predict ‘Temperature’.
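The objective "convert categorical variables to numeric" applies to the RainToday column (Yes/No). A minimal sketch of one way to do it, assuming the column holds exactly the strings "Yes" and "No"; the `df_demo` frame and its rows are made up for illustration:

```python
import pandas as pd

# Hypothetical rows; only the column name and the Yes/No values come from the task.
df_demo = pd.DataFrame({
    "Humidity": [80, 45, 60],
    "RainToday": ["Yes", "No", "Yes"],
})

# Map the two categories to integers explicitly so the coding is documented.
# Any value other than "Yes"/"No" would become NaN and should be checked for.
df_demo["RainToday"] = df_demo["RainToday"].map({"No": 0, "Yes": 1})

print(df_demo["RainToday"].tolist())  # [1, 0, 1]
```

The regression in this task uses only Humidity, WindSpeed, and Pressure as features, so this step matters only if RainToday is added as a feature.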
1. Preprocess the data
a. use pd.read_csv() to save the data in df
b. use df.sample() and df.drop() to randomly split the dataset into a training set (70%) and a testing set (30%)
c. define features (X) and target variable (y) for both training and test sets
e. standardization and normalization
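The split and scaling steps can be tried on a small synthetic frame before touching the real file. The sketch below is an illustration only: the `_demo` names and the random values are assumptions, not part of the dataset. It checks that sample() plus drop() gives two disjoint parts and that a scaler fitted on the training part alone gives that part zero mean and unit variance.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the weather data (values are made up for illustration).
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({
    "Humidity": rng.uniform(20, 100, size=10),
    "WindSpeed": rng.uniform(0, 40, size=10),
    "Pressure": rng.uniform(990, 1030, size=10),
    "Temperature": rng.uniform(-5, 35, size=10),
})

# Sample 70% of the rows, then drop those rows to get the remaining 30%.
train_demo = df_demo.sample(frac=0.7, random_state=42)
test_demo = df_demo.drop(train_demo.index)

# Separate the feature columns from the target column.
features = ["Humidity", "WindSpeed", "Pressure"]
X_train_demo, X_test_demo = train_demo[features], test_demo[features]

# Fit the scaler on the training rows only, then reuse it for the test rows.
scaler_demo = StandardScaler()
X_train_demo_scaled = scaler_demo.fit_transform(X_train_demo)
X_test_demo_scaled = scaler_demo.transform(X_test_demo)

print(len(train_demo), len(test_demo))  # 7 3
print(X_train_demo_scaled.mean(axis=0))  # each entry is numerically close to 0
```

The scaled test rows will generally not have exactly zero mean, which is expected: they are transformed with the training statistics.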
import pandas as pd
from sklearn.preprocessing import StandardScaler
# Load the data (RainToday is not used as a feature here, so no encoding is needed)
df = pd.read_csv("weather_data.csv")
# Shuffle and split manually using sample()
df_train = df.sample(frac=0.7, random_state=42)
df_test = df.drop(df_train.index)
# Feature and target separation
X_train = df_train[['Humidity', 'WindSpeed', 'Pressure']]
y_train = df_train['Temperature']
X_test = df_test[['Humidity', 'WindSpeed', 'Pressure']]
y_test = df_test['Temperature']
# Feature scaling
scaler = StandardScaler()
# Use fit_transform() on the training set to learn and apply scaling.
X_train_scaled = scaler.fit_transform(X_train)
# Use transform() (only) on the test set to avoid data leakage.
X_test_scaled = scaler.transform(X_test)
2. Train Linear Regression
from sklearn.linear_model import LinearRegression
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)
3. Predict the ‘Temperature’ for the test dataset
y_pred_lr = lr.predict(X_test_scaled)
print(y_pred_lr)
4. Evaluate the predictions by calculating the percentage error between the actual values and the predicted values.
import numpy as np
epsilon = 1e-8  # Avoid division by zero by adding a small epsilon (or filter out zeros)
percentage_error = np.abs((y_test - y_pred_lr) / (y_test + epsilon)) * 100
print("Percentage error for each prediction (%): \n", percentage_error)
mean_percentage_error = np.mean(percentage_error)
print("Mean percentage error (%): ", mean_percentage_error)
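Percentage error can mislead for temperatures in °C, because values near zero make the denominator tiny and negative values are possible. The sketch below is one possible variant, not part of the original solution. The helper `percentage_error_safe` and the sample arrays are hypothetical; the helper floors the magnitude of the denominator at a small epsilon, and the mean absolute error is reported alongside for comparison.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

def percentage_error_safe(y_true, y_pred, eps=1e-8):
    """Absolute percentage error with the denominator's magnitude floored at eps."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    denom = np.maximum(np.abs(y_true), eps)
    return np.abs(y_true - y_pred) / denom * 100

# Made-up temperatures in °C, including a negative value and one close to zero.
y_true_demo = np.array([20.0, -10.0, 0.5])
y_pred_demo = np.array([22.0, -8.0, 2.5])

pe_demo = percentage_error_safe(y_true_demo, y_pred_demo)
mae_demo = mean_absolute_error(y_true_demo, y_pred_demo)

print("Percentage error (%):", pe_demo)
print("Mean absolute error (°C):", mae_demo)
```

Every prediction in this example misses by 2 °C, yet the percentage errors are 10%, 20%, and 400%. An absolute metric such as the mean absolute error may therefore be easier to interpret for this target.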