AITC Wiki

Final Practice2

Task 2

Weather Forecast: Linear Regression

🎯 Task Description

You are provided with a synthetic weather forecast dataset containing the following features:

  • Temperature (°C)
  • Humidity (%)
  • WindSpeed (km/h)
  • Pressure (hPa)
  • RainToday (Yes/No)
  • RainTomorrow (0 = No, 1 = Yes). This is the target variable.

Your objectives are:

  1. Preprocess the data:
      • Convert categorical variables to numeric.
      • Split the dataset into training and testing sets.
      • Apply feature scaling to numerical columns.
  2. Train one predictive model:
      • Linear Regression
  3. Evaluate the predictions:
      • Make predictions on the test set.
      • Calculate percentage error.
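
The "convert categorical variables to numeric" objective can be sketched as follows. This is a minimal sketch on a small made-up DataFrame (the rows are illustrative and not taken from weather_data.csv), assuming RainToday holds the strings "Yes" and "No" as listed above.

```python
import pandas as pd

# Toy rows for illustration only; not taken from weather_data.csv.
toy = pd.DataFrame({
    "Humidity": [80, 45, 60],
    "RainToday": ["Yes", "No", "Yes"],
})

# Map the two categories to integers explicitly so the encoding is obvious.
toy["RainToday"] = toy["RainToday"].map({"No": 0, "Yes": 1})
print(toy)
```

The practice below predicts Temperature from three numeric columns, so this encoding only matters if RainToday is added as a feature.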

Try to predict ‘Temperature’ from ‘Humidity’, ‘WindSpeed’, and ‘Pressure’.

1. Preprocess the data

a. Use pd.read_csv() to load the data into df.

b. Use df.sample() and df.drop() to randomly split the dataset into a training set (70%) and a testing set (30%).

c. Define the features (X) and target variable (y) for both the training and test sets.

d. Apply standardization and normalization (feature scaling).
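
The sample()/drop() split described in the steps above can be checked on a small example. This is a minimal sketch with a made-up 10-row DataFrame (the names toy, toy_train and toy_test are illustrative); it relies on the index being unique, which holds for the default index that pd.read_csv() creates.

```python
import pandas as pd

# Toy frame with 10 rows; values are illustrative only.
toy = pd.DataFrame({"Humidity": range(10), "Temperature": range(20, 30)})

toy_train = toy.sample(frac=0.7, random_state=42)  # 70% of the rows, shuffled
toy_test = toy.drop(toy_train.index)               # the remaining 30%

print(len(toy_train), len(toy_test))
# The two index sets should not share any row label.
print(set(toy_train.index) & set(toy_test.index))
```

Fixing random_state makes the split reproducible, so repeated runs give the same training and test rows.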

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the dataset (RainToday is not used as a feature here, so no encoding is needed)
df = pd.read_csv("weather_data.csv")

# Shuffle and split manually: sample() draws 70% of the rows at random,
# and drop() keeps the remaining 30% as the test set
df_train = df.sample(frac=0.7, random_state=42)
df_test = df.drop(df_train.index)

# Feature and target separation
X_train = df_train[['Humidity', 'WindSpeed', 'Pressure']]
y_train = df_train['Temperature']
X_test = df_test[['Humidity', 'WindSpeed', 'Pressure']]
y_test = df_test['Temperature']

# Feature scaling
scaler = StandardScaler()
# Use fit_transform() on the training set to learn and apply the scaling.
X_train_scaled = scaler.fit_transform(X_train)
# Use transform() (only) on the test set to avoid data leakage.
X_test_scaled = scaler.transform(X_test)
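
The preprocessing steps mention both standardization and normalization, while the code above applies only StandardScaler. The sketch below contrasts the two on made-up numbers. Treating "normalization" as min-max scaling with MinMaxScaler is an assumption here, since the page does not say which method it means.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy single-feature training and test data; values are illustrative only.
train = np.array([[10.0], [20.0], [30.0]])
test = np.array([[40.0]])

std = StandardScaler().fit(train)  # learns mean and standard deviation from training data
mm = MinMaxScaler().fit(train)     # learns min and max from training data

train_std = std.transform(train)   # zero mean, unit variance
train_mm = mm.transform(train)     # rescaled to the range [0, 1]
test_mm = mm.transform(test)       # test values can fall outside [0, 1]

print(train_std.ravel())
print(train_mm.ravel())
print(test_mm.ravel())
```

In both cases the scaler is fitted on the training data only and then reused on the test data, for the same data-leakage reason given in the comments above.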

2. Train Linear Regression

from sklearn.linear_model import LinearRegression
 
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)
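
After fitting, it can help to look at what the model learned. The sketch below is a minimal example on made-up data generated from a known linear rule (the names X_toy, y_toy and toy_lr are illustrative); it shows that coef_ and intercept_ recover that rule.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 2*x1 - 3*x2 + 5 (illustrative only).
X_toy = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_toy = 2 * X_toy[:, 0] - 3 * X_toy[:, 1] + 5

toy_lr = LinearRegression().fit(X_toy, y_toy)
print("coefficients:", toy_lr.coef_)
print("intercept:", toy_lr.intercept_)
```

For the weather model, lr.coef_ holds one coefficient per feature in the order Humidity, WindSpeed, Pressure. Because the features were standardized, each coefficient is the change in predicted temperature per one standard deviation of that feature.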

3. Predict ‘Temperature’ for the test dataset

y_pred_lr = lr.predict(X_test_scaled)
print(y_pred_lr)

4. Evaluate the predictions by calculating the percentage error between the actual values and the predicted values

import numpy as np

# Add a small epsilon to the denominator to avoid division by zero
# (alternatively, filter out rows where y_test is zero).
epsilon = 1e-8
percentage_error = np.abs(y_test - y_pred_lr) / (np.abs(y_test) + epsilon) * 100
print("Percentage error for each prediction (%):\n", percentage_error)

mean_percentage_error = np.mean(percentage_error)
print("Mean percentage error (%):", mean_percentage_error)
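
Percentage error becomes very large when an actual temperature is close to 0 °C, so it can be useful to report an absolute error next to it. The sketch below is a minimal example on made-up values, assuming scikit-learn 0.24 or later, where mean_absolute_percentage_error is available. It is an optional cross-check and not part of the original task.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_absolute_percentage_error

# Toy actual and predicted values; illustrative only.
y_true = np.array([20.0, 25.0, 10.0, 0.5])
y_hat = np.array([22.0, 24.0, 11.0, 1.5])

# scikit-learn returns MAPE as a fraction, so multiply by 100 for percent.
mape_percent = mean_absolute_percentage_error(y_true, y_hat) * 100
mae = mean_absolute_error(y_true, y_hat)

print("MAPE (%):", mape_percent)
print("MAE (°C):", mae)
```

In this toy data the last pair is off by only 1 °C, yet it contributes a 200% error because the actual value is 0.5, which is why the mean percentage error looks much worse than the mean absolute error.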