AITC Wiki

Final Practice 1

Task 1

  1. Import the pandas library in Python and use pd as an alias.
  2. Read the nba.csv file using pandas and store it in a DataFrame named df.
import pandas as pd #hint: import ... as ...
df = pd.read_csv('nba.csv') # pd.read_csv('name.csv')
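If nba.csv is not at hand, the same call can be tried on an in-memory stand-in. The sketch below is only an illustration: the three rows are made up, the column names are the ones used later on this page, and Height is assumed to be numeric.

```python
import io

import pandas as pd

# Hypothetical rows, made up for illustration; they are NOT taken from nba.csv.
csv_text = """Name,Team,Age,Height,College,Salary
Player A,Boston Celtics,25,198.0,Texas,5000000
Player B,Boston Celtics,31,206.0,,--
Player C,Miami Heat,22,191.0,Duke,1200000
"""

# pd.read_csv accepts a file-like object as well as a path
sample_df = pd.read_csv(io.StringIO(csv_text))

print(sample_df.shape)
print(sample_df.columns.tolist())
```

The empty College field is read as a missing value, while the "--" in Salary keeps that column non-numeric until it is cleaned.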
  1. Print the shape of df (number of rows and columns) and list all column names.

Answer the questions: How many persons are listed in df? How many types of information are recorded for each person, and what are they?

# hint: df.shape df.columns.tolist()
print("sizes of dimensions(rows, cols):", df.shape)
print("types of information:", df.columns.tolist())
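One way to turn the shape into an answer is to unpack it: the first value is the number of persons (rows) and the second is the number of types of information (columns). A minimal sketch on a made-up stand-in frame, not the real data:

```python
import pandas as pd

# Hypothetical stand-in data, made up for illustration
sample_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Team": ["Boston Celtics", "Boston Celtics", "Miami Heat"],
    "Age": [25, 31, 22],
})

# shape is a (rows, columns) tuple, so it can be unpacked directly
n_rows, n_cols = sample_df.shape

print(f"{n_rows} persons, {n_cols} types of information:")
print(", ".join(sample_df.columns))
```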
  1. Print the Name and Age columns.
print(df[["Name", "Age"]])
  1. Print the data types of each column (e.g., int, float, string), and check the memory usage of df.
# hint: df.dtypes
print("data types for cols:\n", df.dtypes)
print("memory usage:")
df.info(memory_usage="deep")
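df.info() prints a report but does not return the numbers. If the memory figures are needed as values, DataFrame.memory_usage(deep=True) is one option. A sketch on made-up stand-in data:

```python
import pandas as pd

# Hypothetical stand-in data, made up for illustration
sample_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Age": [25, 31, 22],
})

# Bytes used by the index and by each column; deep=True also counts string contents
per_column_bytes = sample_df.memory_usage(deep=True)
total_bytes = per_column_bytes.sum()

print(per_column_bytes)
print("total bytes:", total_bytes)
```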
  1. Print all information of the first two persons (= the first two rows).
# hint: df.head(number) or df.iloc[index]
 
print("information of first two persons:\n", df.head(2))
# Or
print("information of first two persons:\n", df.iloc[0:2])
  1. Print the last five rows.
# hint: df.tail() or df.iloc[:]
 
print("the last five rows:\n", df.tail(5))
# Or
print("the last five rows:\n", df.iloc[-5:])
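The head/tail and iloc variants above select the same rows. A quick check on a made-up frame (the values are illustrative only) confirms this. Note that iloc slices by position and excludes the end, while loc slices by label and includes it.

```python
import pandas as pd

# Hypothetical stand-in data, made up for illustration
sample_df = pd.DataFrame({
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Age": [25, 31, 22, 28, 35, 19],
})

# Position-based slices match head() and tail()
head_matches = sample_df.head(2).equals(sample_df.iloc[0:2])
tail_matches = sample_df.tail(2).equals(sample_df.iloc[-2:])

# Label-based slice: with the default 0..n-1 index, loc[0:1] includes label 1
loc_matches = sample_df.head(2).equals(sample_df.loc[0:1])

print(head_matches, tail_matches, loc_matches)
```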
  1. Calculate the average height and maximum age of all players.
# hint df["type.name"].mean() df["type.name"].max()
average_height = df["Height"].mean()
max_age = df["Age"].max()
 
# Print the results
print("Average Height:", average_height)
print("Maximum Age:", max_age)
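Both statistics can also be collected in one table with agg, and idxmax gives the row label of the oldest player. This is a sketch on made-up values, assuming Height is stored as a number:

```python
import pandas as pd

# Hypothetical stand-in data, made up for illustration
sample_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Age": [25, 31, 22],
    "Height": [198.0, 206.0, 191.0],
})

# One table with the mean and max of both columns
summary = sample_df[["Height", "Age"]].agg(["mean", "max"])
print(summary)

# idxmax returns the index label of the row with the largest Age
oldest_name = sample_df.loc[sample_df["Age"].idxmax(), "Name"]
print("Oldest player:", oldest_name)
```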
  1. Delete all player records with missing College information.
# hint df.dropna()
df = df.dropna(subset=["College"])
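Before dropping rows, it can help to count how many are affected. A minimal sketch on made-up data, where the variable names are chosen for illustration:

```python
import pandas as pd

# Hypothetical stand-in data, made up for illustration
sample_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "College": ["Texas", None, "Duke"],
})

# Count missing College entries before removing them
missing_before = sample_df["College"].isna().sum()

# Drop only rows where College is missing, then renumber the index
cleaned_df = sample_df.dropna(subset=["College"]).reset_index(drop=True)

print("rows with missing College:", missing_before)
print("rows kept:", len(cleaned_df))
```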
  1. Handle missing values in the Salary column (represented as "--"): replace them with the average salary of all players.
# Replace "--" with pd.NA to prepare for numeric conversion
# pd.NA can represent missing value in numeric type
df["Salary"] = df["Salary"].replace("--", pd.NA)
 
# Convert Salary column to float type
df["Salary"] = pd.to_numeric(df["Salary"], errors="coerce")
 
# Calculate average salary using df["type.name"].mean()
mean_salary = df["Salary"].mean()
 
# Fill missing Salary values with the average, using df["type.name"].fillna(filled_value)
df["Salary"] = df["Salary"].fillna(mean_salary)
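As a side note, pd.to_numeric with errors="coerce" already converts "--" to NaN on its own, so the explicit replace step mainly documents intent. The caveat is that coerce silently turns any other unparseable value into NaN as well. A sketch on a made-up Series:

```python
import pandas as pd

# Hypothetical salary strings, made up for illustration
salary_raw = pd.Series(["5000000", "--", "1200000"], name="Salary")

# Unparseable entries such as "--" become NaN
salary_num = pd.to_numeric(salary_raw, errors="coerce")

# mean() skips NaN by default, so the average uses only the known salaries
salary_filled = salary_num.fillna(salary_num.mean())

print(salary_filled)
```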
  1. For each team, calculate the average age, average height, and average salary of its players.
# hint: df.groupby('type_name')[['type_name1','type_name2','type_name3']].mean()
print(df.groupby('Team')['Age'].mean())
print(df.groupby('Team')['Height'].mean())
print(df.groupby('Team')['Salary'].mean())
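Following the hint, the three averages can also be computed in a single call by selecting a list of columns after groupby. A sketch on made-up data:

```python
import pandas as pd

# Hypothetical stand-in data, made up for illustration
sample_df = pd.DataFrame({
    "Team": ["Boston Celtics", "Boston Celtics", "Miami Heat"],
    "Age": [25, 31, 22],
    "Height": [198.0, 206.0, 191.0],
    "Salary": [5000000.0, 3000000.0, 1200000.0],
})

# One row per team, one column per averaged quantity
team_means = sample_df.groupby("Team")[["Age", "Height", "Salary"]].mean()
print(team_means)
```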
  1. Select the teams Boston Celtics, Chicago Bulls, Miami Heat, New York Knicks, and Washington Wizards. Create a bar chart to show the average age of players in each team.
import matplotlib.pyplot as plt
 
# Select target teams
teams_of_interest = [
 "Boston Celtics", "Chicago Bulls", "Miami Heat",
 "New York Knicks", "Washington Wizards"
]
 
# Filter the DataFrame
filtered_df = df[df["Team"].isin(teams_of_interest)]
 
# Group by team and calculate average age
avg_age = filtered_df.groupby("Team")["Age"].mean()
print(avg_age)
 
# Plot the bar chart
plt.figure()
# Use the grouped index for the labels so each bar matches its own team
plt.bar(avg_age.index, avg_age.values)
plt.title("Average Age of Players by Team")
plt.xlabel("Team")
plt.ylabel("Average Age")
# plt.xticks(rotation=45) # you could try
plt.show()
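Attention!

Complete all plotting operations within the same cell.

Otherwise, in a Jupyter notebook a later cell starts a new, empty figure, so a call such as the following no longer applies to the bar chart:

plt.title("Average Age of Players by Team")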
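One way to keep all plotting steps together is to wrap them in a function that uses explicit Figure and Axes objects. The sketch below is an alternative to the pyplot calls above, not a replacement required by the task. The function name plot_average_age and the sample values are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_average_age(avg_age):
    """Draw a bar chart from a Series indexed by team name."""
    fig, ax = plt.subplots()
    # Labels come from the Series index, so bars and teams stay aligned
    ax.bar(avg_age.index, avg_age.values)
    ax.set_title("Average Age of Players by Team")
    ax.set_xlabel("Team")
    ax.set_ylabel("Average Age")
    ax.tick_params(axis="x", labelrotation=45)
    fig.tight_layout()
    return fig, ax


# Hypothetical averages, made up for illustration
avg_age_sample = pd.Series({"Boston Celtics": 27.0, "Miami Heat": 29.5}, name="Age")
fig, ax = plot_average_age(avg_age_sample)
```

In a script, call plt.show() afterwards to display the figure. In a notebook, the figure appears when the cell that calls the function finishes.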
  1. Group all players by age and display the average, maximum, and minimum salary for each age group. Create a line chart to show the relationship between age and (average, max, min) salary. Be sure to set an appropriate title, axis labels, and legend.
age_salary_stats = df.groupby('Age')['Salary'].agg(['mean', 'max', 'min']).reset_index()
 
# Print the resulting DataFrame
print("Average, Max, and Min Salary by Age:")
print(age_salary_stats)
 
# Create the line chart
plt.figure(figsize=(10, 5))
# Plot mean, max, and min salary by age
plt.plot(age_salary_stats['Age'], age_salary_stats['mean'], label='Average Salary', marker='o')
plt.plot(age_salary_stats['Age'], age_salary_stats['max'], label='Max Salary', marker='^')
plt.plot(age_salary_stats['Age'], age_salary_stats['min'], label='Min Salary', marker='s')
 
# Set chart title and axis labels
plt.title('Relationship between Age and Salary (Average / Max / Min)')
plt.xlabel('Age')
plt.ylabel('Salary (USD)')
# Add legend
plt.legend()
 
# Show the chart
plt.show()
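If more descriptive column names than mean, max, and min are wanted, pandas named aggregation is one option. In this sketch the names avg_salary, max_salary, and min_salary are chosen for illustration, and the data is made up:

```python
import pandas as pd

# Hypothetical stand-in data, made up for illustration
sample_df = pd.DataFrame({
    "Age": [22, 22, 25],
    "Salary": [1000000.0, 3000000.0, 5000000.0],
})

# Each keyword becomes an output column; its value names the aggregation
age_stats = (
    sample_df.groupby("Age")["Salary"]
    .agg(avg_salary="mean", max_salary="max", min_salary="min")
    .reset_index()
)
print(age_stats)
```

The resulting columns can be passed to plt.plot in the same way as in the code above, for example age_stats['Age'] against age_stats['avg_salary'].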