Final Practice 1
Chinese version: Final Practice 1
Task 1
- Import the pandas library in Python and use pd as an alias.
- Read the nba.csv file using pandas and store it in a DataFrame named df.
import pandas as pd  # hint: import ... as ...
df = pd.read_csv('nba.csv')  # hint: pd.read_csv('name.csv')
- Print the shape of df (number of rows and columns) and list all column names.
Answer the questions: how many persons are listed in df, how many types of information are recorded for each person, and what are they?
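A minimal sketch for trying these calls without the real file, assuming a tiny hypothetical CSV held in a string (the three players and their values are invented; only the column names echo this exercise). `pd.read_csv` accepts any file-like object, so `io.StringIO` stands in for `nba.csv`. The number of rows answers "how many persons" and the number of columns answers "how many types of information".

```python
import io

import pandas as pd

# Hypothetical sample data (invented values) using a subset of the exercise's column names
csv_text = """Name,Team,Age,Height,College,Salary
Player A,Boston Celtics,25,200,College X,1000000
Player B,Chicago Bulls,30,195,,--
Player C,Miami Heat,28,210,College Y,2500000
"""

# read_csv accepts any file-like object, so a StringIO stands in for nba.csv
sample = pd.read_csv(io.StringIO(csv_text))

n_rows, n_cols = sample.shape        # rows = persons, columns = types of information
columns = sample.columns.tolist()    # the names of those columns

print("persons:", n_rows)
print("types of information:", n_cols, columns)
```

The empty College field in the second row is read as a missing value (NaN), which is what the later `dropna` step works with.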
# hint: df.shape df.columns.tolist()
print("shape (rows, columns):", df.shape)
print("types of information:", df.columns.tolist())
- Print the Name and Age columns.
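A short sketch of the bracket rule behind this step, using a hypothetical two-player DataFrame: a single column label returns a Series, while a list of labels (double brackets) returns a DataFrame that keeps the columns in the listed order.

```python
import pandas as pd

# Hypothetical two-player table (invented values)
df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Age": [25, 30],
    "Team": ["Boston Celtics", "Chicago Bulls"],
})

age_series = df["Age"]            # one label -> a Series
name_age = df[["Name", "Age"]]    # a list of labels -> a DataFrame

print(type(age_series).__name__)  # Series
print(type(name_age).__name__)    # DataFrame
print(name_age)
```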
print(df[["Name", "Age"]])
- Print the data type of each column (e.g., int, float, string) and check the memory usage of df.
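A sketch on a small hypothetical DataFrame showing the two inspection calls. `df.memory_usage(deep=True)` returns bytes per column (plus the index), and `deep=True` also counts the text stored in string columns; `info(memory_usage="deep")` summarizes the same kind of measurement as one total. Exact byte counts depend on the pandas version and platform, so none are given as expected output.

```python
import pandas as pd

# Hypothetical players (invented values)
df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Age": [25, 30],
    "Salary": [1000000.0, 2500000.0],
})

print(df.dtypes)  # one dtype per column: integer Age, float Salary, text Name

# Bytes per column, with the index as its own entry; deep=True inspects the string contents
per_column = df.memory_usage(deep=True)
total_bytes = per_column.sum()

print(per_column)
print("total bytes:", total_bytes)
```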
# hint: df.dtypes
print("data types of columns:\n", df.dtypes)
print("memory usage:")
df.info(memory_usage="deep")
- Print all information of the first two persons (i.e., the first two rows).
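A sketch on a hypothetical four-row DataFrame comparing the two hinted calls with label-based `loc`: `iloc` slices by position and excludes the end position, while `loc` slices by label and includes the end label. With the default 0, 1, 2, ... index, `df.loc[0:1]` therefore returns the same two rows.

```python
import pandas as pd

# Hypothetical players with the default integer index 0..3
df = pd.DataFrame({"Name": ["A", "B", "C", "D"], "Age": [25, 30, 28, 22]})

first_two_head = df.head(2)
first_two_iloc = df.iloc[0:2]   # position slice: end position 2 is excluded
first_two_loc = df.loc[0:1]     # label slice: end label 1 is included

print(first_two_head.equals(first_two_iloc))  # True
print(first_two_head.equals(first_two_loc))   # True
```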
# hint: df.head(number) or df.iloc[index]
print("information of first two persons:\n", df.head(2))
# Or
print("information of first two persons:\n", df.iloc[0:2])
- Print the last five rows.
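A sketch with a hypothetical seven-row DataFrame: `tail(5)` and `iloc[-5:]` select the same rows and keep the original index labels. When the DataFrame has fewer than five rows, both return every row instead of raising an error.

```python
import pandas as pd

# Hypothetical players (invented values), index 0..6
df = pd.DataFrame({"Name": list("ABCDEFG"), "Age": [25, 30, 28, 22, 35, 27, 31]})

last_five_tail = df.tail(5)
last_five_iloc = df.iloc[-5:]   # a negative start counts from the end

print(last_five_tail.equals(last_five_iloc))  # True
print(last_five_tail.index.tolist())          # [2, 3, 4, 5, 6]

# With only three rows, both calls simply return all three
short = df.head(3)
print(len(short.tail(5)), len(short.iloc[-5:]))  # 3 3
```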
# hint: df.tail() or df.iloc[:]
print("the last five rows:\n", df.tail(5))
# Or
print("the last five rows:\n", df.iloc[-5:])
- Calculate the average height and maximum age of all players.
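A sketch of how `mean()` treats missing values, using hypothetical numeric heights with one gap: by default (`skipna=True`) NaN is ignored, while `skipna=False` propagates it. This assumes the Height column is numeric; if it were stored as text, it would need converting before `.mean()` could be used.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric heights with one missing value (invented values)
df = pd.DataFrame({"Height": [200.0, np.nan, 210.0], "Age": [25, 30, 28]})

average_height = df["Height"].mean()                 # NaN skipped: (200 + 210) / 2
average_with_nan = df["Height"].mean(skipna=False)   # NaN propagates instead
max_age = df["Age"].max()

print(average_height)    # 205.0
print(average_with_nan)  # nan
print(max_age)           # 30
```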
# hint: df["column_name"].mean(), df["column_name"].max()
average_height = df["Height"].mean()
max_age = df["Age"].max()
# Print the results
print("Average Height:", average_height)
print("Maximum Age:", max_age)
- Delete all player records with missing College information.
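A sketch on a hypothetical three-row DataFrame showing why the solution passes `subset`: without it, `dropna()` removes every row that has a missing value in any column, which here would also drop a player who has a College but no Salary.

```python
import numpy as np
import pandas as pd

# Hypothetical players (invented values): B lacks a College, C lacks a Salary
df = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "College": ["College X", np.nan, "College Y"],
    "Salary": [1000000.0, 2000000.0, np.nan],
})

only_college = df.dropna(subset=["College"])  # drops rows where College is missing
any_column = df.dropna()                      # drops rows with a gap in any column

print(only_college["Name"].tolist())  # ['A', 'C']
print(any_column["Name"].tolist())    # ['A']
```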
# hint: df.dropna()
df = df.dropna(subset=["College"])
- Handle missing values in the Salary column (represented as "--"): replace them with the average salary of all players.
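A sketch of the three cleaning steps on a hypothetical Salary column read as text, with `"--"` as the placeholder: replace the placeholder, convert to numbers, then fill the gaps with the mean. The mean is computed from the known salaries only, before the fill.

```python
import pandas as pd

# Hypothetical Salary column as read from a CSV: numbers as text plus "--" placeholders
df = pd.DataFrame({"Salary": ["1000000", "--", "3000000", "--"]})

df["Salary"] = df["Salary"].replace("--", pd.NA)
df["Salary"] = pd.to_numeric(df["Salary"], errors="coerce")  # float64, missing -> NaN

mean_salary = df["Salary"].mean()               # average of the known salaries only
df["Salary"] = df["Salary"].fillna(mean_salary)

print(mean_salary)            # 2000000.0
print(df["Salary"].tolist())  # [1000000.0, 2000000.0, 3000000.0, 2000000.0]
```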
# Replace "--" with pd.NA to prepare for numeric conversion
# pd.NA marks a missing value; the numeric conversion below turns it into NaN
df["Salary"] = df["Salary"].replace("--", pd.NA)
# Convert the Salary column to float; errors="coerce" also turns any other non-numeric text into NaN
df["Salary"] = pd.to_numeric(df["Salary"], errors="coerce")
# Calculate the average salary using df["column_name"].mean() (missing values are skipped)
mean_salary = df["Salary"].mean()
# Fill missing Salary values with the average, using df["column_name"].fillna(fill_value)
df["Salary"] = df["Salary"].fillna(mean_salary)
- For each team, calculate the average age, average height, and average salary of its players.
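Following the hint, the three separate `groupby` calls in the solution can be combined into one. A sketch with hypothetical players from two teams, where the result has one row per team and one column per averaged metric:

```python
import pandas as pd

# Hypothetical players (invented values)
df = pd.DataFrame({
    "Team": ["Boston Celtics", "Boston Celtics", "Miami Heat"],
    "Age": [25, 31, 28],
    "Height": [200.0, 204.0, 210.0],
    "Salary": [1000000.0, 3000000.0, 2000000.0],
})

# One groupby over a list of columns: rows = teams, columns = metrics
team_means = df.groupby("Team")[["Age", "Height", "Salary"]].mean()
print(team_means)

print(team_means.loc["Boston Celtics", "Age"])  # 28.0
print(team_means.loc["Miami Heat", "Salary"])   # 2000000.0
```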
# hint: df.groupby('column_name')[['column_name1', 'column_name2', 'column_name3']].mean()
print(df.groupby('Team')['Age'].mean())
print(df.groupby('Team')['Height'].mean())
print(df.groupby('Team')['Salary'].mean())
- Select the teams Boston Celtics, Chicago Bulls, Miami Heat, New York Knicks, and Washington Wizards. Create a bar chart to show the average age of players in each team.
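A sketch of the filtering step on hypothetical data, showing one detail that matters for the chart: `groupby` sorts the team names, so a hand-written label list only lines up with the grouped values when both happen to be in the same order. `reindex` forces the requested order; a requested team with no rows would appear as NaN.

```python
import pandas as pd

# Hypothetical players (invented values); Utah Jazz is not among the requested teams
df = pd.DataFrame({
    "Team": ["Washington Wizards", "Boston Celtics", "Utah Jazz", "Boston Celtics"],
    "Age": [30, 24, 27, 26],
})
teams_of_interest = ["Washington Wizards", "Boston Celtics"]

filtered = df[df["Team"].isin(teams_of_interest)]
avg_age = filtered.groupby("Team")["Age"].mean()  # index is sorted alphabetically
print(avg_age.index.tolist())    # ['Boston Celtics', 'Washington Wizards']

# Follow the requested order instead of the sorted one
avg_age_ordered = avg_age.reindex(teams_of_interest)
print(avg_age_ordered.tolist())  # [30.0, 25.0]
```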
import matplotlib.pyplot as plt
# Select target teams
teams_of_interest = [
"Boston Celtics", "Chicago Bulls", "Miami Heat",
"New York Knicks", "Washington Wizards"
]
# Filter the DataFrame
filtered_df = df[df["Team"].isin(teams_of_interest)]
# Group by team and calculate the average age
avg_age = filtered_df.groupby("Team")["Age"].mean()
print(avg_age)
# Plot the bar chart; use the grouped index as labels so each bar stays paired with its team
plt.figure()
plt.bar(avg_age.index, avg_age.values)
plt.title("Average Age of Players by Team")
plt.xlabel("Team")
plt.ylabel("Average Age")
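# plt.xticks(rotation=45)  # you could try this if the team names overlap
plt.show()
Attention! Complete all plotting operations within the same cell.
Otherwise, in Jupyter's default inline mode, a call such as the following, run in a later cell, is applied to a new empty figure instead of the bar chart:
plt.title("Average Age of Players by Team")
- Group all players by age and display the average, maximum, and minimum salary for each age group. Create a line chart to show the relationship between age and (average, max, min) salary. Be sure to set an appropriate title, axis labels, and legend.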
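The solution below uses `.agg(['mean', 'max', 'min'])`; pandas' named aggregation is an equivalent spelling that names each output column explicitly. A sketch on hypothetical ages and salaries:

```python
import pandas as pd

# Hypothetical players (invented values)
df = pd.DataFrame({
    "Age": [22, 22, 30, 30, 30],
    "Salary": [1.0e6, 3.0e6, 2.0e6, 4.0e6, 6.0e6],
})

# Named aggregation: output_column=(input_column, function)
age_salary_stats = (
    df.groupby("Age")
    .agg(mean=("Salary", "mean"), max=("Salary", "max"), min=("Salary", "min"))
    .reset_index()
)
print(age_salary_stats)
```

Each age becomes one row, and `reset_index()` turns Age back into an ordinary column so it can be passed to `plt.plot` as the x values.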
age_salary_stats = df.groupby('Age')['Salary'].agg(['mean', 'max', 'min']).reset_index()
# Print the resulting DataFrame
print("Average, Max, and Min Salary by Age:")
print(age_salary_stats)
# Create the line chart
plt.figure(figsize=(10, 5))
# Plot mean, max, and min salary by age
plt.plot(age_salary_stats['Age'], age_salary_stats['mean'], label='Average Salary', marker='o')
plt.plot(age_salary_stats['Age'], age_salary_stats['max'], label='Max Salary', marker='^')
plt.plot(age_salary_stats['Age'], age_salary_stats['min'], label='Min Salary', marker='s')
# Set chart title and axis labels
plt.title('Relationship between Age and Salary (Average / Max / Min)')
plt.xlabel('Age')
plt.ylabel('Salary (USD)')
# Add legend
plt.legend()
# Show the chart
plt.show()
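A sketch of the same line chart written against explicit `fig` and `ax` objects instead of the global `plt` state, using invented statistics shaped like `age_salary_stats`. Each call names the Axes it changes, which makes it harder to style the wrong figure by accident. The `Agg` backend and the in-memory PNG are assumptions so the sketch runs without a display; this style is an alternative, not something the exercise requires.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical per-age statistics (invented values) shaped like age_salary_stats
age_salary_stats = pd.DataFrame({
    "Age": [22, 25, 30],
    "mean": [2.0e6, 3.0e6, 4.0e6],
    "max": [3.0e6, 5.0e6, 6.0e6],
    "min": [1.0e6, 1.5e6, 2.0e6],
})

fig, ax = plt.subplots(figsize=(10, 5))
series_specs = [("mean", "Average Salary", "o"), ("max", "Max Salary", "^"), ("min", "Min Salary", "s")]
for column, label, marker in series_specs:
    ax.plot(age_salary_stats["Age"], age_salary_stats[column], label=label, marker=marker)

ax.set_title("Relationship between Age and Salary (Average / Max / Min)")
ax.set_xlabel("Age")
ax.set_ylabel("Salary (USD)")
ax.legend()

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # render to memory instead of a file
plt.close(fig)
```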