Q&A 16 How do you group and summarize data in Python and R?
16.1 Explanation
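Grouping data lets you compare subsets, such as computing the average petal length for each species. You split the rows by a categorical column, apply a summary function (mean, count, max) to each group, and get back one row per group, which makes differences between categories easy to see.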
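To make the grouping idea concrete, here is a minimal sketch that computes a per-group mean twice: once by looping over the groups by hand, and once with a single `groupby` call. The `toy` table and its values are hypothetical stand-ins, not rows from the iris file used in this section.

```python
import pandas as pd

# Hypothetical toy data (an assumption for illustration, not the iris file)
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, 1.6, 5.0, 6.0],
})

# By hand: iterate over (group name, sub-table) pairs and average each one
manual = {
    name: float(subset["petal_length"].mean())
    for name, subset in toy.groupby("species")
}

# The same computation expressed as one groupby call
auto = toy.groupby("species")["petal_length"].mean()

print(manual)
print(auto.to_dict())
```

Both approaches should give the same per-species averages; the `groupby` form is shorter and is what you would normally write.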
16.2 Python Code
import pandas as pd
# Load dataset
df = pd.read_csv("data/iris.csv")
# Group by species and compute average petal length
grouped = df.groupby("species")["petal_length"].mean().reset_index()
print(grouped)
      species  petal_length
0      setosa         1.462
1  versicolor         4.260
2   virginica         5.552
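If you want more than one summary per group, pandas supports named aggregation through `agg`, which reads much like `summarise()` in dplyr. The sketch below is an illustration only: the `toy` table is hypothetical, and the output column names (`avg_petal_length`, `max_petal_length`, `n_obs`) are chosen here for the example.

```python
import pandas as pd

# Hypothetical toy data (an assumption for illustration, not the iris file)
toy = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, 1.6, 5.0, 6.0],
})

# Named aggregation: new_column=(source_column, function_name)
summary = (
    toy.groupby("species")
    .agg(
        avg_petal_length=("petal_length", "mean"),
        max_petal_length=("petal_length", "max"),
        n_obs=("petal_length", "count"),  # counts non-missing values
    )
    .reset_index()
)

print(summary)
```

Each keyword argument becomes one output column, so the result has one row per species and one column per summary.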
16.3 R Code
library(readr)
library(dplyr)
# Load dataset
df <- read_csv("data/iris.csv")
# Group by species and summarize
df_summary <- df %>%
  group_by(species) %>%
  summarise(avg_petal_length = mean(petal_length, na.rm = TRUE))
df_summary
# A tibble: 3 × 2
  species    avg_petal_length
  <chr>                 <dbl>
1 setosa                 1.46
2 versicolor             4.26
3 virginica              5.55
✅ Grouping and summarizing help uncover relationships across categories in your data.