Q&A 16 How do you group and summarize data in Python and R?

16.1 Explanation

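Grouping data lets you compare subsets, such as the average petal length of each species. The pattern is often described as split-apply-combine: split the rows by a categorical column, apply a summary function (a mean, count, or maximum) to each group, and combine the results into one table with one row per group.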
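To make the three steps concrete before turning to the library versions, here is a minimal sketch in plain Python with no pandas. The six `(species, petal_length)` rows are a hypothetical sample, not the real iris data, and the names `rows`, `groups`, and `summary` are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows (species, petal_length); not the full iris dataset.
rows = [
    ("setosa", 1.4),
    ("setosa", 1.5),
    ("versicolor", 4.7),
    ("versicolor", 4.5),
    ("virginica", 6.0),
    ("virginica", 5.1),
]

# Split: collect the petal lengths into one list per species.
groups = defaultdict(list)
for species, petal_length in rows:
    groups[species].append(petal_length)

# Apply + combine: reduce each group to its mean, one entry per species,
# with species in sorted order (as pandas groupby does by default).
summary = {species: mean(values) for species, values in sorted(groups.items())}

for species, avg in summary.items():
    print(f"{species:<12}{avg:.2f}")
```

`df.groupby(...)` in pandas and `group_by() %>% summarise()` in dplyr perform these same steps for you, with far less code and much faster on large tables.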


16.2 Python Code

import pandas as pd

# Load dataset
df = pd.read_csv("data/iris.csv")

# Group by species and compute average petal length
grouped = df.groupby("species")["petal_length"].mean().reset_index()

print(grouped)
      species  petal_length
0      setosa         1.462
1  versicolor         4.260
2   virginica         5.552
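A single mean is often not enough. pandas' named aggregation (available since pandas 0.25) computes several summaries per group in one call and lets you choose the output column names, much like naming results inside dplyr's `summarise()`. The sketch below uses a small hypothetical DataFrame with the same column names as `data/iris.csv` so that it runs without the file; the output names `avg_petal_length`, `max_petal_length`, and `n` are illustrative choices.

```python
import pandas as pd

# Small hypothetical sample standing in for data/iris.csv (same column names).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
    "petal_length": [1.4, 1.5, 4.7, 4.5, 6.0, 5.1],
})

# Named aggregation: each keyword becomes an output column,
# defined as (input column, aggregation function name).
summary = (
    df.groupby("species")
      .agg(
          avg_petal_length=("petal_length", "mean"),
          max_petal_length=("petal_length", "max"),
          n=("petal_length", "count"),  # number of non-missing values per group
      )
      .reset_index()
)

print(summary)
```

With the real file loaded, the same `.agg(...)` call applies unchanged to the `df` from the code above.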

16.3 R Code

library(readr)
library(dplyr)

# Load dataset
df <- read_csv("data/iris.csv")

# Group by species and summarize
df_summary <- df %>%
  group_by(species) %>%
  summarise(avg_petal_length = mean(petal_length, na.rm = TRUE))

df_summary
# A tibble: 3 × 2
  species    avg_petal_length
  <chr>                 <dbl>
1 setosa                 1.46
2 versicolor             4.26
3 virginica              5.55
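The two versions handle missing values differently. The R code passes `na.rm = TRUE` because R's `mean()` returns `NA` when any value is missing, whereas pandas' `mean()` skips `NaN` by default. The sketch below uses a hypothetical four-row sample with one missing petal length to show pandas' default, one way to get R-style propagation of missing values, and the difference between `count()` and `size()`.

```python
import pandas as pd

# Hypothetical sample with one missing petal length.
df = pd.DataFrame({
    "species": ["setosa", "setosa", "virginica", "virginica"],
    "petal_length": [1.4, float("nan"), 6.0, 5.1],
})

grouped = df.groupby("species")["petal_length"]

# pandas skips NaN by default, like R's mean(x, na.rm = TRUE).
skipped = grouped.mean()

# R's default (na.rm = FALSE) lets a missing value propagate;
# in pandas, call Series.mean with skipna=False on each group.
propagated = grouped.apply(lambda s: s.mean(skipna=False))

# count() counts non-missing values; size() counts all rows in the group.
n_present = grouped.count()
n_rows = grouped.size()

print(skipped)
print(propagated)
print(n_present)
print(n_rows)
```

Checking `count()` against `size()` per group is a quick way to see whether a summary silently dropped missing values.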

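✅ Grouping and summarizing reduce each category to a few comparable numbers, which makes differences between categories easy to see: here, virginica petals are on average about 3.8 times as long as setosa petals.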