Q&A 18 How do you subset specific columns in Python and R?

18.1 Explanation

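You may want to work with only a few columns at a time, for example when plotting, inspecting values, or fitting a model. Selecting just the variables you need reduces clutter in printed output and makes it explicit which columns feed into the next step.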
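In pandas, the shape of the brackets matters. A single column name returns a one-dimensional Series, while a list of names (even a one-element list) returns a DataFrame whose columns follow the order of the list. The minimal sketch below uses a tiny hand-built stand-in for the iris data rather than `data/iris.csv`, so it runs on its own.

```python
import pandas as pd

# Hand-built stand-in for the iris data (first three rows only)
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
    "petal_length": [1.4, 1.4, 1.3],
    "species": ["setosa", "setosa", "setosa"],
})

# A single label returns a one-dimensional Series
one_series = df["species"]

# A list of labels, even with one element, returns a DataFrame
one_frame = df[["species"]]

# Result columns follow the order of the list, not the original frame
reordered = df[["species", "sepal_length"]]

print(type(one_series).__name__)  # Series
print(type(one_frame).__name__)   # DataFrame
print(list(reordered.columns))    # ['species', 'sepal_length']
```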


18.2 Python Code

import pandas as pd

# Load dataset
df = pd.read_csv("data/iris.csv")

# Select specific columns
subset = df[["sepal_length", "sepal_width", "species"]]

print(subset.head())
   sepal_length  sepal_width species
0           5.1          3.5  setosa
1           4.9          3.0  setosa
2           4.7          3.2  setosa
3           4.6          3.1  setosa
4           5.0          3.6  setosa
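The same selection can be written a few other ways in pandas. The sketch below, again on a small hand-built frame with the four iris measurement columns assumed, shows `.loc` with an explicit column list, `drop(columns=...)` for keeping everything except the named columns, and what happens when a listed name is misspelled.

```python
import pandas as pd

# Hand-built stand-in with all five iris columns
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "petal_length": [1.4, 1.4],
    "petal_width": [0.2, 0.2],
    "species": ["setosa", "setosa"],
})

# .loc[rows, columns]: ':' keeps all rows, the list picks the columns
via_loc = df.loc[:, ["sepal_length", "sepal_width", "species"]]

# Keep everything except the columns you name
via_drop = df.drop(columns=["petal_length", "petal_width"])

# A misspelled name raises KeyError instead of silently returning fewer columns
caught = False
try:
    df[["sepal_length", "sepal_lenght"]]
except KeyError as err:
    caught = True
    print("KeyError:", err)

print(via_loc.equals(via_drop))  # True
```

Choosing between a keep-list and a drop-list is mostly about which one is shorter and less likely to go stale when new columns are added.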

18.3 R Code

library(readr)
library(dplyr)

# Load dataset
df <- read_csv("data/iris.csv")

# Select specific columns
subset <- df %>%
  select(sepal_length, sepal_width, species)

head(subset)
# A tibble: 6 × 3
  sepal_length sepal_width species
         <dbl>       <dbl> <chr>  
1          5.1         3.5 setosa 
2          4.9         3   setosa 
3          4.7         3.2 setosa 
4          4.6         3.1 setosa 
5          5           3.6 setosa 
6          5.4         3.9 setosa 
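Besides bare column names, `select()` accepts tidyselect helpers such as `starts_with("sepal")`, `contains("sepal")`, or `where(is.numeric)`. A rough pandas counterpart, sketched below on a small hand-built frame, uses `DataFrame.filter` for name patterns and `select_dtypes` for column types. These are analogues rather than exact translations of tidyselect semantics.

```python
import pandas as pd

# Hand-built stand-in with all five iris columns
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "petal_length": [1.4, 1.4],
    "petal_width": [0.2, 0.2],
    "species": ["setosa", "setosa"],
})

# Column names containing a substring (similar to dplyr's contains("sepal"))
by_substring = df.filter(like="sepal")

# Column names matching a regular expression (similar to starts_with("petal"))
by_regex = df.filter(regex="^petal")

# Columns selected by dtype (similar to select(where(is.numeric)))
numeric_only = df.select_dtypes(include="number")

print(list(by_substring.columns))  # ['sepal_length', 'sepal_width']
print(list(by_regex.columns))      # ['petal_length', 'petal_width']
print(list(numeric_only.columns))
```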

✅ Subsetting lets you focus your analysis on the most relevant columns.