Q&A 19 How do you sample rows randomly in Python and R?

19.1 Explanation

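Random sampling is useful when you work with large datasets, run quick sanity checks, or create training/test splits. You can draw a fixed number of rows or a fraction of them. Setting a random seed makes the sample reproducible, so you get the same rows every time you rerun the code.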
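For a training/test split, one common pandas pattern is to sample a fraction of the rows for training and keep the remaining rows for testing. The sketch below is illustrative only: the small DataFrame and the 80/20 ratio are assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset
df = pd.DataFrame({"x": range(10), "label": ["a", "b"] * 5})

# Sample 80% of the rows for training; random_state makes the split reproducible
train = df.sample(frac=0.8, random_state=42)

# Every row not sampled for training goes to the test set
test = df.drop(train.index)

print(len(train), len(test))  # 8 2
```

This relies on the index being unique (true for the default RangeIndex), because `drop()` removes rows by index label.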


19.2 Python Code

import pandas as pd

# Load dataset
df = pd.read_csv("data/iris.csv")

# Draw 5 random rows; random_state fixes the seed so the sample is reproducible
sampled = df.sample(n=5, random_state=42)

print(sampled)
     sepal_length  sepal_width  petal_length  petal_width     species
73            6.1          2.8           4.7          1.2  versicolor
18            5.7          3.8           1.7          0.3      setosa
118           7.7          2.6           6.9          2.3   virginica
78            6.0          2.9           4.5          1.5  versicolor
76            6.8          2.8           4.8          1.4  versicolor
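Besides a fixed `n`, `DataFrame.sample()` accepts `frac=` for a proportion of rows and `replace=True` for sampling with replacement. Since pandas 1.1, `groupby(...).sample()` draws the same number of rows from each group, a simple form of stratified sampling. The sketch below uses a small made-up DataFrame (an assumption) instead of `data/iris.csv` so it runs on its own.

```python
import pandas as pd

# Hypothetical toy data: 3 species with 4 rows each
df = pd.DataFrame({
    "species": ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4,
    "petal_length": [1.4, 1.3, 1.5, 1.7, 4.7, 4.5, 4.9, 4.0, 6.0, 5.1, 5.9, 5.6],
})

# 25% of all rows: round(0.25 * 12) = 3 rows
by_fraction = df.sample(frac=0.25, random_state=42)

# 20 rows drawn with replacement, so the same row can appear more than once
bootstrap = df.sample(n=20, replace=True, random_state=42)

# 2 rows from every species (stratified sampling by group)
per_species = df.groupby("species").sample(n=2, random_state=42)

print(per_species["species"].value_counts())
```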

19.3 R Code

library(readr)
library(dplyr)

# Load dataset
df <- read_csv("data/iris.csv")

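# Draw 5 random rows; set.seed() makes the sample reproducible
set.seed(42)
sampled <- df %>%
  slice_sample(n = 5)  # slice_sample() supersedes sample_n() in dplyr >= 1.0.0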

sampled
# A tibble: 5 × 5
  sepal_length sepal_width petal_length petal_width species   
         <dbl>       <dbl>        <dbl>       <dbl> <chr>     
1          5.3         3.7          1.5         0.2 setosa    
2          5.6         2.9          3.6         1.3 versicolor
3          6.1         2.8          4.7         1.2 versicolor
4          6.7         3            5.2         2.3 virginica 
5          5.6         2.8          4.9         2   virginica 

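✅ Sampling lets you explore or test your data on a small, reproducible subset instead of the full dataset. Note that both examples above still read the entire file into memory first; sampling reduces how much data you work with afterwards, not how much you load.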
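When a file is too large to read in full, pandas can sample at parse time: `read_csv()` accepts a callable for `skiprows` that is evaluated against each row index and returns `True` if that row should be skipped. A minimal sketch, assuming a keep-probability of about 10% and using an in-memory CSV (`io.StringIO`) in place of a real file on disk. The result is a random subset whose size varies around the target rather than an exact `n`.

```python
import io
import random

import pandas as pd

# Hypothetical in-memory CSV standing in for a large file on disk
csv_text = "id,value\n" + "\n".join(f"{i},{i * 2}" for i in range(1000))

keep_prob = 0.1          # assumed fraction of data rows to keep
rng = random.Random(42)  # seeded so the sample is reproducible

def skip_row(i: int) -> bool:
    # Row 0 is the header: always keep it; skip each data row with probability 1 - keep_prob
    return i > 0 and rng.random() > keep_prob

sample = pd.read_csv(io.StringIO(csv_text), skiprows=skip_row)
print(len(sample))  # about keep_prob * 1000 rows, not an exact count
```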