Q&A 15 How do you convert variable types in Python and R?

15.1 Explanation

Columns are sometimes read in with the wrong type: for example, a numeric column stored as text, or a categorical variable stored as plain strings. The wrong type can change how grouping, plotting, and modeling treat that column.

In this example, we’ll convert:

  • The species column to a categorical variable
  • A numeric column to string (for labeling)

15.2 Python Code

import pandas as pd

# Load dataset
df = pd.read_csv("data/iris.csv")

# Convert species to categorical
df["species"] = df["species"].astype("category")

# Convert sepal_length to string (optional use case)
df["sepal_length_str"] = df["sepal_length"].astype(str)

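# Confirm types (head() limits the output to the first five columns,
# so the new sepal_length_str column is not listed below)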
print(df.dtypes.head())
sepal_length     float64
sepal_width      float64
petal_length     float64
petal_width      float64
species         category
dtype: object
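
The explanation also mentions the opposite problem, a numeric column stored as text. A minimal sketch of that conversion follows, assuming a small made-up data frame rather than the iris file; the `raw` frame, its "n/a" entry, and the `sepal_length_num` column name are illustrative assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical data: measurements that arrived as text, with one unparseable entry
raw = pd.DataFrame({"sepal_length": ["5.1", "4.9", "n/a", "4.6"]})

# errors="coerce" turns values that cannot be parsed into NaN instead of raising
raw["sepal_length_num"] = pd.to_numeric(raw["sepal_length"], errors="coerce")

# Check which columns pandas now treats as numeric
text_is_numeric = pd.api.types.is_numeric_dtype(raw["sepal_length"])
converted_is_numeric = pd.api.types.is_numeric_dtype(raw["sepal_length_num"])

# Count the entries that could not be converted
n_unparsed = int(raw["sepal_length_num"].isna().sum())

print(text_is_numeric, converted_is_numeric, n_unparsed)
```

Here the text column is not numeric, the converted column is, and one entry could not be parsed. Counting the resulting NaN values after a coerced conversion is one way to notice entries that were silently dropped.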

15.3 R Code

library(readr)
library(dplyr)

# Load dataset
df <- read_csv("data/iris.csv")

# Convert species to factor
df <- df %>%
  mutate(species = as.factor(species))

# Convert sepal_length to character
df <- df %>%
  mutate(sepal_length_str = as.character(sepal_length))

# Confirm structure
str(df)
tibble [150 × 6] (S3: tbl_df/tbl/data.frame)
 $ sepal_length    : num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ sepal_width     : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ petal_length    : num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ petal_width     : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ species         : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ sepal_length_str: chr [1:150] "5.1" "4.9" "4.7" "4.6" ...

✅ Converting each column to the right type makes grouping, plotting, and modeling treat it as intended.
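
As one illustration of how a column's type changes its behavior, the sketch below compares sorting plain strings with sorting an ordered categorical. The `df_demo` frame, its `size` column, and the category order are hypothetical and chosen only for this example; the iris species have no natural order, so this may not apply to them directly.

```python
import pandas as pd

# Hypothetical size labels; the column name and values are illustrative only
df_demo = pd.DataFrame({"size": ["medium", "small", "large", "small"]})

# As plain strings, sorting is alphabetical
alphabetical = sorted(df_demo["size"].unique())

# As an ordered categorical, sorting follows the declared category order
size_type = pd.CategoricalDtype(categories=["small", "medium", "large"], ordered=True)
df_demo["size"] = df_demo["size"].astype(size_type)
by_category = df_demo["size"].sort_values().tolist()

print(alphabetical)  # ['large', 'medium', 'small']
print(by_category)   # ['small', 'small', 'medium', 'large']
```

Plot axes and grouped summaries generally follow the same category order, which is often the practical reason to convert a string column before plotting.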