Part 5: Updating Rows and Columns¶
This notebook shows how to modify DataFrames: renaming columns, updating values, applying transformations, and adding or removing rows. Changing labels and values is how we clean and enrich data; the patterns below cover renaming, mass updates, safe assignment, and applied transformations.
Core goals:
Rename columns globally or partially.
Update specific rows/cells without chained-assignment issues.
Apply string and custom functions to Series/DataFrames.
Use map and replace for value recoding.
Combine and split columns.
Append and remove rows in controlled ways.
1. Renaming columns in df_small¶
Clean, consistent column names improve readability and reduce downstream bugs.
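Besides the list-assignment and dictionary patterns shown below, .rename also accepts a callable mapper that is applied to every label. A minimal sketch on a throwaway DataFrame (the toy data here is invented):

```python
import pandas as pd

# .rename can take a function as the column mapper; pandas calls it on each label
toy = pd.DataFrame({"First Name": ["Ada"], "Last Name": ["Lovelace"]})
toy = toy.rename(columns=lambda c: c.lower().replace(" ", "_"))
print(list(toy.columns))  # ['first_name', 'last_name']
```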
import pandas as pd
people = {
"first": ["Alice", "Bob", "Carol", "Sarah"],
"last": ["Smith", "Jones", "Lee", "Jones"],
"email": [
"alice@example.com",
"bob@example.com",
"carol@example.com",
"sarah@example.com",
],
"uid": ["AS100293", "BJ240806", "CL150510", "SJ251203"],
"year": [1993, 2006, 2010, 2003],
"age": [32, 19, 15, 22],
}
df_small_orig = pd.DataFrame(people)
# Create a copy that we can freely modify without touching the original DataFrame
df_small = df_small_orig.copy()
# 1a. Directly replace all column names
df_small.columns = ["first_name", "last_name", "email_address", "id_code", "byear", "age_2025"]
print("After assigning new column list:")
df_small.columns
# 1b. Uppercase all column names via list comprehension
df_small.columns = [col.upper() for col in df_small.columns]
print("\nAfter uppercasing column names:")
df_small.columns
# 1c. Modify names more flexibly: replace underscores with spaces
# (the uppercased names still contain underscores, e.g. FIRST_NAME -> FIRST NAME)
df_small.columns = [col.replace("_", " ") for col in df_small.columns]
print("\nAfter replacing underscores with spaces:")
df_small.columns
# 1d. Rename only some columns (partial) using .rename
df_small = df_small.rename(columns={"FIRST NAME": "first", "LAST NAME": "last"})
print("\nAfter selective rename:")
df_small.columns
2. Updating row values safely¶
These methods perform explicit assignment and avoid ambiguous chained operations.
# Reload the original dataframe
df_small = df_small_orig.copy()
# Update entire row 1 (zero-based index) with new data using .loc
df_small.loc[1] = ["Robert", "Jones", "robert@example.com", "RJ010100", 2000, 25]
print("After replacing all of row 1:")
df_small
# Update only some columns in a row
df_small.loc[2, ["email"]] = ["carol.modified@example.com"]
print("\nAfter updating only email for row 2:")
df_small
# Update a single value using .at
df_small.at[0, "last"] = "Smythe"
print("\nAfter changing Alice's last name via .at:")
df_small
Warning:
Avoid chained assignment like this (can produce SettingWithCopyWarning and unreliable behavior):
mask = df_small["last"] == "Jones"
# This is unsafe / may not persist:
df_small[mask]["last"] = "J."Correct:
df_small.loc[mask, "last"] = "J."3. String methods on Series¶
df["email"].str.lower() applies lowercase conversion elementwise across the entire column.
Vectorized .str operations are concise, readable, and usually faster than manual loops.
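The .str accessor covers far more than case conversion. A quick sketch of a few commonly used methods on a toy Series (the addresses are invented, the method names are standard pandas):

```python
import pandas as pd

emails = pd.Series(["Alice@Example.com", "bob@example.com"])
print(emails.str.lower())                        # elementwise lowercase
print(emails.str.contains("alice", case=False))  # boolean mask per element
print(emails.str.split("@").str[1])              # part after the '@' (the domain)
```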
# Uppercase/lowercase transformations safely
df_small["email"] = df_small["email"].str.lower() # vectorized lowercase
print("\nEmails after lowercasing:")
df_small["email"]4. apply / applymap / map¶
Where to use each:
Series.apply(func): apply a function to each value in one column.
DataFrame.apply(func): apply a function to each column (or row) as a Series.
DataFrame.applymap(func): apply a function to every cell (elementwise).
Series.map(mapping_or_func): map values using a dictionary or function.
These tools are useful because they handle custom transformations when built-in vectorized methods are not enough.
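One variant worth knowing on top of the examples below: DataFrame.apply with axis=1 passes each row (as a Series) to the function, which is handy when a computation needs several columns at once. A sketch on a toy frame:

```python
import pandas as pd

toy = pd.DataFrame({"first": ["Alice", "Bob"], "last": ["Smith", "Jones"]})
# axis=1 -> the function receives one row (a Series) at a time
toy["initials"] = toy.apply(lambda row: row["first"][0] + row["last"][0], axis=1)
print(toy["initials"].tolist())  # ['AS', 'BJ']
```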
# apply on a Series: length of each email
email_lengths = df_small["email"].apply(len)
print("\nLength of each email (Series.apply):")
email_lengths
# apply with a user-defined function
def shout(email):
    return email.upper()
print("\nEmail shouted via apply:")
df_small["email"].apply(shout)
# with lambda
print("\nEmail first 5 chars using lambda:")
df_small["email"].apply(lambda x: x[:5])# Difference: df.apply(len) applies to each column (Series) -> gives number of non-null entries per column
print("\nUsing df.apply(len):")
df_small.apply(len)
# applymap applies to every element of the DataFrame
# (in pandas >= 2.1, applymap is deprecated in favor of DataFrame.map)
print("\nApplying str.lower to every string cell (applymap):")
df_small.applymap(lambda x: x.lower() if isinstance(x, str) else x)
5. map (elementwise with a mapping dict)¶
map vs replace for recoding values
map({...}) on a Series recodes matching values; unmatched values become NaN.
replace({...}) changes only matched values and keeps unmatched values unchanged.
Choose map when you want strict remapping, replace when you want partial substitution without introducing NaN.
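The difference is easy to see side by side on a tiny Series (the values here are invented for the demo):

```python
import pandas as pd

s = pd.Series(["Alice", "Bob", "Carol"])
codes = {"Alice": "A.", "Carol": "C."}
# map: strict remapping -- 'Bob' has no entry, so it becomes NaN
print(s.map(codes).tolist())      # ['A.', nan, 'C.']
# replace: partial substitution -- 'Bob' is kept unchanged
print(s.replace(codes).tolist())  # ['A.', 'Bob', 'C.']
```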
# Suppose we want to abbreviate first names
df_small["first_abbrev"] = df_small["first"].map({"Alice": "A.", "Robert": "R.", "Carol": "C."})
print("\nAfter mapping first names to abbreviations:")
df_small[["first", "first_abbrev"]]# Note: unmapped values become NaN; to avoid that, you can fill or use replace
df_small["first_abbrev_fallback"] = df_small["first"].replace({"Alice": "A.", "Robert": "R.", "Carol": "C."})
print("\nUsing replace (no NaNs for unmatched since exact replacements):")
df_small[["first", "first_abbrev_fallback"]]6. Big DataFrame: rename column and use map¶
This section demonstrates two practical cleaning steps on a large dataset:
Renaming CompTotal to Salary for clearer semantics.
Mapping country names to short codes (US, CH, DE, ...).
These operations make analysis outputs easier to read and standardize.
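A common follow-up when mapping to codes: fall back to the original value for countries not in the dictionary, instead of leaving NaN. A hedged sketch on invented data (fillna with a Series aligns on the index):

```python
import pandas as pd

countries = pd.Series(["Germany", "France", "India"])
abbrev = {"Germany": "DE", "India": "IN"}
# map() leaves 'France' as NaN; fillna() restores the original value there
codes = countries.map(abbrev).fillna(countries)
print(codes.tolist())  # ['DE', 'France', 'IN']
```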
df = pd.read_csv("data/survey_results_public.csv")
# Rename 'CompTotal' to 'Salary'
df = df.rename(columns={"CompTotal": "Salary"})
print("Columns after renaming compensation:")
print([c for c in df.columns if "Salary" in c])
# Example map usage: create a simplified label for some countries
country_abbrev = {
"United States of America": "US",
"Switzerland": "CH",
"Germany": "DE",
"India": "IN",
"Canada": "CA",
}
df["country_code"] = df["Country"].map(country_abbrev)
print("\nSample country codes:")
df.loc[df["country_code"].notna(), ["Country", "country_code"]].head()7. Combine first & last into a full name, then split back¶
Combine columns: df["full_name"] = df["first"] + " " + df["last"]
Remove no-longer-needed columns: drop(columns=[...])
Split back with: .str.split(" ", expand=True)
Feature engineering often requires building intermediate text fields and then re-normalizing structure.
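One caveat with the split step: names containing more than one space produce extra columns. Limiting the number of splits with n=1 guarantees exactly two parts. A sketch with invented names:

```python
import pandas as pd

full = pd.Series(["Diana Prince", "Ludwig van Beethoven"])
# n=1 -> split only at the first space, so the result always has two columns
parts = full.str.split(" ", n=1, expand=True)
print(parts[0].tolist())  # ['Diana', 'Ludwig']
print(parts[1].tolist())  # ['Prince', 'van Beethoven']
```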
# Combine into full_name
df_small["full_name"] = df_small["first"] + " " + df_small["last"]
print("After combining:")
df_small
# Drop the original first & last columns
df_small.drop(columns=["first", "last"], inplace=True)
print("\nAfter dropping first & last:")
df_small
# Split full_name back into first & last
df_small[["first", "last"]] = df_small["full_name"].str.split(" ", expand=True)
print("\nAfter splitting full_name:")
df_small
8. Appending rows & the ignore_index=True quirk¶
Two common patterns:
pd.concat([df, new_df], ignore_index=True) to append one or many rows.
df.loc[new_index] = {...} for quick insertion of a single row.
Why ignore_index=True matters: it rebuilds a clean continuous index after concatenation.
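Without ignore_index=True, the concatenated pieces keep their original labels, so the result can contain duplicate index values. A minimal sketch:

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})  # index 0, 1
b = pd.DataFrame({"x": [3]})     # index 0 again
dup = pd.concat([a, b])                       # index: 0, 1, 0 (duplicates!)
clean = pd.concat([a, b], ignore_index=True)  # index: 0, 1, 2
print(dup.index.tolist(), clean.index.tolist())
```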
df_small = df_small_orig.copy()
# Add a new row using pd.concat
new_row = pd.DataFrame([{"first": "Tony"}])
# Method 1: Concatenate, resetting the index
df_small = pd.concat([df_small, new_row], ignore_index=True)
df_small
# Method 2: Using .loc to append a new row
new_idx = len(df_small)
df_small.loc[new_idx] = {"first": "Tony"}
df_small
9. Appending another DataFrame¶
df_small = df_small_orig.copy()
# Create a smaller DataFrame with same columns
new_people = pd.DataFrame({
"first": ["Diana", "Eli"],
"last": ["Prince", "Musk"],
"email": ["diana@themiscira.com", "eli@space.com"],
"full_name": ["Diana Prince", "Eli Musk"],
})
# Append new_people to df_small (resetting index for a clean result)
df_small = pd.concat([df_small, new_people], ignore_index=True)
print("\nCombined DataFrame:")
df_small
10. Removing rows by index and by condition¶
drop(index=...) removes rows by index label.
Combine with a mask to drop conditionally, e.g. rows where last == "Lee".
Selective removal is a key cleanup step before statistics or model training.
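Equivalent to drop-by-index, you can simply keep the rows you want with a negated boolean mask; many people find this more direct. A sketch on toy data:

```python
import pandas as pd

toy = pd.DataFrame({"last": ["Smith", "Lee", "Jones"]})
# Keep everything that is NOT 'Lee' -- same result as drop(index=...)
no_lee = toy[toy["last"] != "Lee"].reset_index(drop=True)
print(no_lee["last"].tolist())  # ['Smith', 'Jones']
```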
# Remove a row by its index label, e.g., drop index 1
df_small = df_small.drop(index=1)
print("\nAfter dropping index=1:")
df_small
# Remove rows where last name == "Lee"
mask_lee = df_small["last"] == "Lee"
df_no_lee = df_small.drop(index=df_small[mask_lee].index)
print("\nAfter dropping rows where last == 'Lee':")
df_no_lee
Exercise for Part 5¶
Exercise 5.1¶
Transform every column name so that it starts with an uppercase letter (e.g. first → First, email → Email).
Replace each first name and last name with the first letter followed by a dot using apply().
Extract the domain (after the @) from each email.
Exercise 5.2¶
Create an Adult column to be True when the numeric age is > 18, and False otherwise (including NaN ages).
Solutions¶
#### YOUR CODE HERE ####