Source Of The Section

Sec1_DM_ANU_Spring2025.pdf

Data Preprocessing

Goals of Data Preprocessing:

Stages of Data Preprocessing:

  1. Data Cleaning
  2. Data Integration
  3. Data Transformation
  4. Data Reduction
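
As a rough sketch of how the four stages can look in pandas, the block below runs each one on two tiny tables. The tables (customers, regions), their values, and the specific choices (median imputation, left join, min-max scaling, column selection) are assumptions made for illustration; they are not taken from the course material, and each stage has many other valid techniques.

```python
import pandas as pd

# Hypothetical miniature tables, invented for illustration only
customers = pd.DataFrame({
    "CustomerID": [1, 2, 2, 3],
    "Age": [30.0, 40.0, 40.0, None],
    "Income": [75000, 40000, 40000, 50000],
})
regions = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Region": ["North", "South", "East"],
})

# 1. Data cleaning: drop exact duplicate rows, then fill the missing Age with the median
cleaned = customers.drop_duplicates().copy()
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# 2. Data integration: bring in a second source through the shared key
integrated = cleaned.merge(regions, on="CustomerID", how="left")

# 3. Data transformation: min-max scale Income to the range [0, 1]
income = integrated["Income"]
integrated["IncomeScaled"] = (income - income.min()) / (income.max() - income.min())

# 4. Data reduction: keep only the columns needed downstream
reduced = integrated[["CustomerID", "Age", "IncomeScaled", "Region"]]
print(reduced)
```

The order matters here: removing the duplicate row before imputing keeps the repeated record from skewing the median, and scaling after integration ensures the minimum and maximum are computed on the final set of rows.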

Creating Data

import pandas as pd

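# Define the sample data as a dictionary mapping column names to lists of values.
# Note what the data contains: missing values (None), a repeated row (CustomerID 1005),
# one very large Income (10000000), and mixed-case MaritalStatus codes ("s" vs "S").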
data_dict = {
    "CustomerID": [1001, 1002, 1003, 1004, 1005, 1005, 1006],
    "Gender": ["M", "F", None, "M", "F", "F", "F"],
    "Income": [75000, 40000, 10000000, 50000, 99999, 99999, 45000],
    "Age": [30, 40, 45, 20, 30, 30, None],
    "MaritalStatus": ["M", "W", "s", "S", "D", "D", "M"],
    "Transaction Amount": [5000, 4000, 7000, None, 3000, 3000, 1000],
    "Date": ["12/1/2020", "12/2/2020", "12/3/2020", "12/4/2020", "12/5/2020", "12/5/2020", "12/6/2020"]
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data_dict)

# Display the DataFrame (a notebook renders the last expression of a cell;
# in a plain script, use print(df) instead)
df

Exploratory Data Analysis

# Check dataset dimensions, returned as a (rows, columns) tuple
df.shape
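
Beyond the shape, a few more checks usually help decide what cleaning is needed. The sketch below is one possible set of first-pass checks, not necessarily the sequence the course follows; it rebuilds the same sample data from "Creating Data" so that it runs on its own, and the variable names missing_per_column and duplicate_rows are introduced here for illustration.

```python
import pandas as pd

# Same sample data as in "Creating Data", repeated so this block is self-contained
df = pd.DataFrame({
    "CustomerID": [1001, 1002, 1003, 1004, 1005, 1005, 1006],
    "Gender": ["M", "F", None, "M", "F", "F", "F"],
    "Income": [75000, 40000, 10000000, 50000, 99999, 99999, 45000],
    "Age": [30, 40, 45, 20, 30, 30, None],
    "MaritalStatus": ["M", "W", "s", "S", "D", "D", "M"],
    "Transaction Amount": [5000, 4000, 7000, None, 3000, 3000, 1000],
    "Date": ["12/1/2020", "12/2/2020", "12/3/2020", "12/4/2020",
             "12/5/2020", "12/5/2020", "12/6/2020"],
})

# Column data types: Date is stored as plain strings, not as datetimes
print(df.dtypes)

# Number of missing values in each column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Number of rows that exactly repeat an earlier row
duplicate_rows = df.duplicated().sum()
print("Duplicate rows:", duplicate_rows)

# Summary statistics for numeric columns; the Income maximum stands out
print(df.describe())

# Distinct category codes; reveals the inconsistent "s" / "S" casing
print(df["MaritalStatus"].unique())
```

On this sample, the checks report one missing value each in Gender, Age, and Transaction Amount, and one duplicated row (the second CustomerID 1005 record).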