Work World
124,040 views
31 min · 3 min read
9 steps
Advanced

How to clean and format messy data in Excel for analysis

Messy data is frustrating but fixable. With a few systematic steps in Excel you can turn inconsistent, duplicate, and misformatted data into tidy tables ready for analysis in 10–45 minutes depending on size. This guide walks you through practical, repeatable actions so your dataset becomes accurate, consistent, and easy to analyze.

Verified by pleasexplain editors
  1. Step 1: Make a backup copy

    Immediately save a duplicate of the original file (File > Save As) or create a timestamped copy like Data_backup_2026-05-03.xlsx. Working on a copy prevents accidental data loss and lets you compare cleaned results to the raw source later.

    [Illustration: two Excel files side-by-side, original and backup with timestamped name]

  2. Step 2: Scan and document issues

    Quickly scan columns and 50–200 random rows to note common problems: blanks, mixed types, extra spaces, merged cells, or inconsistent date formats. Write a one-page checklist of problems to fix so you don’t miss recurring issues during cleaning.

    [Illustration: spreadsheet with highlighted cells and a sticky note checklist beside it]

  3. Step 3: Convert text to proper types

    Select columns and use Data > Text to Columns or Value/DATEVALUE formulas to convert numbers stored as text and strings that represent dates into native numeric or date types. Proper types allow sorting, filtering, and calculations to work correctly.

    [Illustration: Excel ribbon open on Text to Columns dialog with column of date strings being converted]

  4. Step 4: Trim, clean, and normalize text

    Use TRIM to remove extra spaces and CLEAN to delete nonprinting characters; combine with UPPER/LOWER/PROPER to normalize case. Apply formulas to a new column, verify results for 20–50 rows, then replace originals using Paste Values to avoid formula fragility.

    [Illustration: column before and after applying TRIM and PROPER, showing cleaned names]

  5. Step 5: Split and combine columns

    Use Text to Columns (delimiter or fixed width) to split concatenated fields like Full Name into First/Last, or use CONCAT/CONCATENATE/TEXTJOIN to combine fields. Keep copies of original columns until validation is complete.

    [Illustration: Full Name column split into First and Last name columns with delimiter dialog visible]

  6. Step 6: Remove duplicates and validate uniques

    Use Data > Remove Duplicates to eliminate exact duplicates, then apply COUNTIFS to check for unexpected duplicates on key fields. Inspect 5–10 example records flagged as duplicates before deleting to avoid removing legitimate entries.

    [Illustration: Remove Duplicates dialog open and a sample results pane showing duplicate counts]

  7. Step 7: Standardize dates and numbers

    Convert all dates to a single consistent format (ISO yyyy-mm-dd recommended) using TEXT or Format Cells, and ensure numeric columns use a consistent decimal separator and rounding (ROUND to 2 decimals if needed). Consistency prevents aggregation errors during analysis.

    [Illustration: column formatted to yyyy-mm-dd with numbers rounded to two decimal places]

  8. Step 8: Fill or mark missing values

    Decide on treatment: fill missing numeric values with median or group mean, fill categorical blanks with 'Unknown' or use interpolation for time series. Document the rule and create an indicator column (e.g., Price_missing=1) so analyses can account for imputed values.

    [Illustration: spreadsheet showing imputed values and an adjacent indicator column labeled _missing]

  9. Step 9: Create a clean output sheet and save

    Copy validated columns to a new worksheet named Clean_Data, remove helper columns and formulas by pasting values, then save as a new file (CSV or XLSX) and note cleaning steps in a README sheet for reproducibility.

    [Illustration: Excel workbook with a Clean_Data tab and a README sheet listing cleaning steps]


  • Work in short 20–30 minute sessions to avoid mistakes from fatigue.
  • Use Excel Tables (Ctrl+T) early so formulas and filters auto-expand on added rows.
  • Record common transformations as formulas in a helper column first, then convert to values when verified.
  • Use Find & Replace for consistent fixes like changing ‘N/A’, ‘na’, and ‘-’ to blank or a standard token.
  • Keep a small sample (100–500 rows) to prototype cleaning steps before applying to the full dataset.
  • Use conditional formatting to visually spot outliers or inconsistent formats quickly.

  • Avoid irreversible actions on the original file; always keep a backup copy.
  • Don’t blindly delete rows flagged as duplicates; visually inspect matches to avoid losing legitimate distinct records.
  • Be cautious when imputing missing data—improper imputation can bias results; document the method and percentage imputed.
  • When converting text to numbers or dates, check for locale issues (comma vs period) to prevent wrong conversions.

Was this guide helpful?