
Why Data Cleaning?

 2 years ago
source link: https://dev.to/codewithsom/why-data-cleaning-2eof

~ "Garbage In, Garbage Out": Bad data will lead to bad results, plain and simple.
~ It's hard for computers to judge whether the data makes sense or not.
~ To get accurate results, you need to remove the errors in your data that confuse the algorithms.
~ It's a time-consuming process, but an important one.

What are the causes?

  • Input Errors
  • Duplicates
  • Mangled Data
  • Malfunctioning Sensors
  • Lack of Standardization

Identifying Problems

  • Range Constraints
  • Data-Type Constraints
  • Compulsory Constraints
  • Unique Constraints
  • Cross Field Constraints
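
As a rough illustration of how the five constraint types can be checked, here is a minimal sketch in plain Python. The records, the field names (id, age, email, start, end), the 0-120 age range, and the find_problems helper are all assumptions made up for this example; they do not come from the original article.

```python
from datetime import date

# Hypothetical records; every field name and value here is an assumption.
records = [
    {"id": 1, "age": 34, "email": "a@example.com",
     "start": date(2020, 1, 5), "end": date(2021, 3, 1)},
    {"id": 2, "age": -4, "email": "b@example.com",
     "start": date(2021, 6, 1), "end": date(2022, 1, 1)},
    {"id": 2, "age": 51, "email": None,
     "start": date(2019, 2, 1), "end": date(2019, 8, 1)},
    {"id": 4, "age": "n/a", "email": "d@example.com",
     "start": date(2022, 5, 1), "end": date(2021, 5, 1)},
]


def find_problems(rows):
    """Return (row_index, message) pairs for each violated constraint."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        age = row.get("age")
        # Data-type constraint: age must be an integer.
        if not isinstance(age, int):
            problems.append((i, "data-type: age is not an integer"))
        # Range constraint: only checked once the type is known to be valid.
        elif not 0 <= age <= 120:
            problems.append((i, "range: age outside 0-120"))
        # Compulsory constraint: email must be present and non-empty.
        if not row.get("email"):
            problems.append((i, "compulsory: email is missing"))
        # Unique constraint: no two rows may share an id.
        if row["id"] in seen_ids:
            problems.append((i, "unique: id already seen"))
        seen_ids.add(row["id"])
        # Cross-field constraint: end date must not precede start date.
        if row["end"] < row["start"]:
            problems.append((i, "cross-field: end before start"))
    return problems


problems = find_problems(records)
for index, message in problems:
    print(index, message)
```

The sketch only reports problems and changes nothing, so deciding how to handle each flagged row stays a separate step.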

Data Cleaning Techniques

  • Removing missing data
  • Direct correction
  • Normalization
  • Fix syntax errors
  • Data Imputation
  • Spell Check
  • Filter Unwanted Outliers
  • Remove Irrelevant Values
  • Fix structural errors
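
A few of these techniques can be sketched in plain Python. The sample rows, the field names (city, temp_c), the plausible temperature range, and the clean helper are assumptions for illustration only. "Normalization" is taken here to mean standardizing whitespace and capitalization, which may not be the sense the author intended, and median imputation is just one of several possible fill strategies.

```python
from statistics import median

# Hypothetical raw readings; field names and values are assumptions.
raw = [
    {"city": " new york ", "temp_c": 21.5},
    {"city": "New York", "temp_c": None},   # missing measurement
    {"city": "BOSTON", "temp_c": 19.0},
    {"city": "boston", "temp_c": 500.0},    # implausible value
    {"city": None, "temp_c": 18.0},         # missing compulsory field
    {"city": "Chicago", "temp_c": 17.0},
]


def clean(rows, low=-60.0, high=60.0):
    """Return a cleaned copy of rows; the input list is left untouched."""
    # Removing missing data: drop rows that lack a city.
    rows = [dict(r) for r in rows if r["city"]]
    # Fix structural errors: trim whitespace and use consistent capitalization.
    for r in rows:
        r["city"] = " ".join(r["city"].split()).title()
    # Filter unwanted outliers: keep only temperatures in a plausible range.
    rows = [r for r in rows
            if r["temp_c"] is None or low <= r["temp_c"] <= high]
    # Data imputation: fill missing temperatures with the median of known ones.
    known = [r["temp_c"] for r in rows if r["temp_c"] is not None]
    fill = median(known) if known else None
    for r in rows:
        if r["temp_c"] is None:
            r["temp_c"] = fill
    return rows


cleaned = clean(raw)
for row in cleaned:
    print(row)
```

Outliers are filtered before imputation so that the implausible reading does not skew the fill value. Whether to drop or correct such rows depends on the data and is a judgment call this sketch does not settle.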
