A Practical Guide on Missing Values with Pandas
source link: https://towardsdatascience.com/a-practical-guide-on-missing-values-with-pandas-8fb3e0b46c24?gi=bcc4ba6a908b
Photo by Zach Lucero on Unsplash
Missing values indicate that we do not have information about a feature (column) of a particular observation (row). Why not just remove that observation from the dataset and move on? We can, but we usually should not. The reasons are:
- We typically have many features of an observation so we don’t want to lose the observation just because of one missing feature. Data is valuable.
- We typically have more than one observation with missing values. In some cases, we cannot afford to remove many observations from the dataset. Again, data is valuable.
In this post, we will go through how to detect and handle missing values as well as some key points to keep in mind.
The outline of the post:
- Missing value markers
- Detecting missing values
- Calculations with missing values
- Handling missing values
As always, we start by importing numpy and pandas.
import numpy as np
import pandas as pd
Missing value markers
The default missing value representation in Pandas is NaN, but Python’s None is also detected as a missing value.
s = pd.Series([1, 3, 4, np.nan, None, 8])
s
Although we created a series with integers, the values are upcast to float because np.nan is a float. Pandas 1.0 introduced a new representation for missing values, <NA>. It can be used with integers without causing upcasting. We need to explicitly request the dtype to be pd.Int64Dtype().
s = pd.Series([1, 3, 4, np.nan, None, 8], dtype=pd.Int64Dtype())
s
The integer values are not upcast to float.
Another missing value representation is NaT, which is used to represent missing values in datetime64[ns] data.
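As a minimal sketch of how NaT shows up in practice, the snippet below builds a small datetime series with one missing entry. The dates are made-up sample values, not data from the article.

```python
import pandas as pd

# Hypothetical sample data: three dates, the second one missing.
dates = pd.Series(pd.to_datetime(["2020-01-01", None, "2020-01-03"]))

# The None entry is stored as NaT in the datetime series.
print(dates)

# isna() detects NaT just like it detects NaN.
n_missing = int(dates.isna().sum())
print(n_missing)
```

The same isna() call works here as for numeric data, so detection code does not need a special case for datetime columns.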
Note: np.nan values do not compare equal to each other, whereas None values are considered equal.
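A short illustration of this comparison behavior, and of why an equality check is not a reliable way to find missing values. The series here is a made-up example.

```python
import numpy as np
import pandas as pd

# np.nan is never equal to anything, including itself.
nan_equal = np.nan == np.nan

# None compares equal to None.
none_equal = None == None

# Hypothetical sample series with one missing value.
s = pd.Series([1, np.nan, 3])

# Comparing against np.nan finds nothing, because NaN != NaN.
eq_mask = s == np.nan

# isna() is the dependable way to locate missing values.
na_mask = s.isna()

print(nan_equal, none_equal)
print(eq_mask.tolist())
print(na_mask.tolist())
```

Because of this, checks such as s == np.nan silently return all False; isna() (or its alias isnull()) should be used instead.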
Note: Not all missing values come in a nice and clean np.nan or None format. For example, the dataset we work on may include "?" and "- -" values in some cells. We can convert them to the np.nan representation when reading the dataset into a pandas dataframe. We just need to pass these values to the na_values parameter.
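A minimal sketch of the na_values approach, assuming the data is read with pd.read_csv. The CSV content and column names are invented for illustration, and an in-memory string stands in for a file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content in which "?" and "- -" mark missing cells.
csv_text = "name,age,city\nAnn,34,Paris\nBob,?,- -\nCid,29,?\n"

# Cells matching any entry in na_values are read as NaN.
df = pd.read_csv(io.StringIO(csv_text), na_values=["?", "- -"])

# Count the missing values in each column.
missing_per_column = df.isna().sum()

print(df)
print(missing_per_column)
```

Without na_values, the age column would be read as strings because of the "?" cell; with it, the column is parsed as numeric and the placeholder becomes NaN.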