Select and Filter Data Using Pandas in Python
source link: https://www.journaldev.com/54101/select-and-filter-data-pandas-python
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
As we deal with data all day, we know how good is Pandas library for data manipulation and analysis in python. Most of our time is being spent on data cleaning and analysis.
While analyzing the data to gather key insights, you should go in and out of the data.
For this, Pandas offers many methods which will help you to manipulate the data to find patterns in it. In this story, we will be focusing on methods to select and filter data using Pandas in python.
Select and Filter Data Operations using Pandas
- The data select operations using pandas include accessing the data we are interested in.
- You can make use of square brackets ” [ ] “ to access the data in particular column.
- The other method to access the data is using loc and iloc in pandas.
- When it comes to data filtering, you can use ” [ ] ” brackets to pass the data threshold to access it.
- You can also make use of logical conditions to filter the data using logical operators such as AND ( & ).
Installing Pandas
First things first! We need to install and import the Pandas library which is mostly used for data manipulation and analysis.
#Install pandas
pip install pandas
#load the library
import
pandas as pd
Well, we have successfully installed and imported the library. Now, let’s load the mtcars dataset.
#import the data
data
=
pd.read_csv(
'mtcars.csv'
)
data.head(
5
)
- This dataset has 32 rows and 12 attributes.
Select Data Methods
In the data selection operations, we are going to select particular data attributes/columns. We are going to see –
- ” [ ] ” method.
- loc and iloc method.
1. Select the Data using ” [ ] “
If you have to access/select data from a particular attribute/column, you have to mention the attribute name in the brackets as shown below.
#Select particular column
data[
'model'
]
0 Mazda RX4 1 Mazda RX4 Wag 2 Datsun 710 3 Hornet 4 Drive 4 Hornet Sportabout 5 Valiant 6 Duster 360 7 Merc 240D 8 Merc 230 9 Merc 280 10 Merc 280C 11 Merc 450SE 12 Merc 450SL 13 Merc 450SLC 14 Cadillac Fleetwood 15 Lincoln Continental 16 Chrysler Imperial 17 Fiat 128 18 Honda Civic 19 Toyota Corolla 20 Toyota Corona 21 Dodge Challenger 22 AMC Javelin 23 Camaro Z28 24 Pontiac Firebird 25 Fiat X1-9 26 Porsche 914-2 27 Lotus Europa 28 Ford Pantera L 29 Ferrari Dino 30 Maserati Bora 31 Volvo 142E Name: model, dtype: object
We got the data we have selected! It’s much simple right!
2. Select the Data without brackets
Yes! you can select the data from a particular column without brackets. Observe/run the below code to understand better.
#select data without brackets
data.model.head(
5
)
0 Mazda RX4 1 Mazda RX4 Wag 2 Datsun 710 3 Hornet 4 Drive 4 Hornet Sportabout Name: model, dtype: object
I have limited the rows to 5 to manage space. Because the previous output went too long :P. You can observe that even without brackets we are getting the same result.
3. Select the Data using loc
loc method in pandas is used to select a particular range of data. You can select the range of both rows and columns using loc. Let’s see how it works.
#using the loc method
data.loc[
1
:
5
,
'model'
]
0 Mazda RX4 1 Mazda RX4 Wag 2 Datsun 710 3 Hornet 4 Drive 4 Hornet Sportabout Name: model, dtype: object
The loc method in pandas is a label-based method where it takes labels ( row and column name) as input and returns the relevant range as shown above.
#Single row with all columns
data.loc[
5
, :]
model Valiant mpg 18.1 cyl 6 disp 225 hp 105 drat 2.76 wt 3.46 qsec 20.22 vs 1 am 0 gear 3 carb 1 Name: 5, dtype: object
Here, I am selecting the 5th row with all the columns using loc method.
4. Select the Data using iloc
The iloc method is based on integer-based and all the inputs should be in the form of integers. You can learn more from the below examples.
#5 rows with 2 columns
data.iloc[
1
:
5
,
2
:
4
]
cyl disp
1 6 160.0
2 4 108.0
3 6 258.0
4 8 360.0
You can observe that the inputs are in the form of integers.
#All rows of first column
data.iloc[:,
0
].head(
5
)
0 Mazda RX4 1 Mazda RX4 Wag 2 Datsun 710 3 Hornet 4 Drive 4 Hornet Sportabout Name: model, dtype: object
Here, we have selected all the rows of the first column. Don’t mind that I have limited the output display to 5 to save space 😛
Filter Data Methods
Filtering the data is often useful as you can get the specific range of the data easily. To filter the data, we mostly use logical operators. Don’t worry if you didn’t get it. You will get better with examples.
- Equal to ( == )
- Less than ( < ) and Greater than ( > )
- AND ( & ) operators.
We will be using the above three methods to filter the data.
1. Equal To ( == )
You can filter the data using the equal to operator as shown below.
#data equal to model Datsun 710
data[data.model
=
=
'Datsun 710'
]
model mpg cyl disp hp drat wt qsec vs am gear carb
2 Datsun 710 22.8 4 108.0 93 3.85 2.32 18.61 1 1 4 1
That’s cool! We have got all the information about a particular model – ‘ Datsun 710’
2. Less than ( < ) and Greater than ( > )
We can use the less than and greater than conditions to filter the data as shown in the below example.
#Less than operator
data[data.mpg <
15
]
Well, we have 5 models whose mpg is less than 15 miles per gallon.
#Greater than
data[data.hp >
175
]
We got 10 models whose HP is greater than 175. You can observe that our index is shuffled. So, you can use the reset_index function to reset the index. The code for resetting index –
#reset index
data[data.hp >
175
].reset_index()
AND ( & ) Filter Method
The AND logical operator is more useful when you are using multiple filters. Suppose if you want to use less than and greater than filters at once, you can make use of AND ( & ) operator to facilitate the cause.
#AND ( & ) operator
data[(data.mpg >
15
) & (data.cyl
=
=
8
) & (data.hp >
175
)]
So we got 4 models whose mpg is greater than 15 miles, cyl are equal to 8 and hp is greater than 175. That’s what we got and that’s amazing!!!
Select and Filter data – Conclusion
That’s why I love Pandas. It offers more than enough for your analysis and data manipulation. We got many methods for selecting and filtering data. In this story, there are many things to learn and apply in your day-to-day data operations. I hope you love this story and find it useful too.
That’s all for now. Happy Python!!!
More read: Data filtering
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK