6

Select and Filter Data Using Pandas in Python

 2 years ago
source link: https://www.journaldev.com/54101/select-and-filter-data-pandas-python
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

As we deal with data all day, we know how good is Pandas library for data manipulation and analysis in python. Most of our time is being spent on data cleaning and analysis.

While analyzing the data to gather key insights, you should go in and out of the data.

For this, Pandas offers many methods which will help you to manipulate the data to find patterns in it. In this story, we will be focusing on methods to select and filter data using Pandas in python. 


Select and Filter Data Operations using Pandas

  • The data select operations using pandas include accessing the data we are interested in.
  • You can make use of square brackets ” [ ] “ to access the data in particular column.
  • The other method to access the data is using loc and iloc in pandas.
  • When it comes to data filtering, you can use ” [ ] ” brackets to pass the data threshold to access it.
  • You can also make use of logical conditions to filter the data using logical operators such as AND ( & ).

Installing Pandas

First things first! We need to install and import the Pandas library which is mostly used for data manipulation and analysis.

#Install pandas
pip install pandas
#load the library
import pandas as pd

Well, we have successfully installed and imported the library. Now, let’s load the mtcars dataset.

#import the data
data = pd.read_csv('mtcars.csv')
data.head(5)
Mtcars 1
  • This dataset has 32 rows and 12 attributes.

Select Data Methods

In the data selection operations, we are going to select particular data attributes/columns. We are going to see –

  • ” [ ] ” method.
  • loc and iloc method.

1. Select the Data using ” [ ] “

If you have to access/select data from a particular attribute/column, you have to mention the attribute name in the brackets as shown below.

#Select particular column
data['model']
0               Mazda RX4
1           Mazda RX4 Wag
2              Datsun 710
3          Hornet 4 Drive
4       Hornet Sportabout
5                 Valiant
6              Duster 360
7               Merc 240D
8                Merc 230
9                Merc 280
10              Merc 280C
11             Merc 450SE
12             Merc 450SL
13            Merc 450SLC
14     Cadillac Fleetwood
15    Lincoln Continental
16      Chrysler Imperial
17               Fiat 128
18            Honda Civic
19         Toyota Corolla
20          Toyota Corona
21       Dodge Challenger
22            AMC Javelin
23             Camaro Z28
24       Pontiac Firebird
25              Fiat X1-9
26          Porsche 914-2
27           Lotus Europa
28         Ford Pantera L
29           Ferrari Dino
30          Maserati Bora
31             Volvo 142E
Name: model, dtype: object

We got the data we have selected! It’s much simple right!


2. Select the Data without brackets

Yes! you can select the data from a particular column without brackets. Observe/run the below code to understand better.

#select data without brackets
data.model.head(5)
0            Mazda RX4
1        Mazda RX4 Wag
2           Datsun 710
3       Hornet 4 Drive
4    Hornet Sportabout
Name: model, dtype: object

I have limited the rows to 5 to manage space. Because the previous output went too long :P. You can observe that even without brackets we are getting the same result.


3. Select the Data using loc

loc method in pandas is used to select a particular range of data. You can select the range of both rows and columns using loc. Let’s see how it works.

#using the loc method
data.loc[1:5, 'model']
0            Mazda RX4
1        Mazda RX4 Wag
2           Datsun 710
3       Hornet 4 Drive
4    Hornet Sportabout
Name: model, dtype: object

The loc method in pandas is a label-based method where it takes labels ( row and column name) as input and returns the relevant range as shown above.

#Single row with all columns
data.loc[5, :]
model    Valiant
mpg         18.1
cyl            6
disp         225
hp           105
drat        2.76
wt          3.46
qsec       20.22
vs             1
am             0
gear           3
carb           1
Name: 5, dtype: object

Here, I am selecting the 5th row with all the columns using loc method.


4. Select the Data using iloc

The iloc method is based on integer-based and all the inputs should be in the form of integers. You can learn more from the below examples.

#5 rows with 2 columns
data.iloc[1:5, 2:4]
cyl  disp
1   6   160.0
2   4   108.0
3   6   258.0
4   8   360.0

You can observe that the inputs are in the form of integers.

#All rows of first column
data.iloc[:, 0].head(5)
0            Mazda RX4
1        Mazda RX4 Wag
2           Datsun 710
3       Hornet 4 Drive
4    Hornet Sportabout
Name: model, dtype: object

Here, we have selected all the rows of the first column. Don’t mind that I have limited the output display to 5 to save space 😛


Filter Data Methods

Filtering the data is often useful as you can get the specific range of the data easily. To filter the data, we mostly use logical operators. Don’t worry if you didn’t get it. You will get better with examples.

  • Equal to ( == )
  • Less than ( < ) and Greater than ( > )
  • AND ( & ) operators.

We will be using the above three methods to filter the data.

1. Equal To ( == )

You can filter the data using the equal to operator as shown below.

#data equal to model Datsun 710
data[data.model == 'Datsun 710']
model      mpg    cyl  disp    hp  drat    wt  qsec    vs  am  gear carb
2   Datsun 710  22.8    4   108.0   93  3.85    2.32    18.61   1   1   4   1

That’s cool! We have got all the information about a particular model – ‘ Datsun 710’


2. Less than ( < ) and Greater than ( > )

We can use the less than and greater than conditions to filter the data as shown in the below example.

#Less than operator
data[data.mpg < 15]
Select and Filter data

Well, we have 5 models whose mpg is less than 15 miles per gallon.

#Greater than
data[data.hp > 175]
Select and Filter data

We got 10 models whose HP is greater than 175. You can observe that our index is shuffled. So, you can use the reset_index function to reset the index. The code for resetting index –

#reset index
data[data.hp > 175].reset_index()

AND ( & ) Filter Method

The AND logical operator is more useful when you are using multiple filters. Suppose if you want to use less than and greater than filters at once, you can make use of AND ( & ) operator to facilitate the cause.

#AND ( & ) operator
data[(data.mpg > 15) & (data.cyl == 8) & (data.hp > 175)]
Select and Filter data

So we got 4 models whose mpg is greater than 15 miles, cyl are equal to 8 and hp is greater than 175. That’s what we got and that’s amazing!!!


Select and Filter data – Conclusion

That’s why I love Pandas. It offers more than enough for your analysis and data manipulation. We got many methods for selecting and filtering data. In this story, there are many things to learn and apply in your day-to-day data operations. I hope you love this story and find it useful too.

That’s all for now. Happy Python!!!

More read: Data filtering


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK