Pandas Dataframe.loc[] – thisPointer

 2 years ago
source link: https://thispointer.com/pandas-dataframe-loc/

In this article, we will discuss how to use the loc property of the Dataframe with examples.

In Pandas, the Dataframe provides a property loc[] to select a subset of the Dataframe based on row and column names/labels. We can choose single or multiple rows and columns using it. Let’s learn more about it.

Syntax:

Dataframe.loc[row_segment , column_segment]
Dataframe.loc[row_segment]

The column_segment argument is optional. If column_segment is not provided, loc[] will select the subset of the Dataframe based on the row_segment argument only, and all columns are included.

Arguments:

  • row_segment:
    • It contains information about the rows to be selected. Its value can be,
      • A single label like ‘A’ or 7 etc.
        • In this case, it selects the single row with the given label name. An integer such as 7 is treated as a label, not as a position.
        • For example, if only ‘B’ is given, then only the row with label ‘B’ is selected from the Dataframe.
      • A list/array of label names like, [‘B’, ‘E’, ‘H’]
        • In this case, multiple rows will be selected based on the row labels given in the list.
        • For example, if [‘B’, ‘E’, ‘H’] is given as argument in row segment, then the rows with label names ‘B’, ‘E’ and ‘H’ will be selected.
      • A slice object with labels like -> a:e .
        • This case will select multiple rows, i.e. from the row with label a up to and including the row with label e. Unlike regular Python slices, both the start and the end label are included.
        • For example, if ‘B’:‘E’ is provided in the row segment of loc[], it will select the range of rows from label ‘B’ to label ‘E’, both included.
        • For selecting all rows, provide the value ( : )
      • A boolean sequence of the same size as the number of rows.
        • In this case, it will select only those rows for which the corresponding value in the boolean array/list is True.
      • A callable function :
        • It can be a lambda function or a general function, which accepts the calling dataframe as an argument and returns a value in any one of the formats mentioned above (a label, a list of labels, a slice, or a boolean sequence).
  • column_segment:
    • It is optional.
    • It contains information about the columns to be selected. Its value can be,
      • A single label like ‘A’ or 7 etc.
        • In this case, it selects the single column with the given label name.
        • For example, if only ‘Age’ is given, then only the column with label ‘Age’ is selected from the Dataframe.
      • A list/array of label names like, [‘Name’, ‘Age’, ‘City’]
        • In this case, multiple columns will be selected based on the column labels given in the list.
        • For example, if [‘Name’, ‘Age’, ‘City’] is given as argument in column segment, then the columns with label names ‘Name’, ‘Age’, and ‘City’ will be selected.
      • A slice object with labels like -> a:e .
        • This case will select multiple columns, i.e. from the column with label a up to and including the column with label e.
        • For example, if ‘Name’:‘City’ is provided in the column segment of loc[], it will select the range of columns from label ‘Name’ to label ‘City’, both included.
        • For selecting all columns, provide the value ( : )
      • A boolean sequence of the same size as the number of columns.
        • In this case, it will select only those columns for which the corresponding value in the boolean array/list is True.
      • A callable function :
        • It can be a lambda function or a general function that accepts the calling dataframe as an argument and returns a value in any one of the formats mentioned above (a label, a list of labels, a slice, or a boolean sequence).
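Since loc[] works with labels and not with positions, it helps to see the difference on a Dataframe whose row labels are integers. The following is a minimal sketch; the frame and its "score" column are made up for illustration and are not part of the examples in this article.

```python
import pandas as pd

# Hypothetical frame: the integer row labels differ from the row positions.
df = pd.DataFrame({"score": [1.5, 2.5, 3.5]}, index=[10, 20, 30])

# loc[] looks up the label 20, iloc[] looks up position 1 (same cell here).
by_label = df.loc[20, "score"]
by_position = df.iloc[1, 0]

# A label slice includes both end labels, a position slice excludes the stop.
label_slice = df.loc[10:20]
position_slice = df.iloc[0:1]

print(by_label, by_position)
print(list(label_slice.index))
print(list(position_slice.index))
```

Here df.loc[10:20] keeps the rows labelled 10 and 20, while df.iloc[0:1] keeps only the first row.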

Returns :

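It returns the selected subset of the dataframe based on the provided row and column names: a single value if one row label and one column label are given, a Series if a single row or a single column is selected, and a Dataframe otherwise. Whether the returned object is a view or a copy of the original data is not guaranteed, so it should not be relied on.
Also, if column_segment is not provided, it returns the subset of the Dataframe containing only the selected rows based on the row_segment argument.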
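The type of the returned object depends on the kind of selector used. A small sketch to check this, using a made-up two-row frame:

```python
import pandas as pd

# Small illustrative frame (not the one used in the examples of this article).
df = pd.DataFrame({"Name": ["Riti", "Vikas"], "Age": [30, 31]},
                  index=["b", "c"])

cell = df.loc["b", "Age"]            # single row label + single column label
row = df.loc["b"]                    # single row label
one_row_frame = df.loc[["b"]]        # list with one label
block = df.loc["b":"c", ["Name"]]    # label slice + list of columns

print(type(cell).__name__)
print(type(row).__name__)
print(type(one_row_frame).__name__)
print(block.shape)
```

Passing a list, even with a single label in it, keeps the result as a Dataframe, whereas a bare label gives a Series.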

Error scenarios:

Dataframe.loc[row_segment, column_segment] will raise a KeyError if a label name provided does not exist in the Dataframe.

Let’s understand more about it with some examples.
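When some labels may be missing, the KeyError can be caught, or the labels can be filtered before calling loc[]. This is only one possible way to handle it; the frame and the label "z" below are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Age": [34, 30]}, index=["a", "b"])

# A label that does not exist raises KeyError.
try:
    df.loc["z"]
    missing_raised = False
except KeyError:
    missing_raised = True

wanted = ["a", "z"]

# Option 1: keep only the labels that are actually present in the index.
present = df.index.intersection(wanted)
safe = df.loc[present]

# Option 2: reindex() keeps every requested label and fills the gaps with NaN.
filled = df.reindex(wanted)

print(missing_raised)
print(list(safe.index))
print(filled)
```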

Pandas Dataframe.loc[] – Examples

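We have divided the examples into three parts, i.e.

  • Select a few rows from Dataframe
  • Select a few columns from Dataframe
  • Select a subset of Dataframe

Let’s look at these examples one by one. But before that, we will create a Dataframe from a list of tuples,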

import pandas as pd

# List of Tuples
students = [('jack',  34, 'Sydeny',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',   'US'),
            ('Mike',  17, 'las vegas',  'US')]

# Create a DataFrame from list of tuples
df = pd.DataFrame( students,
                   columns=['Name', 'Age', 'City', 'Country'],
                   index=['a', 'b', 'c', 'd', 'e', 'f'])

print(df)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

Select a few rows from Dataframe

Here we will provide only the row segment argument to Dataframe.loc[]. Therefore, it will select the rows based on the given names, along with all columns.

Select a single row of Dataframe

To select a row from the dataframe, pass the row name to the loc[]. For example,

# Select the row with label name 'c'
row = df.loc['c']

print(row)

Output:

Name        Vikas
Age            31
City       Mumbai
Country     India
Name: c, dtype: object

It returned the row with label name ‘c’ from the Dataframe, as a Series object.

Select multiple rows from Dataframe based on list of names

Pass a list of row label names to the row_segment of loc[]. It will return a subset of the Dataframe containing only mentioned rows. For example,

# Select multiple rows from Dataframe by label names
subsetDf = df.loc[ ['c', 'f', 'a'] ]

print(subsetDf)

Output:

    Name  Age       City    Country
c  Vikas   31     Mumbai      India
f   Mike   17  las vegas         US
a   jack   34     Sydeny  Australia

It returned a subset of the Dataframe containing only three rows with labels ‘c’, ‘f’ and ‘a’.

Select multiple rows from Dataframe based on name range

Pass a label range -> start:end in the row segment of loc[]. It will return a subset of the Dataframe containing only the rows from label start to label end of the original dataframe, both included. For example,

# Select rows of Dataframe based on row label range
subsetDf = df.loc[ 'b' : 'f' ]

print(subsetDf)

Output:

    Name  Age       City Country
b   Riti   30      Delhi   India
c  Vikas   31     Mumbai   India
d  Neelu   32  Bangalore   India
e   John   16   New York      US
f   Mike   17  las vegas      US

It returned a subset of the Dataframe containing only five rows from the original dataframe i.e. rows from label ‘b’ to label ‘f’.
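Note that the end label ‘f’ is part of the result. A short sketch comparing a label range with a position range, and showing open-ended ranges; it uses a reduced copy of the data with only the Age column:

```python
import pandas as pd

# Reduced version of the example data: only the Age column.
df = pd.DataFrame({"Age": [34, 30, 31, 32, 16, 17]},
                  index=["a", "b", "c", "d", "e", "f"])

by_label = df.loc["b":"f"]     # label range, end label included
by_position = df.iloc[1:5]     # position range, stop position excluded

head = df.loc[:"c"]            # from the first row up to label 'c'
tail = df.loc["e":]            # from label 'e' to the last row

print(list(by_label.index))
print(list(by_position.index))
print(list(head.index))
print(list(tail.index))
```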

Select rows of Dataframe based on bool array

Pass a boolean array/list in the row segment of loc[]. It will return a subset of the Dataframe containing only the rows for which the corresponding value in the boolean array/list is True. For example,

# Select rows of Dataframe based on bool array
subsetDf = df.loc[ [True, False, True, False, True, False] ]

print(subsetDf)

Output:

    Name  Age      City    Country
a   jack   34    Sydeny  Australia
c  Vikas   31    Mumbai      India
e   John   16  New York         US
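In practice, the boolean sequence is rarely typed by hand. It is usually built from a condition on one or more columns. A minimal sketch of that, using a reduced copy of the example data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["jack", "Riti", "Vikas", "Neelu", "John", "Mike"],
     "Age": [34, 30, 31, 32, 16, 17],
     "Country": ["Australia", "India", "India", "India", "US", "US"]},
    index=["a", "b", "c", "d", "e", "f"])

# Each comparison gives a boolean Series; & combines them element-wise.
mask = (df["Age"] > 30) & (df["Country"] == "India")

subsetDf = df.loc[mask]

print(mask.tolist())
print(subsetDf)
```

The parentheses around each comparison are required, because & binds more tightly than > and ==.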

Select rows of Dataframe based on Callable function

Create a lambda function that accepts a dataframe as an argument, applies a condition on a column, and returns a bool list. This bool list will contain True only for those rows where the condition is True. Pass that lambda function to loc[], and only those rows will be selected for which the list contains True.

For example, select only those rows where column ‘Age’ has a value greater than 25,

# Select rows of Dataframe based on callable function
subsetDf = df.loc[ lambda x : (x['Age'] > 25).tolist() ]

print(subsetDf)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
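The callable can also return the boolean Series directly, without converting it to a list. This is handy in a method chain, where the intermediate dataframe has no name of its own. A small sketch, assuming a reduced copy of the example data:

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["jack", "Riti", "Vikas", "Neelu", "John", "Mike"],
     "Age": [34, 30, 31, 32, 16, 17]},
    index=["a", "b", "c", "d", "e", "f"])

# The lambda receives the sorted dataframe, not the original df.
subsetDf = (
    df.sort_values("Age")
      .loc[lambda d: d["Age"] > 25]
)

print(subsetDf)
```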

Select a few Columns from Dataframe

Here we will provide ( : ) in the row segment argument of Dataframe.loc[]. Therefore, it will select all rows, but only a few columns based on the names provided in column_segment.

Select a single column of Dataframe

To select a column from the dataframe, pass ( : ) in the row segment and the column name in the column segment of loc[]. For example,

# Select single column from Dataframe by column name
column = df.loc[:, 'Age']

print(column)

Output:

a    34
b    30
c    31
d    32
e    16
f    17
Name: Age, dtype: int64

It returned the column ‘Age’ from Dataframe, as a Series object.

Select multiple columns from Dataframe based on list of names

Pass a list of column names to the column_segment of loc[]. It will return a subset of the Dataframe containing only mentioned columns. For example,

# Select multiple columns from Dataframe based on list of names
subsetDf = df.loc[:, ['Age', 'City', 'Name']]

print(subsetDf)

Output:

   Age       City   Name
a   34     Sydeny   jack
b   30      Delhi   Riti
c   31     Mumbai  Vikas
d   32  Bangalore  Neelu
e   16   New York   John
f   17  las vegas   Mike

It returned a subset of the Dataframe containing only three columns.

Select multiple columns from Dataframe based on name range

Pass a label range -> start:end in the column segment of loc[]. It will return a subset of the Dataframe containing only the columns from label start to label end of the original dataframe, both included. For example,

# Select multiple columns from Dataframe by name range
subsetDf = df.loc[:, 'Name' : 'City']

print(subsetDf)

Output:

    Name  Age       City
a   jack   34     Sydeny
b   Riti   30      Delhi
c  Vikas   31     Mumbai
d  Neelu   32  Bangalore
e   John   16   New York
f   Mike   17  las vegas

It returned a subset of the Dataframe containing only three columns, i.e., ‘Name’ to ‘City’.

Select columns of Dataframe based on bool array

Pass a boolean array/list in the column segment of loc[]. It will return a subset of the Dataframe containing only the columns for which the corresponding value in the boolean array/list is True. For example,

# Select columns of Dataframe based on bool array
subsetDf = df.loc[:, [True, True, False, False]]

print(subsetDf)

Output:

    Name  Age
a   jack   34
b   Riti   30
c  Vikas   31
d  Neelu   32
e   John   16
f   Mike   17
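The boolean sequence for columns can also be computed from the column names instead of being written by hand. The sketch below selects the columns whose name starts with ‘C’; the choice of this condition is only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["jack", "Riti"],
     "Age": [34, 30],
     "City": ["Sydney", "Delhi"],
     "Country": ["Australia", "India"]},
    index=["a", "b"])

# One boolean per column, True where the column name starts with 'C'.
col_mask = df.columns.str.startswith("C")

subsetDf = df.loc[:, col_mask]

print(list(col_mask))
print(subsetDf)
```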

Select a subset of Dataframe

Here we will provide the row and column segment arguments of the Dataframe.loc[]. It will return a subset of Dataframe based on the row and column names provided in row and column segments of loc[].

Select a Cell value from Dataframe

To select a single cell value from the dataframe, just pass the row and column name in the row and column segment of loc[]. For example,

# Select a Cell value from Dataframe by row and column name
cellValue = df.loc['c','Name']

print(cellValue)

Output:

Vikas

It returned the cell value at (‘c’,’Name’).

Select subset of Dataframe based on row/column names in list

Select a subset of the dataframe. This subset should include the following rows and columns,

  • Rows with names ‘b’, ‘d’ and ‘f’
  • Columns with names ‘Name’ and ‘City’

# Select subset of Dataframe based on row/column labels in lists
subsetDf = df.loc[['b', 'd', 'f'],['Name', 'City']]

print(subsetDf)

Output:

    Name       City
b   Riti      Delhi
d  Neelu  Bangalore
f   Mike  las vegas

It returned a subset from the calling dataframe object.

Select subset of Dataframe based on row/column name range

Select a subset of the dataframe. This subset should include the following rows and columns,

  • Rows from name ‘b’ to ‘e’, both included
  • Columns from name ‘Name’ to ‘City’, both included

# Select subset of Dataframe based on row and column label name range.
subsetDf = df.loc['b':'e', 'Name':'City']

print(subsetDf)

Output:

    Name  Age       City
b   Riti   30      Delhi
c  Vikas   31     Mumbai
d  Neelu   32  Bangalore
e   John   16   New York

It returned a subset from the calling dataframe object.
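The row and column segments do not have to be of the same kind. As a sketch, a boolean condition in the row segment can be combined with a label range in the column segment:

```python
import pandas as pd

students = [('jack',  34, 'Sydney',    'Australia'),
            ('Riti',  30, 'Delhi',     'India'),
            ('Vikas', 31, 'Mumbai',    'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John',  16, 'New York',  'US'),
            ('Mike',  17, 'las vegas', 'US')]

df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Rows where Age is below 18, columns from 'Name' to 'City' (both included)
subsetDf = df.loc[df['Age'] < 18, 'Name':'City']

print(subsetDf)
```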

Pro Tip: Changing the values of Dataframe using loc[]

Assigning a value through loc[] modifies the original Dataframe object in place. For example, let’s select the row with label ‘c’ from the dataframe using loc[] and change its content,

print(df)

# Change the contents of row 'c' to 0
df.loc['c'] = 0

print(df)

Output:

    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c  Vikas   31     Mumbai      India
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US


    Name  Age       City    Country
a   jack   34     Sydeny  Australia
b   Riti   30      Delhi      India
c      0    0          0          0
d  Neelu   32  Bangalore      India
e   John   16   New York         US
f   Mike   17  las vegas         US

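Values assigned through loc[] are written directly into the original dataframe. A subset that was first stored in a separate variable may be a copy, so changes made to it are not guaranteed to reach the original.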
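The same kind of assignment also works with a condition in the row segment and a single column in the column segment. The sketch below uses a reduced copy of the example data and also shows that an explicit copy of a subset is independent of the original; the replacement value "unknown" is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["jack", "Riti", "John", "Mike"],
     "Age": [34, 30, 16, 17],
     "City": ["Sydney", "Delhi", "New York", "las vegas"]},
    index=["a", "b", "e", "f"])

# Assign through loc[]: only the matching rows of column 'City' are changed.
df.loc[df["Age"] < 18, "City"] = "unknown"

# An explicit copy is independent, so this change does not reach df.
snapshot = df.loc[["a"]].copy()
snapshot.loc["a", "Age"] = 99

print(df)
print(snapshot)
```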

Summary:

We learned about how to use the Dataframe.loc[] with several examples.
