
Understand map() function to manipulate pandas Series

 4 years ago
source link: https://mc.ai/understand-map-function-to-manipulate-pandas-series/

Data Conversion (arg Argument)

The fun part of the map() function is mostly about how we can play with the arg argument. Specifically, the arg argument tells the function how to map the existing data to the new data. It can be set to either a function or a dictionary. Let’s see how each works.

Using a Lambda Function

Actually, in the above example about the na_action option, you saw that we used a lambda function to map the data. If you don’t know what a Python lambda function is, please refer to my previous article on this topic.

Briefly, lambda functions are anonymous functions whose basic syntax is lambda arguments: expression. The declaration starts with the lambda keyword, followed by a list of zero or more arguments, a colon, and a single expression that specifies the operation to perform on those arguments.
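As a quick illustration of that syntax, here is a minimal sketch in plain Python. The names add_one, always_zero, and total are hypothetical and chosen only for this example.

```python
# One argument: returns its input plus one.
add_one = lambda x: x + 1

# Zero arguments: the argument list may be empty.
always_zero = lambda: 0

# Two arguments, separated by a comma.
total = lambda a, b: a + b

print(add_one(2))      # 3
print(always_zero())   # 0
print(total(2, 5))     # 7
```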

To give you a more practical use case about lambda functions, let’s consider the following example. In real-life data, we may have percentage data expressed as strings. We can write a lambda function to convert the string data to numeric data, a more analysis-friendly format.

Lambda Function in map()

In the example above, we take each value of the original data and slice it to get a substring. The key point is that the last character in a string has an index of -1, so the expression x[:-1] returns a new string that runs from the start of the original string up to, but not including, its last character.
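A minimal sketch of this kind of conversion might look like the following. The sample values and the variable names percentages and as_numbers are assumptions made for illustration, as is the final float() call that turns the sliced string into a number.

```python
import pandas as pd

# Hypothetical sample data: percentages stored as strings.
percentages = pd.Series(["12.5%", "40%", "7.25%"])

# x[:-1] drops the trailing "%" character; float() converts
# the remaining text to a number.
as_numbers = percentages.map(lambda x: float(x[:-1]))

print(as_numbers.tolist())  # [12.5, 40.0, 7.25]
```

The result holds the percentage values as floats (12.5 rather than 0.125); dividing by 100 inside the lambda would give fractions instead.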

Using a Regular Function

Besides lambda functions, we can also use built-in or custom-defined functions. The function must be able to handle every data point in the Series; otherwise, an error will occur.

Built-in and Custom Functions in map()

In the code above, we first created a Series that consists of a list of names. The first task is to create the initials for these people. To do that, we write a function called initials(). This function takes in one value of the Series at a time. We use a list comprehension to collect the first letter of each part of the name and then join these letters to produce the mapped value. As you can see, after mapping, the new Series has all people’s initials.

Besides custom functions, we can also use built-in functions to generate a new Series. For example, in the code above, we simply pass the len() function, which calculates the length of each name (spaces included) for the new Series.
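The two mappings described above can be sketched as follows. The names in the Series are hypothetical sample data, and the body of initials() is one plausible way to write the function described in the text, not necessarily the original.

```python
import pandas as pd

# Hypothetical sample data: a Series of full names.
names = pd.Series(["John Smith", "Mary Ann Lee", "Zoe Chan"])

def initials(name):
    # Take the first letter of each space-separated part and join them.
    return "".join([part[0] for part in name.split()])

# Custom function: map() calls initials() once per value.
name_initials = names.map(initials)

# Built-in function: len() counts every character, spaces included.
name_lengths = names.map(len)

print(name_initials.tolist())  # ['JS', 'MAL', 'ZC']
print(name_lengths.tolist())   # [10, 12, 8]
```

Note that the function is passed by name, without parentheses, so that map() can call it for each value.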

Using a Dictionary

In most cases, we should be able to pass a function to the arg argument, as above, to fulfill our needs. However, we can also use a dictionary as the mapping, although this is less common. Here’s a trivial example of this usage.

Dictionary in map()

As shown above, we created a Series with integers from 0 to 4. We then created a dictionary that has keys from 1 to 4, and this dictionary was used as the arg argument for the map() function call.
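A minimal sketch of this setup is shown below. The dictionary values ("one", "two", and so on) are assumptions made for illustration; only the keys 1 to 4 come from the description above.

```python
import pandas as pd

# A Series with the integers 0 to 4.
numbers = pd.Series(range(5))

# Keys 1 to 4 only; the string values are hypothetical.
mapping = {1: "one", 2: "two", 3: "three", 4: "four"}

# Each value in the Series is looked up as a key in the dictionary.
mapped = numbers.map(mapping)

print(mapped)
```

Because 0 is not a key in the dictionary, the first element of the result is NaN, while the other four elements are the looked-up strings.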

One thing to note is that if a value in the Series is not found among the dictionary’s keys, the mapped value will be NaN. However, we can use the defaultdict data type instead of the dict data type (for more information on defaultdict, please refer to my previous article). In this case, the NaN values will be replaced with the default values generated by the defaultdict object. See the following example for this feature.

Defaultdict in map()
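A minimal sketch of the defaultdict behavior, assuming the same integers 0 to 4 as before. The default value "unknown" and the dictionary values are hypothetical choices for this example.

```python
from collections import defaultdict

import pandas as pd

# A Series with the integers 0 to 4.
numbers = pd.Series(range(5))

# Missing keys fall back to the default factory, which returns "unknown".
# The initial entries (keys 1 to 4) use hypothetical string values.
mapping = defaultdict(
    lambda: "unknown",
    {1: "one", 2: "two", 3: "three", 4: "four"},
)

# 0 is not a key, so it maps to the default instead of NaN.
mapped = numbers.map(mapping)

print(mapped.tolist())  # ['unknown', 'one', 'two', 'three', 'four']
```

This works because defaultdict is a dict subclass that defines __missing__, which map() uses to obtain a value for keys that are absent.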
