Be careful when you use the “isin()” method in Pandas
source link: http://www.donghao.org/2021/04/09/be-careful-when-you-use-isin-method-in-pandas/
import pandas as pd
df_excl = pd.DataFrame({"id": ["12345"]})
df = pd.DataFrame({"id": ["12345", "67890"]})
result = df[~df.id.isin(df_excl[["id"]])]
print(result)
Guess the result of the snippet above. Just a DataFrame containing “67890”? No, the result is:
id
0 12345
1 67890
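Why has “12345” not been excluded? The reason is quite tricky: df_excl[["id"]] is a DataFrame, but what isin() needs here is a Series (or another list-like of values). Iterating over a DataFrame yields its column labels rather than its cell values, so the check is most likely made against the label "id" instead of the value "12345". So we shouldn’t use [[]] here, but [].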
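To see the difference between the two selections, it can help to inspect what each one actually is. The following is a minimal sketch that reuses the same two frames; the variable names (as_frame, as_series, and so on) are introduced here for illustration only and are not from the original post.

```python
import pandas as pd

df_excl = pd.DataFrame({"id": ["12345"]})
df = pd.DataFrame({"id": ["12345", "67890"]})

# Double brackets select a DataFrame; single brackets select a Series.
as_frame = df_excl[["id"]]
as_series = df_excl["id"]
print(type(as_frame).__name__, type(as_series).__name__)

# Iterating over a DataFrame yields its column labels, not its cell values.
frame_items = list(as_frame)     # ['id']
series_items = list(as_series)   # ['12345']
print(frame_items, series_items)

# Build the membership mask from each selection and compare them.
mask_from_frame = df["id"].isin(as_frame)
mask_from_series = df["id"].isin(as_series)
print(mask_from_frame.tolist(), mask_from_series.tolist())
```

Since no id equals the string "id", a mask built from the DataFrame would match nothing, which is consistent with both rows surviving the filter.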
The correct code should use df_excl["id"], as below:
...
result = df[~df.id.isin(df_excl["id"])]
print(result)
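If the exclusion list may arrive as either a DataFrame or a Series, one possible safeguard is to normalize it before calling isin(). The helper below, exclude_by_column, is hypothetical (it is not part of pandas or of the original post) and is only a sketch, assuming the exclusion frame carries a column with the same name as the one being filtered.

```python
import pandas as pd


def exclude_by_column(df, excl, column):
    """Hypothetical helper: drop rows of df whose `column` value appears in excl.

    excl may be a DataFrame holding `column`, or any list-like of values.
    """
    if isinstance(excl, pd.DataFrame):
        # Reduce the DataFrame to the Series of values that isin() expects.
        excl = excl[column]
    return df[~df[column].isin(excl)]


df_excl = pd.DataFrame({"id": ["12345"]})
df = pd.DataFrame({"id": ["12345", "67890"]})

# Both calls keep only the row whose id is not in the exclusion list.
from_frame = exclude_by_column(df, df_excl[["id"]], "id")
from_series = exclude_by_column(df, df_excl["id"], "id")
print(from_frame)
print(from_series)
```

Doing the conversion in one place means callers cannot trigger the [[]] pitfall by accident.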