Removing duplicate records in a CSV file using Pandas

2 years ago

source link: https://gist.github.com/AlanBinu007/ad86b575ef72532aaa6a6fc0aeb47de2

import pandas as pd

d = pd.read_csv('CSV_FILE.csv', keep_default_na=False)  # keep '', 'NA', 'null' etc. as literal strings, not NaN
# Drop rows whose composite key repeats an earlier row; the first occurrence is kept
d.drop_duplicates(subset=['COMPOSITE_KEY1', 'COMPOSITE_KEY2', 'COMPOSITE_KEY3', 'COMPOSITE_KEY4', 'COMPOSITE_KEY5', 'COMPOSITE_KEY6', 'COMPOSITE_KEY7', 'COMPOSITE_KEY8', 'COMPOSITE_KEY9', 'COMPOSITE_KEY10'], inplace=True, keep='first')
d.to_csv('CSV_FILE_PROCESSED.csv', index=False)  # omit the row index column from the output
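A minimal sketch of the same read, deduplicate, and write steps on in-memory data, so the effect can be checked without files. It assumes a hypothetical CSV whose composite key is just two columns, `a` and `b`, standing in for `COMPOSITE_KEY1` to `COMPOSITE_KEY10`; the helper `dedupe_csv_text` and the sample rows are illustrative and not part of the gist.

```python
import io

import pandas as pd


def dedupe_csv_text(csv_text: str, key_columns: list[str]) -> pd.DataFrame:
    """Return csv_text as a DataFrame with duplicate composite keys removed.

    Hypothetical helper mirroring the gist's steps: keep_default_na=False keeps
    strings such as '', 'NA' and 'null' literal, and keep='first' retains the
    earliest row for each key.
    """
    df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
    # Returns a new DataFrame instead of modifying df in place
    deduped = df.drop_duplicates(subset=key_columns, keep="first")
    # Renumber the surviving rows 0..n-1
    return deduped.reset_index(drop=True)


# Hypothetical sample: rows with id 1 and 3 share the composite key (a, b) = ('x', '').
sample = """id,a,b,note
1,x,,first
2,y,1,other
3,x,,second
4,z,NA,NA
"""

result = dedupe_csv_text(sample, ["a", "b"])
print(result)

# Writing back mirrors the gist's final step (index=False omits the row index).
print(result.to_csv(index=False))
```

Because `keep='first'` is used, the row with `id` 1 survives and `id` 3 is dropped. `keep='last'` would keep `id` 3 instead, and `keep=False` drops every row that has a duplicate. The literal `NA` in row 4 survives the round trip only because `keep_default_na=False` stops pandas from parsing it as NaN.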