
Removing duplicate records in a CSV file using Pandas

 2 years ago
source link: https://gist.github.com/AlanBinu007/ad86b575ef72532aaa6a6fc0aeb47de2


import pandas as pd

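# Read the CSV; keep_default_na=False stops pandas from turning empty cells into NaN.
d = pd.read_csv('CSV_FILE.csv', keep_default_na=False)

# Rows are duplicates when all of these composite-key columns match.
key_columns = [
    'COMPOSITE_KEY1', 'COMPOSITE_KEY2', 'COMPOSITE_KEY3', 'COMPOSITE_KEY4',
    'COMPOSITE_KEY5', 'COMPOSITE_KEY6', 'COMPOSITE_KEY7', 'COMPOSITE_KEY8',
    'COMPOSITE_KEY9', 'COMPOSITE_KEY10',
]

# Keep the first row of each duplicate group and drop the rest, in place.
d.drop_duplicates(subset=key_columns, inplace=True, keep='first')

# Write the result without the DataFrame index column.
d.to_csv('CSV_FILE_PROCESSED.csv', index=False)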
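
To try the same approach without touching files on disk, here is a minimal sketch that runs it on a small in-memory CSV. The sample data, the column names `a` and `b` (standing in for the composite key), and the helper name `dedupe_csv_text` are assumptions made for illustration and are not part of the original gist.

```python
import io

import pandas as pd

# Hypothetical sample data: columns "a" and "b" stand in for the composite key.
SAMPLE_CSV = (
    "a,b,value\n"
    "1,x,first\n"
    "1,x,second\n"
    "2,,third\n"
    "2,,fourth\n"
    "3,y,fifth\n"
)


def dedupe_csv_text(csv_text, key_columns):
    """Return csv_text with only the first row kept per key combination."""
    # keep_default_na=False leaves empty cells as empty strings instead of NaN.
    frame = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
    # Non-in-place variant: returns a new DataFrame rather than mutating frame.
    deduped = frame.drop_duplicates(subset=key_columns, keep="first")
    return deduped.to_csv(index=False)


result = dedupe_csv_text(SAMPLE_CSV, ["a", "b"])
print(result)
```

Rows that share the same values in `a` and `b` collapse to the first one encountered, so the `value` column of the later rows is discarded. If the last occurrence should win instead, `keep="last"` is the corresponding option.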

