
Pandas For Beginners — Combining Dataframes — Part 2

 9 months ago
source link: https://ujjwal-dalmia.medium.com/pandas-for-beginners-combining-dataframes-part-2-dd4470306386

Replicating SQL-style joins using pandas.

In the previous tutorial, we looked at the append and concat functions for combining dataframes. A limitation of these functions is that they combine dataframes only by row index or column name. If you need to join two dataframes on a specific column (the way we typically do in databases), they fall short. To overcome this, pandas offers the simple yet powerful merge function. In this tutorial, we will go through this function and learn to implement SQL-like joins.
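To make the idea of joining on a specific column concrete, here is a minimal sketch. The two dataframes and all their values are made up for illustration and are not the tutorial's data files; only pd.merge and its on and how parameters come from pandas itself.

```python
import pandas as pd

# Hypothetical toy data, invented for this sketch.
population = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Population": [100, 200, 300],
})
growth = pd.DataFrame({
    "Country": ["C", "A"],
    "GDP_growth": [2.5, 1.5],
})

# Inner join on the Country column, roughly equivalent to
# SELECT * FROM population JOIN growth USING (Country) in SQL.
joined = pd.merge(population, growth, on="Country", how="inner")
print(joined)
```

Rows are matched by the values in Country rather than by position, so country B, which has no entry in the second dataframe, is dropped by the inner join.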

Assumption and Recommendation

Being hands-on is the key to mastering programming. We recommend that you run the code yourself as you follow along with the tutorial. The sample data and the associated Jupyter notebook are available in the Scenario_15 folder of this GitHub link.

If you are new to GitHub and want to learn it, please go through this tutorial. To set up a new Python environment on your system, please go through this tutorial.

The following is the list of Python concepts and pandas functions/methods used in the tutorial:

Pandas functions

  • read_csv
  • merge

Getting Started

Step 1 — Keeping the data ready

For this tutorial, we use slightly modified versions of the data files from the previous one. This time, the first population file contains details on ten countries for the years 2010 to 2014. The second file has the population for the same countries for the period 2015 to 2019. The third file has the GDP growth rate for 2010 to 2019, but only for eight of these ten countries. One variation in the GDP file is that the column containing the country names is called Country_code. The data dictionary and a sample snapshot of these data sets are as follows:

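  • Country — Name of the country (the column is named Country_code in the GDP file)
  • 2010… 2014 — Yearly values (population or GDP growth rate) for the years 2010 to 2014
  • 2015… 2019 — Yearly values (population or GDP growth rate) for the years 2015 to 2019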
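The layout described above can be sketched with a scaled-down stand-in: three placeholder countries instead of ten, two year columns per file, and a GDP frame that covers only two of the countries. All names and numbers below are hypothetical; in practice the frames would be loaded from the tutorial's files with read_csv. The sketch assumes a left join is wanted so that countries without GDP data are kept.

```python
import pandas as pd

# Hypothetical stand-ins for the three files (values are invented).
pop_2010_2014 = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "2010": [10, 20, 30],
    "2014": [11, 21, 31],
})
pop_2015_2019 = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "2015": [12, 22, 32],
    "2019": [13, 23, 33],
})
gdp = pd.DataFrame({
    "Country_code": ["A", "C"],  # key column has a different name
    "2010": [1.5, 2.5],
    "2019": [1.0, 2.0],
})

# Both population frames share the key column name, so `on` is enough.
population = pd.merge(pop_2010_2014, pop_2015_2019, on="Country")

# The GDP frame names its key differently, so use left_on / right_on.
# how="left" keeps every country from the population frame; suffixes
# disambiguate year columns that appear in both frames.
combined = pd.merge(
    population,
    gdp,
    how="left",
    left_on="Country",
    right_on="Country_code",
    suffixes=("_pop", "_gdp"),
)
print(combined)
```

Country B has no GDP record, so its GDP columns (and Country_code) come back as NaN, which mirrors what would happen to the two countries missing from the GDP file.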
