Your Friendly Neighborhood — idxmax

 9 months ago
source link: https://ujjwal-dalmia.medium.com/your-friendly-neighbourhood-idxmax-9aa824d58eb

Pandas solution to a common data wrangling challenge

Photo by Polina Razorilova on Unsplash

When working on a data science project, we often spend more than 70% of our time adjusting data to our needs. While munging data, we run into many scenarios where finding a solution can get tricky. One of these is identifying which column holds the maximum value in each row. The image below illustrates the problem statement:

Sample Scenario (Image by Author)

In this scenario, we have a dataframe containing the monthly expenses incurred by a family. The problem at hand is to identify, for each month, the expense category in which the family spent the most. To handle scenarios like this, pandas offers a ready-made DataFrame method, idxmax. A step-by-step approach to implementing this solution is detailed below:
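Before walking through the steps, here is a minimal sketch of what idxmax does when applied across columns. The figures and the month labels are made up for illustration and are not the article's dataset; only the Grocery and Travel column names come from the tutorial.

```python
import pandas as pd

# Hypothetical expense figures, used only to illustrate the method.
expenses = pd.DataFrame(
    {
        "Grocery": [300, 250, 400],
        "Travel": [120, 500, 90],
    },
    index=["Jan", "Feb", "Mar"],
)

# axis=1 compares the columns within each row and returns
# the label of the column holding that row's maximum.
top_category = expenses.idxmax(axis=1)
print(top_category)
```

With the default axis=0, idxmax would instead return, for each column, the index label of the row holding that column's maximum, so axis=1 is the setting that matches this problem.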

Assumption and Recommendation

Being hands-on is the key to mastering programming. We recommend that you implement the code as you follow along with the tutorial. The sample data and the associated Jupyter notebook are available in the Scenario_9 folder of this GitHub link.

If you are new to GitHub and want to learn it, please go through this tutorial. To set up a new Python environment on your system, please go through this tutorial.

The following is the list of Python concepts and pandas functions/methods used in the tutorial:

Pandas functions

  • read_csv
  • idxmax

Solution

Step 1 — Keeping the data ready

For this tutorial, we have created a dummy dataset containing the monthly expenses across different expense categories. The data dictionary and a snapshot of the sample data are as follows:

  • Year — Calendar Year
  • Month — Month of the year
  • Grocery — Expenses incurred by the family on groceries
  • Travel — Expenses incurred by the family on travel
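As a rough sketch of how such a dataset could be loaded and used, the example below reads a small CSV with read_csv and then applies idxmax. It uses only the four columns listed above; the CSV text, its values, and the Max_Category column name are assumptions made for illustration, and an in-memory string stands in for the actual data file.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the tutorial's data file.
csv_text = """Year,Month,Grocery,Travel
2021,1,320,150
2021,2,280,410
2021,3,350,90
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Applying idxmax to every column would also compare Year and Month,
# and Year (2021) is larger than any expense value here.
naive = df.idxmax(axis=1)

# Restrict the comparison to the expense columns only.
expense_columns = ["Grocery", "Travel"]
df["Max_Category"] = df[expense_columns].idxmax(axis=1)
print(df)
```

Selecting the expense columns before calling idxmax keeps identifier columns such as Year and Month out of the comparison; in this sketch the unrestricted call returns "Year" for every row.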
