Calculating the percentile for specific groups

I have 3 columns: Product Id, Price and Group (values A, B, C, D).

I want to get the price percentile for each group, and I am running the following code:

for group, price in df.groupby(['group']):
    df['percentile'] = np.percentile(df['price'],60)

The percentile column has only one value, 3.44, for every group. The expected values for the groups were 2.12, 3.43, 3.65, 4.76, 8.99.

What is going wrong here?

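Inside the loop you compute the percentile of the whole DataFrame df, so every group gets the same value. Use the current group's sub-DataFrame instead (in your loop it is the variable price) and take its price column: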

import pandas as pd
import numpy as np

np.random.seed(1)
df = pd.DataFrame(np.random.randint(10, size=(5,3)))
df.columns = ['Product Id','group','price']
print(df)
   Product Id  group  price
0           5      8      9
1           5      0      0
2           1      7      6
3           9      2      4
4           5      2      4

# Wrong: df['price'] is the whole column, so every group gets the same result
for group, price in df.groupby(['group']):
    print(np.percentile(df['price'], 60))
4.8
4.8
4.8
4.8
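The value 4.8 is the 60th percentile of all five prices, not of any single group. NumPy's default method interpolates linearly between the two nearest ranks; a minimal sketch of that arithmetic, where linear_percentile is a hypothetical helper written only for illustration:

```python
import numpy as np

def linear_percentile(values, p):
    """Hypothetical helper: p-th percentile with linear interpolation,
    intended to match np.percentile's default method."""
    data = sorted(values)
    # Fractional position of the percentile on a 0..n-1 rank scale
    pos = (p / 100) * (len(data) - 1)
    lower = int(pos)                        # rank just below the position
    upper = min(lower + 1, len(data) - 1)   # rank just above (clamped)
    frac = pos - lower                      # how far to move toward 'upper'
    return data[lower] + frac * (data[upper] - data[lower])

prices = [9, 0, 6, 4, 4]  # the whole 'price' column of the 5-row example
# sorted: [0, 4, 4, 6, 9]; pos = 0.6 * 4 = 2.4 -> 4 + 0.4 * (6 - 4) = 4.8
print(linear_percentile(prices, 60))
print(np.percentile(prices, 60))
```

Running the same arithmetic on a single group's prices (for example [0, 4, 6] for group 0 in the larger example further down) gives that group's own percentile instead.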

# Right: price is the current group's sub-DataFrame
for group, price in df.groupby(['group']):
    print(np.percentile(price['price'], 60))
0.0
4.0
6.0
9.0
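The question's loop also assigns df['percentile'] = ... on every iteration, which replaces the entire column each time, so even with the right group only the last value would survive. A minimal sketch that keeps the loop but writes each result only to that group's rows, using df.loc with the group's index (same random 5-row data as above; grouping by the column name rather than a list is a choice made here so each key is a scalar):

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(10, size=(5, 3)),
                  columns=['Product Id', 'group', 'price'])

for _, grp in df.groupby('group'):
    # grp holds only this group's rows; its index labels point back into df,
    # so only those rows of the new 'percentile' column are filled
    df.loc[grp.index, 'percentile'] = np.percentile(grp['price'], 60)

print(df)
```

This works, but the vectorized groupby solutions below avoid the explicit Python loop.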

Another solution, using np.percentile inside apply, which returns a Series:

print(df.groupby(['group'])['price'].apply(lambda x: np.percentile(x, 60)))
group
0    0.0
2    4.0
7    6.0
8    9.0
Name: price, dtype: float64

A solution with GroupBy.quantile, which takes the percentile as a fraction (0.6 for the 60th percentile):

print(df.groupby(['group'])['price'].quantile(.6))
group
0    0.0
2    4.0
7    6.0
8    9.0
Name: price, dtype: float64
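If several percentiles per group are needed, quantile also accepts a list of fractions. The result has a (group, quantile) MultiIndex, and unstack turns it into one row per group and one column per quantile. A sketch on the same 5-row data; the particular fractions 0.25 and 0.5 are arbitrary choices for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(10, size=(5, 3)),
                  columns=['Product Id', 'group', 'price'])

# One call, several quantiles; each value is a fraction, not a percent
table = df.groupby('group')['price'].quantile([0.25, 0.5, 0.6]).unstack()
print(table)
```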

EDIT based on a comment:

If you need the result as a new column, use transform, which returns one value for every row of the original DataFrame:

>>> np.random.seed(1)
>>> df = pd.DataFrame(np.random.randint(10,size=(20,3)))
>>> df.columns = ['Product Id','group','price']
>>> df
    Product Id  group  price
0            5      8      9
1            5      0      0
2            1      7      6
3            9      2      4
4            5      2      4
5            2      4      7
6            7      9      1
7            7      0      6
8            9      9      7
9            6      9      1
10           0      1      8
11           8      3      9
12           8      7      3
13           6      5      1
14           9      3      4
15           8      1      4
16           0      3      9
17           2      0      4
18           9      2      7
19           7      9      8
>>> df['percentil'] = df.groupby(['group'])['price'].transform(lambda x: x.quantile(.6))

>>> df
    Product Id  group  price  percentil
0            5      8      9        9.0
1            5      0      0        4.4
2            1      7      6        4.8
3            9      2      4        4.6
4            5      2      4        4.6
5            2      4      7        7.0
6            7      9      1        5.8
7            7      0      6        4.4
8            9      9      7        5.8
9            6      9      1        5.8
10           0      1      8        6.4
11           8      3      9        9.0
12           8      7      3        4.8
13           6      5      1        1.0
14           9      3      4        9.0
15           8      1      4        6.4
16           0      3      9        9.0
17           2      0      4        4.4
18           9      2      7        4.6
19           7      9      8        5.8
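An alternative sketch that swaps transform for map: compute the per-group quantile once as a Series indexed by group, then look up each row's group in it. The column name percentile_map is hypothetical, used only to compare against the transform result:

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.randint(10, size=(20, 3)),
                  columns=['Product Id', 'group', 'price'])

# 60th percentile of price for each group, indexed by group value
per_group = df.groupby('group')['price'].quantile(0.6)

# Broadcast back to every row by looking up each row's group
df['percentile_map'] = df['group'].map(per_group)

# The transform-based column from above, for comparison
df['percentil'] = df.groupby(['group'])['price'].transform(lambda x: x.quantile(.6))
print(np.allclose(df['percentile_map'], df['percentil']))
```

Both give one value per row; map keeps the small per-group table (per_group) around, which can be handy if you also want to inspect it.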
