
Ranking Metrics · GitHub

source link: https://gist.github.com/bwhite/3726239

teh commented on Jul 27, 2015

Hey, this is a nice implementation for those metrics! Would you mind clarifying the license by sticking a license header on top? Thanks in any case :)

great work!

nicely done!

nice work

kn45 commented on Oct 31, 2016

Great!! And I'm looking for pairwise AUC.

Thanks, nice work!

But NDCG for recommender-system evaluation needs to account for the whole list of user ratings, i.e. ratings that were not included in the recommendation delivery. My implementation of recsys NDCG.

peustr commented on May 24, 2017

edited

Here's my small contribution; nDCG as defined by Kaggle: https://www.kaggle.com/wiki/NormalizedDiscountedCumulativeGain
The only difference is: r → 2^r - 1

import numpy as np


def dcg_at_k(r, k):
    """DCG@k with exponential gain: sum of (2^rel - 1) / log2(rank + 1)."""
    r = np.asfarray(r)[:k]
    if r.size:
        return np.sum(np.subtract(np.power(2, r), 1) / np.log2(np.arange(2, r.size + 2)))
    return 0.


def ndcg_at_k(r, k):
    """NDCG@k: DCG@k normalized by the DCG of the ideally sorted list."""
    idcg = dcg_at_k(sorted(r, reverse=True), k)
    if not idcg:
        return 0.
    return dcg_at_k(r, k) / idcg
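
A quick usage sketch of the snippet above, with a made-up graded-relevance list:

# Hypothetical example: graded relevances for five retrieved items.
r = [3, 2, 3, 0, 1]

print(dcg_at_k(r, 5))   # exponential-gain DCG of the list as ranked
print(ndcg_at_k(r, 5))  # the same DCG normalized by the DCG of the ideally sorted list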

Hey, I am a beginner and I was trying to compute the NDCG score for my similarity matrix over 5 iterations. But I get the error "The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()". Could somebody help me out with what this means? Thanks a ton.

I agree with Kell18. That holds not only for recommender systems but also for IR: the ideal ranking should be the ranking of all judged items in the collection for the query, not only the retrieved ones.

Can you give me an idea of how to use your function if I have a vector of binary (ground truth) labels and then an output from an ALS model, for example: [ 1.09253478e-01 1.97033856e-01 5.51080274e-01 ..., 1.77992064e-03 1.90066773e-12 1.74711004e-04]

When evaluating my model using AUC, I can just feed in the binary ground-truth vector and the output from my ALS model as the predicted scores as is, but I am wondering how this would work with your code if I am considering, for example, k=10 recommendations and would like to use NDCG to evaluate the output.
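
One way to wire that up (a rough sketch; the array values and variable names are made up, and it assumes the gist is saved as rank_metrics.py): sort the ground-truth labels by predicted score, descending, and pass the reordered labels to ndcg_at_k.

import numpy as np
from rank_metrics import ndcg_at_k

# Hypothetical data: binary ground-truth labels and model scores for the same items.
y_true = np.array([0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0])
y_score = np.random.rand(len(y_true))  # stand-in for the real ALS output

# Rank items by predicted score, then evaluate the induced ordering of true labels.
order = np.argsort(y_score)[::-1]
r = y_true[order]

print(ndcg_at_k(r, 10))  # NDCG over the top-10 recommendations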

The reference URL in the comments for ndcg_at_k does not work. I believe this is the current URL for the referenced document: http://web.stanford.edu/class/cs276/handouts/EvaluationNew-handout-6-per.pdf

+1 to kell18 and suzanv. Quoting the Wikipedia description of ideal DCG:

sorting all relevant documents in the corpus by their relative relevance, producing the maximum possible DCG through position p, also called Ideal DCG (IDCG) through that position.
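
To make that concrete, a minimal sketch of an NDCG that normalizes against all judged documents could look like this (the function and variable names here are my own, not part of the gist; it reuses the gist's log2(rank + 1) discount):

import numpy as np


def dcg(r, k):
    # DCG@k with the log2(rank + 1) discount (method 1 in the gist).
    r = np.asfarray(r)[:k]
    if r.size:
        return np.sum(r / np.log2(np.arange(2, r.size + 2)))
    return 0.


def ndcg_full_judgments(retrieved_rels, all_judged_rels, k):
    # The ideal DCG is computed from *every* judged document for the query,
    # not only from the documents that were actually retrieved.
    idcg = dcg(sorted(all_judged_rels, reverse=True), k)
    if not idcg:
        return 0.
    return dcg(retrieved_rels, k) / idcg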

How about recall@K?
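
recall@k is not in the gist, but under the same assumption the other metrics make (the binary list r marks every relevant document for the query), a minimal sketch could be:

import numpy as np


def recall_at_k(r, k):
    # Recall@k for a binary relevance list r, assuming r covers every relevant
    # document for the query (the same assumption the gist's metrics make).
    r = np.asarray(r) != 0
    n_relevant = r.sum()
    if n_relevant == 0:
        return 0.
    return r[:k].sum() / float(n_relevant)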

lucidyan commented on Mar 3, 2019

edited

@bwhite
I think you have an error in the average_precision metric.

So for example:

from rank_metrics import average_precision

relevance_list = [[1, 1, 1], [1, 1, 0], [1, 0, 0]]

for r in relevance_list:
    print(average_precision(r))

Will print:

1.0
1.0
1.0

Instead of:

1.
0.6666666666666666
0.3333333333333333

So in the metric's return you should replace np.mean(out) with np.sum(out) / len(r)
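
For reference, the change being proposed would look roughly like this (a sketch against the gist, assuming its precision_at_k helper; the function name here is made up):

import numpy as np
from rank_metrics import precision_at_k  # the gist's precision@k helper


def average_precision_proposed(r):
    # lucidyan's proposed change: divide by the list length instead of the
    # number of relevant items, so r = [1, 0, 0] scores 1/3 rather than 1.0.
    r = np.asarray(r) != 0
    out = [precision_at_k(r, k + 1) for k in range(r.size) if r[k]]
    if not out:
        return 0.
    return np.sum(out) / len(r)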

There is a bug in dcg_at_k:

print(ndcg_at_k([14, 2, 0, 0], 5))  # output would be 1
print(ndcg_at_k([2, 14, 0, 0], 5))  # output would be 1

A corrected dcg_at_k would be:

def dcg_at_k(r, k, method=0):
    r = np.asfarray(r)[:k]
    if r.size:
        if method == 0:
            return r[0] + np.sum(r[1:] / np.log2(np.arange(3, r.size + 2))) ### fix here
        elif method == 1:
            return np.sum(r / np.log2(np.arange(2, r.size + 2)))
        else:
            raise ValueError('method must be 0 or 1.')
    return 0.
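
Assuming ndcg_at_k passes the method argument through to this fixed dcg_at_k (the gist's ndcg_at_k does forward it), the two calls above would no longer both return 1; roughly:

# With the fix, position 2 is discounted by log2(3) instead of log2(2) = 1,
# so swapping the top two items changes the score.
print(ndcg_at_k([14, 2, 0, 0], 5))  # 1.0 (already in ideal order)
print(ndcg_at_k([2, 14, 0, 0], 5))  # roughly 0.71 (the better item is ranked second)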

@lucidyan, @cuteapi

The code is correct if you assume that the ranking list contains all the relevant documents that need to be retrieved. In your example, the query with ranking list r = [1, 0, 0] retrieves 3 documents, but only one is relevant, and it is in the top position, so your average precision is 1.0. Note that mean average precision assumes that each query is independent of the others, and in your example there is no reason to believe that every query always has to retrieve 3 relevant documents. For your examples r = [1, 1, 0] and r = [1, 0, 0], the numbers of relevant documents are 2 and 1 respectively, because the code assumes that the total number of 1's in your list is the total number of relevant documents (there are supposed to be no misses in the ranked list).

This is a strong assumption in this code, but it does not make the implementation incorrect. You need to be aware that the ranking list that you pass has to contain all the positions where relevant documents appear. If that is not the case, you need to use another implementation that takes into account recall, which is the missing piece in this code.
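
If that assumption does not hold for your data, a recall-aware variant along these lines (my own sketch, not part of the gist; n_relevant is a made-up parameter name) takes the total number of relevant documents for the query as an extra argument:

import numpy as np


def average_precision_with_recall(r, n_relevant):
    # AP where n_relevant is the total number of relevant documents for the
    # query, including any that were never retrieved (so misses hurt the score).
    r = np.asarray(r) != 0
    if n_relevant == 0:
        return 0.
    precisions = [r[:k + 1].mean() for k in range(r.size) if r[k]]
    return np.sum(precisions) / n_relevant

With r = [1, 0, 0] and n_relevant = 3, this returns 1/3 instead of 1.0.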

Is IDCG calculated across all queries, or per query, when calculating NDCG?

Has anyone made this into a PyPI package? If not @bwhite, would you mind if I went ahead and made it into one? I'd of course credit you. I'm just tired of copying and pasting this code because it is super useful haha

Made this into a cute little pypi package if anyone is interested: https://github.com/ncoop57/cute_ranking

