
Jaccard Similarity and Distance in Python

 2 years ago
source link: https://www.journaldev.com/58039/jaccard-similarity-distance-python

In this tutorial, we will explore how to calculate the Jaccard similarity and the Jaccard distance in Python. Let us start by understanding what the two terms mean and how to compute them.


What is Jaccard Similarity and Distance?

Jaccard similarity is a popular proximity measure that quantifies how similar two items are, such as two text documents represented as sets of words. Given two sets A and B, the similarity (also called the Jaccard index) is the size of their intersection divided by the size of their union:

Jaccard Similarity Formula:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
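To make the text-similarity use case concrete, here is a minimal sketch that treats two sentences as sets of lowercase words. The sentences and the helper name `jaccard_similarity` are made up for illustration, and the helper assumes at least one of the sets is non-empty.

```python
def jaccard_similarity(a, b):
    """Return |a ∩ b| / |a ∪ b|; assumes the union is not empty."""
    return len(a & b) / len(a | b)

# Hypothetical example sentences, split into sets of lowercase words
doc1 = set("the cat sat on the mat".lower().split())
doc2 = set("the dog sat on the log".lower().split())

shared = doc1 & doc2      # words that appear in both sentences
combined = doc1 | doc2    # words that appear in either sentence
score = jaccard_similarity(doc1, doc2)

print(sorted(shared))     # the three shared words: on, sat, the
print(len(combined))      # seven distinct words overall
print(score)              # 3 shared words out of 7, i.e. 3/7
```

Because the sentences are converted to sets, repeated words such as "the" count only once.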

The Jaccard distance, in contrast to the Jaccard similarity (Jaccard index), measures the dissimilarity between two sets. It is the difference between the size of the set union and the size of the set intersection, divided by the size of the union, which is the same as 1 minus the Jaccard similarity:

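Jaccard Distance Formula:

$$d_J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|} = 1 - J(A, B)$$

where J(A, B) denotes the Jaccard similarity of A and B.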
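As a quick worked check of the two definitions, the sketch below computes the distance both directly from the union and intersection sizes and as 1 minus the similarity. The variable names are illustrative; the two sets are the ones used in the first sample run of this tutorial.

```python
import math

A = {1, 2, 3, 5}
B = {2, 3, 5, 6}

union = A | B              # {1, 2, 3, 5, 6}
intersection = A & B       # {2, 3, 5}

similarity = len(intersection) / len(union)
distance_from_sizes = (len(union) - len(intersection)) / len(union)
distance_from_similarity = 1 - similarity

# The elements counted by the distance are exactly the symmetric difference
assert A ^ B == union - intersection
# Both ways of computing the distance agree (up to floating-point rounding)
assert math.isclose(distance_from_sizes, distance_from_similarity)

print(similarity, distance_from_sizes)
```

Since the two measures always sum to 1, computing either one is enough to recover the other.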

Code Implementation in Python

Now that we know what both terms mean and have the formulas for the similarity index and the distance, we can implement both in Python.

Take User Input for both the sets

We let the user enter the values for the two sets. Each line of input is split on whitespace, converted to integers, and stored as a set:

S1 = set(map(int,input("Enter elements of set 1: ").split()))
S2 = set(map(int,input("Enter elements of set 2: ").split()))
print("The two sets are : \n",S1,"\n",S2)
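The input lines above raise a bare `ValueError` if the user types something that is not an integer. One possible way to make that failure easier to understand is to move the parsing into a small helper; `parse_int_set` is a hypothetical name, not part of the original code.

```python
def parse_int_set(text):
    """Parse whitespace-separated integers into a set.

    Raises ValueError naming the first token that is not an integer.
    """
    items = set()
    for token in text.split():
        try:
            items.add(int(token))
        except ValueError:
            raise ValueError(f"not an integer: {token!r}") from None
    return items

print(parse_int_set("3 5 2 1"))
print(parse_int_set("6 3 1 6"))   # the duplicate 6 is stored only once
```

With this helper, the input step could be written as `S1 = parse_int_set(input("Enter elements of set 1: "))`.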

Computing the Jaccard Similarity and Distance

Next, we construct a function that takes both input sets as parameters, computes the similarity and the distance using set operations, and returns both values:

def jaccard_similarity_n_distance(A, B):
    # Compute Jaccard Similarity: |A ∩ B| / |A ∪ B|
    numerator = A.intersection(B)
    denominator = A.union(B)
    Jacc_similarity = len(numerator)/len(denominator)
    # Compute Jaccard Distance: the symmetric difference holds exactly
    # the elements of the union that are not in the intersection
    numerator = A.symmetric_difference(B)
    denominator = A.union(B)
    Jacc_distance = len(numerator)/len(denominator)
    return (Jacc_similarity,Jacc_distance)

Result = jaccard_similarity_n_distance(S1,S2)
print("Jaccard Similarity : ",Result[0])
print("Jaccard Distance : ",Result[1])
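One edge case the function above does not handle is two empty sets: the union is then empty and the division raises `ZeroDivisionError`. The sketch below guards against that. The name `safe_jaccard` is hypothetical, and returning a similarity of 1.0 for two empty sets is an assumed convention (two empty sets are identical), not something the original article specifies.

```python
def safe_jaccard(A, B):
    """Return (similarity, distance) for two sets, tolerating empty input."""
    union = A | B
    if not union:
        # Assumed convention: two empty sets count as identical
        return (1.0, 0.0)
    similarity = len(A & B) / len(union)
    distance = len(A ^ B) / len(union)   # symmetric difference over union
    return (similarity, distance)

print(safe_jaccard(set(), set()))
print(safe_jaccard({1, 2, 3, 5}, {2, 3, 5, 6}))
```

Other conventions are possible, for example raising an explicit error, so the choice should match how the result is used downstream.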

The Complete Code for Jaccard Similarity and Distance

S1 = set(map(int,input("Enter elements of set 1: ").split()))
S2 = set(map(int,input("Enter elements of set 2: ").split()))
print("The two sets are : \n",S1,"\n",S2)

def jaccard_similarity_n_distance(A, B):
    # Compute Jaccard Similarity: |A ∩ B| / |A ∪ B|
    numerator = A.intersection(B)
    denominator = A.union(B)
    Jacc_similarity = len(numerator)/len(denominator)
    # Compute Jaccard Distance: symmetric difference over union
    numerator = A.symmetric_difference(B)
    denominator = A.union(B)
    Jacc_distance = len(numerator)/len(denominator)
    return (Jacc_similarity,Jacc_distance)

Result = jaccard_similarity_n_distance(S1,S2)
print()
print("Jaccard Similarity : ",Result[0])
print("Jaccard Distance : ",Result[1])

Some Sample Outputs

Now that the code implementation is complete, we will look at some sample outputs below.

Enter elements of set 1: 3 5 2 1
Enter elements of set 2: 5 3 2 6
The two sets are :
{1, 2, 3, 5}
{2, 3, 5, 6}
Jaccard Similarity :  0.6
Jaccard Distance :  0.4

Enter elements of set 1: 5 3 4 7
Enter elements of set 2: 6 3 1 6
The two sets are :
{3, 4, 5, 7}
{1, 3, 6}
Jaccard Similarity :  0.16666666666666666
Jaccard Distance :  0.8333333333333334

Conclusion

We looked at Jaccard similarity (index) and Jaccard distance, as well as how to compute them in Python. If you have any questions or recommendations, please post them in the comments section below.

Thank you for reading!
