Jaccard Index

Jaccard index & Jaccard distance


  • Definition:
    The Jaccard index, also known as Intersection over Union and the Jaccard similarity coefficient (originally coined coefficient de communauté by Paul Jaccard), is a statistic used for comparing the similarity and diversity of sample sets. The Jaccard coefficient measures similarity between finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sample sets:

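Jaccard index: the size of the intersection of two data sets divided by the size of their union. Its value therefore lies in the closed interval [0, 1]: 0 means the two sets share no elements, and 1 means they are identical.
Special case: when both data sets are empty, their Jaccard index is defined as 1.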

J(A, B) = |A ∩ B| / |A ∪ B| = |A ∩ B| / (|A| + |B| - |A ∩ B|)
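
The definition, including the empty-set convention, can be sketched in Python with the built-in set type. This is a minimal illustration rather than a reference implementation; the function name jaccard_index is an assumption chosen for this example.

```python
def jaccard_index(a, b):
    """Return |a ∩ b| / |a ∪ b| for two finite collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention from the text: two empty sets have index 1.
        return 1.0
    return len(a & b) / len(union)


print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 2 shared out of 4 distinct elements
print(jaccard_index(set(), set()))          # special case: both sets empty
print(jaccard_index({"a"}, {"b"}))          # no elements in common
```

Checking the union for emptiness first avoids a division by zero and makes the special case explicit.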

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from 1, or, equivalently, by dividing the difference of the sizes of the union and the intersection of two sets by the size of the union:

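Jaccard distance: the number of elements that belong to exactly one of the two data sets (the union minus the intersection) divided by the size of their union. Equivalently, it is 1 minus the Jaccard index of the two data sets.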

d_J(A, B) = 1 - J(A, B) = (|A ∪ B| - |A ∩ B|) / |A ∪ B|, where J(A, B) is the Jaccard index of the sets A and B.
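
The two equivalent forms of the distance can be sketched as follows. The function name jaccard_distance is hypothetical, and returning 0 for two empty sets is an assumption that follows from defining their index as 1.

```python
def jaccard_distance(a, b):
    """Return (|a ∪ b| - |a ∩ b|) / |a ∪ b| for two finite collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets empty: the index is 1 by convention, so the distance is 0.
        return 0.0
    # a ^ b is the symmetric difference, whose size is |a ∪ b| - |a ∩ b|.
    return len(a ^ b) / len(union)


print(jaccard_distance({1, 2, 3}, {2, 3, 4}))  # 2 unshared out of 4 distinct elements
print(jaccard_distance({1, 2}, {1, 2}))        # identical sets
print(jaccard_distance({1, 2}, {3, 4}))        # disjoint sets
```

Using the symmetric difference directly keeps the computation in integers until the final division.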

Example:

[Figure: worked example of computing the Jaccard index]
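
As a small worked illustration, the sketch below uses two hypothetical sets A and B chosen only for this example, and computes the index and distance as exact fractions.

```python
from fractions import Fraction

# Hypothetical sets chosen for illustration.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

intersection = A & B  # elements in both sets
union = A | B         # elements in at least one set

index = Fraction(len(intersection), len(union))
distance = 1 - index

print(sorted(intersection))  # [3, 4]
print(sorted(union))         # [1, 2, 3, 4, 5, 6]
print(index)                 # 1/3
print(distance)              # 2/3
```

Here the two sets share 2 of 6 distinct elements, so the index is 2/6 = 1/3 and the distance is 1 - 1/3 = 2/3.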