.. highlight:: java

.. _label-similarity-metrics:

########################
Similarity Metrics Guide
########################

If you've already checked out the ``/compare`` endpoint of our **Compare API** in our |swagger_api| you'll know that this function returns a ``Metric`` object containing several similarity and distance metrics::

    class Metric {
        cosineSimilarity: 0.34935777200667767
        euclideanDistance: 0.6797804208600183
        jaccardDistance: 0.809368191721133
        overlappingAll: 175
        overlappingLeftRight: 0.5335365853658537
        overlappingRightLeft: 0.22875816993464052
        sizeLeft: 328
        sizeRight: 765
        weightedScoring: 56.1925405151255
    }

.. |swagger_api| raw:: html

    interactive API documentation

These metrics each display a different perspective on similarity. It is up to you, the user, to decide which one (or which combination of metrics) is most appropriate for your use case.

There is also a ``/compare/bulk`` endpoint, allowing multiple pair-wise comparisons with one call and returning a list of ``Metric`` objects corresponding to the input list of pairs to compare.

On this page you can find a brief explanation of each of the metrics along with some links to further reading.

Cosine Similarity
#################

Specifically, cosine similarity is defined as a measure of similarity between two vectors of an inner product space that measures the cosine of the angle between them. Generally speaking, the closer this value is to **1.0** the more similar the input terms are to each other.

As a guide to using **Cortical.io's Retina** we find that a value of around **0.3** indicates a level of similarity which is sufficient for general purposes. You may, of course, require a stricter definition of similarity (or perhaps a looser one) - this is entirely up to you. **Cortical.io's** **Compare API** (which can be found via the **Compare** button in our |swagger_api|) lets you decide where to place the threshold.

You can find more detailed information about `cosine similarity on Wikipedia `_.
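
As a rough illustration of the thresholding described above, the following sketch checks a cosine similarity value against the **0.3** guideline. It also recomputes the example value from the overlap counts, under the assumption that both fingerprints are treated as binary vectors; this guide does not state how the API computes the value, although the arithmetic does reproduce the example figure. The class and method names are hypothetical and are not part of the Cortical.io client library.

.. code-block:: java

    public class CosineThresholdSketch {
        // Guideline value from the text above; tighten or loosen it as needed.
        static final double DEFAULT_THRESHOLD = 0.3;

        // Cosine similarity of two binary vectors, expressed through the number
        // of active positions on each side and the number they share.
        // Assumption: fingerprints are binary; this is not confirmed by the guide.
        static double binaryCosine(int overlap, int sizeLeft, int sizeRight) {
            if (sizeLeft == 0 || sizeRight == 0) {
                return 0.0; // an empty fingerprint shares nothing with anything
            }
            return overlap / Math.sqrt((double) sizeLeft * sizeRight);
        }

        // True when the similarity reaches the chosen threshold.
        static boolean isSimilar(double cosineSimilarity, double threshold) {
            return cosineSimilarity >= threshold;
        }

        public static void main(String[] args) {
            // overlappingAll, sizeLeft and sizeRight from the example Metric object
            double cosine = binaryCosine(175, 328, 765);
            System.out.println(cosine);                               // roughly 0.349
            System.out.println(isSimilar(cosine, DEFAULT_THRESHOLD)); // true
        }
    }

Keeping the threshold as a parameter rather than a constant inside ``isSimilar`` reflects the point made above: where to draw the line is a decision for each use case.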

Euclidean Distance
##################

In mathematics, the Euclidean distance or Euclidean metric is the *ordinary* distance between two points that one would measure with a ruler. This means the closer this value is to **0.0** (zero), the more similar the two items are.

For more details see the `Euclidean Distance Wikipedia entry `_.

Jaccard Distance
################

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and is obtained by subtracting the Jaccard coefficient from **1.0**. The closer the Jaccard distance is to **1.0** the more dissimilar the two items are. The example value above is consistent with the following calculation: 1 - 175/(328 + 765 - 175).

More detailed information can be found on `Wikipedia's Jaccard Distance page `_.

Overlapping
###########

*Overlapping* is a **Cortical.io** defined similarity measure that shows the number of overlapping points between the items to compare (**overlappingAll**, 175 in the example above). *Left* is the first item in the comparison and *right* is the second item.

**overlappingLeftRight** refers to the fraction of positions of the left side included in the right side (using the example values above this would be the result of the following calculation: 175/328), and **overlappingRightLeft** refers to the fraction of positions of the right side included in the left side (using the example values above this would be the result of the following calculation: 175/765).

When combined with the image view of a **Semantic Fingerprint**, the overlapping positions enable a qualitative measure of similarity, in that it is then possible to identify the overlapping regions and to isolate the semantics of a particular region.

Weighted Scoring
################

This is a **Cortical.io** defined weighting for similarity measures. The higher the weighting, the more similar the terms.
This measure can be used in one-to-many comparisons where one side of the comparison remains constant - for instance, an email filter, where the filter remains constant but is compared to many different emails.

API Clients
###########

The ``FullClient`` object available in the `Java `_, `Python `_, and `JavaScript `_ client libraries has the following methods for calling the ``compare`` endpoints:

* ``compare``
* ``compareBulk``
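
Because ``/compare/bulk`` returns one ``Metric`` per input pair, a one-to-many comparison such as the email filter mentioned under Weighted Scoring comes down to picking the result with the highest ``weightedScoring``. The sketch below shows only that selection step. It does not call the client library: ``ScoredCandidate`` is a hypothetical stand-in for a bulk result entry, and the labels and scores are invented for illustration.

.. code-block:: java

    import java.util.Arrays;
    import java.util.List;

    public class BestMatchSketch {
        // Hypothetical stand-in for one entry of a bulk comparison result,
        // pairing a caller-chosen label with that entry's weightedScoring.
        static final class ScoredCandidate {
            final String label;
            final double weightedScoring;

            ScoredCandidate(String label, double weightedScoring) {
                this.label = label;
                this.weightedScoring = weightedScoring;
            }
        }

        // Returns the candidate with the highest weightedScoring,
        // or null when the list is empty.
        static ScoredCandidate bestMatch(List<ScoredCandidate> candidates) {
            ScoredCandidate best = null;
            for (ScoredCandidate candidate : candidates) {
                if (best == null || candidate.weightedScoring > best.weightedScoring) {
                    best = candidate;
                }
            }
            return best;
        }

        public static void main(String[] args) {
            // Invented scores: one constant filter compared against three emails.
            List<ScoredCandidate> results = Arrays.asList(
                new ScoredCandidate("email-1", 12.5),
                new ScoredCandidate("email-2", 56.2),
                new ScoredCandidate("email-3", 30.0));
            System.out.println(bestMatch(results).label); // prints "email-2"
        }
    }

Comparing weighted scores against each other in this way relies on one side of every comparison being the same item, as described under Weighted Scoring.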