# Similarity Metrics Guide¶

If you’ve already checked out the `/compare` endpoint of our **Compare API** in our interactive API documentation
you’ll know that this function returns a `Metric` object containing several similarity and distance metrics:

```
class Metric {
cosineSimilarity: 0.34935777200667767
euclideanDistance: 0.6797804208600183
jaccardDistance: 0.809368191721133
overlappingAll: 175
overlappingLeftRight: 0.5335365853658537
overlappingRightLeft: 0.22875816993464052
sizeLeft: 328
sizeRight: 765
weightedScoring: 56.1925405151255
}
```

These metrics each display a different perspective on similarity. It is up to you, the user, to decide which one (or which combination of metrics) is most appropriate for your use case.

There is also a `/compare/bulk` endpoint, allowing multiple pair-wise comparisons with one call and returning a list of `Metric` objects corresponding to the input list of pairs to compare.

On this page you can find a brief explanation of each of the metrics along with some links to further reading.

## Cosine Similarity¶

Specifically cosine similarity is defined as a measure of similarity between two vectors of an inner product space that measures the cosine of
the angle between them. Generally speaking the closer this value is to **1.0** the more similar the input terms are to each other. As a guide
to using **Cortical.io’s Retina** we find that a value of around **0.3** indicates a level of similarity which is sufficient
for general purposes. You may, of course, require a stricter definition of similarity (or perhaps a looser one) - this
is entirely up to you.

**Cortical.io’s** **Compare API** (which can be found via the **Compare** button in our interactive API documentation) lets you decide where to place the threshold.
You can find more detailed information about cosine similarity on Wikipedia.

## Euclidean Distance¶

In mathematics, the Euclidean distance or Euclidean metric is the *ordinary* distance between two points that one would
measure with a ruler. This means the closer this value is to **0.0** (zero), the closer the two items are with respect to similarity.

For more details see the Euclidean Distance Wikipedia entry.

## Jaccard Distance¶

The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and
is obtained by subtracting the Jaccard coefficient from **1.0**. The closer the Jaccard distance is to **1.0** the more dissimilar the two
items are.

More detailed information can found on Wikipedia’s Jaccard Distance page.

## Overlapping¶

*Overlapping* is a **Cortical.io** defined similarity measure that shows the number of overlapping points between the
items to compare. *Left* being the first item in the comparison and *right* being the second item. **overlappingLeftRight** refers to
the percentage of positions of the left side included in the right side (using the example values above this would be the result of the following calculation: 175/328), and
**overlappingRightLeft** refers to the percentage of positions of the right side included in the left side (using the example values above this would be the result of the following calculation: 175/765).

When combined with the image view of a **Semantic Fingerprint** the overlapping positions enable a qualitative measure of similarity,
in that it is then possible to identify the overlapping regions and to isolate the semantics of a particular region.

## Weighted Scoring¶

This is a **Cortical.io** defined weighting for similarity measures. The higher the weighting, the more similar the terms.
This measure can be used in one-to-many comparisons where one side of the comparison remains constant - for instance,
an email filter, where the filter remains constant but is compared to many different emails.

## API Clients¶

The `FullClient` object available in the Java,
Python, and
JavaScript client libraries has the following methods for calling the `compare` endpoints:

`compare``compareBulk`