.. highlight:: java
.. _label-similarity-metrics:
########################
Similarity Metrics Guide
########################
If you've already checked out the ``/compare`` endpoint of our **Compare API** in our |swagger_api|,
you'll know that it returns a ``Metric`` object containing several similarity and distance metrics::

    class Metric {
        cosineSimilarity: 0.34935777200667767
        euclideanDistance: 0.6797804208600183
        jaccardDistance: 0.809368191721133
        overlappingAll: 175
        overlappingLeftRight: 0.5335365853658537
        overlappingRightLeft: 0.22875816993464052
        sizeLeft: 328
        sizeRight: 765
        weightedScoring: 56.1925405151255
    }

.. |swagger_api| raw:: html
interactive API documentation
Each metric offers a different perspective on similarity. It is up to you to decide which one, or which combination of metrics, is
most appropriate for your use case.
There is also a ``/compare/bulk`` endpoint, which performs multiple pairwise comparisons in one call and returns a list of ``Metric`` objects, one for each input pair.
On this page you can find a brief explanation of each of the metrics along with some links to further reading.
Cosine Similarity
#################
Cosine similarity is a measure of similarity between two vectors of an inner product space: the cosine of
the angle between them. Generally, the closer this value is to **1.0**, the more similar the input terms are to each other. As a guide
to using **Cortical.io's Retina**, we find that a value of around **0.3** indicates a level of similarity that is sufficient
for general purposes. You may, of course, require a stricter (or perhaps a looser) definition of similarity; this
is entirely up to you.
**Cortical.io's** **Compare API** (which can be found via the **Compare** button in our |swagger_api|) lets you decide where to place the threshold.
You can find more detailed information about `cosine similarity on Wikipedia `_.
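
To make the definition concrete, here is a minimal sketch that assumes a fingerprint can be treated as a binary vector, represented as the set of its active positions. Under that assumption the cosine reduces to the overlap count divided by the square root of the product of the two set sizes. The class and method names are hypothetical and are not part of any Cortical.io client library. Applied to the sample ``Metric`` values above (175 overlapping positions, sizes 328 and 765), this formula gives approximately 0.349, which is consistent with the ``cosineSimilarity`` field.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
class CosineSketch {

    // Cosine similarity of two binary vectors given as sets of active positions:
    // |left ∩ right| / sqrt(|left| * |right|).
    static double cosine(Set<Integer> left, Set<Integer> right) {
        if (left.isEmpty() || right.isEmpty()) {
            return 0.0; // convention chosen in this sketch for empty inputs
        }
        Set<Integer> overlap = new HashSet<>(left);
        overlap.retainAll(right);
        return overlap.size() / Math.sqrt((double) left.size() * right.size());
    }

    // Applies a caller-chosen threshold, for example the 0.3 guideline value.
    static boolean isSimilar(Set<Integer> left, Set<Integer> right, double threshold) {
        return cosine(left, right) >= threshold;
    }
}
```

Keeping the threshold as a parameter reflects the point made above: how strict the definition of similarity should be is a decision for the caller.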
Euclidean Distance
##################
In mathematics, the Euclidean distance or Euclidean metric is the *ordinary* distance between two points, the one that would be
measured with a ruler. The closer this value is to **0.0** (zero), the more similar the two items are.
For more details see the `Euclidean Distance Wikipedia entry `_.
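
As a generic illustration of the definition, the sketch below computes the straight-line distance between two points given as coordinate arrays. The class name is hypothetical, and this page does not state how the API scales or normalizes its ``euclideanDistance`` value, so the sketch should not be read as a reproduction of that field.

```java
// Hypothetical helper class, for illustration only.
class EuclideanSketch {

    // Straight-line distance between two points with the same number of coordinates.
    static double distance(double[] p, double[] q) {
        if (p.length != q.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        double sumOfSquares = 0.0;
        for (int i = 0; i < p.length; i++) {
            double diff = p[i] - q[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```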
Jaccard Distance
################
The Jaccard distance, which measures dissimilarity between sample sets, is complementary to the Jaccard coefficient and
is obtained by subtracting the Jaccard coefficient from **1.0**. The closer the Jaccard distance is to **1.0** the more dissimilar the two
items are.
More detailed information can be found on `Wikipedia's Jaccard Distance page `_.
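
The subtraction described above can be sketched as follows, assuming each item is represented as a set of positions. The class and method names are hypothetical. With the sample ``Metric`` values (175 shared positions, sizes 328 and 765), the union has 328 + 765 - 175 = 918 positions, so the distance is 1 - 175/918, approximately 0.809, which is consistent with the ``jaccardDistance`` field.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper class, for illustration only.
class JaccardSketch {

    // Jaccard distance = 1 - |left ∩ right| / |left ∪ right|.
    static double jaccardDistance(Set<Integer> left, Set<Integer> right) {
        Set<Integer> union = new HashSet<>(left);
        union.addAll(right);
        if (union.isEmpty()) {
            return 0.0; // convention chosen in this sketch: two empty sets are identical
        }
        Set<Integer> intersection = new HashSet<>(left);
        intersection.retainAll(right);
        return 1.0 - (double) intersection.size() / union.size();
    }
}
```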
Overlapping
###########
*Overlapping* is a **Cortical.io**-defined similarity measure that shows the number of overlapping points between the
items being compared, where *left* is the first item in the comparison and *right* is the second. **overlappingLeftRight** is
the fraction of the left item's positions that are also included in the right item (with the example values above: 175/328, about 0.534), and
**overlappingRightLeft** is the fraction of the right item's positions that are also included in the left item (with the example values above: 175/765, about 0.229).
When combined with the image view of a **Semantic Fingerprint** the overlapping positions enable a qualitative measure of similarity,
in that it is then possible to identify the overlapping regions and to isolate the semantics of a particular region.
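
The two ratios can be reproduced from the other fields of the sample ``Metric`` object. The sketch below is only an illustration of that arithmetic; the class and method names are hypothetical, while ``overlappingAll``, ``sizeLeft`` and ``sizeRight`` are the field names shown in the example response.

```java
// Hypothetical helper class, for illustration only.
class OverlapSketch {

    // Fraction of one side's positions that also occur on the other side.
    // Pass sizeLeft to obtain overlappingLeftRight, sizeRight for overlappingRightLeft.
    static double overlapRatio(int overlappingAll, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        return (double) overlappingAll / size;
    }
}
```

For the example values, ``overlapRatio(175, 328)`` corresponds to ``overlappingLeftRight`` and ``overlapRatio(175, 765)`` to ``overlappingRightLeft``.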
Weighted Scoring
################
This is a **Cortical.io**-defined weighting for similarity measures. The higher the weighting, the more similar the terms.
This measure can be used in one-to-many comparisons where one side of the comparison remains constant, for instance
an email filter, where the filter remains constant but is compared to many different emails.
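
A one-to-many comparison of this kind might be post-processed as in the sketch below, which orders candidates by their ``weightedScoring`` value. It assumes the scores have already been obtained by comparing each candidate against the same fixed item; the class name, the map of ids to scores, and the ids themselves are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class WeightedScoringRanker {

    // Returns candidate ids ordered from highest to lowest weightedScoring.
    static List<String> rank(Map<String, Double> scoresById) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scoresById.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, Double> entry : entries) {
            ids.add(entry.getKey());
        }
        return ids;
    }
}
```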
API Clients
###########
The ``FullClient`` object available in the `Java `_,
`Python `_, and
`JavaScript `_ client libraries has the following methods for calling the ``compare`` endpoints:
* ``compare``
* ``compareBulk``