Introduction

The introduction addresses the following questions:

  • What is the Retina?
  • What is a semantic fingerprint?
  • What are the advantages of the Retina?

Brevity is king. We value your time. Using the Retina is meant to be as easy and swift as possible. The same goes for the documentation.

What is the Retina?

In short, the Retina is a sparse distributed semantic space (also referred to as Distributional Memory [Baroni2010]). This approach parallels state-of-the-art research in cognitive neuroscience and psycholinguistics, which assumes that the human brain stores language in the form of a distributed memory (in contrast to earlier models built on the idea of symbolic storage).

_images/journal.pone.0008622.g001.png

Observed word-related activation patterns in cortical areas [JUST2010]

The figure above outlines the principle of a distributed memory. Words are not stored as single symbolic units (i.e., there is no single neuron storing the term ‘dog’). Rather, each word is represented by an activation pattern spread across many neurons. The same applies to the Retina. Words are not stored and processed at the character level, but are represented in the form of a semantic fingerprint.

What is a Semantic Fingerprint?

A semantic fingerprint is a sparse distributed representation.

_images/fingerprints_en_associative_mouse_rat.png

Semantic fingerprints of the terms mouse and rat.

As can be seen in the figure above, the representations of mouse and rat are based on the composition of sparse semantic bits. Active bits can be interpreted as activated neurons. It takes only a single glance to verify that both semantic fingerprints contain similar activation patterns.
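The comparison described above can be sketched in a few lines. This is a minimal illustration, assuming a fingerprint is modelled as a plain set of active bit positions. The positions, the function names and the choice of the Jaccard measure are assumptions made for this sketch; they are not taken from the Retina API.

```python
# Toy fingerprints: each one is the set of positions of its active bits.
# The positions are made up for illustration, not real Retina output.
mouse = {3, 17, 42, 108, 256, 731, 1024}
rat = {3, 17, 42, 99, 256, 800, 1024}


def overlap(a: set[int], b: set[int]) -> set[int]:
    """Return the bit positions that are active in both fingerprints."""
    return a & b


def jaccard(a: set[int], b: set[int]) -> float:
    """Shared active bits divided by all active bits (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


common = overlap(mouse, rat)
print(sorted(common))                  # the shared "semantic bits"
print(round(jaccard(mouse, rat), 3))   # a simple similarity score in [0, 1]
```

The more active bits two fingerprints share, the higher the score. Other overlap-based measures would work equally well here.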

_images/fingerprints_en_associative_mouse_rat_overlay.png

Common semantic bits of the terms mouse and rat.

The figure above highlights the common patterns of mouse and rat, and emphasizes one of the main advantages of semantic spaces: terms can be compared at the level of meaning. At the symbolic (i.e., character) level, measuring word similarity is hampered by the vocabulary mismatch problem. Vocabulary mismatch means that the same or a similar meaning can be expressed in many different ways (using different vocabulary or inflections). This poses problems in the context of search, where a user might enter a query containing the keyword mammal. An Information Retrieval (IR) system that matches at the character level might fail to retrieve documents containing the term mammals. A commonly applied technique to mitigate this problem is the use of stemming algorithms (e.g., the Porter Stemming Algorithm). Porter stemming would transform mammals into mammal. However, it cannot solve the problem of matching mouse and mice, or rat and mouse.
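The limits of character-level matching can be shown with a small sketch. The stemmer below is a deliberately crude stand-in that only strips a trailing plural "s". It is not the Porter algorithm, and both function names are hypothetical.

```python
def naive_stem(word: str) -> str:
    """Strip a trailing plural 's'. A crude stand-in, NOT the Porter algorithm."""
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word


def character_match(query: str, term: str) -> bool:
    """Character-level match after stemming both sides."""
    return naive_stem(query.lower()) == naive_stem(term.lower())


print(character_match("mammal", "mammals"))  # regular plural: the stems agree
print(character_match("mouse", "mice"))      # irregular plural: the stems differ
print(character_match("rat", "mouse"))       # related meaning, unrelated characters
```

Only the first pair matches. The other two pairs are related in meaning but not in spelling, which is exactly the gap that a comparison at the level of meaning is intended to close.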

_images/fingerprints_en_associative_mice_rat_mammals.png

Semantic fingerprints of the terms mice, rat, mammals.

The above figure highlights how the Retina solves the vocabulary mismatch problem. The figure shows the fingerprints of the terms mice, rat, and mammals. All three terms are closely related to the term mouse. By representing terms as semantic fingerprints the Retina can accurately compute the similarity of these terms. The three fingerprints also allow us to highlight that the sparse distributed representation has a consistent topology. That is, the location of the bits in the fingerprint is meaningful. All three fingerprints share a cluster in the top-right corner (as you view it). This cluster is the main cluster in the right-most mammals fingerprint, representing meaning that relates to an animal-mammal sense.
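To make the idea of a meaningful topology concrete, the sketch below maps flat bit positions onto a two-dimensional grid and collects the shared bits that fall into one region. The grid width of 128, the bit positions and the function names are all assumptions chosen for this illustration; none of them is stated in the text above.

```python
GRID_WIDTH = 128  # assumed grid width for this sketch


def to_row_col(position: int, width: int = GRID_WIDTH) -> tuple[int, int]:
    """Map a flat bit position to (row, column) on a square grid."""
    return divmod(position, width)


def bits_in_region(fingerprint: set[int], rows: range, cols: range,
                   width: int = GRID_WIDTH) -> set[int]:
    """Return the active bits whose grid coordinates fall inside the region."""
    selected = set()
    for position in fingerprint:
        row, col = to_row_col(position, width)
        if row in rows and col in cols:
            selected.add(position)
    return selected


# Made-up fingerprints; positions 96..127 lie in row 0, in the right-most columns.
mice = {100, 101, 110, 5000, 9000}
mammals = {100, 101, 102, 110, 111, 12000}

# "Top-right corner": the first 32 rows and the last 32 columns of the grid.
top_right_rows, top_right_cols = range(0, 32), range(96, 128)
shared_cluster = bits_in_region(mice & mammals, top_right_rows, top_right_cols)
print(sorted(shared_cluster))
```

Because position carries meaning, a cluster that several fingerprints share in the same region can be read as a shared sense of the terms.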

To summarize: Fingerprints are the Retina’s way of representing meaning. The representation consists of a sparse distributed set of bits (neurons), and its topology is meaningful. As we will see later, these attributes not only allow words to be compared, but also extend to a semantic view of whole texts and make it possible to perform meaningful (and interesting!) computations with semantic fingerprints.

Advantages of the Retina

We expect this technology to be easy and intuitive to start using. No prior knowledge of semantics, ontologies or Natural Language Processing is required.

As we have just seen, the Retina lets us compare the meaning of terms at the semantic level. With terms as the basic unit, the fingerprint representation also extends to the semantic content of texts, which are represented in the same space. This makes it possible to compare the semantic content of two texts with a single call to the API.

Using these simple tools, you could build a semantic search engine by indexing the documents in a database with their semantic fingerprints. The index could then be queried with a fingerprint created from another text or term, and semantically related documents could be retrieved.
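The search idea can be sketched as follows. This is an in-memory illustration only: the fingerprints are made-up sets of bit positions, the index is a plain dictionary, and the names similarity and search are hypothetical. In practice the fingerprints would come from the Retina API, which this sketch does not call.

```python
def similarity(a: set[int], b: set[int]) -> float:
    """Shared active bits divided by all active bits (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def search(index: dict[str, set[int]], query: set[int],
           top_k: int = 3) -> list[tuple[str, float]]:
    """Rank indexed documents by fingerprint similarity to the query."""
    scored = [(doc_id, similarity(fp, query)) for doc_id, fp in index.items()]
    scored = [item for item in scored if item[1] > 0]  # drop unrelated documents
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy index: document id -> fingerprint (made-up bit positions).
index = {
    "doc_rodents": {1, 2, 3, 4, 5},
    "doc_pets": {3, 4, 5, 6, 7, 8},
    "doc_finance": {50, 51, 52},
}

query_fp = {1, 2, 3, 4}  # fingerprint of the query text or term
results = search(index, query_fp)
for doc_id, score in results:
    print(doc_id, round(score, 2))
```

Documents are ranked by how much of their fingerprint overlaps with the query, and documents without any shared bits are left out.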

The Retina API works at a high processing speed. It also lets you parallelize your API requests and combine multiple queries in a single bulk request.

In addition, meaningful semantic operations can be performed on fingerprints. For example, the computer-hardware-related meaning of the term mouse can be removed by performing simple operations on the fingerprints.
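One way to picture such an operation is as set arithmetic on active bits. The sketch below assumes that removing a meaning amounts to a set difference. That assumption, the bit positions and the function name are illustrative and do not describe how the Retina API implements the operation.

```python
# Made-up fingerprints: bits 1-4 stand for the animal sense,
# bits 20-24 for the computer hardware sense.
mouse = {1, 2, 3, 4, 20, 21, 22}
computer_hardware = {20, 21, 22, 23, 24}


def remove_meaning(fingerprint: set[int], unwanted: set[int]) -> set[int]:
    """Keep only the active bits that are not part of the unwanted fingerprint."""
    return fingerprint - unwanted


mouse_animal = remove_meaning(mouse, computer_hardware)
print(sorted(mouse_animal))
```

The result keeps the bits tied to the animal sense and shares no bits with the hardware fingerprint.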

Furthermore, we offer functionality for listing similar terms for a given fingerprint, as well as for disambiguating any term into its respective contexts.

Now let’s test-drive the Retina API: go to Quick Start.

References

[JUST2010] Just, M.A. et al. (2010). A neurosemantic theory of concrete noun representation based on the underlying brain codes. PLoS ONE, 5(1).
[Baroni2010] Baroni, M. & Lenci, A. (2010). Distributional Memory: A General Framework for Corpus-Based Semantics. Computational Linguistics, 36(4), pp. 673-721.