Cosine Similarity (Bag of Words Approach)

Input Data #statistics

Cosine Similarity is a measure of the similarity between two non-zero vectors of an inner product space, defined as the cosine of the angle between them. It is useful for determining how similar two datasets are. It does not factor in the magnitude of the vectors; it depends only on the angle between them.
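
As a rough illustration of the definition above, the following Python sketch computes the cosine of the angle between two equal-length numeric vectors as their dot product divided by the product of their lengths. The function name cosine_similarity is chosen here for illustration only and is not taken from this calculator.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # The measure is only defined for non-zero vectors.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")

    return dot / (norm_a * norm_b)
```

Because only the angle matters, scaling either vector leaves the result unchanged: [1, 2, 3] and [2, 4, 6] point in the same direction, so their similarity is 1 (up to floating-point rounding), while perpendicular vectors such as [1, 0] and [0, 1] give 0.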

This calculator can be used to calculate the Cosine Similarity between:

  1. Two comma-separated datasets made up of numerals, e.g. Dataset A = { 0, 1, 2, 3, ... n } and Dataset B = { 100, 110, 120, ... k }, or
  2. Two strings, e.g. Dataset A = 'Far far away, behind the word mountains, far from the countries Vokalia and Consonantia' and Dataset B = 'Separated they live in Bookmarksgrove right at the coast of the Semantics, a large language ocean.'

NB: The datasets are converted into vectors internally using the "Bag of Words" approach before the Cosine Similarity is calculated.
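
A minimal Python sketch of one way a "Bag of Words" vectorization could work is given here. The tokenization rule (lowercasing and splitting on non-word characters) and the names bag_of_words and cosine_similarity_text are assumptions made for illustration; the calculator's actual internals are not documented on this page.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count token occurrences (assumed rule: lowercase, split on non-word characters)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity_text(text_a, text_b):
    """Vectorize both texts over their shared vocabulary, then compare the count vectors."""
    counts_a = bag_of_words(text_a)
    counts_b = bag_of_words(text_b)

    # One vector component per distinct token seen in either text.
    vocabulary = sorted(set(counts_a) | set(counts_b))
    vec_a = [counts_a[token] for token in vocabulary]
    vec_b = [counts_b[token] for token in vocabulary]

    dot = sum(x * y for x, y in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(x * x for x in vec_a))
    norm_b = math.sqrt(sum(y * y for y in vec_b))

    # A text with no tokens yields a zero vector, for which the measure is undefined.
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each input must contain at least one token")

    return dot / (norm_a * norm_b)
```

Under these assumptions, two inputs that share no tokens score 0, and two inputs with the same tokens in the same proportions score 1 regardless of word order, since a bag of words discards ordering.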

NB: To calculate the Cosine Similarity of two vectors use this calculator.

Output Result #statistics