Calculate Jaccard Similarity of two character vectors

Usage

jaccard_similarity(a, b, ngram_width = 2, nthread = NULL)

Arguments

a

the first character vector

b

the second character vector

ngram_width

the length, in characters, of the shingles (n-grams) used in the similarity calculation

nthread

Maximum number of threads to use. If NULL (default), Rayon's global thread pool is used, which typically uses all logical CPU cores available.

Value

a numeric vector of Jaccard similarities, one for each pair of corresponding strings in a and b

Examples

jaccard_similarity(
  c("the quick brown fox", "jumped over the lazy dog"),
  c("the quck bron fx", "jumped over hte lazy dog")
)
#> [1] 0.5714286 0.7692308
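
To make the calculation concrete, here is a minimal base-R sketch of Jaccard similarity over character shingles. It is an illustration only, not the package's implementation. The helper names shingles() and jaccard_sketch() are hypothetical, and it assumes each string is reduced to its set of unique overlapping character n-grams, with the similarity taken as the size of the intersection divided by the size of the union. Under that assumption it gives the same values as the example above, though edge cases such as strings shorter than ngram_width may be handled differently by the package.

```r
# Hypothetical helper: the unique overlapping character n-grams of one string.
shingles <- function(s, n = 2) {
  if (nchar(s) < n) return(character(0))
  starts <- seq_len(nchar(s) - n + 1)
  unique(substring(s, starts, starts + n - 1))
}

# Hypothetical helper: element-wise Jaccard similarity of two character vectors.
# Returns NaN for a pair in which neither string yields any shingles (0 / 0).
jaccard_sketch <- function(a, b, n = 2) {
  mapply(function(x, y) {
    sx <- shingles(x, n)
    sy <- shingles(y, n)
    length(intersect(sx, sy)) / length(union(sx, sy))
  }, a, b, USE.NAMES = FALSE)
}

jaccard_sketch(
  c("the quick brown fox", "jumped over the lazy dog"),
  c("the quck bron fx", "jumped over hte lazy dog")
)
#> [1] 0.5714286 0.7692308
```

For the first pair, the two strings share 12 of 21 distinct bigrams (12 / 21 ≈ 0.571). For the second, they share 20 of 26 (20 / 26 ≈ 0.769).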