This is a port of the lsh_probability function from the textreuse package, with arguments changed to reflect the hyperparameters in this package. It gives the probability that two strings whose Jaccard similarity is given by the similarity argument will be matched, given the chosen band width and number of bands.
Arguments
- similarity
The Jaccard similarity of the two strings you want to compare, a value between 0 and 1.
- n_bands
The number of LSH bands used in hashing.
- band_width
The number of hashes in each band.
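
The calculation behind these three arguments can be illustrated with the standard LSH banding formula: with b bands of r hashes each, two strings of Jaccard similarity s share at least one identical band with probability 1 - (1 - s^r)^b. The sketch below is written in Python purely for illustration. The function name lsh_match_probability is hypothetical and is not part of this package, and the sketch assumes that the ported function follows this standard formula.

```python
def lsh_match_probability(similarity: float, n_bands: int, band_width: int) -> float:
    """Standard LSH banding probability (hypothetical helper, not this package's API).

    Assumes each of the band_width hashes in a band agrees independently
    with probability equal to the Jaccard similarity.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    if n_bands < 1 or band_width < 1:
        raise ValueError("n_bands and band_width must be positive integers")

    # Probability that every hash in a single band agrees.
    p_band_matches = similarity ** band_width

    # The pair is matched if at least one of the n_bands bands agrees.
    return 1.0 - (1.0 - p_band_matches) ** n_bands


if __name__ == "__main__":
    # Print the match probability at a few similarity levels
    # for an illustrative setting of 20 bands of 5 hashes each.
    for s in (0.2, 0.5, 0.8):
        print(s, lsh_match_probability(s, n_bands=20, band_width=5))
```

Under this formula, increasing band_width makes matches stricter (fewer low-similarity pairs are matched), while increasing n_bands makes them more permissive, so the two hyperparameters are usually tuned together.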