Help Choose the Appropriate LSH Hyperparameters
Source: R/lsh_properties.R
jaccard_hyper_grid_search.Rd
Runs a grid search to find the hyperparameters that will achieve an (s1,s2,p1,p2)-sensitive locality sensitive hash. A locality sensitive hash is called (s1,s2,p1,p2)-sensitive if two strings with a similarity less than s1 have a less than p1 chance of being compared, while two strings with a similarity of at least s2 have a greater than p2 chance of being compared. As an example, a (.1,.7,.001,.999)-sensitive LSH means that strings with similarity less than .1 will have a less than .1% chance of being compared, while strings with a similarity of at least .7 will have a greater than 99.9% chance of being compared.
Value
A named vector with the hyperparameters that will meet the LSH criteria while reducing runtime.
Examples
# Find the parameters that will minimize runtime while ensuring that
# two strings with similarity .1 will be compared less than .1% of the time, and
# strings with .9 similarity will have a greater than 99.5% chance of being compared:
jaccard_hyper_grid_search(.1, .9, .001, .995)
#> band_width n_bands
#> 4 5
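The returned parameters can be sanity-checked by hand. The sketch below assumes the standard MinHash banding scheme, in which two strings with Jaccard similarity s are compared when they agree on every hash in at least one band, giving a comparison probability of 1 - (1 - s^band_width)^n_bands. The helper `collision_prob` is a hypothetical name introduced only for illustration; it is not part of the package.

```r
# Hypothetical helper (not part of the package API): probability that two
# strings with Jaccard similarity `s` match on at least one band, assuming
# the standard MinHash banding scheme.
collision_prob <- function(s, band_width, n_bands) {
  1 - (1 - s^band_width)^n_bands
}

# Plug in the hyperparameters returned by the grid search above.
p_low  <- collision_prob(0.1, band_width = 4, n_bands = 5)
p_high <- collision_prob(0.9, band_width = 4, n_bands = 5)

# Both criteria from the call (.1, .9, .001, .995) should hold.
stopifnot(p_low < 0.001)   # dissimilar pairs are rarely compared
stopifnot(p_high > 0.995)  # similar pairs are almost always compared
```

Under this assumption, a wider band lowers the chance that dissimilar strings are compared, while more bands raise the chance that similar strings are compared, which is the trade-off the grid search balances against runtime.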