Fuzzy joins for Jaccard distance

Fuzzy joins for Hamming distance

hamming_inner_join() hamming_anti_join() hamming_left_join() hamming_right_join() hamming_full_join()
Fuzzy joins for Hamming distance using Locality Sensitive Hashing

Fuzzy joins for Euclidean distance

euclidean_anti_join() euclidean_inner_join() euclidean_left_join() euclidean_right_join() euclidean_full_join()
Fuzzy joins for Euclidean distance using Locality Sensitive Hashing

Probabilistic Matching Algorithms

em_link()
Fit a Probabilistic Matching Model using Naive Bayes + E.M.

String deduplication

jaccard_string_group()
Fuzzy String Grouping Using Minhashing

Utilities

jaccard_similarity()
Calculate Jaccard Similarity of two character vectors
hamming_distance()
Calculate Hamming distance of two character vectors
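
As a rough illustration of the two measures named above, the following base R sketch computes a Jaccard similarity over character n-gram (shingle) sets and a Hamming distance for a single pair of strings. The helper names `shingles()`, `jaccard_sketch()`, and `hamming_sketch()` and the default n-gram width of 2 are assumptions made for this example. They are not the package's functions and may not match its implementation or defaults.

```r
# Hypothetical helpers for illustration only; not the package's implementation.

# Split one string into its set of overlapping character n-grams (shingles).
shingles <- function(x, n = 2) {
  chars <- strsplit(x, "")[[1]]
  if (length(chars) < n) return(character(0))
  starts <- seq_len(length(chars) - n + 1)
  grams <- vapply(
    starts,
    function(i) paste(chars[i:(i + n - 1)], collapse = ""),
    character(1)
  )
  unique(grams)
}

# Jaccard similarity: size of the intersection divided by size of the union
# of the two shingle sets. Returns NA when both sets are empty.
jaccard_sketch <- function(a, b, n = 2) {
  sa <- shingles(a, n)
  sb <- shingles(b, n)
  u <- union(sa, sb)
  if (length(u) == 0) return(NA_real_)
  length(intersect(sa, sb)) / length(u)
}

# Hamming distance: number of positions at which two equal-length strings differ.
hamming_sketch <- function(a, b) {
  ca <- strsplit(a, "")[[1]]
  cb <- strsplit(b, "")[[1]]
  stopifnot(length(ca) == length(cb))
  sum(ca != cb)
}

jaccard_sketch("night", "nacht")      # shares 1 of 7 distinct bigrams
hamming_sketch("karolin", "kathrin")  # differs at 3 positions
```

The sketch handles one pair of strings at a time, whereas the utilities listed above are described as operating on character vectors.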

Diagnostics

jaccard_curve()
Plot S-Curve for an LSH with given hyperparameters
jaccard_probability()
Find Probability of Match Based on Similarity
jaccard_hyper_grid_search()
Help Choose the Appropriate LSH Hyperparameters
euclidean_curve()
Plot S-Curve for an LSH with given hyperparameters
euclidean_probability()
Find Probability of Match Based on Similarity
hamming_probability()
Find Probability of Match Based on Similarity
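
To give a sense of what an S-curve for banded MinHash LSH looks like, the sketch below evaluates the standard textbook formula: with `n_bands` bands of `band_width` hashes each, two records with Jaccard similarity `s` share at least one band with probability `1 - (1 - s^band_width)^n_bands`. The function name `lsh_match_probability()` and its argument names are assumptions for this example. This is not the package's code, and the Euclidean and Hamming variants use a different per-hash collision probability.

```r
# Hypothetical helper for illustration only; not the package's implementation.
# Probability that two records with Jaccard similarity `similarity` collide in
# at least one of `n_bands` bands, each made of `band_width` MinHash values.
lsh_match_probability <- function(similarity, n_bands, band_width) {
  stopifnot(
    all(similarity >= 0 & similarity <= 1),
    n_bands >= 1,
    band_width >= 1
  )
  # similarity^band_width: chance that every hash in a single band agrees.
  # (1 - that)^n_bands:    chance that no band agrees.
  1 - (1 - similarity^band_width)^n_bands
}

# Evaluate the curve on a coarse grid of similarities.
s <- seq(0, 1, by = 0.25)
data.frame(
  similarity = s,
  probability = lsh_match_probability(s, n_bands = 20, band_width = 5)
)
```

Under this formula, raising `band_width` pushes the steep part of the curve toward higher similarities, and raising `n_bands` pushes it toward lower ones. That trade-off is what a hyperparameter search over the two values has to balance.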

Data

dime_data
Donors from DIME Database