Authors
William W Cohen, Pradeep Ravikumar, Stephen E Fienberg
Publication date
2003/8/9
Journal
IIWeb
Volume
3
Pages
73-78
Description
Using an open-source Java toolkit of name-matching methods, we experimentally compare string distance metrics on the task of matching entity names. We investigate a number of different metrics proposed by different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. Overall, the best-performing method is a hybrid scheme combining a TFIDF weighting scheme, which is widely used in information retrieval, with the Jaro-Winkler string-distance scheme, which was developed in the probabilistic record linkage community.
Total citations
[Chart: citations per year, 2003-2021]