Authors
William Cohen, Pradeep Ravikumar, Stephen Fienberg
Publication date
2003/8/24
Journal
KDD Workshop on Data Cleaning and Object Consolidation
Volume
3
Pages
73-78
Description
We describe an open-source Java toolkit of methods for matching names and records. We summarize results obtained from using various string distance metrics on the task of matching entity names: these metrics include distance functions proposed by several different communities, including edit-distance metrics, fast heuristic string comparators, token-based distance metrics, and hybrid methods. We then describe an extension to the toolkit which allows records to be compared. We discuss some issues involved in performing a similar comparison for record-matching techniques, and finally present results for some baseline record-matching algorithms which are based on string comparisons between fields.
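To make two of the metric families named above concrete, here is a minimal Python sketch contrasting an edit-distance metric (Levenshtein) with a token-based metric (Jaccard similarity over whitespace tokens). It is an illustration only: the function names `levenshtein` and `jaccard` are hypothetical, the choice of these two particular metrics is an assumption, and none of this is the toolkit's Java API or the authors' implementation.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def jaccard(a: str, b: str) -> float:
    """Token-based similarity: shared tokens divided by all distinct tokens.
    Tokenizing on whitespace and lowercasing are assumptions of this sketch."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty names are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Reordered name tokens: the character-level distance is nonzero,
    # while the token-based similarity ignores word order.
    print(levenshtein("William Cohen", "Cohen William"))
    print(jaccard("William Cohen", "Cohen William"))
```

The two functions disagree on reordered names, which is one reason a comparison across metric families is informative: a token-based score is insensitive to word order, whereas a character-level edit distance is sensitive to it but tolerates small typos within a token.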
Google Scholar articles
W Cohen, P Ravikumar, S Fienberg - Kdd workshop on data cleaning and object …, 2003