Inconsistent Data Management
Overview
Managing data quality, and data inconsistency in particular, has long been a major challenge in both the research and practice of database management. Sources of inconsistency include imprecise data-generation processes, such as mistakes in manual form filling and noisy sensing equipment, as well as data integration, where different source databases may contain conflicting information. The problem has become even more central to data management today, as data repositories increasingly rely on imprecise processes (e.g., crowdsourcing and information extraction from natural language) and on the integration of repositories with varying levels of reliability. In our research, we aim to develop fundamental approaches to managing data quality, including ways to clean and query inconsistent databases and to measure their error level.