ResearchPreference Data Management

Overview

Picture of Preference Data Management Preference data is prevalent variety of domains. The past decade has seen a proliferation in the volume of preference data, the diversity of applications based on preference data, and the richness of preference analysis methods. Examples include rank aggregation in genomic data, management and analysis of elections, and recommendation systems in e-commerce. Management of preference data entails special requirements and opportunities that cast it substantially distinct from general (e.g., relational or graph) data management: certain atomic operations are not supported by standard database engines, yet are very common in preference data analytics; statistical models of preferences fall outside the standard formalisms of “probabilistic databases;” election data requires out-of-databases analysis of the space of outcomes for a given incomplete picture of the voter preferences. Our research develops fundamental frameworks for the management of preference data, with emphasis on the associated complexity and algorithms.

People

Prof. Benny Kimelfeld

Prof. Oded Shmueli

Batya Kenig

Selected Publications

Batya Kenig, Lovro Ilijasic, Haoyue Ping, Benny Kimelfeld, Julia Stoyanovich, "Probabilistic Inference Over Repeated Insertion Models", AAAI 2018 abstractpaper

Distributions over rankings are used to model user preferences in various settings including political elections and electronic commerce. The Repeated Insertion Model (RIM) gives rise to various known probability distributions over rankings, in particular to the popular Mallows model. However, probabilistic inference on RIM is computationally challenging, and provably intractable in the general case. In this paper we propose an algorithm for computing the marginal probability of an arbitrary partially ordered set over RIM. We analyze the complexity of the algorithm in terms of properties of the model and the partial order, captured by a novel measure termed the “cover width”. We also conduct an experimental study of the algorithm over serial and parallelized implementations. Building upon the relationship between inference with rank distributions and counting linear extensions, we investigate the inference problem when restricted to partial orders that lend themselves to efficient counting of their linear extensions.

Benny Kimelfeld, Phokion G. Kolaitis, Julia Stoyanovich, "Computational Social Choice Meets Databases", IJCAI 2018: 317-323 abstractpaper

We develop a novel framework that aims to create bridges between the computational social choice and the database management communities. This framework enriches the tasks currently supported in computational social choice with relational database context, thus making it possible to formulate sophisticated queries about voting rules, candidates, voters, issues, and positions. At the conceptual level, we give rigorous semantics to queries in this framework by introducing the notions of necessary answers and possible answers to queries. At the technical level, we embark on an investigation of the computational complexity of the necessary answers. We establish a number of results about the complexity of the necessary answers of conjunctive queries involving positional scoring rules that contrast sharply with earlier results about the complexity of the necessary winners.

Uzi Cohen, Batya Kenig, Haoyue Ping, Benny Kimelfeld, Julia Stoyanovich, "A Query Engine for Probabilistic Preferences", SIGMOD Conference 2018: 1509-1524 abstractpaper

Models of uncertain preferences, such as Mallows, have been extensively studied due to their plethora of application domains. In a recent work, a conceptual and theoretical framework has been proposed for supporting uncertain preferences as first-class citizens in a relational database. The resulting database is probabilistic, and, consequently, query evaluation entails inference of marginal probabilities of query answers. In this paper, we embark on the challenge of a practical realization of this framework. We first describe an implementation of a query engine that supports querying probabilistic preferences alongside relational data. Our system accommodates preference distributions in the general form of the Repeated Insertion Model (RIM), which generalizes Mallows and other models. We then devise a novel inference algorithm for conjunctive queries over RIM, and show that it significantly outperforms the state of the art in terms of both asymptotic and empirical execution cost. We also develop performance optimizations that are based on sharing computation among different inference tasks in the workload. Finally, we conduct an extensive experimental evaluation and demonstrate that clear performance benefits can be realized by a query engine with built-in probabilistic inference, as compared to a stand-alone implementation with a black-box inference solver.

Benny Kimelfeld, Ester Livshits, Liat Peterfreund, "Detecting Ambiguity in Prioritized Database Repairing", ICDT 2017: 17:1-17:20 abstractpaper

In its traditional definition, a repair of an inconsistent database is a consistent database that differs from the inconsistent one in a “minimal way.” Often, repairs are not equally legitimate, as it is desired to prefer one over another; for example, one fact is regarded more reliable than another, or a more recent fact should be preferred to an earlier one. Motivated by these considerations, researchers have introduced and investigated the framework of preferred repairs, in the context of denial constraints and subset repairs. There, a priority relation between facts is lifted towards a priority relation between consistent databases, and repairs are restricted to the ones that are optimal in the lifted sense. Three notions of lifting (and optimal repairs) have been proposed: Pareto, global, and completion. In this paper we investigate the complexity of deciding whether the priority relation suffices to clean the database unambiguously, or in other words, whether there is exactly one optimal repair. We show that the different lifting semantics entail highly different complexities. Under Pareto optimality, the problem is coNP-complete, in data complexity, for every set of functional dependencies (FDs), except for the tractable case of (equivalence to) one FD per relation. Under global optimality, one FD per relation is still tractable, but we establish Pi-2-p-completeness for a relation with two FDs. In contrast, under completion optimality the problem is solvable in polynomial time for every set of FDs. In fact, we present a polynomial-time algorithm for arbitrary conflict hypergraphs. We further show that under a general assumption of transitivity, this algorithm solves the problem even for global optimality. The algorithm is extremely simple, but its proof of correctness is quite intricate.

Batya Kenig, Benny Kimelfeld, Haoyue Ping, Julia Stoyanovich, "Querying Probabilistic Preferences in Databases", PODS 2017: 21-36 abstractpaper

We propose a novel framework wherein probabilistic preferences can be naturally represented and analyzed in a probabilistic relational database. The framework augments the relational schema with a special type of a relation symbol—a preference symbol. A deterministic instance of this symbol holds a collection of binary relations. Abstractly, the probabilistic variant is a probability space over databases of the augmented form (i.e., probabilistic database). Effectively, each instance of a preference symbol can be represented as a collection of parametric preference distributions such as Mallows. We establish positive and negative complexity results for evaluating Conjunctive Queries (CQs) over databases where preferences are represented in the Repeated Insertion Model (RIM), Mallows being a special case. We show how CQ evaluation reduces to a novel inference problem (of independent interest) over RIM, and devise a solver with polynomial data complexity.

back to index