Bi-Weekly: Probabilistic Databases with an Infinite Open-World Assumption

Tuesday, 10 July 2018, 10:15 am

Location: RWTH Aachen University, Department of Computer Science - Ahornstr. 55, building E3, room 9u10


Speaker: Peter Lindner


Probabilistic databases introduce uncertainty into relational databases by assigning probabilities to several possible instances. Traditionally, they are finite probability spaces over database instances. Such probabilistic databases inherently make a closed-world assumption: facts that do not occur in any instance are treated as impossible rather than merely unlikely. As convincingly argued by Ceylan et al., this leads to implausible results that clash with intuition.
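To make the closed-world effect concrete, here is a minimal sketch of a finite probabilistic database as a probability space over instances. It assumes, purely for convenience, that facts are included independently; the fact names and probabilities are hypothetical and do not come from the talk.

```python
from itertools import combinations
from math import prod

# Hypothetical facts with marginal probabilities (illustrative values only).
facts = {
    ("Knows", "alice", "bob"): 0.8,
    ("Knows", "bob", "carol"): 0.5,
}

def possible_worlds(facts):
    """Yield (instance, probability) pairs, assuming independent facts."""
    names = list(facts)
    for k in range(len(names) + 1):
        for chosen in combinations(names, k):
            world = frozenset(chosen)
            # Multiply p for included facts and (1 - p) for excluded ones.
            p = prod(facts[f] if f in world else 1.0 - facts[f] for f in names)
            yield world, p

def probability(fact, facts):
    """Probability that `fact` occurs: total mass of instances containing it."""
    return sum(p for world, p in possible_worlds(facts) if fact in world)

worlds = list(possible_worlds(facts))
total = sum(p for _, p in worlds)                            # mass of the whole space
listed = probability(("Knows", "alice", "bob"), facts)       # a listed fact
unlisted = probability(("Knows", "alice", "carol"), facts)   # a fact never listed
```

In this sketch the unlisted fact appears in no instance, so its probability is exactly zero, which is the closed-world behaviour described above.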

An open-world assumption, where facts not explicitly listed may have a small positive probability, can yield more reasonable results. The corresponding open-world model of Ceylan et al., however, assumes that all entities in the probabilistic database come from a fixed finite universe.

In this work, we take one step further and propose a model of "truly" open-world probabilistic databases with an infinite universe. This is natural when, for example, we consider entities to be integers, real numbers or strings. While the probability space may become infinitely large, every instance of a probabilistic database remains finite. We provide a sound mathematical framework for infinite probabilistic databases that generalizes the existing theory of finite probabilistic databases. Our main results concern tuple-independent probabilistic databases: we present a generic construction showing that such probabilistic databases exist in the infinite setting, and we characterize when they exist in general. This model can be used to apply open-world semantics to finite probabilistic databases. The construction can also be extended to so-called block-independent-disjoint probabilistic databases.
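As intuition for how infinitely many possible facts can coexist with finite instances, consider the following standard observation. It is offered as a sketch under stated assumptions, not as the characterization presented in the talk. Assume a countable list of facts with marginal probabilities p_1, p_2, ..., each included independently; these symbols are introduced here for illustration. By linearity of expectation, the expected instance size is the sum of the marginals, and by the Borel-Cantelli lemma a convergent sum implies that almost surely only finitely many facts occur.

```latex
% Assumption for illustration: facts f_1, f_2, ... with marginal
% probabilities p_1, p_2, ..., each fact included independently.
\[
  \mathbb{E}\bigl[\,|D|\,\bigr] \;=\; \sum_{i=1}^{\infty} p_i ,
  \qquad
  \sum_{i=1}^{\infty} p_i < \infty
  \;\Longrightarrow\;
  \Pr\bigl(|D| < \infty\bigr) = 1 .
\]
% Example: p_i = 2^{-i} gives \sum_i p_i = 1, so an instance
% contains one fact on average and is almost surely finite.
```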