Peter Lindner: The Theory of Infinite Probabilistic Databases
Statistical models of real-world data typically involve continuous probability distributions such as normal distributions. The traditional theoretical framework of probabilistic databases, however, focuses entirely on finite probabilistic databases. Peter Lindner was the first to develop a theory of probabilistic databases that support probability distributions with infinite sample spaces. Formally, a probabilistic database (PDB) is a probability distribution whose sample space consists of database instances, where each instance can be viewed as a finite set or bag (multiset) of facts. Describing continuous probability distributions over spaces consisting of finite sets is non-trivial and requires advanced concepts from measure theory. Lindner developed a framework for such PDBs and proved that queries in standard query languages have a well-defined semantics. The most intensely studied model of finite PDBs is that of tuple-independent PDBs. Lindner studied countable tuple-independent PDBs and gave necessary and sufficient criteria for their existence. He also proposed a natural extension of tuple independence to the general setting of PDBs with continuous distributions. Furthermore, he studied various approaches to finitely representing infinite PDBs, most importantly via generative models specified in a probabilistic version of Datalog.
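
To make the notion of tuple independence concrete, the following Python sketch covers only the classical finite case with set semantics: each fact carries a marginal probability and is included in an instance independently of all other facts. It is an illustration under these assumptions, not Lindner's formalism. The names Fact, sample_instance and instance_probability are hypothetical, as are the example facts and their probabilities, and the countable and continuous settings treated in the thesis are not modelled here.

```python
import random
from typing import Dict, FrozenSet, Tuple

# A fact is a relation name together with a tuple of arguments (hypothetical encoding).
Fact = Tuple[str, Tuple[int, ...]]


def sample_instance(marginals: Dict[Fact, float], rng: random.Random) -> FrozenSet[Fact]:
    """Draw one database instance: each fact is included independently
    with its own marginal probability."""
    return frozenset(f for f, p in marginals.items() if rng.random() < p)


def instance_probability(marginals: Dict[Fact, float], instance: FrozenSet[Fact]) -> float:
    """Probability of exactly this instance in a finite tuple-independent PDB:
    multiply p for every fact that is present and (1 - p) for every fact that is absent."""
    prob = 1.0
    for f, p in marginals.items():
        prob *= p if f in instance else 1.0 - p
    return prob


# Illustrative marginals for two facts of a unary relation R (made-up values).
marginals: Dict[Fact, float] = {
    ("R", (1,)): 0.5,
    ("R", (2,)): 0.25,
}

only_first = frozenset({("R", (1,))})
p_only_first = instance_probability(marginals, only_first)  # 0.5 * (1 - 0.25)

drawn = sample_instance(marginals, random.Random(0))  # one random instance
```

In this finite setting the sample space is simply the set of all subsets of the listed facts, and the instance probabilities sum to one. Once the set of possible facts becomes infinite, that is no longer automatic, which is where the existence criteria mentioned above come in.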