Bi-Weekly Talk: Peter Lindner: Generative Datalog with Continuous Distributions

Wednesday, Jan 29, 2020, 10:30am

Location: RWTH Aachen University, Department of Computer Science - Ahornstr. 55, building E3, room 9u10

Speaker: Peter Lindner

Probabilistic Databases (PDBs) are a formal model of uncertainty in relational databases. Such uncertainty arises in a variety of practical application scenarios, such as noisy or unreliable input data, data integration, or data cleaning. Recently, Bárány et al. (TODS 2017) proposed a language called "Probabilistic Programming Datalog (PPDL)", which extends classical Datalog rules with random sampling. In a nutshell, PPDL is a declarative probabilistic programming language with very close ties to database applications, and it can be seen as a tool for specifying PDBs. In this talk, we focus on the generative part of the language, "Generative Datalog". While the original language of Bárány et al. only supported discrete probability distributions, we allow continuous distributions given by probability density functions, as well as input databases that are themselves PDBs. We present the formal semantics of the language and discuss various properties and consequences, most notably its support for PDB inputs and its robustness with respect to the order in which rules are applied.