# Learning Definable Relations in Graphs

Copyright: Martin BraunWe study classification problems for elements from simple structures. That is, given an element of a graph we want to estimate, whether it fulfils some property. The property is not given directly, but via the use of examples which consist of such an element from a graph structure and the correct classification. As in other machine learning scenarios from supervised learning, we assume that we can access a given amount of examples but may not ask for the correct classification of a specific node.

We only consider learning algorithms, where the output is a logical formula and a number of constants called parameters. In this framework, we accept an element from the graph if and only if the underlying graph satisfies the formula. This way, every formula describes a hypothesis that is a function that estimates a given property.

It is known that any hypothesis that classifies a training set correctly will also generalize well in the PAC learning model. We therefore aim at finding hypotheses that are consistent with a given training set. We work with formulas from first-order and monadic second-order logic, where the latter extends the former by allowing for additional quantification over unary relations, i.e. over sets of nodes. We also consider restrictions of first-order logic such as quantifier-free and existential or universal formulas which are formulas using only one type quantifiers, or in the quantifier-free case, no quantifiers at all. Those formulas are then evaluated over simple structures, such as strings, trees and structures of bounded degree.

The goal is to design learning algorithms that run in time polynomial in the size of the training set, independently of or at least sublinear in the size of the whole data set. In case that this is not possible, we consider algorithms that first create an index structure for the underlying structure in linear (or polynomial) time and then find a consistent hypothesis in polylogarithmic time.

We also consider extensions of the above learning problem, especially regarding higher-arity learning problems, where instead of positions in the graph, we instead consider tuples of the graph and try to find hypotheses based on logical formulas which are consistent with the training examples. Since we are learning to predict membership of tuples in a higher arity relation, this means that we learn new relations over the graph.

We study classification problems for elements from simple structures. That is, given an element of a graph we want to estimate, whether it fulfils some property. The property is not given directly, but via the use of examples which consist of such an element from a graph structure and the correct classification. As in other machine learning scenarios from supervised learning, we assume that we can access a given amount of examples but may not ask for the correct classification of a specific node.

We only consider learning algorithms, where the output is a logical formula and a number of constants called parameters. In this framework, we accept an element from the graph if and only if the underlying graph satisfies the formula . This way, every formula describes a hypothesis that is a function that estimates a given property.

It is known that any hypothesis that classifies a training set correctly will also generalize well in the PAC learning model. We therefore aim at finding hypotheses that are consistent with a given training set. We work with formulas from first-order and monadic second-order logic, where the latter extends the former by allowing for additional quantification over unary relations, i.e. over sets of nodes. We also consider restrictions of first-order logic such as quantifier-free and existential or universal formulas which are formulas using only one type quantifiers, or in the quantifier-free case, no quantifiers at all. Those formulas are then evaluated over simple structures, such as strings, trees and structures of bounded degree.

The goal is to design learning algorithms that run in time polynomial in the size of the training set, independently of or at least sublinear in the size of the whole data set. In case that this is not possible, we consider algorithms that first create an index structure for the underlying structure in linear (or polynomial) time and then find a consistent hypothesis in polylogarithmic time.

We also consider extensions of the above learning problem, especially regarding higher-arity learning problems, where instead of positions in the graph, we instead consider tuples of the graph and try to find hypotheses based on logical formulas which are consistent with the training examples. Since we are learning to predict membership of tuples in a higher arity relation, this means that we learn new relations over the graph.