Guest Talk: Data generation for programming by example

Wednesday, 28.11.2018, 10:30am

Location: RWTH Aachen University, Department of Computer Science - Ahornstr. 55, building E3, room 9u10

Speaker: Nathanaël Fijalkow

Programming by example is the problem of synthesising a program from a small set of input-output pairs. Despite having found applications in several areas, it is notoriously computationally expensive. Recent works have considered hybrid approaches combining ML- and PL-based techniques. These techniques require generating a training dataset, which raises significant difficulties in finding the most informative inputs to characterise a given program. In this talk we show that the data generation procedure has a significant impact on performance. The novelty of our approach lies in using an SMT solver to synthesise meaningful inputs with varied behaviour for a given program. By testing against several distributions, we show that our constraint-based approach improves the generalisability of the models. Our results are consistent across two common learning architectures used in previous work.