RMSS 2017
Nuala Sheehan
Pedigree Reconstruction from Genetic Marker Data
The problem of estimating relationships amongst a group of individuals from genetic marker data (‘pedigree reconstruction’) is of interest in many diverse areas. Vast numbers of genetic markers are now routinely genotyped on large population cohorts (e.g. UK Biobank) of purportedly unrelated individuals. These cohorts undoubtedly contain relatives, and dense marker sets are hugely informative about relatedness. Standard marker-based estimators of pairwise relatedness are often used to adjust association analyses for cryptic relatedness, which is thus treated as a nuisance factor. Full relationship information, as provided by a pedigree, could perhaps be exploited to improve inference if it could be reliably recovered. Pedigrees are also important for identifying rare disease alleles via linkage analysis, are essential to understanding parent-of-origin genetic effects, and inform the structure of human populations.
In theory, estimating the pedigree for a given set of individuals from genetic marker data requires considering all possible relationships amongst them and computing the likelihood of each. For large problems, brute-force enumeration is clearly impractical. The reconstruction problem can be formulated as a problem of graphical structure estimation and is known to be NP-hard. We propose an integer linear programming (ILP) approach to graphical structure estimation, which is adapted to find valid pedigrees by imposing appropriate constraints. Our method, unlike others, is guaranteed to return a maximum likelihood pedigree for the standard situation where all individuals are observed at unlinked marker loci. The more realistic situation, where observed individuals are typically connected by (possibly many) missing individuals, poses a far harder problem, however. Such applications will require efficient formulations of general-purpose and graph learning algorithms. In particular, a Bayesian approach enabling the incorporation of additional prior information in a principled way would seem appropriate.
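
To make the brute-force baseline concrete, the following is a minimal Python sketch of exhaustive maximum likelihood pedigree search on a toy problem. It is not the ILP method described above. It only illustrates why the pedigree log-likelihood decomposes into one local score per individual given its parents, which is the structure that graph learning formulations exploit. All names and data (SEX, GENO, FREQ, best_pedigree and the helper functions) are hypothetical. The sketch assumes biallelic unlinked loci, known allele frequencies, Hardy-Weinberg proportions for founders, no genotyping error, and that each individual has either no parents or both parents among the observed individuals.

```python
import itertools
import math

# Hypothetical toy data: genotype = copies of allele A (0, 1 or 2) at each unlinked locus.
SEX = {"a": "M", "b": "F", "c": "F"}
GENO = {"a": [0, 2, 1, 0], "b": [2, 2, 1, 0], "c": [1, 2, 2, 0]}
FREQ = [0.5, 0.5, 0.5, 0.5]  # assumed population frequency of allele A at each locus


def founder_loglik(i):
    """Log-likelihood of individual i as a founder (Hardy-Weinberg proportions)."""
    total = 0.0
    for g, p in zip(GENO[i], FREQ):
        total += math.log([(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g])
    return total


def child_loglik(i, father, mother):
    """Log-likelihood of individual i given both parents (Mendelian transmission)."""
    total = 0.0
    for g, gf, gm in zip(GENO[i], GENO[father], GENO[mother]):
        tf, tm = gf / 2.0, gm / 2.0  # chance each parent passes on allele A
        prob = [(1 - tf) * (1 - tm), tf * (1 - tm) + (1 - tf) * tm, tf * tm][g]
        if prob == 0.0:
            return -math.inf  # Mendelian inconsistency
        total += math.log(prob)
    return total


def parent_options(i):
    """Candidate parent sets for i: no parents, or one (father, mother) pair."""
    fathers = [j for j in SEX if j != i and SEX[j] == "M"]
    mothers = [j for j in SEX if j != i and SEX[j] == "F"]
    return [None] + list(itertools.product(fathers, mothers))


def is_acyclic(assignment):
    """A valid pedigree has no individual that is its own ancestor."""
    def reaches(start, current, visited):
        for parent in assignment[current] or ():
            if parent == start:
                return True
            if parent not in visited:
                visited.add(parent)
                if reaches(start, parent, visited):
                    return True
        return False
    return not any(reaches(i, i, set()) for i in assignment)


def best_pedigree():
    """Enumerate every parent-set assignment and keep the highest-scoring valid one."""
    names = list(SEX)
    best, best_score = None, -math.inf
    for choice in itertools.product(*(parent_options(i) for i in names)):
        assignment = dict(zip(names, choice))
        if not is_acyclic(assignment):
            continue
        score = sum(
            founder_loglik(i) if ps is None else child_loglik(i, *ps)
            for i, ps in assignment.items()
        )
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score
```

On this toy data, best_pedigree() selects the pedigree in which c is the child of a and b. The number of candidate assignments grows exponentially with the number of individuals, which is why enumeration of this kind is only feasible for very small problems.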