Geib, Manuela; Peter F. Stadler and Marc Hellmuth

Reciprocal best matches play an important role in numerous applications in computational biology, in particular as the basis of many widely used tools for orthology assessment. Nevertheless, very little is known about their mathematical structure. Here, we investigate the structure of reciprocal best match graphs (RBMGs). In order to abstract from the details of measuring distances, we define reciprocal best matches here as pairwise most closely related leaves in a gene tree, arguing that conceptually this is the notion that is pragmatically approximated by distance- or similarity-based heuristics. We start by showing that a graph G is an RBMG if and only if its quotient graph w.r.t. a certain thinness relation is an RBMG. Furthermore, it is necessary and sufficient that all connected components of G are RBMGs. The main result of this contribution is a complete characterization of RBMGs with 3 colors/species that can be checked in polynomial time. For 3 colors, there are three distinct classes of trees that are related to the structure of the phylogenetic trees explaining them. We derive an approach to recognize RBMGs with an arbitrary number of colors; it remains open however, whether a polynomial-time for RBMG recognition exists. In addition, we show that RBMGs that at the same time are cographs (co-cRBMGs) can be recognized in polynomial time. Co-cRBMGs are characterized in terms of hierarchically colored cographs, a particular class of vertex colored cographs that is introduced here. The (least resolved) trees that explain co-cRBMGs can be constructed in polynomial time.