Accurate single-nucleotide polymorphism (SNP) calls are crucial for robust evolutionary and population genetic inferences in genomic analyses. Such inferences can reveal the time-scales and processes associated with the emergence and spread of pandemic plant pathogens, such as the rice blast fungus Magnaporthe oryzae (syn. Pyricularia oryzae). However, the specificity and sensitivity of SNP calls depend on the filtering parameters applied to the data. Here, we used a benchmarking approach to evaluate the impact of SNP calling parameters on three population genetic analyses of the rice blast fungus: genetic clustering, the topology of phylogenetic reconstructions, and the estimation of evolutionary rates.
To benchmark SNP calling parameters, we generated a gold standard set of validated SNPs by sequencing nine M. oryzae genomes with both Illumina short reads and Oxford Nanopore Technologies (ONT) long reads. We used this gold standard set to identify the SNP calling parameter configuration that maximizes sensitivity and specificity. We found that the choice of parameter configuration can substantially change the number of ascertained SNPs, preferentially affecting SNPs segregating at low population frequency. However, SNP calling parameter configurations did not significantly affect the clustering of isolates into clonal lineages, the monophyly of each clonal lineage, or the estimation of evolutionary rates. We leveraged the evolutionary rates obtained from each SNP calling parameter configuration to generate divergence time estimates that take into account the uncertainty associated with both the estimation of evolutionary rates and SNP calling. Our analysis indicates that M. oryzae clonal lineage expansions took place ~300 years ago.
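
As an illustration of the benchmarking idea described above, the following minimal Python sketch scores several SNP call sets against a gold standard and picks the configuration with the best combined sensitivity and specificity. It is not the pipeline used in this study: the function names, the toy sites, the configuration labels, and the use of Youden's J (sensitivity + specificity - 1) as the combined criterion are all assumptions made for the example.

```python
from typing import Dict, Set, Tuple

Site = Tuple[str, int]  # (contig, 1-based position); hypothetical representation


def sensitivity_specificity(
    called: Set[Site], gold: Set[Site], callable_sites: Set[Site]
) -> Tuple[float, float]:
    """Score one call set against the gold standard over the callable sites."""
    called = called & callable_sites
    gold = gold & callable_sites
    tp = len(called & gold)   # gold-standard SNPs that were called
    fn = len(gold - called)   # gold-standard SNPs that were missed
    fp = len(called - gold)   # called sites absent from the gold standard
    tn = len(callable_sites) - tp - fn - fp  # non-variant sites left uncalled
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    spec = tn / (tn + fp) if (tn + fp) else 0.0
    return sens, spec


def best_configuration(
    call_sets: Dict[str, Set[Site]], gold: Set[Site], callable_sites: Set[Site]
) -> str:
    """Return the configuration name with the highest Youden's J (an assumption)."""
    def youden_j(name: str) -> float:
        sens, spec = sensitivity_specificity(call_sets[name], gold, callable_sites)
        return sens + spec - 1.0
    return max(call_sets, key=youden_j)


# Toy data, invented for illustration only.
callable_sites = {("chr1", pos) for pos in range(1, 11)}
gold = {("chr1", 2), ("chr1", 5), ("chr1", 8)}
call_sets = {
    "lenient": {("chr1", p) for p in (2, 5, 8, 9, 10)},
    "intermediate": {("chr1", p) for p in (2, 5, 8, 9)},
    "strict": {("chr1", 2)},
}

if __name__ == "__main__":
    for name, calls in call_sets.items():
        sens, spec = sensitivity_specificity(calls, gold, callable_sites)
        print(f"{name}: sensitivity={sens:.3f} specificity={spec:.3f}")
    print("best:", best_configuration(call_sets, gold, callable_sites))
```

In this toy case the strict configuration loses true SNPs while the lenient one admits extra false positives, so the intermediate configuration scores best. With real data the call sets would come from filtered variant files and the callable sites from a coverage mask, neither of which is modeled here.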