The polyploidy index (also called the P-index) is a statistical approach for quantifying fluctuations in gene retention differences, and it provides a mathematical method to evaluate the similarity between homoeologous chromosomes in polyploids (M. Li et al., 2019). The P-index value represents the degree of differentiation between polyploid subgenomes, where a larger value indicates a greater difference between subgenomes. The P-index value falls in the range of 0 to 1. A value approaching 0 indicates that the subgenomes have undergone nearly the same level of genomic fractionation; conversely, a value approaching 1 indicates that one of the subgenomes has an absolute fractionation advantage. Generally, a P-index demarcation line of 0.3 distinguishes known or previously inferred allopolyploids from autopolyploids (M. Li et al., 2019). Therefore, the evolutionary type of a polyploid can be inferred, and its evolutionary impact assessed, based on a P-index threshold of 0.3.
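The threshold rule above can be illustrated with a minimal sketch. The function name `classify_polyploid` and the returned labels are hypothetical. The direction of the rule (values above 0.3 read as allopolyploid-like, because a larger P-index means more divergent subgenomes) and the handling of a value exactly equal to 0.3 are assumptions that are not stated explicitly in the text.

```python
def classify_polyploid(p_index: float, threshold: float = 0.3) -> str:
    """Label a polyploidization event from its P-index.

    Assumption: values above the threshold are read as allopolyploid-like
    and values at or below it as autopolyploid-like.
    """
    if not 0.0 <= p_index <= 1.0:
        raise ValueError("P-index is expected to lie in [0, 1]")
    if p_index > threshold:
        return "allopolyploid-like"
    return "autopolyploid-like"


# Hypothetical P-index values, for illustration only.
label_high = classify_polyploid(0.45)
label_low = classify_polyploid(0.12)
```

Such a label is only a first indication and would normally be read alongside other evidence on the origin of the subgenomes.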
First, when calculating the P-index, a reference genome was used to infer orthologous regions in the selected genomes and to identify gene losses or translocations in every inferred subgenome produced by WGD events. Different genomes could be selected as the reference; an ideal reference for estimating the P-index value of the considered genome is well assembled, evolutionarily close to it, and has few lineage-specific genomic rearrangements. Then, the subgenomes of the considered polyploidy-affected genome were mapped onto the selected reference genome, which was not affected by the polyploidization event. Assume that there were K chromosomes in the reference genome and that subgenomes A and B were identified in the considered genome. Regardless of whether one subgenome was dominant, each pair of homoeologous chromosomes was divided into N_c windows of M (such as 50) genes. For the i-th window of a specific homoeologous chromosome pair, the gene retention rates Ai and Bi relative to the reference genome were obtained, and the P-index value was calculated as:
which falls in the range [0, 1]. Because shorter chromosomes in the reference genome have fewer collinear genes, which may lead to greater volatility, we set a weight for each chromosome, estimated as:
Sliding windows with highly similar retention rates were removed by defining the evaluation coefficient as:
in which the gene retention difference level is defined as:
and \delta\left(N_c\right) represents the number of windows with
In our study, the V. vinifera genome was selected as the reference genome when calculating the P-index values of WGD events in the A. coerulea, C. chinensis and T. sinense genomes. When calculating the P-index value of the recent PST event in the P. somniferum genome, the A. coerulea genome was selected as the reference. In these P-index calculations, the number of windows, \delta\left(N_c\right), was 50, and the gene retention levels of the subgenomes were estimated with a sliding window of 100. Further details are provided in a previous article (M. Li et al., 2019).
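As an illustration of the windowing step, the following minimal sketch computes per-window gene retention rates (the Ai and Bi values) for one pair of homoeologous chromosomes. It is not the published implementation. The function name `window_retention`, the boolean encoding of gene presence, the dropping of a trailing partial window, and the toy data are all assumptions made for the example. The weighting, filtering and P-index formulas themselves are not reproduced here; they are given in M. Li et al. (2019).

```python
from typing import List, Optional, Sequence


def window_retention(retained: Sequence[bool], window: int = 50,
                     step: Optional[int] = None) -> List[float]:
    """Fraction of reference genes retained in each window.

    retained[j] is True when reference gene j (in chromosome order) has a
    collinear ortholog in the subgenome. step defaults to window, giving
    non-overlapping windows; a smaller step gives overlapping (sliding)
    windows. A trailing partial window is dropped (an assumption).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = window if step is None else step
    if step <= 0:
        raise ValueError("step must be positive")
    rates = []
    for start in range(0, len(retained) - window + 1, step):
        chunk = retained[start:start + window]
        rates.append(sum(chunk) / window)  # True counts as 1
    return rates


# Toy data: 8 reference genes on one chromosome, window of 4 genes.
sub_a = [True, True, False, True, True, True, False, False]
sub_b = [True, False, False, False, True, False, False, False]

a_rates = window_retention(sub_a, window=4)  # [0.75, 0.5]
b_rates = window_retention(sub_b, window=4)  # [0.25, 0.25]
```

In this toy case subgenome A retains more genes than subgenome B in both windows, which is the kind of per-window difference that the P-index summarizes across chromosomes.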