On Evaluation of Rankings in Analysis of NGS Data

Journal of Next Generation Sequencing & Applications

ISSN: 2469-9853

Abstract

Donald MR, Wilson SR

A ranked list of genes (or proteins or regions) is a common output from the analysis of NGS data. Many choices are made in such an analysis, either explicitly or implicitly, and there is no single ‘correct’ method. So when two different but appropriate methods are used, an important question arises: how similar are the two rankings? Allowing a looser definition of agreement than ‘exact’ agreement, and using a Bayesian logit model with O’Sullivan penalized splines, a visualisation has been developed that gives the probability of agreement at each point in the ranking, together with an estimate, with credible interval, of the point at which the sequence degenerates into noise. The approach is illustrated on some typical RNA-seq data. Both the estimate of the point at which agreement between the rankings degenerates into noise and its credible interval will be over-estimates of their true values. From a practical perspective, it is usually better to estimate a slightly larger set of top-ranked data than a smaller one. Even so, the estimates found for NGS data are relatively small compared with the total length of the sequence.
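
As a rough illustration of what a looser-than-exact definition of agreement could look like, the sketch below marks position i as "agreeing" when the item ranked i in one list appears within a tolerance of that position in the other list. The tolerance `delta`, the function names, and the toy gene labels are assumptions made for this example and do not come from the paper. The trailing moving average is only a crude stand-in for a smoothed probability of agreement; it is not the Bayesian logit model with O’Sullivan penalized splines described in the abstract.

```python
from typing import Hashable, List, Sequence


def agreement_indicators(
    ranking_a: Sequence[Hashable],
    ranking_b: Sequence[Hashable],
    delta: int,
) -> List[int]:
    """Return a 0/1 sequence along ranking_a.

    Entry i is 1 when the item at position i of ranking_a sits within
    `delta` positions of i in ranking_b (hypothetical tolerance), else 0.
    Items missing from ranking_b count as disagreement.
    """
    position_in_b = {item: pos for pos, item in enumerate(ranking_b)}
    indicators = []
    for pos_a, item in enumerate(ranking_a):
        pos_b = position_in_b.get(item)
        agrees = pos_b is not None and abs(pos_a - pos_b) <= delta
        indicators.append(1 if agrees else 0)
    return indicators


def running_agreement(indicators: Sequence[int], window: int) -> List[float]:
    """Trailing moving average of the indicators.

    A simple smoother used here for illustration only; it replaces,
    and does not implement, the spline-based model.
    """
    rates = []
    for i in range(len(indicators)):
        start = max(0, i - window + 1)
        chunk = indicators[start : i + 1]
        rates.append(sum(chunk) / len(chunk))
    return rates


# Toy rankings from two hypothetical analysis methods.
genes_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
genes_b = ["g2", "g1", "g3", "g6", "g4", "g5"]

exact = agreement_indicators(genes_a, genes_b, delta=0)
loose = agreement_indicators(genes_a, genes_b, delta=1)
smoothed = running_agreement(loose, window=2)
```

With `delta=0` only one position in the toy data agrees exactly, whereas `delta=1` treats neighbouring swaps as agreement. In real data the point of interest is where such a sequence stops being mostly ones and becomes indistinguishable from noise.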