On Evaluation of Rankings in Analysis of NGS Data

Donald MR; Wilson SR

Abstract

A ranked list of genes (or proteins or regions) is a common output from the analysis of NGS data. Many choices, explicit or implicit, will have been made in that analysis, and there is no single ‘correct’ method to use. So if two different and appropriate methods are used, an important question is: how similar are the two rankings? By allowing a looser definition of agreement than ‘exact’ agreement, and by using a Bayesian logit model with O’Sullivan penalized splines, a useful visualisation has been developed that gives the probability of agreement at each point in the ranking, together with an estimate, and its credible interval, of the point at which the sequence of agreements degenerates into noise. The approach is illustrated on some typical RNA-seq data. Both the estimate of the point at which agreement between the rankings degenerates into noise and its credible interval will be over-estimates of their true values. From a practical perspective, it is usually better to estimate a slightly larger set of top-ranked items than one that is too small. Even so, the estimates found for NGS data are relatively small compared with the total length of the sequence.
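To make the idea of a looser definition of agreement concrete, the following is a minimal sketch, not the authors' implementation. It assumes that two rankings "agree" at position i when the item at position i in the first ranking appears within a tolerance of delta positions of i in the second; the function name agreement_indicators, the parameter delta and the gene labels are hypothetical. The resulting 0/1 sequence is the kind of input to which a logit model could then be fitted; the Bayesian model with penalized splines itself is not shown.

```python
def agreement_indicators(ranking_a, ranking_b, delta):
    """Return a 0/1 sequence, one entry per position of ranking_a.

    Entry i is 1 when the item at position i of ranking_a sits within
    `delta` positions of i in ranking_b (an assumed, illustrative notion
    of 'loose' agreement), and 0 otherwise.
    """
    # Position of each item in the second ranking.
    position_in_b = {item: j for j, item in enumerate(ranking_b)}
    indicators = []
    for i, item in enumerate(ranking_a):
        j = position_in_b.get(item)  # None if the item is absent from ranking_b
        agrees = j is not None and abs(i - j) <= delta
        indicators.append(1 if agrees else 0)
    return indicators


# Hypothetical gene labels, for illustration only.
ranking_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
ranking_b = ["g2", "g1", "g3", "g6", "g5", "g4"]

exact = agreement_indicators(ranking_a, ranking_b, delta=0)
loose = agreement_indicators(ranking_a, ranking_b, delta=1)
print(exact)  # [0, 0, 1, 0, 1, 0]
print(loose)  # [1, 1, 1, 0, 1, 0]
```

With delta=0 the swap of g1 and g2 counts as disagreement, whereas delta=1 treats it as agreement, which illustrates why a looser definition yields a more informative sequence near the top of the lists.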

Journal of Next Generation Sequencing & Applications
