Analysis of 1122 samples in 94 assays

Below is an analysis of a hypothetical pathogen screen of 1122 patient samples carried out in 4 different ways:

Single sample per well: This is the traditional way assays are run, where each sample is tested with a single assay. The patient sample is called positive if the assay is positive and negative if the assay is negative. In all cases, this design requires 1122 tests.
36 sample pool with simple retest: Here samples are pooled into groups of 36 and each pool is tested. If a pool tests positive, each member of that pool is tested alone in a second round of testing. The number of assays here varies with the number of positive pools.
XL3: This is the pure multiplex Origami Assay XL3, which tests all samples in just 94 wells. In this case, the design is conservatively decoded such that a sample that appears in a positive well two or three times is called positive. This strategy is known to cause false positive results, particularly at higher prevalence, but is less likely to make a false negative call.
aXL3 with simple retest: This is the same multiplex Origami Assay XL3 as above, followed by a one-by-one retest of all positive results. Because of the second retest step, the number of assays required changes for each condition.
Phase diagram: The phase diagram identifies which pooling design is best for each combination of assay error and prevalence. The best design must:
1. Have a Matthews correlation coefficient (MCC) > 0.95. This corresponds to a condition where the assay results are highly informative of the sample status.
2. If multiple designs satisfy the MCC criterion, the design with the fewest expected assays is selected.
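The two-step selection rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the figures: the function name `best_design`, the input layout, and the numbers in the usage example are all hypothetical.

```python
from typing import Optional

MCC_THRESHOLD = 0.95  # criterion 1 from the text


def best_design(results: dict[str, tuple[float, float]]) -> Optional[str]:
    """Pick the best design for one (prevalence, accuracy) cell.

    `results` maps a design name to (median MCC, median number of assays).
    Returns None when no design clears the MCC threshold.
    """
    # Criterion 1: keep only designs whose MCC exceeds the threshold.
    eligible = {
        name: assays
        for name, (mcc, assays) in results.items()
        if mcc > MCC_THRESHOLD
    }
    if not eligible:
        return None
    # Criterion 2: among the eligible designs, take the fewest assays.
    return min(eligible, key=eligible.get)


# Hypothetical values for a single cell, for illustration only.
example_cell = {
    "single": (0.99, 1122),
    "pool36_retest": (0.97, 150),
    "XL3": (0.80, 94),
}
chosen = best_design(example_cell)
```

In this made-up cell, XL3 uses the fewest assays but fails the MCC criterion, so the rule falls through to the cheapest design that passes.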

The metrics used to describe each experimental design performance are as follows:

Matthews correlation coefficient (MCC) (color shades): The performance of each condition is measured using the Matthews correlation coefficient, abbreviated as MCC. MCC is used in bioinformatics and machine learning because it is better able to handle skewed cases where there are few positive or few negative results. An MCC score of 1.0 indicates perfect agreement between the assay and the sample state, while an MCC score of 0.0 indicates a complete lack of association.
Assay number (stipples): For the retest cases, we have shown the number of tests expected for each condition. In general, a higher number of dots indicates more assays are required.
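For reference, MCC can be computed directly from the four confusion-matrix counts using the standard formula. The sketch below assumes the common convention of returning 0.0 when the denominator is zero (for example, when a run contains no positive samples and no positive calls); the analysis above does not say how such runs were handled.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    if denominator == 0:
        # Assumed convention: undefined cases are reported as 0.0.
        return 0.0
    return numerator / denominator
```

Perfect agreement (no false positives or false negatives) gives 1.0, and calls that are unrelated to the true status give a value near 0.0.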

To create these diagrams, we randomly generated 500 experiments, each with 1122 samples, for each combination of prevalence and single assay accuracy rate. Next we ran each assay in silico and decoded the results. For each experiment we could then compute the true positive, true negative, false positive, and false negative counts. Each virtual run also allowed us to record the number of assays that would need to be performed.
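As a rough illustration of one such virtual run, the sketch below simulates the 36 sample pool with simple retest. It rests on assumptions that the text does not spell out: each assay reports the true state with probability equal to the accuracy rate (the same error rate for false positives and false negatives), a pool is truly positive if any member is positive (no dilution effect), and the function names are hypothetical. The XL3 designs are not simulated here because their well layout is not given.

```python
import random
import statistics


def noisy_assay(truth: bool, accuracy: float, rng: random.Random) -> bool:
    """Return the true state with probability `accuracy`, else the opposite."""
    return truth if rng.random() < accuracy else not truth


def simulate_pool_retest(n_samples=1122, pool_size=36,
                         prevalence=0.02, accuracy=0.95, seed=0):
    """One in silico experiment; returns (tp, tn, fp, fn, n_assays)."""
    rng = random.Random(seed)
    truth = [rng.random() < prevalence for _ in range(n_samples)]
    calls = [False] * n_samples  # samples in negative pools are called negative
    n_assays = 0
    for start in range(0, n_samples, pool_size):
        members = range(start, min(start + pool_size, n_samples))
        pool_truth = any(truth[i] for i in members)
        n_assays += 1  # first round: one assay per pool
        if noisy_assay(pool_truth, accuracy, rng):
            for i in members:  # second round: retest each member alone
                n_assays += 1
                calls[i] = noisy_assay(truth[i], accuracy, rng)
    tp = sum(t and c for t, c in zip(truth, calls))
    tn = sum((not t) and (not c) for t, c in zip(truth, calls))
    fp = sum((not t) and c for t, c in zip(truth, calls))
    fn = sum(t and (not c) for t, c in zip(truth, calls))
    return tp, tn, fp, fn, n_assays


# 500 experiments for one (prevalence, accuracy) condition, then the median.
runs = [simulate_pool_retest(seed=s) for s in range(500)]
median_assays = statistics.median(r[4] for r in runs)
```

Each tuple in `runs` holds the confusion-matrix counts needed to compute an MCC for that experiment, so the same loop can feed both the MCC and the assay-count summaries.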

The MCC data and number of assays required are median values from the full list of 500. For each condition, we also have a high resolution description of the error distributions, as shown in Figure 2 at the bottom for the condition prevalence=0.001, assay accuracy rate 0.99, and for the condition prevalence=0.02, assay accuracy rate 0.95.

Figure 1: Matthews correlation coefficient and assay size requirements for four different models. The phase diagram shows which models work best under what conditions. Hover over a cell to display the underlying values.

Figure 2: Detailed comparison of statistics for two cases. (a) High accuracy, low prevalence case (prevalence=0.001, assay accuracy rate 0.99), and (b) moderate accuracy, moderate prevalence case (prevalence=0.02, assay accuracy rate 0.95).

Discussion

Overall, these results show that the pure multiplex assay (XL3) performs well only at low prevalence (0.1%, or just over 1 positive sample in the 1122 group). The MCC falls off at higher prevalence because the XL3 assay begins to produce more false positives (as shown in Figure 2).

The second stage of retesting yields significantly better results across the board, with the aXL3 design showing the greatest robustness to assay error.

Traditional single assay per sample designs only become appropriate when the prevalence is high (>12%).
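For the 36 sample pool design alone, the standard two-stage pooling arithmetic gives a rough feel for why pooling stops paying off at high prevalence. The sketch below assumes perfect assays and independent samples, so it is not the simulation described above and is not expected to reproduce the threshold quoted here, which also depends on the MCC criterion and on the aXL3 design. The function name is hypothetical.

```python
def expected_assays_pool_retest(n_samples: int = 1122,
                                pool_size: int = 36,
                                prevalence: float = 0.02) -> float:
    """Expected assay count for pooling with simple retest, no assay error."""
    total = 0.0
    for start in range(0, n_samples, pool_size):
        k = min(pool_size, n_samples - start)  # the last pool may be smaller
        # A pool is positive unless all k members are negative.
        p_pool_positive = 1.0 - (1.0 - prevalence) ** k
        # One pooled assay, plus k retests when the pool is positive.
        total += 1 + k * p_pool_positive
    return total
```

Under these assumptions, `expected_assays_pool_retest(prevalence=0.12)` already exceeds the 1122 assays of the single sample per well design, because nearly every pool of 36 contains at least one positive sample and is retested in full.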

Not surprisingly, the single assay accuracy rate is important. At accuracies below 0.98, none of the designs can achieve an MCC score of >0.95. While I suspect that a model with a greater number of internal replicates could overcome this assay error, it would do so at the cost of the design compression, which may be a worthwhile tradeoff.