# Analysis of 270 samples

Below is an analysis of a hypothetical pathogen screen of 270 patient samples carried out in 6 different ways:

• Single sample per well: This is the traditional way assays are run, where each sample is tested with a single assay. The patient sample is assumed to be positive if the assay is positive and negative if the assay is negative. In all cases, this design requires 270 tests, one per sample.
• 18 sample pool with simple retest: Here samples are pooled into groups of 18 and each pool is tested. If a pool tests positive, each member of that pool is tested alone in a second round of testing. The number of assays varies with the number of positive pools.
• XL5: This is the pure multiplex Origami Assay XL5, which tests all samples in 94 wells. The design is decoded conservatively: a sample that appears in a positive well 4 or 5 times is called positive. This strategy is known to cause false positive results, particularly at higher prevalence, but is less likely to make a false negative call.
• aXL5 with simple retest: This is the same multiplex Origami Assay XL5 as above, followed by a one-by-one retest of all positive results. Because of the second retest step, the number of assays required changes for each condition.
• L3: This is the pure multiplex Origami Assay L3, which tests all samples in 46 wells. The design is decoded conservatively: a sample that appears in a positive well 2 or 3 times is called positive. This strategy is known to cause false positive results, particularly at higher prevalence, but is less likely to make a false negative call.
• aL3 with simple retest: This is the same multiplex Origami Assay L3 as above, followed by a one-by-one retest of all positive results. Because of the second retest step, the number of assays required changes for each condition.
• Phase diagram: The phase diagram identifies which pooling design is best for each combination of assay error and prevalence. The best design is chosen as follows:
1. It must have a Matthews correlation coefficient (MCC) > 0.95. This corresponds to a condition in which the assay results are highly informative of the sample status.
2. If multiple designs satisfy the MCC criterion, the design with the fewest expected assays is selected.
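
The two-step selection rule for the phase diagram can be sketched as follows. This is only an illustration: the function name `pick_best_design`, the shape of the `results` dictionary, and the numbers in the usage example are hypothetical, and returning `None` when no design passes the MCC threshold is an assumption, since the text does not say how such cells are handled.

```python
from typing import Dict, Optional, Tuple


def pick_best_design(
    results: Dict[str, Tuple[float, float]],
    mcc_threshold: float = 0.95,
) -> Optional[str]:
    """Pick a design for one (prevalence, accuracy) cell.

    `results` maps a design name to (median MCC, median number of assays).
    Step 1 keeps designs with MCC above the threshold; step 2 picks the
    one needing the fewest assays. Returns None if nothing qualifies
    (an assumption, not something the source specifies).
    """
    eligible = {
        name: assays
        for name, (mcc, assays) in results.items()
        if mcc > mcc_threshold
    }
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Hypothetical values for a single cell of the phase diagram.
example_cell = {
    "single": (0.99, 270),
    "pool_18": (0.96, 60),
    "L3": (0.90, 46),
}
best = pick_best_design(example_cell)
```

In this made-up cell, L3 uses the fewest assays but fails the MCC criterion, so the rule falls back to the cheapest design that passes.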

The metrics used to describe each experimental design performance are as follows:

• Matthews correlation coefficient (MCC) (color shades): The performance of each condition is measured using the Matthews correlation coefficient (MCC). MCC is used in bioinformatics and machine learning because it is better able to handle skewed cases where there are few positive or few negative results. An MCC score of 1.0 indicates perfect agreement between the assay and the sample state, while an MCC score of 0.0 indicates a complete lack of association.
• Assay number (stipples): For the retest cases, we show the number of tests expected for each condition. In general, more dots indicate that more assays are required.
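
For reference, MCC can be computed directly from the four confusion-matrix counts. The sketch below uses the standard formula. Returning 0.0 when the denominator is zero (for example, when an experiment contains no positive samples) is a common convention and an assumption here, not something the text specifies.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Undefined case (an entire row or column is empty); 0.0 by convention.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# 270 samples, 5 true positives, every call correct.
perfect = mcc(tp=5, tn=265, fp=0, fn=0)
# Same samples, every call inverted.
inverted = mcc(tp=0, tn=0, fp=265, fn=5)
```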

To create these diagrams, we randomly generated 500 experiments of 270 samples each for every combination of prevalence and single-assay accuracy rate. Next we ran each assay in silico and decoded the results. For each experiment we could then compute the true positive, true negative, false positive, and false negative counts. Each virtual run also allowed us to record the number of assays that would need to be performed.
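
A minimal sketch of this kind of simulation, shown for the 18 sample pool with simple retest only, might look like the following. The function names, drawing each sample's true state independently at the given prevalence, and applying a symmetric, independent read error to every well are assumptions made for illustration. The original simulation code is not shown in the text and may differ.

```python
import random
from statistics import median


def noisy_read(true_state: bool, accuracy: float, rng: random.Random) -> bool:
    """Report the true state with probability `accuracy`, else the opposite."""
    return true_state if rng.random() < accuracy else not true_state


def run_dorfman(truth, accuracy, rng, pool_size=18):
    """Pool, test each pool once, then retest members of positive pools."""
    calls = [False] * len(truth)
    n_assays = 0
    for start in range(0, len(truth), pool_size):
        members = range(start, min(start + pool_size, len(truth)))
        n_assays += 1  # one assay for the pool itself
        pool_truth = any(truth[i] for i in members)
        if noisy_read(pool_truth, accuracy, rng):
            for i in members:
                n_assays += 1  # one retest per member of a positive pool
                calls[i] = noisy_read(truth[i], accuracy, rng)
    return calls, n_assays


def confusion(truth, calls):
    """Return (tp, tn, fp, fn) counts."""
    tp = sum(t and c for t, c in zip(truth, calls))
    tn = sum((not t) and (not c) for t, c in zip(truth, calls))
    fp = sum((not t) and c for t, c in zip(truth, calls))
    fn = sum(t and (not c) for t, c in zip(truth, calls))
    return tp, tn, fp, fn


def simulate(prevalence, accuracy, n_experiments=500, n_samples=270, seed=0):
    """Run many virtual experiments; return median assays and all counts."""
    rng = random.Random(seed)
    assay_counts, counts = [], []
    for _ in range(n_experiments):
        truth = [rng.random() < prevalence for _ in range(n_samples)]
        calls, n_assays = run_dorfman(truth, accuracy, rng)
        assay_counts.append(n_assays)
        counts.append(confusion(truth, calls))
    return median(assay_counts), counts


median_assays, all_counts = simulate(prevalence=0.02, accuracy=0.95)
```

The per-experiment counts in `all_counts` can then be converted to MCC values and summarized by their median, as described in the text.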

The MCC values and numbers of assays required are medians over the full set of 500 experiments. For each condition, we also have a high-resolution description of the error distributions, as shown for three cases in Figure 2 at the bottom.

Figure 1: Matthews correlation coefficient and assay size requirements for four different models. The phase diagram shows which models work best under what conditions. The actual values in the figure are shown by hovering over the cell to display the underlying data.

Figure 2: Detailed comparison of statistics for three cases. (a) High accuracy, low prevalence, (b) medium accuracy and prevalence, (c) low accuracy and high prevalence.

### Discussion

Overall, these results show that the smaller pure multiplex design (L3) performs well at lower prevalence (0.4%, or just over 1 positive sample in the group of 270), while the larger multiplex design (XL5) works well up to 2% prevalence (just over 5 positives out of 270). The MCC falls off at higher prevalence because the multiplex assays begin to produce more false positives (as shown in Figure 2).

Interestingly, the XL5 shows very robust error correction, particularly at low prevalence levels. For example, at the lowest prevalence of 0.1%, the XL5 design has a near-perfect MCC score even if the underlying assay is only 85% accurate. Note too that our simulated assay accuracy includes both positive and negative error. Thus an 85% accurate assay will call a positive a negative 15% of the time, and a negative a positive 15% of the time. An assay with an 85% accuracy rate would be only marginally useful with standard designs, but becomes useful when multiple internal replicates are included.
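
A rough back-of-the-envelope calculation suggests why internal replicates help. The sketch below assumes that each sample sits in 5 wells of the XL5 design (inferred from the "4 or 5 times" decoding rule), that well errors are independent, and that no other positive sample shares those wells, which is only reasonable at very low prevalence. It is not the decoding code used for the figures.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """P(at least k successes in n independent trials with success prob p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


accuracy = 0.85
replicates = 5  # assumed number of wells containing each sample
threshold = 4   # sample is called positive if 4 or 5 of its wells read positive

# A truly positive sample: each of its wells reads positive with prob 0.85.
sensitivity = prob_at_least(threshold, replicates, accuracy)

# A truly negative sample in otherwise negative wells: each well reads
# positive (in error) with prob 0.15.
false_positive_rate = prob_at_least(threshold, replicates, 1 - accuracy)
```

Under these assumptions, the per-sample false positive rate drops from 15% for a single assay to roughly 0.2%, while sensitivity stays near 84%. When almost every sample is negative, suppressing false positives matters most, which is consistent with the strong performance reported at low prevalence.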

The second stage of retesting yields significantly better results across the board, with the aXL5 design showing the greatest robustness to assay error.

Traditional single assay per sample designs only become appropriate when the prevalence is high (>17%).

Examining the phase diagram, the Dorfman pool (pool_18) performs well across many prevalence values; however, this design is significantly more sensitive to assay errors. This lack of robustness to error is likely due to two factors:
1. The Dorfman pool design tests each pool once, and those that report a negative are assumed to be negative based on this single measurement. This single measurement assignment means that Dorfman pool designs are particularly sensitive to false negative results, as they have no means to identify or correct for these kinds of errors.
2. The Dorfman pool has few internal replicates. At most, a sample will be tested twice if it is in a positive pool, while most samples are tested only once.
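
Both factors can be illustrated with a short calculation. The sketch below assumes symmetric, independent assay errors and independent sample states. The function names are hypothetical and the formulas are a simplified model rather than the simulation used for the figures.

```python
def dorfman_sensitivity(accuracy: float) -> float:
    """A positive sample is called positive only if both the pool test
    and its individual retest read positive."""
    return accuracy * accuracy


def dorfman_expected_assays(
    n_samples: int, pool_size: int, prevalence: float, accuracy: float
) -> float:
    """Expected assays: one per pool, plus retests for pools that read positive."""
    n_pools = n_samples // pool_size  # assumes an even split (270 / 18 = 15)
    p_pool_clean = (1 - prevalence) ** pool_size
    p_pool_reads_positive = (
        accuracy * (1 - p_pool_clean) + (1 - accuracy) * p_pool_clean
    )
    return n_pools * (1 + pool_size * p_pool_reads_positive)


# With an 85% accurate assay, a single test misses 15% of positives,
# while the pool-then-retest path misses about 28% of them.
miss_rate_single = 1 - 0.85
miss_rate_dorfman = 1 - dorfman_sensitivity(0.85)

expected_assays = dorfman_expected_assays(270, 18, prevalence=0.02, accuracy=0.85)
```

Because a positive sample must pass two error-prone reads in a row, and a pool that falsely reads negative is never revisited, false negatives compound rather than cancel.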