Matrix Design

The pooling design matrices presented in Origami Assays are based on a well-defined branch of mathematics and operations research variously called group testing, coding theory, or simply information theory.

Below we describe some of the underlying theory about these designs and how they are constructed.

Preliminaries

n = number of samples
k = maximum number of samples that are positive
p = prevalence rate
m = number of tests or assays to run

Counting

Before delving into designs, let's first do some counting to figure out what is even possible. Our overall goal is to map some big space of possible sample configurations down into a smaller space of assay configurations. However, if our smaller assay space is too small, then there is no way it can hold the answers we want.

Given a fixed number of assays (m), we can compute the number of binary outcomes that assay set can produce as \[ 2^m \] This value is the number of labels that the assay set is capable of producing.

For the samples, we can compute the number of combinations of 0, 1, 2, ..., k positive results as the sum of combinations \[1+ \sum_{j=1}^k \frac{ n! }{ j! (n-j)!} \] For a design to be feasible, the number of assay states must be at least as large as the number of sample states: \[2^m \geq 1+ \sum_{j=1}^k \frac{ n! }{ j! (n-j)!} \] If we solve for m, we can define the theoretically smallest experiment for each value of k (a short numerical check of these bounds follows the k=4 case):
k=0
\[ m_{min}=1 \]

k=1
\[ m_{min}= \frac{\log{(n+1)}}{\log{2}} \]

k=2
\[ m_{min}= \frac{\log{ \frac{1}{2} (2 + n + n^2) }}{\log{2}} \]

k=3
\[ m_{min}= \frac{\log{ \frac{1}{6} (6 + 5 n + n^3) }}{\log{2}} \]

k=4
\[ m_{min}= \frac{\log{ \frac{1}{24} (24 + 14 n + 11 n^2 - 2 n^3 + n^4) }}{\log{2}} \]
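
As a check on these closed forms, the sketch below computes the counting bound directly. This is an illustrative Python snippet using only the standard library; the function name is ours, not part of Origami Assays.

    # Sketch: numerical check of the counting bound m_min = log2(sum_{j=0..k} C(n, j)).
    from math import comb, log2

    def m_min_counting(n: int, k: int) -> float:
        """Smallest m (possibly fractional) with 2^m >= number of sample states."""
        states = sum(comb(n, j) for j in range(k + 1))  # includes the all-negative state (j = 0)
        return log2(states)

    print(round(m_min_counting(45, 1), 3))   # 5.524, matching log2(n + 1) for n = 45
    print(round(m_min_counting(45, 2), 3))   # 10.017, matching log2((2 + n + n^2)/2)
    print(round(m_min_counting(100, 2), 3))  # 12.302, the "counting limit" used in the table below
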
In the limit of a perfect adaptive test design, with a fixed and known k and n, we can compute the minimum number of expected tests as:

\[ m_{min}= \log_2{ (n!/ ( k!(n-k)!) ) } \]
If we assume a large value of n and replace k with p n (where p is the prevalence), then using Stirling's approximation we can define the lower bound as

\[ m_{min}= ((1-p) n^p \log{n} )/(\log{2}) \]
Note that these are strong bounds that apply to all adaptive and nonadaptive designs: no design can recover more bits of information than its assays can produce.

For a more precise bound, we can approximate the size of an achievable Shifted Transversal Design (STD) for a group test as (with n as the square of a prime number and r as the replicate number):
\[ m_{min}= r \sqrt{n} \]
For an alternative construction based on Kautz and Singleton, it has more recently been shown that the number of tests can scale as:
\[ m_{min}= (l(r-1)+1) n^{1/(l+1)} \]
Where l is a positive integer such that:
\[ l(r-1)+1 \le n^{(1/(l+1))} \]
This result implies that for larger designs (larger n), it is possible to use a larger replicate number to achieve better compression.
For particular cases, such as a Steiner triple system (r=3, guaranteeing k=2), we get an equivalent limit of
\[ n = m(m-1)/6 \]
Rearranging (solving the quadratic m^2 - m - 6n = 0 for m) we find:
\[ m_{min} = \frac{1+ \sqrt{24n+1}}{2} \]
Note that this limit only strictly applies when m = 6j+1 or m = 6j+3 for an integer j, the condition for a Steiner triple system to exist. This Steiner limit is below the STD limit at the same replicate number (r=3) for large n. In the cases where I can find a model by random construction, I have never found a model better than the Steiner limit, and in cases away from the m = 6j+1 or m = 6j+3 scenarios, the limit suggests a design size that often is not achievable.
More generally, the designs found in OA are all Steiner systems of the form S(2,r,m) with the number of blocks as n. Here r is the number of replicates, m is the number of assays, and n is the number of samples. These systems scale as
\[ n = m(m-1)/(r(r-1)) \]

Examples found in the literature of larger systems include:

Scaling comparison for r=3 (k=2), n=100 and n=10,000

Applying some values, we can see how each model scales:

n        Design                 m
100      counting limit         12.3
100      STD                    30
100      Steiner                25
100      Kautz and Singleton    30 (l=1); l>1 not allowed
10,000   counting limit         25.57
10,000   STD                    300
10,000   Steiner                245.5 (t=2); 85.34 (t=3)
10,000   Kautz and Singleton    300 (l=1); 107.7 (l=2); 70 (l=3); l>3 not allowed
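
The counting limit, STD, Steiner (t=2), and Kautz-Singleton rows can be reproduced with a short script. This is a sketch using only the Python standard library; the function names are ours, and the Steiner t=3 entry is not recomputed here.

    # Sketch: reproduce the r = 3 (k = 2) scaling comparison above.
    from math import comb, log2

    def counting_limit(n, k=2):
        return log2(sum(comb(n, j) for j in range(k + 1)))

    def std_tests(n, r=3):
        return r * n ** 0.5                        # m = r * sqrt(n)

    def steiner_tests(n):
        return 0.5 + (24 * n + 1) ** 0.5 / 2       # from n = m(m - 1)/6

    def kautz_singleton_tests(n, r=3):
        # All feasible l, i.e. those with l(r - 1) + 1 <= n^(1/(l + 1)).
        out, l = [], 1
        while (l * (r - 1) + 1) <= n ** (1 / (l + 1)):
            out.append((round((l * (r - 1) + 1) * n ** (1 / (l + 1)), 1), l))
            l += 1
        return out

    for n in (100, 10_000):
        print(n, round(counting_limit(n), 2), std_tests(n),
              round(steiner_tests(n), 2), kautz_singleton_tests(n))
    # n = 100:    12.3  30.0  25.0    [(30.0, 1)]
    # n = 10,000: 25.58 300.0 245.45  [(300.0, 1), (107.7, 2), (70.0, 3)]
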
My read of these scalings suggests that, for disease prevalence testing, a small, but not too small, test size is ideal. If we approximate k = p*n, where p is the prevalence, we can use the STD estimate to show how the number of assays scales with the number of samples:


\[ m= k \sqrt{n} = p n^{(3/2)} \]
This shows that achieving the same disjunctness guarantee for large sample numbers requires a disproportionate number of assays. For example, at p=0.01 the estimate gives about 10 assays for n=100, but about 10,000 assays for n=10,000, at which point pooling offers no compression at all. It could be argued that we don't need the full disjunctness to follow k because models with high disjunctness already capture most of what we want, but I don't have enough evidence to fully back that.

Taking the relationship above to the logical limit, we would ideally test only single samples, thereby minimizing n and minimizing m. But this extrapolation is incorrect. Our approximation of k = p*n is only valid for large n. At smaller values of n, the required k is higher than n*p because there is a good possibility of seeing more than n*p positive results.

We can approximate the upper observable value of k from a binomial distribution by taking the largest value we might see with, say, 95% confidence (19/20), using Hoeffding's inequality:

\[ 19/20 = exp( \frac{-2(np-k)^2}{n} ) \]

Solving for k and substituting into our STD limit above, we can find the point where n=m for any p as:

\[ n = \frac{1}{2p^2} \left( 2+\log{(20/19)} -2 \sqrt{2 \log{(20/19)}} \right) \]
Substituting in values for p we see that for p=0.01, n=7053; p=0.05, n=282; and p=0.1, n=70. The takeaway is that for small prevalence values we benefit from designs with large n, but only up to a limit.
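
A quick numerical check of these break-even values, following the expression above (a Python sketch; the helper name is ours):

    # Sketch: break-even n at which n = m for the STD estimate, per the expression above.
    from math import log, sqrt

    def break_even_n(p: float) -> float:
        c = log(20 / 19)                      # natural log, as in the expression above
        return (2 + c - 2 * sqrt(2 * c)) / (2 * p ** 2)

    for p in (0.01, 0.05, 0.1):
        print(p, int(break_even_n(p)))        # 0.01 -> 7053, 0.05 -> 282, 0.1 -> 70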

The exercise above suggests a process for defining the acceptable prevalence for a given design. If we know n and k for a design, then we can call out the prevalence that corresponds to a k within the 95% confidence interval as:

\[ p = \left( 2k \pm \sqrt{2 n \log{(20/19)}} \right)/(2n) \]
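
As an illustration (our own numbers, not from the Origami Assays documentation), applying this expression to the two designs discussed below gives:

    # Sketch: prevalence range whose bound on k (per the expression above) matches a design's k.
    from math import log, sqrt

    def design_prevalence(n: int, k: int):
        half_width = sqrt(2 * n * log(20 / 19)) / (2 * n)
        return k / n - half_width, k / n + half_width     # the two signs of the expression

    print(design_prevalence(45, 1))   # S2-sized design (k=1, n=45): roughly (-0.002, 0.046)
    print(design_prevalence(13, 2))   # S3-sized design (k=2, n=13): roughly (0.109, 0.198)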

Example: S2 design (k=1, n=45, m=10)

If we have 45 samples, and at most 1 positive (k=1, n=45) (the S2 case in OA), we find the minimum number of assays is 5.523. Assays only come in integer numbers, so we round up to 6. This suggests a compression of 45/6 = 7.5 is theoretically possible.

If we were to actually design a 6 assay test for 45 samples, this would mean that the assay could describe 2^6= 64 different states, while the samples could be in 45+1=46 different sample states. Because we had to round up, the sample configurations take up only 71.8% of the possible space. This fraction of the space is called the rate of a design.

The Origami Assay design S2 also tests 45 samples with a guarantee on one or fewer positives, but it is forced to use 10 tests. This means that S2 needs to use 2^10 = 1024 assay states to describe 46 sample configurations (only 4.5% of the possible space).

If we test the case when we have up to two positives (k=2) and 45 samples (n=45), we find we would need a minimum of m_min=10.02 assays, rounded up to 11. This shows us that no matter how we design our assay, it is impossible to create a design that tests 45 samples with 2 or fewer positives in 10 assays.

Example: S3 design (k=2, n=13, m=10)

If we have 13 samples, and at most 2 positive (k=2, n=13) (the S3 case in OA), we find the minimum number of assays is 6.52. Rounded up to a whole number, this suggests that we need 7 assays, which is three fewer than the 10 required by the S3 design.

Expanded more fully, S3 uses 2^10 = 1024 assay states to describe 1+13+78=92 sample configurations (only 9% of the possible space).

Why the extra space?

Why do the Origami Assays designs need this extra assay space? The extra space is needed for two distinct reasons:

(1) Assay physics: A diagnostic binary assay acts as an OR function for all the samples in a well. If any one sample is positive, then that assay well reports a positive. This OR function constrains the kinds of solutions any pooling strategy can produce. As an example, adding another sample can never flip a positive assay result to a negative result.

(2) Search limits: While we have gone to great lengths to construct our assay pooling designs, we leave open the possibility that there may exist even more efficient designs. Origami Assays has focused on relatively small designs (n<1000), which are experimentally practical for pathogen screening. These small designs have the advantage that they can be searched more thoroughly.

Two advantages of a somewhat less efficient design are:

(1) Robustness above k: While the designs only guarantee correct calls for up to k positive results, the larger-than-minimal designs still give good results when the number of positives exceeds k.

(2) Positive failure: In large part because of the OR function of the assay, any false calls made when k is exceeded will be false positives, not false negatives. As a result, multiplexing an assay will never mean that a positive result is missed--a desirable feature in a pathogen screening context.

Origami Assay Construction

Requirements: Note that there is no requirement for each pool to contain the same number of samples; however, in practice the compact designs that work tend to have nearly equal numbers of samples per pool.

Each assay design is found through Monte Carlo sampling. Initially a list of samples of length r*n is created; this list contains all of the replicates of each sample. Next, a sample is placed into a random assay well. If that sample is already in the well, or if the placement creates a duplicate pair test (two samples appearing together in more than one well), then the sample is removed from the well. This process is repeated until either all of the samples are placed or one of the samples cannot be placed.
If a sample cannot be placed, then the design process is restarted and a new random configuration is generated.
This design strategy guarantees that each successful model is (r-1)-disjunct, or, said another way, is a uniquely decipherable code of order r-1 (UD_{r-1}).
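
The construction above might be sketched as follows. This is our own illustrative Python reimplementation, not the Origami Assays source code: it tries wells in random order for each replicate and restarts whenever a placement gets stuck.

    # Sketch: random construction of an r-replicate pooling design in which no two
    # samples share more than one well (and which is therefore (r-1)-disjunct).
    import random

    def build_design(n, m, r, max_restarts=10_000, seed=0):
        rng = random.Random(seed)
        for _ in range(max_restarts):
            wells = [set() for _ in range(m)]           # samples assigned to each assay well
            sample_wells = [set() for _ in range(n)]    # wells assigned to each sample
            stuck = False
            for s in range(n):
                for _ in range(r):                      # place each of the r replicates of sample s
                    for w in rng.sample(range(m), m):   # try wells in random order
                        if w in sample_wells[s]:
                            continue                    # a replicate of s is already in well w
                        # reject w if s would then share two wells with some other sample
                        if any(sample_wells[s] & sample_wells[t] for t in wells[w]):
                            continue
                        wells[w].add(s)
                        sample_wells[s].add(w)
                        break
                    else:
                        stuck = True                    # no well works; restart from scratch
                        break
                if stuck:
                    break
            if not stuck:
                return wells
        raise RuntimeError("no design found; try more restarts, a larger m, or a smaller n")

    # Loose example with slack below the Steiner limit (tight designs may need many restarts).
    print([sorted(w) for w in build_design(n=20, m=15, r=3)])
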

Decoders


COMP: Combinatorial Orthogonal Matching Pursuit
1) Every sample that appears in a negative assay result is marked as negative.
2) The remaining samples that are present one or more times in a positive assay are marked as positive.
This decoder will tend to produce more false positives, but is also more resistant to assay error as it will call any sample positive if it has any support.
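
A minimal COMP sketch in Python. The conventions are ours: wells[w] is the set of sample indices pooled in assay w, and results[w] is True for a positive assay.

    # Sketch: COMP decoder.
    def comp_decode(wells, results):
        negatives, candidates = set(), set()
        for samples, positive in zip(wells, results):
            if positive:
                candidates |= samples     # step 2: seen in at least one positive well
            else:
                negatives |= samples      # step 1: anything in a negative well is negative
        return sorted(candidates - negatives)

    # Hypothetical 3-well, 4-sample example: if only sample 2 is truly positive,
    # COMP still calls sample 3 positive because nothing rules it out.
    wells = [{0, 1}, {1, 2}, {2, 3}]
    print(comp_decode(wells, [False, True, True]))   # [2, 3]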

DD: Definite Defectives
1) Every sample that appears in a negative assay result is marked as negative.
2) Mark every sample that is present alone in a positive assay as positive.
3) Mark all remaining items as negative.

This decoder will call fewer positives, but is more sensitive to assay errors.
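
A matching DD sketch, using the same wells/results conventions as the COMP sketch above:

    # Sketch: DD decoder.
    def dd_decode(wells, results):
        negatives = set()
        for samples, positive in zip(wells, results):
            if not positive:
                negatives |= samples                   # step 1: eliminate anything in a negative well
        positives = set()
        for samples, positive in zip(wells, results):
            remaining = samples - negatives
            if positive and len(remaining) == 1:
                positives |= remaining                 # step 2: a lone remaining sample must be positive
        return sorted(positives)                       # step 3: everything else is called negative

    wells = [{0, 1}, {1, 2}, {2, 3}]
    print(dd_decode(wells, [False, True, True]))       # [2] -- only the definite defective is called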

SSS: Smallest Satisfying Set
1) Every sample that appears in a negative assay result is marked as negative.
2) Of the remaining samples, find the smallest set that, if positive, would explain all of the positive assay results, and report those samples. If there is a tie, choose one of the smallest sets at random.
Note that SSS is one of the best performing decoding algorithms in practice, but it is too slow for large models (n>1000).
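
A brute-force SSS sketch (same conventions as above). The subset search is exponential in the number of candidate samples, which is why this approach does not scale to large n.

    # Sketch: SSS decoder by brute force over candidate subsets.
    import random
    from itertools import combinations

    def sss_decode(wells, results, rng=None):
        rng = rng or random.Random(0)
        negatives = set().union(*[s for s, r in zip(wells, results) if not r])
        candidates = sorted(set().union(*[s for s, r in zip(wells, results) if r]) - negatives)
        positive_wells = [w for w, r in enumerate(results) if r]
        for size in range(len(candidates) + 1):        # smallest satisfying sets first
            satisfying = [set(c) for c in combinations(candidates, size)
                          if all(wells[w] & set(c) for w in positive_wells)]
            if satisfying:
                return sorted(rng.choice(satisfying))  # break ties at random
        return []

    wells = [{0, 1}, {1, 2}, {2, 3}]
    print(sss_decode(wells, [False, True, True]))      # [2]: the smallest set explaining both positive wells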

PAN: Positive, Ambiguous, Negative (Origami Assays)
1) Samples that never appear in a positive assay are marked as negative (N). (same as COMP)
2) Every sample that appears exclusively in positive wells is marked as potentially positive (pp).
3) Potentially positive (pp) samples that appear as the only pp sample in at least one positive assay are marked as positive (P). (same as DD)
4) All other samples are listed as ambiguous (A). Note: for the balanced designs in Origami Assays, we find empirically from simulations that when the prevalence is near the positive call maximum, approximately 50% of the ambiguous calls are in fact positive. This indicates that the pool of ambiguous calls is enriched in positive samples.

This approach does not require that the design be balanced, nor does it assume any structure in the pooling design. However, in practice, balanced pooling designs perform better.
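
A PAN sketch in the same conventions. Step 1 is implemented as in COMP (any sample seen in a negative well is called N), which is our reading of the description above.

    # Sketch: PAN decoder returning a P/A/N call for each of n samples.
    def pan_decode(wells, results, n):
        negatives = set()
        for samples, positive in zip(wells, results):
            if not positive:
                negatives |= samples
        pp = set(range(n)) - negatives                 # potentially positive: only in positive wells
        positives = set()
        for samples, positive in zip(wells, results):
            remaining = samples & pp
            if positive and len(remaining) == 1:
                positives |= remaining                 # the lone pp sample in a positive well -> P
        return {s: "P" if s in positives else ("N" if s in negatives else "A") for s in range(n)}

    wells = [{0, 1}, {1, 2}, {2, 3}]
    print(pan_decode(wells, [False, True, True], n=4))   # {0: 'N', 1: 'N', 2: 'P', 3: 'A'}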

ePAN: Positive, Ambiguous, Negative with assay errors (Origami Assays)
ePAN is the same as PAN except that the assay states are first sampled using estimates of the false positive and false negative rates. The sample distribution is estimated by taking 10,000 samples, assuming each experiment is independent. Each sampled assay configuration is then scored using PAN to assign it a P, A, or N value.

The final reported value is a normalized weighting of P, A, and N values that sums to 1.0 for each sample based on the assay error.
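
A rough ePAN-style sketch, reusing pan_decode from the sketch above. The error model here (independently flipping each observed well result with the corresponding error rate) and the parameter names are our own reading of the description, not the Origami Assays implementation.

    # Sketch: ePAN-style weighting of P/A/N calls under assay error.
    import random

    def epan_decode(wells, results, n, false_pos=0.02, false_neg=0.05, draws=10_000, seed=0):
        rng = random.Random(seed)
        weights = {s: {"P": 0, "A": 0, "N": 0} for s in range(n)}
        for _ in range(draws):
            # Sample a plausible "true" assay state by flipping each observed result.
            sampled = [obs if rng.random() > (false_pos if obs else false_neg) else not obs
                       for obs in results]
            for s, call in pan_decode(wells, sampled, n).items():   # score with PAN
                weights[s][call] += 1
        # Normalize so the P/A/N weights sum to 1.0 for each sample.
        return {s: {c: v / draws for c, v in w.items()} for s, w in weights.items()}

    wells = [{0, 1}, {1, 2}, {2, 3}]
    print(epan_decode(wells, [False, True, True], n=4))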

Full Bayesian:
The most comprehensive method is to perform a full Bayesian analysis. This analysis uses the pooling design to define the relationships between the sample states and assays. Next it uses a Gibbs sampling or related stochastic method to sample potential configurations of sample states (positive/negative), and potential assay errors.

Using a full Bayesian approach should extract the most information, as it allows us to integrate all of the assay results along with all of the prior knowledge we might have. For example, we likely know the prevalence of a positive result from our sample--information that can help us to estimate how many positive results we expect to see in a given sample set.

While the full Bayesian method is the most accurate method, it presents three practical problems:
  1. Uncertainty in priors: A strength of Bayesian analysis is that it allows us to use prior information, but often that information is not well known. For example, assay error rates differ between laboratories and platforms. Furthermore, because many laboratories only perform a single assay on a single sample, they are unable to detect errors, so can't compute their error rates.
  2. Computational complexity: Sampling through the space of possible sample configurations and assay configurations is a computationally demanding process. While there are ways to accelerate these calculations, the methods themselves introduce additional assumptions.
  3. Interpretation complexity: While a full Bayesian analysis provides the most information, much of that information is difficult to communicate and is not actionable. For example, most end users are unfamiliar with Bayes factors or hyperparameters that may describe a Bayesian result. Furthermore, some results of the analysis, such as which pooled assay was likely an error, may not be relevant because there is nothing that will be done with those data. Such data are both hard to calculate and of no use to the end user--a bad combination.

The list above is not to say that these problems are unsolvable, but it does raise the question of whether such a robust method is even needed when simpler methods provide more practical value.

Comparison of methods
Below is a comparison of the decoding methods along with their relative strengths and weaknesses:

Criterion | COMP & DD | SSS | PAN | ePAN | Bayesian
1) Reported metrics | P, N | P, N | P, A, N | P, A, N (normalized weights) | p(pos), Beta distribution, full error distribution
2) Accuracy | Low | Medium | Medium/High | High | High
3) Ease of interpretation | High | High | High | Medium | Low
4) How a positive is reported | P | P | P, A | P, A weights | p(pos), hyperparameters
5) Adaptability for end user | Low | Low | Medium | High | High
6) Ease of computation | High | High | High | Medium | Low
7) Uncertainty estimate | No | No | Yes | Yes | Yes
8) Account for assay errors | No | No | No | Yes | Yes
9) Accounts for prevalence | No | No | No | No | Yes
10) Error detection | No | No | Yes | Yes | Yes

More notes:

Some more interesting references: https://arxiv.org/pdf/1808.01457.pdf On the Optimality of the Kautz-Singleton Construction in Probabilistic Group Testing