This same question of population prevalence appears often in the ecology and quality control literature [REFS]. In both fields the quantity of interest is not the infection or defect status of any individual, but the overall rate in the population.

Two questions arise in pooling for prevalence estimation:

- How big should the pool be?
- How many samples are needed?

In practice, the pool size may be dictated by experimental requirements, such as limits on the number of samples that can be physically pooled. The following discussion gives guidance on the ideal pool size when there are no such experimental restrictions.

The key determinant of the optimal pool size is an estimate of the underlying prevalence. This may seem circular in that we are using the pooled results to estimate the prevalence, while at the same time using a prevalence estimate to decide the pool size. Fortunately, in most cases we already have a rough idea of the prevalence, and the results are fairly tolerant of different pool sizes.

The table below shows simulation results for prevalence studies across a range of underlying prevalence values and pool sizes. Each entry in the table is the median confidence interval width, a measure of the uncertainty in the estimate. Each simulation is based on 10 tests and is repeated 1000 times to gather statistics.

Pool size (rows) / Prevalence (columns) | 0.001 | 0.01 | 0.02 | 0.05 | 0.1 | 0.2 |
---|---|---|---|---|---|---|
100 | 5.0 x 10^{-3} | 1.6 x 10^{-2} | 9.2 x 10^{-1} | 9.2 x 10^{-1} | 9.2 x 10^{-1} | 9.2 x 10^{-1} |
50 | 6.0 x 10^{-3} | 2.1 x 10^{-2} | 3.2 x 10^{-2} | 9.0 x 10^{-1} | 9.0 x 10^{-1} | 9.0 x 10^{-1} |
10 | 2.9 x 10^{-2} | 4.8 x 10^{-2} | 6.5 x 10^{-2} | 1.2 x 10^{-1} | 1.7 x 10^{-1} | 5.3 x 10^{-1} |
7 | 4.1 x 10^{-2} | 4.1 x 10^{-2} | 6.8 x 10^{-2} | 1.1 x 10^{-1} | 1.6 x 10^{-1} | 2.9 x 10^{-1} |
5 | 5.7 x 10^{-2} | 5.7 x 10^{-2} | 9.3 x 10^{-2} | 1.2 x 10^{-1} | 1.8 x 10^{-1} | 2.9 x 10^{-1} |
3 | 9.2 x 10^{-2} | 9.2 x 10^{-2} | 9.2 x 10^{-2} | 1.5 x 10^{-1} | 2.4 x 10^{-1} | 3.1 x 10^{-1} |
1 | 2.4 x 10^{-1} | 2.4 x 10^{-1} | 2.4 x 10^{-1} | 2.4 x 10^{-1} | 3.6 x 10^{-1} | 4.4 x 10^{-1} |
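A simulation of this kind could be sketched roughly as follows. The source does not say how the confidence intervals were computed, so this sketch assumes a flat prior over a prevalence grid and a 95% highest-density region; the function names (`grid_posterior`, `hdr_width`, `median_width`) and the grid resolution are illustrative choices, and the output should not be expected to reproduce the table exactly.

```python
import random
import statistics


def grid_posterior(n_pos, n_neg, pool_size, grid_points=1000):
    """Posterior over a prevalence grid for pooled results (flat prior assumed)."""
    weights = []
    for i in range(grid_points + 1):
        pr = i / grid_points
        p_neg = (1.0 - pr) ** pool_size   # chance that a pool tests negative
        p_pos = 1.0 - p_neg               # chance that a pool tests positive
        weights.append((p_neg ** n_neg) * (p_pos ** n_pos))
    total = sum(weights)
    return [w / total for w in weights]


def hdr_width(post, step, mass=0.95):
    """Total width of the highest-density region holding `mass` (assumed 95%)."""
    cum, count = 0.0, 0
    for w in sorted(post, reverse=True):
        cum += w
        count += 1
        if cum >= mass:
            break
    return count * step


def median_width(prevalence, pool_size, n_tests=10, n_rounds=1000, seed=0):
    """Median interval width over repeated simulated studies."""
    rng = random.Random(seed)
    p_pos = 1.0 - (1.0 - prevalence) ** pool_size
    cache = {}  # the width depends only on the number of positive pools
    widths = []
    for _ in range(n_rounds):
        n_pos = sum(rng.random() < p_pos for _ in range(n_tests))
        if n_pos not in cache:
            post = grid_posterior(n_pos, n_tests - n_pos, pool_size)
            cache[n_pos] = hdr_width(post, 1.0 / 1000)
        widths.append(cache[n_pos])
    return statistics.median(widths)


# Compare individual testing with pools of 50 at 1% prevalence.
print(median_width(0.01, 1), median_width(0.01, 50))
```

Because the interval width depends only on the number of positive pools, the sketch caches one posterior per count instead of recomputing it for each of the 1000 rounds.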

Based on these simulations, the best pool size (s) can be estimated from the prevalence (pr) as:

\[ s=\frac{1}{pr}\]

This relationship roughly holds across most prevalence values when there is little assay error. As assay error increases, the optimal pool size tends to decrease.
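One way to get a feel for this rule is to look at the fraction of pools expected to test positive when s = 1/pr. The short sketch below assumes an error-free assay and independent samples; `positive_pool_fraction` is an illustrative helper name, not something defined in the source.

```python
def positive_pool_fraction(prevalence, pool_size):
    """Probability that a pool contains at least one positive sample."""
    return 1.0 - (1.0 - prevalence) ** pool_size


for pr in (0.001, 0.01, 0.1):
    s = round(1.0 / pr)  # pool size suggested by s = 1/pr
    print(pr, s, positive_pool_fraction(pr, s))
```

Under these assumptions the fraction stays close to 1 - 1/e (roughly 63% to 65%) across the three prevalence values, so the rule keeps pools from being almost always negative or almost always positive.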

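For the second question, the confidence interval width (CI) shrinks with the square root of the number of tests (n):

\[ CI=\frac{c}{\sqrt{n}}\]

Here c is a constant defined empirically by the confidence intervals listed in the table above.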

As an example, consider an estimated prevalence of 10% (pr = 0.10), where we want to know the actual prevalence to within 1%. From the table above, the optimal pool size is 7, which with 10 assays yields a median confidence interval of 0.16 (16%); a pool size of 10 or 5 would give similar results. Because the interval width scales as \(1/\sqrt{n}\), reducing this error by a factor of 16 to reach our 1% target requires \(16^2 = 256\) times as many assays, or about 2,560 in total.

Next, let's compare the pooling case to the single assay case, using the same estimated prevalence of 10% (pr = 0.10) and the same 1% target. Starting with a pool size of 1, the median confidence interval is 0.36 (36%) for 10 tests. To reduce this to 1%, we need \(36^2 = 1296\) times as many assays, or about 12,960 in total.

This example demonstrates that we can reduce the expected number of assays by a factor of approximately 5 simply by choosing an appropriate pool size.
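The arithmetic behind this comparison can be checked with a few lines of Python. This is a minimal sketch that assumes the \(CI = c/\sqrt{n}\) scaling holds exactly; `assays_needed` is an illustrative name, and the input widths are the table entries for pr = 0.1.

```python
def assays_needed(ci_width, n_tests, target_width):
    """Scale the number of tests so the interval shrinks to target_width.

    Assumes CI = c / sqrt(n), with c fixed by one (ci_width, n_tests) pair.
    """
    c = ci_width * n_tests ** 0.5
    return (c / target_width) ** 2


pooled = assays_needed(0.16, 10, 0.01)  # pool size 7, table entry 0.16
single = assays_needed(0.36, 10, 0.01)  # pool size 1, table entry 0.36
print(pooled, single, single / pooled)
```

The ratio of the two results is (0.36 / 0.16)^2, or about 5, which is where the factor quoted above comes from.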


This calculator works by using a Bayesian approach to estimate the probability of the data given a prevalence value. Using a fine grid of prevalence values, we can then empirically construct a posterior probability density for all prevalence values between 0 and 1.0.

Posterior Estimation:

We assume each pool is independent, so the probability of any data configuration given a prevalence (pr) is the product of the probabilities of the individual pool results. If we divide the space of prevalence values into m even segments, we can calculate the posterior probability of any one segment i as:

\[p(pr_i|data)=\frac{ \prod{p(data| pr_i)} }{ \sum_{j=0}^m \prod{p(data| pr_j)}}\]

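For each pool size (s) there will be a count of positive pools (\(n_{s+}\)) and a count of negative pools (\(n_{s-}\)). The probability of these counts given a prevalence is:

\[p_-=(1.0-pr)^{s}\]
\[p_+=1.0-p_-\]
\[p(s|pr)=p_-^{n_{s-}}\,p_+^{n_{s+}}\]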
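The grid calculation described above can be sketched as follows. This is not the calculator's actual code: the function name `posterior_on_grid`, the flat prior, the use of segment midpoints as grid points, and the choice to work in log space are all assumptions made for illustration.

```python
import math


def posterior_on_grid(counts, m=1000):
    """Posterior over m prevalence segments, assuming a flat prior.

    counts maps each pool size s to a tuple (n_pos, n_neg) of pool results.
    Returns the segment midpoints and their normalised probabilities.
    """
    grid = [(i + 0.5) / m for i in range(m)]  # midpoints avoid pr = 0 and pr = 1
    log_like = []
    for pr in grid:
        total = 0.0
        for s, (n_pos, n_neg) in counts.items():
            log_p_neg = s * math.log1p(-pr)   # log of (1 - pr)^s
            p_pos = -math.expm1(log_p_neg)    # 1 - (1 - pr)^s
            total += n_neg * log_p_neg + n_pos * math.log(p_pos)
        log_like.append(total)
    # Subtract the maximum before exponentiating to limit underflow.
    peak = max(log_like)
    weights = [math.exp(v - peak) for v in log_like]
    norm = sum(weights)
    return grid, [w / norm for w in weights]


# Example: 10 pools of 10 samples each, 4 positive and 6 negative.
grid, post = posterior_on_grid({10: (4, 6)})
mode = grid[post.index(max(post))]
print(mode)
```

Working with log-likelihoods keeps the product over pools numerically stable when pools are large or numerous, and results from several pool sizes combine simply by adding their terms.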

Notes on related literature:

- https://arxiv.org/pdf/1612.07122.pdf See page 3, the section under main results, for a discussion of sampling with ν = ln(2), where ν = L*K/T (L = pool size, K = number of positives, T = number of tests, ν = prob). Quote: "It is also interesting to note that while ν = ln 2 (which is 'maximally informative' in the sense of maximising the entropy of the test outcome) optimises the rate of COMP (as well as DD below) for the near-constant column weight design, COMP [23] and DD [21] with Bernoulli designs are optimised with a fraction 1 − e^(−1) ≈ 0.632 of positive tests."
- https://www.researchgate.net/publication/223968436_Estimating_the_prevalence_of_infections_in_vector_populations_using_pools_of_samples Pooling is done all the time in field biology.
- https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0006513 Pooling flies to detect pathogens, essentially using Dorfman designs. Pools of 9 flies per test were used at a prevalence of 0.33% (I estimate they could use pools closer to 300 to get their data), but sample collection is hard.
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3449676/ "Pooling and PCR as a method to combat low frequency gene targeting in mouse embryonic stem cells." Screened 2,300 colonies using only 123 PCR reactions (20x compression). Gene targeting rates in mouse ES cells are 1%-10%.
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC338641/ "Recombinant fragment assay for gene targetting based on the polymerase chain reaction." An old pooling paper; possibly the first suggestion?
- https://www.ncbi.nlm.nih.gov/pubmed/31747440 Beef testing: "Determining an optimal pool size for testing beef herds for Johne's disease in Australia." A pool size of 10 was optimal due to experimental constraints.
- https://www.ncbi.nlm.nih.gov/pubmed/9634309 "Pooling of urine specimens for PCR testing: a cost saving strategy for Chlamydia trachomatis control programmes." Suggests that pools of 5 samples work and pools of 10 work most of the time, but with some errors. Given these, the optimal pool size is determined more by the experimental conditions than by the math.
- https://www.ncbi.nlm.nih.gov/pubmed/15814983 "Utility of pooled urine specimens for detection of Chlamydia trachomatis and Neisseria gonorrhoeae in men attending public sexually transmitted infection clinics in Mumbai, India, by PCR." Screened 690 men, tested in pools of 5 (138 pools in total), then retested (Dorfman). Prevalence was 15/690 and 37/690 by individual test.