Full Record

Author | Higgins, Michael |

Title | Applications of Integer Programming Methods to Solve Statistical Problems |

URL | http://www.escholarship.org/uc/item/1g55c77q |

Publication Date | 2013 |

Discipline/Department | Statistics |

University/Publisher | University of California – Berkeley |

Abstract | Many problems in statistics are inherently discrete. When one of these problems also contains an optimization component, integer programming may be used to facilitate a solution to the statistical problem. We use integer programming techniques to help solve problems in the following areas: optimal blocking of a randomized controlled experiment with several treatment categories and statistical auditing using stratified random samples.We develop a new method for blocking in randomized experiments that works for an arbitrary number of treatments. We analyze the following problem: given a threshold for the minimum number of units to be contained in a block, and given a distance measure between any two units in the finite population, block the units so that the maximum distance between any two units within a block is minimized. This blocking criterion can minimize covariate imbalance, which is a common goal in experimental design. Finding an optimal blocking is an NP-hard problem. However, using ideas from graph theory, we provide the first polynomial time approximately optimal blocking algorithm for when there are more than two treatment categories. In the case of just two such categories, our approach is more efficient than existing methods. We derive the variances of estimators for sample average treatment effects under the Neyman-Rubin potential outcomes model for arbitrary blocking assignments and an arbitrary number of treatments. In addition, statistical election audits can be used to collect evidence that the set of winners (the outcome) of an election according to the machine count is correct – that it agrees with the outcome that a full hand count of the audit trail would show. The strength of evidence is measured by the <italic>p<\italic>-value of the hypothesis that the machine outcome is wrong. Smaller <italic>p<\italic>-values are stronger evidence that the outcome is correct. Most states that have election audits of any kind require audit samples stratified by county for contests that cross county lines. Previous work on <italic>p<\italic>-values for stratified samples based on the largest weighted overstatement of the margin used upper bounds that can be quite weak. Sharper $p$-values than those found by previous work can be found by solving a 0-1 knapsack problem. We also give algorithms for choosing how many batches to draw from each stratum to reduce the counting burden. |

Subjects/Keywords | Statistics; Blocking; Causal inference; Election auditing; Exact inference; Experimental design; Integer programming |

Language | en |

Rights | public |

Country of Publication | us |

Format | application/pdf |

Record ID | california:qt1g55c77q |

Other Identifiers | qt1g55c77q |

Repository | california |

Date Indexed | 2018-02-26 |

Sample Search Hits | Sample Images

…larger than the one observed
under an *exact* null hypothesis. Frequently in causal *inference*, subjects in one treatment group
are matched to one or more subjects in another treatment group based on the units’ propensity
scores or observed values of…

…between units. Note that, for these twelve observations, the
approximate algorithm yields the desired optimal blocking in figure 2.2. In general,
this will not be the case. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
*Exact* and…

…*exact* p-values
(P# ). Both are substantially smaller than bounds using the method in Stark (2008b). . 38
Number of batches to audit and expected number of ballots to audit to get p-values
no larger than 0.05 for 2006 Minnesota Senate…

…46
vi
List of Tables
3.1
Conservative and *exact* p-values for the hypothesis that the apparent outcome of the
2006 U.S. Senate race in Minnesota was wrong, based on Minnesota’s audit of a stratified random sample of 202 precincts. Values are given…

…for two test statistics: maximum
MRO and maximum taint. Column 2: conservative p-value using the method of Stark
(2008b). Column 3: LKP conservative p-value. Column 4: *exact* p-value obtained by
solving KP…

…37
3.2
Statutory, PSS, first.r, and next.r sample sizes for the 2006 Minnesota Senate
contest. Number of batches to audit and expected number of ballots to audit to obtain
p-values no larger than the *exact* p-values in Table 3.1 (0.0159 for…

…efficiently assign a finite number of units to pre-selected treatment categories. Nonparametric
*inference* problems may involve computing ranks and counting the number of permutations of
treatment assignments that would produce a test statistic as large or…

…The core of the paper presents an *exact* substantive test of details when units are selected
using a simple random sample. A substantive test of details is a type of audit that tests whether
the aggregate discrepancy in a set of audited units is…