
Papers presented at the ICES-III, June 18-21, 2007, Montreal, Quebec, Canada

The Hybrid Ratio Estimator

Brian Simonson, M.S. The Lewin Group

Abstract

The Centers for Medicare and Medicaid Services (CMS) effort to estimate improper payments in Medicare and Medicaid uncovered shortcomings with the two predominant ratio estimators: the separate ratio estimator (SRE) and the combined ratio estimator (CRE). The SRE allows the rates by strata to be projected to a known universe total for the denominator. Unfortunately, it does not allow for the analysis of subsets of the data, known as domain level analysis. In other words, one can only produce estimates concerning the whole universe, but in applied settings disaggregated subsets of the universe may be interesting for policy and other reasons. The CRE, however, allows for domain level analysis but does not utilize population total information. It follows that all denominator totals are projected from sampling weights, and these estimated totals may not equal the known universe totals. This difference leads to confusion and may cast doubt on the usefulness of results, particularly for the non-statistician. The analysis of large programs such as Medicare and Medicaid requires a method in which all dollars are accounted for. One must satisfy the accountant as well as the statistician. In addition, domain level analysis is crucial for this or any investigative project. Thus, the Hybrid Ratio Estimator (HRE) was formulated, allowing for domain level analysis while also projecting to the universe using known universe totals where available.

Keywords: Ratio Estimator, Domain, Medicare

1. Introduction

The Hybrid Ratio Estimator (HRE) is a ratio estimator that allows for analysis of subsets of the universe, known as domains, while rescaling projections to known universe totals. The two predominant ratio estimators, the CRE and the SRE, do not possess the capabilities of the HRE. The CRE allows for domain estimation, but projections are not rescaled to universe totals. The SRE uses the universe total as the denominator, but is incapable of domain level analysis. The Centers for Medicare and Medicaid Services (CMS) effort to estimate improper payments in Medicare and Medicaid uncovered these shortcomings. These large statistical audits received great attention from both statistical auditors and politicians. The study required an estimator that could both analyze subsets of the universe and have the total projected expenditures across these subsets add up to the known total universe expenditures. Creating a ratio estimator that rescales projections to universe totals is an intuitive solution to this problem. The novel contribution of the HRE is the closed form expression for its variance.

1.1 The Evolution of the HRE

Ratio estimators are widely used in many areas of statistical research. They are grounded in the theory of survey sampling and have been discussed in depth by many survey sampling statisticians. At root, there are two types of ratio estimators: those that utilize universe total information available for the denominator and those that do not.

When constructing ratio estimators in stratified samples, there have been two predominant methods over the years: the SRE and the CRE. The SRE combines strata by projecting the numerator total in each stratum as the product of the stratum ratio estimate and the known population denominator total for that stratum. Note that these totals need not be known only at the stratum level, but most audiences are most familiar with this construct. An overall error rate can then be obtained by dividing by the universe total for the denominator. An expression for the SRE is given below:

$$
\hat{R}_{SRE} = \frac{\sum_{k=1}^{a} t_{yk} \left( \dfrac{\sum_{j=1}^{n_k} x_{kj}}{\sum_{j=1}^{n_k} y_{kj}} \right)}{t_y} = \frac{\sum_{k=1}^{a} t_{yk} \hat{R}_k}{t_y} \tag{1}
$$

where
$k$ denotes the strata ($k = 1$ to $a$)
$n_k$ = number of sampled units from stratum $k$
$t_{yk}$ = known population total for variable $y$ in stratum $k$
$t_y$ = known population total for variable $y$
$x_{kj}$ = value of $x$ for sampled unit $j$ in stratum $k$
$y_{kj}$ = value of $y$ for sampled unit $j$ in stratum $k$
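As an illustration of equation (1), the following is a minimal Python sketch of the SRE, not the implementation used in the CMS audits. The function name `separate_ratio_estimate`, the dictionary layout, and the two-stratum data are all hypothetical and chosen only to make the arithmetic easy to follow.

```python
def separate_ratio_estimate(samples, stratum_totals):
    """Sketch of equation (1).

    samples: dict mapping stratum k -> list of (x_kj, y_kj) pairs (hypothetical layout)
    stratum_totals: dict mapping stratum k -> known population total t_yk
    """
    t_y = sum(stratum_totals.values())  # known universe total for y
    projected_x = 0.0
    for k, units in samples.items():
        sum_x = sum(x for x, _ in units)
        sum_y = sum(y for _, y in units)
        r_k = sum_x / sum_y  # stratum ratio estimate
        projected_x += stratum_totals[k] * r_k  # projection to the known stratum total
    return projected_x / t_y


# Hypothetical data: two strata, two sampled units each.
samples = {
    "A": [(10.0, 100.0), (0.0, 100.0)],   # stratum ratio 0.05
    "B": [(30.0, 100.0), (10.0, 100.0)],  # stratum ratio 0.20
}
stratum_totals = {"A": 1000.0, "B": 3000.0}
sre = separate_ratio_estimate(samples, stratum_totals)
print(round(sre, 4))
```

The denominator here is the sum of the known stratum totals, so the overall rate is always expressed against the true universe total rather than a projected one.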


The CRE combines information across strata by simply projecting the numerator and denominator strictly using sampling weights. The form of this estimator for a stratified random sample (STSI) is shown below. Note the superscript $d$, which represents a domain of interest.

$$
\hat{R}^{d}_{CRE} = f(\hat{t}_x, \hat{t}_y) = \frac{\hat{t}^{d}_x}{\hat{t}^{d}_y} = \frac{\sum_{k=1}^{a} \dfrac{N_k}{n_k} \sum_{j=1}^{n^d_k} x_{kj}}{\sum_{k=1}^{a} \dfrac{N_k}{n_k} \sum_{j=1}^{n^d_k} y_{kj}} \tag{2}
$$

where
$N_k$ = total number of units in the universe of stratum $k$
$n^d_k$ = number of sampled units from stratum $k$ in domain $d$

Typically, academic statistical discussions that compare the SRE and CRE revolve around issues of precision. There is no clear-cut winner, since the variance depends on the correlation of the numerator and the denominator, the overall rate, and the rates within strata. While this discussion is interesting, there are more important practical considerations. First, a random sample, no matter how highly stratified, will vary in its makeup from the true population. The SRE ensures that the sample is projected to known population totals. Furthermore, the SRE results in a ratio estimator whose denominator is the universe population total, a very important quality in areas of research where these totals matter, such as finance and accounting. There is one major pitfall of the SRE: it does not allow for ratio estimation of subsets of the universe, typically referred to as domains or super populations. The only domains that can be estimated are the ones for which universe denominator totals are known.

The CRE does allow for the estimation of domains, but typically has no apparatus for utilizing universe level information. Consequently, the denominator for the CRE is projected, and hence is not equal to the true universe total. This can be tough to discuss in real world settings. One could argue that the population total for the denominator could be applied at the end of the CRE method, essentially applying the rate to the universe. While this is true, the method breaks down if domain level analysis is performed.

For example, suppose we estimated the percentage of the total expenditures for medical equipment from a Medicare contractor that were improperly paid. Suppose only two types of equipment were processed through this contractor: wheelchairs and canes. Suppose as well that we used a stratified sampling design to sample 300 claims from the contractor, sampling 10 claims each day for a 30 day month (thus each day is a stratum). Furthermore, we know that this contractor had $2.3 million in expenditures for this month. Consider the following results from the sample and estimation using the CRE (dollar amounts in thousands).

Equipment Type   Error Rate   Projected Payments in Error   Projected Expenditures   True Expenditures
Wheelchairs      .100         100                           1,000                    N/A
Canes            .150         150                           1,000                    N/A
Overall          .125         250                           2,000                    2,300

This example can cause major confusion among a general audience. The total expenditures are known to be $2.3 million, but the CRE uses sampling weights to project expenditures of $2 million. Most observers would first wonder why the total projected expenditures do not match the true universe total. What could be done to circumvent this criticism? One might suggest using the universe total to project the overall rate of failure to the universe. Of course, this would just cause greater confusion, because the projected expenditures by equipment type would still not sum to the overall expenditures.

There exists research on estimators that utilize various known universe totals, but this research mostly lies in the realm of regression estimation techniques. While it has been shown that the ratio estimator is a specific form of the regression estimator, this again is difficult to convey to the public, making regression estimation a limited tool in applied research. There is a need for a specific estimator that can be explicitly derived and intuitively understood. This estimator should be able to estimate domains and utilize universe information in any form that is available. What we now present is the HRE, which is essentially a mixture of the SRE and the CRE. Its solution for variance estimation is made possible through the observation that the estimator is a special form of a double ratio estimator. The key to deriving the variance of the HRE lies in the fact that it is a ratio of two separate ratio estimators that share a common denominator.

1.2 The Estimator

The HRE is presented below for a given domain $d$. As one can see, the HRE is a ratio of two SREs.


$$
\hat{R}^{d}_{\mathrm{Hybrid}} = \frac{\hat{t}^{*d}_x}{\hat{t}^{*d}_y} = \frac{\sum_i \hat{t}^{*di}_x}{\sum_i \hat{t}^{*di}_y} = \frac{\sum_i t^{*i}_y \dfrac{\hat{t}^{di}_x}{\hat{t}^{i}_y}}{\sum_i t^{*i}_y \dfrac{\hat{t}^{di}_y}{\hat{t}^{i}_y}} \tag{3}
$$

where the index $i$ denotes the subsets of the universe for which universe total information is known. In a sense, these subsets represent a second set of domains for which population totals are known. Note that the only restriction on these totals is that their sum across all subsets $i$ must equal the overall universe total. Specifically,

$$
\sum_i t^{*i}_y = t^{*}_y \tag{4}
$$

Although these totals are treated statistically as domains, they are subject to the constraint in (4), and discussing two kinds of domains invites confusion, so we will refer to these subsets as partitions (see footnote 1). The term is intuitive, since the totals are provided by any given partitioning of the universe. Finally, note the use of the superscript $*$. This notation refers to quantities that are benchmarked, or projected, to the universe. It becomes integral later on when differentiating between standard projections using sampling frequencies and projections that have been rescaled with respect to population totals.

1.3 The Variance Derivation

One of the reasons the HRE is practical is that its variance has an explicit solution, which can be obtained directly by observing that the HRE is a special case of a double ratio estimator. We will now step through the derivation of the HRE variance.

First, apply the Taylor approximation method for determining the variance of a ratio estimator and then simplify. The fourth line below treats the partition-level estimates as uncorrelated, so that the variance of the sum is the sum of the variances.

$$
\begin{aligned}
\mathrm{Var}\left(\hat{R}^{d}_{\mathrm{Hybrid}}\right) &= \mathrm{Var}\left(\frac{\hat{t}^{*d}_x}{\hat{t}^{*d}_y}\right) \\
&\approx \frac{1}{\left(t^{*d}_y\right)^2} \mathrm{Var}\left(\hat{t}^{*d}_x - R^{d}\, \hat{t}^{*d}_y\right) \\
&= \frac{1}{\left(t^{*d}_y\right)^2} \mathrm{Var}\left(\sum_i \hat{t}^{*di}_x - R^{d} \sum_i \hat{t}^{*di}_y\right) \\
&= \frac{1}{\left(t^{*d}_y\right)^2} \sum_i \mathrm{Var}\left(\hat{t}^{*di}_x - R^{d}\, \hat{t}^{*di}_y\right) \\
&= \frac{1}{\left(t^{*d}_y\right)^2} \sum_i \mathrm{Var}\left(t^{*i}_y \left(\frac{\hat{t}^{di}_x - R^{d}\, \hat{t}^{di}_y}{\hat{t}^{i}_y}\right)\right) \\
&= \frac{1}{\left(t^{*d}_y\right)^2} \sum_i \mathrm{Var}\left(t^{*i}_y\, \hat{\xi}^{i}\right)
\end{aligned} \tag{5}
$$

where $\hat{\xi}^{i} = \left(\hat{t}^{di}_x - R^{d}\, \hat{t}^{di}_y\right) / \hat{t}^{i}_y$ is a ratio of two random variables and is therefore itself a ratio estimator, with $\xi^{i}$ denoting the quantity it estimates. Again utilizing the expression for the variance of a ratio estimator, it follows that:

$$
\begin{aligned}
\mathrm{Var}\left(\hat{R}^{d}_{\mathrm{Hybrid}}\right) &= \frac{1}{\left(t^{*d}_y\right)^2} \sum_i \mathrm{Var}\left(t^{*i}_y\, \hat{\xi}^{i}\right) \\
&\approx \frac{1}{\left(t^{*d}_y\right)^2} \sum_i \frac{\left(t^{*i}_y\right)^2}{\left(\hat{t}^{i}_y\right)^2} \mathrm{Var}\left(\hat{t}^{di}_x - R^{d}\, \hat{t}^{di}_y - \xi^{i}\, \hat{t}^{i}_y\right)
\end{aligned} \tag{6}
$$

At this point, we should pause to note what has been accomplished. The HRE is a ratio of two SREs. The realization that both SREs share a common denominator allows the Taylor approximation for the variance of a ratio to be applied twice. This would not be the case for an arbitrary double ratio estimator of the form (a/b)/(c/d), but this estimator is essentially of the form (a/b)/(c/b). Although it may not seem like much, this sleight of hand allows for a concise solution for the variance of the HRE.

1.4 Conclusions

This paper presented the hybrid ratio estimator, a concise ratio estimator that allows for domain level analysis as well as projection to population totals. The estimator is essentially a domain specific ratio of two separate ratio estimators, and its variance was derived using a novel approach of applying the formula for the variance of a ratio estimator twice. The HRE is currently employed in two major audits of Medicare and Medicaid, and its practicality has saved many hours of discussion with accountants in the bookkeeping of government program expenses. Clearly, not all projects care so deeply about projected totals combining to equal universe totals, and in such projects the combined ratio estimator would suffice. However, even in projects where population totals are not politically important issues, there is still a use in benchmarking the projections to the universe. Often, the sample may not represent the universe well. Assuming the ratio estimated with the sample is still a sound approximation, its contribution to the overall ratio can be rescaled to accurately reflect the universe. For a more detailed discussion of the HRE, you can download the full report at http://www.geocities.com/brianernstsimonson.

Footnote 1: This treatment of known totals as domains exists in the general SRE as well. The topic is rarely discussed, however.
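To make the benchmarking in equations (3) and (4) concrete, here is a minimal Python sketch of the HRE point estimate. It is an illustration under stated assumptions rather than the code used in the audits: the function name `hybrid_ratio_estimate`, the record layout, the two partitions "p1" and "p2", and their totals of 1,300 and 1,000 are all hypothetical. The unit-level data are invented so that the sampling weights project total expenditures of 2,000, echoing the wheelchair and cane example in Section 1.1. Every partition is assumed to contain at least one sampled unit.

```python
from collections import defaultdict


def hybrid_ratio_estimate(units, partition_totals, domain):
    """Sketch of equation (3) for one domain.

    units: list of dicts with keys 'partition', 'domain', 'weight', 'x', 'y' (hypothetical layout)
    partition_totals: dict mapping partition i -> known total t_y^{*i}
    Returns (rate, benchmarked numerator, benchmarked denominator).
    """
    ty_i = defaultdict(float)   # weighted y over all sampled units in partition i
    tx_di = defaultdict(float)  # weighted x over domain units in partition i
    ty_di = defaultdict(float)  # weighted y over domain units in partition i
    for u in units:
        i = u["partition"]
        ty_i[i] += u["weight"] * u["y"]
        if u["domain"] == domain:
            tx_di[i] += u["weight"] * u["x"]
            ty_di[i] += u["weight"] * u["y"]
    # Rescale each partition's weighted totals to its known universe total.
    num = sum(partition_totals[i] * tx_di[i] / ty_i[i] for i in partition_totals)
    den = sum(partition_totals[i] * ty_di[i] / ty_i[i] for i in partition_totals)
    return num / den, num, den


# Hypothetical sample: 4 units per partition, each with weight 2.5 and y = 100.
units = []
for partition in ("p1", "p2"):
    for dom, x in (("wheelchairs", 10.0), ("wheelchairs", 10.0),
                   ("canes", 15.0), ("canes", 15.0)):
        units.append({"partition": partition, "domain": dom,
                      "weight": 2.5, "x": x, "y": 100.0})

partition_totals = {"p1": 1300.0, "p2": 1000.0}  # hypothetical known totals, summing to 2,300
results = {d: hybrid_ratio_estimate(units, partition_totals, d)
           for d in ("wheelchairs", "canes")}
weighted_total = sum(u["weight"] * u["y"] for u in units)          # projection from weights alone
benchmarked_total = sum(den for _, _, den in results.values())     # sum of domain denominators
print(weighted_total, round(benchmarked_total, 6))
```

In this sketch the domain error rates are unchanged by the rescaling, while the benchmarked domain expenditures add up to the known universe total instead of the weight-based projection, which is the property the accountant's view of the audit requires.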
