Fixed Effects High-Dimensional Profiling Models in Low Information Context

Profiling or evaluation of health care providers, including hospitals or dialysis facilities, involves the application of hierarchical regression models to compare each provider’s performance with respect to a patient outcome, such as unplanned 30-day hospital readmission. This is achieved by comparing a specific provider’s estimate of unplanned readmission rate, adjusted for patient case-mix, to a normative standard, typically defined as an “average” national readmission rate across all providers. Profiling is of national importance in the United States because the Centers for Medicare and Medicaid Services (CMS) policy for payment to providers is dependent on providers’ performance, which is part of a national strategy to improve delivery and quality of patient care. Novel high dimensional fixed effects (FE) models have been proposed for profiling dialysis facilities and are more focused towards inference on the tail of the distribution of provider outcomes, which is well-suited for the objective of identifying sub-standard (“extreme”) performance. However, the extent to which estimation and inference procedures for FE profiling models are effective when the outcome is sparse and/or when there are relatively few patients within a provider, referred to as the “low information” context, have not been examined. This scenario is common in practice when the patient outcome of interest is cause-specific 30-day readmissions, such as 30-day readmission due to infections in patients on dialysis, which is only about ~ 8% compared to the > 30% for all-cause 30-day readmission. Thus, we examine the feasibility and effectiveness of profiling models under the low information context in simulation studies and propose a novel correction method to FE profiling models to better handle sparse outcome data.


INTRODUCTION
Unplanned readmissions following a hospital discharge are a major source of morbidity and mortality risk for patients on dialysis. The burden of hospitalization is particularly high for patients on dialysis, where the latest U.S. national data shows that the frequency of 30-day readmissions is 31.1%, which is more than double the frequency of readmissions seen in older Medicare beneficiaries without kidney disease (United States Renal Data System/USRDS [1]).
Profiling or evaluation of health care providers, such as hospitals, dialysis facilities, and nursing homes among others, serves multiple purposes, including (1) identifying providers with performance below standard by government agencies for regulatory or payment purposes and (2) conveying information and feedback to stakeholders (e.g., the public, patients, providers) regarding the quality of care among providers. The main focus of our work is objective (1), specifically with *Address correspondence to this author at the Department of Medicine, University of California Irvine, 333 City Blvd. West, City Tower, Suite 400, Orange, CA 92868, USA; E-mail: danhvn1@hs.uci.edu respect to the goal of identifying providers whose performances (e.g., 30-day readmission) are exceptionally worse (W) and not different (ND) relative to a reference, such as a national "average" standard. Also related to the inferential process of identifying/flagging providers with 30-day readmission rates W and ND from the national rate, it is of interest to obtain accurate estimates of provider-specific effects and associated quality metrics.
When the outcome, such as 30-day readmission, is not frequent and/or when there are relatively few patients within a provider, referred to as the "low information" context [2], estimation and inference for profiling models are understandably more challenging. This is the situation when the patient outcome of interest is cause-specific 30-day readmissions, such as 30-day readmission due to infections in patients on dialysis, which is only about ~8% compared to greater than 30% for all-cause 30-day readmission. Infectionrelated hospitalizations are serious adverse events that are oftentimes preventable. Hence, it is an important performance indicator that is carefully monitored in dialysis facilities. day unplanned hospital readmission are hierarchical logistic regressions of the form outcome ~ provider effects + patient case-mix effects. Thus, patient outcomes vary across providers due to variation in providers' quality of care (provider-specific effects) and variation in patient-level case-mix effects, which include demographics, comorbidities, and the type of index admission. Because of the nested data structure and the need to stabilize estimation, modeling provider effects as random effects (RE) has been used [2][3][4][5][6][7].
A justification for the use of RE models is that they provide stable provider effect estimates through shrinkage, although several inherent disadvantages have been noted. In particular, RE estimates are biased toward the overall provider average and biased in the presence of confounding between patient risk factors and provider effects [8]. Also, although the overall average error in estimation of provider effects is smaller because mean square error is minimized over the full set of provider effects in the RE approach, fixed effects (FE) estimates have smaller error for outlier 'providers whose effects are exceptionally large or small' [8], which are the providers we wish to identify. Our previous works also have shown that the benefit of stabilization comes at a severe cost in substantially biased provider effects estimation and, perhaps more important, at a substantial reduction in the power to identify W providers [9,10]. Our works and others have used high-dimensional FE models to identify substandard ("extreme") performance, especially for profiling 30-day readmission for dialysis facilities where the outcome is not sparse [3,[8][9][10][11][12][13][14][15]. However, the extent to which FE models are useful in the low information context has not been studied, which is the focus of this work. Thus, we assess the relative performance of the FE model proposed by He et al. [15], including the stability of provider-specific estimates and the ability to identify extreme providers in simulation studies. Briefly, the FE model of He et al. [15] is a high-dimensional parameter model with a unique fixed intercept for each provider and is used in assessing the performance of dialysis facilities [3,8,15]; see also Chen et al. [14] and Estes et al. [11,12] for recent dialysis facility profiling applications. Furthermore, in this work, we also propose and examine the performance of a novel corrected FE model estimation approach geared towards estimation under low information context, where the (uncorrected) FE model estimates of some provider-specific effects may be unreliable.

METHODS: HIGH-DIMENSIONAL FE PROFILING MODELS
We introduce the FE profiling model using the context of hospital readmission as an illustrative example. Let the binary outcome Y ij equal 1 if patient index discharge j in provider i results in a readmission within 30 days, for patient discharge j = 1, 2,…, N i in provider (dialysis facility) i = 1, 2,…, F . The FE profiling model (He et al. [15]) is In practice, the process of risk adjustment is complex and depends, in part, on policy objectives and the specific patient population (e.g., general population, dialysis population). However, we point out that it is critical to adequately risk-adjust for patient-level factors and avoid inclusion of variables (e.g., provider-level or patient-level variables) that are/may be related to the process of care (e.g., see [2,3,13]).
To avoid confusion, we emphasize that the model shown in (1) is not a collection of individual models (i.e., not a separate model for each provider), but rather a single model with high-dimensional parameters and requires simultaneous estimation for thousands of provider-specific effects/parameters ( {! i } i=1 F and ! ). For example, for profiling dialysis facilities the dimension of ! = (! 1 ,…,! F ) T is > 6,000 dialysis facilities across the U.S., and the dimension of ! is ~ 40. Standard estimation (e.g., maximum likelihood) and software fails; thus, He et al. (2013) proposed a feasible estimation method based on an alternating one-step Newton-Raphson that iterates between estimation of ! and ! i .
The summary performance index for each provider which incorporates patient-level risk factors ( Z 's) used in practice is the standardized readmission ratio (SRR). For FE model (1), given the provider and the patient case-mix effect estimates, denoted by ! i and ! , respectively, the estimated SRR for provider i is where p ij = g !1 (" i +# T Z ij ) is the estimated probability of readmission for patient j in provider i and

ESTIMATION AND INFERENCE PROCEDURES
In addition to the challenge of high-dimensional parameters, compounding difficulties are encountered in the low information context where the outcome is sparse, resulting in providers with few readmissions or even no readmission. For very small providers with few patients, there is very low information to assess performance. In extreme cases of providers with no or very low readmission, the FE estimation method [15] leads to unstable estimates for those providers. Thus, in the low information context, we propose a correction to the FE estimates for provider-specific effects.

FE Model Estimation
To describe our proposed FE corrected estimation for provider-specific effects, we first set the notation for the likelihood of the FE model (1) and briefly summarize the alternating Newton-Raphson algorithm proposed by He et al. [15]. For the FE model (1), Because direct maximization of (3) is not feasible with standard methods when F is large (e.g., F ! 6, 000 ), He et al. (2013) proposed an effective iterative algorithm that alternates between estimation of ! i given ! and estimation of ! given ! i using onestep Newton-Raphson updates. More precisely, estimation of the high-dimensional parameters ( !, " ) are feasible since the likelihood (3) can be written as Thus, given ! , ! i can be estimated via a Newton-Raphson procedure that depends only on one variable in the maximization of L i (! i , ") . Briefly, the estimation procedure proposed by He et al. (2013) is as follows.
The (m +1) th maximization step for ! , given (iv) The above steps are repeated until convergence, defined by and ! is some prespecified tolerance level. Denote these final uncorrected provider-specific estimates as . Programs in R, sample data, and tutorial are provided in the online Supplementary Materials. In our implementation, we choose ! (0) = 0

Corrected Estimation of Provider Effects
As described earlier, estimation of provider effects, i ! , for the FE model can be unstable for some providers in the low information context. Thus, we consider an approach to "correct" or stabilize FE estimates. We adapt the Firth correction in (standard) logistic regression [16,17] to the high-dimensional FE model (1). Recall that for the standard (non-hierarchical data) logistic regression model with N independent units, Firth's modified score equations [16] for estimation to reduce small sample bias is This is equivalent to using a penalized likelihood L * (! ) = L(! ) | I(! ) | "1/2 [17], where the penalty term | I(! ) | "1/2 is equivalent to Jeffreys' prior [18]. Applying this to logistic regression yields the modified estimation For binary outcome with small sample size, Firth's logistic regression has become a standard approach to reduce bias in the estimated regression coefficients.
We adapt this penalized estimation to the highdimensional FE model (1) to correct for unstable estimation of ! i for providers with low information. We first note that ! can be precisely estimated because it is based on data from all providers; therefore, penalization on patient-level risk factors is unnecessary. Direct application of the Firth's modified score to penalize ! = (! 1 ,…,! F ) is not feasible for FE profiling model (1) due to the challenge of calculating the score penalties. These are obtained via the diagonals of the N ! N hat matrix, which in dialysis population applications are in the order of N ! 500, 000 or larger. The size of N is many orders of magnitude larger for profiling applications in the general population. However, estimating ! with Firth's correction, for a fixed ! , is equivalent to sequentially estimating ! i individually, for a fixed ! , using Firth's correction. This is seen as follows. For a fixed ! , the hat matrix used in the estimation of ! with Firth's correction for a fixed ! , the estimation of ! using Firth's correction can be reduced to a sequence of estimations of a single parameter ! i by penalizing the score U i , using the weights h ij = w ij / j=1 N i ! w ij . More specifically, the provider-specific penalized score equations are We propose a simple correction to adjust the estimates from Section 3.1 of provider-specific effects, ! i 's, using the modified score U I * . More precisely, first, ! is fixed at the estimate resulting from Section 3.1, namely ! U . The provider effects ! i 's are then reestimated using the estimation procedure outlined in 3.1 with the following modifications. In Step (i), Note that when ! (0) is set to the zero vector, the initial value of ! (0) reduces to value previously noted in Step (i) in Section 3.1. In Step (ii), ! (m+1) is set equal to ! (m ) . In other words, ! is no longer estimated. Finally, the score in Step (iii) is modified by replacing U i with U i * .

Inference: Identifying Extreme Providers
In profiling, one of the main interests is to identify/flag providers that significantly deviate from the national norm (e.g., national average). The current public policy in the U.S. penalizes providers that perform significantly W than the national standard (SRR > 1 ). Thus, in practice, the goal is to flag/identify providers as W or ND from the national standard (SRR not different than 1). Better (B) providers (SRR < 1 ) are not penalized nor incentivized.
First, note that for a provider with an adjusted event rate that does not differ from the national norm, ! i = ! M , which implies SRR i = 1. When SRR i > 1 or SRR i < 1 , the event rate for provider i is greater than or less than the national norm, respectively. Thus, testing the null hypothesis H 0 : Simultaneously testing the null hypothesis for thousands of providers is computationally expensive. However, one can take advantage of the fact that ! and ! M can be estimated based on the large data from all providers. Hence, these parameters are estimated and fixed throughout the proposed algorithm below which is based on resampling responses under the null hypothesis. Since the global parameters ! and ! M are fixed, model fitting to the resampled data only requires estimation of provider-level effects ! i . This reduces the computational burden substantially since each ! i is estimated using only data from each provider separately. The steps of the procedure for each provider i are as follows.
estimation of ! i b only involves steps (iii)-(iv) in Section 3.1 for the uncorrected FE model since ! is fixed. For the correction method, the estimation proceeds as described earlier in Section 3.2; that is, the corrected estimation algorithm is applied to the b th dataset to obtain p ij b .
(3) A nominal two-sided p -value for the i th provider, P i , is calculated as , where T i O is calculated based on the original/observed data and I(A) denotes the indicator function for event A.

SIMULATION STUDY DESIGN
We designed simulation studies to assess the performance of the uncorrected and corrected FE model estimation methods, mainly with respect to (A) estimation of provider-specific effects, ! i 's and SRR i 's; and (B) identification of extreme providers relative to a reference. Data were generated from the model For the provider effects, {! i } i=1 F , 2.5% were underperformers (W: "worse") and 2.5% were overperformers (B: "better") whose effects, ! i 's, were equally spaced in the intervals [0.4,1.0] and [!1.0, !0.4] , respectively. The remaining 95% of providers, with effects not different (ND) from the reference, were generated from a N(0,! 2 ) distribution with ! 2 = 0.2 2 . Note that a constant ! 0 has been added to simulation model (4) to conveniently control the baseline rate of readmission (outcome data sparsity), where baseline rates of readmission considered were 20%, 10%, 5%, and 3% corresponding to ! 0 = log(1 / 13.5) , log(1 / 33) , log(1 / 73) , and log(1 / 126) , respectively. This setup conveniently regulates the level of outcome data sparsity. For each baseline readmission rate setting, 200 datasets were generated and the estimation (Section 3) and inference procedure (Section 3.3) was applied to each simulated dataset.
The provider volume of each generated dataset range from a minimum of 48 to a maximum of 195 patients on average, similar to real USRDS data in applications (e.g., see [14]). More specifically, the number of patients were generated from a truncated Poisson distribution following He et al. (2013), where the number of patients was taken to be (15) . This process mimics the sparse data structure of dialysis facility (provider) i in practice.

Estimation of Provider-Specific Effects and SRRs
The results for provider-specific estimates of ! i 's for the 125 (2.5%) under-performers ( ! i > 0 ) and 125 over-performers ( ! i < 0 ) for the case of 3% overall event rate (most sparse) are provided in Figure 1A where averages of ! i estimates over 200 simulated data sets are plotted. As expected, under this extremely low information context, provider effect estimates are unstable for the uncorrected FE method.
However, note that these providers are mainly the over-performers ( ! i < 0 ) with low or zero events ( j ! y ij 's are small) leading to "explosion" of the estimates (Figure 1A). It is important to note that these unstable estimates are in the direction of the true effect (negative direction for negative ! i 's, where ! i " #$ ). Also as expected, estimates for under-performers ( ! i > 0 ) are less unstable and more on target for the uncorrected FE method. The corrected estimation approach, which adapts the Firth's modified score equation for the FE model, largely eliminates the instability and estimates are on more target for the true ! i 's ( Figure 1B). Figure 2 (left column) shows estimates of ! i 's for increasing percentage of overall events, from 3% to 20% for the uncorrected FE method. Clearly, the frequency of unstable estimates for ! i < 0 decreased with increasing overall events, although unstable estimates are apparent even at a 10% event rate. However, the magnitude of the unstable estimates declined quickly (! i < 0 ) as the overall event rate increased (e.g., at 20%).
Next, we summarize results for estimation of the provider-specific SRRs. As describe in Section 2, SRR is the summary performance index for each provider used in practice which incorporates patient-level risk factors Z ij and their estimated effects, ! . More specifically, given the provider and the patient case-mix effect estimates for each approach, denoted by ! i * and ! , respectively, the estimated SRR for provider i is where p ij * = g !1 (" i * +# T Z ij ),p M ,ij * = g !1 (" M * +# T Z ij ) , * and denotes the uncorrected and corrected approach, namely U and C . Figure 3 (left column) summarizes the uncorrected FE model estimates of SRR for 3% to 20% overall outcome event. We note that even though specific ! i < 0 were unstable for highly sparse data (e.g., at 3% -10%; Figure 2), corresponding estimates of SRR's are stable overall and targets the true SRR, because SRR incorporates patients characteristics, their effects, as well as provider-specific effects as shown in (5); see Figure 3 (left column). Average SRR estimates for the corrected estimation performed well and are summarized in Figure 3 (right column). However, we note that for extremely sparse data (e.g., at 3%), the uncorrected approach slightly overestimate SRRs while the corrected approach slightly underestimate SRR for truly worse providers (true SRR > 1 ; Figure 4 -top). For truly better providers (true SRR < 1 ), both methods slightly over estimate the true SRRs, although more so with the corrected method. Differences in SRR estimates between the two methods are neglible as the overall percent of events increases (e.g., at 20%; Figure 4 -bottom).

Flagging Extreme Providers/Facilities
The overall performance of the uncorrected and corrected FE methods to identify extreme providers are assessed in terms of sensitivity (SEN) to correctly identify providers that under-perform (W: "worse"), over-perform (B: "better") relative to the reference standard (e.g., national reference), and specificity.

Specificity
(SPEC) refers to the correct identification/flagging of providers whose performances are not different from the reference standard (ND: "not different"). We note that provider assessment policies in practice focus on identifying under-performing providers (W providers) as those are tied to payment policy or regulatory goals. Figure 5 summarizes the distribution of SEN-W, SEN-B, and SPEC for varying levels of outcome sparsity, ranging from 3% to 20% overall outcome rate. For extremely sparse data of 3% and 5%, the uncorrected method has highest sensitivity to detect under-performing providers (higher SEN-W; left column). This is expected since the for truly worse providers, there are more outcome events ( j ! y ij ); see Figure 5 (left column). SEN-W rates were similar between uncorrected and corrected methods at 20% overall overall outcome rate.
Because the event counts are zero or low for truly better providers in the context of sparse outcome data, the unstable/poor estimation of provider effects from the uncorrected method results in lower sensitivity to detect over-performing providers (lower SEN-B) compared to the corrected method ( Figure 5 -middle column). However, note that the nominal SEN-B rates are low overall, as expected, compared to nominal SEN-W rates. This is expected in the low information context since B providers would have fewer readmissions, making it difficult to correctly identify B providers when the outcome is sparse. SPEC rates were high and similar between uncorrected and corrected methods (Figure 5 -right column).
As mentioned earlier, the main current objective of flagging "extreme" providers in profiling analysis focuses on identifying W providers and ND providers. Overall performance of the uncorrected and corrected estimation methods to identify truly worse (sensitivity -worse), truly better (sensitivity -better), and specificity (providers not different from the reference) across data sparsity of 3%, 5%, 10%, and 20% overall outcome event rate. Displayed is average for each SRR i estimate, averaged over 200 simulated data sets.
Providers that over-perform (B providers) are not relevant to current payment policy or regulatory objectives. Therefore, under this regime, it is of interest to ensure that there are no (or low rate of) false negatives that misclassify/flag B provider as W provider ( FN B!W ). Indeed, there are none, i.e., FN B!W = 0 across all levels of data sparsity (Figure 6), which is not surprising since W and B providers are on the opposite tails of the distribution of providers. This is true with the uncorrected FE model (as well as the corrected estimation method) since the direction of unstable estimates of ! i 's are in the same (negative) direction of true ! i (as pointed out earlier), despite the unstable provider-specific estimates. However, it is not uncommon for false negative classification of a B provider as a ND provider ( FN B!ND ). Although FN B!ND deceases with increasing percentage of overall outcome events as expected, FN B!ND is common for the extremely low information context (e.g., 3%, 5% overall event rate; Figure 6). We emphasize that high FN B!ND does not affect current public policy because over-performers are not incentivized and are consider "ND" providers anyway. Therefore, the FE profiling model, even uncorrected, is still useful in the low information context with respect to the current public policy goal of identifying W and ND providers. However, if the public policy goal evolves to also incentivize for better performance, then novel methods able to correctly identify B providers with high sensitivity are needed.

DISCUSSION
Seminal works by Kalbfleisch and Wolfe [8] and He et al. [15] show that FE model estimates have smaller error for outlier providers whose effects are exceptionally large or small, and these extreme providers are precisely the ones we wish to identify in profiling analysis. The high-dimensional FE models were then used to assess the performance of dialysis facilities (providers) with respect to all-cause hospital readmissions which are frequent outcomes in dialysis patients. Subsequently, our own works have elucidated several operating characteristics [9,10] of the FE profiling models and have been applied to assess the performance of dialysis facilities with respect to allcause 30-day readmissions [11,12,14]. However, to date there is no work that examines the performance of FE models in the low information context where the outcome is sparse. The current study starts to fill this gap in knowledge. Several findings from this study have important practical impact in the low information context. First, even though the provider-specific estimates with true ! i < 0 (truly B providers) are unstable, they are in the same direction as the true effects and the instability has moderated effects on the estimation of SRRs; i.e., SRRs are reasonably wellestimated and are the relevant quantities used in practice as they incorporate patient case-mix. However, if the provider-specific estimates, ! i 's, are themselves of interest, then our proposed correction method can be used to provide better estimates, especially corresponding to uncorrected ! i that are substantially less than zero. Second, the consequence of sparse outcome data impacts more directly inference for B providers because true over-performers are the ones that contribute no or few events (readmissions); however, this "deficit" in estimation does not greatly impact the identification of W providers/underperformers and ND providers, which is the current focus of profiling in practice. Development of novel methods that have better sensitivity for flagging B providers would be useful when public policies or regulatory goals incorporate an incentive for overperformers.

ACKNOWLEDGEMENT
This study was supported by a National Institute of Diabetes and Digestive and Kidney Diseases grant R01DK092232. The interpretation/reporting of the results presented are the responsibility of the authors and in no way should be seen as an official policy or interpretation of the U.S. government.  We describe details of calculating the penalties for the Firth's modified score equations, adapted for the highdimensional FE profiling model in Section 3.2. For a fixed ! , the hat matrix used in the estimation of ! with Firth's correction

Dependence Structure of Covariates in Simulation Model
The correlation matrix and the standard deviation of the patient case-mix variables, Z ij1 ,…, Z ij15 , are summarized in Table 1.

Online Supplementary Materials: Analysis Example and R Codes
Example dataset, R codes, and tutorial for fitting the uncorrected and corrected models are publicly available at https://sites.google.com/view/usrds-modeling/software.