On the Probabilities of Environmental Extremes

ABSTRACT: Environmental researchers, as well as epidemiologists, often encounter the problem of determining the probability that a variable of interest exceeds a high threshold, based on observations that are all much smaller than the threshold. Moreover, the data available for the task may be of only moderate size. This generic problem is addressed by repeatedly fusing the real data with synthetic computer-generated samples. The threshold probability of interest is approximated by certain subsequences created by an iterative algorithm that gives precise estimates. The method is illustrated using environmental data, including monitoring data of nitrogen dioxide levels in the air.


INTRODUCTION
Environmentalists, as well as epidemiologists, often encounter the following basic problem. Suppose the value of a high threshold is T , but the data values at hand regarding a certain environmental variable are much smaller than T . Moreover, the data size could be moderately large at best. Based on the data, what is the probability p of exceeding T ? For example, if the data values are all less than T / 2 , what is the chance that a future value exceeds T ? This generic problem is dealt with in this paper in terms of nitrogen dioxide emissions.
According to the American Lung Association (www.lung.org/clean-air/outdoors/what-makes-air-unhealthy/nitrogen-dioxide), nitrogen dioxide ( NO 2 ) is a gaseous pollutant emitted from the burning of fossil fuels at high temperatures, primarily by vehicles, followed by power plants, diesel-powered heavy construction equipment, and other movable engines. However, most of the NO 2 in ambient air is formed in the atmosphere through photochemical reactions between nitric oxide and other air pollutants. Moreover, NO 2 causes a range of harmful effects on the lungs, including increased inflammation of the airways, worsened cough and wheezing, reduced lung function, increased asthma attacks, and a greater likelihood of emergency department and hospital admissions. When exposed to NO 2 , infants and children, due to their greater breathing rate for their body weight, have a higher risk of respiratory failure. Epidemiological studies have demonstrated associations between NO 2 exposure and premature death, cardiopulmonary effects, intensified allergic responses, and lower birth weight in newborns. It was found that exposure to NO 2 and other outdoor air pollutants shortened the survival of lung cancer patients [1]. The International Agency for Research on Cancer (IARC) classified outdoor air pollution and particulate matter (PM) as carcinogenic (Group 1), and the evidence of a long-term effect of NO 2 on mortality has been found to be as great as that of PM [2]. Consistent evidence of a relationship between NO 2 and lung cancer was noted by a systematic meta-analysis [3]. The U.S. EPA's National Ambient Air Quality Standards (NAAQS) (dep.wv.gov/daq/planning/NAAQS/Pages/default.aspx) measure NO 2 as an indicator for the NOx family. To protect the public's health from outdoor air pollution, the NAAQS set the 1-hour NO 2 standard at 100 parts per billion (ppb) and the annual (arithmetic average) standard at 53 ppb.

Repeated Fusion
Regarding the posed exceedance problem, as they are, the data do not give much information about such questions, particularly so when the samples are not large. However, the situation differs dramatically if somehow we could have a "peek" into the domain above the threshold. And if we can do that once, then we can repeat that numerous times henceforth. A way of doing that is by repeated fusion of real and synthetic or artificial data. As we shall see, synthetic data can enhance patterns in real data, a statistical idea highlighted by the augmented reality (AR) perspective explored in Kedem, De Oliveira, and Sverchkov (2017, Ch. 5), Kedem et al. (2019), and Kedem and Pyne (2021) [4][5][6].
Our approach to the estimation of the small exceedance or tail probability p is based on numerous fusions. It runs as follows. Suppose we are given a method which produces numerous upper bounds B i for p , say upper bounds which exceed p with a 95% chance. Then many upper bounds exceed p but many do not. Therefore, there are subsequences of ordered upper bounds which approach p from above and from below. In this paper, the numerous upper bounds are produced by repeated fusion of the data with computer-generated samples, where the number of fusions is arbitrarily large, and where the support of the generated data is large enough to range beyond T . Hence, if the given and generated data are somehow connected, we have a way to "peek" into the realm above T .
Repeated fusion of the data with external computer-generated data is referred to as repeated out-of-sample fusion or ROSF. Unlike the bootstrap, we repeatedly seek information outside the sample.
To describe our approach, we must first review some ideas and then illustrate an iterative method for the estimation of small threshold probabilities. Part of this paper is therefore a review.
The following items are new. Lemmas 0.2 and 0.3, which support Proposition 0.1, are new. If B ( j ) are ordered upper bounds, then Lemma 0.3 predicts when there is a shift in the B ( j i ) subsequences converging to p from above and from below. Also new is the fact, not emphasized hitherto, that the quartiles and mean of numerous upper bounds for the true threshold probability p already provide, by themselves, useful approximations for p , as illustrated in Tables 1, 3, 5, and 7. This is a fast, albeit crude, way to assess the magnitude of tail probabilities. We will show how to improve these crude assessments. The NO 2 data analysis is new as well. Certain technical details are described in the APPENDIX.

Upper Bounds for p by Data Fusion
Let X be a random variable. The problem is to estimate a small tail probability p = P(X > T ) for a given threshold T from a reference sample X 0 = (X 1 ,..., X n 0 ) , where max( X 0 ) < T . This section follows for the most part Kedem et al. (2019) and Kedem and Pyne (2021) [5,6].
Assume that X 0 is from some unknown reference probability density function (pdf) g(x) , x ∈ (0, ∞) , and let G(x) denote the corresponding cumulative distribution function (CDF). Since max( X 0 ) < T and since g(x) is unknown, there is not much we can say about p . However, things are very different when an independent random sample X 1 exists from a distribution with pdf g 1 (x) and CDF G 1 (x) supported over a region stretching beyond T . We shall assume that X 0 and X 1 "talk" to each other via a relationship between their distributions, whence useful information is gained about p . Let X 1 ~ g 1 , G 1 be a computer-generated random sample of size n 1 and consider the fusion of X 0 and X 1 , of size n 0 + n 1 . We shall assume the density ratio model [7]

g 1 (x) = exp(α 1 + β 1 ′h(x)) g(x), (2)

where α 1 is a scalar parameter, β 1 is an r × 1 vector parameter, and h(x) is an r × 1 vector-valued function. Clearly, to generate X 1 we must know the corresponding g 1 . However, beyond the generating process, we make no use of this knowledge. Thus, in our estimation procedure, none of the probability densities g, g 1 , the corresponding CDF's G, G 1 , or the parameters α 1 and β 1 are assumed known, but, strictly speaking, the so-called tilt function h must be a known function. However, in the present application the requirement of a known h is weakened considerably by the mild assumption (3) below, which may hold even for a misspecified h , as numerous examples with many different tail types show. Accordingly, based on numerous experiments, some of which are discussed in Kedem et al. (2019) and Kedem and Pyne (2021) [5,6], for non-negative data we shall assume the "gamma tilt" h(x) = (x, log x) . Further justification for choosing the gamma tilt is provided by the rather precise p -estimates in the first eight entries in Table 10 below. Notice that the "normal tilt" h(x) = (x, x 2 ) is used in the last entry of the table.
Under the density ratio model (2), the maximum likelihood estimate of G(x) based on the fused data t = ( X 0 , X 1 ) is given in (15) in the APPENDIX along with its asymptotic distribution described in Theorem 0.2.
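One standard way to compute the maximum likelihood estimate of G under the two-sample density ratio model is through its well-known equivalence with logistic regression (profile empirical likelihood). The following is a minimal numpy sketch under that equivalence; the function name, the toy data, and the hypothetical threshold `T` are ours, not from the paper's analysis.

```python
import numpy as np

def fit_G(x0, x1, h_funcs):
    """MLE of the reference CDF G under the two-sample density ratio
    model g1(x)/g(x) = exp(alpha + beta' h(x)), computed via the
    standard logistic-regression equivalence of the profile empirical
    likelihood.  Returns the sorted fused points and Ghat at them."""
    t = np.concatenate([x0, x1])                       # fused sample
    s = np.concatenate([np.zeros(len(x0)), np.ones(len(x1))])
    H = np.column_stack([np.ones(len(t))] + [f(t) for f in h_funcs])
    theta = np.zeros(H.shape[1])
    for _ in range(200):                               # Newton-Raphson
        eta = np.clip(H @ theta, -30.0, 30.0)
        mu = 1.0 / (1.0 + np.exp(-eta))
        step = np.linalg.solve(H.T @ (H * (mu * (1 - mu))[:, None]),
                               H.T @ (s - mu))
        theta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    # fitted mass at each fused point: dGhat(t_i) = 1/(n0 (1 + exp(eta_i)))
    eta = np.clip(H @ theta, -30.0, 30.0)
    dG = 1.0 / (len(x0) * (1.0 + np.exp(eta)))
    order = np.argsort(t)
    return t[order], np.cumsum(dG[order])

# toy illustration with the gamma tilt h(x) = (x, log x)
rng = np.random.default_rng(0)
x0 = rng.lognormal(1.0, 1.0, 100)                      # reference sample
x1 = rng.uniform(0.0, 130.0, 100)                      # generated sample
T = 112.0                                              # hypothetical threshold
ts, Ghat = fit_G(x0, x1, [lambda x: x, lambda x: np.log(x)])
p_hat = 1.0 - np.interp(T, ts, Ghat)                   # estimate of P(X > T)
```

At the maximum likelihood solution the fitted masses sum to one, so Ghat is a proper step CDF supported on the fused sample, and 1 − Ĝ(T) plays the role of the point estimate whose confidence interval yields an upper bound B.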
From the theorem we obtain confidence intervals for p = 1 − G(T ) for any threshold T using (18). In particular, we get an upper bound B 1 for p . In the same way, from additional independent computer-generated samples X 2 , X 3 ,..., X N we get additional upper bounds for p . Thus, under the density ratio model (2), the repeated fusions produce the sequence of upper bounds B 1 , B 2 ,..., B N which, conditional on X 0 , is a sequence of independent and identically distributed random variables from some distribution F B .
It is assumed that

P(B i > p) ≥ 0.95. (3)

Since the number of fusions can be as large as we wish, which is our key idea, F B is known for all practical purposes. Hence, from (3), we see that for sufficiently large N , F B provides information about p [8]. Clearly, already by themselves, the quartiles and mean of F B provide useful approximations for p . We shall illustrate this fact below.

"Down-up" Subsequences
For a sufficiently large number of fusions N , we show how to produce subsequences {B ( j i ) } which approach p from above ("down") and from below ("up").
A relationship between j and p is obtained from the well-known distribution of order statistics,

P(B ( j ) ≤ p) = Σ k=j N ( N choose k ) F B (p) k (1 − F B (p)) N−k . (5)

This probability is readily available since N is arbitrarily large, and hence F B is known for all practical purposes by Theorem 0.1-b.
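Numerically, the order-statistic probability in (5) is most conveniently evaluated through the equivalent regularized incomplete Beta form rather than by summing large binomial terms. A small sketch; the function name and the illustrative values of N, j, and u are ours:

```python
import numpy as np
from scipy.stats import beta, binom

def prob_exceed(j, N, u):
    """P(B_(j) > p) for the j-th order statistic of N i.i.d. upper
    bounds, where u = F_B(p): equals 1 - I_u(j, N - j + 1), the
    complement of the regularized incomplete Beta function."""
    return beta.sf(u, j, N - j + 1)

N, j, u = 1000, 950, 0.96
# identical to the binomial-sum form: P(B_(j) > p) = P(Bin(N, u) <= j - 1)
print(prob_exceed(j, N, u), binom.cdf(j - 1, N, u))
```

The Beta form avoids the binomial coefficients entirely, which matters when N runs into the thousands.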
Consider now only B ( j i ) 's in a neighborhood of the true p , all satisfying the inequality (6) below. Observe that (6) is satisfied for small P(B ( j i ) > p) when B ( j i ) lies to the left of the true p . Suppose now we solve (6) with B ( j 1 ) along some p -increments, and find B ( j 2 ) nearest the smallest p which satisfies (6). Solving (6) again with B ( j 2 ) we obtain another approximation for p . We keep doing that to obtain a subsequence B ( j 1 ) , B ( j 2 ) , ... . Clearly, depending on the p -increment, some of the B ( j i ) fall to the left of the true p and some to the right. This process stops when the next B ( j i ) is equal to the previous one.
The preceding argument suggests the following Iterative Algorithm for the estimation of tail probabilities, proposed in Wang (2018) and Kedem et al. (2019) [5,9]. Recall the order statistics formula (5).

Iterative Algorithm
1. Order the N upper bounds: B (1) ≤ B (2) ≤ ... ≤ B (N ) . 2. Choose a starting point j = j 1 and find the smallest p that satisfies the inequality

P(B ( j ) > p) ≤ 0.95, (7)

where F B is the empirical distribution function of the B i 's. Evaluate (7) along p -increments of size δ . 3. Find the smallest B ( j ) greater than or equal to the p obtained in step 2, and denote its index by j 2 . 4. Repeat steps 2 and 3 for j 2 .
In general, starting with any j , convergence occurs when for the first time B ( j k ) = B ( j k+1 ) for some k and we keep getting the same probability p j k . That is, the process stops when p j k keeps giving the same B ( j k+1 ) . Thus, the algorithm produces the iterative steps j 1 → j 2 → j 3 → ⋯ . For each starting point j , the algorithm produces a sequence j 1 , j 2 , ... . The first thing we wish to show is that such a sequence is either non-decreasing or non-increasing. To show this, we note that the left-hand side of (7) is non-increasing in p . Thus, solving (7) is equivalent to solving

F B (p) ≥ q 0.05 Beta( j, N − j +1) , (8)

where q 0.05 Beta( j, N − j +1) denotes the 0.05 quantile of the Beta( j, N − j + 1) distribution. To solve (8), we need the following lemma.
Lemma 0.2 For each j , the smallest solution of (8) is p j = F B −1 ( q 0.05 Beta( j, N − j +1) ) , and B ( j ) ≥ p j almost surely.
Proof. We have F B (B ( j ) ) ≥ q 0.05 Beta( j, N − j +1) almost surely by the monotonicity of F B . Then, by the definition of order statistics and the quantile function, we have B ( j ) ≥ F B −1 ( q 0.05 Beta( j, N − j +1) ) = p j .
By Lemma 0.2, we know that if we solve (7) analytically, the sequence obtained should satisfy j 1 ≥ j 2 ≥ ⋯ . However, since the algorithm solves (7) only approximately, using a p -increment, the sequence may also move up, as the following lemma makes precise.
Lemma 0.3 For each starting point j 1 , the algorithm produces one of three types of sequences: 1. j 1 > j 2 ≥ ⋯ , i.e., a sequence that goes down; 2. j 1 = j 2 = ⋯ , i.e., a sequence that stays at j 1 ; 3. j 1 < j 2 = ⋯ , i.e., a sequence that goes up for one step and then stays there.
Proof. The proof is technical and is given in Appendix B.
Lemma 0.3 points to the existence of a "down" and "up" sequence produced by the Iterative Algorithm. This and the argument at the beginning of this section lead to the following proposition, supported by the examples in the next section.
Proposition 0.1 Assume that the sample size n 0 of X 0 is large enough, and that the number of fusions N is sufficiently large so that B (1) < p < B (N ) , where the p j are evaluated along appropriate numerical increments. Then, iterating between (10) and the ordered sequence {B ( j ) } produces "down" and "up" sequences depending on the B ( j ) relative to the p j .
In particular, in a neighborhood of the true tail probability p , with high probability, there are "down" and "up" sequences. We further note that the "down" and "up" sequences in the proposition are the type 1 and type 3 sequences in Lemma 0.3, respectively. Furthermore, by Lemma 0.3, the Iterative Algorithm produces "down" and "up" sequences depending on the relative size of the p j and the B ( j ) . The proposition states that there is a shift between "down" and "up" patterns around the true p . As we shall see in the next section, the "down" and "up" patterns around the true p essentially converge to what looks like a "fixed point" close to p . Such fixed points could occur elsewhere as well.
For a very small p -increment such that δ < min j (B ( j ) − p j ) , meaning there is at least one p -increment between p j and B ( j ) for every j , the algorithm will not produce any "up" sequence.
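The iteration described above can be sketched compactly. The following Python sketch assumes the quantile form p j = F B −1 ( q 0.05 Beta( j, N − j +1) ) used in Appendix B, with the empirical quantile of the bounds standing in for F B −1 ; the function name and the simulated bounds are ours, for illustration only.

```python
import numpy as np
from scipy.stats import beta

def iterate_down_up(B, j, delta, level=0.05, max_iter=200):
    """One run of the Iterative Algorithm on upper bounds B:
    j is a 1-based starting index, delta is the p-increment."""
    B = np.sort(np.asarray(B))
    N = len(B)
    for _ in range(max_iter):
        # p_j = F_B^{-1} of the 0.05 quantile of Beta(j, N - j + 1)
        p_j = np.quantile(B, beta.ppf(level, j, N - j + 1))
        p_grid = np.ceil(p_j / delta) * delta      # smallest increment >= p_j
        idx = np.searchsorted(B, p_grid)           # smallest B_(j) >= p_grid
        j_next = min(idx + 1, N)                   # back to 1-based indexing
        if B[j_next - 1] == B[j - 1]:              # fixed point: stop
            break
        j = j_next
    return p_grid, j_next

# simulated upper bounds clustered around a "true" p of about 1e-4
rng = np.random.default_rng(1)
B = rng.gamma(2.0, 5e-5, 10_000)
p_hat, j_end = iterate_down_up(B, 5_000, 1.5e-5)
```

Per Lemma 0.3, a run either steps down repeatedly or moves up once and stays, so the loop terminates quickly in practice; `max_iter` is only a guard.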

Algorithm Illustrations
In practice, computational constraints limit the number of fusions N . Hence, in (10), F B is approximated very closely by the empirical distribution of the bounds obtained with N = 10,000 fusions, while in the binomial coefficients we use N = 1000 , near the maximum allowed by R. In all cases we use the misspecified tilt function h(x) = (x, log x) , appropriate for gamma data, and the computer-generated data are uniform with upper limit exceeding T : X 1 ~ Unif (0, L) where L > T . The precision of the estimates p̂ of p obtained at the "down-up" transition supports these choices.
The only exception is the normal case in Section 8, where h(x) = (x, x 2 ) and N = 5,000 . To observe the "down-up" pattern near the true p , each entry in the following tables is obtained from a different sample of 1000 independent B 's, at times with the same j . This shows that different samples lead to the same p̂ observed at the shift from "down" to "up". From the following tables we see that as the "down-up" sequences approach p from any j , the number of iterations of the Iterative Algorithm decreases, a telltale sign that we approach the true p .

Illustration in Terms of Lognormal Data
We start with a simulated lognormal example with parameters µ = σ 2 = 1 , denoted by LN(1,1), where we know for sure that the tail of the distribution is far from a gamma tail, meaning that h(x) = (x, log x) is misspecified.
The descriptive statistics of the upper bounds B 1 ,..., B 10,000 obtained from N = 10,000 fusions of X 0 with independent computer-generated samples X 1 ~ Unif (0,130) , where n 0 = n 1 = 100 , are given in Table 1. Conspicuously, the median and mean are of the same order of magnitude as the true p .
We chose a p -increment of 0.000015, which is of the same order of magnitude as both median(B) / 10 = 0.00003828 and mean(B) / 10 = 0.00005504 . The pattern in Table 2 points to a shift from "down" to "up" at p̂ = 0.0001045544 , close to the true p = 0.0001 , with an error of 4.5544e−06 . The next lognormal example concerns X 0 ~ LN(0,1) and misspecified h(x) = (x, log x) . We have max( X 0 ) = 13.77121 , T = 41.22383 , p = 0.0001 , and X 1 ~ Unif (0, 70) . Hence max( X 0 ) < T < 70 .
The descriptive statistics of the upper bounds B 1 ,..., B 10,000 obtained from N = 10,000 fusions of X 0 with independent computer-generated samples X 1 ~ Unif (0, 70) , where n 0 = n 1 = 100 , are given in Table 3. Conspicuously, the median and mean are of the same order of magnitude as the true p . We chose a p -increment of 0.000015, which is of the same order of magnitude as both median(B) / 10 = 0.00002012 and mean(B) / 10 = 0.00004287 . Table 4 points to a shift from "down" to "up" at p̂ = 0.0001042241 , close to the true p = 0.0001 , with an error of 4.2241e−06 .
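The thresholds in these lognormal examples are (1 − p)-quantiles of the simulating distribution. As a quick check, not part of the original analysis, T = 41.22383 for LN(0,1) with p = 0.0001 can be reproduced directly:

```python
import numpy as np
from scipy.stats import norm

# T is the (1 - p)-quantile of the data distribution, here p = 0.0001
p = 0.0001
T_ln01 = float(np.exp(norm.ppf(1 - p)))        # LN(0,1): exp(z_0.9999)
T_ln11 = float(np.exp(1.0 + norm.ppf(1 - p)))  # LN(1,1)
print(T_ln01, T_ln11)                          # T_ln01 ≈ 41.22383
```

The LN(1,1) quantile comes out near 112, consistent with the choice of generated samples Unif(0, 130) whose support exceeds T.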

Illustration in Terms of Mercury Data
The present illustration concerns levels of mercury measured in marine life in mg/kg. The data source is NOAA's National Status and Trends Data: https://products.coastalscience.noaa.gov/nsandt_data/data.aspx.
The mercury data consist of 8,266 observations, of which 9 exceed T = 22.41 , giving the proportion p = 0.001088797 . We treat the mercury data as a population and draw from it a reference random sample X 0 of size n 0 = 200 without replacement.
The results of 10,000 fusions of the mercury reference sample X 0 with X 1 ~ Unif (0, 50) samples of size n 1 = 200 gave 10,000 upper bounds B . Their descriptive statistics are summarized in Table 5. Observe that the 1st quartile, median, mean, and 3rd quartile are all of the same order of magnitude as the true p , and hence they give us an idea as to the value of p . Here max( X 0 ) = 13.8 < T = 22.41 < 50 .
One-tenth of both the median and the mean suggests a p -increment of 0.0001 ; that is, an increment of the order of mean(B) / 10 and median(B) / 10 . Table 6 points to a shift from "down" to "up" at p̂ = 0.001092137 , close to the true p = 0.001088797 , with an error of 3.34e−06 .

Illustration in Terms of Radon Data
Residential radon is a tasteless, colorless, and odorless radioactive gas naturally abundant in the soil. Approximately 40 percent of Pennsylvania homes have radon levels above the EPA action guideline of 4 pCi/L. The Iterative Algorithm is applied here to Beaver County radon data from 1989 to 2017. The data consist of 7,425 radon observations, taken as a population, of which only 2 exceed 200. Hence, with T = 200 , we wish to estimate the small probability p = 2 / 7425 = 0.0002693603 . The reference sample X 0 was chosen without replacement from the 7,425 radon observations. The generated X 1 samples are from Unif (0, 300) and n 0 = n 1 = 500 . We observe that max( X 0 ) = 107 < T = 200 , so that the largest data point is close to T / 2 .
The results of 10,000 fusions of a radon reference sample X 0 of size n 0 = 500 with X 1 ~ Unif (0, 300) samples of size n 1 = 500 gave 10,000 upper bounds B . Their descriptive statistics are summarized in Table 7. Observe that the 3rd quartile of 0.0002686 is very close to the true p = 0.0002694 . The p -increment was chosen as 0.00003, which is of the same order of magnitude as one-tenth of the 1st quartile, median, mean, or 3rd quartile of the 10,000 B 's.
Table 8 points to a shift from "down" to "up" at p̂ = 0.0002689389 . Changing to a p -increment of 0.000018, close to median(B) / 10 = 0.00001806 , we get the exact same p̂ = 0.0002689389 , whereas a p -increment of 0.00002686, equal to one-tenth of the 3rd quartile, gives p̂ = 0.0002675389 with an error of 1.8611e−06 . A different approach to residential radon exceedances using fused county data is studied in Zhang et al. (2020a,b) [11,12], employing a density ratio model with variable tilt functions.

Illustration in Terms of Normal Data
Throughout the paper we deal with tail probabilities of non-negative continuous data, where we use h(x) = (x, log x) . However, the Iterative Algorithm is applicable in more general situations, including when the data are normal, provided the tilt function is chosen judiciously. To underscore this, we next give an example with a normal sample X 0 , in which case the "normal tilt" h(x) = (x, x 2 ) is correctly specified when X 1 is a normal sample as well, and nearly specified when X 1 is uniform over a sufficiently large support.
Accordingly, consider X 0 ~ N(0, 4) , T = 6.180465 , p = 0.001 , max( X 0 ) = 4.168643 , X 1 ~ Unif (−10, 10) , and n 0 = n 1 = 100 . From 5,000 fusions we have median(B) = 0.0037089 and mean(B) = 0.0039251 (the same order of magnitude as p ), one-tenth of which is on the order of the p -increment 0.0001 used. The down-up results are given in Table 9, showing a shift at p̂ = 0.001008915 , giving an error of 8.915e−06, on par with the previous results.
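That h(x) = (x, x 2 ) is exactly specified when both samples are normal can be seen directly from the log density ratio; with g = N(µ 0 , σ 0 2 ) and g 1 = N(µ 1 , σ 1 2 ), a standard computation (in our notation) gives

```latex
\log\frac{g_1(x)}{g(x)}
 = \underbrace{\log\frac{\sigma_0}{\sigma_1}
   + \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_1^2}}_{\alpha_1}
 + \underbrace{\left(\frac{\mu_1}{\sigma_1^2}-\frac{\mu_0}{\sigma_0^2}\right)}_{\beta_{11}}\, x
 + \underbrace{\left(\frac{1}{2\sigma_0^2}-\frac{1}{2\sigma_1^2}\right)}_{\beta_{12}}\, x^2 ,
```

so the density ratio model (2) holds exactly with h(x) = (x, x 2 ). When X 1 is uniform, the log ratio is not of this quadratic form, which is why the tilt is only "nearly specified" in that case.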

Results Summary
To get a picture of the repeated fusion method in tail estimation, the top four entries in Table 10 depict the pairs ( p, p̂ ) obtained from the previous examples with a misspecified h(x) = (x, log x) . The next four entries are from nearly specified cases where h(x) = (x, log x) is sensible, and where the increment was chosen as before. There is no apparent difference from the four misspecified cases. In the Weibull(1.1,1) case the down-up shift alternated between 0.0001051111 and 0.0001201111, and we report the average. Similar results can be found in Kedem et al. (2019) and Kedem and Pyne (2021) [5,6].
Again, already the median and mean of B 1 ,..., B 10,000 ( B 1 ,..., B 5,000 in the last entry) give a good idea as to the value of p . However, the Iterative Algorithm improves greatly on the mean and median, as we see from Table 10, where in all cases p̂ is close to p . We have seen the same in many other cases.

Comparison with POT
A possible alternative to repeated fusion is the extreme value theory method of peaks over threshold (POT), where only the values above a sufficiently high threshold are used in the estimation of small tail probabilities [13,14]. This results in a reduced sample, which can be problematic when the original sample is too small to begin with. By contrast, with the repeated fusion method the total number of observations, albeit partly artificial, increases. A brief comparison between the two methods is given in Wang (2018) and Kedem et al. (2019) [5,9]. It shows that, across quite a few distributions, for sample sizes on the order of 100 and p = 0.001 , the repeated fusion method tends to give higher coverage and smaller mean absolute error. However, it must be emphasized that this could be reversed for much smaller probabilities and much larger samples. We shall look into this problem elsewhere.
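For reference, a minimal POT sketch in the spirit of [13,14]: fit a generalized Pareto distribution (GPD) to the excesses over a high sample threshold and scale its survival function by the exceedance rate. The toy data, the 95% threshold choice, and the variable names here are illustrative assumptions, not taken from the comparison in [5,9].

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
x = rng.exponential(1.0, 100_000)        # toy data with known tail
T = -np.log(1e-3)                        # true P(X > T) = 0.001

u = np.quantile(x, 0.95)                 # high threshold for POT
exc = x[x > u] - u                       # peaks over the threshold
c, loc, scale = genpareto.fit(exc, floc=0.0)   # GPD fit to excesses
# tail estimate: P(X > T) = P(X > u) * P(excess > T - u)
p_hat = (exc.size / x.size) * genpareto.sf(T - u, c, loc=0.0, scale=scale)
print(p_hat)                             # compare with the true p = 0.001
```

Note that only about 5% of the sample enters the fit, which is precisely the sample-reduction issue mentioned above for small original samples.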

Nitrogen Dioxide Data Analysis
Finally, we apply the Iterative Algorithm to nitrogen dioxide ( NO 2 ) data from Washington DC, where, as before, we use 10,000 fusions giving upper bounds B 1 ,..., B 10,000 for p , the chance that NO 2 exceeds T . The algorithm was applied with T = 100 , where max( X 0 ) = 48.06 < T / 2 . As in the previous, rather precise, computational results, the down-up shift point is the point estimate p̂ .
Our repeated fusion analysis was applied to a random sample X 0 of hourly measurements of size n 0 = 400 . The generated X 1 samples were from Unif (0,150) , h(x) = (x, log x) , and n 0 = n 1 = 400 . We have max( X 0 ) = 48.06 < T = 100 < 150 , so that the largest data point is close to T / 2 . The results of 10,000 fusions of the NO 2 reference sample X 0 with X 1 ~ Unif (0,150) samples of size n 1 = 400 gave 10,000 upper bounds B . Their descriptive statistics are summarized in Table 11. From Table 12, with mean(B) / 10 = 0.00001039 as the p -increment, the down-up shift occurs at p̂ = 0.0001344551 , not far from the 3rd quartile of B , 0.0001293 .
To approximate the variance of p̂ , we can redo the analysis over and over again. Thus, we have repeated the above analysis ten times, each time with a different 10,000 fusions. The resulting p̂ 's from different reference samples X 0 of size n 0 = 400 , using different p -increments equal to mean(B) / 10 , are given in Table 13. The exception is an increment of 3rd Qu(B) / 10 in entry 9.
The variation among the p̂ 's is fairly small, pointing to consistent results. In fact, the sample mean and standard deviation of the p̂ 's are 0.0001306905 and 1.411819e−05, respectively.
We see that in all cases median(B) and mean(B) are close to 0.0001, which is not far from the resulting p̂ 's.

CONCLUSION
The repeated fusion of a given moderately large sample with numerous computer-generated samples yields precise estimates of small tail probabilities, as the examples above illustrate. Some words of caution are in order. First, as with any other statistical method, the sample size is important. Thus, in the NO 2 analysis, the excessively high threshold of T = 200 proved problematic, which most likely could be overcome with a larger sample. Second, we do not have guidelines as to the size of the support of the uniform fusion samples, except to say that the support must contain T , as was done in our data analysis.

APPENDIX Appendix A: Density Ratio Model for Multiple Sources
This appendix addresses the density ratio model (2) for m + 1 data sources. Thus, we deal with the density ratio model more generally, where X 0 is fused with m computer-generated samples. Above we dealt with the special case m = 1 .
Assume that the reference random sample X 0 of size n 0 follows an unknown reference distribution with probability density g , and let G be the corresponding cumulative distribution function (cdf). Let X 1 ,..., X m be additional computer-generated random samples, where X j ~ g j , G j , of size n j , j = 1,..., m . The augmentation of the m + 1 samples, t = (t 1 ,…, t n ) = ( X 0 , X 1 ,…, X m ), (11) of size n = n 0 + n 1 + ⋯ + n m , gives the fused data. The density ratio model stipulates that

g j (x) = exp(α j + β j ′h(x)) g(x), j = 1,..., m, (12)

where β j is an r × 1 parameter vector, α j is a scalar parameter, and h(x) is an r × 1 vector-valued distortion or tilt function. None of the probability densities g, g 1 ,..., g m , the corresponding G j 's, or the parameters α 's and β 's are assumed known, but, strictly speaking, the so-called tilt function h must be a known function.

Asymptotic Distribution of Ĝ (x)
Define α 0 ≡ 0, β 0 ≡ 0 , and w j (x) = exp(α j + β j ′h(x)) . Maximum likelihood estimates of all the parameters and of G(x) can be obtained by maximizing the empirical likelihood over the class of step cumulative distribution functions with jumps at the observed values t 1 ,…, t n [18]. Let p i = dG(t i ) be the mass at t i , for i = 1,…, n . Maximizing the empirical likelihood subject to the constraints (14), we obtain the desired estimates. In particular,

Ĝ(t) = Σ i p̂ i I(t i ≤ t), (15)

where I(t i ≤ t) equals one for t i ≤ t and is zero otherwise. Similarly, Ĝ j is estimated by summing exp(α̂ j + β̂ j ′h(t i )) dĜ(t i ) .
Theorem 0.2 Assume that the sample size ratios ρ j = n j / n 0 are positive and finite and remain fixed as the total sample size n = Σ j=0 m n j → ∞ . The process √n (Ĝ(t) − G(t)) converges to a zero-mean Gaussian process in the space of real right-continuous functions with left limits, with covariance structure given in (16).
Denote by V̂(t) the estimated variance of Ĝ(t) as given in (16). Replacing parameters by their estimates, a 1 − α level pointwise confidence interval for G(t) is approximated by (17), where z α/2 is the upper α / 2 point of the standard normal distribution. Hence, a 1 − α level pointwise confidence interval for 1 − G(T ) for any T , and in particular for relatively large thresholds T , is approximated by

( 1 − Ĝ(T) − z α/2 √V̂(T), 1 − Ĝ(T) + z α/2 √V̂(T) ). (18)
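In code, the interval (18) is immediate once Ĝ(T) and its estimated variance are available; here √V̂(T) is taken as the standard error, and the numerical values below are made up purely for illustration:

```python
import numpy as np
from scipy.stats import norm

def tail_interval(G_hat_T, V_hat_T, alpha=0.05):
    """Pointwise 1 - alpha confidence interval for p = 1 - G(T), from
    the normal approximation of Theorem 0.2; V_hat_T estimates the
    variance of Ghat(T).  The upper endpoint plays the role of the
    upper bound B in the repeated-fusion procedure."""
    z = norm.ppf(1.0 - alpha / 2.0)
    half = z * np.sqrt(V_hat_T)
    p_hat = 1.0 - G_hat_T
    return max(p_hat - half, 0.0), p_hat + half

lo, hi = tail_interval(0.9999, 1.0e-9)   # illustrative values only
```

The lower endpoint is clipped at zero since p is a probability.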
Appendix B: Proof of Lemma 0.3. To prove that the three types of sequences specified by Lemma 0.3 are the only ones the algorithm can produce, we first show how the p -increment determines the relationship between j 1 and j 2 .
By Lemma 0.2, B ( j 1 ) ≥ p j 1 almost surely. If there is a p -increment kδ such that p j 1 ≤ kδ < B ( j 1 ) , then the approximate solution that the algorithm provides is kδ , since it is the smallest p -increment that satisfies (7). If there is more than one p -increment between p j 1 and B ( j 1 ) , then the algorithm gives the smallest one among them, and we denote it by kδ .
The next step of the algorithm is to find the smallest B ( j ) ≥ kδ . If we have a B ( j 2 ) such that kδ ≤ B ( j 2 ) < B ( j 1 ) , then the sequence goes down from B ( j 1 ) to B ( j 2 ) . It now remains to show that B ( j 3 ) ≤ B ( j 2 ) to prove that the algorithm can produce the type 1 sequence, i.e., j 1 > j 2 ≥ ⋯ . Recall the expression p j = F B −1 ( q 0.05 Beta( j, N − j +1) ) , which is non-decreasing in j . Thus, we must have p j 2 ≤ p j 1 .
Note that we have kδ between p j 2 and B ( j 2 ) , which means that there is at least one p -increment between p j 2 and B ( j 2 ) , so that the sequence will either go down or stay at j 2 . To sum up, if there exists a p -increment kδ such that p j 1 ≤ kδ < B ( j 1 ) and a B ( j 2 ) such that kδ ≤ B ( j 2 ) < B ( j 1 ) , then the algorithm gives a type 1 sequence.
Now we examine the circumstances under which the algorithm produces a type 2 sequence, i.e., the sequence stays at j 1 . There are two cases: 1. There is no p -increment strictly between p j 1 and B ( j 1 ) , but there is a p -increment kδ = B ( j 1 ) . In this case, the algorithm gives kδ as the solution, and then the smallest B ( j ) ≥ kδ is still B ( j 1 ) ; 2. There exists at least one p -increment kδ between p j 1 and B ( j 1 ) (if more than one, we take the smallest among them), but there is no B ( j 2 ) between kδ and B ( j 1 ) . In this case, the smallest B ( j ) ≥ kδ is also B ( j 1 ) .
Based on the analysis above, it is readily seen that if there is no p -increment kδ such that p j 1 ≤ kδ ≤ B ( j 1 ) , then the algorithm gives a type 3 sequence, i.e., j 1 < j 2 = ⋯ .
It remains to show that the sequence stays at j 2 after going up one step from j 1 to j 2 . Suppose that kδ is the smallest p -increment such that kδ ≥ B ( j 1 ) and that B ( j 2 ) is the smallest B ( j ) ≥ kδ . Note that P(B ( j ) > p) is non-increasing in j , so that if kδ satisfies inequality (7) for j 1 then it must satisfy it for j 2 , since j 2 > j 1 . Therefore, the solution p is also kδ for j 2 , so that the sequence stays at j 2 . This completes the proof of Lemma 0.3.

ACKNOWLEDGEMENT
The work of B. Kedem