New statistical process control charts for overdispersed count data based on the Bell distribution

Abstract The Poisson distribution is a popular discrete model for count data, and the traditional control charts for counts, such as the c and u charts, are built on it. However, several studies recognize the need for alternative control charts that allow for overdispersion, which is encountered in many fields, including ecology, healthcare, and industry. The Bell distribution, recently proposed by Castellares et al. (2018), is a particular solution of a multiple Poisson process and is able to accommodate overdispersed data. It can be used as an alternative to the usual Poisson (which, although not nested in the Bell family, is approached for small values of the Bell parameter), negative binomial, and COM-Poisson distributions for modeling count data in several areas. In this paper, we use the Bell distribution to introduce two new and useful statistical control charts for counting processes, which are capable of monitoring count data with overdispersion. The performance of the so-called Bell charts, namely the Bell-c and Bell-u charts, is evaluated by the average run length in numerical simulations. Artificial and real data sets are used to illustrate the applicability of the proposed control charts.


INTRODUCTION
Count data arise in many fields, including biology, ecology, healthcare, marketing, economics, and industry. Overdispersion (variance > mean) in count data is quite common (Hougaard et al. 1997, Okamura et al. 2012, Coly et al. 2016) and can occur for several reasons, including mechanisms that generate excessive zero counts or censoring (Avcı et al. 2015). In the presence of overdispersed count data, the Poisson model, which is the most popular distribution for analyzing count data and assumes equidispersion (variance = mean), may result in incorrect inference about parameter estimates, standard errors, tests, and confidence intervals (Avcı et al. 2015). Particularly in Statistical Process Control (SPC), a misspecified Poisson model (c or u chart) may increase the false alarm rate (Saghir & Lin 2015). Indeed, several works dealing with the application of control charts to data that show excessive dispersion (Spiegelhalter 2005, Mohammed & Laney 2006, Albers 2011) express concern about the use of such a structure. This concern arises because the increased variation may cause multiple data values to be flagged as out of control when they are merely false positives.

BELL DISTRIBUTION
In this section, we present a brief review of the Bell distribution and some of its properties. We also discuss point estimation via the maximum likelihood estimator for its parameter.
Regarding pseudo-random sample generation from Y ∼ Bell(θ), we resort to Proposition 3 of Castellares et al. (2018), which states that the random variable Y has the same distribution as the sum of N independent and identically distributed (IID) zero-truncated Poisson (ZTP) random variables with parameter θ > 0, where N is Poisson distributed with parameter e^θ − 1. In other words, if X_1, …, X_N are IID ZTP(θ), with N ∼ Poisson(e^θ − 1) independent of {X_1, …, X_N}, then Y = X_1 + … + X_N ∼ Bell(θ).
In R software, we can generate pseudo-random observations from the ZTP distribution by using the rztpois(.) function of the "actuar" package (Dutang et al. 2008).
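The compound-Poisson representation above can be sketched in a few lines. The block below is a minimal illustration in Python (standard library only), not the authors' R code: the helper names `rpois`, `rztpois` and `rbell` are hypothetical, and `rztpois` here is a simple inversion sampler that only mirrors the name of the "actuar" function.

```python
import math
import random


def rpois(lam, rng):
    """Draw from Poisson(lam) by sequential inversion of the CDF."""
    u = rng.random()
    k = 0
    p = math.exp(-lam)  # P(N = 0)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= lam / k  # P(N = k) obtained from P(N = k - 1)
        cdf += p
    return k


def rztpois(theta, rng):
    """Draw from the zero-truncated Poisson(theta), theta > 0, by inversion."""
    u = rng.random()
    k = 1
    p = theta / math.expm1(theta)  # P(X = 1) = theta / (e^theta - 1)
    cdf = p
    while u > cdf and p > 0.0:
        k += 1
        p *= theta / k  # P(X = k) obtained from P(X = k - 1)
        cdf += p
    return k


def rbell(theta, rng):
    """Draw Y ~ Bell(theta) as a sum of N IID ZTP(theta), N ~ Poisson(e^theta - 1)."""
    n_terms = rpois(math.expm1(theta), rng)
    return sum(rztpois(theta, rng) for _ in range(n_terms))
```

For example, `rbell(1.0, random.Random(1))` returns one non-negative integer draw; averaging many such draws should approach θe^θ, which is a convenient sanity check on the sampler.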
We can easily estimate the parameter θ via the maximum likelihood method. By considering the observed sample y = (y_1, y_2, …, y_n)′ of size n from Y ∼ Bell(θ), we obtain the likelihood function

$$L(\theta) = \theta^{\sum_{i=1}^{n} y_i} \exp\{n(1 - e^{\theta})\} \prod_{i=1}^{n} \frac{B_{y_i}}{y_i!}, \qquad (3)$$

where B_y denotes the y-th Bell number. Castellares et al. (2018) showed that the maximum likelihood estimator (MLE) θ̂ of θ has a closed-form expression and is given by

$$\hat{\theta} = W_0(\bar{Y}), \qquad (4)$$

where W_0(.) is the Lambert W function (Corless et al. 1996), which can be computed via the W(.) function of the "LambertW" package (Goerg 2011, Goerg 2016) in R, and Ȳ = ∑_{i=1}^{n} Y_i/n is the sample mean. Castellares et al. (2018) also presented an alternative reparametrization of the Bell distribution, where μ = E[Y] = θe^θ and, hence, θ = W_0(μ). This form is useful mainly in a regression modeling framework, in which it is very common to model the mean of the response variable as a function of several other variables, also called explanatory variables or regressors. In this case, the PMF, mean and variance of Y ∼ Bell(μ), μ > 0, are written, respectively, as

$$P(Y = y) = \exp\{1 - e^{W_0(\mu)}\} \frac{W_0(\mu)^{y} B_y}{y!}, \quad y = 0, 1, 2, \ldots, \qquad E[Y] = \mu, \qquad \mathrm{Var}[Y] = \mu[1 + W_0(\mu)].$$

Furthermore, it can be easily shown that the MLE μ̂ of μ is the sample mean, i.e., μ̂ = Ȳ = ∑_{i=1}^{n} Y_i/n.
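As a rough illustration of the closed-form estimator, the sketch below computes θ̂ = W_0(Ȳ) in Python. It is an assumption-laden stand-in for the R workflow described above: `lambert_w0` is a hypothetical helper that solves w·e^w = x by Newton's method for x ≥ 0, and `bell_mle` is likewise an illustrative name.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch W0(x) for x >= 0, via Newton's method on w * exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting value at or above the root; exact at x = 0
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def bell_mle(sample):
    """MLE of theta for an IID Bell(theta) sample: theta_hat = W0(sample mean)."""
    ybar = sum(sample) / len(sample)
    return lambert_w0(ybar)


theta_hat = bell_mle([2, 3, 4, 3])          # sample mean is 3
mean_check = theta_hat * math.exp(theta_hat)  # should reproduce the sample mean
```

Because μ̂ = Ȳ under the mean parametrization, plugging θ̂ back into θe^θ recovers the sample mean, which is what `mean_check` verifies. A sample of all zeros gives θ̂ = 0, on the boundary of the parameter space.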

NEW CONTROL CHARTS
Suppose that a process (e.g., an industrial process) generates events (e.g., nonconformities or defects) according to a Bell(θ) distribution. Letting Y denote the number of events per process unit, the PMF of Y is given by (1). Also, let Y_1, Y_2, …, Y_n be a random sample of size n from Y ∼ Bell(θ), and consider T = ∑_{i=1}^{n} Y_i and Ȳ = T/n = ∑_{i=1}^{n} Y_i/n, which represent, respectively, the total number of events and the average number of events per unit. Hence, it follows from Equation (2), using the properties of the expected value and variance operators, that the exact mean and variance of T and Ȳ are given by

$$E[T] = n\theta e^{\theta}, \quad \mathrm{Var}[T] = n\theta(1+\theta)e^{\theta}, \quad E[\bar{Y}] = \theta e^{\theta}, \quad \mathrm{Var}[\bar{Y}] = \frac{\theta(1+\theta)e^{\theta}}{n}. \qquad (5)$$

The above quantities can be used to construct the proposed control charts in the usual manner for Shewhart charts. Assuming that a standard value for θ is available, the center lines and L-sigma control limits for each chart, namely the Bell-c (total number of events per unit) and Bell-u (average number of events per unit) charts, are shown in Table I. It is worth pointing out that, since the Bell distribution approaches the Poisson distribution for small values of θ, the control limits presented in Table I also approach the control limits for special cases of the c and u charts derived from the Poisson distribution. Moreover, similar to the c chart, the Bell-c chart should not be used when the sample sizes are unequal, because both the center line and the control limits of the Bell-c chart vary with the sample size. Therefore, the Bell-u chart is recommended over the Bell-c chart for variable sample sizes, as the former is easier to interpret in this scenario (the center line of the Bell-u chart does not vary across samples). In order to build the so-called Bell charts when θ is unknown, we can apply, for example, the MLE (4), using all the available m samples, in place of θ (plug-in approach) in the control limits shown in Table I. In this case, the obtained control limits are commonly treated as trial control limits (Montgomery 2013).
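Since the body of Table I is not reproduced here, the following is a sketch of the limits implied by the usual Shewhart rule "center ± L × standard deviation", using E[T] = nθe^θ, Var[T] = nθ(1 + θ)e^θ, E[Ȳ] = θe^θ and Var[Ȳ] = θ(1 + θ)e^θ/n. It is a derivation under that rule, not a transcription of the original table; in practice a negative LCL is commonly set to zero, which is an assumption not stated above.

```latex
\begin{align*}
\text{Bell-c:}\quad & \mathrm{CL} = n\theta e^{\theta}, &
\mathrm{UCL},\ \mathrm{LCL} &= n\theta e^{\theta} \pm L\sqrt{n\theta(1+\theta)e^{\theta}}, \\
\text{Bell-u:}\quad & \mathrm{CL} = \theta e^{\theta}, &
\mathrm{UCL},\ \mathrm{LCL} &= \theta e^{\theta} \pm L\sqrt{\frac{\theta(1+\theta)e^{\theta}}{n}}.
\end{align*}
```

Writing λ = θe^θ for the mean, the variance factor 1 + θ tends to 1 as θ → 0, so these expressions reduce to the familiar Poisson limits nλ ± L√(nλ) and λ ± L√(λ/n), in line with the remark on the c and u charts.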
Since the θ parameter has little or no direct interpretation, it can be difficult for industry professionals, such as engineers and technicians, to provide a standard (or reference) value. In order to overcome this difficulty, we can use a convenient reparametrization of the Bell distribution in terms of the mean μ = E[Y] = θe^θ. This parametrization was described at the end of the previous section. The derived control limits for the proposed Bell charts, considering this alternative reparametrization, are displayed in Table II. When no standard is given, μ can be estimated by the overall sample mean, that is,

$$\hat{\mu} = \bar{Y} = \frac{1}{mn}\sum_{j=1}^{m}\sum_{i=1}^{n} Y_{ij} \quad \text{or} \quad \hat{\mu} = \frac{\sum_{j=1}^{m}\sum_{i=1}^{n_j} Y_{ij}}{\sum_{j=1}^{m} n_j}$$

(the latter in the case of the Bell-u chart with unequal sample sizes n_1, …, n_m), where Y_ij denotes the number of events for the i-th inspection unit of the j-th sample. Both estimators may be easily proved to be unbiased estimators for μ.
Therefore, because of their practical advantage as well as simplicity, we shall hereafter consider the control limits provided in Table II.
Table II. Control limits for the Bell-c and Bell-u charts, considering a different reparametrization of the Bell distribution (standards given). [Table body, with one row for the Bell-c chart and one for the Bell-u chart, not recovered.]
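To make the mean-parametrized limits concrete, here is a minimal Python sketch that applies the "center ± L × SD" rule with Var[Y] = μ[1 + W_0(μ)]. The function names are hypothetical, `lambert_w0` is a simple Newton solver rather than the R function mentioned earlier, and truncating a negative LCL at zero is an assumption, not something taken from Table II.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch W0(x) for x >= 0, via Newton's method on w * exp(w) = x."""
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def bell_limits(mu, n, L=3.0, chart="c"):
    """Return (LCL, CL, UCL) for a Bell-c or Bell-u chart with in-control mean mu."""
    dispersion = 1.0 + lambert_w0(mu)  # Var[Y] / E[Y] under the Bell model
    if chart == "c":    # monitors the total T of n units
        center = n * mu
        sd = math.sqrt(n * mu * dispersion)
    elif chart == "u":  # monitors the average of n units
        center = mu
        sd = math.sqrt(mu * dispersion / n)
    else:
        raise ValueError("chart must be 'c' or 'u'")
    lcl = max(0.0, center - L * sd)  # assumption: counts cannot be negative
    return lcl, center, center + L * sd


lcl_c, cl_c, ucl_c = bell_limits(3.0, 100, chart="c")
lcl_u, cl_u, ucl_u = bell_limits(3.0, 100, chart="u")
```

With equal sample sizes the two charts carry the same information: the Bell-u limits are the Bell-c limits divided by n. For unequal sizes, `bell_limits(mu, n_j, chart="u")` would be evaluated per sample with that sample's own n_j.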

PERFORMANCE EVALUATION
In this section, we conduct Monte Carlo (MC) simulation studies to assess the performance of the proposed Bell charts, as well as of some existing/traditional control charts for count data (namely, the Poisson-based c and u charts, and the COM-Poisson-based cmpc and cmpu charts) when the true data-generating process is Bell distributed. All simulations and computations were performed using the R software version 3.6.1. Interested readers can email the authors for the corresponding R codes.
The average run length (ARL) is a measure commonly used to evaluate the performance of control charts. The in-control ARL, also denoted as ARL₀, is defined as the average number of samples (or monitoring points) before a signal is given (that is, a single point falls outside the control limits), assuming that the process is in control, while the out-of-control ARL (or ARL₁) is the average number of samples taken until the chart signals a mean shift when the process is out of control (Saghir & Lin 2015).
Let us assume that Y ∼ Bell(μ) comes from a process with in-control average nonconformities μ, and let μ_s be the shifted mean nonconformities parameter after a shift occurs in μ, that is, Y ∼ Bell(μ_s). For the proposed Bell charts, the ARL₀ is defined as ARL₀ = 1/α, where α is the probability of a false alarm, that is, the probability that a point falls outside the control limits when the process is in control. For instance, in the usual Six Sigma programs, α = 0.0027 and, thus, ARL₀ = 1/0.0027 ≈ 370. That is, even if the process is in control, an out-of-control signal will be given every 370 samples, on average (Montgomery 2013). On the other hand, ARL₁ values near one are desirable, mainly for large-size shifts in a process mean (Shewhart control charts).
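The relation between the false alarm probability and the in-control ARL can be made explicit with a short derivation. It assumes independent samples and fixed control limits, so that each point signals independently with probability α and the run length RL (the index of the first signalling sample) is geometric; RL is notation introduced here for illustration.

```latex
\begin{align*}
P(\mathrm{RL} = k) &= (1-\alpha)^{k-1}\alpha, \qquad k = 1, 2, \ldots \\
\mathrm{ARL}_0 = E[\mathrm{RL}] &= \sum_{k=1}^{\infty} k\,(1-\alpha)^{k-1}\alpha = \frac{1}{\alpha}, \\
\alpha = 0.0027 \;\Rightarrow\; \mathrm{ARL}_0 \approx 370.4, \qquad
\alpha &= 0.01 \;\Rightarrow\; \mathrm{ARL}_0 = 100, \qquad
\alpha = 0.1 \;\Rightarrow\; \mathrm{ARL}_0 = 10.
\end{align*}
```

When the limits are estimated from phase 1 data, the signalling probability itself becomes random, which is one reason simulated in-control ARL values can deviate from the nominal 1/α.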

In-control ARL
Without loss of generality, in this subsection we consider a Bell process with mean nonconformities rates μ = 3 and 15, as well as two different values for the probability of false alarm, α = 0.01 and 0.1. These α values correspond, respectively, to ARL₀ = 100 and 10. We also assume three sample sizes for the process, n = 50, 100 and 500 (or n̄ = 50, 100 and 500, in the case of the Bell-u chart with variable sample size; see footnote 4), as well as three different numbers of phase 1 samples, m = 20, 50 and 100.
Figures 1-4 and A1-A4 (see Appendix A) show the results obtained from 1,000 MC simulations (or replicates) with m* = 5,000 phase 2 samples each, performed for each scenario studied, that is, by varying the number of phase 1 samples, the sample size, the Bell distribution parameter, and the probability of false alarm. In particular, the ARL₀ results in Figures 1-4 concern the Bell-c chart with equal sample sizes, while Figures A1-A4 contain the ARL₀ values of the Bell-u chart with unequal sample sizes. In all these figures, the dashed line represents the nominal ARL₀ value, and the asterisk inside the boxplot indicates the average ARL₀ estimate of the 1,000 simulations.
Despite some slight to moderate discrepancies from the target ARL₀ value in some cases, which are indeed expected due to the parameter estimation effect, the results presented in Figures 1-4 and A1-A4 seem to indicate the good performance of the proposed Bell charts. Note that, in general, the results improve, i.e., the ARL₀ values approach the nominal one, when both m and n (or n̄) increase.

Out-of-control ARL
In this subsection, we evaluate the detection ability of the proposed Bell charts by means of ARL₁, for the same scenarios as in the "In-control ARL" subsection. Due to space constraints, we consider only shifts at three levels that represent percentage increases p in the nonconformities rate μ of the Bell process. The hypothesized levels are as follows: p = 0.5% (μ_s = 3.015 and 15.075), 1% (μ_s = 3.03 and 15.15) and 10% (μ_s = 3.3 and 16.5).
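The shifted means quoted above follow from μ_s = μ(1 + p). A small Python sketch of that arithmetic, with the hypothetical helper name `shifted_mean` and the percentage p passed as a number such as 0.5 for 0.5%:

```python
def shifted_mean(mu, pct_increase):
    """Shifted mean after a percentage increase: mu_s = mu * (1 + p / 100)."""
    return mu * (1.0 + pct_increase / 100.0)


# Shift levels for the two in-control means considered (mu = 3 and mu = 15);
# rounding only removes floating-point noise in the last digits.
levels = {
    pct: [round(shifted_mean(mu, pct), 6) for mu in (3.0, 15.0)]
    for pct in (0.5, 1.0, 10.0)
}
```

Each entry of `levels` maps a percentage increase to the pair of shifted means for μ = 3 and μ = 15, matching the values listed in the text.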

The ARL₁ values from the 1,000 MC simulations with m* = 5,000 samples each are presented in Tables III and AI (see Appendix A) for the Bell-c chart with equal sample sizes and the Bell-u chart with unequal sample sizes, respectively. From these tables, it can be clearly seen that the ARL₁ decreases on average with increasing p. Note also that, for both control charts based on the Bell distribution, the ARL₁ values are quite close to one when p ≥ 1%, regardless of the m, n (or n̄), μ and α values.

The impact of Bell data on some standard control charts
The Bell control chart theory introduced here may be applied in practical situations as a useful alternative to the well-known c and u charts developed via the Poisson assumption, and the cmpc and cmpu charts derived from the COM-Poisson distribution, among others, when data are overdispersed. Hence, in-control count data with overdispersion can be modeled well via the proposed Bell charts.
In this subsection, we apply the above-mentioned Poisson- and COM-Poisson-based control charts to sample data generated from the Bell distribution. The aim is to investigate, by means of simulations, the performance (in terms of ARL₀ and ARL₁) of these well-known control charts when they are applied to Bell processes. For comparison purposes, we also use the Bell-c and Bell-u charts for the cases when the samples are of equal and unequal sizes. The simulations were conducted using the same settings as described in the previous subsections. However, due to space and time limitations, we consider only m = 100 and n = 100 (or n̄ = 100). Of course, it may also be of interest to examine further the performance of the Bell-based control charts in analyzing count data generated from other distributions, including the Poisson and COM-Poisson distributions with some level of dispersion. Although we provide some preliminary results (indeed, based on a single artificial sample only) in the "Poisson data" and "COM-Poisson data" subsubsections, this issue will be better addressed in our future work.
An Acad Bras Cienc (2023) 95(2) e20200246, LAION L. BOAVENTURA et al.
Tables IV and V display, respectively, the ARL₀ and ARL₁ results for the control charts with equal sample sizes (or c-type control charts), while the performance measures for the control charts with unequal sample sizes (or u-type control charts) are shown in Tables AII and AIII of Appendix A. From these tables, it can be seen that, regardless of the scenario configuration, the Poisson-based control charts produced poor results (i.e., gave many false alarms) for the in-control Bell samples. This result is in agreement with some authors, e.g., Saghir & Lin (2015). On the other hand, the COM-Poisson-based control charts generally provided reasonable to good results for both the in-control and out-of-control Bell samples (notice that such results are sometimes close to the ones obtained from the Bell-based control charts). This finding is somewhat expected, as the COM-Poisson distribution has one extra parameter, which adds some flexibility to the model and, consequently, to the corresponding control chart.
Finally, it is worth pointing out that we obtained the CL and control limits (UCL and LCL) by finding the respective mean and standard deviation (SD) for each case, thus calculating the limits by the "Mean ± L × SD" rule of thumb.For the Poisson and Bell distributions, the mean is easily determined.Meanwhile, the COM-Poisson summary statistics have complicated formulas, but can be readily computed using the "compoisson" package (Dunn 2012) in R.
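The "Mean ± L × SD" rule also gives a quick way to see why Poisson-based limits are too narrow for Bell data. The Python sketch below compares c-type limits at the same mean under the two variance assumptions (nμ for the Poisson, nμ[1 + W_0(μ)] for the Bell). Function names are hypothetical, and the COM-Poisson case is left out because its moments need numerical summation.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch W0(x) for x >= 0, via Newton's method on w * exp(w) = x."""
    w = math.log1p(x)
    for _ in range(max_iter):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def shewhart_limits(center, sd, L=3.0):
    """Generic 'Mean +/- L x SD' limits, returned as (LCL, CL, UCL)."""
    return center - L * sd, center, center + L * sd


def poisson_c_limits(mu, n, L=3.0):
    """c chart for the total of n units: variance equals the mean, n * mu."""
    return shewhart_limits(n * mu, math.sqrt(n * mu), L)


def bell_c_limits(mu, n, L=3.0):
    """Bell-c chart for the total of n units: variance n * mu * (1 + W0(mu))."""
    return shewhart_limits(n * mu, math.sqrt(n * mu * (1.0 + lambert_w0(mu))), L)


pois = poisson_c_limits(3.0, 100)
bell = bell_c_limits(3.0, 100)
width_ratio = (bell[2] - bell[0]) / (pois[2] - pois[0])  # sqrt(1 + W0(mu))
```

The ratio of the two band widths is √(1 + W_0(μ)), which exceeds 1 for every μ > 0. Bell-distributed totals therefore fall outside the narrower Poisson band more often than the nominal rate, consistent with the many false alarms reported for the c and u charts.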

APPLICATIONS
It is well known that there are practical situations in which the nature of the production process allows an item or product to contain several nonconformities (defects) and still not be classified as nonconforming. For example, a manufactured personal computer might have one or more very minor flaws in the cabinet finish, but since these flaws do not seriously affect the unit's functional operation, it could be classified as conforming (Montgomery 2013). Thus, in this section, we apply SPC charts to inspect the total or average number of defects per sample, and determine whether the process is in control.

Simulated data examples
Here, we consider three artificial data sets containing some levels of overdispersion to illustrate the usefulness of the Bell-c and Bell-u charts, as well as their ability to produce bounds comparable to those established by the classical SPC theory.
In the cases where the samples are of equal sizes, the control charts chosen for comparison were the traditional c chart and the cmpc chart. The c chart assumes that the Poisson distribution adequately models the number of defects per inspection unit. However, according to Sellers (2012), the c chart, while performing better for large samples, does not work well in cases of overdispersion. Thus, the author developed a control chart using a more general count distribution that relaxes the equidispersion assumption of the Poisson distribution. The so-called cmpc chart, based on the assumption of a COM-Poisson distribution, has shown excellent results when fitting overdispersed count data. Therefore, it can also be used as a reference tool to compare the performance of the Bell-c chart in this work. For the cases when the samples are of unequal sizes, the control charts selected for the comparison were the u chart and the cmpu chart.
Similar to the "Performance evaluation" section, all the analyses were done using the R programming language, and it was assumed L = 3.

Poisson data
In this application, we generated m = 50 samples of size n = 100 from a Poisson distribution with mean 3. The idea here was to compare the performance of the Bell-c and c charts, mainly in phase 2. More specifically, in phase 1, we calculated the control limits of both charts from the aforementioned m = 50 samples. In phase 2, we simulated m* = 70 new samples of size n = 100 from the same distribution (Poisson), but with the last 20 samples disturbed by a shifted mean of 3.3. Then, for both estimated control charts, we examined the occurrence (or not) of false alarms in the first 50 samples and the number of runs until the first of the 20 nonconforming samples was detected. Figure 5 indicates that, for this particular artificial example, the Bell-based control chart produced better results than the Poisson-based control chart for the in-control samples, even though the true data-generating process is not Bell distributed: the Bell-c chart gave a single false alarm, while the c chart itself gave three.
Finally, for the samples that are known to be out-of-control, both charts performed similarly, showing that the Bell-c chart may be a useful alternative to the traditional c chart.

COM-Poisson data
As in the "Poisson data" subsubsection, in this application we generated m = 50 samples of equal size n = 100, but this time from a COM-Poisson distribution with parameters λ = 1.5 and ν = 0.5. Here, the idea also was to compare the performance of the Bell-c and cmpc charts in phase 2 monitoring. The phase 1 and phase 2 analyses were carried out in the same way as before, with the exception that the last 20 samples (out of a total of m* = 70 new samples) were perturbed by a shifted parameter value of 0.75.
Figure 6 shows the estimated control charts applied to the phase 2 samples, from which it can be observed that, similar to the previous subsubsection, the Bell-based control chart provided better results than the COM-Poisson-based control chart for the in-control samples. Although the COM-Poisson represents the true data-generating distribution, the Bell-c chart gave a single false alarm, while the cmpc chart itself gave three. This good performance of the Bell-c chart is maintained when analyzing the out-of-control samples, showing that the Bell-c chart can also be a useful, as well as simpler, alternative to the cmpc chart.

Bell data
In this last application, we simulated m = 50 samples of unequal sizes (with n̄ = 100, and each sample size obtained by following the same approach described in footnote 4) from a Bell distribution with parameter μ = 3, and compared the performance of the Bell-u chart against the u chart and the cmpu chart. Once again, as in the previous subsubsections, we considered the application to phase 2 samples. The idea here was to evaluate both the Poisson- and COM-Poisson-based control charts against the Bell-based control chart when the data are actually from a Bell distribution.
From the analysis of Figure 7, it can be noticed that, in the case of conforming (in-control) samples, the Bell-u chart, like the cmpu chart, did not produce false alarms, while the u chart gave four false alarms. However, the Bell-based control chart performed slightly better than the COM-Poisson-based control chart in the presence of nonconforming samples, where the Bell out-of-control samples were generated with μ_s = 3.3.

Real data example
In this subsection, we considered the Bell-c, cmpc and c charts to analyze the real data set related to the number of nonconformities in samples of 100 printed circuit boards; see Chapter 7 of Montgomery (2013). The data analysis initially comprises 20 samples with overdispersion (variance/mean ≈ 1.21), which suggests that the process can be described by a COM-Poisson distribution with λ = 2.87 and ν = 0.37 (Sellers 2012). Further, Alevizakos & Koukouvinos (2022) considered 20 additional samples, also with overdispersion (variance/mean ≈ 2.00), simulated from a COM-Poisson distribution with λ = 3.09 and ν = 0.37, which implies a deterioration in the process; for more details, see Table 10 of Alevizakos & Koukouvinos (2022). Here, we checked the goodness-of-fit of the Bell distribution (μ̂ ≈ 20) by applying Pearson's χ² test to the first 20 samples, where we found a p-value = 0.57. We also used the same test procedure for the last 20 samples, in which we obtained a p-value = 0.98 (μ̂ ≈ 24).
Figure 8 shows the estimated Bell-c, cmpc, and c charts applied to the printed circuit boards data set, with control limits calculated from the initial 20 samples. That is, the part to the left of the green dashed vertical line is the monitoring of the first 20 samples (phase 1), while the part to the right of this line represents the monitoring of the remaining 20 samples, which did not participate in the construction of the control limits (phase 2). It can be observed from Figure 8 that the Bell-c chart detected the upward shift in the process mean as soon as it occurred. This is unlike the cmpc chart, which did not identify any nonconformity in the process, neither on the left side nor on the right side of the green dashed vertical line. The c chart, as expected, produced false alarms just to the left of this line. This result arises because, as we know, the c chart is not flexible enough to accommodate overdispersed data. Furthermore, it is essential to note that Alevizakos & Koukouvinos (2022) also considered the data set analyzed by Sellers (2012); they had already observed that the chart proposed there did not flag any points above the UCL, which is undesirable. In our case, with the Bell-c chart, the first warning appeared on unit #22, with three units outside the UCL correctly showing process deterioration. This result suggests that the Bell-c chart is a strong alternative to the usual control charts, such as the cmpc and c charts.

FINAL REMARKS
The one-parameter discrete Bell distribution, introduced by Castellares et al. (2018), has been established as a viable alternative to the Poisson distribution for analyzing count data with overdispersion. As demonstrated by the examples provided in the "Applications" section, the Bell-based control charts can be useful alternatives to the traditional ones (e.g., the c and u charts derived from the Poisson distribution) for process monitoring of overdispersed count data. Notably, the proposed Bell charts provided satisfactory results (few false alarms and fast detection of process changes/shifts) even when the sample data were generated from the two-parameter (and thus more complex) COM-Poisson distribution.
In the "Performance evaluation" section, simulation results also demonstrated the excellent performance of the proposed Bell charts under different scenarios (i.e., by varying the number of phase 1 samples, the sample size, the Bell distribution parameter and the probability of false alarm).In this case, the ARL criterion was used to measure control chart performance.
In the applications provided in the "Applications" section, we determined the control limits based on the standard three-sigma rule (Six Sigma program). Alternatively, as in the "Performance evaluation" section, one can obtain the UCL and LCL via a specified type I error rate α, so that the probability of finding sample points outside the control region is equal to α. Finally, we considered a real data set related to the number of nonconformities in samples of 100 printed circuit boards, which demonstrated that our proposed Bell-c chart outperformed the standard charts in showing process deterioration.
There are many possible extensions of the current work. For instance, following Braun (1999), Chakraborti & Human (2008), Castagliola & Wu (2012), and Castagliola et al. (2014), we intend to develop optimization designs in order to increase the performance of the Bell control charts when estimating the unknown process parameter using phase 1 data.
If Y ∼ Bell(θ), then the mean and variance of Y are given, respectively, by E[Y] = θe^θ and Var[Y] = θ(1 + θ)e^θ.
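These two moments can be checked numerically from the PMF of Castellares et al. (2018), P(Y = y) = θ^y exp(1 − e^θ) B_y / y!, where B_y is the y-th Bell number. The Python sketch below is only an illustration: the helper names are hypothetical, the Bell numbers are built with the Bell triangle, and the infinite sums are truncated at a finite number of terms, which is an approximation.

```python
import math


def bell_numbers(count):
    """Return [B_0, ..., B_{count-1}] as exact integers, using the Bell triangle."""
    numbers = []
    row = [1]
    for _ in range(count):
        numbers.append(row[0])
        next_row = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            next_row.append(next_row[-1] + value)
        row = next_row
    return numbers


def bell_pmf(y, theta, bell):
    """P(Y = y) for Y ~ Bell(theta); bell[y] must hold the y-th Bell number."""
    return theta ** y * math.exp(1.0 - math.exp(theta)) * (bell[y] / math.factorial(y))


theta = 1.0
terms = 80  # truncation point for the sums (an approximation)
bell = bell_numbers(terms)
probs = [bell_pmf(y, theta, bell) for y in range(terms)]

total = sum(probs)                                          # close to 1
mean = sum(y * p for y, p in enumerate(probs))              # close to theta * e^theta
var = sum(y * y * p for y, p in enumerate(probs)) - mean ** 2  # close to theta * (1 + theta) * e^theta
```

The ratio `var / mean` equals 1 + θ, so the Bell distribution is overdispersed for every θ > 0 and approaches equidispersion only as θ → 0.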

Figure 1. The ARL₀ values of the Bell-c chart for various m and n (μ = 3 and α = 0.01).

Figure 2. The ARL₀ values of the Bell-c chart for various m and n (μ = 15 and α = 0.01).

Figure 3. The ARL₀ values of the Bell-c chart for various m and n (μ = 3 and α = 0.1).

Figure 4. The ARL₀ values of the Bell-c chart for various m and n (μ = 15 and α = 0.1).

Figure 5. Performance comparison of Bell-c and c charts, both constructed from Poisson distribution samples (phase 2).

Figure 5 illustrates the "Poisson data" application, where the points (samples) to the right of the blue dashed vertical line are the disturbed observations (out-of-control samples). Furthermore, the red solid horizontal lines represent the UCL and LCL, the blue solid horizontal line indicates the CL, and the green points correspond to the out-of-control signals.

Figure 6. Performance comparison of Bell-c and cmpc charts, both constructed from COM-Poisson distribution samples (phase 2).

Figure 7. Performance comparison of Bell-u, u and cmpu charts, all constructed from Bell distribution samples (phase 2).

Figure 8. Performance comparison of Bell-c, cmpc and c charts, constructed from the printed circuit boards data set.

Figure A3. The ARL₀ values of the Bell-u chart for various m and n̄ (μ = 3 and α = 0.1).

Figure A4. The ARL₀ values of the Bell-u chart for various m and n̄ (μ = 15 and α = 0.1).

Table I. Control limits for the Bell-c and Bell-u charts (standards given). UCL = upper control limit, CL = center line, LCL = lower control limit.

Table III. Mean (standard deviation in parentheses) values of ARL₁ of the Bell-c chart, for the different scenarios studied.

Table IV. Mean (standard deviation in parentheses) values of ARL₀ of the competitor c-type control charts when the true data-generating process is Bell(μ) distributed, for some μ and α values (m = 100 and n = 100).

Table V. Mean (standard deviation in parentheses) values of ARL₁ of the competitor c-type control charts when the true data-generating process is Bell(μ) distributed, for some μ, α and p values (m = 100 and n = 100).

Table AI. Mean (standard deviation in parentheses) values of ARL₁ of the Bell-u chart, for the different scenarios studied.

Table AII. Mean (standard deviation in parentheses) values of ARL₀ of the competitor u-type control charts when the true data-generating process is Bell(μ) distributed, for some μ and α values (m = 100 and n̄ = 100).

Table AIII. Mean (standard deviation in parentheses) values of ARL₁ of the competitor u-type control charts when the true data-generating process is Bell(μ) distributed, for some μ, α and p values (m = 100 and n̄ = 100).