
New statistical process control charts for overdispersed count data based on the Bell distribution

Abstract

The Poisson distribution is a popular discrete model for count data, and traditional control charts for counts, such as the c and u charts, are built on it. However, several studies recognize the need for alternative control charts that allow for overdispersion, which is encountered in many fields, including ecology, healthcare, and industry. The Bell distribution, recently proposed by Castellares et al. (2018), is a particular solution of a multiple Poisson process able to accommodate overdispersed data. It can be used as an alternative to the usual Poisson (which, although not nested in the Bell family, is approached for small values of the Bell parameter), negative binomial, and COM-Poisson distributions for modeling count data in several areas. In this paper, we use the Bell distribution to introduce two new and useful statistical control charts for counting processes that are capable of monitoring overdispersed count data. The performance of the so-called Bell charts, namely the Bell-c and Bell-u charts, is evaluated by the average run length in numerical simulation. Artificial and real data sets are used to illustrate the applicability of the proposed control charts.

Key words
average run length; Bell charts; count data; overdispersion

Introduction

Count data arise in many fields, including biology, ecology, healthcare, marketing, economics, and industry. Overdispersion (variance > mean) in count data is quite common (Hougaard et al. 1997, Okamura et al. 2012, Coly et al. 2016) and can occur for several reasons, including mechanisms that generate excessive zero counts or censoring (Avcı et al. 2015). In the presence of overdispersed count data, the Poisson model, which is the most popular distribution for analyzing count data and assumes equidispersion (variance = mean), may result in incorrect inference about parameter estimates, standard errors, tests, and confidence intervals (Avcı et al. 2015). Particularly in Statistical Process Control (SPC), a misspecified Poisson model (c or u chart) may increase the false alarm rate (Saghir & Lin 2015). Indeed, several works on the application of control charts to data with excessive dispersion (Spiegelhalter 2005, Mohammed & Laney 2006, Albers 2011) express concern about the use of such a structure. The concern is that the increased variation may cause many in-control data values to be flagged as out of control, that is, to be false positives.

Sheaffer & Leavenworth (1976) and Kaminsky et al. (1992) considered the negative binomial distribution, while Sellers (2012) used the Conway–Maxwell–Poisson (COM-Poisson) model (cmpc and cmpu charts) as flexible alternatives to the Poisson control charts (c and u charts). In particular, Kaminsky et al. (1992) also developed control charts that plot the total or the average number of events based on the geometric distribution (g and h charts, respectively). However, as pointed out by Saghir & Lin (2015), there are still very few articles on monitoring dispersed count data. Other proposals involve compound distributions, e.g., the Poisson-gamma mixture (Cheng & Yu 2013) and the shifted (or zero-truncated) generalized Poisson distribution (Famoye 1994), among others.

Recently, Castellares et al. (2018) introduced a one-parameter discrete probability distribution named the Bell distribution. It is infinitely divisible, capable of modeling overdispersed count data, and has many other attractive properties; e.g., it is a single-parameter distribution that belongs to the exponential family. Although the Poisson distribution is not nested in the Bell family, the Bell distribution approaches the Poisson distribution for small values of its parameter. That is, the Poisson distribution is a limiting case of the Bell distribution that arises when the Bell parameter tends to zero. The Bell distribution motivated us to propose two new statistical control charts for the total and the average number of events per inspection unit. The so-called Bell charts, namely the Bell-c and Bell-u charts, are useful alternatives to the charts mentioned above when monitoring counting processes with overdispersed data. It is worth pointing out that a typical Shewhart control chart, such as the ones proposed in this paper, is mainly aimed at detecting large shifts in the process parameter. This kind of control chart also ignores historical data, which is why it is called “without memory”.

The remainder of this paper is organized as follows. The “Bell distribution” section reviews the Bell distribution and some of its basic properties. The “New control charts” section presents the new attribute control charts based on the Bell distribution. The “Performance evaluation” section provides simulation studies to assess the performance of the proposed Bell charts, including comparisons with some traditional control charts. The “Applications” section illustrates the usefulness of the Bell charts through several examples. Finally, the “Final remarks” section concludes the paper with a few remarks and a discussion of future work.

Bell distribution

In this section, we present a brief review of the Bell distribution and some of its properties. We also discuss point estimation via the maximum likelihood estimator for its parameter.

Following Castellares et al. (2018), a random variable $Y$ is said to be Bell distributed with parameter $\theta>0$, denoted by $Y\sim\mathrm{Bell}(\theta)$, if its probability mass function (PMF) is given by

$$P(Y=y\mid\theta)=\frac{\theta^{y}\,e^{-e^{\theta}+1}\,B_y}{y!},\quad\text{for } y=0,1,2,\ldots, \tag{1}$$
where $B_y=\frac{1}{e}\sum_{k=0}^{\infty}\frac{k^{y}}{k!}$, $y=0,1,2,\ldots$, are the Bell numbers (Bell 1934a, b), which can be computed via the bell(.) function of the “numbers” package (Borchers 2018) in R software (R Core Team 2018). (Footnote 1: It is worth noting that the Bell number $B_y$ is the $y$-th moment of a Poisson distribution with parameter equal to one; see Remark 1 of Castellares et al. 2018.)
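As an illustration of Equation (1), the following is a minimal Python sketch (not the authors' R code) that builds the Bell numbers with the Bell triangle and evaluates the PMF on the log scale. The function names `bell_numbers` and `bell_pmf` are hypothetical and chosen only for this example.

```python
import math


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] as exact integers via the Bell triangle."""
    bells = [1]
    row = [1]
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def bell_pmf(theta, y_max):
    """Return [P(Y=0), ..., P(Y=y_max)] for Y ~ Bell(theta), theta > 0."""
    pmf = []
    for y, b_y in enumerate(bell_numbers(y_max)):
        # log of theta^y * exp(-e^theta + 1) * B_y / y!
        log_p = (y * math.log(theta) - math.exp(theta) + 1.0
                 + math.log(b_y) - math.lgamma(y + 1))
        pmf.append(math.exp(log_p))
    return pmf


# Illustrative check: the probabilities should sum to (almost) one.
probabilities = bell_pmf(theta=1.0, y_max=80)
total_mass = sum(probabilities)
```

Truncating the support at `y_max` is an approximation; for moderate values of the parameter the neglected tail mass is negligible, but the cutoff should grow with the mean.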

If $Y\sim\mathrm{Bell}(\theta)$, then the mean and variance of $Y$ are given, respectively, by

$$E[Y]=\theta e^{\theta}\quad\text{and}\quad\mathrm{Var}[Y]=\theta(1+\theta)e^{\theta}. \tag{2}$$
Note that $\mathrm{Var}[Y]/E[Y]=1+\theta>1$ for all $\theta>0$. Therefore, the Bell distribution may be suitable for modeling overdispersed count data, although it may not accommodate all possible forms of overdispersion; see Remark 5 of Castellares et al. (2018).

To generate pseudo-random samples from $Y\sim\mathrm{Bell}(\theta)$, we resort to Proposition 3 of Castellares et al. (2018), which states that $Y$ has the same distribution as the sum of $N$ independent and identically distributed (IID) zero-truncated Poisson (ZTP) random variables with parameter $\theta>0$, where $N$ is Poisson distributed with parameter $e^{\theta}-1$. In other words, if $X_i\overset{\mathrm{IID}}{\sim}\mathrm{ZTP}(\theta)$, for $i=1,\ldots,N$, with $N\sim\mathrm{Poisson}(e^{\theta}-1)$ independent of $\{X_1,\ldots,X_N\}$, then $Y=X_1+\cdots+X_N\sim\mathrm{Bell}(\theta)$. In R, pseudo-random observations from the ZTP distribution can be generated with the rztpois(.) function of the “actuar” package (Dutang et al. 2008).
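The compound representation above can be sketched in a few lines of Python using only the standard library. This is an illustrative stand-in for the R workflow, not the authors' implementation: the helper names (`rpois`, `rztpois`, `rbell`) are hypothetical, the Poisson sampler counts unit-rate exponential arrivals, and the ZTP sampler simply rejects zeros, which is an assumption made for simplicity and becomes slow for very small values of the parameter.

```python
import math
import random


def rpois(lam, rng):
    """Draw one Poisson(lam) variate by counting unit-rate arrivals in [0, lam]."""
    count, arrival = 0, rng.expovariate(1.0)
    while arrival < lam:
        count += 1
        arrival += rng.expovariate(1.0)
    return count


def rztpois(theta, rng):
    """Draw one zero-truncated Poisson(theta) variate by rejecting zeros."""
    while True:
        x = rpois(theta, rng)
        if x > 0:
            return x


def rbell(theta, rng):
    """Draw one Bell(theta) variate as a Poisson(e^theta - 1) sum of ZTP(theta) terms."""
    n_terms = rpois(math.exp(theta) - 1.0, rng)
    return sum(rztpois(theta, rng) for _ in range(n_terms))


# Illustrative use: the sample mean should be near theta * exp(theta).
rng = random.Random(2024)
draws = [rbell(1.0, rng) for _ in range(5000)]
sample_mean = sum(draws) / len(draws)
```

The arrival-counting Poisson sampler takes time proportional to its mean, so it is adequate for the moderate rates used in this paper's scenarios but would not be the choice for very large rates.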

The parameter $\theta$ is easily estimated by the maximum likelihood method. Given an observed sample $\mathbf{y}=(y_1,y_2,\ldots,y_n)$ of size $n$ from $Y\sim\mathrm{Bell}(\theta)$, the likelihood function is

$$L(\theta\mid\mathbf{y})=\prod_{i=1}^{n}\frac{\theta^{y_i}e^{-e^{\theta}+1}B_{y_i}}{y_i!}=\frac{\theta^{\sum_{i=1}^{n}y_i}\,e^{n(-e^{\theta}+1)}\prod_{i=1}^{n}B_{y_i}}{\prod_{i=1}^{n}y_i!}\propto\theta^{\sum_{i=1}^{n}y_i}e^{-ne^{\theta}}. \tag{3}$$
Castellares et al. (2018) showed that the maximum likelihood estimator (MLE) $\hat{\theta}$ of $\theta$ has a closed-form expression and is given by
$$\hat{\theta}=W_0(\bar{Y}), \tag{4}$$
where $W_0(\cdot)$ is the Lambert W function (Corless et al. 1996), which can be computed via the W(.) function of the “LambertW” package (Goerg 2011, Goerg 2016) in R, and $\bar{Y}=\sum_{i=1}^{n}Y_i/n$ is the sample mean.
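Equation (4) follows from setting the derivative of the log-likelihood, $\sum_i y_i/\theta-ne^{\theta}$, to zero, which gives $\theta e^{\theta}=\bar{y}$. A minimal Python sketch of this estimator is shown below; it replaces the R “LambertW” package with a hand-written Newton iteration for the principal branch, so `lambert_w0` and `bell_mle` are hypothetical helper names rather than any library's API.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch W0(x) for x >= 0: solve w * exp(w) = x by Newton's method."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)  # starting value at or above the root when x >= 0
    for _ in range(max_iter):
        step = (w * math.exp(w) - x) / (math.exp(w) * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def bell_mle(sample):
    """MLE of theta for an IID Bell sample: W0 of the sample mean, as in Equation (4)."""
    y_bar = sum(sample) / len(sample)
    return lambert_w0(y_bar)


# Illustrative use with made-up counts (not data from the paper).
counts = [2, 0, 5, 3, 1, 4, 2, 3]
theta_hat = bell_mle(counts)
```

By construction, `theta_hat * math.exp(theta_hat)` reproduces the sample mean up to the tolerance of the iteration.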

Castellares et al. (2018) also presented an alternative reparametrization of the Bell distribution in terms of its mean, $\mu=E[Y]=\theta e^{\theta}$, so that $\theta=W_0(\mu)$. This form is useful mainly in regression modeling, where it is very common to model the mean of the response variable as a function of several explanatory variables (regressors). In this case, the PMF, mean, and variance of $Y\sim\mathrm{Bell}(\mu)$, $\mu>0$, are written, respectively, as

$$P(Y=y\mid\mu)=\exp\{-e^{W_0(\mu)}+1\}\frac{[W_0(\mu)]^{y}B_y}{y!},\quad\text{for } y=0,1,2,\ldots,$$
$$E[Y]=\mu\quad\text{and}\quad\mathrm{Var}[Y]=\mu[1+W_0(\mu)].$$
Furthermore, it can be easily shown that the MLE $\hat{\mu}$ of $\mu$ is the sample mean, i.e., $\hat{\mu}=\bar{Y}=\sum_{i=1}^{n}Y_i/n$.

New control charts

Suppose that a process (e.g., an industrial process) generates events (e.g., nonconformities or defects) according to a Bell($\theta$) distribution. Letting $Y$ denote the number of events per process unit, the PMF of $Y$ is given by (1). Also, let $Y_1,Y_2,\ldots,Y_n$ be a random sample of size $n$ from $Y\sim\mathrm{Bell}(\theta)$, and consider $T=\sum_{i=1}^{n}Y_i$ and $\bar{Y}=T/n=\sum_{i=1}^{n}Y_i/n$, which represent, respectively, the total number of events and the average number of events per unit. Hence, it follows from Equation (2), using the properties of the expected value and variance operators, that the exact means and variances of $T$ and $\bar{Y}$ are given by

$$E[T]=n\theta e^{\theta},\quad\mathrm{Var}[T]=n\theta(1+\theta)e^{\theta},\quad E[\bar{Y}]=\theta e^{\theta},\quad\mathrm{Var}[\bar{Y}]=\frac{\theta(1+\theta)e^{\theta}}{n}.$$

The above quantities can be used to construct the proposed control charts in the usual manner for Shewhart charts. Assuming that a standard value for $\theta$ is available, the center lines and $L\sigma$ control limits for each chart, namely the Bell-c (total number of events) and Bell-u (average number of events per unit) charts, are shown in Table I. (Footnote 2: In the usual Six Sigma quality control program, $L=3$.) Since the Bell distribution approaches the Poisson distribution for small values of $\theta$, the control limits presented in Table I also approach those of the c and u charts derived from the Poisson distribution. Moreover, as with the c chart, the Bell-c chart should not be used when the sample sizes are unequal, because both its center line and its control limits vary with the sample size. The Bell-u chart is therefore recommended for variable sample sizes, as it is easier to interpret in this scenario (its center line does not vary across samples).
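For orientation, applying the usual Shewhart rule (center line at the mean, limits at $L$ standard deviations on either side) to the moments of $T$ and $\bar{Y}$ given above would lead to the expressions below. This is a sketch derived from those moments, not a transcription of the paper's table; conventions such as truncating a negative lower limit at zero are not addressed here.

```latex
\begin{align*}
\text{Bell-c:}\quad \mathrm{UCL} &= n\theta e^{\theta} + L\sqrt{n\theta(1+\theta)e^{\theta}}, &
\mathrm{CL} &= n\theta e^{\theta}, &
\mathrm{LCL} &= n\theta e^{\theta} - L\sqrt{n\theta(1+\theta)e^{\theta}}, \\
\text{Bell-u:}\quad \mathrm{UCL} &= \theta e^{\theta} + L\sqrt{\frac{\theta(1+\theta)e^{\theta}}{n}}, &
\mathrm{CL} &= \theta e^{\theta}, &
\mathrm{LCL} &= \theta e^{\theta} - L\sqrt{\frac{\theta(1+\theta)e^{\theta}}{n}}.
\end{align*}
```

As $\theta\to0$ the factor $1+\theta$ tends to one, so the half-widths reduce to $L$ times the square root of the mean (or of the mean divided by $n$), which is the familiar form of the Poisson c and u chart limits.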

Table I
Control limits for the Bell-c and Bell-u charts (standards given). UCL = upper control limit, CL = center line, LCL = lower control limit.

To build the Bell charts when $\theta$ is unknown, we can, for example, replace $\theta$ in the control limits shown in Table I with the MLE (4) computed from all $m$ available samples (plug-in approach). In this case, the resulting control limits are commonly treated as trial control limits (Montgomery 2013).

Since the parameter $\theta$ has little or no direct interpretation, it can be difficult for industry professionals, such as engineers and technicians, to provide a standard (or reference) value. To overcome this difficulty, we can use the convenient reparametrization of the Bell distribution in terms of the mean $\mu=E[Y]=\theta e^{\theta}$ described at the end of the previous section. The resulting control limits for the proposed Bell charts are displayed in Table II. When no standard is given, $\mu$ can be estimated by the overall sample mean, that is, $\hat{\mu}=\bar{Y}=\frac{1}{mn}\sum_{j=1}^{m}\sum_{i=1}^{n}Y_{ij}$ (for the Bell-c and Bell-u charts with equal sample sizes) or $\hat{\mu}=\bar{Y}=\sum_{j=1}^{m}\sum_{i=1}^{n_j}Y_{ij}\big/\sum_{j=1}^{m}n_j$ (for the Bell-u chart with unequal sample sizes), where $Y_{ij}$ denotes the number of events for the $i$-th inspection unit of the $j$-th sample. Both estimators are easily shown to be unbiased for $\mu$. (Footnote 3: Note that, in the case of unequal sample sizes, the control limits of the Bell-u chart will also vary with the sample size, since $n$ must be replaced by $n_j$ in their expressions. Alternatively, Montgomery (2013) suggested basing the control limit calculations on an average sample size $\bar{n}$, which results in constant limits and is particularly helpful if the charts will be presented to management.)
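Both estimators of the mean amount to dividing the total number of events by the total number of inspected units. A minimal Python sketch, with the hypothetical helper name `overall_mean` and made-up counts, could look as follows.

```python
def overall_mean(samples):
    """Overall mean number of events per unit across phase 1 samples.

    `samples` is a list of lists; samples[j][i] is the count for unit i of
    sample j. The samples may have equal or unequal sizes.
    """
    total_events = sum(sum(sample) for sample in samples)
    total_units = sum(len(sample) for sample in samples)
    return total_events / total_units


# Illustrative use with made-up counts (not data from the paper).
equal_sizes = [[1, 3], [2, 2]]          # m = 2 samples of size n = 2
unequal_sizes = [[1, 2, 3], [4, 5]]     # n_1 = 3, n_2 = 2
mu_hat_equal = overall_mean(equal_sizes)
mu_hat_unequal = overall_mean(unequal_sizes)
```

With equal sample sizes the denominator is simply $mn$, so a single function covers both cases.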

Therefore, because of their practical advantage and simplicity, we shall hereafter consider the control limits provided in Table II.

Table II
Control limits for the Bell-c and Bell-u charts, considering a different reparametrization of the Bell distribution (standards given).
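To make the mean-based limits concrete, the sketch below applies the “mean ± L × SD” construction to the reparametrized moments, $E[Y]=\mu$ and $\mathrm{Var}[Y]=\mu[1+W_0(\mu)]$. It is an assumption-laden illustration rather than a transcription of the table: the function names are hypothetical, the Lambert W function is computed by a simple Newton iteration, and no truncation of a negative lower limit is applied.

```python
import math


def lambert_w0(x, tol=1e-12, max_iter=100):
    """Principal branch W0(x) for x >= 0, via Newton's method on w * exp(w) = x."""
    if x < 0:
        raise ValueError("this sketch only handles x >= 0")
    w = math.log1p(x)
    for _ in range(max_iter):
        step = (w * math.exp(w) - x) / (math.exp(w) * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def bell_c_limits(mu, n, L=3.0):
    """(LCL, CL, UCL) for the total count T of n units with per-unit mean mu."""
    center = n * mu
    half_width = L * math.sqrt(n * mu * (1.0 + lambert_w0(mu)))
    return center - half_width, center, center + half_width


def bell_u_limits(mu, n, L=3.0):
    """(LCL, CL, UCL) for the average count per unit in a sample of n units."""
    half_width = L * math.sqrt(mu * (1.0 + lambert_w0(mu)) / n)
    return mu - half_width, mu, mu + half_width


# Illustrative use: mu = 3 and n = 100, one of the scenarios studied later.
lcl_c, cl_c, ucl_c = bell_c_limits(3.0, 100)
lcl_u, cl_u, ucl_u = bell_u_limits(3.0, 100)
```

For unequal sample sizes, `bell_u_limits` would be called once per sample with its own size $n_j$ (or once with an average sample size, as in the suggestion attributed to Montgomery 2013). The Poisson-based limits correspond to dropping the factor $1+W_0(\mu)$, which is why they are narrower for the same mean.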

Performance evaluation

In this section, we conduct Monte Carlo (MC) simulation studies to assess the performance of the proposed Bell charts, as well as of some existing/traditional control charts for count data (namely, the Poisson-based c and u charts, and the COM-Poisson-based cmpc and cmpu charts) when the true data-generating process is Bell distributed. All simulations and computations were performed using the R software version 3.6.1. Interested readers can email the authors for the corresponding R codes.

The average run length (ARL) is a measure commonly used to evaluate the performance of control charts. The in-control ARL, also denoted as ARL0, is the average number of samples (or monitoring points) taken before a signal is given (that is, before a single point falls outside the control limits) when the process is in control, while the out-of-control ARL (or ARL1) is the average number of samples taken until a mean shift is detected when the process is out of control (Saghir & Lin 2015).

Let us assume that $Y\sim\mathrm{Bell}(\mu)$ comes from a process with in-control mean number of nonconformities $\mu$, and let $\mu_s$ be the mean number of nonconformities after a shift occurs in $\mu$, that is, $Y\sim\mathrm{Bell}(\mu_s)$. For the proposed Bell charts, the ARL0 is defined as

$$\mathrm{ARL}_0=\frac{1}{\alpha},$$
where $\alpha=1-P(\mathrm{LCL}<Y<\mathrm{UCL}\mid\mu)$, while the ARL1 is given by
$$\mathrm{ARL}_1=\frac{1}{1-\beta},$$
with $\beta=P(\mathrm{LCL}<Y<\mathrm{UCL}\mid\mu_s)$.

For instance, in the usual Six Sigma programs, $\alpha=0.0027$ and, thus, $\mathrm{ARL}_0=1/0.0027\approx370$. That is, even if the process is in control, an out-of-control signal will be given every 370 samples, on average (Montgomery 2013). On the other hand, ARL1 values near one are desirable, mainly for large shifts in the process mean (Shewhart control charts).
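The ARL definitions can be illustrated numerically for a Bell-c chart with known parameters. The sketch below is not the simulation procedure used in this paper: it is an assumption of this example that the exact distribution of the total count is obtained from the compound Poisson representation of the Bell law through the Panjer recursion, and all function names are hypothetical. It works in the $\theta$ parametrization and requires the support cutoff `k_max` to exceed the upper control limit.

```python
import math


def total_count_pmf(theta, n, k_max):
    """PMF of T = Y_1 + ... + Y_n for IID Bell(theta) counts, on 0..k_max.

    T is compound Poisson with rate n * (e^theta - 1) and ZTP(theta) terms, so
    the Panjer recursion gives g(k) = (n / k) * sum_j w_j * g(k - j), with
    w_j = theta**j / (j - 1)!.
    """
    weights = [0.0] * (k_max + 1)
    if k_max >= 1:
        weights[1] = theta
    for j in range(2, k_max + 1):
        weights[j] = weights[j - 1] * theta / (j - 1)
    g = [0.0] * (k_max + 1)
    g[0] = math.exp(-n * (math.exp(theta) - 1.0))
    for k in range(1, k_max + 1):
        acc = sum(weights[j] * g[k - j] for j in range(1, k + 1))
        g[k] = n * acc / k
    return g


def shewhart_limits(theta, n, L=3.0):
    """(LCL, UCL) for T using mean +/- L * SD with the Bell moments."""
    mean = n * theta * math.exp(theta)
    sd = math.sqrt(n * theta * (1.0 + theta) * math.exp(theta))
    return mean - L * sd, mean + L * sd


def signal_probability(pmf, lcl, ucl):
    """1 - P(LCL < T < UCL); assumes the PMF support extends beyond UCL."""
    inside = sum(p for k, p in enumerate(pmf) if lcl < k < ucl)
    return 1.0 - inside


def arl(signal_prob):
    """Average run length for independent samples: 1 / P(signal)."""
    return 1.0 / signal_prob


# Illustrative scenario (values chosen for this example only).
lcl, ucl = shewhart_limits(theta=1.0, n=50)
arl0 = arl(signal_probability(total_count_pmf(1.0, 50, 400), lcl, ucl))
arl1 = arl(signal_probability(total_count_pmf(1.1, 50, 400), lcl, ucl))
```

Because the limits here use the true parameter, the resulting ARL0 ignores the parameter estimation effect that the simulation studies in this section are designed to capture. For large rates the starting value `g[0]` can underflow, in which case a log-scale variant of the recursion would be needed.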

In-control ARL

Without loss of generality, in this subsection we consider a Bell process with mean nonconformity rates $\mu=3$ and 15, as well as two values for the probability of false alarm, $\alpha=0.01$ and 0.1, which correspond to ARL0 = 100 and 10, respectively. We also assume three sample sizes for the process, $n=50$, 100 and 500 (or $\bar{n}=50$, 100 and 500 in the case of the Bell-u chart with variable sample size), as well as three different numbers of phase 1 samples, $m=20$, 50 and 100. (Footnote 4: Here, each sample size $n_j$ is obtained by $n_j=\bar{n}+k_j$, for $j=1,\ldots,m$, where $k_j$ is a random integer ranging over the interval $[-\bar{n}/5;\ \bar{n}/5]$.)

Figures 1-4 and 9-12 (see Appendix A) show the results obtained from 1,000 MC simulations (or replicates) with $m^*=5{,}000$ phase 2 samples each, performed for each scenario studied, that is, for each combination of the number of phase 1 samples, the sample size, the Bell distribution parameter, and the probability of false alarm. In particular, the ARL0 results in Figures 1-4 concern the Bell-c chart with equal sample sizes, while Figures 9-12 contain the ARL0 values of the Bell-u chart with unequal sample sizes. In all these figures, the dashed line represents the nominal ARL0 value, and the asterisk inside the boxplot indicates the average ARL0 estimate of the 1,000 simulations. (Footnote 5: As pointed out by Montgomery (2013), phases 1 and 2 of control chart application have distinct objectives. In phase 1, a set of process data is gathered and analyzed at once, constructing trial control limits to determine whether the process was in control when the initial $m$ samples were collected (retrospective analysis). In phase 2, the chart built from a “clean” set of process data exhibiting control (reliable control limits) is used for monitoring future production (prospective process monitoring).)

Despite some slight to moderate discrepancies from the target ARL0 value in some cases, which are expected due to the parameter estimation effect, the results presented in Figures 1-4 and 9-12 indicate good performance of the proposed Bell charts. Note that, in general, the results improve, i.e., the ARL0 values approach the nominal one, as both $m$ and $n$ (or $\bar{n}$) increase. (Footnote 6: For further discussion of this issue, see, e.g., the review paper by Jensen et al. 2006.)


Figure 1. The ARL0 values of the Bell-c chart for various m and n (𝛍 = 3 and 𝛂 = 0.01).

Figure 2. The ARL0 values of the Bell-c chart for various m and n (𝛍 = 15 and 𝛂 = 0.01).

Figure 3. The ARL0 values of the Bell-c chart for various m and n (𝛍 = 3 and 𝛂 = 0.1).

Figure 4. The ARL0 values of the Bell-c chart for various m and n (𝛍 = 15 and 𝛂 = 0.1).

Out-of-control ARL

In this subsection, we evaluate the detection ability of the proposed Bell charts by means of ARL1, for the same scenarios as in the “In-control ARL” subsection. Due to space constraints, we consider only shifts at three levels, which represent percentage increases $p$ in the nonconformity rate $\mu$ of the Bell process, so that $\mu_s=(1+p)\mu$. The levels are $p=0.5\%$ ($\mu_s=3.015$ and 15.075), 1% ($\mu_s=3.03$ and 15.15) and 10% ($\mu_s=3.3$ and 16.5).

The ARL1 values from the 1,000 MC simulations with $m^*=5{,}000$ samples each are presented in Tables III and AI (see Appendix A) for the Bell-c chart with equal sample sizes and the Bell-u chart with unequal sample sizes, respectively. These tables show that the ARL1 decreases, on average, as $p$ increases. Note also that, for both control charts based on the Bell distribution, the ARL1 values are quite close to one when $p\geq1\%$, regardless of the values of $m$, $n$ (or $\bar{n}$), $\mu$ and $\alpha$.

Table III
Mean (standard deviation in parentheses) values of ARL1 of the Bell-c chart, for the different scenarios studied.

The impact of Bell data on some standard control charts

The Bell control chart theory introduced here may be applied in practical situations as a useful alternative to the well-known c and u charts developed under the Poisson assumption, and to the cmpc and cmpu charts derived from the COM-Poisson distribution, among others, when data are overdispersed. Hence, in-control count data with overdispersion can be modeled well via the proposed Bell charts.

In this subsection, we apply the above-mentioned Poisson- and COM-Poisson-based control charts to sample data generated from the Bell distribution. The aim is to investigate, by simulation, the performance (in terms of ARL0 and ARL1) of these well-known control charts when they are applied to Bell processes. For comparison purposes, we also use the Bell-c and Bell-u charts for samples of equal and unequal sizes, respectively. The simulations were conducted using the same settings as described in the previous subsections. However, due to space and time limitations, we consider only $m=100$ and $n=100$ (or $\bar{n}=100$). Of course, it may also be of interest to examine the performance of the Bell-based control charts on count data generated from other distributions, including the Poisson and COM-Poisson distributions with some level of dispersion. Although we provide some preliminary results (based on a single artificial sample only) in the “Poisson data” and “COM-Poisson data” subsubsections, this issue will be better addressed in our future work.

Tables IV and V display, respectively, the ARL0 and ARL1 results for the control charts with equal sample sizes (or c-type control charts), while the performance measures for the control charts with unequal sample sizes (or u-type control charts) are shown in Tables AII and AIII of Appendix A. From these tables, it can be seen that, regardless of the scenario configuration, the Poisson-based control charts produced poor results (i.e., gave many false alarms) for the in-control Bell samples. This result agrees with the findings of other authors, e.g., Saghir & Lin (2015). On the other hand, the COM-Poisson-based control charts generally provided reasonable to good results for both the in-control and out-of-control Bell samples (such results are sometimes close to the ones obtained from the Bell-based control charts). This finding is somewhat expected, as the COM-Poisson distribution has one extra parameter, which adds flexibility to the model and, consequently, to the corresponding control chart.

Finally, it is worth pointing out that we obtained the CL and control limits (UCL and LCL) by finding the respective mean and standard deviation (SD) for each case, thus calculating the limits by the “Mean ± L × SD” rule of thumb. For the Poisson and Bell distributions, the mean is easily determined. Meanwhile, the COM-Poisson summary statistics have complicated formulas, but can be readily computed using the “compoisson” package (Dunn 2012) in R.

Table IV
Mean (standard deviation in parentheses) values of ARL0 of the competitor c-type control charts when the true data-generating process is Bell(𝛍) distributed, for some 𝛍 and 𝛂 values (m=100 and n=100).
Table V
Mean (standard deviation in parentheses) values of ARL1 of the competitor c-type control charts when the true data-generating process is Bell(𝛍) distributed, for some 𝛍, 𝛂 and p values (m=100 and n=100).

Applications

It is well known that there are practical situations in which the nature of the production process allows an item or product to contain several nonconformities (defects) and not be classified as nonconforming. For example, a manufactured personal computer might have one or more very minor flaws in the cabinet finish, but since these flaws do not seriously affect the unit’s functional operation, it could be classified as conforming (Montgomery 2013). Thus, in this section, we apply SPC charts to inspect the total or average number of defects per sample and determine whether the process is in control.

Simulated data examples

Here, we consider three artificial data sets containing some levels of overdispersion to illustrate the usefulness of the Bell-c and Bell-u charts, as well as their ability to produce bounds comparable to those established by the classical SPC theory.

In the cases where the samples are of equal sizes, the control charts chosen for comparison were the traditional c chart and the cmpc chart. The c chart assumes that the Poisson distribution models the number of defects per inspection unit well. However, according to Sellers (2012), the c chart, while performing better for large samples, does not work well in the presence of overdispersion. Thus, the author developed a control chart using a more general count distribution that relaxes the equidispersion assumption of the Poisson distribution. The so-called cmpc chart, based on the assumption of a COM-Poisson distribution, has shown excellent results when fitting overdispersed count data. Therefore, it can also be used as a reference tool to assess the performance of the Bell-c chart in this work. For the cases when the samples are of unequal sizes, the control charts selected for the comparison were the u chart and the cmpu chart.

As in the “Performance evaluation” section, all the analyses were done using the R programming language, and we assumed $L=3$.

Poisson data

In this application, we generated $m=50$ samples of size $n=100$ from a Poisson distribution with parameter $\lambda=3$. The idea here was to compare the performance of the Bell-c and c charts, mainly in phase 2. More specifically, in phase 1, we calculated the control limits of both charts from the aforementioned $m=50$ samples. In phase 2, we simulated $m^*=70$ new samples of size $n=100$ from the same distribution (Poisson), but with the last 20 samples disturbed by $\lambda_s=3.3$. Then, for both estimated control charts, we recorded the occurrence (or not) of false alarms in the first 50 samples and the number of runs until the first of the 20 nonconforming samples was detected.


Figure 5. Performance comparison of Bell-c and c charts, both constructed from Poisson distribution samples (phase 2).

Figure 5 illustrates this application, where the points (samples) to the right of the blue dashed vertical line are the disturbing observations (out-of-control samples). Furthermore, the red solid horizontal lines represent the UCL and LCL, the blue solid horizontal line indicates the CL, and the green points correspond to the out-of-control signals.

For this particular artificial example, the figure shows that the Bell-based control chart produced better results than the Poisson-based control chart for the in-control samples, even though the true data-generating process is not Bell distributed: the Bell-c chart produced a single false alarm, while the c chart produced three.

Finally, for the samples that are known to be out-of-control, both charts performed similarly, showing that the Bell-c chart may be a useful alternative to the traditional c chart.

COM-Poisson data

As in the “Poisson data” subsubsection, in this application we generated $m=50$ samples of equal size $n=100$, but this time from a COM-Poisson distribution with parameters $\lambda=1.5$ and $\nu=0.5$. Here, the idea was again to compare the performance of the Bell-c and cmpc charts in phase 2 monitoring. The phase 1 and phase 2 analyses were carried out in the same way as before, except that the last 20 samples (out of a total of $m^*=70$ new samples) were perturbed by $\nu_s=0.75$.

Figure 6 shows the estimated control charts applied to the phase 2 samples. As in the previous subsubsection, the Bell-based control chart provided better results than the COM-Poisson-based control chart for the in-control samples. Although the COM-Poisson is the true data-generating distribution, the Bell-c chart gave a single false alarm, while the cmpc chart gave three. This good performance of the Bell-c chart is maintained for the out-of-control samples, showing that the Bell-c chart can also be a useful and simpler alternative to the cmpc chart.


Figure 6. Performance comparison of Bell-c and cmpc charts, both constructed from COM-Poisson distribution samples (phase 2).

Bell data

In this last application, we simulated $m=50$ samples of unequal sizes (with $\bar{n}=100$, and each sample size obtained by following the same approach described in footnote 4) from a Bell distribution with parameter $\mu=3$, and compared the performance of the Bell-u chart against the u chart and the cmpu chart. Once again, as in the previous subsubsections, we considered the application to phase 2 samples. The idea here was to evaluate both the Poisson- and COM-Poisson-based control charts against the Bell-based control chart when the data actually come from a Bell distribution.

Figure 7 shows that, for the conforming (in-control) samples, the Bell-u chart, like the cmpu chart, produced no false alarms, while the u chart produced four. However, the Bell-based control chart performed slightly better than the COM-Poisson-based control chart in the presence of nonconforming samples, where the Bell out-of-control samples were generated with $\mu_s=3.3$.


Figure 7. Performance comparison of Bell-u, u and cmpu charts, all constructed from Bell distribution samples (phase 2).

Real data example

In this subsection, we use the Bell-c, cmpc and c charts to analyze a real data set on the number of nonconformities in samples of 100 printed circuit boards; see Chapter 7 of Montgomery (2013). The data initially comprise 20 samples with overdispersion (variance/mean ≈ 1.21), which suggests that the process can be described by a COM-Poisson distribution with $\hat{\lambda}=2.87$ and $\hat{\nu}=0.37$ (Sellers 2012). Further, Alevizakos & Koukouvinos (2022) considered 20 additional samples, also with overdispersion (variance/mean ≈ 2.00), simulated from a COM-Poisson distribution with $\lambda=3.09$ and $\nu=0.37$, which implies a deterioration in the process; for more details, see Table 10 of Alevizakos & Koukouvinos (2022). Here, we checked the goodness of fit of the Bell distribution with Pearson’s $\chi^2$ test. For the first 20 samples ($\hat{\mu}\approx20$), we found a p-value of 0.57; for the last 20 samples ($\hat{\mu}\approx24$), we obtained a p-value of 0.98.

Figure 8 shows the estimated Bell-c, cmpc, and c charts, applied to the printed circuit boards data set, with control limits calculated from the initial 20 samples. That is, the left part of the green dashed vertical line is precisely the monitoring of the first 20 samples (phase 1), while the right part of this line represents the monitoring of the remaining 20 samples, which did not participate in the construction of the control limits (phase 2).

It can be observed from Figure 8 that the Bell-c chart (see footnote 7) detected the upward shift in the process mean as soon as it occurred. In contrast, the cmpc chart did not identify any nonconformity in the process, on either side of the green dashed vertical line. The c chart, as expected, produced false alarms just to the left of this line, because it is not flexible enough to accommodate overdispersed data. Furthermore, it is essential to note that Alevizakos & Koukouvinos (2022) also considered the data set analyzed by Sellers (2012); they had already observed that the proposed chart did not signal any deterioration beyond the UCL, which is undesirable. In our case, with the Bell-c chart, the first warning appeared on unit #22, with three units outside the UCL correctly showing process deterioration. This result supports the Bell-c chart as a strong alternative to usual control charts, such as the cmpc and c charts.


Figure 8. Performance comparison of Bell-c, cmpc and c charts, constructed from the printed circuit boards data set.

Final remarks

The one-parameter discrete Bell distribution, introduced by Castellares et al. (2018), has been established as a viable alternative to the Poisson distribution for analyzing count data with overdispersion. As demonstrated by the examples provided in the “Applications” section, the Bell-based control charts can be useful alternatives to the traditional ones (e.g., the c and u charts derived from the Poisson distribution) for process monitoring of overdispersed count data. Notably, the proposed Bell charts provided satisfactory results (few false alarms and fast detection of process shifts) even when the sample data were generated from the two-parameter (and thus, more complex) COM-Poisson distribution.

In the “Performance evaluation” section, simulation results also demonstrated the good performance of the proposed Bell charts under different scenarios (i.e., by varying the number of phase 1 samples, the sample size, the Bell distribution parameter and the probability of false alarm). There, the ARL criterion was used to measure control chart performance.
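To make the ARL criterion concrete, the sketch below computes the ARL of a Bell-c chart when the parameter is treated as known: if p is the probability that a single count falls outside the limits, the run length is geometric and ARL = 1/p. This is a simplification of the simulation study summarized above, which also accounts for estimating μ from phase 1 samples. The Bell probability function θ^y exp(1 − e^θ) B_y / y! with θ = W(μ), the three-sigma limits based on the variance μ(1 + W(μ)), and all function names are assumptions introduced for illustration.

```python
import math


def lambert_w0(x, tol=1e-12):
    """Principal Lambert W for x > 0 (solves w * exp(w) = x) by Newton's method."""
    w = math.log1p(x)
    for _ in range(100):
        step = (w * math.exp(w) - x) / (math.exp(w) * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] computed with the Bell triangle."""
    bells, row = [1], [1]
    for _ in range(n_max):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def bell_pmf(mu, y_max=200):
    """Bell(mu) probabilities for y = 0, ..., y_max (tail beyond y_max ignored)."""
    theta = lambert_w0(mu)
    bells = bell_numbers(y_max)
    const = math.exp(1.0 - math.exp(theta))
    return [const * theta**y * (bells[y] / math.factorial(y)) for y in range(y_max + 1)]


def bell_c_limits(mu, L=3.0):
    """Assumed L-sigma limits for a single Bell count."""
    sd = math.sqrt(mu * (1.0 + lambert_w0(mu)))
    return max(0.0, mu - L * sd), mu + L * sd


def arl(mu_true, lcl, ucl):
    """ARL = 1 / P(count outside [lcl, ucl]) when counts are Bell(mu_true)."""
    pmf = bell_pmf(mu_true)
    inside = sum(p for y, p in enumerate(pmf) if lcl <= y <= ucl)
    return 1.0 / (1.0 - inside)


lcl, ucl = bell_c_limits(3.0)  # limits set for the in-control mean
arl0 = arl(3.0, lcl, ucl)      # in-control ARL
arl1 = arl(3.3, lcl, ucl)      # ARL after an upward shift to 3.3
print("ARL0:", arl0, "ARL1:", arl1)
```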

In the applications provided in the “Applications” section, we determined the control limits based on the standard three-sigma rule (Six Sigma program). Alternatively, as in the “Performance evaluation” section, one can obtain the UCL and LCL from a specified type I error rate α, so that the probability of finding sample points outside the control region is equal to α. Finally, we considered a real data set related to the number of nonconformities in samples of 100 printed circuit boards, which demonstrated that our proposed Bell-c chart outperformed the standard charts in signaling process deterioration.
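The α-based alternative can be sketched as follows for a Bell-c chart with known μ. The sketch splits α equally between the two tails and, because counts are discrete, picks the tightest integer limits whose tail probabilities do not exceed α/2, so the attained false alarm probability is at most α rather than exactly α. The equal split, the function names, and the Bell probability function used here are assumptions made for illustration; the definition in the “Performance evaluation” section takes precedence.

```python
import math


def lambert_w0(x, tol=1e-12):
    """Principal Lambert W for x > 0 (solves w * exp(w) = x) by Newton's method."""
    w = math.log1p(x)
    for _ in range(100):
        step = (w * math.exp(w) - x) / (math.exp(w) * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def bell_pmf(mu, y_max=200):
    """Bell(mu) probabilities for y = 0, ..., y_max (tail beyond y_max ignored)."""
    theta = lambert_w0(mu)
    # Bell numbers B_0, ..., B_y_max via the Bell triangle
    bells, row = [1], [1]
    for _ in range(y_max):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    const = math.exp(1.0 - math.exp(theta))
    return [const * theta**y * (bells[y] / math.factorial(y)) for y in range(y_max + 1)]


def probability_limits(mu, alpha, y_max=200):
    """Integer limits with P(Y < LCL) <= alpha/2 and P(Y > UCL) <= alpha/2."""
    pmf = bell_pmf(mu, y_max)
    # LCL: largest integer whose lower tail P(Y < LCL) stays within alpha/2
    lcl, cum = 0, 0.0
    for y, p in enumerate(pmf):
        if cum + p > alpha / 2:
            break
        cum += p
        lcl = y + 1
    # UCL: smallest integer whose upper tail P(Y > UCL) stays within alpha/2
    ucl, tail = y_max, 0.0
    for y in range(y_max, -1, -1):
        if tail + pmf[y] > alpha / 2:
            break
        tail += pmf[y]
        ucl = y - 1
    return lcl, ucl


lcl, ucl = probability_limits(3.0, 0.01)
print("LCL:", lcl, "UCL:", ucl)  # a count signals if it is below LCL or above UCL
```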

There are many possible extensions of the current work. For instance, following Braun (1999), Chakraborti & Human (2008), Castagliola & Wu (2012), and Castagliola et al. (2014), we intend to develop optimization designs to improve the performance of the Bell control charts when the unknown process parameter μ is estimated from phase 1 data.

Acknowledgments

The authors thank the anonymous referees, the editor, and the associate editor for all their useful comments and suggestions, which remarkably improved the quality of this article. Laion L. Boaventura acknowledges support from Bahia State Research Foundation (FAPESB, Proc. 19.573.201.5418). Paulo H. Ferreira acknowledges support from the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grant no. 307221/2022-9). The research of Rosemeire L. Fiaccone has been supported by FAPESB (n. app 0071/2016). Francisco Louzada is supported by CNPq (grant no. 301976/2017-1) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, grant no. 2013/07375-0).

  • 1
    It is worth noting that the Bell number By is the y-th moment of a Poisson distribution with parameter equal to one; see Remark 1 of Castellares et al. (2018).
  • 2
    In the usual Six Sigma quality control program, L=3.
  • 3
    Note that, in this case, the control limits of the Bell-u chart will also vary with the sample size, since n should be replaced by nj in their expressions. Alternatively, Montgomery (2013) suggested basing the control limit calculations on an average sample size n̄, which results in constant limits and is particularly helpful if the charts will be presented to management.
  • 4
    Here, each sample size nj is obtained by nj = n + kj, for j = 1, …, m, where kj is a random integer ranging over the interval [−n/5; n/5].
  • 5
    As pointed out by Montgomery (2013), phases 1 and 2 of control chart application have distinct objectives. In phase 1, a set of process data is gathered and analyzed at once, constructing trial control limits to determine whether the process was in control when the initial m samples were collected (retrospective analysis). On the other hand, in phase 2, the chart built from a “clean” set of process data exhibiting control (reliable control limits) is used for monitoring future production (prospective process monitoring).
  • 6
    For a further discussion on this relevant issue, see, e.g., the review paper by Jensen et al. (2006).
  • 7
    The R code used for generating this Bell-c chart is available in Appendix B.

REFERENCES

  • ALBERS W. 2011. Control charts for health care monitoring under overdispersion. Metrika 74(1): 67-83.
  • ALEVIZAKOS V & KOUKOUVINOS C. 2022. A progressive mean control chart for COM-Poisson distribution. Communications in Statistics-Simulation and Computation 51(3): 849-867.
  • AVCI E, ALTÜRK S & SOYLU EN. 2015. Comparison count regression models for overdispersed alga data. Int J Res Rev Appl Sci 25(1): 1-5.
  • BELL ET. 1934a. Exponential numbers. The American Mathematical Monthly 41(7): 411-419.
  • BELL ET. 1934b. Exponential polynomials. Ann Math 35(2): 258-277.
  • BRAUN WJ. 1999. Run length distributions for estimated attributes charts. Metrika 50(2): 121-129.
  • CASTAGLIOLA P & WU S. 2012. Design of the c and np charts when the parameters are estimated. Int J Reliab Qual Saf Eng 19(02): 1250010.
  • CASTAGLIOLA P, WU S, KHOO MBC & CHAKRABORTI S. 2014. Synthetic phase II Shewhart-type attributes control charts when process parameters are estimated. Qual Reliab Eng Int 30(3): 315-335.
  • CASTELLARES F, FERRARI SLP & LEMONTE AJ. 2018. On the Bell distribution and its associated regression model for count data. Applied Mathematical Modelling 56: 172-185.
  • CHAKRABORTI S & HUMAN SW. 2008. Properties and performance of the c-chart for attributes data. J Appl Stat 35(1): 89-100.
  • CHENG SS & YU FJ. 2013. A CUSUM control chart to monitor wafer quality. In: Proceedings of World Academy of Science, Engineering and Technology. 78. p. 1300. Citeseer.
  • COLY S, YAO AF, ABRIAL D & CHARRAS-GARRIDO M. 2016. Distributions to model overdispersed count data. Journal de la Société Française de Statistique 157(2): 39-64.
  • CORLESS RM, GONNET GH, HARE DEG, JEFFREY DJ & KNUTH DE. 1996. On the LambertW function. Adv Comput Math 5(1): 329-359.
  • DUNN J. 2012. compoisson: Conway-Maxwell-Poisson Distribution. R package version 0.3. URL https://CRAN.R-project.org/package=compoisson
  • DUTANG C, GOULET V & PIGEON M. 2008. actuar: An R package for Actuarial Science. J Stat Soft 25(7): 1-37.
  • FAMOYE F. 1994. Statistical control charts for shifted generalized Poisson distribution. J Ital Stat Soc 3(3): 339-354.
  • GOERG GM. 2011. Lambert W random variables - a new family of generalized skewed distributions with applications to risk estimation. Ann Appl Stat 5(3): 2197-2230.
  • GOERG GM. 2016. LambertW: An R package for Lambert W × F Random Variables. R package version 0.6.4. URL https://CRAN.R-project.org/package=LambertW
  • HOUGAARD P, LEE MLT & WHITMORE GA. 1997. Analysis of overdispersed count data by mixtures of Poisson variables and Poisson processes. Biometrics 53(4): 1225-1238.
  • JENSEN WA, JONES-FARMER LA, CHAMP CW & WOODALL WH. 2006. Effects of parameter estimation on control chart properties: a literature review. J Qual Technol 38(4): 349-364.
  • KAMINSKY FC, BENNEYAN JC, DAVIS RD & BURKE RJ. 1992. Statistical control charts based on a geometric distribution. J Qual Technol 24(2): 63-69.
  • MOHAMMED MA & LANEY D. 2006. Overdispersion in health care performance data: Laney’s approach. BMJ Quality & Safety 15(5): 383-384.
  • MONTGOMERY DC. 2013. Introduction to Statistical Quality Control. 7th ed. John Wiley & Sons.
  • OKAMURA H, PUNT AE & AMANO T. 2012. A generalized model for overdispersed count data. Pop Ecol 54(3): 467-474.
  • R CORE TEAM. 2018. R: A Language and Environment for Statistical Computing. (Version 3.6.1). R Foundation for Statistical Computing. Vienna, Austria.
  • SAGHIR A & LIN Z. 2015. Control charts for dispersed count data: an overview. Qual Reliab Eng Int 31(5): 725-739.
  • SELLERS KF. 2012. A generalized statistical control chart for over- or under-dispersed data. Qual Reliab Eng Int 28(1): 59-65.
  • SHEAFFER RL & LEAVENWORTH RS. 1976. The negative binomial model for counts in units of varying size. J Qual Technol 8(3): 158-163.
  • SPIEGELHALTER DJ. 2005. Handling over-dispersion of performance indicators. BMJ Quality & Safety 14(5): 347-351.

APPENDIX A

Figure A1. The ARL0 values of the Bell-u chart for various m and n (μ = 3 and α = 0.01).

Figure A2. The ARL0 values of the Bell-u chart for various m and n (μ = 15 and α = 0.01).

Figure A3. The ARL0 values of the Bell-u chart for various m and n (μ = 3 and α = 0.1).

Figure A4. The ARL0 values of the Bell-u chart for various m and n (μ = 15 and α = 0.1).

Table AI. Mean (standard deviation in parentheses) values of ARL1 of the Bell-u chart, for the different scenarios studied.

Table AII. Mean (standard deviation in parentheses) values of ARL0 of the competitor u-type control charts when the true data-generating process is Bell(μ) distributed, for some μ and α values (m=100 and n=100).

Table AIII. Mean (standard deviation in parentheses) values of ARL1 of the competitor u-type control charts, when the true data-generating process is Bell(μ) distributed, for some μ, α and p values (m=100 and n=100).

Appendix B

Publication Dates

  • Publication in this collection
    05 June 2023
  • Date of issue
    2023

History

  • Received
    3 Mar 2020
  • Accepted
    30 Aug 2020
Academia Brasileira de Ciências Rua Anfilófio de Carvalho, 29, 3º andar, 20030-060 Rio de Janeiro RJ Brasil, Tel: +55 21 3907-8100 - Rio de Janeiro - RJ - Brazil
E-mail: aabc@abc.org.br