ALTERNATIVE METHODS TO MULTIPLE CORRESPONDENCE ANALYSIS IN RECONSTRUCTING THE RELEVANT INFORMATION IN A BURT’S TABLE

In this work, Greenacre's (1988) Joint Correspondence Analysis (JCA) and Gower & Hand's (1996) Extended Matching Coefficient (EMC) are compared with Multiple Correspondence Analysis (MCA) with respect to the quality of their reconstruction of the Burt's table. In particular, the reconstruction ability is considered for the whole table and, separately, for the diagonal and the off-diagonal tables, that is, the ability to describe each character's distribution, the interaction between pairs of characters, or both. The theoretical aspects are discussed first; the results obtained in an application are then shown and discussed.


INTRODUCTION
Both Multicriteria Analysis and Multicriteria Decision Models [4,37] are tools largely adopted in operational research, in particular when dealing with Knowledge Discovery in Databases [14]. Nowadays it may be of high interest to deal with qualitative unstructured data, whose treatment may be more complex. Studies in this framework are found in the recent operational research literature [16,17,30,40]. In this context, data reduction through exploratory multidimensional scaling may help to clarify the data at hand by revealing structures and factors. In particular, factors, together with the observed characters most associated with them, may lead to a consistent dimension reduction and, at the same time, to the ability to select the most appropriate characters for further, more focused investigation. Thus, the identification of the proper dimension of a data table may be a topic of investigation per se.
In exploratory multidimensional scaling, the identification of the proper dimension of the solution is the basis for defining a threshold between relevant information and residuals. The relevant information is also tied to the possibility of interpreting the factors according to the paradigms of the method at hand: in the linear case, the percentage of explained inertia is the most widely used criterion. Thus, retaining a large share of inertia is the most evident rough method that may be used, and a higher-dimensional solution is normally preferred to a lower-dimensional one only if its inertia is significantly larger. Tied to this aspect, the reconstruction of the original data table by a lower-rank matrix is of relevance, since a good reconstruction helps to understand to what extent the reduction in dimension, through the use of factors, is a reasonable approximation of the original data.
In this paper, we consider the special case of qualitative data, which are usually summarized by the so-called Burt's matrix, the super-contingency table that cross-tabulates all the characters taken into account. Multiple Correspondence Analysis (MCA) [7,19] is the best-known exploratory factor analysis method to deal with it, but alternative methods, based on different rationales, have been proposed in the literature. Critics of MCA emphasize the misuse of the chi-square metric [18], a metric that for contingency tables finds its rationale in the partition of the chi-square into independent components [27,35,44], thus in the deviation from expectation. Now, when distances are computed between lines of either the indicator matrix or its square, the Burt's table, this metric is heavily biased by the obvious (squared) differences between levels belonging to the same character [20,21]. In addition, rare levels gain importance in the computation but enhance aspects that turn out to be useless, since the chi-square statistic may not be applied to these tables; hence, its use is hardly justifiable [18]. Finally, the problem of the underestimation of the inertia explained by the factors, which deserve to be re-evaluated in some way, has long been known [2,8,19]. Indeed, when the chi-square metric is applied to such a table, the highest contributions to the total inertia come from the block-diagonal tables crossing each character with itself, which carry no information of noticeable value, since for them the maximum deviation from expectation is obtained by construction. Along with this, a problem arises in the unpredictability of the partial data reconstruction of the Burt's table, as already pointed out by [10]. The problem may be relevant in qualitative discriminant analysis [39], which is based on MCA coordinates. Here, a bad reconstruction would prevent the reduction of the number of factors to take into account.
In this paper, we consider two different alternatives: Joint Correspondence Analysis (JCA, [20]) and the Principal Component Analysis (PCA, [26]) of the Extended Matching Coefficient matrix (EMC, [18]), which are suggested by the corresponding authors as a solution. It must be noted that the aims of MCA and EMC are the same: to perform a kind of PCA on qualitative data, i.e. to find uncorrelated quantitative characters that give independent ordinations of the units and optimize the explained inertia. It is the definition of inertia that differs, bearing in mind that EMC aims at correcting the emphasis that the chi-square metric gives to rare levels in MCA. Since the solutions of both methods are based on an eigendecomposition, their solutions of different dimension are nested, so that the reconstruction of the original data may be seen as a sum of independent rank-1 tables. By contrast, JCA aims at finding the best reconstruction of the off-diagonal tables of the Burt's table only, generalizing Simple Correspondence Analysis to the simultaneous analysis of several two-way contingency tables. The idea is analogous to that of [42] for fitting the off-diagonal elements of a correlation matrix, and the solution is obtained by numerical optimization. Therefore, in this case the solutions of different dimension are not nested.
To compare the results, we consider the reconstruction quality only. Finally, these methods are applied to a very small table, taken from studies in linguistics [32].

Singular Value Decomposition
We may ground our further discussion on the well-known Singular Value Decomposition (SVD, [1,19]) theorem, which states:
Theorem 1. Any real matrix $X$ may be decomposed as $X = U \Lambda^{1/2} V'$, with $\Lambda$ the diagonal matrix of the real non-negative eigenvalues of $XX'$, $U$ the orthogonal matrix of the corresponding eigenvectors, and $V$ the matrix of eigenvectors of $X'X$ (with the same eigenvalues), with both constraints $U'U = I$ and $V'V = I$.

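This theorem corresponds to the reconstruction formula of the elements of $X$,
$$x_{ij} = \sum_{\alpha=1}^{S} \sqrt{\lambda_\alpha}\, u_{i\alpha} v_{j\alpha}, \tag{1}$$
or, in vector notation, $X = \sum_{\alpha=1}^{S} \sqrt{\lambda_\alpha}\, u_\alpha v_\alpha'$, on which Eckart & Young's theorem [13] is based:
Theorem 2 (Eckart and Young). The rank-$s$ reconstruction $H_s = \sum_{\alpha=1}^{s} \sqrt{\lambda_\alpha}\, u_\alpha v_\alpha'$ of any real matrix $X$, with $s < S$, $S$ the rank of $X$, and the singular values sorted in decreasing order, is the best one in the least-squares sense.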
This means that, for every $s < S$, the matrix $H_s$ solves the problem of approximating a matrix $X$ by another matrix $H$ of lower rank as well as possible in the least-squares sense, thus by minimizing
$$\operatorname{trace}\big((X-H)(X-H)'\big) = \sum_i \sum_j (x_{ij}-h_{ij})^2. \tag{2}$$
As it is trivial that $\operatorname{trace}(H_s H_s') = \sum_{\alpha=1}^{s} \lambda_\alpha$, the minimum of (2) reached by $H_s$ equals $\operatorname{trace}\big((X-H_s)(X-H_s)'\big) = \sum_{\alpha=s+1}^{S} \lambda_\alpha$.
It is well known that Principal Component Analysis (PCA, [26]) finds its rationale in this theorem, once the data table is standardized according to $z_{ij} = (x_{ij}-\bar{x}_j)/(\sigma_j \sqrt{n})$, with $n$ the number of units and $\bar{x}_j$ and $\sigma_j$ the average and the standard deviation of the $j$-th character; indeed, this way $Z'Z = \operatorname{cor}(X) = C$, the matrix of correlations between the columns of $X$. Thus, given the PCA of the correlation matrix $C$, with $\Lambda$ and $V$ its diagonal matrix of eigenvalues and its matrix of unit eigenvectors respectively, and given $U$ as the unit eigenvectors of $ZZ'$, the reconstruction formula (1) becomes
$$x_{ij} = \bar{x}_j + \sigma_j \sqrt{n} \sum_{\alpha} \sqrt{\lambda_\alpha}\, u_{i\alpha} v_{j\alpha}. \tag{3}$$
For correspondence analysis, we shall adopt the Generalized Singular Value Decomposition (GSVD, [1,19]), in which two other matrices are involved:
Theorem 3. Given two real positive definite matrices $M$ and $N$, any real matrix $X$ may be decomposed as $X = U \Lambda^{1/2} V'$, under the constraints $U'MU = I$ and $V'NV = I$.
The solution is given by the SVD of the matrix $M^{1/2} X N^{1/2}$. In this case the minimization problem (2) becomes
$$\operatorname{trace}\big(M (X-H) N (X-H)'\big). \tag{5}$$
As $\operatorname{trace}(M H_s N H_s') = \sum_{\alpha=1}^{s} \lambda_\alpha$, the minimum reached by $H_s$ is again $\sum_{\alpha=s+1}^{S} \lambda_\alpha$. In the particular case in which both $M$ and $N$ are diagonal, with diagonal elements $m_i$ and $n_j$, (5) simplifies to
$$\sum_i \sum_j m_i n_j (x_{ij}-h_{ij})^2. \tag{6}$$
Therefore, the exploratory analysis paradigm states that the most relevant information is tied to the largest eigenvalues and the non-relevant to the smallest ones. The problem of distinguishing between them, that is, of identifying at least a tentative cut point in the sequence of either the singular values or the eigenvalues, remains a crucial issue that has not found a unique solution so far (for PCA see, e.g., [11,25,36]). In Simple Correspondence Analysis (SCA, [7,19]) it seems more easily solved, since the special chi-square metric adopted allows a useful solution and an easy interpretation of the results; regarding MCA, we shall see that [5,6] propose an interesting method.
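As an illustration of the Eckart and Young result, the following minimal sketch (in Python with NumPy, on arbitrary random data; the function name `rank_s_reconstruction` is ours, not from the cited literature) builds the rank-s reconstruction from the SVD and checks numerically that the least-squares residual equals the sum of the discarded eigenvalues.

```python
import numpy as np

def rank_s_reconstruction(X, s):
    """Rank-s least-squares approximation of X built from its SVD."""
    U, sing, Vt = np.linalg.svd(X, full_matrices=False)
    # numpy returns the singular values in decreasing order; keep the first s.
    H_s = (U[:, :s] * sing[:s]) @ Vt[:s, :]
    return H_s, sing ** 2  # squared singular values = eigenvalues of X'X

rng = np.random.default_rng(0)          # arbitrary illustrative data
X = rng.normal(size=(6, 4))
H2, eigvals = rank_s_reconstruction(X, 2)
residual = np.sum((X - H2) ** 2)        # trace((X - H)(X - H)')
print(residual, eigvals[2:].sum())      # the two values coincide
```

The same check applies to the generalized case after transforming the matrix by the square roots of the two metrics.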

Simple Correspondence Analysis
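Let $N$ be an $r \times c$ contingency table with grand total $n$, $p_{ij} = n_{ij}/n$ its relative frequencies, and $r_i$ and $c_j$ the row and column marginal relative frequencies. SCA is based on the SVD of the matrix $X$ with elements $x_{ij} = p_{ij}/\sqrt{r_i c_j}$, that is $X = F \Lambda^{1/2} G'$, and both matrices $XX'$ and $X'X$ have a trivial eigenvalue $\lambda_1 = 1$, to which the eigenvectors $f_1 = (\sqrt{r_i})$ and $g_1 = (\sqrt{c_j})$ correspond respectively. It can also be shown that the other, non-trivial eigenvalues always lie between 0 and 1. If we take the trivial eigenvalue out of the summation, the rank-$s$ reconstruction formula ($s \le S = \min(r, c)$) may be slightly transformed:
$$p_{ij} \approx r_i c_j \Big(1 + \sum_{\alpha=2}^{s} \sqrt{\lambda_\alpha}\, \frac{f_{i\alpha}\, g_{j\alpha}}{\sqrt{r_i c_j}}\Big). \tag{7}$$
Incidentally, we observe that, in order to produce graphics with a simultaneous symmetrical representation of both rows and columns, the SCA eigenvectors are usually rescaled by defining as coordinates the vectors $\varphi_\alpha$ and $\psi_\alpha$ given by $\varphi_{i\alpha} = \sqrt{\lambda_\alpha}\, f_{i\alpha}/\sqrt{r_i}$ and $\psi_{j\alpha} = \sqrt{\lambda_\alpha}\, g_{j\alpha}/\sqrt{c_j}$ respectively. In this case the reconstruction formula (7) becomes
$$p_{ij} \approx r_i c_j \Big(1 + \sum_{\alpha=2}^{s} \frac{\varphi_{i\alpha}\, \psi_{j\alpha}}{\sqrt{\lambda_\alpha}}\Big).$$
Thus, depending on which coordinates one chooses, the reconstruction formula for $N$ is obtained by multiplying either expression by $n$.
Returning to the optimization problem (6), with $H$ approximating the table of the $p_{ij}$ and weights $m_i = 1/r_i$ and $n_j = 1/c_j$, we may explicitly write the residual as $\sum_i \sum_j (p_{ij}-h_{ij})^2/(r_i c_j)$. Now, if we consider the rank-1 reconstruction, that is $\hat{H}_1 = E = rc'$, the residual takes the particular form
$$\sum_i \sum_j \frac{(p_{ij}-r_i c_j)^2}{r_i c_j}, \tag{10}$$
that is, the sum of the squared deviations of the observed values from those expected under independence, divided by the expected ones. This is commonly called the inertia of the table $X$ and is, up to the factor $n$, the chi-square of $N$; indeed, multiplying (10) by $n$, we get $\chi^2 = n \sum_i \sum_j (p_{ij}-r_i c_j)^2/(r_i c_j)$. Hence, the decomposition of the inertia along the non-trivial eigenvectors of SCA leads to the decomposition of the chi-square along each corresponding dimension: $\chi^2 = n \sum_{\alpha \ge 2} \lambda_\alpha = \sum_{\alpha \ge 2} \chi^2_\alpha$, with $\chi^2_\alpha = n \lambda_\alpha$.
Based on [44], this result led [35] to check for significance the partial chi-squares associated with each eigenvalue, $\chi^2_\alpha$ with $df = (r + c - 2\alpha - 1)$ degrees of freedom, to detect whether there are linear ordinations of both row and column levels that explain the deviation from expectation. Indeed, [28] proved that the first eigenvalue is larger than the corresponding chi-square and [29] proved that the distribution of the eigenvalues is that of those of a Wishart matrix $(W (m_1 - 1)(m_2 - 1), \mathrm{Min}(m_1 - 1, m_2 - 1), I)$. Nevertheless, [15] finds that the sum of the smallest eigenvalues may be used as a test for their nullity, which leads to Malinvaud's [31] stopping rule, to test the residuals of each reduced-rank solution. Indeed, one may test for significance the statistic $Q_\alpha$, given by $n$ times the sum of the non-trivial eigenvalues not retained in the $\alpha$-dimensional solution, against a chi-square distribution with $(r-\alpha-1)(c-\alpha-1)$ degrees of freedom.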
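The link between the SCA eigenvalues and the chi-square can be checked numerically. The sketch below is only an illustration: the 3 × 3 table is invented, and `simple_ca` is a hypothetical helper name. It computes the eigenvalues from the SVD of the matrix with elements $p_{ij}/\sqrt{r_i c_j}$ and compares $n$ times the sum of the non-trivial ones with the usual chi-square statistic.

```python
import numpy as np

def simple_ca(N):
    """Eigenvalues of the SCA of a contingency table N (trivial one included)."""
    N = np.asarray(N, dtype=float)
    n = N.sum()
    P = N / n                                   # relative frequencies
    r, c = P.sum(axis=1), P.sum(axis=0)         # row and column margins
    X = P / np.sqrt(np.outer(r, c))             # x_ij = p_ij / sqrt(r_i c_j)
    eig = np.linalg.svd(X, compute_uv=False) ** 2
    return n, r, c, eig

# Invented table, for illustration only.
N = np.array([[20, 10, 5],
              [10, 25, 10],
              [5, 10, 30]])
n, r, c, eig = simple_ca(N)
expected = n * np.outer(r, c)                   # frequencies under independence
chi2 = np.sum((N - expected) ** 2 / expected)
print(eig[0])                                   # trivial eigenvalue
print(n * eig[1:].sum(), chi2)                  # inertia times n vs chi-square
```

The first eigenvalue is the trivial one, and the remaining ones split the chi-square into one component per dimension.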

Multiple Correspondence Analysis
Let us now consider a qualitative data table $X$ with $n$ observations, $Q$ nominal characters, and $J$ levels in total, that is $J = \sum_{i=1}^{Q} l_i$, where $l_i$ is the number of levels of the $i$-th character. It is well known that MCA of such a table is nothing but a generalization of SCA, based on the SCA of either the indicator matrix $Z$, whose rows are the units and whose columns are all the $J$ levels of the $Q$ variables considered, or the so-called Burt's table $B = Z'Z$, which gathers all the contingency tables obtained by cross-tabulating all the variables in $Z$, including the diagonal tables obtained by crossing each variable with itself. The idea is to adopt for both matrices the same optimized decomposition as in SCA, namely the GSVD of either matrix with the metrics given by its margins. Indeed, it is evident that the latter is the square of the former, so that they share the eigenvectors, and the singular values in Burt's case are the squares of those of the indicator matrix case: $\nu_\alpha = \mu_\alpha^2$. This correspondence between the eigenvalues allows an identical interpretation of the resulting factors. Thus, it makes no difference to perform MCA on either matrix. On the other hand, the total inertia of $Z$ is $(J-Q)/Q$, whereas that of $B$ is the sum of the $\nu_\alpha = \mu_\alpha^2$. In both cases, the chi-square metric is adopted, so that the interpretation of the results ought to be done once again in terms of deviations from expectation. This point deserves some special attention, since the deviation refers to all the contingency tables gathered in the Burt's table, including the diagonal ones. The problem is that such diagonal matrices, which "theoretically" would indicate maximum deviation, are in this case just the expected ones, as they cross each character with itself.
As for SCA, given a Burt matrix $B$, MCA may be defined as the weighted least-squares approximation of $B$ by another matrix $H$ of lower rank, minimizing the chi-square-weighted sum of squared residuals (13), which derives from (10). In terms of the subtables, and denoting by $\|\cdot\|$ the norm weighted by the inverse marginal frequencies of the two characters involved, the minimization can be written as
$$\min_H \sum_{i=1}^{Q} \sum_{j=1}^{Q} \|N_{ij} - H_{ij}\|^2, \tag{14}$$
where $H$ is the supermatrix of the $H_{ij}$. In MCA the identification of the true dimension is particularly difficult, even though MCA is the SCA of a particular table, because the chi-square test makes no sense. Indeed, for $B$ a chi-square statistic may again be calculated as if it were a contingency table, and it simplifies to
$$\chi^2_B = \sum_{i \neq j} \chi^2_{ij} + n(J-Q),$$
where $\chi^2_{ij}$ is the chi-square statistic for the off-diagonal subtable $N_{ij} = Z_i'Z_j$ crossing the $i$-th and the $j$-th characters, but without the possibility of performing a test. Unfortunately, neither $Q_\alpha$ nor the analogous statistic computed on the indicator matrix $Z$ is chi-square distributed [5], since $Z$ is composed of 0's and 1's, and the chi-square of $B$ is dramatically inflated by the diagonal tables, which have no real meaning.
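The relation between the two versions of MCA can be verified on a toy example. The following sketch uses invented, deterministic codes for three characters with 3, 3 and 4 levels (the helper names `indicator_matrix` and `ca_inertias` are ours) and checks that the eigenvalues obtained from the Burt's table are the squares of those obtained from the indicator matrix, and that the latter sum to $(J-Q)/Q$ once the trivial one is removed.

```python
import numpy as np

def indicator_matrix(codes, levels):
    """Stack one 0/1 block per character; codes[:, q] holds integer levels."""
    return np.hstack([np.eye(l)[codes[:, q]] for q, l in enumerate(levels)])

def ca_inertias(T):
    """Squared singular values of T scaled by its margins (trivial one included)."""
    P = T / T.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    return np.linalg.svd(P / np.sqrt(np.outer(r, c)), compute_uv=False) ** 2

# Invented data: 40 units, every level of every character occurs.
n, levels = 40, [3, 3, 4]
i = np.arange(n)
codes = np.column_stack([i % 3, (i // 2) % 3, (i // 5) % 4])

Z = indicator_matrix(codes, levels)     # n x J indicator matrix
B = Z.T @ Z                             # J x J Burt's table
Q, J = len(levels), sum(levels)
mu = ca_inertias(Z)                     # eigenvalues from the indicator matrix
nu = ca_inertias(B)                     # eigenvalues from the Burt's table
print(np.allclose(nu, mu ** 2))         # nu_alpha = mu_alpha^2
print(mu[1:].sum(), (J - Q) / Q)        # total inertia of Z
```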
The high number of eigenvalues of MCA, and their correspondingly low values, were criticized by Benzécri himself [8], who suggests re-evaluating those larger than their average. Indeed, applying both SCA and MCA to the same two characters, and partitioning the Burt's table $Z'Z$ into submatrices, it can be shown (ibid.) that the relation $\mu_\alpha = \frac{1 \pm \sqrt{\lambda_\alpha}}{2}$ holds between the $\mu_\alpha$, the eigenvalues of $Z$, and the $\lambda_\alpha$, those of the SCA of the contingency table crossing the two characters. In this case, it is evident that to the eigenvalues $\lambda_\alpha = 0$ of SCA correspond eigenvalues $\mu_\alpha = \frac{1}{2}$ of $Z$ and $\nu_\alpha = \frac{1}{4}$ of $B$, whereas to each of the others correspond two eigenvalues, one larger and the other smaller than $\frac{1}{2}$ and $\frac{1}{4}$ respectively. Generalizing this argument to several characters leads to limiting attention in MCA to the eigenvalues larger than their mean, that is $\frac{1}{Q}$. The argument is discussed in detail by both Benzécri [8] and Greenacre [20,21]. In order to get a measure of the relative importance of each factor, both authors suggest re-evaluating the eigenvalues larger than the mean (equal to $\frac{1}{Q}$) according to the formula
$$\rho(\mu_\alpha) = \Big(\frac{Q}{Q-1}\Big)^2 \Big(\mu_\alpha - \frac{1}{Q}\Big)^2.$$
Benzécri [8] suggests taking as total inertia the sum of the re-evaluated eigenvalues and as percentage of explained inertia the ratio $\rho(\mu_\alpha)/\sum_\alpha \rho(\mu_\alpha)$. This results in a dramatic re-evaluation of the relative importance of the first eigenvalues. By contrast, Greenacre [20] bases his arguments on the uselessness of taking into account the diagonal block matrices and on the utility of limiting attention to the total off-diagonal inertia of the table, that is, the sum of the squared (non-re-evaluated) eigenvalues minus the diagonal inertia, represented as
$$\frac{Q}{Q-1} \Big(\sum_\alpha \mu_\alpha^2 - \frac{J-Q}{Q^2}\Big).$$
Experiments show that Greenacre's re-evaluation always accounts for only a share of the total inertia of the Burt's table, even when all the eigenvalues larger than the mean are taken into consideration. This does not affect the interpretation of the factors, which essentially depends upon the eigenvectors (and thus on the contributions of both levels and characters to them), but it does affect the quality of representation of these elements on the factor subspaces, which varies according to the percentage of inertia attributed to each one. Indeed, this point would deserve specific consideration, in particular in deciding which re-evaluation should be preferred. In the following, we shall call adjusted MCA the one with re-evaluated inertia, and therefore with the coordinates recalculated accordingly.
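The two re-evaluations can be sketched in a few lines. The code below assumes the usual forms of the two adjustments, namely $\rho(\mu) = (Q/(Q-1))^2 (\mu - 1/Q)^2$ for the eigenvalues above $1/Q$ and $\frac{Q}{Q-1}(\sum \mu^2 - (J-Q)/Q^2)$ for the off-diagonal total; the function names and the two SCA eigenvalues are invented for illustration. In the two-character case, where $\mu = (1 \pm \sqrt{\lambda})/2$, both adjustments should give back the SCA eigenvalues and their sum.

```python
import numpy as np

def benzecri_adjusted(mu, Q):
    """Re-evaluated inertias of the eigenvalues of Z larger than 1/Q."""
    mu = np.asarray(mu, dtype=float)
    kept = mu[mu > 1.0 / Q]
    return (Q / (Q - 1.0)) ** 2 * (kept - 1.0 / Q) ** 2

def greenacre_total(mu, Q, J):
    """Off-diagonal total: Burt inertia minus the part due to the diagonal blocks."""
    mu = np.asarray(mu, dtype=float)
    return Q / (Q - 1.0) * (np.sum(mu ** 2) - (J - Q) / Q ** 2)

# Hypothetical SCA eigenvalues of a 3 x 3 table (Q = 2 characters, J = 6 levels).
lam = np.array([0.36, 0.04])
mu = np.concatenate([(1 + np.sqrt(lam)) / 2, (1 - np.sqrt(lam)) / 2])

adj = benzecri_adjusted(mu, Q=2)
total = greenacre_total(mu, Q=2, J=6)
print(adj)                 # recovers the SCA eigenvalues
print(adj / adj.sum())     # Benzécri's percentages
print(adj / total)         # Greenacre's percentages
```

With more than two characters the two percentages differ, since Greenacre's total also includes the part of the off-diagonal inertia not captured by the retained eigenvalues.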
The reduction in the number of dimensions, thanks to both Benzécri's and Greenacre's re-evaluations, does not solve the problem of the true dimension of the table. An answer to this question comes from Ben Ammou & Saporta [5,6]: they suggest estimating the significance of the eigenvalues of MCA according to their distribution. The sum of the squared eigenvalues is
$$\sum_\alpha \mu_\alpha^2 = \frac{J-Q}{Q^2} + \frac{1}{Q^2} \sum_{i \neq j} \varphi^2_{ij},$$
and, if the characters are independent, $n_{..}\, \varphi^2_{ij} \approx \chi^2_{(l_i-1)(l_j-1)}$, with $n_{..} = n$ the grand total; thus the expectation of the variance $S^2_\mu$ of the eigenvalues is
$$E(S^2_\mu) = \sigma^2 = \frac{1}{n\, Q^2\, (J-Q)} \sum_{i \neq j} (l_i-1)(l_j-1).$$
Roughly, one may assume that the interval $\frac{1}{Q} \pm 2\sigma$ should contain about 95% of the eigenvalues. Indeed, since the kurtosis of the set of eigenvalues is lower than that of a normal distribution, the actual proportion is larger than 95%.
Pesquisa Operacional, Vol. 36(1), 2016
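A short numerical sketch of this interval follows. It assumes that the expected variance of the eigenvalues under independence is $\sigma^2 = \sum_{i \neq j} (l_i-1)(l_j-1) / (n Q^2 (J-Q))$, a form reconstructed here because it reproduces the values reported in the application below (three characters with 3, 3 and 4 levels, $n = 2000$); the function name is ours.

```python
import math
from itertools import permutations

def eigenvalue_interval(levels, n, width=2.0):
    """Interval 1/Q +/- width * sigma for the MCA eigenvalues under independence."""
    Q, J = len(levels), sum(levels)
    # Sum over ordered pairs of distinct characters of (l_i - 1)(l_j - 1).
    s = sum((li - 1) * (lj - 1) for li, lj in permutations(levels, 2))
    sigma = math.sqrt(s / (n * Q ** 2 * (J - Q)))   # assumed form of E(S^2)
    return sigma, 1.0 / Q - width * sigma, 1.0 / Q + width * sigma

# Sizes of the application: levels of L, W and T, and 2000 words.
sigma, low, high = eigenvalue_interval([3, 3, 4], 2000)
print(round(sigma, 7), round(low, 5), round(high, 5))
```

Eigenvalues of the indicator matrix falling outside the interval would be regarded as significant.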

Joint Correspondence Analysis
Greenacre [20] criticizes the MCA approach, stating that it is not a natural generalization of SCA, and proposes his Joint Correspondence Analysis (JCA) as a more natural one. With it, he overcomes MCA's useless fitting of the diagonal subtables of $B$, which contribute the term $n(J-Q)$ to the total inertia. Hence, he takes as a more natural measure of total inertia the sum $\sum_{q \neq s} \chi^2_{qs}$ of the inertias of the off-diagonal tables. This suggests an alternative generalization of correspondence analysis which fits only the off-diagonal tables, analogous to factor analysis, where the values on the diagonal of the covariance or correlation matrix are of no direct interest.
Indeed, the proposed redefinition of the total inertia, by removing the diagonal block matrices, would fix an important bias due to the application of the chi-square metric to the Burt's table, since the diagonal structure of the diagonal block matrices represents a very high fictitious deviation from the expected values, which MCA analyzes as if it were a true deviation. On this basis, contrary to current practice, this kind of analysis is not really suitable.
Greenacre [20] proposes his Joint Correspondence Analysis (JCA) as a weighted least-squares approximation aiming at minimizing, instead of (14),
$$\sum_{i \neq j} \|N_{ij} - H_{ij}\|^2, \tag{15}$$
with the corresponding total $\sum_{i \neq j} \chi^2_{ij}$, the sum of the chi-squares of all the off-diagonal tables, which unfortunately may not be checked for significance.
In order to get the solution, he proposes an alternating least-squares algorithm, based on the reformulation (16) of (15), in which each $H_{ij}$ is written as the sum of the independence term $n\, r_i r_j'$ and a residual term $L_{ij}$, with $r_i$ the diagonal of the $i$-th block-diagonal matrix. Calling $H$ and $L$ the supermatrices gathering the $H_{ij}$ and $L_{ij}$ respectively, [20] states the equivalence of the rank-$K$ solution $L$, which satisfies the normal equations in the minimization of the second term of (16), with the rank-$(K+1)$ matrix $H = n\, rr' + L$, which minimizes (15), with $r$ the supervector gathering the $Q$ vectors $r_i$.
The matrix approximation $L$ of rank $K$ is of the form $L = n\, D X D_\beta X' D$, where the $J \times K$ matrix $X$ is normalized as $X'DX = Q\, I$, with $D = \operatorname{diag}(r)$. The matrix $X$ of parameters has rows corresponding to the categories of the variables and columns corresponding to the dimensions of the solution, whose number must be chosen in advance. The diagonal matrix $D_\beta$ contains a scale parameter for each dimension. This form of $L$ and the normalization conditions are chosen to generalize the bivariate case (8). The parameter matrix $X$ is partitioned row-wise according to the variables as $X_1, \dots, X_Q$, where $X_q$ is $J_q \times K$, so that the submatrices of $L$ are $L_{qs} = n\, D_q X_q D_\beta X_s' D_s$. There are also inherent centering constraints on $X$ of the form $X'r = 0$, due to the orthogonality with the dimension defined by the trivial solution. It is evident that the dimension of the solution must be chosen in advance.
Thus [20] proposes the approximate reconstruction of the whole matrix $B - n\, rr'$, namely
$$B - n\, rr' \approx n\, D X D_\beta X' D + C,$$
where $C$ is a block-diagonal matrix with submatrices $C_{qq}$, $q = 1, \dots, Q$, down the diagonal and zeros elsewhere. Here, each $C_{qq}$ is composed of dummy parameters which effectively allow perfect fitting of the submatrices on the diagonal of $B - n\, rr'$, thereby eliminating their influence on the model of interest. The minimization of the corresponding weighted least-squares criterion (17) over the whole matrix is equivalent to minimizing (16), because the latter set of terms in (17), those of the diagonal blocks, can always be made zero by setting $C_{qq} = N_{qq} - n\, r_q r_q' - L_{qq}$. The algorithm proposed by [20] to minimize (17) can be performed iteratively by alternating between the variables in $C$ and those in $X$ and $D_\beta$ as follows:
1. Fix the dimension $K$ of the solution.
2. Initiate the algorithm with an analysis of the full Burt matrix $B$, that is, with $C = 0$.
3. Limiting attention to the first $K$ dimensions, say the first $K$ columns $x_{(1)}, \dots, x_{(K)}$ of $X$, (18) can be rewritten as
$$B - n\, rr' - C \approx n \sum_{k=1}^{K} \beta_k\, D\, x_{(k)} x_{(k)}'\, D, \tag{19}$$
so that, if all quantities except the $\beta_k$ ($k = 1, \dots, K$) are regarded as fixed, the problem reduces to a simple weighted least-squares regression (see [20] for further details).
4. Keeping $X$ and $D_\beta$ fixed, set $C_{qq} = N_{qq} - n\, r_q r_q' - n\, D_q X_q D_\beta X_q' D_q$ for $q = 1, \dots, Q$.
5. Keeping $C$ fixed, minimize with respect to $X$ and $D_\beta$: this is achieved by performing a correspondence analysis on the table $B^* = B - C$, that is, the Burt's matrix with modified submatrices on its diagonal, setting $X$ equal to the first $K$ vectors of optimal row or column parameters and the diagonal of $D_\beta$ equal to the square roots of the first $K$ principal inertias respectively.
6. Iterate the last two steps until convergence.
In the special case $Q = 2$, the problem reduces to fitting the single off-diagonal submatrix $N_{12}$. It is relevant to mention that the initial solution described above is then optimal and provides the simple correspondence analysis of $N = N_{12}$ exactly. It is also worth mentioning the alternative proposal of [22, p. 148] to re-evaluate the inertias along the axes by estimating them through a weighted least-squares regression. This amounts to estimating the $\beta_k$'s in (19) only once, where the $x_k$'s are the eigenvectors issued by MCA, once the dimension $K$ of the solution is decided. We are not convinced by this solution, since the objections concerning the use of the chi-square metric remain unaltered.
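The alternating scheme can be illustrated by a simplified sketch. It is not the algorithm of [20] in full: the regression step for the $\beta_k$ is omitted, the masses of the levels are kept fixed at their initial values, and each iteration simply refits a rank-$K$ correspondence-analysis-type approximation to the Burt's table whose diagonal blocks have been replaced by their current fitted values. The data are invented and the function name `jca_sketch` is ours.

```python
import numpy as np

def jca_sketch(B, levels, K, n_iter=30):
    """Iterated rank-K fit of a Burt's table, ignoring its diagonal blocks."""
    B = np.asarray(B, dtype=float)
    total = B.sum()
    r = B.sum(axis=1) / total                 # masses of the levels, kept fixed
    E = np.outer(r, r)                        # independence term (proportions)
    diag = np.zeros(B.shape, dtype=bool)      # mask of the diagonal blocks
    start = 0
    for l in levels:
        diag[start:start + l, start:start + l] = True
        start += l
    B_star, losses = B.copy(), []
    for _ in range(n_iter):
        # Rank-K least-squares fit of the scaled residuals of the modified table.
        S = (B_star / total - E) / np.sqrt(E)
        vals, vecs = np.linalg.eigh(S)
        keep = np.argsort(-np.abs(vals))[:K]
        S_K = (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T
        H = total * (E + np.sqrt(E) * S_K)
        # Weighted squared residuals on the off-diagonal blocks only.
        losses.append(np.sum(((B - H) ** 2 / (total * E))[~diag]))
        B_star[diag] = H[diag]                # diagonal blocks are fitted exactly
    return H, losses

# Invented data: 40 units, three characters with 3, 3 and 4 levels.
levels = [3, 3, 4]
i = np.arange(40)
codes = np.column_stack([i % 3, (i // 2) % 3, (i // 5) % 4])
Z = np.hstack([np.eye(l)[codes[:, q]] for q, l in enumerate(levels)])
B = Z.T @ Z
H, losses = jca_sketch(B, levels, K=1)
print(losses[0], losses[-1])   # off-diagonal loss: first (MCA-like) fit vs last fit
```

The first iteration corresponds to an analysis of the full Burt's table, so the first recorded loss is that of the plain one-dimensional fit; the later ones should not exceed it, since each step can only lower the criterion.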

The Extended Matching Coefficient
JCA was introduced by [20] as a way to drop the excessive attention given by MCA to the diagonal of the Burt's matrix, which indeed does not deserve any interest, but with a different optimization aim, namely to reconstruct the off-diagonal subtables as well as possible. In this case the chi-square metric is justified, whereas we have already stated that in classical MCA it has no theoretical justification. For this reason, we are not convinced that an inertia re-evaluation of MCA, such as those quoted, may be a solution. By contrast, we find it interesting to explore Gower & Hand's [18] proposal to drop the chi-square metric in favor of a simpler one: the Extended Matching Coefficient. For two units, they define it as the number of common levels across all characters. Therefore, given the indicator matrix $Z$, $ZZ'$ gives a similarity matrix between units to deal with; given its size, it is more convenient to work in the dual space, where the Burt's table $B = Z'Z$ is its counterpart, so that it is sufficient to perform the SVD of the centered Burt's matrix, to be compared with (12). Now, the reconstruction formula (3) holds, but this time the layers may not be interpreted in terms of deviations from expectation, which the method does not take into account, but merely as contributions to the reconstruction, in this case of the frequencies in the Burt's cells. Unfortunately, no stopping rule is available for this method so far; thus we considered both Cattell's [12] and the broken-stick [3] tests.
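A toy example may clarify the coefficient and the duality invoked above. In the sketch below, three invented units are described by three characters; $ZZ'$ counts the common levels of each pair of units, and the non-zero eigenvalues of the column-centered indicator matrix are the same whether they are computed among units or among levels. The centering convention used here (subtracting the column means of $Z$) is our assumption.

```python
import numpy as np

levels = [2, 2, 3]                       # invented: three characters
codes = np.array([[0, 1, 2],             # unit 0
                  [0, 1, 0],             # unit 1
                  [1, 0, 2]])            # unit 2
Z = np.hstack([np.eye(l)[codes[:, q]] for q, l in enumerate(levels)])

emc = Z @ Z.T                            # number of common levels per pair of units
print(emc)

Zc = Z - Z.mean(axis=0)                  # column-centered indicator matrix (assumed)
eig_levels = np.linalg.eigvalsh(Zc.T @ Zc)[::-1]   # J x J side (centered Burt-type matrix)
eig_units = np.linalg.eigvalsh(Zc @ Zc.T)[::-1]    # n x n side (centered similarities)
print(eig_levels[:3], eig_units)         # same non-zero eigenvalues
```

Units 0 and 1 share two levels, units 0 and 2 share one, and units 1 and 2 share none, while each unit matches itself on all three characters.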

AN APPLICATION TO THE KIND OF WORDS
To show in detail how the different analyses behave in practice, we refer first to a data set taken from [32], consisting of 2000 words taken from four different kinds of periodical publications (Childish (TC), Review (TR), Dissemination (TD), and Scientific Summary (TS)), classified according to their grammatical kind (Verb (WV), Noun (WN), and Adjective (WA)) and the number of internal layers (Two (L2), Three (L3), and Four and more layers (L4)), as a measure of word complexity. Table 1 reports the Burt's table that results from crossing the three characters. Table 2 reports the results of the three SCAs of the contingency tables that cross the three characters two by two: the eigenvalues, the corresponding percentages of inertia, and the p-value associated with the chi-square calculated for the corresponding one-dimensional reconstruction, which in this case is identical to Malinvaud's test, since each solution is two-dimensional.
Table 2 - SCA of the three contingency tables crossing the three characters two by two. In the columns: the eigenvalues, the percentage of inertia, and the p-value of the chi-square associated with the factors.
In two cases, the tests do not attribute any real meaning to the second factor, since the p-value is larger than 5%, whereas in the case of the table type of publication - kind of words the second factor is also significant.

In Figure 1 the results of the three SCAs are also represented: it must be pointed out that the vertical position of the items is significant only for the second graphic. Indeed, the inspection of this factor plane shows an arch pattern due to a Guttman effect [9,24]; all the same, the interpretation is straightforward: for the first table, both verbs and nouns seem to have in general fewer syllables than the adjectives; for the second, the use of the words varies with the increasing complexity of the publication: verbs for the childish, nouns for reviews and disseminations, adjectives for scientific summaries; for the third, the more complicated words (three and more syllables) are more frequent in scientific summaries than in all the others. Noteworthy in the second table is the opposite pattern of verbs and adjectives, the former decreasing as the level of the publication rises and the latter increasing; this clearly explains the observed Guttman effect. The position of long words, very elongated on the second axis of both the first and the third analyses (in the latter case together with review), is explained by the shortness of the verbs and its scarce presence in childish publications, but it is not significant. We may ground our comparisons on this interpretation of the data. Running MCA, the pattern of eigenvalues is given in Table 3, which reports the singular values of $Z$, their percentage of their total (which equals $(J-Q)/Q = 2.33$), the cumulative percentage, the eigenvalues of the Burt's matrix, corresponding to the explained inertia, and the cumulative inertia. In addition, the table reports the re-evaluated inertia, with its percentages and cumulative percentages, according to both [8] and [20], limited to the only three singular values larger than $1/Q = 1/3$, with the totals in the following row. In both cases, the first dimension's re-evaluated inertia is by far larger than the others. If we apply Ben Ammou & Saporta's [5,6] estimation of the average singular value distribution under independence, we find that the standard deviation is $\sigma = 0.0159364$, so that the confidence interval at the 95% level is $(0.30146 < \lambda < 0.36521)$. As a consequence, only the first singular value is outside the confidence interval and should be considered significant. As a matter of fact, the second one is very close to the threshold (0.3640): this is consistent with the fact that only one of the two-dimensional tables has a significant second eigenvalue. Figure 2a shows the distribution of all the character levels on the plane spanned by the first two factors of MCA. Indeed, the patterns of all the characters' levels repeat fairly well those found in the three two-way tables: this may be taken as a sign of coherence between the individual SCAs and MCA. It may be observed that the similarity is good even on the second dimension, albeit not significant, whereas on the plane the Guttman effect appears again quite clearly. This may also depend upon the magnitude of the first two eigenvalues, which is sufficiently high to state that the three characters share around 48% and 36% of the first and second factor respectively. Concerning the inertia re-evaluation, it does not affect the interpretation of the single factors but only that of the spaces, since it acts as different multiplicative constants on the factors.
Let us now discuss the results of the JCA carried out on the same example. In the two-dimensional solution the axes' inertias are 0.2488 and 0.0272, with proportions of 90.15% and 9.85% respectively: considering only the first axis as significant, we may observe in Figure 2b a pattern of levels nearly identical to that of MCA. Some differences appear on the second axis, on which the very different positions of verbs and childish publications on the negative side and of long words and summaries on the positive one are noticeable, but, once again, this may not be considered significant. Finally, we got the results of EMC. The seven ($J - Q$) non-zero eigenvalues and the corresponding percentages of explained and cumulative inertia are reported in Table 4. They are also shown in the accompanying figure, where the average (166.66) corresponds to the dotted line. Thus, one may identify two major eigenvalues that summarize 44% of the total inertia, three others around the mean, and two minor ones. Here, Cattell's test would suggest two factors, whereas the broken-stick test considers even the first one random, since the threshold to consider it non-random would be 37.04%. These contradictory results lead us to compare them with the previous ones, therefore considering the first dimension as the "true" one, but also taking into account the second, at least for the graphical representation. In Figure 3 all levels are plotted on the plane spanned by the first two factors: the pattern of levels along the first axis is somewhat similar to those resulting from both MCA and JCA, but not entirely: both L4 and L3 and, even more, WA and WN are exchanged, slightly modifying the interpretation of the results. By contrast, the pattern along the second factor is so different that no agreement seems possible. In both cases, the differences may result from the fact that here rare levels are found close to the centroid and the frequent ones far away, whereas in the chi-square-based methods the opposite occurs. Indeed, this is the case of both L2 and WN, which have the highest marginal values, whereas L4, with the lowest ones, is set toward the center.
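The broken-stick threshold quoted above can be checked with a few lines. The sketch assumes the usual broken-stick model, in which the expected share of the $i$-th of $p$ components is $\frac{1}{p} \sum_{k=i}^{p} \frac{1}{k}$; the function name is ours.

```python
def broken_stick(p):
    """Expected shares of inertia of p components under the broken-stick model."""
    return [sum(1.0 / k for k in range(i, p + 1)) / p for i in range(1, p + 1)]

shares = broken_stick(7)               # seven non-zero eigenvalues, as for EMC here
print(round(100 * shares[0], 2))       # threshold (in %) for the first eigenvalue
```

An eigenvalue is regarded as non-random only if its observed share of inertia exceeds the corresponding expected share.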
Let us now look at the one-dimensional reconstruction resulting from the SCAs of the three individual tables, from MCA, and from Greenacre's JCA, as reported in Table 5. The comparison of the SCA one-dimensional solutions with the original tables shows that the amount of the cumulative absolute residuals is in good agreement with the quality of the solution, as represented by the corresponding chi-square. For this reason, the low quality of the reconstruction of the table crossing kind of words with type of publication depends on the significance of the second dimension of the SCA of this table, which is not taken into account here. At first glance, the large difference between the cumulative absolute residuals of MCA and those of the other solutions is evident, and it is an important sign of the limits of MCA with respect to JCA.
Indeed, the quality of the JCA one-dimensional reconstruction is acceptable in all cases, so that it is possible to observe a realistic synthetic graphical representation of the three tables.
Finally, looking at the first layer obtained by EMC, we find a behavior somewhat comparable with that of the first layer of MCA: much worse for the first table, much better for the second, and roughly equal for the third. This may also depend on the different way in which this method reconstructs the data table, as each layer does not represent a deviation from expectation but rebuilds the table anew. Thus, a better reconstruction must be expected with a larger number of factors. We checked this by comparing the sum of the absolute differences in the partial reconstructions obtained by increasing the dimension of the solution: this could be done for all the 7 factors of both MCA and EMC and only for the first 3, those above the mean, for both adjusted MCA and JCA. The results are given in Table 6, which reports the cumulative absolute residuals of the reconstructions of MCA, both normal and adjusted, JCA, and EMC: both in total and partitioned into those of the diagonal matrices and those of the off-diagonal ones. In the latter case, the residuals are divided by two, so that they equal the sum of the residuals of the individual two-way contingency tables that form either triangular off-diagonal submatrix. The residuals for dimension 0 are the deviations from independence, and the following ones are reported for all the allowed dimensions: $7 = J - Q$ for both MCA and EMC and 3 for both adjusted MCA and JCA, which corresponds to the number of singular values of the Burt's table larger than the average.
The first row reports the deviations from independence, which are not meaningful for EMC. For each method, the first column concerns the reconstruction of the whole Burt's table: the residuals always decrease, as should be expected, although with different slopes; in this respect, EMC performs best by far. The same occurs for the reconstruction of the diagonal tables: once again EMC performs best, albeit not by as much as for the total table. Both MCA and EMC eventually rebuild the Burt's table completely, as expected.
The surprises arise when looking at the reconstruction of the off-diagonal tables: here, the MCA reconstruction is dramatically bad. Indeed, all partial reconstructions but the last one are worse than independence, that is, the estimated frequencies are further from the observed ones than those expected under independence. In other words, the first 5 dimensions, instead of improving the estimation, make it even worse! In this respect, EMC performs much better, as its residuals are constantly decreasing.
If we now look at both adjusted MCA and JCA, we notice that they perform very badly on the diagonal submatrices, even worse than MCA; this ought to be expected, specifically for JCA, in which the diagonal submatrices are intentionally neglected. On the other hand, the reconstruction of the off-diagonal ones is much better, even with respect to EMC, with an excellent performance of JCA.

CONCLUSION
This study started with the aim of understanding to what extent JCA [20] could help in identifying the true dimension of an analysis concerning a set of qualitative data. In this sense, the confidence interval proposed by Ben Ammou & Saporta [5,6] seems a better answer to this problem: in the proposed example, it was in agreement with the mostly one-dimensional solutions of the SCAs applied to the two-way tables.
During the study, the analysis of the data reconstruction showed that MCA is poor at reconstructing the whole data table with respect to EMC, even for the diagonal submatrices, but mostly for the off-diagonal ones, which are even more biased: for most reduced-dimensional solutions, the reconstruction of the two-way off-diagonal tables is worse than the independence table. Indeed, only by redefining the inertia according to the adjusted MCA can a suitable reconstruction be performed, albeit far from optimality, which is much better approached by JCA. It is interesting to note that, for the off-diagonal tables, the adjusted MCA seems to perform better than EMC, a result that should be studied further.
Finally, JCA is, as expected, by far the most suitable method to deal with the off-diagonal tables, that is, with the study of the relations between pairs of characters. As for the interpretation of the factors, JCA is not very different from MCA, whereas the methodological differences of EMC impose a different interpretation that may be studied further. Thus, JCA seems the most promising development of MCA, and its properties deserve further investigation, including the three available programs [21,41,43]: indeed, a direct comparison of these results with those obtained through Greenacre's [22] inertia evaluation through regression may provide further insights into both methods, although our criticism of the use of the chi-square metric for the whole Burt's table remains.
Indeed, no direct comparison of the results is strictly correct, since the methods considered in this work use different metrics, either chi-square or EMC, and/or optimize different criteria, as described. In addition, JCA solutions are not nested. These aspects deserve to be taken into account when interpreting the results obtained by the different methods. Finally, addressing the study in a different framework, such as maximum-likelihood estimation, may be a fruitful alternative.

Figure 1 - Words' type example: The pair of characters levels on the three two-way SCAs: (a) Words vs. Levels; (b) Publications vs. Words; (c) Publications vs. Levels.

Figure 2 - Words' type example: representation of the three-character levels on the plane spanned by the first two factors: (a) MCA; (b) JCA.

Figure 3 - Words' type example: representation of the three-character levels on the plane spanned by the first two factors of the centered PCA on the Burt's table, corresponding to the Extended Matching Coefficient.
table, with $n = n_{..}$ the table grand total, $X = N/n$ the table of relative frequencies $p_{ij} = n_{ij}/n$, $r = (p_{1.}, \ldots, p_{r.})$ the vector of row marginal profile, $c = (p_{.1}, \ldots, p_{.c})$ the vector of column marginal profile, and $D_r = \mathrm{diag}(r)$, $D_c = \mathrm{diag}(c)$ the corresponding diagonal matrices. Let $E = rc^{\top}$ represent the table with the same marginal profiles as $X$ under the independence hypothesis. The SCA of $N$ results from the application of GSVD to the matrix $X$ with the real positive definite matrices represented by the diagonal matrices $D^{-1}$

Table 1 - Burt's table of the words' type example.

Table 3 - Singular values, percentage of the total and cumulative percentage, eigenvalues, and cumulative inertia of the Burt's table of the words' type example. Then, re-evaluated inertia and percentages according to both [8] and [20].

Table 4 - Eigenvalues, percentages of explained and cumulative inertia of the analysis of EMC on the words' type example. On the right, the pattern of the eigenvalues. The dotted line represents their average (166.66).

Table 5 - Original two-way contingency tables of the words' type example and their reconstruction according to the first dimension of SCAs, MCA, adjusted MCA, JCA, and EMC, with the corresponding cumulative absolute residuals.

Table 6 - Absolute residuals of the reduced dimensional reconstructions of both the Burt's table and the two-way off-diagonal ones according to MCA, adjusted MCA and JCA respectively: dimension 0 corresponds to the deviations from independence.