Principal Component Analysis of Molecular Geometries of Cis- and Trans-C2H2X2 with X = F or Cl

PC1 and PC2 score graphs show how calculated molecular geometries depend on characteristics of the molecular wave-functions of cis- and trans-difluoro- and dichloroethylene. PC1 and PC2 separate the results obtained with or without polarization functions and with or without the inclusion of electronic correlation. The quality of the experimental geometries is analyzed by projecting them onto the PC score graphs. Using this procedure, Takeo's geometry obtained from microwave transitions does not compare well with any of the ab initio calculations for cis-C2H2Cl2, whereas Schäfer's geometry obtained from gas electron diffraction spectroscopy is in good agreement with the MP2/cc-pVDZ, MP2/cc-aug-pVDZ and CCD/cc-pVDZ calculations.


Introduction
In the last twenty years, we have devoted considerable attention [1][2][3][4][5][6][7][8][9][10] to cis- and trans-dihaloethylenes in order to gain a better understanding of their electronic and vibrational properties. The cis- and trans-dihaloethylenes (C2H2X2) are interesting isomeric species since they contain the same kind and number of chemical bonds. The major difference between them lies in the relative configurations of these bonds within the molecule. In particular, trans-dihaloethylenes are intriguing molecules from a spectroscopic point of view because, in spite of their high molecular symmetry, the orientations of their in-plane dipole derivatives are not restricted to the principal symmetry axes. [5][6] Furthermore, the electronic structures of cis- and trans-C2H2Cl2 are more similar than those of cis- and trans-C2H2F2 in terms of the intensity parameters of equilibrium charges and charge fluxes. 7,8
In contrast to chemical intuition, both theoretical and experimental studies [11][12][13][14] have revealed that the cis isomer is more stable than its corresponding trans form, as a consequence of the so-called cis effect. 15 Theoretical results have shown that a correct interpretation of this effect depends on the precision of the geometric parameters obtained from molecular orbital calculations. 16 However, calculated geometries can be strongly dependent on the calculation level (HF, MP2, CCD or CCSD(T)) and the basis sets used, whereas experimental [17][18][19][20][21][22] geometries may depend on the experimental technique employed (e.g., gas electron diffraction (GED) or microwave (MW) spectroscopy).
Recently, the molecular geometries of the trans-C2H2X2 (X = F or Cl) species have been obtained from the microwave transitions observed in high-resolution infrared spectroscopy (IR) 23,24 and are somewhat different from those obtained using GED. 22 For example, the values of the C-Cl, C=C and C-H bond lengths for trans-C2H2Cl2 obtained from GED 22 are 1.725(2) Å, 1.332(8) Å and 1.092(26) Å, respectively, whereas the corresponding values using IR 24 are 1.740(3) Å, 1.305(5) Å and 1.078(4) Å.
In order to better understand both the theoretical and experimental changes which occur in the molecular geometries of the cis- and trans-C2H2X2 species, we have performed a multivariate exploratory analysis using Principal Component Analysis (PCA). 25,26 This technique has been successful in analyzing the effects of wave-function modifications on calculated C-H and C-X (X = F or Cl) vibrational frequencies and infrared intensities of the dihaloethylenes. 9,10 For example, all the calculated C-H stretching frequencies can be adequately described by a single principal component, whereas bidimensional principal component graphs are sufficiently accurate for a direct comparison of the results of trial wave-functions with the observed bending vibrational frequencies.

Calculations
A set of ab initio molecular orbital calculations was performed with the Gaussian 92 27 and Dalton 28 programs. The Hartree-Fock (HF) 29 and second-order Møller-Plesset (MP2) 30 calculations were carried out using a 2^4 factorial design, in which two levels of four factors were investigated: (i) the use of the 6-31G or 6-311G basis set; (ii) the presence or absence of diffuse functions; (iii) the presence or absence of polarization functions; (iv) the use, or not, of second-order Møller-Plesset perturbative corrections (MP2) to the HF calculations. 16 The MP2 calculations were performed using the frozen-core electron correlation approach. The remaining calculations were coupled-cluster calculations with double excitations (CCD) and with single and double excitations augmented by a perturbational correction for connected triple excitations [CCSD(T)]. 31
In order to evaluate the importance of electron correlation for inner shells, in particular for the dichloroethylene systems, the CCSD(T) calculations were also performed including additional electron correlation for the Cl 2s2p core electrons. These calculations result in a data matrix X (n × p) composed of p = 5 variables, which correspond to three bond lengths (C-H, C=C and C-X) and two bond angles (CCH and CCX), and n objects, which correspond to the different ab initio calculations for each C2H2X2 species. This matrix X can be viewed as a set of n calculations represented as points in a 5-dimensional space.
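The 2^4 factorial design described above can be enumerated explicitly. The following is a minimal sketch, not the authors' procedure: the function name `factorial_design_labels` and the label format are illustrative assumptions, chosen only so that the generated strings match the method/basis labels quoted in the text (for example, MP2/6-311++G**).

```python
from itertools import product


def factorial_design_labels():
    """Enumerate the 2**4 = 16 levels of theory of the factorial design.

    The label format "method/basis" is an illustrative assumption.
    """
    methods = ("HF", "MP2")        # factor (iv): MP2 correction absent / present
    valence = ("6-31", "6-311")    # factor (i): 6-31G or 6-311G basis set
    diffuse = ("", "++")           # factor (ii): diffuse functions absent / present
    polarization = ("", "**")      # factor (iii): polarization functions absent / present
    return [
        f"{m}/{v}{d}G{p}"
        for m, v, d, p in product(methods, valence, diffuse, polarization)
    ]


labels = factorial_design_labels()
for label in labels:
    print(label)
```

Each label corresponds to one row (object) of the data matrix X; the coupled-cluster and cc-pVXZ calculations mentioned in the text lie outside this design and would be appended as further rows.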
Principal component analysis (PCA) represents a rotation of the original axis system in search of new directions that concentrate as much of the original information as possible, and along which one hopes to find patterns present in the original data set. From a practical point of view, this is obtained through the diagonalization of the covariance matrix, which for column-centered data is proportional to X^t X (where X^t is the transpose of the data matrix X). The eigenvector elements, called loadings, represent the direction cosines, i.e., the contributions of the original axes to the composition of the new axes, called principal components. The eigenvalues represent the amount of variance described by the corresponding eigenvectors. The first eigenvector is the first principal component (PC1) and corresponds to the axis along which the objects have the maximum variance, i.e., along which they are most spread out. The second principal component (PC2) is orthogonal to PC1 and represents the axis of largest residual variance, i.e., the axis that describes the maximum amount of variance not explained by PC1. A projection of the data on these two axes yields a graphical representation of the maximum statistical information that can be compressed into two dimensions, and may help to detect patterns hidden in the original multidimensional data.
In this work, the principal component analyses were carried out on autoscaled data (i.e., each element of a column had the column average subtracted and was scaled to unit column variance) using the chemometrics package Ein*Sight 3.0 32 on a personal microcomputer of the Laboratory of Theoretical and Computational Chemistry of the Departamento de Química Fundamental at the Universidade Federal de Pernambuco (UFPE, Brazil). The molecular orbital calculations were performed on workstations of UFPE and of the San Diego Supercomputer Center (SDSC) of the University of California, San Diego (UCSD).
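The autoscaling and diagonalization steps described above can be sketched with NumPy. This is an illustration under stated assumptions, not a reproduction of the Ein*Sight 3.0 workflow: the function names `autoscale` and `pca` are hypothetical, the sample standard deviation (n - 1 denominator) is assumed for the scaling, and the data matrix is synthetic placeholder data rather than the values of Tables 1 to 4.

```python
import numpy as np


def autoscale(X):
    """Mean-center each column and scale it to unit (sample) variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    return (X - mean) / std, mean, std


def pca(X):
    """PCA of autoscaled data by diagonalizing its covariance matrix."""
    Z, _, _ = autoscale(X)
    cov = Z.T @ Z / (Z.shape[0] - 1)         # equals the correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # reorder by decreasing variance
    eigvals, loadings = eigvals[order], eigvecs[:, order]
    scores = Z @ loadings                    # coordinates on the new axes
    explained = eigvals / eigvals.sum()      # fraction of variance per PC
    return scores, loadings, explained


# Synthetic placeholder data (NOT the paper's tables): 16 "calculations" by
# 5 variables ordered as C-H, C=C, C-X (in Å) and CCH, CCX (in degrees).
rng = np.random.default_rng(0)
centers = np.array([1.08, 1.33, 1.35, 123.0, 122.0])
spreads = np.array([0.01, 0.02, 0.03, 1.0, 0.8])
X = centers + spreads * rng.normal(size=(16, 5))

scores, loadings, explained = pca(X)
print("fraction of variance described by PC1 and PC2:", explained[:2].sum())
```

Autoscaling matters here because bond lengths (about 1 Å) and bond angles (about 120°) are on very different numerical scales; without it, the angles would dominate the variance. The sign of each loading vector is arbitrary, so a score plot from another program may appear mirrored.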

Results and Discussions
In Tables 1 to 4, the optimized geometries for cis- and trans-C2H2X2 (X = F or Cl) are shown together with the experimental values.

cis-C2H2F2
The score graph in Figure 1 for the cis-C2H2F2 species shows that the original 5-dimensional space in Table 1 can be accurately represented by two principal components, which describe 95.5% of the total data variance. The first principal component, PC1, describes 54.7% of the variance. It is dominated by the C-F bond length (+0.57) and the CCH (+0.56) and CCF (-0.57) bond angles (see the equation of PC1 in Figure 1). In this figure, we can observe that PC1 separates the calculations containing polarization functions (at left), which have near-zero or negative scores, from the calculations without polarization functions (at right, positive scores). This arrangement means that the ab initio calculations with polarization functions have the smallest numerical values for the C-F bond length and the CCH bond angle, which have positive coefficients in the PC1 equation, and the highest numerical values for the CCF angle (negative coefficient in PC1). For example, the C-F, CCH and CCF values for the MP2/6-311G calculation are 1.404 Å, 124.3° and 122.1°, respectively, whereas the corresponding values are 1.338 Å, 122.2° and 122.7° for the MP2/6-311G** calculation. On the other hand, PC2 describes 40.8% of the total data variance. It is dominated by the C-H (+0.69) and C=C (+0.66) bond lengths. This second principal component separates the calculations including electron correlation (MP2, CCD and CCSD(T)), which have positive scores, from those at the HF level (negative scores). In this case, ab initio calculations without electronic correlation produce the smallest numerical values for the C-H and C=C bond lengths. For example, these values are 1.071 Å (C-H) and 1.307 Å (C=C) for the HF/6-311++G** calculation, whereas the corresponding values are 1.082 Å and 1.332 Å for the MP2/6-311++G** calculation. For the more sophisticated vibrationally averaged CCSD(T)/cc-pVTZ calculation (i.e., with geometrical corrections due to average vibrations), these values are 1.083 Å (C-H) and 1.336 Å (C=C), respectively, thus very similar to those of the MP2/6-311++G** calculation.
In Figure 1, the experimental geometries were inserted by substituting the autoscaled experimental values into the equations of PC1 and PC2. This procedure is also adopted for the other dihaloethylenes. In Table 1 we can note that the microwave (MW) geometries from Laurie and Pence 17 and from Harmony et al. 18 are very similar, and they appear superimposed in Figure 1. They are very close to those from the MP2/6-31++G**, MP2/6-31G** and CCD/6-31G** calculations. van Schaick's geometry 19 from gas electron diffraction (GED) spectroscopy lies to the right, near the MP2/6-31++G** calculation. On the other hand, the geometry of Carlos et al., 20 also from GED, is very far from this group and practically isolated. It appears at the left and near the top as a consequence of both a large CCF bond angle (123.7°) and a small CCH bond angle (121.6°), corresponding to a negative score in PC1, and also of a large C=C bond length (1.331 Å), corresponding to a positive score in PC2.
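The projection of an experimental geometry onto the score graph can be sketched as follows. This is a minimal illustration that assumes the experimental values are autoscaled with the column means and standard deviations of the calculated set, which is one reading of the procedure described above; the function names `fit_pca` and `project` are hypothetical, and all numbers are synthetic placeholders rather than values from Table 1.

```python
import numpy as np


def fit_pca(X_calc):
    """Return the autoscaling parameters and loadings of the calculated set."""
    mean = X_calc.mean(axis=0)
    std = X_calc.std(axis=0, ddof=1)
    Z = (X_calc - mean) / std
    eigvals, eigvecs = np.linalg.eigh(Z.T @ Z / (len(Z) - 1))
    order = np.argsort(eigvals)[::-1]        # decreasing variance
    return mean, std, eigvecs[:, order]


def project(x, mean, std, loadings, n_components=2):
    """Autoscale a new geometry with the calculated-set parameters and
    substitute it into the PC equations (dot product with the loadings)."""
    z = (np.asarray(x, dtype=float) - mean) / std
    return z @ loadings[:, :n_components]


# Synthetic placeholder data (NOT the paper's tables), variables ordered as
# C-H, C=C, C-X (in Å) and CCH, CCX (in degrees).
rng = np.random.default_rng(1)
X_calc = np.array([1.08, 1.33, 1.35, 123.0, 122.0]) + rng.normal(size=(16, 5)) * [
    0.01, 0.02, 0.03, 1.0, 0.8,
]
x_exp = X_calc.mean(axis=0) + [0.005, -0.01, 0.0, 0.5, -0.3]  # hypothetical "experiment"

mean, std, loadings = fit_pca(X_calc)
pc1, pc2 = project(x_exp, mean, std, loadings)
print("projected scores (PC1, PC2):", pc1, pc2)
```

Because the experimental point does not take part in the diagonalization, it does not influence the principal components; it is only located relative to the pattern defined by the calculations, which is what allows its distance to each level of theory to be read off the plot.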

trans-C2H2F2
The score graph in Figure 2 for the trans-C2H2F2 species reveals that the original 5-dimensional space in Table 2 can be adequately represented by two principal components, which describe 98.1% of the total data variance. The first principal component, PC1, describes 58.9% of the variance, while PC2 describes 39.2%. Analogously to what was found for cis-C2H2F2, PC1 is dominated by the C-F bond length (+0.57) and the CCH (+0.55) and CCF (-0.58) bond angles, whereas PC2 is dominated by the C-H (+0.71) and C=C (+0.66) bond lengths. The coefficients of the PC1 and PC2 equations for trans-C2H2F2 are thus very similar to those found for cis-C2H2F2. As a consequence, here too PC1 separates the calculations with polarization functions from those without them; the latter have positive scores and lie at the right in Figure 2. PC2 separates the calculations with electronic correlation (MP2, CCD and CCSD(T)) from those at the HF level.
Since trans-C2H2F2 is non-polar, its geometry cannot be obtained directly from the MW spectrum. Here three experimental geometries are available: those obtained by van Schaick et al. 19 and Carlos et al. 20 using the GED technique and that from Craig et al. 23 using infrared spectroscopy. The GED geometry from van Schaick et al. is close to those obtained using the CCSD(T)/cc-pVTZ, MP2/6-311G**, MP2/6-31G** and MP2/6-311++G** calculations. Craig's geometry is reasonably close to those from the CCD/6-31G** and MP2/6-31++G** calculations. Again, Carlos' geometry is far from those obtained using the more elaborate calculations. This is mainly due to the large CCH bond angle of 129.

cis-C2H2Cl2
The score graph in Figure 3 for the cis-C2H2Cl2 species shows that the original 5-dimensional space in Table 3 can be adequately represented by two principal components. The first principal component, PC1, describes 58.7% of the total data variance, while PC2 describes 34.3%. Therefore, PC1 and PC2 together describe 93.0% of the total variance. Their coefficients are different from those found for cis- and trans-C2H2F2. PC1 is dominated by the C=C (-0.51) and C-H (-0.50) bond lengths and the CCCl (+0.51) bond angle. It is important to point out that the coefficients of the C-Cl bond length (+0.31) and the CCH bond angle (+0.37) are smaller but cannot be ignored. In contrast to cis- and trans-C2H2F2, here PC1 separates the calculations with and without electronic correlation; the HF calculations have positive scores and are located at the right of PC1. PC2 is mainly dominated by the C-Cl bond length (+0.65) and the CCH bond angle (+0.58).
Four groups can be roughly identified in Figure 3. On opposite sides of the PC1 axis lie (I) the HF calculations without polarization functions, at the right, and (II) the calculations including electronic correlation without polarization functions, near the top; the remaining groups are (III) the HF calculations including polarization functions, located at the bottom of the plot, and (IV) the calculations with electronic correlation including polarization functions, which have negative scores in PC1 and PC2. Two experimental geometries were inserted in Figure 3: the geometry of Takeo et al. 21 obtained from microwave (MW) transitions and that of Schäfer et al. 22 obtained from gas electron diffraction (GED) spectroscopy. The former is isolated from all the theoretical calculations. This is mainly due to the large CCH angle (123.2°) and the low value of the C=C bond length (1.319 Å). Schäfer's geometry is close to the MP2/cc-pVDZ, MP2/cc-aug-pVDZ and CCD/cc-pVDZ calculations. This suggests that correlation-consistent basis sets are needed in electron correlation calculations to adequately reproduce the experimental geometry. Here the MP2/6-nG** and CCD/6-nG** (n = 31 or 311) calculations are not close to the experimental geometries, in contrast to what was observed for cis- and trans-C2H2F2.

trans-C2H2Cl2
Figure 4 shows the score graph for the trans-C2H2Cl2 species. The first two principal components explain 92.9% of the total data variance of the calculated angles and bond lengths. PC1 describes 50.7% of this variance and PC2 describes 42.2%. Analogous to what was found for cis-C2H2Cl2, the latter is relatively close to the MP2/cc-aug-pVDZ, CCSD(T)/cc-pVTZ-vib, CCSD(T)/cc-pVTZ-2s2p-vib, MP2/cc-pVDZ and CCD/cc-pVDZ calculations, i.e., to the more sophisticated calculations. Craig's geometry, in turn, is close to the MP2 calculations with basis sets without polarization functions. In particular, these experimental geometries differ mainly in the values of the bond lengths.

Conclusions
The results of the principal component analysis (PCA) reported here show clearly how calculated molecular geometries depend on the characteristics of the molecular wave-functions of cis- and trans-difluoro- and dichloroethylene. This can be better visualized through bidimensional graphs. In other words, these graphs indicate that the original 5-dimensional space (three bond lengths and two bond angles) is adequately represented by only two principal components (PC1 and PC2) in describing the total data variance. Our results reveal that the presence or absence of polarization functions and the inclusion or omission of electronic correlation in the ab initio calculations are the two main effects explaining the total data variance for the geometry. The inclusion of polarization functions in the basis set decreases both the C-X (X = F or Cl) bond length and the CCH bond angle, whereas the inclusion of electronic correlation (MP2, CCD or CCSD(T)) increases both the C=C and C-H bond lengths. The simultaneous inclusion of these effects is essential to obtain calculated geometries in good agreement with the experimental ones. The choice between the 6-31G and 6-311G basis sets, with or without diffuse functions, seems to have smaller effects. From the bidimensional PCA graphs, it was possible to analyze how (dis)similar the experimental geometries (obtained from different techniques) are to the calculated ones. For example, the microwave geometries compare very well with the CCD/6-31G**, MP2/6-31G** and MP2/6-31++G** calculations, whereas the experimental values obtained from gas electron diffraction (GED) are not close to these calculations in the bidimensional graph of cis-C2H2F2. On the other hand, the GED geometry from van Schaick et al. 19 for trans-C2H2F2 is in good agreement with theoretical calculations in which polarization functions and electronic correlation are used simultaneously, in contrast to the GED geometry from Carlos et al. 20 For the dichloroethylene species, the GED geometries from Schäfer et al. 22 seem to be the best, since they appear close to the higher-level calculations.

Figure 1. Score plot for the optimized geometry of cis-C2H2F2. The experimental points were projected into the score plot.

Figure 2. Score plot for the optimized geometry of trans-C2H2F2. The experimental points were projected into the score plot.
3°. The CCH values obtained from Craig et al. and from van Schaick et al. are 126.3° and 125°, respectively.

Figure 4. Score plot for the optimized geometry of trans-C2H2Cl2. The experimental points were projected into the score plot.

Table 3. Optimized geometry of cis-C2H2Cl2. Bond lengths in Ångström and bond angles in degrees.
a Corrected due to average vibration, see text; b additional electron correlation for Cl 2s2p core electrons; c additional electron correlation for Cl 2s2p core electrons and average vibration corrections; d Ref. 22; e Ref. 21.

Table 2. Optimized geometry of trans-C2H2F2. Bond lengths in Ångström and bond angles in degrees.
a Corrected due to average vibration, see text; b Ref. 23; c Ref. 19; d Ref. 20.

Table 1. Optimized geometry of cis-C2H2F2. Bond lengths in Ångström and bond angles in degrees.
a Corrected due to average vibration, see text; b Ref. 17; c Ref. 18; d Ref. 19; e Ref. 20.
Figure 3. Score plot for the optimized geometry of cis-C2H2Cl2. The experimental points were projected into the score plot.
The geometry of Craig et al. 24 obtained from infrared (IR) spectroscopy and Schäfer et al.'s geometry 22 obtained from gas electron diffraction (GED) spectroscopy.
The coefficients of the PC1 and PC2 equations are very similar for cis- and trans-C2H2F2 and trans-C2H2Cl2, in contrast to what occurs for cis-C2H2Cl2.