Using Artificial Intelligence Methods to Design New Conducting Polymers

In recent years the possibility of creating new conducting polymers by exploring the concept of copolymerization (different structural monomeric units) has attracted much attention from both experimental and theoretical points of view. Due to the rich reactivity of carbon, an almost infinite number of new structures is possible, and trial and error has been the rule. In this work we use a methodology capable of generating new structures with pre-specified properties. It combines the negative factor counting (NFC) technique with artificial intelligence methods (genetic algorithms, GAs). We present the results of a case study for poly(phenylenesulfide phenyleneamine) (PPSA), a copolymer formed by combining the homopolymers polyaniline (PANI) and polyphenylenesulfide (PPS). The methodology was successfully applied to the problem of obtaining binary up to quinternary disordered polymeric alloys with a pre-specified gap value or exhibiting metallic properties. It is completely general and can in principle be adapted to the design of new classes of materials with pre-specified properties.


Introduction
Conducting polymers constitute a new class of electronic materials with unusual properties, large potential for technological applications, and new physical phenomena 1.
The main focus of early research on conducting polymers was electrical conductivity. Once the basic structural features required to obtain highly conductive materials were identified, the focus shifted to the development of highly conductive polymers with good environmental stability and more acceptable processing attributes. One way to achieve this is to explore the concept of copolymerization 2. In general, copolymers show electronic and mechanical properties intermediate between those of their related homopolymers.
Due to the rich reactivity of carbon, an almost infinite number of new structures is possible. This makes a systematic search for new structures almost impossible, and a trial and error approach has been the rule. In this work we discuss a methodology 3,4 capable of generating automatic solutions for ordered and disordered polymeric alloys with pre-specified properties. It combines the negative factor counting (NFC) technique 5,6 with genetic algorithms (GAs) 7. The NFC technique allows us to obtain the eigenvalues of very large matrices without direct diagonalization. GAs originated from the studies conducted by John Holland in the 1970s 8. The metaphor underlying GAs is that of natural evolution. GAs follow these ideas in a very simple way and allow us to use the computer to evolve automatic solutions over time. This methodology was originally developed by us to study polyanilines 3,4, and it was the first time that the NFC technique coupled with artificial intelligence methods (GAs) was used in materials science. To our knowledge no other approach using electronic parameters and combinatorial/artificial intelligence methods has been applied to conducting polymers.
In this work we investigated the copolymer poly(phenylenesulfide phenyleneamine) (PPSA), an alternating copolymer formed by combination of the polymers polyaniline (PANI) 9,10 and polyphenylenesulfide (PPS) 11 (see Fig. 1). PPSA 12 not only combines the structural features of PPS and PANI but also presents higher solubility together with good chemical stability. Owing to its thermal and chemical stability, electron-rich character, and electrical conductivity, possible applications of PPSA include hole-injection materials for multilayer LED devices 13 and corrosion inhibition 14.
The kind of problem we are interested in solving is to find the optimum relative concentrations for binary, ternary, up to quinternary polymeric alloys presenting some pre-specified properties. For instance, consider a ternary disordered alloy formed from three of the types of structural units shown in Fig. 1: what are the values of x, y, and z that would produce a gapless structure with the largest possible electronic delocalization (in principle the most conductive structures), or a structure with a pre-specified gap? Considering that typical chains can contain hundreds of rings, a systematic analysis of each possible structure is computationally very expensive or even impossible. In principle GAs can be used to automatically find good solutions through intelligent searches in the configuration space with selective sampling.

Methodology
In Fig. 2 we show a generic flow chart of a continuous parameter µGA 15,16. The GA starts with a group of initial (in general randomly generated) solutions. Each solution can be represented as a bit string (a sequence of zeros and ones) of specified length. By analogy with the terminology of population genetics, each string is called a "chromosome", "genes" are fragments of a chromosome, and a "population" is a set of individuals (chromosomes) used in a GA iteration ("generation").
The optimum population size depends on the specific problem under analysis 7,17. In our present case, as the fitness evaluation function (see discussion below) is computationally expensive, we have opted to use a population of only 5 chromosomes (referred to in the literature as a micro GA (µGA) 18,19).
Once the first population is generated, the "fitness" (how "good" the proposed solution is) of each individual (chromosome) is calculated through an evaluation function. The next populations ("offspring" generated from "parents") are composed in the following way (see Fig. 2):
(1) The best individual (in terms of fitness) from the previous population is always included (the "elitism" option).
(2) The other 4 individuals are generated by applying crossover operators to individuals from the previous generation, selected with probability proportional to their fitness. The operator we use here is uniform crossover 15, i.e. the "genes" are randomly copied from the first and second parent chromosomes. Uniform crossover is one way to achieve genetic variability. In some ways it substitutes for mutation in µGAs.
(3) As the mutation operator generally introduces individuals with poor fitness (especially in small populations), it is in general not used in µGAs. For this reason, and because the number of chromosomes is small, the population can rapidly become homogeneous. When this happens the next population is obtained by keeping the best individual (elitism) and adding four other randomly generated ones.
(4) The above steps are repeated until the convergence criterion is reached.
To describe the electronic structure of our polymeric chains we use an LCAO (Linear Combination of Atomic Orbitals) 20 approximation with a Hückel (tight-binding) Hamiltonian 20 (Eq. 1). The IPN (Inverse Participation Numbers) 21 are a measure of the degree of delocalization of a molecular orbital. They are obtained from the LCAO expansion coefficients c_{jr} (Eq. 2). The IPN can assume values from zero (maximum delocalization) to one (localized over only one orbital).
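The µGA loop in steps (1) to (4) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: chromosomes are plain bit lists, "homogeneous" is taken to mean that all chromosomes are identical, convergence is a fixed number of generations without improvement (`patience`), and the toy fitness `onemax` stands in for the expensive electronic-structure evaluation. All function and parameter names are hypothetical.

```python
import random


def uniform_crossover(parent1, parent2, rng):
    """Copy each gene at random from the first or the second parent."""
    return [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(parent1, parent2)]


def select_parent(population, fitnesses, rng):
    """Fitness-proportional selection (assumes non-negative fitness values)."""
    if sum(fitnesses) <= 0:
        return rng.choice(population)
    return rng.choices(population, weights=fitnesses, k=1)[0]


def micro_ga(fitness, n_genes, pop_size=5, max_generations=200, patience=30, seed=0):
    """Micro GA with elitism, uniform crossover, no mutation, and random restarts."""
    rng = random.Random(seed)

    def random_chromosome():
        return [rng.randint(0, 1) for _ in range(n_genes)]

    population = [random_chromosome() for _ in range(pop_size)]
    best, best_fit, stale = None, float("-inf"), 0
    history = []  # best fitness found so far, one entry per generation
    for _ in range(max_generations):
        fitnesses = [fitness(c) for c in population]
        top = max(range(pop_size), key=fitnesses.__getitem__)
        if fitnesses[top] > best_fit:
            best, best_fit, stale = list(population[top]), fitnesses[top], 0
        else:
            stale += 1
        history.append(best_fit)
        if stale >= patience:
            break  # step (4): assumed convergence criterion
        if all(c == population[0] for c in population):
            # step (3): homogeneous population, refill with random individuals
            others = [random_chromosome() for _ in range(pop_size - 1)]
        else:
            # step (2): offspring from fitness-proportional parents
            others = [
                uniform_crossover(
                    select_parent(population, fitnesses, rng),
                    select_parent(population, fitnesses, rng),
                    rng,
                )
                for _ in range(pop_size - 1)
            ]
        population = [list(best)] + others  # step (1): elitism
    return best, best_fit, history


def onemax(chromosome):
    """Toy fitness (fraction of ones); a stand-in for the real evaluation."""
    return sum(chromosome) / len(chromosome)


best, best_fit, history = micro_ga(onemax, n_genes=16, seed=1)
```

Because the best individual is always carried over, the recorded best fitness can never decrease from one generation to the next, which is the property that makes the small population workable.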
As the structures are long polymeric chains (200 rings, 100 monomeric units), obtaining the Hückel eigenvalues and eigenvectors by direct diagonalization is computationally very expensive. One alternative is to use the NFC 5,6 technique. The basic idea of NFC is to obtain the eigenvalues of large matrices without direct diagonalization. Once the eigenvalues are obtained, the eigenvectors of interest can be obtained (one by one) with the inverse iteration method (IIM) 22, and consequently the IPNs through Eq. 2. Through Eq. 3 or Eqs. 4.1 and 4.2 we can combine the GA with NFC/IIM to obtain a methodology capable of generating automatic solutions to the complex problem of disordered polymeric chains with pre-specified properties. This methodology was used successfully in the study of polyaniline alloys 4. Our polymeric chains are random sequences of monomeric units (A, B, C, D, E and F, Fig. 1) satisfying an imposed relative concentration. The chains are generated in the following way:
(1) From the GA, a population of chromosomes is generated defining the relative concentrations of A, B, C, D, E, and F units (see Fig. 1).
(2) The chains are then generated from a random function with post-sorting corrections 23 in order to provide the exact relative unit percentages.
(3) From these structures the Hückel matrices are constructed.
(4) From NFC/IIM the eigenvalues and eigenvectors are obtained.
(5) The IPN are then calculated.
(6) The fitness function f(x) is then evaluated.
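Steps (2) and (5) of the list above can be illustrated with a short Python sketch. Two assumptions are made and should be read as such: the exact composition is enforced here by shuffling an exact multiset of units, a simpler technique swapped in for the post-sorting correction of ref. 23, and the IPN is computed with the standard textbook definition, the sum of c^4 divided by the square of the sum of c^2, which equals one for an orbital on a single site and 1/N (approaching zero for long chains) for a uniformly spread orbital. The names `build_chain` and `ipn` are hypothetical.

```python
import random


def build_chain(percentages, n_units=100, seed=0):
    """Random sequence of unit labels with exactly the requested composition.

    percentages maps a unit label (e.g. "A") to its percentage; values must sum to 100.
    """
    if sum(percentages.values()) != 100:
        raise ValueError("percentages must sum to 100")
    chain = []
    for label, pct in percentages.items():
        if (pct * n_units) % 100 != 0:
            raise ValueError("composition is not representable with n_units units")
        chain.extend([label] * (pct * n_units // 100))
    random.Random(seed).shuffle(chain)  # disorder with exact concentrations
    return chain


def ipn(coefficients):
    """Inverse participation number of one orbital (standard definition, assumed)."""
    s2 = sum(c * c for c in coefficients)
    s4 = sum(c ** 4 for c in coefficients)
    return s4 / (s2 * s2)


# Composition taken from the ternary solution quoted in the text (A 4, C 52, F 44).
chain = build_chain({"A": 4, "C": 52, "F": 44}, n_units=100, seed=3)
localized = ipn([1.0, 0.0, 0.0, 0.0])    # orbital on a single site
delocalized = ipn([0.5, 0.5, 0.5, 0.5])  # orbital spread evenly over 4 sites
```

Dividing by the squared norm makes the measure independent of how the eigenvector is normalized, which is convenient when eigenvectors come from an iterative method.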
The process is repeated until convergence is attained, i.e., until no variation of the best fitness is observed over a specified number of generations.
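The idea behind NFC, locating eigenvalues without diagonalizing the matrix, can be illustrated for the simplest case of a symmetric tridiagonal matrix. The sketch below is an assumption-laden illustration rather than the code used in this work: the Hückel matrices of ring-containing chains need not be tridiagonal, and the general NFC technique handles that case, which is not shown here. For a tridiagonal matrix, the number of negative terms in a simple recurrence equals the number of eigenvalues below a trial energy, so any eigenvalue can be bracketed by bisection. The function names are hypothetical.

```python
import math


def count_below(diag, off, energy, tiny=1e-30):
    """Number of eigenvalues below `energy` for a symmetric tridiagonal matrix.

    diag: diagonal elements (length n); off: off-diagonal elements (length n - 1).
    """
    count = 0
    eps = 0.0
    for i, d in enumerate(diag):
        eps = d - energy if i == 0 else d - energy - off[i - 1] ** 2 / eps
        if eps == 0.0:
            eps = tiny  # avoid division by zero in the next step
        if eps < 0.0:
            count += 1
    return count


def kth_eigenvalue(diag, off, k, tol=1e-10):
    """k-th smallest eigenvalue (k = 1..n) by bisection on count_below."""
    n = len(diag)
    radius = [
        (abs(off[i - 1]) if i > 0 else 0.0) + (abs(off[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    lo = min(d - r for d, r in zip(diag, radius))  # Gershgorin bounds
    hi = max(d + r for d, r in zip(diag, radius))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_below(diag, off, mid) >= k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Uniform 10-site chain, site energy 0 and hopping -1 (energies in units of |beta|).
n = 10
diag, off = [0.0] * n, [-1.0] * (n - 1)
homo = kth_eigenvalue(diag, off, n // 2)      # highest occupied level at half filling
lumo = kth_eigenvalue(diag, off, n // 2 + 1)  # lowest unoccupied level
gap = lumo - homo
```

Each call to `count_below` costs a single pass over the chain, so levels near the Fermi energy can be found without ever forming or diagonalizing the full matrix.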
One major point in GA procedures is how to implement the fitness function. When we are searching for structures presenting zero gap and extended electronic states at the Fermi level, we can define f(x) as a function of the gap and IPN values with the same statistical weight. As they have different ranges (gap varying from 0.0 to 1.36β, and IPN from 0.0 to 1.0), we need to use a scale constant (0.735) to satisfy this condition (Eq. 3). In Eqs. 4.1 and 4.2, ϕ is the gap value chosen arbitrarily to define a structure with a pre-specified gap.
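The role of the scale constant can be checked numerically: 0.735 is approximately 1/1.36, so multiplying the gap by it maps the gap range onto the same 0 to 1 range as the IPN. The sketch below shows one plausible way to write both kinds of fitness function. The functional forms are assumptions made for illustration only, not the expressions of Eqs. 3, 4.1 and 4.2: `fitness_metallic` averages the scaled gap and the IPN, and `fitness_target_gap` is a linear ramp rising on [a, ϕ] and falling on ]ϕ, b]. Only the constants 0.735 and 1.36 come from the text.

```python
GAP_MAX = 1.36  # largest gap in units of beta (from the text)
SCALE = 0.735   # scale constant from the text, approximately 1 / 1.36


def fitness_metallic(gap, ipn):
    """Hypothetical form: equal weight for scaled gap and IPN; higher is better."""
    return 1.0 - 0.5 * (SCALE * gap + ipn)


def fitness_target_gap(gap, phi, a=0.0, b=GAP_MAX):
    """Hypothetical two-branch linear form peaking at gap == phi."""
    if not a <= gap <= b:
        return 0.0
    if gap <= phi:
        # rising branch on the interval [a, phi]
        return (gap - a) / (phi - a) if phi > a else 1.0
    # falling branch on the interval ]phi, b]
    return (b - gap) / (b - phi)


ideal = fitness_metallic(0.0, 0.0)          # zero gap, fully delocalized orbital
on_target = fitness_target_gap(0.95, 0.95)  # gap equal to the target value
off_target = fitness_target_gap(0.47, 0.95)
```

With either form, a larger value means a better individual, which is what fitness-proportional selection in the GA requires.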

Results and Discussions
In order to test the reliability and effectiveness of our GA methodology we investigated the simple case of a binary alloy, where a systematic search is feasible. We carried out a detailed analysis (solving the Hückel equations for each configuration), varying x (the alloy concentration parameter) in steps of 1%. For each concentration x we determined the corresponding gap and IPN values. For the binary alloys A_{100-x}B_{x}, A_{100-x}D_{x}, A_{100-x}E_{x}, C_{100-x}B_{x}, C_{100-x}D_{x} and C_{100-x}E_{x} we optimized the concentration x that would produce some specific gap values, and for the binary alloys A_{100-x}F_{x} and C_{100-x}F_{x} (these structures produce zero gap) we searched for structures with simultaneously zero gap and the most delocalized HOMO possible. The results are shown in Table 1. It is possible to obtain more than one optimal solution, i.e., to have degenerate states (see Table 1). This depends on the way the fitness function is defined. When the fitness includes only one parameter this is very likely to occur, as in the case of the binary alloys A_{100-x}B_{x} and A_{100-x}D_{x} (see Table 1). For a fitness function with many dependent parameters this would be very improbable.
In summary, the GA produces exactly the same results as those obtained by the systematic search.

Ternary alloy
We then proceeded to simulations of the ternary alloys A_{x}B_{y}C_{100-x-y}, A_{x}C_{y}D_{100-x-y}, A_{x}C_{y}E_{100-x-y} and A_{x}C_{y}F_{100-x-y}, where a systematic search is impracticable. The GA simulations look for automatic solutions determining the relative concentrations x and y that produce polymeric structures with zero gap and the lowest IPN values, or with a specific gap value (in this case 0.95β).
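To give a sense of how the search space grows with the number of components, the number of distinct compositions on a concentration grid can be counted directly. The sketch below assumes the same 1% step used in the binary scan; applying that grid to ternary and quinternary alloys is an assumption made here for illustration, and `n_compositions` is a hypothetical helper. Each composition also corresponds to many possible random sequences of units, which this count does not include.

```python
from math import comb


def n_compositions(n_components, step_percent=1):
    """Number of ways to split 100% among n_components on a regular grid."""
    parts = 100 // step_percent
    # Stars and bars: non-negative integer solutions of x1 + ... + xn = parts.
    return comb(parts + n_components - 1, n_components - 1)


binary = n_compositions(2)       # 101
ternary = n_compositions(3)      # 5151
quinternary = n_compositions(5)  # 4598126
```

Since every composition requires building and solving a 200-ring chain, the cost of an exhaustive scan rises steeply with the number of components.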
Table 2 shows the GA results. Although the GA could not obtain solutions with the chosen pre-specified gap value (green region) for ternary alloys, close solutions were obtained. These results do not mean that the GA failed, but only that for alloys of PPS with PANI it is not possible to obtain this gap value.
For the optimization of relative concentrations that produce structures with zero gap and lowest IPN, the GA found the solution A_{4}C_{52}F_{44} (see Table 2). Fig. 3 shows the calculated electronic structure for the solution proposed by the GA. As can be seen from the density of states in Fig. 3a, the Fermi level is inside the valence band (p-type material). Fig. 3b shows the squared coefficients of the HOMO (Highest Occupied Molecular Orbital). The HOMO is relatively delocalized, extending over around 60% of the polymeric chain. The GA found a satisfactory solution for the problem (a copolymer formed by combination of PPS and PANI). The copolymer presents lower conductivity than PPS and PANI 12.

Quinternary alloy
When optimizing the relative concentrations x, y, z and w for a quinternary alloy, the following condition must be fulfilled: x + y + z + w ≤ 100. As the chromosomes, which represent the relative concentrations of the different monomeric units of the polymeric chains, are generated by the GA through operators (crossover, elitism or even randomly; see the methodology), this condition is sometimes not satisfied. When this happens the respective chromosome receives the worst fitness value. In order to provide a significant number of chromosomes that satisfy the above condition, in this section we considered a population of 7 individuals, and we carried out GA simulations for up to 500 generations or until convergence was reached. The results obtained from the GA for the optimization of relative concentrations for structures that present a specific gap value (green region, ~0.95β) were: (a) alloy A_{x}B_{y}C_{z}D_{w}E_{100-x-y-z-w}, closest solution A_{67}B_{18}C_{7}D_{8} after 237 generations with a gap of 0.47β; (b) alloy A_{x}B_{y}C_{z}D_{w}F_{100-x-y-z-w} (F units replacing E ones), closest solution A_{11}B_{61}C_{11}D_{17} after 399 generations with a gap of 0.45β. In this case almost twice the number of generations was needed to achieve convergence. This is due to the presence of F units, which tend to induce zero gap.
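The handling of the constraint x + y + z + w ≤ 100 can be sketched as a thin wrapper around the fitness evaluation. This is an illustrative sketch under stated assumptions: the chromosome is taken to be already decoded into four integer percentages, the worst fitness value is taken to be 0.0, and `decode`, `constrained_fitness` and the `evaluate` callback are hypothetical names.

```python
WORST_FITNESS = 0.0  # assumed worst value; depends on how fitness is defined


def decode(chromosome):
    """Map (x, y, z, w) to a composition A, B, C, D, E, or None if infeasible."""
    x, y, z, w = chromosome
    if min(x, y, z, w) < 0 or x + y + z + w > 100:
        return None
    return {"A": x, "B": y, "C": z, "D": w, "E": 100 - x - y - z - w}


def constrained_fitness(chromosome, evaluate):
    """Give infeasible chromosomes the worst fitness; evaluate the others."""
    composition = decode(chromosome)
    if composition is None:
        return WORST_FITNESS
    return evaluate(composition)


# The closest solution (a) quoted in the text uses up the full 100%, leaving no E units.
solution_a = decode((67, 18, 7, 8))
infeasible = constrained_fitness((50, 30, 20, 10), evaluate=lambda comp: 1.0)
```

Penalizing rather than repairing infeasible chromosomes keeps the operators unchanged, at the price of wasted individuals, which is consistent with the larger population used in this section.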
Fig. 4 shows the calculated electronic structure for two solutions proposed by the GA: the copolymer A_{7}B_{5}C_{1}D_{30}F_{57}, found in generation 74, and A_{1}D_{45}F_{54}, found in generation 328. These two copolymers were obtained in the same simulation, which optimized the relative concentrations of the A_{x}B_{y}C_{z}D_{w}F_{100-x-y-z-w} alloy for simultaneously zero gap and lowest IPN. As can be seen in Fig. 4a, for the A_{7}B_{5}C_{1}D_{30}F_{57} copolymer the Fermi level is inside the valence band (p-type material), and the HOMO is delocalized over around 70% of the chain (Fig. 4c). These results are consistent with experimental results, as PPSA (a copolymer formed by combination of PPS and PANI) is less conductive than the homopolymers PPS and PANI 12. For the A_{1}D_{45}F_{54} copolymer the solution found satisfies the conditions for the metallic regime. The Fermi level is inside the valence band (p-type metal, Fig. 4b) and the HOMO is delocalized over the whole chain (Fig. 4d). We would like to stress that these are extended (conducting) states in disordered one-dimensional polymeric chains 24. The existence of these states contrasts with Anderson's localization theorem 25, but they are common features of conducting polymers. The precise origin of these states has been explained by Phillips and collaborators 26,27 with the Random Dimer Model (RDM). One important aspect of conducting polymers is that the intermolecular hopping of the carriers plays an important role in determining the conductivity values. In this sense it is fundamental to know whether the extended states in isolated chains (defining the metallic regime) would survive when the interaction among chains (a more realistic description of the material) is explicitly taken into account. It has been demonstrated that this in fact occurs 28 and that the general conclusions derived from isolated chains can be used to extrapolate the macroscopic behavior.

Conclusions
The results obtained show that the methodology presented here is able to find automatic solutions to the problem of designing disordered polymeric chains with pre-specified properties (in the present case metallic ones or a specific gap value). Considering the large number of possible polymeric units of conducting polymers (an almost infinite number of possible combinations exists to generate new structures), the present methodology can be a very effective tool to guide experimentalists in the search for new conducting materials with specific properties. In the present study we did not impose any chemical constraints on our GA. In an actual problem, chemical or physical constraints on the structures can be easily implemented. This can be done simply by adding new terms to the evaluation function that give undesired structures a low fitness, or through restraints (discarding) in the generation of the polymeric structures. The methodology is completely general and can be used in the design of new classes of materials, polymeric or not, disordered or not.

Figure 1. Possible structural monomeric units for copolymers of PPS with PANI: PPS (A); PANI (B, leucoemeraldine form; D, pernigraniline form; E and F, polaronic and bipolaronic structural defects); and PPSA (C).

Figure 2. Flow chart of the micro genetic algorithm used in this work.

To search for structures that present a pre-specified gap value, we can define f(x) as a piecewise linear function of the gap, with one branch on the interval [a, ϕ] (Eq. 4.1) and another on the interval ]ϕ, b] (Eq. 4.2).

Figure 3. a) Density of electronic states for the proposed GA solution. The arrow indicates the position of the Fermi level; b) Square of the corresponding HOMO expansion coefficients. The lower part of Fig. 3b indicates the actual chain composition in terms of A, C and F units.

Figure 4. a) and b) Density of electronic states for the proposed GA solutions (A_{7}B_{5}C_{1}D_{30}F_{57} in generation 74 and A_{1}D_{45}F_{54} in generation 328). The arrow indicates the position of the Fermi level; c) and d) Square of the corresponding HOMO expansion coefficients. The lower part of Figs. 4c and 4d indicates the actual chain composition in terms of units A, B, C, D and F.

Table 1. Results obtained for systematic search and GA. β = 2.5 eV.