Sequential Allocation to Balance Prognostic Factors in a Psychiatric Clinical Trial

OBJECTIVE: This paper aims to describe and discuss a minimization procedure specifically designed for a clinical trial that evaluates treatment efficacy for OCD patients. METHOD: Aitchison’s compositional distance was used to calculate vectors for each possibility of allocation in a covariate adaptive method. Two different procedures were designed to allocate patients in small blocks or sequentially one-by-one. RESULTS: We present partial results of this allocation procedure as well as simulated data. In the clinical trial for which this procedure was developed, successful balancing between treatment arms was achieved. Separately, in an exploratory analysis, we found that if the arrival order of patients was altered, most patients were allocated to a different treatment arm than their original assignment. CONCLUSION: Our results show that the random arrival order of patients determines different assignments and therefore maintains the unpredictability of the allocation method. We conclude that our proposed procedure allows for the use of a large number of prognostic factors in a given allocation decision. Our method seems adequate for the design of the psychiatric trials used as models. Trial registrations are available at clinicaltrials.gov NCT00466609 and NCT00680602.


INTRODUCTION
For certain specific psychiatric disorders, large trials are fairly rare. Obsessive-compulsive disorder (OCD), for instance, has only been studied in small trials. First-line treatments such as clomipramine and selective serotonin reuptake inhibitors are typically studied in trials with no more than two hundred patients. 1 For second-line treatments such as pharmacological augmentation strategies, the situation is worse, albeit understandably so. 2,3 To our knowledge, no trials have investigated these strategies using more than one hundred patients. Indeed, most such studies have presented final sample sizes of no more than 15 patients per arm. 3 This is in stark contrast to studies of common clinical diseases such as hypertension and diabetes. Therefore, certain specific aspects of trial design need to be taken into consideration when implementing clinical trials involving psychiatric patients, especially those with OCD.
In small trials, such as those typically used to study psychiatric disorders, an imbalance in prognostic factors between treatment arms can affect the interpretation of the results. One of the first methods created to address this problem was stratified allocation of individuals. 4 The use of this method also reduces the need to adjust statistical data analyses for covariates that serve as prognostic factors. The limitation of stratified allocation is that it can only be applied when a small number of covariates are involved. Otherwise, the sample has to be divided into a large number of strata with only a very few patients in each. 5 In order to overcome this limitation, it is necessary to devise alternative methods of minimizing imbalance among treatment arms.
Pocock and Simon 6 devised a general procedure for treatment assignment that minimizes imbalance among individual prognostic factors. Independently, Taves 7 also developed a minimization procedure that will not be discussed in this article, as it is encompassed by the Pocock and Simon method.
Since 1975, various other authors have developed procedures to minimize the risk of imbalance between arms, as reviewed elsewhere. 8,9 Each method has specific limitations and complexities. For example, the optimal allocation technique allows the use of continuous variables in their non-categorical format as covariates. 10 However, its complexity and the associated difficulty have limited its practical implementation. No allocation procedure is ideal; accordingly, novel methods are still required in specific situations.
In response to the need for specific minimization procedures in clinical trials involving individuals with OCD, we developed an allocation method that uses the Aitchison distance, 11-13 which is a measure for compositional data (data that contain quantitative descriptions of the parts of some whole). The Aitchison distance was chosen because all of the prognostic factors used are best classified as categorical data, whose relative frequencies are compositional in nature.
We present allocation results for a partial sample collected during the first year of the study for which this procedure was developed, as well as results for simulated samples in which the order of patient inclusion was varied.

METHOD
The trials reported in this manuscript received prior approval from our local ethics committee.

Allocation
For clarity, let us first consider the simplest design. Consider a clinical trial in which patients are enrolled sequentially, according to the order in which they commence treatment at the clinic. Each patient is to be assigned to one, and only one, of two alternative treatments. Imagine that a new patient arrives after the study has already included a considerable number of patients in each of the two arms, $n_1$ and $n_2$. In addition, consider that age, denoted $a$, is a factor for which we think adjustment should be made. Possible ages are divided into three categories: $a_1$ if $a < 30$; $a_2$ if $30 \le a \le 45$; and $a_3$ if $a > 45$. Consider now the following notation:
1. Represent the sample absolute frequencies of $n_1$ and $n_2$, respectively, as $(n_{11}, n_{12}, n_{13})$ and $(n_{21}, n_{22}, n_{23})$; i.e., for $i = 1$ or $2$ and $j = 1$, $2$, or $3$, $n_{ij}$ is the number of patients of age group $a_j$ in arm $i$.
2. Represent the compositional vectors (relative frequency vectors), respectively, as $A_1 = (a_{11}, a_{12}, a_{13})$ and $A_2 = (a_{21}, a_{22}, a_{23})$. Thus, $a_{ij} = n_{ij}/n_i$ is the relative frequency of patients of age group $a_j$ in arm $i$, where $n_i$ also denotes the number of patients in arm $i$.
Note that we make our decision based on the values of the distances between the compositional vectors. We use the term "compositional vectors" to designate vectors for which the sum of their components is fixed and known. In our case, with relative frequencies, the values of the components are numbers in the interval [0;1], and the sum of these components is one. Aitchison 11,12 argued that, for compositional data, the correct distance measure is not the standard Euclidean measure but the one described below.
In general, consider factor $a$ having $k$ ($> 1$) possible alternatives, i.e., $A_i = (a_{i1}, a_{i2}, \ldots, a_{ik})$ is the relative frequency vector of arm $i$ ($= 1, 2$). For category $j$, consider the natural logarithm of the between-arm relative frequency ratio, $r_j = a_{1j} \div a_{2j}$, denoted by $\ln(r_j)$, and the mean of these logarithms, denoted by $L$:
$$\ln(r_j) = \ln(a_{1j}) - \ln(a_{2j}) \quad \text{and} \quad L = \frac{\ln(r_1) + \ln(r_2) + \cdots + \ln(r_k)}{k}$$
The Aitchison distance measure between the two compositional vectors $A_1$ and $A_2$ is defined as follows:
$$\Delta(A_1, A_2) = \sqrt{\sum_{j=1}^{k} \left[\ln(r_j) - L\right]^2}$$
To demonstrate the allocation procedure, let us consider the case of the three age categories $a_1$, $a_2$, and $a_3$, as before.
Suppose that, in one stage of the process, we encountered the following vectors of absolute frequencies: $n_1 = (3;7;5)$ and $n_2 = (5;6;6)$. A new patient enrolls and falls under age category $a_2$. Table 1 presents the vectors used to calculate the required distances. These figures indicate that a new patient in age group $a_2$ is best allocated to $n_2$. This conclusion might appear obvious to the decision-maker, as it is desirable to increase the frequency of $a_2$ in $n_2$. Note that this choice increases the $n_2$ sample size.
We are also interested in controlling the arm sample sizes. Let us transform the sample size ($s$) into a compositional factor. Let $s_1$ and $s_2$ be the sample sizes of $n_1$ and $n_2$, with $s = s_1 + s_2$, and consider the vectors $S_1 = (s_1;s_2) \div s$ and $S_2 = (s_2;s_1) \div s$ as the relative frequencies of $n_1$ and $n_2$, respectively. In our example, $s_1 = 15$ and $s_2 = 17$ before the new patient is allocated. Clearly, the best allocation would be to the first arm, which features a smaller sample size.
Bear in mind that, given the age categories above, our best choice would have been to allocate the new patient to the second arm. However, considering the sample size factor, the first arm is the more appropriate allocation for the new patient. If the decision-maker feels that age is more important than sample size, a larger weight could be assigned to age than to sample size. For example, consider our decision to use a weight of 2 for age and a weight of 1 for sample size. The weights reflect the importance of the covariates and should be set prospectively, in agreement with the investigators. We may use the weighted average of the distances to guide this decision. In our example, the overall distance using the two factors indicates that the new patient should be allocated to $n_2$ in order to bring the arms closer in terms of age and sample size. It may also be possible to assign equal weights to the two factors.
Regarding the constructed factor for sample size, to appropriately compare factors, we need to adopt the same range for all factor distance measures. Hence, the definitions of $S_1$ and $S_2$ seem quite adequate.
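The worked example above can be reproduced with a short script. The following is a minimal sketch, assuming the standard form of the Aitchison distance (the square root of the sum of squared deviations of the log-ratios from their mean); the function and variable names are illustrative and are not taken from the trial software.

```python
import math

def aitchison_distance(x, y):
    """Aitchison distance between two vectors of strictly positive parts."""
    log_ratios = [math.log(a / b) for a, b in zip(x, y)]
    mean_log_ratio = sum(log_ratios) / len(log_ratios)
    return math.sqrt(sum((lr - mean_log_ratio) ** 2 for lr in log_ratios))

def relative(counts):
    """Turn absolute frequencies into relative frequencies."""
    total = sum(counts)
    return [c / total for c in counts]

n1 = [3, 7, 5]      # arm 1 counts for age categories a_1, a_2, a_3
n2 = [5, 6, 6]      # arm 2 counts
new_category = 1    # zero-based index of age category a_2
AGE_WEIGHT, SIZE_WEIGHT = 2, 1

def candidate_distances(arm):
    """Distances that would result from sending the new patient to `arm`."""
    c1, c2 = list(n1), list(n2)
    (c1 if arm == 1 else c2)[new_category] += 1
    age = aitchison_distance(relative(c1), relative(c2))
    s1, s2 = sum(c1), sum(c2)
    s = s1 + s2
    # Sample-size factor: S1 = (s1; s2) / s and S2 = (s2; s1) / s.
    size = aitchison_distance([s1 / s, s2 / s], [s2 / s, s1 / s])
    overall = (AGE_WEIGHT * age + SIZE_WEIGHT * size) / (AGE_WEIGHT + SIZE_WEIGHT)
    return {"age": age, "size": size, "overall": overall}

results = {arm: candidate_distances(arm) for arm in (1, 2)}
# For each criterion, the preferred arm is the one with the smaller distance.
best = {name: min(results, key=lambda arm: results[arm][name])
        for name in ("age", "size", "overall")}

for arm, distances in results.items():
    print(arm, {name: round(value, 4) for name, value in distances.items()})
print(best)
```

Under these assumptions the script selects the second arm on age alone, the first arm on sample size alone, and the second arm for the 2:1 weighted average, in agreement with the reasoning in the text.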
The examples discussed below are more specific and demonstrate several interesting particularities. The first, related to phase one of the trial described, deals with two arms, four prognostic factors, and a simultaneous allocation of three patients each time. The second example, related to phase two, deals with seven prognostic factors and allocation of patients one at a time into the three groups.

Two-phase clinical trial
A study group specializing in OCD wished to conduct a clinical trial consisting of two phases. The objective of phase one was to compare patient responses to pharmacological and psychotherapeutic treatments. They expected that 360 OCD patients would be enrolled in phase one over a 3-year period. However, for logistical reasons, they did not anticipate that the numbers of patients allocated to the individual treatment arms would be similar in phase one. They anticipated that the pharmacological arm may receive more patients than the psychotherapeutic arm at certain times, and vice-versa at other times. These variations are required because, for practical reasons, each therapeutic arm must accommodate a different number of patients at different times in the study. As the psychotherapeutic arm involves group psychotherapy, when groups are first created, a rapid influx of patients is necessary, but when groups are well-established, the speed of inclusion should diminish as only two groups can be conducted simultaneously.
From previous experience, we expected that 60% of the patients treated in either of the two arms of phase one would report less than adequate symptom improvement. Patients who participated in the pharmacological arm ($n_1$) and did not respond to treatment would be invited to participate in phase two. Patients who were non-responders in the psychotherapeutic arm ($n_2$) of phase one would be invited to participate in the pharmacological arm ($n_1$), and, if treatment resistance were to persist in $n_1$, they would also be invited to participate in phase two. Phase two consists of three arms, the objective being to compare three different pharmacological augmentation strategies. In phase two, unlike in phase one, it is expected that a similar number of patients will be allocated to each of the three arms. Considering expected response rates, drop-out rates, and frequency of refusals to participate in a study using a placebo, it is likely that 30 to 40 patients will be included in each arm. This sample size seems adequate according to the investigator's hypothesis. The entire clinical trial design is illustrated in Figure 1.
The allocation strategies for the two phases of the trial are different. To accommodate the logistical characteristics of phase one (that require a different pace of patient enrollment for each arm at different study time points), the researcher will receive groups of 3 patients to be allocated into the two groups. On some occasions, it might be necessary to allocate 2 patients to $n_1$ and 1 to $n_2$. On other, well-defined occasions, the situation is reversed, and $n_1$ will receive 1 patient while the other 2 patients will be allocated to $n_2$.

Prognostic factors to be balanced across arms
When the allocation program was developed for phase one, a smaller number of variables was chosen, as we did not know how many covariates this new procedure could accommodate without compromising its efficiency in minimizing arm imbalance. After the initial results of allocation were analyzed for phase one, it became clear that the procedure could accommodate a greater number of covariates. Therefore, the program for phase two was designed with a greater number of hypothesized prognostic factors that could also have been covariates in phase one. The factors used to establish the allocating strategy were chosen based on previous studies. Although the appropriateness of the factors used might be obvious, discussion of these choices is outside the scope of this paper. Our sole intention is to present the allocation strategy used in the study described. The factors and their categories were as follows:
1. Current age (age) was categorized into three classes: under 30 years of age ($a_0$); between 30 and 45 ($a_1$); and over 45 ($a_2$).
2. OCD severity (OCD): … indicate no OCD symptoms, and we had no patients in this class.
3. Treatment history (his) was divided into three categories in each phase. In phase one, $h_0$ indicates no previous appropriate treatment, $h_1$ indicates one previous course of appropriate treatment without response, and $h_2$ indicates two or more previous courses of appropriate treatment without response. In phase two, $h_0$ indicates no Y-BOCS score reduction or Y-BOCS score increase after $n_1$ in phase one, $h_1$ indicates a 1-20% reduction in Y-BOCS score after $n_1$ in phase one, and $h_2$ indicates a 20-35% reduction in Y-BOCS score after $n_1$ in phase one.
4. Level of education (sch) was divided into four categories: $sc_0$ indicates no schooling, $sc_1$ indicates ≤ 8 years of schooling, $sc_2$ indicates 9-12 years of schooling, and $sc_3$ indicates higher education (undergraduate or graduate work).
5. Marital status (mar) was categorized as $si_0$ (married, divorced or widowed) or $si_1$ (single, never married).
6. Gender (gen) is indicated by $m$ for male and $f$ for female.
7. Sample sizes (sam) are denoted as $s_1$, $s_2$, and $s_3$, and $s = s_1 + s_2 + s_3$ represents the total sample size. Note that there are three arms only in phase two. The compositional vector of sample size in arm $i$ ($i = 1, 2, 3$) is the vector $(p_i; 1 - p_i)$, where $p_i = s_i \div s$.
For each of these factors, the distance between arms can be computed. The distance values for the seven variables listed above are given by the following symbols: $\Delta_{age}$, $\Delta_{OCD}$, $\Delta_{his}$, $\Delta_{sch}$, $\Delta_{mar}$, $\Delta_{gen}$, and $\Delta_{sam}$. The global distances for phases one and two are, respectively, $\Delta_1$ and $\Delta_2$, defined as follows:
$$\Delta_1 = \frac{2\Delta_{age} + 3\Delta_{OCD} + 3\Delta_{his} + \Delta_{gen}}{9}$$
$$\Delta_2 = \frac{2\Delta_{age} + 4\Delta_{OCD} + 5\Delta_{his} + 2\Delta_{sch} + 3\Delta_{mar} + \Delta_{gen} + 4\Delta_{sam}}{21}$$
As there are three arms in phase two, there are also three vectors. Hence the global distance for phase two should be the average of the three values of $\Delta_2$, one for each pair of arms, i.e., the distance for phase two should be as follows:
$$\bar{\Delta}_2 = \frac{\Delta_2^{(1,2)} + \Delta_2^{(1,3)} + \Delta_2^{(2,3)}}{3}$$
where $\Delta_2^{(i,i')}$ denotes the value of $\Delta_2$ computed between arms $i$ and $i'$.

Phase one allocation strategy
Based on previous studies, factors measurable before treatment initiation may influence the treatment response. Therefore, it is desirable that certain prognostic factors be distributed as homogeneously as possible between the two arms. We assume that previous treatment response (assessed through patient interviews) and severity of the disorder (assessed using a specific scale) at the time of inclusion in the study are the most important prognostic factors for clinical response. In addition, although of lesser importance to treatment response, gender and age should also be homogeneously distributed between arms. For logistical reasons, it was not possible to enroll the same number of patients in each phase one treatment arm. The healthcare professionals who administer the treatments can assist more patients in the pharmacological arm than in the psychotherapeutic arm. Over the course of the study, there are times at which the pace of patient enrollment into the psychotherapeutic arm is, of necessity, more rapid than that of patient enrollment into the pharmacological arm. We expect that the pharmacological arm will account for a higher percentage of the final sample. Due to the time needed to perform the clinical evaluations, there is a two-week gap between inclusion and allocation. Consequently, it is possible to allocate patients simultaneously in groups of three, one to one arm and the remaining two to the other arm (the arm that receives two patients at certain time points will only receive one at other time points). This simultaneous inclusion of patients guarantees that provider blinding is not compromised by allocation in small blocks.
The choice of prognostic factors to be balanced between arms is based on reports in the literature that suggest that these prognostic factors may influence the results considerably. In addition, having information about these factors at the time of the initial evaluation is feasible and provides useful clues for balancing between intervention groups. As previous treatment response and initial severity are known to be associated with treatment response in clinical trials evaluating OCD patients, 14-18 they were included as prognostic factors in order to balance the arms.

Phase two allocation strategy
Patients entering phase two were those who failed to completely respond to the treatment in $n_1$ of phase one (regardless of treatment history). The investigators assumed that treatment response and current severity were highly predictive of phase two treatment response, and that these patients should therefore be distributed homogeneously among the three groups. In addition, it is understood from previous studies that level of education and marital status are significantly associated with poor treatment response and should be included in the phase two strategy model. Gender and current age should also be distributed homogeneously among the three arms. Another important factor is sample size, which should be as similar as possible among the three arms.
For each patient in phase two, the allocation to one of the three arms needs to be determined during the evaluation conducted at week 12 of treatment. The strategy at this point involves a sequential, one-by-one allocation of patients who had participated in phase one. Here, in addition to having more than two factors to consider in our attempts to balance the arms, we are dealing with three arms. Bear in mind that though the distance measures are defined for two vectors in order to facilitate decisions regarding patient allocation, we consider the average of the three distances between pairs to represent the overall distance of the three arms. The proposed solutions for the two phases are described below.

Partial results for phase one
To perform stratified allocation of patients, a Microsoft Excel macro was created. This macro divided patients into homogeneous intervention subgroups consistent with the factors chosen by investigators. To avoid empty categories in the allocation process, for a factor with $k$ ($> 1$) categories, we added the fraction $1/k$ to each category. Therefore, no category had a frequency of 0 at the time of $\Delta$ distance calculation.
The partition that provides the best degree of homogeneity between groups is defined as the division that minimizes the difference between factor vectors of categorized relative frequencies. To calculate this measure, the Aitchison distance between two vectors was used, as previously discussed. This distance was chosen based on the assumption that it preserves the sub-compositional coherence of the simplex space. After the values of $\Delta_1$ were calculated for each possible allocation, the allocation that resulted in the smallest between-group distance was chosen as the optimal allocation.
At the time of writing, 152 individuals had been allocated to $n_1$ and 107 had been allocated to $n_2$. The partial allocation results are shown in Table 2. The results of the reverse order allocation of patients appear in parentheses.
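The block allocation used in phase one can be illustrated as follows. This is a minimal Python sketch rather than the Excel macro used in the trial: it enumerates every way of splitting a block of three patients between the two arms, adds $1/k$ to each category count, and keeps the split with the smallest weighted distance. The number of OCD severity categories (4), the patient records, and all function names are assumptions made for illustration only.

```python
import math
from itertools import combinations

# Categories per factor. The OCD severity count (4) is an assumption.
CATEGORIES = {"age": 3, "ocd": 4, "his": 3, "gen": 2}
WEIGHTS = {"age": 2, "ocd": 3, "his": 3, "gen": 1}  # phase one weights, total 9

def aitchison_distance(x, y):
    """Aitchison distance between two vectors of strictly positive parts."""
    log_ratios = [math.log(a / b) for a, b in zip(x, y)]
    mean_log_ratio = sum(log_ratios) / len(log_ratios)
    return math.sqrt(sum((lr - mean_log_ratio) ** 2 for lr in log_ratios))

def composition(patients, factor):
    """Relative frequencies of one factor in an arm, after adding 1/k per category."""
    k = CATEGORIES[factor]
    counts = [1 / k] * k
    for patient in patients:
        counts[patient[factor]] += 1
    total = sum(counts)
    return [c / total for c in counts]

def global_distance(arm1, arm2):
    """Weighted average of the per-factor distances between the two arms."""
    weighted = sum(
        weight * aitchison_distance(composition(arm1, factor), composition(arm2, factor))
        for factor, weight in WEIGHTS.items()
    )
    return weighted / sum(WEIGHTS.values())

def allocate_block(arm1, arm2, block, n_to_arm1):
    """Try every split of `block` that sends `n_to_arm1` patients to arm 1."""
    candidates = []
    for chosen in combinations(range(len(block)), n_to_arm1):
        to_arm1 = [block[i] for i in chosen]
        to_arm2 = [block[i] for i in range(len(block)) if i not in chosen]
        distance = global_distance(arm1 + to_arm1, arm2 + to_arm2)
        candidates.append((distance, chosen, to_arm1, to_arm2))
    return min(candidates, key=lambda c: c[0]), candidates

# Hypothetical patients; each value is a zero-based category index.
arm1 = [{"age": 0, "ocd": 2, "his": 1, "gen": 0},
        {"age": 1, "ocd": 3, "his": 0, "gen": 1}]
arm2 = [{"age": 2, "ocd": 1, "his": 2, "gen": 1}]
block = [{"age": 1, "ocd": 2, "his": 1, "gen": 0},
         {"age": 0, "ocd": 1, "his": 2, "gen": 1},
         {"age": 2, "ocd": 3, "his": 0, "gen": 0}]

(best_distance, best_choice, to_arm1, to_arm2), candidates = allocate_block(
    arm1, arm2, block, n_to_arm1=2)
print("block members sent to arm 1:", best_choice, "distance:", round(best_distance, 4))
```

Setting `n_to_arm1=1` covers the occasions on which the psychotherapeutic arm receives two of the three patients.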

Simulated results for phase two
The phase two allocation system has the same objectives as that of phase one, namely to bring the arms as close together as possible in terms of the relevant prognostic factors. However, in phase two, the allocation logistics are different from those in phase one. We now have three arms, and the patients are allocated sequentially one-by-one. In addition, the balancing factors in this phase are different from those considered in phase one. Level of education, marital status, and sample size are now also taken into account, and the treatment history factor, unlike in phase one, is based on the relative reduction in the Y-BOCS score during phase one. Patients who present a reduction of more than 35% in Y-BOCS score are not included in this phase. In order to consider sample size as a factor, the proportion of patients in each group will also be included in the distance calculation. Table 3 presents the allocation results for 90 patients selected from those in $n_1$ of phase one.
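The one-by-one allocation into three arms can be sketched in the same way. The Python code below is an illustration under stated assumptions, not the program used in the trial: the OCD severity factor is assumed to have 4 categories, every arm is assumed to hold at least one patient already (so that the sample-size vector $(p_i; 1 - p_i)$ has no zero component), and the patient records and function names are hypothetical.

```python
import math
from itertools import combinations

# Categories per factor. The OCD severity count (4) is an assumption.
CATEGORIES = {"age": 3, "ocd": 4, "his": 3, "sch": 4, "mar": 2, "gen": 2}
# Phase two weights, total 21; "sam" is the sample-size factor.
WEIGHTS = {"age": 2, "ocd": 4, "his": 5, "sch": 2, "mar": 3, "gen": 1, "sam": 4}

def aitchison_distance(x, y):
    """Aitchison distance between two vectors of strictly positive parts."""
    log_ratios = [math.log(a / b) for a, b in zip(x, y)]
    mean_log_ratio = sum(log_ratios) / len(log_ratios)
    return math.sqrt(sum((lr - mean_log_ratio) ** 2 for lr in log_ratios))

def composition(patients, factor):
    """Relative frequencies of one factor in an arm, after adding 1/k per category."""
    k = CATEGORIES[factor]
    counts = [1 / k] * k
    for patient in patients:
        counts[patient[factor]] += 1
    total = sum(counts)
    return [c / total for c in counts]

def pair_distance(arm_a, arm_b, total_size):
    """Weighted distance between two arms over all seven factors."""
    weighted = 0.0
    for factor, weight in WEIGHTS.items():
        if factor == "sam":
            # Sample-size vector (p_i; 1 - p_i); assumes non-empty arms.
            p_a, p_b = len(arm_a) / total_size, len(arm_b) / total_size
            distance = aitchison_distance([p_a, 1 - p_a], [p_b, 1 - p_b])
        else:
            distance = aitchison_distance(composition(arm_a, factor),
                                          composition(arm_b, factor))
        weighted += weight * distance
    return weighted / sum(WEIGHTS.values())

def overall_distance(arms):
    """Average of the pairwise distances between the arms."""
    total_size = sum(len(arm) for arm in arms)
    pairs = list(combinations(arms, 2))
    return sum(pair_distance(a, b, total_size) for a, b in pairs) / len(pairs)

def allocate(arms, patient):
    """Place `patient` in the arm that minimizes the overall distance."""
    scores = []
    for i in range(len(arms)):
        trial = [arm + [patient] if j == i else arm for j, arm in enumerate(arms)]
        scores.append(overall_distance(trial))
    best = min(range(len(arms)), key=scores.__getitem__)
    arms[best].append(patient)
    return best, scores

# Hypothetical patients; each value is a zero-based category index.
arms = [
    [{"age": 0, "ocd": 2, "his": 1, "sch": 3, "mar": 0, "gen": 0}],
    [{"age": 1, "ocd": 3, "his": 0, "sch": 2, "mar": 1, "gen": 1}],
    [{"age": 2, "ocd": 1, "his": 2, "sch": 1, "mar": 0, "gen": 1}],
]
new_patient = {"age": 1, "ocd": 2, "his": 2, "sch": 3, "mar": 1, "gen": 0}
chosen_arm, scores = allocate(arms, new_patient)
print("chosen arm index:", chosen_arm, "scores:", [round(s, 4) for s in scores])
```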

DISCUSSION
The objective of using this strategy of random allocation of patients in different treatment arms is to avoid intentional bias in group allocation. However, treatment group imbalance for prognostic factors can still occur. 9 These unfortunate consequences of randomization can only be prevented by using techniques that produce an intentional optimal allocation of patients to each treatment arm.
Another interesting point is the choice of the Aitchison distance to treat compositional data. To illustrate the difference between the Aitchison distance and the standard Euclidean distance, we calculated the two distances for two pairs of relative frequency vectors, $(A_1;A_2)$ and $(B_1;B_2)$, chosen such that, for any category $j$ ($= 1, 2, 3$), the ratio between frequencies is the same for the two pairs: $a_{1j} \div a_{2j} = b_{1j} \div b_{2j}$. Because the Aitchison distance depends only on these ratios, it takes the same value for both pairs, whereas the Euclidean distance need not. We also note that problems occur if a frequency of 0 is listed for a particular class. To address this, we recommend a prior correction: the allocation process commences with a small positive real number assigned to each category, which is added to the observed absolute frequency. By analogy with Bayesian categorical data analysis, we adopt the same constant for each category, as in the standard strategy. Our recommendation is to add $1/k$ to the absolute frequencies for each category, thereby avoiding the occurrence of a relative frequency of 0 in any category.
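The contrast between the two distances, and the effect of the $1/k$ correction, can be checked numerically. In the sketch below the vectors `A1`, `A2`, `B1`, and `B2` are hypothetical values chosen so that the category-wise ratios are 0.5, 1, and 2 in both pairs; they are not the vectors used in the original comparison, and the counts in the second part are likewise invented.

```python
import math

def aitchison_distance(x, y):
    """Aitchison distance between two vectors of strictly positive parts."""
    log_ratios = [math.log(a / b) for a, b in zip(x, y)]
    mean_log_ratio = sum(log_ratios) / len(log_ratios)
    return math.sqrt(sum((lr - mean_log_ratio) ** 2 for lr in log_ratios))

def euclidean_distance(x, y):
    """Standard Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Hypothetical relative frequency vectors with identical category-wise ratios.
A1, A2 = (0.1, 0.7, 0.2), (0.2, 0.7, 0.1)
B1, B2 = (0.2, 0.4, 0.4), (0.4, 0.4, 0.2)

print("Aitchison:", aitchison_distance(A1, A2), aitchison_distance(B1, B2))
print("Euclidean:", euclidean_distance(A1, A2), euclidean_distance(B1, B2))

def with_prior(counts):
    """Add 1/k to each of the k category counts so that none is zero."""
    k = len(counts)
    return [c + 1 / k for c in counts]

# Hypothetical counts with an empty category in arm 1. Without the correction,
# the log-ratio for the first category would be undefined.
counts_arm1 = [0, 4, 6]
counts_arm2 = [2, 5, 3]
# The Aitchison distance is unchanged when a vector is rescaled, so the
# corrected counts can be passed without converting them to proportions.
corrected = aitchison_distance(with_prior(counts_arm1), with_prior(counts_arm2))
print("distance after 1/k correction:", corrected)
```

The two Aitchison distances coincide, while the Euclidean distance for the second pair is twice that for the first.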
The "random" aspect of treatment arm allocation is preserved due to the random sequence in which patients arrive for inclusion in clinical trials. The most important property of the allocation strategy defined here is that we can verify post hoc adherence to the desired strategy. The use of a purely randomized strategy cannot be proven after the trial has been concluded. To verify that the order in which the patients entered the experiment was indeed random, we considered the reverse order of the first 259 patients (phase one) and 90 patients (phase two). In Tables 2 and 3, it can be seen how close the results from the reverse order are to those obtained for the original order. An interesting fact is illustrated in Tables 4 and 5. We can see that more than half of the patients would change arms by reversing the order in which they enter into a phase. In phase one, 130 of the 259 patients would have changed arms if the reverse order had been considered. For phase two, under the reverse order, 48 of the 90 patients would have changed arms.
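The reverse-order comparison can be mimicked on simulated data. The sketch below is a simplified illustration and not the procedure applied to the trial data: it allocates simulated patients one at a time to two arms, repeats the allocation with the arrival order reversed, and counts how many patients end up in a different arm. The number of OCD severity categories, the weight given to sample size, the $1/2$ correction applied to the arm sizes, and the random patient profiles are all assumptions.

```python
import math
import random

# Categories per factor. The OCD severity count (4) is an assumption.
CATEGORIES = {"age": 3, "ocd": 4, "his": 3, "gen": 2}
# Phase one weights plus an assumed weight of 2 for sample size ("sam").
WEIGHTS = {"age": 2, "ocd": 3, "his": 3, "gen": 1, "sam": 2}

def aitchison_distance(x, y):
    """Aitchison distance between two vectors of strictly positive parts."""
    log_ratios = [math.log(a / b) for a, b in zip(x, y)]
    mean_log_ratio = sum(log_ratios) / len(log_ratios)
    return math.sqrt(sum((lr - mean_log_ratio) ** 2 for lr in log_ratios))

def composition(patients, factor):
    """Relative frequencies of one factor in an arm, after adding 1/k per category."""
    k = CATEGORIES[factor]
    counts = [1 / k] * k
    for patient in patients:
        counts[patient[factor]] += 1
    total = sum(counts)
    return [c / total for c in counts]

def arm_distance(arms):
    """Weighted distance between two arms, including the sample-size factor."""
    weighted = 0.0
    for factor, weight in WEIGHTS.items():
        if factor == "sam":
            # Assumed correction: add 1/2 to each arm size so that neither is zero.
            s1, s2 = len(arms[0]) + 0.5, len(arms[1]) + 0.5
            s = s1 + s2
            distance = aitchison_distance([s1 / s, s2 / s], [s2 / s, s1 / s])
        else:
            distance = aitchison_distance(composition(arms[0], factor),
                                          composition(arms[1], factor))
        weighted += weight * distance
    return weighted / sum(WEIGHTS.values())

def allocate_sequence(patients):
    """Allocate patients one by one; return the arm index chosen for each."""
    arms = ([], [])
    assignment = []
    for patient in patients:
        scores = []
        for i in (0, 1):
            trial = [arm + [patient] if j == i else arm for j, arm in enumerate(arms)]
            scores.append(arm_distance(trial))
        chosen = 0 if scores[0] <= scores[1] else 1  # ties go to the first arm
        arms[chosen].append(patient)
        assignment.append(chosen)
    return assignment

rng = random.Random(0)  # fixed seed so that the simulation is repeatable
patients = [{factor: rng.randrange(k) for factor, k in CATEGORIES.items()}
            for _ in range(60)]

original = allocate_sequence(patients)
# Allocate in reverse arrival order, then realign the result with the original order.
reversed_run = allocate_sequence(patients[::-1])[::-1]
changed = sum(a != b for a, b in zip(original, reversed_run))
print(f"{changed} of {len(patients)} simulated patients change arms")
```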
This article describes a method of computer-based intentional allocation that has the advantage of being flexible, effective, and feasible for the allocation of each consecutive patient, or groups of three patients, in terms of multiple covariates. Future studies are still needed to evaluate the effectiveness of this procedure when a large number of covariates are included. The procedure presented here is reliable for the inclusion of up to six compositional covariates with twenty-one possible alternative results. When using this procedure, researchers could include any allocation strategy variables suspected of influencing treatment response.
One limitation of the proposed procedure is the need to know the value of each prognostic factor, for each patient entering the study, prior to allocation. Consequently, values of prognostic factors that can only be obtained through long-term evaluation cannot be included in the model. Another issue is the weights given to the allocation factors. For the trial in question, the second author (the psychiatrist responsible for conducting the trial) based the choice of variable weights on the evidence levels of the prognostic factors that influence treatment response. Although this was an arbitrary decision, it does not favor any allocation tendency that might result in arm imbalance. Thus, there is no imbalance that might favor a particular result in terms of treatment response.

CONCLUSION
In conclusion, the procedure described here can be easily adapted to different study designs. The results of the trial, used as illustration, show that the methodology was reliable for the sample tested. There are, however, some reliability issues that require further investigation, such as the inclusion of cases presenting a large number of factors or factors that feature a large number of possible alternative categories. Nevertheless, we believe that this paper presents a procedure that will be useful in avoiding imbalance among treatment arms when patient allocation is performed sequentially.