Short and long period optimization of drug doses in the treatment of AIDS

Numerical optimization techniques are useful in solving problems of computing the best inputs for systems described by mathematical models and when the objectives can be stated in a quantitative form. This work concerns the problem of optimizing the drug doses in the treatment of AIDS in terms of achieving a balance between the therapeutic response and the side effects. A mathematical model describing the dynamics of HIV viruses and CD4 cells is used to compute the short term optimal drug doses in the treatments of patients with AIDS by a direct method of optimization using a cost function of Bolza type. The model parameters were fitted to actual published clinical data. In order to simplify the numerical procedures, the control law is expressed as a series and the sub-optimal control is obtained by truncating the higher terms. When the patient reaches a clinically satisfactory state, the LQR – Linear Quadratic Regulator technique is used to determine the long period maintenance doses for the drugs. The doses computed using the LQR technique tend to be smaller than equivalent constant-dose therapy in terms of increase in the counts of CD4+T cells and reduction of the density of free viruses.

models can be used to optimize the drug doses required in the treatment.The model used in this work was proposed by Tan and Wu (Tan and Wu 1998), and is similar to Perelson (Perelson et al. 1993).The model paremeters are fitted using the actual clinical data found in Pontesilli et al. (1999).
This mathematical model comprises four differential equations representing the uninfected CD4 + T cells, latent infected CD4 + T cells, active infected CD4 + T cells and free viruses.
At present, one of the most popular treatment scheme is HAART (highly active antiretroviral therapy) which uses an association of reverse transcriptase inhibitors (eg.zidovudine and lamivudine) and protease inhibitors (eg.saquinavir, indinavir and ritonavir).After several months of treatment with those drugs, some patients have experienced increase in the side effects such as abdominal girth, abdominal fullness, distention or bloating (Mittler et al. 1998).Others patients reported adverse effects such as headache, malaise, fatigue, nausea, vomiting, cough, nasal symptoms and musculoskeletal pain.
The effectiveness of a particular treatment scheme must be measured in an objective manner.One possibility is to construct a cost function that takes into account the number of non-infected CD4 cells and the administered doses of the drugs.The CD4 cells indicate the effectiveness of the treatment, while the doses of the administered drugs reflect the intensity of the side effects.Once the system model and the cost function are chosen, a variety of optimization techniques can be invoked to compute the best treatment scheme.
It was shown in a previous work (Caetano and Yoneyama 1999a) that it is possible to improve the treatment effectiveness by using a closed loop drug administration strategy.It was also shown (by computer simulation) that it is possible to use optimal control theory to minimize the side effects during a short term treatment scheme.However, the problem becomes hard to solve numerically when long term treatment is to be optimized.Here, after the patient reaches a clinically satisfactory state, a long term maintenance scheme is proposed based on Linear Quadratic Regulator theory (LQR).Small changes in the patient's state are compensated using small changes in the drug doses that are optimal in the sense of minimizing a quadratic type performance index.The LQR control is widely used in the engineering area and references may be found, for instance, in Lewis (1986) and Kirk (1970).

The Dynamic Model
A number of mathematical models have been proposed in the field of immunology, which can be found, for instance, in (Murray et al. 1998, Mittler et al. 1998, Wick 1999, Behrens et al. 1999, Tan and Xiang 1999, Nowak et al. 1991, 1995, 1997and Wein et al. 1998), among other works.The model in (Nowak et al.1991) considers a system of ordinary differential equations with four variables x i , y i , v i and z i which denote respectively strain-specific CD4 cells, total CD4 cells, virus population and cross-reactive CD4 cells.The authors simulate the mathematical model for a number of immunotherapy starting at different times after infection.In (Nowak and Bangham 1996) three dynamic models of HIV infection are compared.The first is the simplest and contains three variables x, y and v denoting, respectively, uninfected cells, infected cells and free viruses.Another model uses four variables where the three first variables are the same x, y, v as before and the new variable z represents CTL lymphocytes.The last model has four variables and includes the variability of the viruses.The model by Phillips (1996) involves four differential equations in variables R, L, E and V which represent, respectively, uninfected CD4, latent infected cells, infected cells and free virions.This model can be used to simulate the initial phase of infection.The model in (Nowak et al. 1995) considers the interaction between CTL and the multiple epitopes of a genetically variable pathogen.The version proposed in (Nowak et al. 1997) includes a population of mutant viruses, and provides analytic approximation for the rate of emergence of resistant viruses.This model comprises five equations and the results match the experimental data of three infected patients treated with Neverapine (NVP).In (Murray et al. 1998), the proposed model uses eight differential equations with the variables that represent the naïve cells.The model in (Wick 1999) takes into account the dynamics of T cells in which rising activation rates yield decreasing T cell counts and where apoptosis and proliferation must nearly balance.This model has four differential equations describing the dynamics of naïve T cells and memory cells in active and latent states.The model in (Zaric et al. 1998) focuses on the simulation of protease inhibitors and development of drugresistant HIV strains.The model is composed of eleven differential equations, including couplings between organisms that are infected with resistant and non-resistant HIV strains.The model in (Tan and Wu 1998 and Tan and Xiang 1999) describe the HIV pathogenesis under treatment by antiviral drugs.The model has four differential equations and stochastic terms in the variables that represent the number of latent infected T cells.It has also stochastic components on infectious free HIV and non-infectious free HIV variables.
The mathematical model used in this work is a simplification of a more general version that includes stochastic terms, as originaly presented by (Tan and Wu 1998).The dynamics is described by the differential equations where ẋ represents the time derivative dx/dt, (2) with x 1 = x 1 (t) ≡ uninfected CD4 + T cells; x 2 = x 2 (t) ≡ latent infected CD4 + T cells; x 3 = x 3 (t) ≡ active infected CD4 + T cells; x 4 = x 4 (t) ≡ free viruses HIV; s: rate of generation of x 1 from precursors; r: rate of stimulated growth of x 1 ; T max : maximum T cells population level; µ 1 : death rate of x 1 ; µ 2 : death rate of x 2 ; µ 3 : death rate of x 3 ; µ v : death rate of x 4 ; k 1 : infection rate from x 1 to x 2 by viruses; k 2 : conversion rate from x 2 to x 3 ; N : number of infectious virions produced by an actively infected T cell; θ: viral concentration needed to decrease s.The coefficients k 1 and k 2 are functions of the drug doses where k 10 , k 20 , α 1 and α 2 are constants.Basically, x 1 cells are stimulated to proliferate with rate λ(x 1 , x 2 , x 3 ) in the presence of antigen and HIV (equation 3).Without the presence of HIV, the rate of generation is S(x 4 ) (equation 2).In the presence of free HIV (x 4 ), uninfected cells x 1 can be infected to become x 2 cells or x 3 cells, depending on probability of the cells to become actively or latently infected with rate ω.The x 2 cells can be activated to become x 3 cells.The activation rate is k 2 .The x 3 cells are short living and will normally be killed upon activation with death rate µ 3 .The x 1 , x 2 cells and x 4 free viruses have finite life and the death rates in this model are µ 1 , µ 2 and µ v respectively.When x 3 cells die, free viruses x 4 are released with rate N(t) described by (4).Drugs such as reverse transcriptase inhibitors (zidovudine and lamivudine) and protease inhibitors (saquinavir, indinavir and ritonavir) affect the dynamics via parameters k 1 and k 2 .

The Sub-optimal Control
The objective in a general optimal control problem is to find a control input m(t) = [m 1 (t)m 2 (t)] T that minimizes the cost function where t 0 and t f are the initial and final times, fixed a priori.The functions h and g are required to be positive.Moreover, x(•) and m(•) are constrained by the state equation In the specific problem treated in this work, where The biological interpretation of the proposed cost functional is that the first term h(x(t f ), t f ) represent the target of maximizing non-infected CD4 after a pre-specified time horizon.The coefficients φ 1 and φ 2 in the integrand g(x(t), m(t), t) are weights that reflect the dose-related side effects of the two drugs (m 1 and m 2 ) which must be adequately balanced.The last term in g(x(t), m(t), t) is included to force x 1 (uninfected CD4 + T cells) to increase with treatment.
Optimal control problems can be solved by indirect or direct methods.In the solution using an indirect method, one is required to solve a boundary value problem with 2n equations corresponding to n state and n adjoint variables if Maximum Principle is invoked or to solve a partial differential equation if Dynamic Programming is used (Kirk 1970, Lewis 1986and Bulirsh and Stoer 1980).In the solution using a direct method, one attempts to minimize directly the performance measure (7) after a suitable parametrization of the admissible control inputs m(t).Here, a direct method proposed by Jacob (1972) is used.The parametrization of the input functions m(t) involves, in the present case, a subset of the coefficients in the series expansion employing sine functions.Therefore, only approximations to the actual optimal m(t) can be obtained.Those approximations are sub-optimal, in the sense that the cost achieved is generally greater when the higher terms of the series expansion are neglected, compared to the case resulting from the use of the actual optimal control.However, those sub-optimal control inputs were found to be satisfactory in the present problem.

Parametrization of the Control Input
The numerical algorithm proposed in (Jacob 1972) is available in the form of a computer program called EXTREM.Each component of the control input m(t) is represented by an expansion over the interval [0, t f ] with the form where the coefficients c i,j are to be determined by minimizing ( 7).

Linear Mathematical Model
Once the patient's state reaches a clinically satisfactory region after the sub-optimal short term treatment, the idea is to use the LQR (Linear Quadratic Regulator) as the long term treatment.In order to obtain a linearized model to be used within the clinically satisfactory region, the data corresponding to the patient A by (Pontesilli et al. 1999) is considered for illustration purposes.For other patients, the clinical data gathered before and during the short-term treatment is used to identify the numerical values of the parameters for the linearized model.The estimated numerical values for the model parameters of patient A are shown in Table I.The graph in Figure 1 shows the values determined by simulation and the actual data for the uninfected CD4 + T cells and HIV free viruses of patient A by (Pontesilli et al. 1999).In the case of constant doses therapy, these were adjusted to be the same as in the actual treatment reported by Pontesilli, i.e., 900 mg of a transcriptase inhibitor and 1200 mg of a protease inhibitor during 224 days.
1 1 1.10 -6 1.10 -6 250.10 3 250.10 3 0.005 0.005 1.10 6 1 1.10 -1 65470 357 10 100 133352 224 For this patient, with the parameters of Table I, one is able to find the equilibrium points by solving the equations (1) after letting dx/dt = 0: The critical points for the patient A are found to be: The third critical point is not of interest, because negative values of uninfected CD4 + T cells can not occur in practice.The second point is the state of a patient with AIDS and the first is the state when the patient is free of HIV.A local analysis of these points show that the first point is unstable while the second is stable.In fact, the eigenvalues of the linearized model at the first point are: and the eigenvalues at second point are: Therefore, the idea of using LQR scheme is to keep the patient's state near the first critical point, which is unstable without feedback control.Initially, the patients use either sub-optimal or constant doses therapy of reverse transcriptase inhibitor plus protease inhibitor until the level of CD4 + T is near to the first critical point, shown in Figure 2.For instance, after a short period constant doses treatment with m * = (900, 1200), the reached state is Now, it is possible to linearize the equations (1) about this state.Thus, for small perturbations near the critical point: An Acad Bras Cienc (2002) 74 (3)

387
corresponding to small adjustments in the control variable (drugs doses) and adopting an additional assumption of negleting the variations of the active and latent cells, one can write In the region of interest, the model becomes: and the observation equations, assuming that only CD4 + T cells and free virus are monitored, becomes: The Linear Quadratic Regulator The LQR is widely used in control engineering because of its simplicity and robustness properties.The action of the control variable is to minimize a performance index: where A, B, Q and R are matrices of appropriate dimensions and R is positive definite.
The solution of the LQR problem is well known and can be found, for instance, in Lewis (1986) and Kirk (1970): where P is to be found by solving the algebraic Riccati equation: The Q matrix is the weight on the states x and R matrix is the weight on the control variable m.The values of Q and R used here are: A larger weight was placed on the reverse transcriptase inhibitor because the objective is to reduce the number of active infected cells.

RESULTS
The actual clinical data were extracted from Pontesilli et al. (1999) and refers to a patient (patient A) that contracted AIDS and his T cell counts were measured during 224 days.Firstly, the model parameters were adjusted by an identification procedure to match the available data.The model parameters are presented in Table I.
Computer simulations were carried out using the model described by equations ( 8)-( 11) and with parameters of Table I.Numerical results were obtained for both constant-dose and sub-optimal treatment schemes with computer simulations.Figure 1 shows the variations on the counts of uninfected CD4 cells and the density of free viruses corresponding to the actual clinical data found in Pontesilli and the computer-generated sub-optimal response.Figure 1 also shows the path A-B followed by a recently infected patient with an infinitesimal quantity of viruses.The point C corresponds to the initial condition used in the simulations.The other components of the state (x 2 and x 3 ) are also close to the actual data.The doses in the case of sub-optimal treatment scheme are lower than the constant dose strategy, as seen in Figure 2.
In order to evaluate the long period maintenance doses for the drugs, the equations ( 8)-( 11) were integrated assuming constant drug doses with values that equal those used in the treatment of patient A in (Pontesilli et al. 1999).After 224 days, the drug administration strategy was switched to the optimal control law based on LQR.
Figure 3 shows the results of the count of uninfected CD4 + T cells and free viruses, respectively, under LQR drug doses.It is seen that the level of CD4 + T increases while the free viruses are further reduced.Figure 4 shows two curves correspondig to the case of a two phase treatment scheme, where in the short term the doses are kept constant and in the long term the LQR is used.The doses are seen to adjust themselves towards stationary values with a slight and progressive reduction of the reverse transcriptase inhibitor and a significant reduction of the protease inhibitor.

DISCUSSIONS AND CONCLUSIONS
Numerical optimization methods based on mathematical models can be of value in providing drug administration schemes that lead to a good compromise between therapeutic and side effects.A drawback is the necessity to estimate the model parameters for each patient, which may be a formidable task.However, simulation studies carried out with clinical data indicate that even models with considerable uncertainty can yield useful results.The sub-optimal treatment scheme may produce inferior results when compared to the actual optimal scheme, but in view of the subjective nature of the cost function and of the required computational effort, the approximation obtained by the proposed method may still be adequate.An interesting fact is that for other patients reported in Pontesilli, (patients B, C and D) the proposed method yields good numerical results obtained by computer simulation.
The combined short and long period treatment schemes aims at providing a standard and effective treatment in the attack phase and, after the improvement of the patient's condition, a maintenance treatment based on optimized doses that has less side-effects yet keeps the health condition in a clinically satisfactory region.In the previous results by the authors, the original non-linear model was used to compute the optimal strategy for the whole control horizon (see, for instance, Caetano and Yoneyama 1999b, Felippe de Souza et al. 2000, Kirschner and Webb 1996and Kirschner et al. 1997).However, it was very difficult to extend them for computing long term treatment schemes.In the proposed new approach, the sub-optimal technique (or even a standard constant-doses treatment) is required only to bring the health conditions to a satisfactory state so that the much simpler LQR method can be used to compute the doses for the long term treatment.Moreover, because of the usual robustness properties of the LQR regulator, the controller tends to be less sensitive to modeling errors (Lewis 1986).

Fig. 1 -
Fig. 1 -Comparison between sub-optimal (computer simulation) and constant-dose treatment schemes (actual data).The point A indicates initial infection; B represents an advanced stage on AIDS without treatment; C represents the condition at the beginning of the sub-optimal treatment.

Fig. 2 -
Fig. 2 -Drug doses used in the computer simulations.