Appraisal of Phlebotomus argentipes habitat suitability using a remotely sensed index in the kala-azar endemic focus of Bihar, India

The transmission of kala-azar, or visceral leishmaniasis (VL), is strongly influenced by available moisture, medium to high temperatures and high relative humidity (RH) because the sandfly vector, Phlebotomus argentipes, requires damp surfaces and humid atmospheric conditions to survive long enough to transmit the infection (Bhunia et al. 2010a, Picado et al. 2010, WHO 2010a). Few studies have demonstrated the usefulness of remote sensing data in mapping the environmental risk factors, including diurnal temperature variations, ecoenvironments, vegetation health and land use practices, that control, in part, the distribution of tropical diseases such as leishmaniasis, schistosomiasis, trypanosomiasis and malaria (Robinson et al. 2002, Graves et al. 2009, Bhunia et al. 2012). Using remote sensing to identify the biophysical and environmental variables that are suitable for the development of various infectious diseases (Beck et al. 1994, Gillies & Carlson 1995, Kustas & Norman 1996, Gillies et al. 1997, Combie et al. 1999, Oscar & Malone 2001) allows for the determination of risk factors and the delimitation of areas at risk, thereby enabling a more rational allocation of resources for cost-effective control (Beck et al. 1997, 2000, Bhunia et al. 2010b).

Such environmental features may be used to appraise favourable environments for the development of vectors implicated in disease transmission (Werneck et al. 2002, Lindgren et al. 2004).
The land surface temperature (LST) and the renormalised difference vegetation index (RDVI) were the two parameters used in the present study of a kala-azar focus in Bihar, India (Fig. 1). Temperature affects kala-azar transmission in two ways: the minimum temperature may be so low that it prevents parasite and vector development, or the temperature may be so high that it increases the mortality of the vector. A monthly mean maximum temperature of < 37ºC and a monthly mean minimum temperature of > 7.2ºC are favourable ecological factors for the transmission of kala-azar (Napier 1926). Temperature is an important factor in determining the distribution of the sandfly; previous research showed that regions with temperatures that drop to 7ºC are rarely at risk for kala-azar epidemics and can be disregarded. Temperature can be measured at ground stations or with satellite instruments that measure the land surface temperature; the latter is an important advantage where meteorological stations are non-existent. Products derived from the Landsat-5 Thematic Mapper (TM) for 2009 and 2010 were used to calculate the LST and RDVI for the peak and lean seasons of sandflies. These are considered to be environmental risk factors for infection with kala-azar in India. For example, in a previous study by Gebre-Michael et al. (2004) in East Africa, Advanced Very High Resolution Radiometer (AVHRR) satellite data were used to map the occurrence of Phlebotomus orientalis and Phlebotomus (Synphlebotomus) martini, which were best predicted by the wet and dry season models, respectively, based on the remotely sensed normalised difference vegetation index (NDVI) and LST. That study, however, was limited by the coarse spatial resolution (1 km at nadir) of the AVHRR sensor.
The present study examines the usefulness of environmental parameters, such as the LST and the RDVI, for assessing their association with the abundance of the vector P. argentipes, with the aim of developing local risk maps and mapping kala-azar transmission in endemic areas of the Indian sub-continent.

MATERIALS AND METHODS
Study area - The district of Muzaffarpur was selected as a representative region of the endemic focus in Bihar. It lies between north latitudes 25º54'00"-26º23'00" and east longitudes 84º53'00"-85º45'00". The total population of the district is 4,778,610, with a density of 1,506 persons per km²; based on the 2011 census, the district had a decennial growth of 27.54% (census2011.co.in/census/district-/68-muzaffarpur.html). All 14 public health centres within the district are affected by this disease (Fig. 1). The drainage system of the area originates in the Himalayas and converges into the major rivers of the district, the Burhi Gandak, Baghmati and Baya, which generally flow in a south-easterly direction. The district experiences a severe winter followed by a very hot summer (44ºC) and then a heavy monsoon downpour. The district receives an average rainfall of 1,280 mm (muzaffarpur.bih.nic.in/) and has an average elevation of 47 m. The soil of the entire district is highly fertile, well drained and sandy, white coloured and very soft. The annual recharge of ground water bodies constitutes a replenishable or dynamic resource (cgwb.gov.in/District_Profile/Bihar/Muzaffarpur.pdf).
Sandfly collection - Adult sandflies were collected between September 2009 and February 2010 within the study area. The houses for the sandfly collection were identified in two steps. First, 51 villages were randomly selected within the district based on high case incidence data (an average incidence rate of more than 10/10,000 population in the last three consecutive years, 2006-2009). Second, in each village, 10 households were selected randomly at the centre of the village for the sandfly collection (Fig. 1). The sandfly collection was performed separately in each season for the same villages and at similar collection sites. To determine the sandfly density, flies were collected for 10 min from two indoor resting places (i.e., living room and cattle shed); this collection was performed by trained field workers using the hand-held aspirator technique with three-celled torches (Kumar et al. 2009, WHO 2010b). The sandfly collection was performed at dawn and dusk. The collected sandflies were stored in 70% ethanol in vials that were labelled with the area, village name and number of sandflies caught. All specimens were mounted on micro slides using Canada balsam as a mounting medium (Remaudière 1992). Lewis (1978) was followed for species identification. Sandfly density [man-hour density (MHD)] was calculated as the total number of sandflies collected per man per hour (Kumar et al. 2009, Mishra et al. 2012).
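The man-hour density described above is a simple rate, and a minimal sketch may make the unit conversion explicit. The function name, its parameters and the example figures (one collector, 20 min of collection, four flies) are hypothetical and are not taken from the study's field records.

```python
def man_hour_density(sandflies_collected, collectors, minutes_per_collector):
    """Return sandflies collected per man per hour (MHD).

    All three arguments are illustrative inputs, not values from the study.
    """
    man_minutes = collectors * minutes_per_collector
    if man_minutes <= 0:
        raise ValueError("collection effort must be positive")
    # Scale the catch from man-minutes to man-hours.
    return sandflies_collected * 60.0 / man_minutes


# Hypothetical household: one collector spends 10 min in each of two
# resting places (20 min in total) and catches four sandflies.
example_mhd = man_hour_density(4, collectors=1, minutes_per_collector=20)
```

Expressing the catch per unit of effort keeps densities comparable between sites where the number of collectors or the collection time differs.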
Image pre-processing - Landsat-5 TM images (path/row: 141/42) dated 22 October 2009 and 11 February 2010 were used in this study. Atmospheric conditions were very clear on the acquisition dates and the images were acquired through the United States Geological Survey (USGS) Earth Resources Observation Systems Data Centre. The Landsat images were further rectified to a Universal Transverse Mercator projection system and World Geodetic System (WGS) 84 datum based on 1:50,000 scale topographic maps and were resampled using the nearest neighbour algorithm with a pixel size of 30 m by 30 m for all bands, including the thermal band. The resultant root mean square error was found to be less than 0.5 m/pixel.
Retrieval of LST - The LST was derived from the corrected TM thermal band (10.40-12.50 µm). Satellite thermal infrared (TIR) sensors measure the top of the atmosphere radiances, from which the brightness temperatures (also known as the blackbody temperatures) were derived using Planck's law (Dash et al. 2002).
The following equation was used to convert the digital number of the Landsat TM TIR band into spectral radiance ($L_\lambda$), following Jensen (2005):

$$L_\lambda = L_{min} + K \times BV_{ijk}$$

where $K$ = radiance per bit of sensor count rate = $(L_{max} - L_{min})/C_{max}$, $BV_{ijk}$ = brightness value of the pixel, $C_{max}$ = maximum value of the sensor count range (e.g., 255 for 8-bit data), $L_{max}$ = radiance measured at detector saturation (W m⁻² sr⁻¹ µm⁻¹) and $L_{min}$ = lowest radiance measured by a detector (W m⁻² sr⁻¹ µm⁻¹).
Fig. 1: location map of the district of Muzaffarpur, Bihar, India.
The next step was to convert the $L_\lambda$ to the at-satellite brightness temperature [i.e., the black body temperature ($T_b$)] using an inversion formula (Singh et al. 1998, LPSO 2002) of Planck's function. The conversion formula is:

$$T_b = \frac{K_2}{\ln\left(\frac{K_1}{L_\lambda} + 1\right)}$$

where $T_b$ is the effective at-satellite temperature in Kelvin, $L_\lambda$ is the spectral radiance in W m⁻² sr⁻¹ µm⁻¹ and $K_1$ and $K_2$ are the pre-launch calibration constants. For Landsat-5 TM, $K_1$ = 607.76 W m⁻² sr⁻¹ µm⁻¹ and $K_2$ = 1260.56 K.
The temperature values obtained above are referenced to a blackbody. Corrections for spectral emissivity ($\varepsilon$) therefore become necessary according to the nature of the land cover. We used a formula proposed by Van de Griend and Owe (1993) to calculate $\varepsilon$ from the visible (RED) and near-infrared (NIR) spectral reflectance. In previous studies, Artis and Carnahan (1982) and Sobrino et al. (2004) developed a model that used the spectral surface $\varepsilon$ and NDVI values of the particular scene. To calculate the LST we used the following equation developed by Artis and Carnahan (1982):

$$LST = \frac{T_b}{1 + \left(\frac{\lambda T_b}{\rho}\right)\ln \varepsilon}$$

where $\lambda$ = wavelength of emitted radiance [for which the peak response and the average of the limiting wavelengths ($\lambda$ = 11.5 µm) (Markham & Barker 1985) was used], $\rho = h \times c/\sigma$ (1.438 × 10⁻² m K), $\sigma$ = Boltzmann constant (1.38 × 10⁻²³ J/K), $h$ = Planck's constant (6.626 × 10⁻³⁴ J s) and $c$ = velocity of light (2.998 × 10⁸ m/s).
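The three retrieval steps described above (digital number to spectral radiance, radiance to brightness temperature, and emissivity correction) can be chained as in the following minimal sketch. It uses the commonly published forms of these conversions and the constants quoted in the text; the function names are hypothetical, and the radiance limits and emissivity are left as caller-supplied inputs because the values used in the study are not stated here.

```python
import math

# Constants quoted in the text for the Landsat-5 TM thermal band.
K1 = 607.76           # W m^-2 sr^-1 um^-1
K2 = 1260.56          # Kelvin
WAVELENGTH = 11.5e-6  # m, wavelength of emitted radiance
RHO = 1.438e-2        # m K, h * c / sigma


def dn_to_radiance(dn, l_min, l_max, c_max=255):
    """Convert a digital number to spectral radiance (linear rescaling).

    l_min and l_max are the sensor radiance limits; they must be supplied
    by the caller (assumption: taken from the scene metadata).
    """
    k = (l_max - l_min) / c_max  # radiance per count
    return l_min + k * dn


def brightness_temperature(radiance):
    """Invert Planck's function to get the at-satellite temperature (K)."""
    return K2 / math.log(K1 / radiance + 1.0)


def land_surface_temperature(t_b, emissivity):
    """Apply the emissivity correction to a brightness temperature (K)."""
    return t_b / (1.0 + (WAVELENGTH * t_b / RHO) * math.log(emissivity))


def kelvin_to_celsius(kelvin):
    """Convert Kelvin to degrees Celsius."""
    return kelvin - 273.15
```

Because the logarithm of an emissivity below 1 is negative, the correction raises the temperature above the blackbody value, and an emissivity of exactly 1 leaves it unchanged.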
Derivation of RDVI - Amongst the classic and more recent vegetation indices based on the NIR/RED slope, only the RDVI index showed a comparable correlation with biophysical parameters, primarily with the leaf area index (LAI) (Vincini et al. 2007). RDVI is a hybrid index (Roujean & Breon 1995) between the difference vegetation index (DVI = NIR - RED) (Tucker 1979) and NDVI (Rouse et al. 1974) and should combine the advantages of both at low and high vegetation coverage. RDVI obtained greater field segmentation than NDVI indices, which saturate at low LAI (Zarco-Tejeda et al. 2005). To calculate the RDVI, we used the following equation:

$$RDVI = \frac{NIR - RED}{\sqrt{NIR + RED}}$$

The RDVI values were obtained for a 500 m diameter buffer zone on the 51 survey points. The relationship between the sandfly density and the minimum, maximum and mean RDVI values was obtained through correlation analyses.
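The hybrid nature of the RDVI can be illustrated with a short sketch using the standard definitions of the three indices. The reflectance values in the example are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def dvi(nir, red):
    """Difference vegetation index: NIR - RED."""
    return nir - red


def ndvi(nir, red):
    """Normalised difference vegetation index: (NIR - RED) / (NIR + RED)."""
    return (nir - red) / (nir + red)


def rdvi(nir, red):
    """Renormalised difference vegetation index: (NIR - RED) / sqrt(NIR + RED)."""
    return (nir - red) / math.sqrt(nir + red)


# Hypothetical surface reflectances for one vegetated pixel.
nir, red = 0.50, 0.14
pixel_rdvi = rdvi(nir, red)

# For NIR > RED, the RDVI equals the geometric mean of NDVI and DVI,
# which is why it is described as a hybrid of the two.
hybrid = math.sqrt(ndvi(nir, red) * dvi(nir, red))
```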
Environmental information extraction for model development - The RDVI and LST values for the peak (September) and lean (February) seasons were extracted from a circular 500 m buffer area around each of the 51 survey sites. For each buffer zone, the minimum, maximum and mean values of LST and RDVI were extracted. Scatter diagrams of the extracted mean values plotted against the sampling sites were used to define the range of minimum, maximum and mean LST and RDVI values.
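The buffer extraction described above amounts to collecting the pixels that lie within a given distance of a survey site and summarising them. The sketch below assumes a raster held as a nested list, a site located by its row and column, and a 30 m pixel; the function name and the toy grid are hypothetical, and the buffer radius is left as a parameter because the text does not settle whether 500 m is a radius or a diameter.

```python
import math


def buffer_stats(raster, center_row, center_col, radius_m, pixel_size_m=30.0):
    """Return (minimum, maximum, mean) of the pixels inside a circular buffer.

    A pixel is included when the distance between its centre and the
    centre of the site pixel is no greater than radius_m.
    """
    reach = int(radius_m // pixel_size_m)  # search window half-width in pixels
    values = []
    for r in range(max(0, center_row - reach),
                   min(len(raster), center_row + reach + 1)):
        for c in range(max(0, center_col - reach),
                       min(len(raster[r]), center_col + reach + 1)):
            distance = math.hypot(r - center_row, c - center_col) * pixel_size_m
            if distance <= radius_m:
                values.append(raster[r][c])
    return min(values), max(values), sum(values) / len(values)


# Toy 5 x 5 grid whose cell values are 0..24, read row by row.
toy = [[row * 5 + col for col in range(5)] for row in range(5)]
toy_min, toy_max, toy_mean = buffer_stats(toy, 2, 2, radius_m=30.0)
```

The same function can be applied to an LST grid and an RDVI grid in turn, giving the six per-site summaries used as explanatory variables.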
Statistical analysis - Data were analysed using the statistical software Stata version 10 (stata.com/). The month variable was transformed into "season" as an ordinal variable, considering the lowest vector density in winter (lean season) and the highest during September and October (peak season). We explored the relationships between the seven explanatory variables (i.e., minimum RDVI, maximum RDVI, mean RDVI, minimum LST, maximum LST, mean LST and season) and the dependent variable (P. argentipes density) by computing Pearson's correlation coefficient in the R environment. Student's t test (2-tailed) was used to assess the significance. In our analysis, only the density of female P. argentipes was considered because it is the proven vector of Indian kala-azar. Because of the correlations and interactions among the explanatory variables, the correlation coefficient may reveal only part of the relationship between P. argentipes density and the explanatory variables. Therefore, we also applied a multivariate linear regression analysis to identify variables that explain the density in combination with the other variables. Furthermore, the multivariate linear regression analysis provided the percentage of variability in P. argentipes density explained by the chosen explanatory variables. We applied a backward selection method to eliminate the variables that added little to the overall explanation of P. argentipes density. The results were considered to be significant if p < 0.05.
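The correlation step can be sketched as follows: Pearson's r is computed from paired site values and then converted to a t statistic with n - 2 degrees of freedom, which is the quantity a two-tailed Student's t test evaluates. This is an illustration of the general technique rather than the authors' script; the function names and the four paired values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical paired values (e.g., an environmental index and a density).
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
r = pearson_r(x, y)
t = t_statistic(r, len(x))
```

The p value is then read from the t distribution with n - 2 degrees of freedom, which statistical packages provide directly.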

RESULTS
Sandfly collection and density estimation - A total of 1,481 sandflies belonging to three species of the genera Phlebotomus and Sergentomyia were collected. Amongst the collected flies, P. argentipes was the most abundant species, accounting for 70.49% of sandflies (Table I), while Sergentomyia comprised 27.84% of the flies identified within the district. By contrast, Phlebotomus papatasi was very rare (4.62%) in Muzaffarpur. During the study period, 26.74% of the sandflies were collected during the lean season and 73.26% during the peak season.
Fig. 2 shows the MHD of the lean and peak seasons for all of the collection sites. As observed in Fig. 2, the MHD of P. argentipes was relatively low (average MHD 2.13) during the lean season (December-February) and relatively high (average MHD 5.60) during the peak season (September-November). During the peak season, the maximum MHD was recorded in the village of Chapra Bahar (10.90 MHD) of Mushari, whereas in the lean season the maximum MHD was recorded in the village of Bajidpur Manjhauli (5.5 MHD) of Bochaha. The MHD of P. argentipes during the lean season ranged from 0.12-5.50 [standard deviation (SD) ± 1.18], while in the peak season the MHD varied between 1.2-10.90 (SD ± 2.61).
LST and its relation to the P. argentipes density - Fig. 3 shows the spatial distribution of the surface temperatures during the lean and peak seasons derived from Landsat-5 TM. The LST ranged from 23.07-39.27ºC (mean ± SD 31.01ºC ± 5.05) for the peak season and 18.34-31.05ºC (mean ± SD 25.50ºC ± 4.76) for the lean season. The image indicated that the central part of the region exhibited high temperatures, primarily due to the presence of waste land, bare soil and fallow land. Other parts of the image, in the south and south-west, also showed high temperatures, primarily due to waste and fallow land.
The linear association between the minimum, maximum and mean LST and sandfly density was examined. During the lean season, sandfly density showed a strong, positive relationship with the maximum and mean LST values (r = 0.57, p < 0.035; r = 0.63, p < 0.002, respectively); the smallest correlation was found with the minimum LST (r = 0.31). During the peak season, the strongest positive correlation was found between sandfly density and the minimum LST (r = 0.65, p < 0.026), followed by the mean LST (r = 0.64, p < 0.016); the maximum LST exhibited a moderately significant relationship.
Overlaying the LST map on the spatial distribution of P. argentipes density demonstrated that, during the lean season, areas with LST values of 20-24ºC generally coincided with areas of high MHD (Fig. 3). During the peak season, the maximum MHD of female P. argentipes was recorded at LST values of 29-32ºC.

RDVI and its relation to P. argentipes density - The spatial distributions of RDVI for the peak and lean seasons derived from the Landsat image were mapped (Fig. 4). The RDVI values were estimated in the range of 0.08-1.18 (mean ± SD 0.63 ± 0.32) for the peak season and 0.08-1.85 (mean ± SD 0.96 ± 0.51) for the lean season. Lower RDVI values (blue colour) corresponded to a high density of water bodies and built-up areas within the study site. A lower RDVI value indicated less vegetation associated with a saturation deficit, whereas high RDVI values indicated dense and healthy vegetation cover in the study area. Higher RDVI values (green colour) were observed in the central and southern parts of the image due to land covered with mango and lychee plantations. Medium RDVI values (light green areas) were observed over agricultural croplands in the central, northern and south-eastern parts of the image. This result indicated not only distinct computation procedures for deriving the vegetation density, but also that the area covered by less vegetation has a saturation deficit.
Pearson's correlation coefficient between RDVI values and P. argentipes density was calculated. The results of our analysis showed that the minimum and mean RDVI values of both seasons (i.e., lean and peak) tended to be negatively correlated with the sandfly density. The highest negative correlation was found with the mean RDVI (r = -0.66, p < 0.002) and the maximum RDVI (r = -0.55, p < 0.020) during the peak season, followed by the minimum RDVI (r = -0.53, p < 0.045) during the lean season. The smallest correlation was observed with the maximum RDVI (r = 0.13, p < 0.238) during the lean season. RDVI indicators for both the lean and peak seasons showed a strong, negative correlation with the female P. argentipes density; thus a higher sandfly abundance may be associated with a lower RDVI value or less dense vegetation cover. As shown in Fig. 4, the spatial distribution of P. argentipes density and RDVI composites during the lean season illustrated that areas with medium RDVI values generally corresponded to areas with the highest P. argentipes MHD. The analysis also demonstrated a similar pattern of P. argentipes abundance during the peak season; e.g., the maximum MHD was recorded in the areas with medium to low RDVI values. This suggests that P. argentipes has a preference for areas that are relatively wet and have lower vegetation density.
A statistical model for delineating the association between environmental variables and P. argentipes abundance - Multivariate regression analysis was performed to determine the significant environmental variables that affect P. argentipes density. After removing the non-significant variables from the full regression model, the following explanatory variables remained: mean RDVI, mean LST, minimum LST and season (lean and peak). The regression coefficients and significance levels are shown in Table II. The final model used to assess sandfly density was the following equation: Y = -15.69 + (-2.19 × mean RDVI) + (0.57 × mean LST) + (0.50 × minimum LST) + (-2.88 × season), where Y (MHD) is the estimated sandfly density. The final model was highly significant (F = 68.66, p < 0.0001), which meant that these four variables, when considered together, were significantly associated with P. argentipes abundance. The adjusted R² = 0.74 indicated that nearly 74% of the variance of P. argentipes density could be explained by these four significant environmental variables. The results indicated that sandfly density increases with (i) decreasing mean RDVI, (ii) increasing mean LST and (iii) increasing minimum LST and (iv) varies with season (peak vs. lean, with the lean season as the referent category).
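As a worked illustration, the final regression equation can be evaluated for a single site. The coefficients below are those reported in the text; everything else is an assumption made for the sketch: the function name, the input values, LST expressed in degrees Celsius and, in particular, the coding of season as 0 for lean (the referent category) and 1 for peak, which the text does not state explicitly.

```python
# Coefficients of the final model as reported in the text.
INTERCEPT = -15.69
B_MEAN_RDVI = -2.19
B_MEAN_LST = 0.57
B_MIN_LST = 0.50
B_SEASON = -2.88


def predict_mhd(mean_rdvi, mean_lst, min_lst, season):
    """Estimate sandfly density (MHD) from the fitted linear model.

    season: assumed coding, 0 = lean (referent), 1 = peak.
    mean_lst, min_lst: assumed to be in degrees Celsius.
    """
    return (INTERCEPT
            + B_MEAN_RDVI * mean_rdvi
            + B_MEAN_LST * mean_lst
            + B_MIN_LST * min_lst
            + B_SEASON * season)


# Hypothetical site: mean RDVI 0.6, mean LST 31 C, minimum LST 25 C, peak season.
estimate = predict_mhd(0.6, 31.0, 25.0, season=1)
# -15.69 - 1.314 + 17.67 + 12.5 - 2.88 gives an estimate of about 10.29 MHD.
```

Holding the other inputs fixed, raising the mean RDVI lowers the estimate and raising either LST term increases it, which matches the direction of the associations reported above.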

DISCUSSION
Like many other diseases, kala-azar is a communicable and infectious disease and its distribution, incidence and prevalence are greatly influenced by environmental factors. Our primary aim was to use satellite-derived environmental variables (e.g., LST and RDVI) as proxies for air temperature and vegetation conditions for P. argentipes habitat suitability. In the present study, medium resolution satellite data were used to reflect different aspects of the natural environment of P. argentipes in the study area during the peak and lean seasons. Previously, researchers have used different approaches to study the risk of leishmaniasis transmission in different parts of the world (Thomson et al. 1999, Gebre-Michael et al. 2004, Valderrama-Ardila et al. 2010, Barón et al. 2011, Ölgen et al. 2012). However, these studies were not based on variables measured using medium or high-resolution spatial data and they were conducted on different vector species. Thus, this study is the first attempt to identify the suitable habitat for P. argentipes abundance using these environmental variables in the Indian subcontinent. In previous studies, the density of the vector P. argentipes started increasing in the pre-monsoon and post-monsoon seasons, when the mean temperature ranged from 27.5-31ºC and the RH ranged from 73-93% (Sharma & Singh 2008, Bhunia et al. 2010b). These studies were based on ground observations and data derived from the Indian Meteorological Department (IMD) station; satellite data were not used to estimate the surface temperature. During the warmer months, when the temperature in the area ranges between 40-46ºC, the density is minimal (Napier 1926, Smith 1959, Ranjan et al. 2005); the species also disappeared during the winter months, i.e., the lean season (Smith 1959, ICMR 2010). This contrasts with the results obtained from remote sensing data, and the percentage surface area occupied by the LST and the RDVI may be used to estimate the abundance of female P. argentipes on the Indian sub-continent. The analysis can identify the probable areas of P. argentipes abundance such that areas mapped as transmission and non-transmission zones appear to accurately fit the real situation. Such a study will help control kala-azar cases vis-à-vis the vector on the Indian sub-continent.
The intricacies of LST determination may be relevant to epidemiologists when LST is used as a proxy index in a kala-azar risk model to delineate the favourable areas for vector abundance. Determination of the spatial and temporal variabilities in LST, for example, may be used as a correlative index of vector abundance (Malone et al. 1994, Rogers et al. 1996). The LST is computed from the thermal channel of the Landsat TM (channel 6). For each single measurement covering an area, LST integrates the temperatures at the surface of that area, e.g., the soil and top of the canopy temperatures. Vector abundance, and with it the number of cases, was lower or negligible when the temperature increased and/or decreased. In our study, we found that the mean and minimum LST values were significantly associated with P. argentipes abundance, and the maximum MHD during the peak season (i.e., October) was recorded at LST values ranging from 29-32ºC. Thus, we suggest that the utility of LST in disease monitoring may be significantly enhanced in epidemiological research, especially for kala-azar transmission. For a more accurate result, LST data collected monthly would be required as input for the kala-azar risk model to build a more representative average of the studied year's LST through multi-temporal analysis.
In epidemiology and more generally, vegetation type may be most relevant in that it reflects and modifies land surface processes such as energy or materials exchange. For example, there are trends towards the de-emphasis of species composition and an increased focus on rate-limiting factors associated with nutrient availability, resource scaling and carbon allocation (Maguire et al. 1996, Goetz & Prince 1999, Carneiro et al. 2004, Bavia et al. 2005). In this circumstance, an indirect link is established between the vegetation index and P. argentipes density, such that the vegetation index may be used as a secondary variable for prediction. In our findings, the relationships between the RDVI and both the saturation deficit and vector density, which were negatively associated, are likely to be non-linear when they do occur. Mean RDVI values are extremely valuable and effective for analysing the conditions of P. argentipes abundance. In this study, we produced a detailed map of RDVI (Fig. 4) that was calculated for a 500 m distance from the centre of the sampling sites and analysed for relationships with the abundance of a sandfly species. In this event, our results prompted the hypothesis that the green biomass would have responded to the same environmental triggers as sandflies at that location. Similarly, there are frequently robust associations of disease and vector abundance with the amount and density, rather than the species composition, of vegetation cover (Rejmankova et al. 1991, Hay et al. 1998, Thompson et al. 2002).
Our study suggests that these two factors (i.e., LST and RDVI) are important for the successful determination of P. argentipes abundance on the Indian subcontinent. The findings from this study also support the hypothesis that the extent to which remotely sensed data have been useful for analysing the local environmental conditions where VL cases as well as vectors have and have not been observed depends on LST and RDVI. However, an important limitation of our study was the sampling of sandflies once during each season for only 10 min. This limited collection might have led to high variability in the data and bias in the associations between the analysed factors and the P. argentipes abundance. Nevertheless, our results are of interest and add to the available knowledge in the field by elucidating possible relationships between P. argentipes density and environmental variables; they also provide opportunities for further investigation. The information from this study improves our understanding of the effects of on-going ecological processes that affect P. argentipes abundance and might be useful for developing new inputs to kala-azar risk models for effective VL control programmes on the Indian sub-continent by providing valuable information on the preferred periods and sites for applying insecticides.
Financial support: ICMR Senior Research Fellowship
SK and GSB contributed equally to this work.
+ Corresponding author: drpradeep.das@gmail.com
Received 11 June 2012
Accepted 12 November 2012

Fig. 4: distribution of the renormalised difference vegetation index (RDVI) and vector density in 51 villages during the lean season (A) and distribution of the RDVI and vector density in the same villages during the peak season (B).

Fig. 3: distribution of the land surface temperature (LST) and vector density in 51 villages during the lean season (A) and distribution of the LST and vector density in the same villages during the peak season (B).

TABLE I
Season-wise collection of sandflies in the district of Muzaffarpur, Bihar, India (September 2009-February 2010)