Can chlorophyll-a in meso-oligotrophic shallow waters be estimated using statistical approaches and empirical models from MODIS imagery?

Accurate estimation of chlorophyll-a (Chl-a) concentration in inland waters through remote-sensing techniques is complicated by local differences in the optical properties of water. In this study, we applied multiple linear regression (MLR), artificial neural network (ANN), nonparametric multiplicative regression (NPMR) and four commonly used models (Appel, Kahru, FAI and O14a) to estimate the Chl-a concentration from combinations of spectral bands from the MODIS sensor. The MLR, NPMR and ANN models were calibrated and validated using in-situ Chl-a measurements. The results showed that a simple and efficient model, developed and validated through multiple linear regression analysis, offered advantages (i.e., better performance and fewer input variables) in comparison with the ANN, NPMR and four commonly used models (Appel, Kahru, FAI and O14a). In addition, we observed that in a large shallow subtropical lake, where wind and hydrodynamics are essential factors in the spatial heterogeneity of the Chl-a distribution, the MLR model fitted to each point-specific dataset performed better than the model fitted to the total dataset, which suggests that it would not be appropriate to generalize a single model to estimate Chl-a in such large shallow lakes from total datasets. Our approach is a useful tool to estimate Chl-a concentration in meso-oligotrophic shallow waters and corroborates the spatial heterogeneity in these ecosystems.


INTRODUCTION
Chl-a is an important indicator of the trophic state in lakes and reservoirs, showing patterns associated with internal processes and natural stressors (HONEYWILL; PATERSON; HAGERTHEY, 2002; DUKA; CULLAJ, 2009; SCHALLES et al., 1998). Detection and quantification of Chl-a are essential for assessing water quality in these ecosystems, as the concentration of this compound provides critical knowledge of the phytoplankton community. However, most traditional methods used to retrieve Chl-a concentrations are based on in-situ measurements, and are time-consuming and difficult to apply to large areas (GHOLIZADEH; MELESSE; REDDI, 2016; KASPRZAK et al., 2008), especially when frequent measurements are needed. In recent years, Chl-a concentrations have often been estimated by remote sensing, which is an effective means of rapid, high-frequency data acquisition (ALLAN et al., 2011; OLMANSON; BREZONIK; BAUER, 2011; SHI et al., 2015). In addition, this technique allows one to obtain information on remote sites and larger areas.
Currently, bio-optical algorithms based on radiometric measurements (e.g., solar irradiance, sky radiance) are used to obtain quantitative information on substances present in water bodies (OGASHAWARA; MISHRA; GITELSON, 2017). These algorithms can be classified as empirical, semi-empirical, semi-analytical, quasi-analytical or analytical (OGASHAWARA, 2015), with empirical and semi-empirical algorithms being commonly used to retrieve Chl-a concentrations (DUAN et al., 2010; LE et al., 2013; RITCHIE; ZIMBA; EVERITT, 2003; ROSA NETO et al., 2015). Limitations of Chl-a detection by remote sensing include atmospheric correction methods and sensor limitations, as well as the influences of detritus, the presence of colored dissolved organic matter (CDOM), and scattering by Total Suspended Matter (TSM), which are difficult to detect because they affect the optical properties of water (DARECKI; STRAMSKI, 2004; HU, 2009; WU et al., 2009). Another limitation is the light reflected off the bottom, which may affect the accuracy of the empirical algorithm because the signal received by the sensor varies as a function of the wavelength and with the clarity of the water (CARDOSO et al., 2012; SHUBHA, 2000; LEE et al., 2001).
Monitoring Chl-a concentrations by remote sensing in freshwater environments such as shallow lakes has limitations, which are associated with the low signal-to-noise ratio and the optical complexity of these waters, which is highly variable between and even within water bodies (GHOLIZADEH; MELESSE; REDDI, 2016; PALMER; KUTSER; HUNTER, 2015b). In addition, in large shallow lakes, which are rarely studied, especially in meso-oligotrophic environments subjected to intense wind regimes, limitations such as scale factors (e.g., representative monitoring at a single point), spatial gradients and hydrodynamics might significantly affect the interpretation of results from remote-sensing data and the performance of the empirical algorithms (CHAVULA et al., 2009; OGASHAWARA et al., 2014; RUIZ-VERDÚ et al., 2016).
In this study, we (1) proposed and compared multivariable empirical models to retrieve Chl-a concentrations using MODIS data, (2) determined the accuracy of these models in a large shallow subtropical lake, and (3) verified whether generalized models can represent the spatial heterogeneity of the lake. We used multiple linear regression, artificial neural network, nonparametric multiplicative regression, and four commonly used models (Appel, Kahru, FAI and O14a) to retrieve the Chl-a concentration in Lake Mangueira, a large shallow subtropical lake located in Southern Brazil. This approach is the first attempt to assess the application of remote-sensing techniques in such a system, in which oligotrophic conditions are common most of the time (CROSSETTI et al., 2013; LIMA et al., 2016) and the spatial heterogeneity involves large differences between the littoral and pelagic zones (CARDOSO et al., 2012; CROSSETTI et al., 2013; THEY; MARQUES; SOUZA, 2012).
Unlike previous studies, our approach is a first step toward obtaining comprehensive and reliable empirical models for meso-oligotrophic shallow waters, which may be further tested with remotely sensed data in order to retrieve the Chl-a concentration and help to understand the heterogeneity of shallow lakes.

Study area
Lake Mangueira is a large shallow subtropical waterbody located in Rio Grande do Sul, Brazil. The lake is 820 km² in area, 90 km long and 3-10 km wide, with a mean depth of 2.6 m and a maximum depth of 6.9 m (Figure 1). Its trophic state ranges from oligotrophic to mesotrophic (CROSSETTI et al., 2013; FRAGOSO JUNIOR et al., 2011). The regional climate is subtropical, with a mean annual temperature of 16 °C and annual rainfall between 1,800 and 2,200 mm (KOTTEK et al., 2006).

Field data
The field data were collected over a period of 9 years (2001 to 2009) at three sampling points in the North, Central and South parts of Lake Mangueira (TAMAN, TAMAC and TAMAS, respectively), which were sampled two to three times a year. The dataset contained concentrations of Chl-a, Secchi disk depth and other limnological variables (i.e., Total Suspended Solids (TSS), Dissolved Oxygen (DO), pH and Depth) collected from the three sampling points, and indicated a wide range of Chl-a concentrations at the Central point (Table 1). We used the data from the 2001-2005 period for calibration and from the 2006-2009 period for model validation.
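The temporal split described above can be sketched in a few lines of Python. This is an illustrative sketch only: the `Sample` record, the `split_by_period` helper and the demonstration values are hypothetical and are not part of the original field dataset or processing chain.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    station: str  # "TAMAN", "TAMAC" or "TAMAS"
    year: int     # sampling year (2001 to 2009)
    chl_a: float  # in-situ Chl-a concentration


def split_by_period(
    samples: List[Sample], last_calibration_year: int = 2005
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into calibration (<= 2005) and validation (>= 2006) sets."""
    calibration = [s for s in samples if s.year <= last_calibration_year]
    validation = [s for s in samples if s.year > last_calibration_year]
    return calibration, validation


# Hypothetical records for illustration only (not measured values).
demo = [
    Sample("TAMAN", 2001, 1.0),
    Sample("TAMAC", 2005, 2.0),
    Sample("TAMAS", 2006, 3.0),
    Sample("TAMAC", 2009, 4.0),
]
calibration_set, validation_set = split_by_period(demo)
```

Splitting by period rather than at random keeps the validation data strictly later in time than the calibration data.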
For Chl-a determination, surface water samples were filtered through Whatman GF/F filters, and Chl-a was extracted with 90% ethanol and measured by spectrophotometry (JESPERSEN; CHRISTOFFERSEN, 1987). Dissolved Oxygen, pH and Depth were measured with a multiparameter probe (YSI 6600). Total Suspended Solids were assessed gravimetrically by water evaporation in porcelain dishes (EATON; CLESCERI, 1999) and the Water Transparency (WT) was measured using a Secchi disk.

Satellite data and image processing
MODIS-Terra (MOD09GA) and MODIS-Aqua (MYD09GA) Level-2 Surface Reflectance products with a daily frequency and a 500-m spatial resolution were downloaded via the web interface of the Land Processes Distributed Active Archive Center (LP DAAC), at the US Geological Survey EROS Data Center (LPDAAC, 2015). These products have been successfully used in different inland waters, including oligo- to mesotrophic reservoirs (surface area: 778 km²) (CURTARELLI et al., 2015; OGASHAWARA et al., 2014), Amazon floodplain lakes (surface area: 2000 km²) (NOVO et al., 2006), Minnesota lakes (lake surface area > 160 ha) (KNIGHT; VOTH, 2012), Poyang Lake (surface area: 3000 km²) (QI et al., 2016), and coastal lagoons (900 km²) (SRICHANDAN et al., 2015). The cloud cover fraction was obtained using a cloud mask product from MODIS product level 2 (named M*D35L2) (ACKERMAN et al., 1998). In addition, the MODIS quality product was used to assess image quality (ROY et al., 2002). The sinusoidal projection images (ISIN) from the LP DAAC platform were transformed to geographic projection (Lat/Long) using the MODIS Reprojection Tool. The MOD09GA, MYD09GA and M*D35L2 products were processed and analysed using a MatLab routine. To retrieve Chl-a concentrations, we used the surface reflectance from the first six spectral bands of the MOD09GA and MYD09GA products, which have the following centre wavelengths: band 1: 648 nm; band 2: 858 nm; band 3: 470 nm; band 4: 555 nm; band 5: 1240 nm; and band 6: 1640 nm.
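As an illustration of how the band values can be prepared before modelling, the following Python sketch converts the stored integers into surface reflectance. It is not the MatLab routine used in this study. The scale factor (0.0001), fill value (-28672) and valid range (-100 to 16000) are assumed here from the MOD09 product documentation and should be checked against the user guide for the collection in use; the `to_reflectance` helper is hypothetical.

```python
from typing import Optional

# Centre wavelengths (nm) of the six bands used in this study.
BAND_CENTRE_NM = {1: 648, 2: 858, 3: 470, 4: 555, 5: 1240, 6: 1640}

# Assumed MOD09GA/MYD09GA encoding; verify against the product user guide.
SCALE_FACTOR = 0.0001
FILL_VALUE = -28672
VALID_RANGE = (-100, 16000)


def to_reflectance(raw: int) -> Optional[float]:
    """Convert a stored 16-bit integer to reflectance, or None if invalid."""
    if raw == FILL_VALUE or not (VALID_RANGE[0] <= raw <= VALID_RANGE[1]):
        return None
    return raw * SCALE_FACTOR


# Hypothetical pixel values for bands 1 to 6 (not real data).
raw_pixel = {1: 350, 2: 210, 3: 480, 4: 520, 5: 90, 6: FILL_VALUE}
reflectance = {band: to_reflectance(value) for band, value in raw_pixel.items()}
```

Pixels for which any band returns `None` would normally be dropped before fitting or applying a Chl-a model.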
We downloaded a total of 50 MODIS images (32 MODIS-Terra and 18 MODIS-Aqua) matching the days of in-situ Chl-a measurements (Table 2). From this total, we selected 16 images for model development and calibration, and 10 images for model verification; the remaining 24 images were discarded due to the presence of clouds or shadows over the measurement sites, or to poor image quality as flagged by the MODIS quality product.

Multivariable models to retrieve Chl-a concentration
Approaches based on MLR (using exclusion-inclusion sequential models), ANN and NPMR were used to retrieve Chl-a concentrations, using the MOD09GA and MYD09GA reflectance data products.
An empirical model based on MLR was developed using two search criteria. The first used a step-wise forward regression on the reflectance of each of the MODIS spectral bands, in which bands were entered based on their F value and p-value. This method generates an MLR of the bands with the highest correlation. The second criterion was based on a sequential automatic exclusion-inclusion search using the Akaike Information Criterion (AIC) (BOZDOGAN, 1987). Using this criterion, an R code was developed to fit a sub-model for each of the three datasets (North, Central and South points) and for the total dataset in Lake Mangueira.
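The exclusion-inclusion search on AIC can be illustrated with the minimal Python sketch below. It is not the R code developed in this study: the function names, the synthetic data and the Gaussian form of the AIC (n ln(RSS/n) + 2k, up to a constant) are assumptions made for the example. At each step the sketch tries adding or dropping one band and keeps the move that lowers the AIC most, stopping when no move improves it.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def residual_ss(rows, y, bands):
    """Residual sum of squares of an OLS fit of y on an intercept plus bands."""
    design = [[1.0] + [row[b] for b in bands] for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    coefs = solve(xtx, xty)
    return sum((yi - sum(c * v for c, v in zip(coefs, d))) ** 2
               for d, yi in zip(design, y))


def aic(rss, n, n_params):
    """Gaussian AIC up to an additive constant (assumed form)."""
    return n * math.log(rss / n) + 2 * n_params


def stepwise_aic(rows, y, candidates):
    """Add or drop one band at a time while the AIC keeps decreasing."""
    n = len(y)
    selected = []
    best = aic(residual_ss(rows, y, selected), n, 1)
    while True:
        trials = [selected + [b] for b in candidates if b not in selected]
        trials += [[s for s in selected if s != b] for b in selected]
        scored = [(aic(residual_ss(rows, y, t), n, len(t) + 1), t) for t in trials]
        if not scored:
            return sorted(selected), best
        score, trial = min(scored, key=lambda st: st[0])
        if score >= best:
            return sorted(selected), best
        best, selected = score, trial


# Synthetic reflectances and Chl-a values for illustration only.
random.seed(0)
bands = [1, 2, 3, 4]
rows = [{b: random.uniform(0.0, 0.2) for b in bands} for _ in range(30)]
chl = [2.0 + 30.0 * r[1] - 20.0 * r[4] + random.gauss(0.0, 0.05) for r in rows]
chosen, score = stepwise_aic(rows, chl, bands)
print(chosen, round(score, 2))
```

With real data the same search would be run separately on the North, Central, South and total datasets, which is how different band subsets can arise for each point.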
An ANN with one hidden layer was applied (Figure 2). The reflectances of each of the spectral bands for the three points (North, Central and South) and the total dataset were the input values, and the Chl-a concentration was the output node. The training algorithm was back-propagation with heuristic acceleration techniques (VOGL et al., 1988), performed in MatLab. The Purelin function (DEMUTH; BEALE; HAGAN, 2008) was used as the activation function in the hidden and output layers.
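For readers unfamiliar with the Purelin (linear) transfer function, the Python sketch below shows a forward pass through a one-hidden-layer network that uses it in both layers. The weights are hypothetical and no training is performed; the sketch is not the MatLab implementation used here. It also illustrates a general property of such a configuration: with linear activations throughout, the network output is an affine function of the inputs, so its weights can be collapsed into a single set of coefficients.

```python
def purelin(x):
    """Linear transfer function: returns its input unchanged."""
    return x


def forward(x, w1, b1, w2, b2):
    """Forward pass: inputs -> hidden layer (purelin) -> one output (purelin)."""
    hidden = [purelin(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return purelin(sum(w * h for w, h in zip(w2, hidden)) + b2)


def collapse(w1, b1, w2, b2):
    """Equivalent single-layer coefficients: output = weights . x + bias."""
    n_in = len(w1[0])
    weights = [sum(w2[j] * w1[j][i] for j in range(len(w2))) for i in range(n_in)]
    bias = sum(w2[j] * b1[j] for j in range(len(w2))) + b2
    return weights, bias


# Hypothetical weights for 3 inputs (band reflectances) and 2 hidden nodes.
w1 = [[0.5, -1.0, 2.0], [1.5, 0.3, -0.7]]
b1 = [0.1, -0.2]
w2 = [2.0, -1.0]
b2 = 0.4
x = [0.03, 0.02, 0.05]

network_output = forward(x, w1, b1, w2, b2)
weights, bias = collapse(w1, b1, w2, b2)
linear_output = sum(w * xi for w, xi in zip(weights, x)) + bias
```

In this example `network_output` and `linear_output` agree to within floating-point rounding.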
NPMR, implemented in the software HyperNiche 1.0, was also used to select the best models describing the relationships between Chl-a and the reflectance values from the MODIS spectral bands (McCUNE, 2006; 2004). Two matrices were used as input in NPMR: (1) a matrix of in-situ Chl-a measurements, and (2) a matrix of reflectance values from the MODIS spectral bands. Here, the local mean NPMR with uniform weights (Species Occurrence model, SpOcc) was used (PETERSON, 2000; McCUNE et al., 2003). This model uses a simple kernel function to estimate the response variable (Chl-a in our case) and gives equal weight to all sampling points within the window, while all observations outside the window are given zero weight. In addition, we assessed the performance of four models (three empirical models (Kahru, Appel and O14a) and one semi-empirical model (FAI); Table 3) that are commonly used to estimate Chl-a concentrations in aquatic environments (CURTARELLI et al., 2015; HUANG et al., 2014; HU et al., 2010; TARRANT; NEUER, 2009).
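The local-mean estimator with uniform weights can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the HyperNiche implementation: the `local_mean` function, the window half-widths and the data are hypothetical, and the sketch omits model selection and any leave-one-out handling. Each observation receives weight 1 only if it lies inside the window for every predictor (the weights are multiplied across predictors), and the estimate is the mean response of the observations with non-zero weight.

```python
from typing import Optional, Sequence


def local_mean(
    target: Sequence[float],
    predictors: Sequence[Sequence[float]],
    response: Sequence[float],
    half_widths: Sequence[float],
) -> Optional[float]:
    """Uniform-kernel local mean; returns None if the window is empty."""
    total = 0.0
    count = 0
    for x, y in zip(predictors, response):
        weight = 1
        for xj, tj, wj in zip(x, target, half_widths):
            # Uniform kernel: 1 inside the window, 0 outside.
            weight *= 1 if abs(xj - tj) <= wj else 0
        total += weight * y
        count += weight
    return total / count if count else None


# Hypothetical reflectances in two bands and Chl-a values (not real data).
reflectances = [(0.02, 0.05), (0.03, 0.05), (0.10, 0.20)]
chl_a = [2.0, 4.0, 10.0]

estimate = local_mean((0.025, 0.05), reflectances, chl_a, (0.01, 0.01))
no_neighbours = local_mean((0.50, 0.50), reflectances, chl_a, (0.01, 0.01))
```

Because only observations inside the window contribute, the estimator depends strongly on how densely the predictor space is sampled.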

Model accuracy assessment
To evaluate the accuracy of the models, the coefficient of determination (R²), Bias, Root-Mean-Square-Error (RMSE), and relative RMSE (RMSE%) were calculated between in-situ observed and model-estimated Chl-a concentrations.
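The exact formulas are not restated here, so the Python sketch below uses common definitions as assumptions: Bias as the mean of (estimated minus observed), RMSE as the square root of the mean squared error, RMSE% as RMSE divided by the mean observed value, and R² as the squared Pearson correlation between observed and estimated values. The `accuracy_metrics` function and the example values are hypothetical.

```python
import math
from typing import Dict, Sequence


def accuracy_metrics(
    observed: Sequence[float], estimated: Sequence[float]
) -> Dict[str, float]:
    """R2, Bias, RMSE and RMSE% under the assumed definitions stated above."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_est = sum(estimated) / n
    errors = [e - o for o, e in zip(observed, estimated)]
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err ** 2 for err in errors) / n)
    cov = sum((o - mean_obs) * (e - mean_est)
              for o, e in zip(observed, estimated))
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    # Squared Pearson correlation; undefined if either series is constant.
    r2 = cov ** 2 / (var_obs * var_est)
    return {
        "R2": r2,
        "Bias": bias,
        "RMSE": rmse,
        "RMSE%": 100.0 * rmse / mean_obs,
    }


# Hypothetical values for illustration only.
metrics = accuracy_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 4.5])
```

In this example the estimates are offset from the observations by a constant, so the correlation is perfect while Bias and RMSE are both non-zero, which shows why the metrics are reported together.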

Model development and calibration
The performance of the multiple linear regression, ANN, and NPMR models for the three points (North, Central and South) and the total dataset for Lake Mangueira showed good agreement for the Central and South datasets. In addition, the MLR model performed best for each dataset (Figure 3).
The MLR model assessment indicated that it performed best for the South point (R² = 0.83, Bias = 0.00, RMSE% = 10.6%), compared to the Central point (R² = 0.77, Bias = -0.31, RMSE% = 14.6%) and the North point (R² = 0.61, Bias = 0.00, RMSE% = 19.6%). Also, the MLR model indicated that the MODIS channels most strongly related to in-situ Chl-a values were spectral bands 2, 3 and 6 at the North point, bands 1, 4, 5 and 6 at the Central point, and bands 1, 2, 5 and 6 at the South point. For the total dataset, spectral bands 1, 3, 5 and 6 were the most important MODIS channels.

Model validation and comparative assessment
The different models generally showed lower accuracy during the validation phase (Figure 5). However, the MLR and ANN models showed a smaller loss of performance (R² = 0.50, Bias = -1.36, RMSE% = 29.9% and R² = 0.46, Bias = -1.78, RMSE% = 24.6%, respectively) and maintained satisfactory accuracy compared to the NPMR model (e.g., Central point).
In all cases, the performance of our multivariable models using the total dataset was lower than with the specific point dataset. The same occurred in comparisons against four commonly used models, Appel, Kahru, FAI, and O14a (Figure 6).
We found that the MLR was reliable in terms of its estimation of Chl-a concentrations for the three points in the lake, although it was less accurate for the South point (R² = 0.37, Bias = -3.54, RMSE% = 34.40%). The ANN model performed best for the Central point (R² = 0.46, Bias = -1.78, RMSE% = 24.60%), but did not provide satisfactory results for the North and South points. The NPMR model had limitations in retrieving the Chl-a concentration for the North (N), Central (C) and South (S) points (R² ≤ 0.19, Bias ≤ 5.37, RMSE% ≤ 42.50%). The assessments with the four models (Appel, Kahru, FAI and O14a) showed lower accuracy than the MLR model (Figure 6). The Kahru model gave the best Chl-a retrieval (R² ≤ 0.37, Bias ≤ 9.09, RMSE% ≤ 36.4%), followed by the O14a model (R² ≤ 0.35, Bias ≤ 5.53, RMSE% ≤ 39.97%); the Appel model gave the poorest results (R² ≤ 0.26, Bias ≤ 9.07, RMSE% ≤ 35.04%). Taking into account the total dataset, the MLR, ANN and NPMR models and the four models (Appel, Kahru, FAI and O14a) also showed low performance compared with the in-situ Chl-a values.

Performance of Chl-a algorithms
For the models proposed here, the MLR and ANN models performed better than the other models (R² and RMSE%, Figure 6), in agreement with the performance found previously for oligo- to mesotrophic waters using MLR models (CURTARELLI et al., 2015; OGASHAWARA et al., 2014) and for eutrophic shallow waters using Back Propagation neural networks (BPs) and Radial Basis Function neural networks (RBFs) (WU et al., 2009). Nevertheless, the performance of the models proposed and tested was better during the calibration period than during the validation period. This loss of performance during validation is expected behavior, which can be explained by the sample size and the difference in the datasets used, as has been observed in other studies (GIANCRISTOFARO; SALMASO, 2007; ORTH et al., 2015).
Although the NPMR model is generally stronger than the MLR model (McCUNE et al., 2003; YOST, 2008), our results indicated a significant difference in the performance of the two models, with the NPMR model performing worse during both the calibration (Figure 3) and the validation (Figure 5) periods. This result indicates that the application of NPMR models to retrieve the Chl-a concentration from MODIS imagery in Lake Mangueira would not be recommended based on the dataset available. For the four models assessed (Appel, Kahru, FAI and O14a), our findings obtained from the total datasets were consistent with those of El-Alem et al. (2012), who obtained satisfactory results for Chl-a > 50 µg L⁻¹ using the Kahru and FAI models.
In Lake Mangueira, the Chl-a concentrations estimated using the total dataset showed the lowest correlation for every model, as shown for the proposed models during calibration (Figure 3) and validation (Figure 5) and for the four models assessed (Figure 6). Therefore, it would not be appropriate to generalise a single model to retrieve the Chl-a concentration in Lake Mangueira from a total dataset; rather, it would be necessary to subdivide the datasets into regions. This can be explained by the meso-oligotrophic conditions and the spatial heterogeneity, where the Chl-a concentration decreases from north to south, as has been observed or taken into account in other studies (CROSSETTI et al., 2013; FRAGOSO JUNIOR et al., 2011; RODRIGUES, 2009). External forcing factors, such as the wind (both speed and direction), can also affect the temporal and spatial distribution of phytoplankton in large shallow lakes (CARDOSO et al., 2012; CARRICK; ALDRIDGE; SCHELSKE, 1993; KONG; FAO, 2005; WEBSTER; HUTCHINSON, 1994).
In these environments, wind conditions impose stress on the system, in which the biological responses (e.g., Chl-a) can be interconnected, in contrast to environments such as temperate lakes and oceans. In Lake Mangueira, a previous study (FRAGOSO JUNIOR et al., 2008) showed a trend of phytoplankton aggregation in the southwest and northeast areas, varying with the wind velocity and direction, and coinciding with the orientation of the lake.
Another key influencing factor is the larger biomass of submerged macrophytes in the South area, which can inhibit phytoplankton production in this area (THEY et al., 2014) and increase the water transparency (FERREIRA, 2009).
Regarding the proposed models, our results corroborated the spatial heterogeneity in the Chl-a distribution in Lake Mangueira, as observed previously in other studies (CROSSETTI et al., 2013; FRAGOSO JUNIOR et al., 2011; 2008).
The use of MLR-based models to retrieve Chl-a from MODIS data is not new. However, our approach showed an improvement over previously published models (e.g., CHANG et al., 2012; OGASHAWARA et al., 2014; WU et al., 2009) due to the implementation of search criteria based on a sequential automatic exclusion-inclusion search for selecting parameters. In addition, the estimation of Chl-a concentrations based on MLR presented advantages (e.g., few input variables and explicit calculation of the coefficients of the selected independent variables) in comparison with the ANN and NPMR models, for which it is difficult to interpret the relationship between input and output variables ("black-box" models) (WU et al., 2014). Despite the limitations of the MLR model related to its applicability and transferability between different sensors and water bodies, and to the biological interpretation of principal components of the regression (GRAHAM, 2003), the MLR model is parsimonious, requiring only a small number of parameters compared to the other models.

Algorithm limitations for estimating Chl-a from MODIS imagery
Several studies have shown that algorithms for retrieving Chl-a from MODIS imagery may be limited by the low signal-to-noise ratio and the noise-equivalent Chl-a concentration of MODIS (e.g., CARDER; CANNIZZARO; LEE, 2005; GOWER; BORSTAD, 2004). This is a limitation in meso-oligotrophic environments such as Lake Mangueira, where the performance of algorithms based on MODIS imagery can be reduced by low Chl-a contents and by the adjacency effects that can be caused by sun glint, as found in similar environments (CHAVULA et al., 2009; MATTHEWS; ODERMATT, 2015; WATANABE et al., 2015). Furthermore, at the South point of Lake Mangueira, located in the shallower area of the lake, water transparency is high (Table 1) and the light reflected off the bottom also affects the accuracy of the empirical algorithms assessed. Another limitation associated with the use of satellite remote-sensing techniques is the detection of Chl-a in surface waters only. However, this limitation is not applicable to polymictic shallow-water systems such as Lake Mangueira, where the Chl-a concentrations are relatively uniform throughout the water column except for short time periods (hours; see FRAGOSO JUNIOR et al., 2011).

CONCLUSION
This study evaluated the use of several models to retrieve Chl-a concentrations from MODIS Terra and Aqua reflectance data in a large shallow subtropical lake. Our findings suggested that the MLR model is a better alternative than the ANN and NPMR models. In addition, the simple MLR model proposed in this study showed two main advantages: better performance and fewer input variables compared with other empirical and semi-empirical models that are commonly used (Appel, Kahru, FAI and O14a). Finally, although the purpose of this study was not to replace standard monitoring methods, the MLR model using MODIS reflectance data can provide a dynamic, broader-scale view of Chl-a concentrations. Further investigation could focus on applying the MLR model to analyze the long-term temporal and spatial distribution of Chl-a concentration in Lake Mangueira, in order to understand its seasonal dynamics and inter-annual variations. The overall methodology proposed here can be used as a tool to help reduce the effort and costs associated with conventional limnological monitoring of shallow waters, and as a data source to support distributed ecological mathematical modelling of these ecosystems.

Figure 2. Schematic diagram of an artificial neural-network model.

Figure 3. Chl-a concentrations predicted by MLR, ANN and NPMR models versus in-situ chlorophyll-a measurements at three points (North, Central and South) and the total dataset in Lake Mangueira during calibration. The dotted lines represent the best fit.

Figure 4. Plots of the Appel, Kahru, FAI and O14a model values versus in-situ chlorophyll-a measurements at three points (North, Central and South) and the total dataset in Lake Mangueira during calibration. The dotted lines represent the best fit.

Figure 5. Chl-a concentrations predicted by MLR, ANN and NPMR models versus in-situ chlorophyll-a measurements at three points (North, Central and South) and the total dataset in Lake Mangueira during validation. The solid lines represent the 1:1 (one-to-one) relationship, and the dotted lines represent the best fit.

Figure 6. Plots of the Appel, Kahru, FAI and O14a model values versus in-situ chlorophyll-a measurements at three points (North, Central and South) and the total dataset in Lake Mangueira during validation. The dotted lines represent the best fit.

Table 2. Number of available MODIS images for calibration/validation periods for the North (N), Central (C), and South (S) points in Lake Mangueira.