Assessment of TerraClass and MapBiomas data on legend and map agreement for the Brazilian Amazon biome

Reliable environmental monitoring and evaluation require high-quality maps of land use and land cover. For the Amazon biome, the TerraClass and MapBiomas projects apply different methodologies to create these maps. We evaluated the agreement between land cover and land use maps generated by TerraClass and MapBiomas (Collections 2 and 3) for the Brazilian Amazon biome, from 2004 to 2014. Specifically, we: (1) described both project legends based on the LCCS (Land Cover Classification System); (2) analyzed the differences between their classes; and (3) compared the mapping differences among the Brazilian states that are totally or partially covered by the Amazon biome. We compared the classifications with a per-pixel approach and performed an evaluation based on agreement matrices. The overall agreement between the projects was 87.4% (TerraClass x MapBiomas 2) and 92.0% (TerraClass x MapBiomas 3). We analyzed methodological differences to explain the disagreements in class identification. We conclude that using these maps together without a properly adapted legend is not recommended for the analysis of land use and land cover change. Depending on the application, one mapping system may be more suitable than the other.


INTRODUCTION
Monitoring Land Use and Land Cover Change (LULCC) and its dynamics is essential for a variety of scientific purposes and political strategies, and it supports more efficient management of land resources. Reliable information about LULCC improves the evaluation of phenomena such as urban expansion, flooding, drought and desertification (Verburg et al. 2011; Bontemps et al. 2012). This is especially important in areas of tropical rainforest, of which the Amazon forest is a relevant example. The Amazon plays a key role in worldwide environmental processes, such as the carbon cycle, and can influence climate change (Phillips et al. 2009).
ACTA AMAZONICA, VOL. 50(2) 2020: 170-182
The deforestation dynamic in the Brazilian Amazon has been monitored with remote sensing images since 1988 through the PRODES project (Brazilian Amazon Deforestation Monitoring Program; INPE 2019). By 2015, an area of 76,990,300 hectares of the Amazon had been deforested, which amounts to 19.2% of the total Amazon forest area (INPE 2016). The TerraClass project was created in 2010 to identify and quantify the main land use and land cover classes linked to deforestation and to map the LULC in the Amazon deforested areas (Almeida et al. 2016). More recently, since 2015, MapBiomas (Brazilian Annual Land Use and Land Cover Mapping Project) has also produced LULC maps for all Brazilian biomes (MAPBIOMAS 2017).
Maps from both projects have been widely applied to land use and land cover modeling and climate change research (e.g. Rufin et al. 2015; Crouzeilles et al. 2017; Müller-Hansen et al. 2017; Tyukavina et al. 2017). They can also be used to support the development of governmental projects and other initiatives (Brito 2017). However, each mapping system generates different areas and patterns for land use and land cover classes, and thus different trajectories. Consequently, products derived from these maps will indicate distinct dynamics and impacts. Such products are used for public policy planning, for example in the estimation of ecosystem services (Kangas et al. 2018), soil quality assessment (Efthimiou and Psomiadis 2018), modeling of LULCC scenarios (Dalla-Nora et al. 2014) and initiatives related to REDD+ (Buurman et al. 2015).
Although the maps from the MapBiomas and TerraClass projects provide valuable LULCC information, they were created using different algorithms and methodologies. It is therefore important to compare and evaluate their data regarding their usefulness, potential and limitations for different goals. In this context, we provide a detailed evaluation of the agreement between the TerraClass and MapBiomas classifications. Specifically, we (1) describe both project legends based on the LCCS (Land Cover Classification System); (2) analyze the differences between their classes; and (3) compare the mapping differences among the Brazilian states that are totally or partially covered by the Amazon biome.

Study area and products analyzed
The TerraClass project covers the Legal Amazon, an area of 510 million ha, comprising the nine Brazilian states covered wholly or in part by the Amazon biome: Acre, Amapá, Amazonas, Pará, Rondônia, Roraima, Mato Grosso, Tocantins and the western part of Maranhão. The MapBiomas project covers the Amazon biome, which excludes the parts of Mato Grosso, Tocantins and Maranhão that are covered by other biomes. Figure 1 gives an overview of the differences between the areas covered by the two projects. The Amazon biome was chosen as the study area for our analysis, as it is fully represented in both projects. Thus, all classes were analyzed solely for the Amazon biome.
TerraClass was created in 2010 by the Brazilian National Institute for Space Research (INPE) and the Brazilian Agricultural Research Corporation (Embrapa). It is a complement to PRODES and adds information about the past LULC spatial distribution and regional statistics in deforested areas. Currently, TerraClass products are available for 2004, 2008, 2010, 2012 and 2014. TerraClass uses Landsat-like images (30 m spatial resolution) to identify the following classes: Urban area, Mining, Mosaic of uses, Annual crops, Herbaceous pasture, Shrubby pasture, Pasture with exposed soil, Regeneration with pasture, Secondary vegetation, Annual deforestation, Others and Non-observed area (Almeida et al. 2016). Forest, Hydrography, and Non-forest classes are also included in TerraClass products, but they are taken directly from the PRODES Project classification.
Figure caption (fragment): [...] Table 1. NFNV is the Non-forest natural vegetation class. This figure is in color in the electronic version.

The TerraClass methodology includes image processing techniques, such as segmentation and the Linear Spectral Mixture Model (Shimabukuro et al. 1998), to assist in map classification. However, most of its classification is conducted through visual interpretation. An automated classification is used only for the Annual crops class, whose targets are identified from the Normalized Difference Vegetation Index (NDVI) time series of MODIS images (De Faria et al. 2005; Rudorff et al. 2011; Arvor et al. 2013).
In 2015, MapBiomas was created through an initiative of the Greenhouse Gas Emissions Estimation System (SEEG) of the Climate Observatory (http://www.observatoriodoclima.eco.br/) as a fast and low-cost methodology to classify land use and land cover over the last decades. It is produced by a network of NGOs, universities and technology companies, each responsible for certain biomes or specific themes (Agriculture, Pasture and Coastal zone). They produce annual maps from 1985 onwards for all Brazilian biomes (MapBiomas 2017).
The classification data from MapBiomas is available by year. Four data collections, which were generated during different phases of the project, have already been released. Collection 1 had a simplified legend and produced maps for 2008 to 2015, while Collection 2 had an improved legend and methodology and extended the mapping period from 2000 to 2016. Collection 3 covered the period from 1985 to 2017, and Collection 4 included 2018. We used Collections 2 and 3 of the classification data for the Amazon biome. Herein, "MapBiomas 2" stands for MapBiomas Collection 2, "MapBiomas 3" for MapBiomas Collection 3, and "MapBiomas" for the joint use of Collections 2 and 3.
The MapBiomas methodology is fully automated and integrated with the Google Earth Engine. It also uses Landsat images, and it involves the construction of a spectral library for performing Spectral Mixture Analysis (SMA). The fraction images resulting from the SMA are used to calculate the Normalized Difference Fraction Index (NDFI) (Souza Jr. et al. 2005). The SMA and NDFI features were used to build an empirical decision tree classification for MapBiomas 2. In MapBiomas 3, they were used in a Random Forest classifier (IMAZON 2017). For the Amazon biome, the classes identified in MapBiomas are presented in Table 1.
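The NDFI step can be illustrated with a small sketch. It assumes the formulation commonly attributed to Souza Jr. et al. (2005), NDFI = (GVshade - (NPV + Soil)) / (GVshade + NPV + Soil), with GVshade = GV / (1 - Shade) and all fractions expressed on a 0 to 1 scale. The function name, the use of Python (MapBiomas itself runs on the Google Earth Engine) and the handling of degenerate pixels are assumptions made for illustration only, not the project's implementation.

```python
def ndfi(gv, npv, soil, shade):
    """Fraction-based index for one pixel (illustrative sketch).

    Inputs are SMA fractions on a 0-1 scale (an assumption here):
    green vegetation, non-photosynthetic vegetation, soil and shade.
    Returns None where the index is undefined.
    """
    if shade >= 1.0:
        return None  # fully shaded pixel: GV cannot be shade-normalized
    gv_shade = gv / (1.0 - shade)  # shade-normalized green vegetation
    denominator = gv_shade + npv + soil
    if denominator == 0:
        return None  # no vegetation, NPV or soil signal at all
    return (gv_shade - (npv + soil)) / denominator


# A pixel made of green vegetation and shade only sits at the upper end
# of the index range; a pure soil pixel sits at the lower end.
forest_like = ndfi(gv=0.6, npv=0.0, soil=0.0, shade=0.4)
bare_soil = ndfi(gv=0.0, npv=0.0, soil=1.0, shade=0.0)
```

Under these assumptions the index ranges from -1 to +1, which is what makes it usable as a single feature in a decision tree or Random Forest classifier.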

Reclassification and product comparison
We used land cover maps from TerraClass and MapBiomas for 2004, 2008, 2010, 2012 and 2014, all at a 30 m spatial resolution. The classification maps are available at http://www.inpe.br/cra/projetos_pesquisas/dados_terraclass.php for the TerraClass data, and at https://mapbiomas.org/download for the MapBiomas data. They were downloaded in raster format and referenced in WGS84 (EPSG: 4326). Each product has its own classification legend, so it was necessary to reclassify them to reconcile the legends (Table 1). Equivalent classes were identified, and some classes had to be grouped into a [...]
All adopted classes (Table 1) were described using the Land Cover Classification System (LCCS) to standardize class descriptions. Thus, data produced through different methods could be used and compared, regardless of scale, level of detail, or geographical location. This system uses a set of rules based on the physiognomy and the stratification of the biotic and abiotic elements (FAO 2016). Each land cover class is composed of one or more horizontal macro patterns (arrangements of landscape elements). There are also vertical patterns, or strata, that characterize the horizontal patterns of the landscape. The strata are described by element properties, such as leaf phenology and tree height. Each horizontal pattern must have one mandatory stratum and may have other optional ones.
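The legend reconciliation described above amounts to a lookup from each project's original classes to the adopted classes. The sketch below is illustrative only: it is written in Python rather than the R used in the study, the function and variable names are hypothetical, and the dictionaries show only a few example entries. The pasture and crop groupings in particular are assumptions; the full correspondence is the one given in Table 1.

```python
# Hypothetical excerpts of the lookup tables; Table 1 holds the real ones.
TERRACLASS_TO_ADOPTED = {
    "Herbaceous pasture": "Pasture",      # assumed grouping
    "Shrubby pasture": "Pasture",         # assumed grouping
    "Annual crops": "Agriculture",        # assumed equivalence
    "Others": "Others",
    "Non-observed area": "Non-observed",
}
MAPBIOMAS_TO_ADOPTED = {
    "Flooded forest": "Forest",
    "Beaches and dunes": "Others",
}


def reclassify(labels, lookup, default="Unmapped"):
    """Translate original class labels into adopted classes.

    Labels missing from the lookup receive `default`, so classes that
    were left out of the table are easy to spot afterwards.
    """
    return [lookup.get(label, default) for label in labels]


tc_adopted = reclassify(
    ["Shrubby pasture", "Herbaceous pasture", "Mining"], TERRACLASS_TO_ADOPTED
)
mb_adopted = reclassify(["Flooded forest", "Beaches and dunes"], MAPBIOMAS_TO_ADOPTED)
```

Keeping the correspondence in plain tables, separate from the code that applies it, makes it straightforward to check the lookup against Table 1.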
A per-pixel comparison between the reclassified maps, covering the entire study area, was performed for each year using a Boolean approach (Herold et al. 2008; Tchuenté et al. 2011). From this spatial analysis, maps were generated showing the areas of agreement among the adopted classes for two cases: TerraClass versus MapBiomas 2, and TerraClass versus MapBiomas 3. Areas classified as Non-observed in either map were considered as disagreement, as were the minority classes cited above.
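The Boolean comparison can be sketched as follows, assuming both maps have already been reclassified to the adopted legend and aligned pixel by pixel. This is a minimal Python illustration, not the R code used in the study; the function name, the flattened-list representation and the `never_agree` parameter are hypothetical.

```python
NON_OBSERVED = "Non-observed"


def agreement_map(terraclass, mapbiomas, never_agree=(NON_OBSERVED,)):
    """Return one Boolean per pixel: True where both maps agree.

    Pixels whose class is listed in `never_agree` count as disagreement
    even when both maps carry the same label, mirroring the rule that
    Non-observed areas are treated as disagreement.
    """
    if len(terraclass) != len(mapbiomas):
        raise ValueError("maps must cover the same pixels")
    return [
        tc == mb and tc not in never_agree
        for tc, mb in zip(terraclass, mapbiomas)
    ]


# Four example pixels; the class without an equivalent in the other
# legend (here Mosaic of uses) can never match and so always disagrees.
tc_pixels = ["Forest", "Pasture", "Non-observed", "Mosaic of uses"]
mb_pixels = ["Forest", "Forest", "Non-observed", "Pasture"]
agree = agreement_map(tc_pixels, mb_pixels)
```

For full 30 m rasters the same comparison would normally be applied block by block with an array or raster library rather than on Python lists.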
Both whole maps were cross-tabulated, resulting in agreement matrices. These were calculated in the same way as a standard confusion matrix, first with each of the MapBiomas maps as the reference dataset and then with the TerraClass map as the reference dataset. Pixels classified as the same class (main diagonal) were considered as agreement and the remaining pixels as disagreement. From the cross-tabulations and from the metrics, we analyzed the differences in the products among the adopted classes in each Amazon biome state. All processing steps were performed in the R language (Ihaka and Gentleman 1996).
From the agreement matrix, it was possible to generate the metrics of Overall Agreement (OA), TerraClass Agreement (TCA) and MapBiomas Agreement (MBA). OA was the percentage of total observed pixels that agreed. TCA was the percentage of pixels that agreed with MapBiomas as reference. This metric was used to evaluate the TerraClass mapping. MBA was the percentage of pixels that agreed with TerraClass as reference. This metric was used to evaluate the MapBiomas mapping. Both TCA and MBA were calculated for each class.
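A minimal sketch of how these metrics could be derived from two aligned, reclassified maps is given below. It is written in Python for illustration (the study used R) and the function name is hypothetical. Two points are assumptions rather than statements from the text: the per-class TCA is taken as the diagonal count divided by the MapBiomas total for that class, and the per-class MBA as the diagonal count divided by the TerraClass total; and the OA denominator is simply every pixel passed in, so the caller decides whether Non-observed pixels are included.

```python
from collections import Counter


def agreement_metrics(terraclass, mapbiomas):
    """Return (OA, TCA per class, MBA per class), all in percent."""
    if len(terraclass) != len(mapbiomas) or not terraclass:
        raise ValueError("maps must be non-empty and cover the same pixels")
    # Agreement matrix stored sparsely as {(terraclass, mapbiomas): pixels}.
    matrix = Counter(zip(terraclass, mapbiomas))
    tc_totals = Counter(terraclass)
    mb_totals = Counter(mapbiomas)
    classes = sorted(set(tc_totals) | set(mb_totals))
    diagonal = {c: matrix[(c, c)] for c in classes}
    oa = 100.0 * sum(diagonal.values()) / len(terraclass)
    # Assumed denominators: the reference map's total for each class.
    tca = {c: 100.0 * diagonal[c] / mb_totals[c] for c in classes if mb_totals[c]}
    mba = {c: 100.0 * diagonal[c] / tc_totals[c] for c in classes if tc_totals[c]}
    return oa, tca, mba


tc = ["Forest", "Forest", "Forest", "Pasture", "Pasture", "Agriculture"]
mb = ["Forest", "Forest", "Forest", "Pasture", "Forest", "Agriculture"]
oa, tca, mba = agreement_metrics(tc, mb)
```

In this toy case five of the six pixels agree. The one Pasture pixel that MapBiomas labels as Forest lowers the TCA for Forest and the MBA for Pasture, which shows how the two per-class metrics separate the direction of a disagreement.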

RESULTS
The reconciled legend reclassification (Table 1) was described in terms of LCCS (Table 2). Forest had only one horizontal pattern, with a maximum of four vertical patterns. Trees were mandatory in this class, and one of the strata could be composed of water bodies, representing the Flooded forest class of MapBiomas (Table 2a). The Non-forest natural vegetation (NFNV) represented mostly vegetation patches typical of other biomes (such as Cerrado, the Brazilian savanna) that persist within the Amazon biome (Table 2g).
The Agriculture class had only one stratum, which was composed of graminoids, forbs or bare soil. Every element in this pattern was dependent on the temporal sequence of crop phenological cycles (Table 2h). The class Others had only one stratum, which was composed of loose and shifting sands (Table 2i). It represented the Beaches and dunes class of MapBiomas and the Others class of TerraClass (land cover categories such as river beaches and sandbars) (Coutinho et al. 2013).
After the reclassification of the maps, we converted the number of pixels of each adopted class into area units (km² × 10⁻³) for the five years (Figure 2). Forest is not represented in Figure 2 because its area was much larger than that of the other classes. Pasture was the main land use in the Amazon in all years according to both systems, but the mapped area was usually larger in TerraClass. Water bodies, Urban areas and Agriculture had similar areas in both projects, but agreement was not high.
TerraClass began to classify Planted forests in 2010, so this class did not appear for this project in 2004 and 2008. In MapBiomas, a time series of images is analyzed to produce a classification with few Non-observed areas, which are caused by clouds and cloud shadows (IMAZON 2017), while TerraClass uses the dry-season images with the least cloud cover. Compared with TerraClass, MapBiomas usually had fewer Non-observed areas, especially MapBiomas 3, which had considerably smaller Non-observed areas than MapBiomas 2 and TerraClass.
The overall agreement between the projects was 87.4% (TerraClass x MapBiomas 2) and 92.0% (TerraClass x MapBiomas 3). Forest had the highest agreement and was always close to or higher than 90%. A small percentage of the Forest class in MapBiomas 2 was classified as NFNV (4.1%), Secondary vegetation (3.1%) and Pasture (2.8%) by TerraClass. Considering the Forest class in MapBiomas 3, 4.2% and 2.2% of the area was classified as NFNV and Pasture by TerraClass, respectively. Planted forest in MapBiomas 2 had 0% agreement with TerraClass, but in MapBiomas 3, agreement was higher at 6.9% and 71.4%, respectively (Tables 3 and 4).
Table 2. Adopted classes following the reconciled legend reclassification (see Table 1) [...]
The exclusion of Forest modified the overall agreement from 87.4% to 81.6% (MapBiomas 2 x TerraClass) and from 92.0% to 92.3% (MapBiomas 3 x TerraClass). This meant that, despite the high agreement for Forest, this class was also a source of confusion for other classes. For example, 80.3% of the TerraClass Secondary vegetation was classified as Forest by MapBiomas 2 (Table 3). Therefore, the low agreement observed for Secondary vegetation (3.6% and 26.5%, respectively) (Tables 3 and 4) could have been caused by the methodological differences between the projects. Whenever a Forest area is deforested, it enters a deforestation mask in the PRODES mapping, which is used as the basis for TerraClass mapping. Even if the area regrows, it is classified as Secondary vegetation and cannot be considered as Forest again.
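The effect of leaving one class out of the overall agreement can be sketched with a toy agreement matrix. The text does not state exactly how the exclusion was done, so the sketch assumes that every pixel labelled with the excluded class by either project is dropped before OA is recomputed. The function name, the dictionary layout and all pixel counts are invented for illustration, and Python is used here although the study was carried out in R.

```python
def overall_agreement(matrix, exclude=None):
    """OA in percent from {(terraclass, mapbiomas): pixel count}.

    If `exclude` is given, cells where either project assigned that
    class are dropped first (an assumed reading of the exclusion).
    """
    kept = {
        (tc, mb): n
        for (tc, mb), n in matrix.items()
        if exclude is None or (tc != exclude and mb != exclude)
    }
    total = sum(kept.values())
    if total == 0:
        return None  # nothing left to compare
    agreed = sum(n for (tc, mb), n in kept.items() if tc == mb)
    return 100.0 * agreed / total


# Invented pixel counts, for illustration only.
toy_matrix = {
    ("Forest", "Forest"): 900,
    ("Secondary vegetation", "Forest"): 40,
    ("Secondary vegetation", "Secondary vegetation"): 10,
    ("Pasture", "Pasture"): 45,
    ("Pasture", "Secondary vegetation"): 5,
}
oa_all = overall_agreement(toy_matrix)
oa_without_forest = overall_agreement(toy_matrix, exclude="Forest")
```

In this toy case the dominant Forest diagonal inflates OA, while the Forest column also absorbs most of the Secondary vegetation pixels, so removing the class changes both the numerator and the denominator.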
Regarding the analysis of minority classes, Agriculture or pasture (Figure 3c) can be considered as a mixed class, but in MapBiomas 2 most of it corresponded to Pasture, NFNV and Forest, and in MapBiomas 3 mostly to Forest, Pasture, NFNV and Secondary vegetation. The identification of targets of the Mosaic of uses class in TerraClass was prone to classification errors due to the spatial resolution of the images. According to MapBiomas 2, these areas were composed mainly of Forest (61.9%), Pasture (21.1%) and Agriculture or pasture (11.4%) (Figure 3a). In MapBiomas 3 (Figure 3b), the Mosaic of uses was also mainly composed of Forest (66.7%), Pasture (25.2%) and Agriculture or pasture (4.2%). Especially in MapBiomas 3, which did not have a Secondary [...] Agriculture or pasture (9.5%).
The Mining class exists in the MapBiomas 2 legend, although it is not present in the maps of the Amazon biome, so TerraClass Mining areas were classified by MapBiomas mainly as Pasture, Forest and NFNV (Figure 3a). The Annual deforestation class in TerraClass indicates recently deforested areas with no defined land use at this stage (Almeida et al. 2016). In MapBiomas 2, these areas were classified as Forest [...] In MapBiomas 3, they were also classified as Forest (46.9%), Pasture (28.5%) and Agriculture or pasture (20.8%).
Although the highest values of OA occurred in the states with larger Forest areas (Amazonas, Acre and Pará) (Figure 4), the exclusion of Forest in the calculation of OA had no large impact in most states, which confirmed that Forest was also a source of confusion for other classes. In Maranhão and Tocantins, OA increased by more than 10% because these states had smaller properties, which may have led to confusion even in the visual interpretation. Therefore, excluding one class (Forest) reduced the confusion among the other classes, such as Secondary vegetation and NFNV.
Regarding the spatial distribution of the agreement areas, there was a concentration of Non-observed areas in the northern portion of the biome, near the Equator, where there is a higher cloud concentration (Figure 5). Large consolidated areas of disagreement occurred close to the Amazonas River channel in eastern Amapá and southwestern Roraima, most of which corresponded to TerraClass NFNV areas that were mapped into other classes by MapBiomas. In Maranhão, Tocantins and northeastern Pará, there was a high concentration of small polygons, and the disagreement between the two projects was also very visible. Close to the Amazonas River channel and in northeastern Pará, there were also large areas of Secondary vegetation that showed disagreement between the projects.

DISCUSSION
The purpose of this study was not to determine which project performed the highest-quality mapping. The assessment of one product with the other as a reference was conducted to evaluate the correspondence between the projects and, in doing so, to analyze whether the LULC classes had the same meanings in TerraClass and in MapBiomas. The per-pixel assignment of Pasture as the largest area (when not considering Forest) in comparison to other classes agrees with other studies (IBGE 2006; INPE 2014). The area assigned to Pasture cannot be directly associated with the presence of livestock, as pasture areas in the Amazon are related not only to cattle ranching, but also to land speculation (Mertens et al. 2002). The use of a time series is helpful in the identification of agricultural areas due to the seasonal patterns of the target areas (Esquerdo et al. 2011). TerraClass uses the MODIS NDVI time series for identification of agricultural areas (Almeida et al. 2016), while MapBiomas (MAPBIOMAS 2017) uses Landsat images from four different dates (start and end of the growing and off seasons) to calculate the maximum and minimum Enhanced Vegetation Index 2 (EVI2) for the Crop Enhanced Index (CEI) (Rizzi et al. 2009) and to obtain a classified image of the agricultural areas. Likely due to these different approaches, the agreement for Agriculture was only near 50% between TerraClass and MapBiomas 2, and below 80% between TerraClass and MapBiomas 3.
TerraClass is based on the PRODES deforestation mapping, which considers only the deforestation of primary forest and classifies all regeneration areas as Secondary vegetation. This restriction does not exist in MapBiomas; as a result, TerraClass mapped many more Secondary vegetation areas than MapBiomas 2 did. There was no Secondary vegetation class for MapBiomas 3. On the other hand, the Forest class in MapBiomas is analyzed separately each year, allowing the estimation of forest areas for a longer period and the analysis of LULCC dynamics for a larger area (all Brazilian biomes). However, MapBiomas 2 maps included regeneration areas in the Forest class and lacked a clear definition of when Secondary vegetation turns into Forest. This method could generate more uncertainty in estimations of the amount of carbon emitted by deforestation over time.
The TerraClass mapping system generates consolidated regions (polygons) because some of its classes are mapped by visual interpretation, while MapBiomas has a fully automated per-pixel classification. Thus, in the polygons classified by TerraClass, MapBiomas identifies pixels of other classes, such as Agriculture or pasture in areas of NFNV, or Forest in Urban areas (Figure 6). The much larger TerraClass NFNV area in comparison to MapBiomas may be because the MapBiomas Forest class was more inclusive than the TerraClass Forest class. This does not mean that the MapBiomas mapping is more detailed than TerraClass, since the projects were created with different purposes. TerraClass, being a complement to PRODES and a governmental project, focuses on discriminating the LULC in the Amazon deforested areas and does not map NFNV, which is taken from PRODES. The classification of NFNV is used as a mask and does not change over the years. It is important to note that TerraClass does not intend to map different kinds of land uses within the NFNV areas. On the other hand, MapBiomas was conceived to provide a historical series of LULC maps for the entire Brazilian territory. Thus, NFNV is mapped annually in this project. Nevertheless, we kept the comparison between the NFNV from TerraClass and from MapBiomas in order to make available some information regarding the differences in LULC information provided by the projects. If OA is calculated for all classes excluding NFNV (as we previously did with Forest), it reaches 91.8% (MapBiomas 2) or 96.1% (MapBiomas 3) for the five study years. The comparison with the OA considering all classes, 87.4% (MapBiomas 2) or 92.0% (MapBiomas 3), confirms that NFNV can be an important source of disagreement between the projects.
Among several possible reasons for the variation in class agreement, we highlight that each Brazilian state of the Amazon biome has a particular spatial configuration and land use composition and diversity, depending on political and economic factors, among others. In general, the classes in Pará state showed a similar behavior to that already shown by Neves et al. (2017) for 2014. Mato Grosso is one of the main producers of large-scale crops in Brazil (IBGE 2006). These areas are visible as larger and more homogeneous polygons with geometrical shapes, which are easily identifiable on Landsat images. For this reason, Mato Grosso was the only state that achieved agreements close to 60% and 70% for the Agriculture class.
Different interpretations of LULCC between the mapping systems can have significant implications for environmental policies in both disagreement and agreement scenarios. For example, an area of intense mining near an important highway (BR-163), in Pará state (Figure 7a), was classified as Mining by TerraClass, but mainly as Pasture or NFNV by the MapBiomas automated methodology. This kind of misclassification could lead to environmental damage if the data were used in public policies to combat illegal mining in the Amazon region. In another example (Figure 7b), there is predominant agreement between the Forest classifications. However, the MapBiomas Forest classification is more inclusive than the TerraClass Forest classification. Almost all areas of regeneration were included in the Forest class by MapBiomas, but were classified as Secondary vegetation by TerraClass. The inclusion of regeneration areas in Forest by MapBiomas makes it more difficult to distinguish primary forest from secondary regrowth, and thus to estimate the quantity of carbon that the Amazon forest has already lost or still has in stock. This estimation is important for the development of REDD+ initiatives, a global instrument to encourage developing countries to take action to reduce emissions from deforestation and forest degradation (Gallo and Albrecht 2019). Within this framework, the proper monitoring of secondary vegetation is important for carbon stock assessment in the Amazon forest.
Among other important initiatives that support environmental policies in the Brazilian Amazon are the Soy Moratorium (SoyM), an agreement by soybean traders not to buy soy cultivated on lands deforested after July 2006 (Gibbs et al. 2015), and the Rural Environmental Registry (CAR, in Portuguese), which requires every Brazilian rural private property to provide georeferenced information on its extent, boundaries and land use to a governmental database. The SoyM is an example of the direct relationship of the LULCC dynamics with the commodities market. Interventions in the supply chains of soy and other commodities, such as cattle and timber, can be directly linked to the reduction in deforestation (Gibbs et al. 2015). To achieve this goal, SoyM, CAR and other schemes to monitor commodities require accurate and detailed mapping. For these purposes, the TerraClass semiautomated methodology, which checks each polygon visually, likely still produces the more accurate maps.
The complete automation of mapping, as performed in MapBiomas, is a positive feature that allows the cost- and time-effective coverage of all Brazilian biomes, as the system can run entirely on the Google Earth Engine free of charge, requiring fewer human resources than TerraClass. A possible downside of the MapBiomas methodology is its dependency on a non-Brazilian platform, which could lead to sovereignty issues for Brazil regarding the production of its own land-use and land-cover monitoring data.

CONCLUSIONS
The use of TerraClass or MapBiomas can result in different interpretations regarding land-use and land-cover mapping. Our study provides an analytical background to support users of both databases to decide which mapping strategy is better suited for their applications. The TerraClass methodology has several visual stages and produces data every two years, while MapBiomas uses a fully automated process and still produces some data inconsistencies, such as noisy pixels in already-consolidated areas, as expected in an automated per-pixel classification. Despite the high overall agreement, the methodological differences between projects resulted in significant disagreements. The complementary use of TerraClass and MapBiomas maps without a proper adaptation of legends is not recommended for LULCC analysis. The LCCS provides some criteria for reconciling the project legends, but introduces uncertainties in the data, as some classes are not represented in the same manner in both mapping systems. MapBiomas has a higher temporal resolution (yearly) and its legend allows some subdivisions of the Forest class. TerraClass is precise in the delimitation of minor classes in terms of area, such as Urban and Mining areas. Additionally, the use of a deforestation mask in TerraClass allows the separate mapping of primary Forest and regeneration or Secondary vegetation areas.