Evaluation of South Atlantic Thermohaline Properties from BESM-OA2.5 and Three Additional Global Climate Models Ocean and Coastal Research

Important global oceanic processes, such as the meridional overturning circulation, are governed by the temperature and salinity of the ocean. As such, it is essential that these properties be correctly represented in high-quality global climate models. This study aims to evaluate thermohaline properties both historically and under two simulations of the Brazilian Earth System Model BESM-OA2.5 in the South Atlantic Ocean (Representation Concentration Pathway (RCP) 4.5 and 8.5). Since error assessment in the global climate model (GCM) is fundamental to infer climate change projections, comparisons were made for thermohaline properties among four GCMs (HadGEM2-ES, MIROC-ESM-CHEM, MIROC5, and BESM-OA2.5) against data from ocean monitoring programs and from ORAS5-ECMWF. The results show common surface spatial pattern errors in all models, commonly related to mesoscale processes. Specific to BESM-OA2.5 over the Southern Ocean, we observed an increase in the temperature bias during autumn and summer, probably due to subsurface temperature overestimation linked to North Antarctic Deep Water (NADW) formation. With respect to salinity, the underestimations in the Subtropical/Subantarctic Zones and in the north of the South Atlantic subtropical gyre were linked to simulation errors in the Malvinas current. All models presented overestimated annual historical temperature rates, with BESM-OA2.5 being the closest to ORAS5. In the subsurface, the BESM-OA2.5 did not easily simulate the South Atlantic Central Water (SACW) formation, though in deep water, the model was able to better simulate the Antarctic Intermediate Water and NADW patterns. Statistically, the multi-model means performed better, while the BESM-OA2.5 performed worst among the models in both methodologies applied. In terms of projected scenarios, the models demonstrated sensitivity to variations in greenhouse gas emissions between the RCPs, with higher magnitude warming predicted in the equatorial zone, except for BESM-OA2.5.


INTRODUCTION
Evaluating and understanding the South Atlantic Ocean (SAO) thermohaline properties in climate models requires high quality simulations of important global physical processes. The heat transport in this region directly impacts the meridional overturning circulation and formation of intermediate and deep waters. In addition, SAO connects the Atlantic Ocean and the rest of the oceans through the Southern Ocean (Talley et al., 2011). Broggio et al. Global Climate Models (GCM) are the primary tools to project climate scenarios. Understanding their reliability is essential because they are a significant resource for planning action with respect to adaptation for climate change. Individual and collective evaluations of GCMs are crucial to better understand their strengths and weaknesses, during which preterit model outputs must be compared with observed data in which errors and uncertainties require quantification (Flato et al., 2013).
Reports by the Intergovernmental Panel on Climate Change (IPCC) Assessment rely heavily on the results of experiments by the Coupled Model Intercomparison Project (CMIP). CMIP5 historical simulations span the modern industrial period . The historical simulations are forced by the observed natural and anthropogenic atmospheric changes and consider the evolution of the land cover (first time this is considered in a CMIP experiment). The models are integrated for scenarios between 2005-2100. These consider different future radiative forcing, aerosol availability and concentrations, land use, and land cover, and can be classified by the Representative Concentration Pathway (RCP). The used RCP scenarios in this research corresponds to an approximate radiative forcing of 4.5 and 8.5 W/m 2 by the end of the 21 st century (Moss et al., 2010;Taylor, Stouffer and Meehl, 2012).
The CMIP5 models are robust tools that are still being explored. On a SAO scale, Sallée et al. (2013a) assessed the present-day and future projected changes in the Austral Ocean mixed layer of CMIP5 models. The results of their simulations represented the shallow, mixed layer, and shifted equatorward compared to observations. Sallée et al. (2013b) investigated the ability of CMIP5 models to represent the hydrological properties of the Southern Ocean and its overturning. The models show consistent warmth and a slight bias of the entire water column, the greatest of which occurs in the ventilated layers. The intensity of water mass overturning is consistent between the models and shows a slight positive bias compared to the observed data at shallow to intermediate depths. Beadling et al. (2019) evaluated the representation of the Antarctic Circumpolar Current (ACC) in CMIP5 models and analyzed the impact of governing mechanisms, showing that eight of the thirty-one models simulate the ACC effectively due to adequate representation of wind stress forcing and the difference in meridional density across the current.
The BESM-OA2.5 did not participate in CMIP5. However, several studies in recent years have documented its performance. Farias et al. (2013) measured the model's variability of air-sea CO 2 fluxes in the South Atlantic between 1996 and 2016 and found positive CO 2 fluxes (sea-to-air) dominating over the tropical region and negative fluxes (air-tosea) in mid-latitude regions. Chou et al. (2014) evaluated the Eta regional climate model using the BESM-OA2.3, HadGEM2-ES, and MIROC global models as large-scale forcings. Giarolla et al. (2015) explored the tropical South Atlantic Ocean dynamic given by the BESM-OA2.5 model in comparison with coupled global models, in which the BESM-OA2.5 accurately represented the equatorial sea surface temperature variations and the slope of the equatorial thermocline (east-west). Casagrande et al. (2016) assessed the effectiveness of the BESM-OA2.5 model compared to CMIP5 models to reproduce Arctic sea-ice variability between 1980 and 2012 and in future scenarios between 2006 and 2100. For these, all models simulated a decrease in the sea-ice extent in response to an increase in radiative forcing.
The focus of this study is to evaluate the BESM-OA2.5 results and comparing them with other CMIP5 GCMs. We present a comparison between the European Centre for Medium-Range Weather Forecasts Ocean ReAnalysis System 5 product (ORAS5-ECMWF, reference data), the historical, and two RCP experiments (RCP 4.5 e 8.5) considering four GCMs: BESM-OA2.5, HadGEM2-ES, MIROC-ESM-CHEM and MIROC5. We evaluate the results for sea surface temperature (SST), potential temperature (θ), sea surface salinity (SSS) and subsurface salinity (SO) fields in the South Atlantic Ocean (SAO). We performed additional analyses between GCM outputs (with different RCPs), data from ocean monitoring programs described by NOAA/AOML (2004), Macdonald and Baringer (2011), and Barbero and Wanninkhof (2014), and satellite data described by NASA/OBPG (2014a, 2014b, 2014c. In addition to performing a statistical evaluation, this study seeks to identify patterns of possible errors atypical to other analyzed models and emphasizes the differences presented by the BESM-OA2.5 model. Broggio et al.

MeTHODs sTUDy aRea
The study area covers the tropical and SAO and is limited to between 60°S to 25°N and 70°W to 20°E, with an emphasis on the western region ( Figure 1). In this region, the most prominent feature is the winddriven anticyclonic South Atlantic subtropical gyre (SASG). In the SAO, the meridional heat transfer occurs from south to north, and makes a large contribution to the global thermohaline circulation by the Atlantic Meridional Overturning Circulation (Garzoli and Matano, 2011;Marcello, Wainer and Rodrigues, 2018;Pezzi, de Souza and Quadro, 2016;Talley et al., 2011;Stramma and England, 1999).
The South Equatorial Current flows westward along the northern edge of the SAO. On the western boundary, the Brazil Current flows southward along the South American coast. On the southwestern edge, the South Atlantic Current flows eastward along the southern boundary, while on the eastern side, the Benguela Current flows northward/northwestward, forming the Benguela upwelling system (BUS) (Peterson and Stramma, 1991;Talley et al., 2011;Goes et al., 2019). Pacific water masses enter the SAO sector through the Drake Passage, partly branching northward as the Malvinas/Falkland Current, while the ACC flows eastward, farther south of the SASG and with the Sub-Antarctic Front (SAF) as its northern frontier. The interaction between the Malvinas Current and the Brazil Current occurs between 36°S 4 Broggio et al. and 38°S, the so-called Brazil-Malvinas confluence zone (BMC) (Talley et al., 2011). This is an important region with the capacity to modulate winds in the Marine Atmospheric Boundary Layer through its strong SST gradients, as shown primarily by Tokinaga, Tanimoto and Xie (2005) and corroborated by Pezzi et al. (2005). The Agulhas retroflection is another mechanism of water mass entry into the SAO, where warm waters enter from the Indian Ocean and help promote a distinct meridional heat transfer (Gordon, 1986). The SAO also receives an enormous amount of freshwater from the discharge of the Amazon river, in addition to from the La Plata river (the second largest in South America) and from the African Congo, Niger, Ogooué, and Sanaga rivers in the Gulf of Guinea (GG) (Berger et al., 2014;Piola et al., 2005;Tzoetzi et al., 2013). In general, the water masses in depths up to 3000 m found in the western boundary region are: the Tropical Water (TW) (θ > 20 °C and SO > 36), the South Atlantic Central Water (SACW) (6° < θ < 20°C and 34.6 < SO < 36), the Antarctic Intermediate Water (AAIW) (3 °C < θ < 6 °C; 34.2 < SO < 34.6), and the North Atlantic Deep Water (NADW) (3 °C < θ < 4 °C and 34.6 < S < 35) (Möller, 2008;Pereira et al., 2014, Piola andMatano, 2017;Rossi-wongtschowski and Madureira, 2006;Silveira et al., 2000).

ObseRvaTIONal DaTaseTs
We used surface and subsurface variables (sea surface temperature (SST), potential temperature (θ), sea surface salinity (SSS) and subsurface salinity (SO)) in different time and space scales. Only the ORAS5 and MOVAR (shown in supplementary materials) were used directly as reference data for the GCMs performance evaluation. The other products (GO-SHIP, MODIS, and SACD) were used to assess the performance of the ORAS5, and participated indirectly in the primary evaluation. We conducted the experiment in this manner because the time series of these products are not concise, as they have a short duration and temporal gaps; a reanalysis avoids such issues.
The primary reference dataset used in this study is the European Centre for Medium-Range Weather Forecasts (ECMWF) Ocean ReAnalysis System 5 (ORAS5), a reanalysis developed by the ECMWF fully described by Zuo et al. (2019). ORAS5 has a spatial horizontal resolution grid of 0.25° latitude/longitude and 75 vertical levels. Here, we used the monthly temperature and salinity at surface and subsurface between 1979-2017. We used the ORAS5 due to satisfactory spatial resolution on a global scale; its output includes all physical variables used in this study, the time series is long and concise, and the product was among the most available at inception of this study (mid-2018).
GO-SHIP was established by the International Ocean Carbon Coordination Program (IOCCP) and Climate and Ocean: Variability, Predictability, and Change (CLIVAR) as a sustainable program for hydrographic observations on a global scale that enable the evaluation of the ocean's sequestration of heat and carbon, changes in ocean circulation and ventilation patterns, and their effects on the Earth's climate. This program comprises basin wide surveys along hydrographic sections that constitute a global system for observing the ocean and climate. GO-SHIP combines data on physical hydrography, carbon cycle, marine biogeochemistry, nutrients cycles, and ecosystems. The measurements are conducted approximately decennially, and their protocols require high-quality datasets from ocean and coastal regions, providing a quality product for model initialization, calibration, and validation (Hood et al., 2010;Sloyan et al., 2019). Figure 1 shows the GO-SHIP transects of oceanographic cruises whose data are used in the evaluation process. The A10 GO-SHIP is a zonal cruise that crosses the SAO and extends between ~50°W and ~15°E with a mean latitude of 30°S. Its sampling regions are in the SASG center and at the BUS. The A16S GO-SHIP is a near meridional cruise that samples from the equatorial zone, crossing the SASG center toward the ACC (~0 to 60°S).
The Moderate Resolution Imaging Spectroradiometer (MODIS) is a sensor used in the EOS-AM1 and PM1 missions (AQUA and TERRA satellites) (King, Herring and Diner, 1995), which observe the entire Earth's surface every two days using 36 spectral bands at horizontal resolutions of 250, 500 and 1000 m, depending on the waveband (Barnes, Pagano and Salomonson, 1998). We used the monthly 9 km-resolution sea surface temperature from 2003 to 2017 provided by both satellites.
The Satélite de Applicaciones Científicas (SAC)-D Aquarius was a mission to collect global sea surface salinity data for use as inputs for climate forecasting Broggio et al. and to help estimate the global hydrological budget. The sensor has a spatial resolution of 100 to 300 km and covers the Earth's surface in seven days, i.e., four times per month (Lagerloef et al., 2008). We used the monthly one degree-resolution sea surface salinity from 2012 to 2014.
Table S1 (shown in supplementary materials) displays the variables and periods of the reference data and monitoring programs used in this study.

THe gCMs OUTpUTs
We used monthly mean model output data from the BESM-OA2.5 and compared it to three GCMs provided by the CMIP5 experiment (HadGEM2-ES, MIROC-ESM-CHEM and MIROC5). Although a comparison with only three models does not entirely demonstrate the uncertainties of the simulations, it serves to indicate possible divergences and to direct further research. The models were selected based on the study by Chou et al. (2014) that used the models HadGEM2-ES and MIROC5, together with BESM-OA2.5, to evaluate the model Eta. To improve the assessment, we added the MIROC-ESM-CHEM because it considers more physical processes than MIROC5.
We interpolated all GCMs ocean variables outputs to a horizontal latitude-longitude grid resolution of 0.5° x 0.5°. In the vertical, we interpolated the GCMs to the same grid resolution of the BESM-OA2.5 in order to conduct statistical and performance analysis. However, in the subsurface spatial comparisons ( Figure S2 and 6) and in the development of the T-S diagram ( Figure 6), we interpolated the GCMs, including the BESM-OA2.5, using the same resolution grid as the monitoring programs, GO-SHIP and MOVAR, respectively (see Table S1). All GCM ensembles members used were the r1i1p1, where "r" denotes the "realization" number or the moment that the runs is initialized, "i" denote "initialization method indicator" or the initial conditions, and "p" denote "perturbed physics" or forcing combinations of simulations ; the GCMs outputs used in this study have the "same" simulation settings. In addition to the four GCMs, we developed two multi-model means: the mean of the four GCMs, called "all models average" (AMA), and the mean between HadGEM2-ES, MIROC-ESM-CHEM, and MIROC5, called "selected models average" (SMA). The purpose was to analyze possible differences between the BESM-OA2.5 and AMA datasets.
• BESM-OA2.5 The BESM-OA2.5 is an evolution of BESM-OA2.3, as presented by Nobre et al., 2013. It is a model system based on ocean, atmosphere, and sea ice data, as well as on changes in greenhouse gas concentration (Capistrano et al.,2018, Veiga et al., 2019. The ocean-atmosphere coupling is done by the FMS coupler developed by the Geophysical Fluid Dynamics Laboratory (GFDL) of the National Oceanic and Atmospheric Administration (NOAA) (Veiga et al., 2019). The Brazilian Global Atmospheric Model (BAM) (Figueroa et al., 2016) is the model responsible for the atmospheric component of BESM. This model was developed at the Center Weather Forecasting and Climate Studies (CPTEC/INPE) and has a horizontal resolution grid of approximately 1.875° x 1.875° (T62 grid) and 28 sigma coordinate vertical levels. The oceanic component is governed by the Modular Ocean Model (MOM) version 4p1, developed by GFDL-NOAA, which uses a tripolar grid with non-uniform meridional resolution with spacing of 1/4° between 10° S and 10° N, decreasing uniformly to 1° at 45° and 2° at 90 o in both hemispheres, and a uniform zonal resolution of 1°. In the vertical, the model has 50 levels with 10 m of resolution to a depth of 220 m, decreases gradually in deeper layers to reach the resolution of approximately 370 m. The main differences between the 2.5 and 2.3 versions are associated with the atmospheric component, with simpler and computationally cheaper parametrizations of shortwave radiation, longwave radiation, cloud microphysics, and continental processes (land surface model) (Capistrano et al., 2018). For more details on the BESM versions, see Capistrano et al. (2018), Figueroa et al. (2016) and Veiga et al., (2019).
• HadGEM2-ES The HadGEM2 Earth System model (HadGEM2-ES) is the most complete version developed by Met Office Hadley Center/UK (MOCH) for CMIP5 experiments. The model considers troposphere, land surface and hydrology, aerosols, ocean and sea-ice, terrestrial carbon cycle, ocean biogeochemistry, and tropospheric chemistry (Martin et al., 2011). The atmospheric component has 38 vertical levels and a horizontal resolution grid of 1.25° x 1.875° (N96 grid) of latitude and longitude, respectively. The oceanic component uses 40 non-uniform levels in the vertical. In the horizontal grid, the zonal resolution is 1°, and the meridional resolution is 1/3° between 30° S and 30° N and 1° between 30° and the poles (Collins et al., 2011). See more information in Collins et al. (2011) andMartin et al. (2011).
• MIROC-ESM-CHEM The MIROC-ESM-CHEM was developed in cooperation between the University of Tokyo, the National Institute for Environmental Studies/Japan (NIES), and the Japan Agency for Marine-Earth Science and Technology (JAMSTEC). The model resembles the HadGEM2-ES in the modules considered in its calculations. The horizontal resolution of the atmospheric component is approximately 2.8° x 2.8° (T42 grid) and 80 levels in the vertical. The ocean component has 1.4° of the zonal resolution, while the spacing of the meridional resolution gradually varies from 0.5° at the equator to 1.7° at the poles (Watanabe et al., 2011). See Watanabe et al. (2010) for further details.
• MIROC5 Developed by the same collaborators as MIROC-ESM-CHEM, the MIROC5 is a simpler climate model that considers atmospheric processes and aerosols, ocean and sea-ice, land surface, and hydrology (Watanabe et al., 2010), but does not consider the terrestrial carbon cycle, ocean biogeochemistry, and tropospheric chemistry (Flato et al., 2013). The atmospheric component has 1.4° x 1.4° (T85 grid) of horizontal resolution and 40 levels in the vertical. The ocean component has a non-uniform vertical resolution with 49 levels. In the horizontal, the zonal resolution is fixed at 1.4°, and the meridional resolution is 0.5° between 8° N and 8° S, gradually decreasing to 1.4° at 65° and remaining constant until the poles. For more information on MIROC5, see Watanabe et al. (2010). Table S1 displays the variables and periods of the GCMs outputs used in this study. Further details of CMIP5 models are available on the CMIP5 website (https://esgfnode.llnl.gov/search/cmip5/).

peRfORMaNCe evalUaTION Of THe gCMs
Using a similar methodology proposed by Pincus et al. (2008), we used the climatological time series  for computing the performance metrics of the SST, θ, SSS, and SO. We calculated the climatological time series from the monthly data of each model's output using the spatial average of the study area (latitude of 60°S to 25°N and longitude of 70°W to 20°E) for each time step.
We used the statistics described by Taylor (2001) and Pincus et al. (2008) and estimated the correlation coefficient (R), root mean square error (RMSE), centered root mean square error (CRMSE), and mean error (bias) between simulated and reference patterns. The use of these parameters is justified by Pincus et al. (2008), where the authors highlight that the statistical standards used by weather forecasting centers for verification of simulation performance are RMSE, bias, and anomaly correlation for each variable. In this research the anomaly correlation is replaced by correlation because we seek to evaluate the climatology, and at the anomaly correlation, this signal is removed. The bias associated with spatial patterns was calculated with the mean values for all historical and scenarios periods.
In Pincus et al. (2008), the metrics results are showed in two ways, the Taylor diagram and the normalized relative error (NRE) diagram. The Taylor diagram is presented here while the NRE diagram is shown in the supplementary material.
The Taylor diagram provides a quantitative representation of the main statistical results of the evaluation. The origin of radial distances represents the patterns of standard derivations. R is presented in the azimuthal angle, and the observed radial distances are proportional to CRMSE (Taylor, 2001). We normalized the standard deviations of the reference data for all physical proprieties to the value 2, which allowed us to represent several variables in a single diagram since reference data has the same position for all physical properties.
To identify linear trends, we calculated the 12-month moving average. We applied the Mann-Kendall test to the linear trend significance test, as described by Mann (1945) and Kendall (1970). The monthly rates for each model were calculated using the difference between the last and first data from linear regressions. The annual rates were calculated from the monthly rates, and the linear trends are shown in the supplementary material.
To verify ORAS5 consistency, a brief comparison was made between the ORAS5 data and all monitoring programs (GO-SHIP, MOVAR, MODIS Aqua/Terra Broggio et al. and SACD Aquarius). These comparisons are based in bias analyses of the monitoring programs climatology and the ORAS5. We always use the available periods of the monitoring programs as a parameter for the evaluations, matching the time series. The evaluations are available in the results sector herein.

ResUlTs
The results are organized in four main topics. First, we addressed the spatial patterns of surface biases together with the analysis of the annual rates in the study area (linear trends in supplementary material). Next, T-S diagrams were developed to compare model results with the TS profiles measured by GO-SHIP profiles (see supplementary material). The third topic presents the GCMs statistical metrics. The final topic compares the performance of the models in the RCP 4.5 and 8.5 scenarios.
The bias found between the mean climatology of SST-MODIS and SST-ORAS5 ( Figure 2c) and of SSS-SACD and SSS-ORAS5 ( Figure 2k) was considered low in the of satellite data periods (for MODIS, 2003(for MODIS, -2017for SACD, 2012for SACD, -2014. For SST, the values were found to be between ±0.4°C in practically the entire study area. We observed two regions with larger differences: BMC and near latitude 60°. These differences (+2°C and +0.8°C, respectively) are smaller than the evaluated models, especially in the BMC, where all evaluated models overestimated at least 6°C compared to the ORAS5, consequently, ~8°C of the MODIS dataset. For SSS, we observed values between ±0.25 on most of the map. In the subtropical zone (STZ), subantarctic zone (SAZ) and polar zone (PZ), we observed salinity values of 0.5. For a small region near the plume of the Amazon River, and over the Gulf of Guinea (GG), the values approached +1 and may be higher. Although ORAS5 still diverges from the observed data, region errors are common in simulations. Considering both the reasonable representations and that errors are attenuated, going forward we adopt ORAS5 as the reference data to evaluate the GCMs surface data.
In GCMs bias patterns, we observed common errors between all GCMs in the SST simulation (Figures 2d to 2h). Positive SST errors are identified at latitudes greater than 45°S (subtropical zone (STZ), subantarctic zone (SAZ), polar zone (PZ)), while minor errors are found in the MIROC-ESM-CHEM (+3°C) ( Figure  2g) and major errors are found in the BESM-OA2.5 (+6°C) (Figure 2e). Positive SST biases were also found in the BUS/GG and BMC regions. In the BMC region, all GCMs showed similar bias magnitudes (+9°C). Cold biases were found in the Zapiola gyre (ZG) and Agulhas retroflection regions, with a lower intensity and regional impact. The MIROC-ESM-CHEM simulations (Figure 2g) showed the most impactful cold bias, with values of -6°C and -4°C in the Zapiola gyre and Agulhas retroflection regions, respectively. Specifically, for BESM-OA2.5, generalized warming occurs in the SST field ( Figure 2e) in nearly all studied areas, except the equatorial zone, and more specifically north of the SASG and along the Brazilian coast. Nonnegative errors were found in the SST, though it exhibits slight positive or neutral bias in places where the other GCMs show negative biases such as the Zapiola gyre and Agulhas retroflection.
Figures 2l to 2p show the SSS bias patterns. Though the SSS biases are generally negative, we observe overestimations in some GCMs. Errors in regions with runoff from rivers and their adjacent regions are noticed overall (e.g. GG region and the Amazon River plume). In the Amazon River plume region, SSS bias values are greater than 3 for all GCMs. We also observed a large diversity of biases among the models. For MIROC-ESM-CHEM, (Figure 2o) and MIROC5 (Figure 2p), underestimations stand out in the GG. For HadGEM2-ES, an underestimation is associated with the center of the SASG, while an overestimation occurs in the GG and BUS regions. In BESM-OA2.5, the bias in SSS patterns ( Figure 2m) is visible in large regions. We found some errors in SSS with magnitudes and regions similar to the other models, such as over the Amazon River plume, GG, and BMC regions. However, two discrepant biases stand out. First, we observed a less intense bias (approximately -2 of salinity) near the subequatorial (~10°S), extending along the northeastern Brazilian coastal zone. Second, we noticed a more intense SSS (less than -2.5), located in the STZ and the SAZ (~40°S to 50°S) regions. Table 1 and Figure S1 (supplementary materials) show the comparisons between the SST (°C) and SSS annual rates for all simulated datasets between 1979 and 2017. Here, the calculations are applied for the entire domain of the SAO. The Mann-Kendall test was applied at the 0.95 significance level. Only SSS historical data of the MIROC-ESM-CHEM did not reject the null hypothesis, with 0.625 of p-value.
Annual rates of 0.008°C/year and -0.0009 salinity/ year were estimated for the ORAS5 dataset (Table 1) for SST and SSS, respectively. The BESM-OA2.5 had the best performance (0.012°C/year) for the SST annual rate, while the HadGEM2-ES had the worst results (0.021°C/year). However, the average presented by BESM-OA2.5 (19,263°C) was the highest among the models, at 1.6°C above ORAS5 and 1.29°C above the nearest individual model (MIROC5). In this context, the model that best replicated the mean value of historical data was HadGEM2-ES (17.606°C). For salinity, the GCM results did not simulate negative annual rates, preventing comparisons. However, the mean values are subject to comparison. All salinity Broggio et al. 38-year (1979-2017) period. Ø -Not reject the null hypothesis in the Mann-Kendall test.
underestimations are reflected with imbued errors in the average values in a particular way for each model. These systematic errors are perpetuated in the studied period, causing deviations with close amplitude between the models. The BESM-OA2.5 was an exception, once again, showing the largest deviation from the reference data (underestimation of -0.66 salinity compared to ORAS5, and mean value of 34.651). We observed the best performance in the MIROC5, with an underestimation of -0.28. Figure 3 illustrates the Hovmoller diagram of seasonal cycle bias for the zonal means of SST (°C, left column, a to f ) and SSS (right column, g to l) in the SAO for the period between 1979 and 2017. Figure 3 shows the best performance for the SST in the MIROC-ESM-CHEM (Figure 3e), with small underestimations in the ~10°N and between 30°S and 50°S. The HadGEM2-ES, MIROC5, and SMA simulated positive SST biases in the STZ and SAZ regions, and negative bias in the equatorial zone throughout the year (Figures 3b, 3d, and 3e). The BESM-OA2.5 showed the largest bias ( Figure 3c). We observed overheating in the summer season (December to June), with a maximum value (+9°C) in February and a slight negative bias between 30°S and 60°S between July and November, and a positive bias (3 to 4°C) with little seasonal signal at latitudes between 30°S and 0°.
The SSS annual variation observed in the ORAS5 data occurs at low latitudes, near the equatorial zone and with small amplitudes. Those low salinities were found between July and November near 10°N and at the equator with minor intensity between January and June (Figure 3g). In the simulations, none of the GCMs demonstrated this variation. We see positive salinity biases for the BESM-OA2.5 (+1.25) and HadGEM2-ES (+1) (Figures 3i and 3j) between July and November near 10°N, and negative biases nearly year-round in the latitudes of ~10°S for BESM-OA2.5 (-2.25) and MIROC-ESM-CHEM (-2.5) (Figures 3i and  3k), and in the latitudes between 10°S and 0° for the MIROC5 (Figure 3l). For BESM-OA2.5 (Figure 3i), the negative bias patterns near the Brazilian coast (~10°S) and in the STZ and SAZ regions are persistent year-round.

TeMpeRaTURe aND salINITy sUbsURfaCe ClIMaTOlOgy
Figures 4 and 5 show the bias of the θ (°C, panels c to h) and SO (panels k to p) for the A10 and A16S transects. The A10 transect is a zonal profile at latitude ~30°S, and the A16S transect is a meridional profile at longitudes between 25 and 35°W (Figure 1). Figures 4c/5c and 4k/5k show a comparison between the GO-SHIP data and the ORAS5 reference data, corresponding to the periods of October 2011 (A10) and January 2014 (A16S). The remaining figures show the evaluated bias between GCMs and ORAS5. Three contour lines are shown in these figures to emphasize the water mass interfaces based on Silveira et al. (2000) and Pereira et al. (2014). We calculated the potential density adopting a reference value in the sea surface (0 dbar of pressure).     . The same descriptions apply to the SO data in panels i to p.
In both Figures 4 and 5, we observed layers near the surface (above ~700 m) that had larger variations of spatial bias amongst all models. Variations in the subsurface were also identified, albeit at lower magnitudes and less diverse in the spatial patterns.
In Figure 4, for the BESM-OA2.5 (Figures 4e and  4m), the magnitude of errors was similar to the other models in both physical properties. However, we observed differences in spatial patterns and in the position of isopycnal lines. The spatial patterns of temperature (Figure 4e) differed from those represented by the ORAS5. A positive bias regarding nearsurface temperature was found between 25°W and 15°W. Another highlighted anomaly occurs in the SACW temperature, with an error of approximately -3°C covering the entire A10 transect. The simulated TW showed a larger thickness between 25°W and 0°. For the salinity field of the BESM-OA2.5 (Figure 4m), we noticed an underestimation between 15°W and 0°. The BESM-OA2.5 (Figures 4e and 4m) and MIROC-ESM-CHEM (Figure 4g and 4o) provided good representations for waters at depths below 700m, showing minor bias in the representations for the AAIW and NADW. In these water masses, the BESM-OA2.5 showed small polarization in the θ and SO fields in NADW. Another error can be found in the AAIW/ NADW interface, especially in the eastern part of the transect, which is simulated at lower depths of nearly 900m. In the GCMs and ORAS5, this interface occurs at depths greater than 1000m.
The observed patterns in Figures 5 and 4 are similar. The largest biases for BESM-OA2.5 (Figures 5e and 5m) are again associated with underestimations of SACW. However, in this transect (A16S), this pattern is observed in both variables (θ and SO) and more intensely than in Figure 4. The peaks seen between 40°S and 30°S had approach values of -7°C and -2.5 of salinity, respectively, but occur at different depths. The region with the highest temperature bias was observed in the subsurface, at depths near 200 m, while the highest salinity bias occurs at the surface. It is also possible to observe a greater thickness of the TW between 30°S and 15°S and the simulated AAIW/NADW interface above depths of 1000m, patterns already presented in Figure 4.

skIll MeasURes
The Taylor diagram is shown in Figure 6. As mentioned previously, standard deviations are normalized to 2 for all physical proprieties, allowing for the representation of several variables on the same diagram.
The Taylor diagram presents better performances linked to temperature variables (SST and θ). For SST, the HadGEM2-ES, MIROC-ESM-CHEM, MIROC5, and SMA showed the best results. The BESM-OA2.5 had the worst performance, directly affecting the AMA set due to its high values of StdD and CRMSE. For θ, MIROC-ESM-CHEM and MIROC performed best. SMA and AMA sets had intermediate performance, as they were misrepresented by the BESM-OA2.5 and HadGEM2-ES. We highlight BESM-OA2.5, which had a low correlation (~0.4) when compared to other models (R > 0.9).
For SSS and SO, we observed greater dispersion and distance from the ORAS5, especially in SSS, the variable in which the models performed worst. For this variable, we highlighted the mean multi-models (SMA e AMA) that presented better performances. The HadGEM2-ES was the best individual model, but its seasonal cycle behavior is the inverse of the ORAS5, reflected by the negative R value (~ -0.5). This also occurred with BESM-OA2.5, which had higher errors (StdD and CRMSE) and more accentuated negative R value (~ -0.7). For SO, we observed a similar case occurring with θ. Two models performed well (HadGEM2-ES and MIROC-ESM-CHEM), while two others showed larger differences to the ORAS5, affecting SMA and AMA performance. Once again, the BESM-OA2.5 had a negative highlight, beyond the MIROC5. The negative R values of both models are more accentuated than in SSS. Figures 7 and 8 show the long-term differences of the RCPs from SST and SSS, respectively. These were calculated using the mean value from the final twenty years (2078 to 2098) minus the mean value of the initial twenty years (2006 to 2026). This allows us to consider interannual variabilities, such as the El Niño -Southern Oscillation (ENSO), within the mean values.

fUTURe sCeNaRIOs
In general, the models are similar in both RCPs in terms of amplitude and spatial distribution of changes in the long-term projections of SST (Figure 7). There is a small difference in amplitude with MIROC-ESM-CHEM, especially at RCP 8.5 (Figure 7k), where the model stands out in regions that see temperature increases reaching 4°C, especially in the equatorial zone. The BESM-OA2.5 stands out with divergences to the spatial distribution of the warming in both RCPs. Whilst other GCMs show greater warming in the equatorial zone, the BESM-OA2.5 (Figure 7c and    2026). a-f ) The differences for the RCP4.5 scenario; g-l) The differences for the RCP8.5 scenario.

South Atlantic thermohaline properties evaluation
Ocean and Coastal Research 2021, v69:e21029 14 Broggio et al. 7i) shows the opposite, with higher warming at higher latitudes (35°S to 60°S), which includes the STZ, SAZ, and polar zone. We noted that the BMC showed the most warming impact among the models, especially in RCP 8.5, with values above 5°C. In this case, the BESM-OA2.5 (Figure 7i) showed a lower amplitude (~4°C) and a minor warming region, with a small center close to the coast, differing from the other models, which extended to the east in the entire longitudinal section.
As with the SST, we analyzed the spatial distribution of changes in the long-term projections of SSS (Figure 8). Between the RCPs, we observed few variations of spatial patterns. Only the GG region presented variations, specifically for MIROC-ESM-CHEM and MIROC5, which do not present negative biases for the local in RCP 4.5, though it began to do so at RCP 8.5. The largest differences occur in the increasing magnitudes of long-term differences between RCP 4.5 and 8.5. We observed higher differences in the RCP 8.  Table 2 and Figure S4 (shown in supplementary materials) present the annual rates and linear trends of the SST (a and c) and SSS (b and d), respectively, in the SAO for the RCPs.
The annual rate (Table 2) in SST and SSS increases according to the differences in the pathway of emissions (RCP 4.5 and RCP 8.5), except for SSS (RCP8.5) in the MIROC-ESM-CHEM ( Figure S4d). For this property, MIROC-ESM-CHEM simulated an increase in the moving average until approximately 2050, followed by a decrease through 2098. Nonetheless, the annual rate is positive (0.0003) and is the smallest among the models (Table 2), verified by the Mann-Kendall test (0.95 significant level). The BESM-OA2.5 simulated a warming of ~0.013°C/year for the RCP4.5 scenario ( Figure S4a and Table 2) and ~0.029°C/year for the RCP8.5 scenario (Table 2 and Figure S4c). The increase of emissions also impacted the salinity; we observed 0.0008 salinity/year and 0.0014 salinity/year for the RCP4.5 (Table 2 and Figure S4b) and RCP8.5 (Table  2 and Figure S4d), respectively. On average, BESM-OA2.5 showed higher values of temperature and lower salinity compared to other models, approximately  1.31°C warmer (RCP 4.5 and RCP 8.5) and 0.38 less saline (RCP 4.5)/0.39 (RCP 8.5). These comparisons were applied between the mean of BESM-OA2.5 and the individual model that was closest to that mean.

DIsCUssION sURfaCe aND seasONal CyCles bIases
We observed three regions in our study area in which all models had low performance by simulating the temperature fields: BMC, BUS/GG, and the Southern Ocean.
The BMC and BUS/GG are very dynamic regions with complex processes and are thus difficult to simulate, especially for climate models. The BMC comprises the convergence of two intense opposing flows: the Brazilian Current and the Malvinas Current (Pontes, Gupta and Taschetto, 2016), resulting in a region with a high potential for vorticity and generation of swirls (Piola and Matano 2017). Several mechanism studies are proposed to explain the dynamics of the BMC, including the transport of Malvinas Current (Matano, 1993), local wind fields (Garzoli and Giulivi, 1994), ACC transport (Gan, Mysak and Straub, 1998), and seasonal changes in wind stress (Fetter and Matano, 2008). Given the amount of processes that are susceptible to simulation errors, the main explanation currently points to the coarseness of GCMs resolution. In this highly energetic region, mesoscale processes are important but require high resolution simulations. However, the GCMs resolution render their representation unfeasible (Pontes, Gupta and Taschetto, 2016). In this context, for the BUS and GG regions, and BMC and Agulhas retroflection, Gent et al. (2010) and Sitz, Farneti and Griffies (2007), respectively, show a significant reduction in bias patterns resulting from simulations with improved resolution. However, these can still be found, albeit at lower intensities.
Another source of bias is the parameterization of the models. In the discretization of equations for more refined grids or in large-scale processes that cannot be solved explicitly, computational constraints compel the models to simplify physical processes (Randall et al. 2007). The approximations from parametrization allow for the highly energetic regions or mesoscale processes to be represented through these simplifications. As such, biases are commonly seen in these simulations. We did not identify studies describing the impact of specific parametrizations to explain the bias associated with modeling the BMC region. For this dynamic region, most studies propose that the coarse resolution is the main cause of simulated bias by GCMs. We believe this occurs due to the complexity of the BCM region, where explanations for the TS patterns remain unknown and under constant study (Piola and Matano 2017). On the other hand, bias caused by parametrization is better understood for modeling the BUS/GG region. The region is characterized by a complex ocean-atmosphere interaction, with processes such as wind stress, vertical and horizontal advective processes, low cloud formation, heat transport, and mesoscale vortices (Huang, Hu and Jha, 2007;Colas et al., 2011;Zheng et al., 2011). As it is a relatively complex zone, some studies discuss the reasons for recurring bias in the GCMs. The main cause of the warming bias in GG/ BUS region seems to be the low cloud covering parametrization. Various studies such as Huang, Hu, and Jha (2007), Colas et al. (2011), Zheng et al. (2011), and Wang et al. (2014 associate the warming biases of this region with less low cloud covering, generating excessive heat fluxes in the ocean. The authors also suggest that low cooling rates of the surface waters in the BUS/GG region are associated with underestimating upwelled waters due to the reduced wind field intensity (Colas et al. (2011), Zheng et al. (2011), Wang et al. (2014). In this case, the coarse horizontal grids in the atmospheric components of GCMs is the factor most responsible for simulated errors in upwelling regions (Colas et al., 2011).
For the Southern Ocean (region limited to latitudes above 40°S), only the MIROC-ESM-CHEM simulation showed a less intense bias. However, deviations are still present in the model. In this region, water masses in deep waters are identified as a possible culprits for the warming biases seen on the surface. Wang et al. (2014) describes the weakening of AMOC in simulations made by GCMs. This weakening is associated with reduced NADW densities and the strengthening of Antarctic Bottom Water. The lower density of NADW allows denser waters on the surface of the Southern Ocean to sink, strengthening the convective mixing and elevating warmer subsurface water to the surface.  Broggio et al. In the BESM-OA2.5, we observed SST behavior different from the other models. A systematic error generates a temperature bias over the entire area affecting the model's mean (Table 1 and Figure S1). Unfortunately, there are few references to this model, complicating efforts to find explanations for this behavior. Since the SST is tightly restricted by the ocean and atmosphere interaction, some processes can interfere in the incident radiation, misrepresenting the SST response (FLATO et al., 2013;RANDALL et al., 2007). In this case, the analysis of certain variables can help us understand this generalized bias; it is necessary to evaluate parameters such as clouds, heat flux, short and long-wave radiation, among others. In this context, we opted not to expand the analyses because the availability of BESM-OA2.5 datasets is limited, without online and open access, especially for historical data. In addition, this would greatly expand the research. The analysis of seasonal cycles ( Figure  3) allowed us to observe that this pattern seen in the BESM-AO occurs throughout the year, and more intensely in the summer (DJF). The greatest intensifications were found over the Southern Ocean. This pattern was also found in the other models, albeit at lower intensity. The biases discussed thus far (in BMC, BUS/GG, and Southern Ocean regions) are described by Wang et al. (2014) as occurring throughout the year, with intensifications in autumn (MAM) and especially in summer (DJF). The seasonality expressed by BESM-OA2.5 is consistent with other studies. However, the amplitude of its bias is larger. By the logic addressed earlier (Wang et al. (2014)), it is expected that BESM-OA2.5 simulates warmer subsurface waters in the Southern Ocean, generating a more intense convective mixture, which increases temperature error at the surface. In the meridional transect A16s (Fig.  5e), we can observe the Southern Ocean subsurface with a warm bias in BESM-OA2.5 simulations, which corroborates our hypothesis. A study on simulated transports in the South Atlantic Ocean offers another possibility (Sitz, Farneti and Griffies (2007)). Here, the authors analyzed the impacts on simulations caused by different resolutions grids and linked the positive biases (temperature and salinity) between 50°S and 60°S with the underestimate of the Pacific Ocean freshwater entrance, especially through the Magellan Strait. This underestimate would be caused by rough bathymetry of the coarsened grid spacing models.
Here, we believe that the previously described can influence the presence of biases. However, due to the amplitude and spatial dimension of biases, this should not be the primary explanation.
Various authors warn about the complexity of analysis of salinity fields. Salinity is more difficult to analyze because it is related to complex processes such as the difference between evaporation and precipitation, ice formation and melting, river runoff, and vertical advection, which has great impacts on salinity variations (Berger et al.,2014;Flato et al., 2013;Schmitt, 1995;Yu, 2011).
In the salinity fields (Figures 2i to 2p)), the bias presented had a greater variation in spatial patterns between the models. The common biases among the models were associated with three regions with high discharge of fresh water. In addition to the BMC and GG regions, which were discussed for temperature fields, the Amazon River plume region was also observed. In BMC, the impact of freshwater discharge comes from the Plata River, and in GG, from the Niger and Congo rivers, among others. The plume dispersions of these big rivers are governed by mixing processes and multi-scale flows, which involve mixing in stratified shear flows, frontal processes, geostrophic transport, and wind forcing (Horner-Devine, Hetland and MacDonald, 2015). Thus, as with the temperature biases, mesoscale processes are the possible error sources associated with coarse grid resolutions in salinity bias fields. Other sources of bias may be linked to the reproduction of the discharges of these rivers, arising from simulated precipitation by the continental process modules of the GCMs (Santini and Caporaso, 2018). Especially for GG, errors in the runoff of rivers or in precipitation values can be even more significant in SSS bias. The region is impacted by the West African monsoon regime, a process that is recognized for generating the worse performance in GCM simulations due to errors in latitudinal migrations of the intertropical convergence zone (ITCZ), affected by errors in the SST fields (Roehrig et al., 2013).
In BESM-OA2.5, we observed two large salinity biases that differ from other models, the north of the South Atlantic subtropical gyre, and the STZ and the SAZ. We believe that a single source of error generates negative biases in these two regions. In Figure  2m, we observe that the bias associated with STZ and SAZ originates prior to entering SAO, with an Broggio et al. underestimation of -1.5 in the region of displacement of the Malvinas current to the North, before mixing with the Brazil current. This underestimation intensifies with the mixing at BMC and is perpetuated to the STZ and SAZ regions, apparently influencing the SACW formation. In Figure 5m, which shows the meridional profile A16S, we observed this error being propagated in the subsurface to the North, carried by SACW. Results described by Poole and Tomczak (1999) show that part of the SACW disperses to the north, reaching the equator with little intermixing. These results are shown for the depth of 300m. In our results (Figure 5m), we show that the negative salinity bias emerges again for the surface between latitudes 20° S and 10° S, which coincides with the underestimated region, north of the South Atlantic subtropical gyre. The T/S diagram of the from BESM-OA2.5 ( Figure S2c) also shows lower near surface salinity (TW) values in comparison with the MOVAR and the other GCMs in the region (P in Figure 1). In Figure  3, we can observe this pattern throughout the year, with no defined seasonal cycle. This bias is reflected in the average salinity value of the model (Table 1), probably due to the large areas affected by such underestimations.

sUbsURfaCe aND WaTeR Masses bIases
Some errors in subsurface stood out in BESM-OA2.5. The TW/SACW interface (Figure 4e/4m (A10) and 5e/5m (A16S)) was represented with greater depth in comparison to the others GCMs. As is more evident in Figure 4, we would expect to find the highest concentration of this water body (TW) closer to the Brazilian coast due to the presence of the western boundary current. These currents are narrower, deeper, more confined zonally, and are responsible for transporting TW from the tropical region to the South by the Brazilian Current (Emílsson, 1961, Stramma andEngland, 1999). However, the BESM-OA2.5 simulated a great thickness in the SAO center. Apparently, the error in the TW/SACW interface pattern of the BESM-OA2.5 is caused by two anomalies at different longitudes. For these, seen at 25°W to 15°W (temperature anomaly) and at 15°S to 0° (salinity anomaly), we do not observe density compensation, though it can be clearly seen in the SACW region, especially in the A16S GO-SHIP cruise in the BESMs temperature and salinity anomalies ( Figure   5e and 5m), where the bias patterns are repeated in both variables. These errors refer to the processes described at the end of the previous subsection, where we associated the errors with the formation of the SACW. The primary origin of the SACW is associated with the BMC. The mixture of Malvinas Current cold waters and the runoff of Plata River waters with the Brazilian current promotes sinking through subduction by Ekman pumping and a subsequent dispersion of the water mass that forms towards to the equator (Emílsson, 1961;Sprinfall and Tomzac, 1993;Silveira et al., 2000;Stramma and England, 1999). Subduction zones are highlighted by Tailleux, Lazar and Reason (2005) as being regions intrinsically linked to density compensation caused by T/S anomalies. These anomalies are commonly caused by air-sea interactions and are transferred from the mixed layer to the thermocline, where they undergo changes by both diabatic and adiabatic processes (Tailleux, Lazar and Reason, 2005). The major salinity errors of the BESM-OA2.5 occur in the surface layer, while the temperature errors stand out near depths of 200 m and deeper, suggesting that SACW bias is related to salinity surface error. With the density compensation, the bias also occurs in temperature values as there is water mass subduction. This analysis is based on conjecture and requires better analysis of salinity data. It would be ideal to analyze the freshwater balance to better understand these errors in salinity at the surface. However, due to the lack of availability of BESM historical data, this process has been hindered and extends the scope of this research.
The BESM-OA2.5 simulations for AAIW and NADW showed better performance compared to the other models (Figures 4 (e and m) and Figures 5 (e and m)). However, we observed a divergence in the depth of the isopycnal of 27.53 kg/m³. Small, polarized anomalies in AAIW did not characterize density compensation, which probably causes the displacement of the isopycnal, as the local density is not preserved. In the other models, this isopycnal occurs at greater depths. In all of them, density compensation is observed from positive anomalies of temperature and salinity. This is more prominent in the HadGEM2-ES (Figures 4 and 5 (f and n)) and MIROC5 (Figures 4 and 5 (h and  p)) models. For NADW, the BESM-OA2.5 presented a temperature overestimation in the Southern Ocean subsurface higher than the other models, especially Broggio et al. at latitudes above 50°S. This may be the main cause of the simulated warming in the Southern Ocean, as discussed previously, following what was described by Wang et al. (2014). skIll MeasUReMeNTs Annan andHargreaves (2011), Gleckler, Taylor andDoutriaux (2008), Pincus et al. (2008), Randall et al. (2007), Sillmann et al. (2013), and others have discussed the performance of the multi-model means (SMA and AMA in this research). All of them demonstrated that the multi-model means are usually more consistent with observations than individual models. In both statistical analyses (Taylor diagram (Figure 6) and NRE (not shown - Figure S3)) we also observed better performance of the multi-model, especially in the NRE result. Overall, the SMA model performed best in nearly all statistical metrics for all physical properties ( Figure S3). In certain cases, however, we observed individual models having better performance than the multi-model means, such as with variables SO and θ (Figure 6), especially in the representation of the Taylor diagram. It is not surprising that these patterns occur in some cases. According to Tebalbi and Knutti (2007), the consistency of the multi-models mean tends to increase as the number of models increases. If we consider a greater number of GCMs, the errors in the means of multi-models would probably be suppressed. As such, a possible explanation for our difficulty in finding better performance in the SMA and AMA data sets in some cases may be because of the low number of models considered in this research. Flato et al. (2013) also warns of common errors that appear when using a certain number of models, or with poor performance of a specific model, which directly affects the multi-model means. This is observed in the Taylor diagram for SST, in which we observe that the BESM-OA2.5 misrepresents the AMA set with a greater difference than ORAS5.
In the Taylor diagram (Figure 6), we observed that the models have greater difficulty in reproducing the salinity fields, especially SSS. Taylor, Stouffer and Meehl (2012) state that temperature fields are almost exclusively controlled by ocean-atmosphere interaction, but that for salinity fields, the sources of SSS variation (evaporation / precipitation, sea ice and river runoff ) are less significantly related to the SSS fields. The greater number of disturbances associated with low significance allows uncontrolled errors to occur (Taylor, Stouffer and Meehl, 2012). As such, it is expected that the errors in the simulations be proportionally larger for salinity fields. In the statistical metrics of BESM-OA2.5, we see an improvement in salinity simulations compared to temperature. The NRE values ( Figure S3) are reflected in the Taylor diagram ( Figure 6). Although the model presents errors in the salinity fields (especially SSS), the other models also present errors in a similar proportion, providing a lower NRE value for the physical property. It is worth mentioning that the sources of variation in the SSS are potentially affected by the SST. Thus, poor performance in the temperature fields can distort the SSS fields.

fUTURe sCeNaRIOs
The long-term temperature differences (Figure 7) observed in the models can be summarized as greater warming in the equatorial zone and regionally, in the BMC. The exception is the BESM-OA2.5, where we see greater heating of STZ and SAZ. The patterns seen in the models are corroborated by the results demonstrated by Ruela et al. (2020), which assess global long-term temperature differences between averages of 1979/2005 and 2070/2100 for a multimodel mean. This heating pattern is reflected in the linear trends analysis ( Figure S4) and annual rate values (Table 2). In an approximate comparison with Ruela et al. (2020), that shows a spatial global heating map in their results. Their results for maximum heating present annual rates of ~0.02°C/year (RCP 4.5) and ~0.035°C/year (RCP 8.5), and minimum heating annual rates with values of ~0.008°C/year (RCP 4.5) and ~0.012°C/year (RCP 8.5) for the SAO. In our results, all models and the multi-models means, except MIROC-ESM-CHEM, showed results between these values. Even though this is a simple comparison, it is possible to verify that our annual rates are consistent and coherent with other published work.
As seen in Figure 7, Figure S4a, S4c, and Table  2, the models represented long-term warming and showed sensitivity when simulating higher heating rates in the RCP 8.5 scenario. These patterns directly affect salinity simulations. In the climate warming scenario, the pattern with salinity increase (decrease) in subtropical (temperate latitude, ~40°S to 60°S) Broggio et al. regions are described in several recent studies. The water cycle is enhanced and is expressed by the intensification of surface water fluxes (evaporation and precipitation), which affects salinity patterns, also called the "rich get richer" mechanism (Chou et al., 2006;Chou et al., 2009;Durack, Wijffels and Matear, 2012;Terray et al., 2012). In this context, all models evaluated herein presented this pattern in the SSS long-term differences (Figure 8). The amplification of the mechanism is observed in the RCP 8.5, where we have a scenario with greater warming and, consequently, more changes to the magnitudes of salinity patterns.
Solely in the case of BESM-OA2.5, the systematic errors previously described in historical linear trends ( Figure S1) are still observed in the linear trends ( Figure S4) and mean values (Table 2) for both RCPs scenarios.

CONClUsIONs
The primary objective of this research was to evaluate the performance of the BESM-OA2.5 and three other GCMs in the South Atlantic Ocean with regards to their thermohaline properties simulations. The historical GCMs errors were statistically evaluated with monitoring systems and the ORAS5 reanalysis. The spatial patterns of the thermohaline properties were examined, their temporal trends were calculated, and the scenarios were compared. There are limitations to using only three models for a GCMs evaluation. However, it was possible to find substantial results that contribute to the development of these models, especially the BESM-OA2.5.
In SST biases, we observed similar errors between all models in the following regions: BMC, BUS/GG, and Southern Ocean. Due to complex multi-scale dynamics, BMC and BUS/GG were regions where GCMs had difficulties in representing mesoscale processes, a well-researched fact studied by various authors. In the GG region, the probable existence of errors in the representations of low cloud covering and low cooling caused by insufficient Ekman flows can intensify these biases. In the Southern Ocean, the warming is possibly caused by the weakening of AMOC, which results in a simulation of NADW with reduced subsurface density. This strengthens the convective mixture, bringing warmer waters to the surface. This was the only bias seen year-round due to its size and intensity. SSS biases also showed common patterns in all models. The BMC, GG, and Amazon River plume are regions adjacent to large freshwater discharges, with great dynamics and, once again, subject to mesoscale processes. In addition to the bias created by coarse resolution grids, these locations are subject to over/underestimation of river runoffs, caused by precipitation errors over the continent. Here, we recommended assessments of freshwater balance and continental precipitation to better understand these biases.
Specific to BESM-OA2.5, the SST bias field showed warming over the entire study area, but we were unable to identify a plausible reason for this systematic error. However, over the Southern Ocean we observed an intensification of the bias in the autumn (MAM) and in summer (DJF). This is probably explained by a greater overestimation of subsurface temperature when comparing to other models and linked to NADW. In this manner, the convective mixture intensifies even more, generating a greater bias and larger surface. These uncommon biases relative to other models generate an overestimation in the average temperature of approximately 1.3°C. We believe that to better explain this generalized heating in nearly the entire study area, studies related to heat fluxes and radiative balance of the BESM-OA2.5 are needed. Likewise, certain salinity biases stood out only for BESM-OA2.5. The underestimations in the STF/SAZ and in the north of the South Atlantic subtropical gyre were linked to the misrepresentation of the SACW. The Malvinas current enters the South Atlantic with a negative salinity bias that is amplified in the process of creating SACW. This water body sinks in the STZ and is transported north in the subsurface, resurfacing near the equator with the salinity biases. In this case, we recommend expanding the study area to identify the source of the negative bias from the Maldives.
In the subsurface, the BESM-OA2.5 presents errors in the depth localization of the TW/SACW and AAIW/NADW interfaces. Difficulties in the representation of SACW associated with water mass formation are observed, and the same error patterns are seen for SSS and SST, leading to the density compensation effect. Based on the good patterns seen in the subsurface bias analyzed for the BESM-OA2.5, the deepwater representation is one of the highlights in this Broggio et al. assessment, because the BESM-OA2.5 presents the smallest errors in the AAIW and NADW simulations. On the other hand, the model simulated the AAIW/ NADW isopycnal interface with less depth, indicating errors in densities values.
Even with only four models contributing to this study, the multi-model means presented the best performance in the evaluation of statistical metrics, especially SMA, consistent with the findings of several authors. The AMA ensemble did not perform as well as the SMA. Frequently, the AMA was misrepresented by BESM-OA2.5, which presented the worst performance among the models in both applied methodologies.
In terms of analysis of the projected scenarios, the models demonstrated sensitivity to variations in greenhouse gas emissions between the RCPs. Warming at higher magnitudes were seen in the equatorial zone, except for BESM-OA2.5, which demonstrated greater heating in the STZ and SAZ. The heating triggered an increase (decrease) in salinity in subtropical regions (temperate regions, ~ 40 ° S to 60 ° S), following the "rich get richer" mechanism. This pattern was intensified in the RCP 8.5 scenarios.
All models presented overestimated annual historical temperatures, with BESM-OA2.5 the closest model to ORAS5. For historical salinity, none of the models showed the decreasing annual rate of ORAS5, rendering comparisons unfeasible. In these scenarios, the annual temperature rates showed an increase between RCP 4.5 and 8.5. This pattern was also seen in the annual salinity rates, with a decreasing rate only for the MIROC-ESM-CHEM in RCP8.5.
The complexity of global climate models demands continuous evaluations for improvement.For the BESM-OA2.5 specifically, we find few studies of the physical processes of the model. Therefore, additional study of other modeling physical properties should be conducted to clarify the divergences between the models observed herein.   Table S1 shows the variables and periods of the reference data, monitoring programs, and GCMs outputs used in this study.

sUppleMeNTaRy MaTeRIal
The 12-month moving average in the 1979-2017 period for SST (°C) and SSS (shaded line). Straight lines show linear trends and represent the annual rates for all simulated datasets.
Expendable bathythermograph (XBT) vertical profiles from surface to 700 m (Van Caspel et al, 2010) were obtained at a spatial resolution of approximately 27 km between Rio de Janeiro and Trindade Island by the Brazilian program Monitoring the Regional Variability of heat and volume transport in the surface layer of the South Atlantic Ocean between Rio de Janeiro (RJ) and Trindade Island (MOVAR). Near the coast and Trindade Island, the resolution increases to 18 km (Goes et al., 2019). The MOVAR cruises have a frequency of approximately five times per annum. The MOVAR data used herein spans from 2004 to 2017. In order to compare T/S diagrams, we randomly chose an intermediary point (see Figure, site P) to the transect sampled in MOVAR.
The ability of GCMs to represent water masses near the Brazilian continental shelf is illustrated in Figure S2, which shows the T-S diagrams for point P (35°W/21.5°S) (see Figure 1). We use the entire available MOVAR time series,with the exception of the June 2011 and January 2014 expeditions due to data inconsistencies. A total of 65 profiles were used. For the simulation datasets, the time series used precisely match the months available from the MOVAR program. The MOVAR data are used as a reference instead of ORAS5 because the observational program has a concise time series and excellent data quality control. The comparison between MOVAR and ORAS5 reanalysis data ( Figure S2a) reinforces the ORAS5 performance reviewed in Figures 2c/k, 4c/k and 5c/k, where low bias (approximately 0°C and 0 of salinity) can be found in nearly the entire studied area. In the T-S diagram ( Figure S2), the temperature and salinity of the ORAS5 data show errors only in TW when compared to MOVAR data. The comparisons between thermohaline data from the MOVAR program and the GCMs are shown in Figures S2b to S2f.
The normalized relative error (NRE), as proposed by Gleckler, Taylor and Doutriaux (2008) is presented in Figure S3. The method compares a certain model error (E mo ) with the median error of all models (TME). The NRE is given by: (1) The same procedure was applied to the statistical metrics (R, RMSE, CRMSE, bias, and StdDev) proposed by Pincus et al. (2008) in our study. These errors are defined as 1-R, RMSE, CRMSE, |bias|, and |1-StdDev|. If a model has an NRE of 0.1 (-0.1), the model error is 10% larger (smaller) than the TME. Figure S3 illustrates NRE values using a heatmap color pattern. Figure S4 shows the 12-month moving average (shaded line) for the RCPs during 2018-2098 for SST (°C) and SSS. The straight lines are the linear trends and represent the annual rates for all simulated datasets.