# %pip install "hypercoast[extra]"
Import libraries.
import earthaccess
import hypercoast
import pandas as pd
Search for PACE data
To download and access the data, you will need to create an Earthdata login. You can register for an account at urs.earthdata.nasa.gov. Once you have an account, run the following cell and enter your NASA Earthdata login credentials.
earthaccess.login(persist=True)
Search data programmatically
To search for PACE data programmatically, specify the bounding box and time range of interest. Set count=-1 to return all results or set count=10 to return the first 10 results.
results, gdf = hypercoast.search_pace(
bounding_box=(-83, 25, -81, 28),
temporal=("2024-07-30", "2024-08-15"),
short_name="PACE_OCI_L2_AOP_NRT",
count=10,
return_gdf=True,
)
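The bounding_box tuple is in (west, south, east, north) order, which is earthaccess's (lower-left longitude, lower-left latitude, upper-right longitude, upper-right latitude) convention, assuming search_pace forwards it to earthaccess unchanged. temporal is a (start, end) pair of ISO dates. The helper below, check_search_params, is hypothetical and not part of HyperCoast. It is a minimal sanity check that catches swapped corners or dates before a search is sent. It deliberately does not handle boxes that cross the antimeridian.

```python
from datetime import date


def check_search_params(bounding_box, temporal):
    """Hypothetical helper: validate a (west, south, east, north) box and a date range."""
    west, south, east, north = bounding_box
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must lie in [-90, 90]")
    # Lower-left corner first; boxes crossing the antimeridian are not handled here.
    if west >= east or south >= north:
        raise ValueError("expected (west, south, east, north) ordering")
    start, end = (date.fromisoformat(d) for d in temporal)
    if start > end:
        raise ValueError("start date is after end date")
    return bounding_box, (start.isoformat(), end.isoformat())


print(check_search_params((-83, 25, -81, 28), ("2024-07-30", "2024-08-15")))

try:
    # East and west swapped: rejected.
    check_search_params((-81, 25, -83, 28), ("2024-07-30", "2024-08-15"))
except ValueError as err:
    print("rejected:", err)
```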
Plot the footprints of the returned datasets on a map.
gdf.explore()
Download the first dataset from the search results. Note that the download may take some time.
hypercoast.download_pace(results[:1], out_dir="data")
Search data interactively
To search for PACE data interactively, pan and zoom to the area of interest. Specify the time range of interest from the search dialog, then click on the Search button.
m = hypercoast.Map(center=[30.0262, -90.1345], zoom=8)
m.search_pace(default_dataset="PACE_OCI_L2_AOP_NRT")
m
By default, the search_pace method searches for the PACE_OCI_L2_AOP_NRT dataset. You can specify a different dataset by setting the default_dataset parameter, for example PACE_OCI_L2_BGC_NRT. For more information about the available datasets, see the PACE Data Products page.
Uncomment the following cell to display the GeoDataFrame of the search results.
# m._NASA_DATA_GDF.head()
Similarly, you can download the first dataset from the search results by uncommenting the following cell.
# hypercoast.download_pace(results[:1], out_dir="data")
results = hypercoast.search_pace(
bounding_box=(-83, 25, -81, 28),
temporal=("2024-07-30", "2024-08-15"),
short_name="PACE_OCI_L2_AOP_NRT",
count=1,
)
hypercoast.download_pace(results[:1], out_dir="data")
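The downloaded file names appear to encode the acquisition time, processing level, product, version and processing stage, for example PACE_OCI.20240730T181157.L2.OC_AOP.V2_0.NRT.nc. The parser below, parse_pace_filename, is hypothetical. It assumes that dot-separated pattern, which is inferred only from the two file names used in this notebook. Treat it as a convenience sketch, not an official naming specification.

```python
import re
from datetime import datetime

# Assumed pattern, inferred from the file names used in this notebook:
# PACE_OCI.<YYYYMMDDTHHMMSS>.<level>.<product>.V<version>.<stage>.nc
PATTERN = re.compile(
    r"PACE_OCI\.(?P<time>\d{8}T\d{6})\.(?P<level>L\w+)\.(?P<product>\w+)"
    r"\.V(?P<version>[\d_]+)\.(?P<stage>\w+)\.nc$"
)


def parse_pace_filename(name):
    """Split a PACE OCI file name into its components (hypothetical helper)."""
    match = PATTERN.search(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    parts = match.groupdict()
    # Convert the compact timestamp and the underscore-separated version.
    parts["time"] = datetime.strptime(parts["time"], "%Y%m%dT%H%M%S")
    parts["version"] = parts["version"].replace("_", ".")
    return parts


info = parse_pace_filename("data/PACE_OCI.20240730T181157.L2.OC_AOP.V2_0.NRT.nc")
print(info["time"], info["level"], info["product"], info["version"], info["stage"])
```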
Let's make a scatter plot of the pixel locations so we can see the irregular spacing.
filepath = "data/PACE_OCI.20240730T181157.L2.OC_AOP.V2_0.NRT.nc"
plot = hypercoast.view_pace_pixel_locations(filepath, step=20)
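The step argument keeps the scatter plot manageable by plotting only a subset of the swath's pixels. This sketch assumes that step means "keep every step-th row and column" of the 2-D latitude and longitude arrays. That is an assumption, so check view_pace_pixel_locations for its exact behaviour. Under that reading, step=20 cuts the number of plotted points by roughly a factor of 400. The helper subsample_pixels is hypothetical.

```python
def subsample_pixels(lat, lon, step=20):
    """Return (lat, lon) pairs for every step-th row and column of 2-D coordinate grids."""
    points = []
    for i in range(0, len(lat), step):
        for j in range(0, len(lat[i]), step):
            points.append((lat[i][j], lon[i][j]))
    return points


# Tiny synthetic 5 x 4 grid standing in for the swath's 2-D latitude/longitude arrays.
lat = [[25.0 + 0.1 * i + 0.01 * j for j in range(4)] for i in range(5)]
lon = [[-83.0 + 0.1 * j + 0.02 * i for j in range(4)] for i in range(5)]
pts = subsample_pixels(lat, lon, step=2)
print(len(pts))  # 6: rows 0, 2, 4 times columns 0, 2
```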
Load the dataset as an xarray.Dataset object.
dataset = hypercoast.read_pace(filepath)
# dataset
Visualize PACE AOP data
Visualize selected bands of the dataset.
hypercoast.viz_pace(dataset, wavelengths=[500, 510, 520, 530], ncols=2)
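The requested wavelengths may not match the sensor's band centres exactly. Each request therefore has to be matched to the closest available band, which is what xarray's sel(..., method="nearest") does. The helper nearest_band_indices is hypothetical and not taken from viz_pace. It shows the idea using binary search over a sorted list of band centres. The band values below are made up for illustration.

```python
from bisect import bisect_left


def nearest_band_indices(band_wavelengths, requested):
    """Index of the closest band for each requested wavelength (bands sorted ascending)."""
    indices = []
    for target in requested:
        pos = bisect_left(band_wavelengths, target)
        # Only the bands immediately below and above the insertion point can be closest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(band_wavelengths)]
        indices.append(min(candidates, key=lambda i: abs(band_wavelengths[i] - target)))
    return indices


# Hypothetical band centres; the real values come from the dataset's wavelength coordinate.
bands = [490.0, 495.0, 498.0, 503.0, 508.0, 513.0, 518.0, 523.0, 528.0, 533.0]
idx = nearest_band_indices(bands, [500, 510, 520, 530])
print([bands[i] for i in idx])  # [498.0, 508.0, 518.0, 528.0]
```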
Add a custom projection and administrative boundaries to the map. The default projection is PlateCarree. You can specify a custom projection by setting the crs parameter. For more information about the available projections, see the cartopy projection page.
hypercoast.viz_pace(dataset, wavelengths=[500, 510, 520, 530], ncols=2, crs="default")
Plot spectral signatures
Plot the spectral signature of a pixel using the extract_pace function. Set return_plot=True to return the plot object.
latitude = 29.9307
longitude = -87.9106
hypercoast.extract_pace(dataset, latitude, longitude, return_plot=True)
To return the extracted values as an xarray DataArray, set return_plot=False.
array = hypercoast.extract_pace(dataset, latitude, longitude, return_plot=False)
# array
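PACE Level-2 pixels lie on an irregular swath rather than a regular grid. Extracting "the pixel at a given latitude and longitude" therefore requires a search over the 2-D coordinate arrays. One plausible approach, assumed here and not taken from extract_pace, is to pick the pixel with the smallest great-circle (haversine) distance. The hypothetical nearest_pixel helper does this by brute force on a tiny synthetic swath.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth (R = 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_pixel(lat_grid, lon_grid, lat, lon):
    """Return (row, col, distance_km) of the swath pixel closest to (lat, lon)."""
    best = None
    for i, (lat_row, lon_row) in enumerate(zip(lat_grid, lon_grid)):
        for j, (plat, plon) in enumerate(zip(lat_row, lon_row)):
            d = haversine_km(lat, lon, plat, plon)
            if best is None or d < best[2]:
                best = (i, j, d)
    return best


# Tiny synthetic, slightly irregular swath (not real PACE coordinates).
lat_grid = [[29.90, 29.91, 29.92], [29.93, 29.94, 29.95]]
lon_grid = [[-87.93, -87.91, -87.89], [-87.92, -87.90, -87.88]]
row, col, dist = nearest_pixel(lat_grid, lon_grid, 29.9307, -87.9106)
print(row, col, round(dist, 2))
```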
To plot the spectral signatures of multiple pixels, use the filter_pace function and specify the latitude and longitude ranges as (min, max) tuples. All pixels within the specified latitude and longitude ranges will be extracted.
latitude = (29.49, 29.50)
longitude = (-88.10, -88.00)
hypercoast.filter_pace(dataset, latitude, longitude, return_plot=True)
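filter_pace takes (min, max) ranges rather than a single point. The following is a minimal sketch of that selection rule, assuming inclusive bounds on both coordinates. The hypothetical pixels_in_range helper works on small nested lists. On the real swath, you would apply the same comparison to the 2-D latitude and longitude arrays as a boolean mask.

```python
def pixels_in_range(lat_grid, lon_grid, lat_range, lon_range):
    """Return (row, col) indices of pixels inside the lat/lon box (bounds inclusive)."""
    lat_min, lat_max = sorted(lat_range)
    lon_min, lon_max = sorted(lon_range)
    hits = []
    for i, (lat_row, lon_row) in enumerate(zip(lat_grid, lon_grid)):
        for j, (plat, plon) in enumerate(zip(lat_row, lon_row)):
            if lat_min <= plat <= lat_max and lon_min <= plon <= lon_max:
                hits.append((i, j))
    return hits


# Synthetic coordinates around the ranges used above.
lat_grid = [[29.485, 29.492, 29.499], [29.495, 29.502, 29.509]]
lon_grid = [[-88.12, -88.06, -88.01], [-88.09, -88.03, -87.98]]
print(pixels_in_range(lat_grid, lon_grid, (29.49, 29.50), (-88.10, -88.00)))
# [(0, 1), (0, 2), (1, 0)]
```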
To visualize a selected band of the dataset interactively, use the add_pace method and specify the wavelengths parameter.
m = hypercoast.Map()
m.add_basemap("Hybrid")
wavelengths = [450]
m.add_pace(dataset, wavelengths, colormap="jet", vmin=0, vmax=0.02, layer_name="PACE")
m.add_colormap(cmap="jet", vmin=0, vmax=0.02, label="Reflectance")
m.add("spectral")
m.set_center(-80.7382, 26.5295, zoom=6)
m
Click on the map to display the spectral signature of the selected pixel.
Convert the spectral data of the selected pixels to a DataFrame.
df = m.spectral_to_df()
df.head()
Convert the spectral data of the selected pixels to a GeoDataFrame.
gdf = m.spectral_to_gdf()
gdf.head()
Convert the spectral data of the selected pixels to a CSV file.
m.spectral_to_csv("data/spectral.csv")
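m.spectral_to_csv writes the clicked spectra to disk, but the column layout it produces is not shown here. The hypothetical spectra_to_csv helper assumes a wide layout modelled on the in-situ sample CSV used later in this notebook: a wavelength column plus one column per point, named "(lat lon)". It is an illustration of that layout, not HyperCoast's implementation.

```python
import csv
import io


def spectra_to_csv(spectra, wavelengths, fh):
    """Write {(lat, lon): [values...]} as a wide CSV with one row per wavelength
    and one column per point (hypothetical layout)."""
    points = list(spectra)
    header = ["wavelength"] + [f"({lat:.4f} {lon:.4f})" for lat, lon in points]
    writer = csv.writer(fh)
    writer.writerow(header)
    for k, wl in enumerate(wavelengths):
        writer.writerow([wl] + [spectra[p][k] for p in points])


buf = io.StringIO()
spectra_to_csv({(26.5295, -80.7382): [0.004, 0.006]}, [450.0, 550.0], buf)
print(buf.getvalue())
```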
Multi-band visualization
Select three spectral bands to visualize as an RGB image.
m = hypercoast.Map()
m.add_basemap("Hybrid")
wavelengths = [450, 550, 650]
m.add_pace(
dataset, wavelengths, indexes=[3, 2, 1], vmin=0, vmax=0.02, layer_name="PACE"
)
m.add("spectral")
m.set_center(-80.7382, 26.5295, zoom=6)
m
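This sketch assumes that indexes lists 1-based band positions in (R, G, B) order. With wavelengths=[450, 550, 650] and indexes=[3, 2, 1], the 650 nm band then feeds the red channel, 550 nm the green and 450 nm the blue, giving a near-natural-colour composite. The hypothetical to_rgb_pixel helper shows a typical linear stretch from the vmin/vmax reflectance range to 8-bit values for one pixel. It illustrates the idea and is not add_pace's internal code.

```python
def to_rgb_pixel(band_values, indexes=(3, 2, 1), vmin=0.0, vmax=0.02):
    """Map reflectances of the selected bands to an 8-bit (R, G, B) triple.

    indexes are 1-based positions into band_values, given in (R, G, B) order.
    """
    rgb = []
    for idx in indexes:
        value = band_values[idx - 1]
        # Linear stretch: vmin -> 0 and vmax -> 255, clipping values outside the range.
        scaled = (value - vmin) / (vmax - vmin)
        rgb.append(round(255 * min(max(scaled, 0.0), 1.0)))
    return tuple(rgb)


# Reflectances at 450, 550 and 650 nm for one hypothetical water pixel.
print(to_rgb_pixel([0.012, 0.008, 0.003]))  # (38, 102, 153)
```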
Change band combination
Click on the gear icon on the toolbar to change the band combination.
PACE BGC data
PACE has a variety of data products, including biogeochemical properties. For more information about the available datasets, see the PACE Data Products page.
The PACE Biogeochemical (BGC) data products include chlorophyll-a concentration, particulate organic carbon, and particulate inorganic carbon.
Download PACE BGC data
Let's download a sample PACE BGC dataset for the demonstration.
results, gdf = hypercoast.search_nasa_data(
short_name="PACE_OCI_L2_BGC_NRT",
bbox=(-90.5642, 29.9749, -89.7143, 30.42),
temporal=("2024-07-30", "2024-08-15"),
count=1,
return_gdf=True,
)
hypercoast.download_nasa_data(results, out_dir="data")
Load the downloaded dataset as an xarray.Dataset:
filepath = "data/PACE_OCI.20240730T181157.L2.OC_BGC.V2_0.NRT.nc"
dataset = hypercoast.read_pace_bgc(filepath)
Let's inspect the data variables contained in the dataset:
dataset.variables
We can see that the dataset contains the following variables. Each is stored as a 1710 × 1272 array with dimensions (latitude, longitude):

| Variable | Long name | Units | Valid range | Dtype |
|---|---|---|---|---|
| chlor_a | Chlorophyll Concentration, OCI Algorithm | mg m^-3 | 0.001 to 100.0 | float32 |
| carbon_phyto | Phytoplankton Carbon | mg m^-3 | 0.0 to 1000.0 | float32 |
| poc | Particulate Organic Carbon, D. Stramski, 2022 (hybrid version) | mg m^-3 | -32000 to -22000 | float32 |
| chlor_a_unc | Uncertainty in chlorophyll a concentration | mg m^-3 | 0.001 to 100.0 | float32 |
| carbon_phyto_unc | Phytoplankton Carbon standard uncertainty | mg m^-3 | 0.0 to 1000.0 | float32 |
| l2_flags | Level-2 Processing Flags | (none) | -2147483648 to 2147483647 | int32 |
| longitude | Longitude | degrees_east | -180.0 to 180.0 | float32 |
| latitude | Latitude | degrees_north | -90.0 to 90.0 | float32 |

The l2_flags variable is a bit field. Its flag_masks attribute begins 1 2 4 8 ... and its flag_meanings attribute begins ATMFAIL LAND PRODWARN HIGLINT HILT HISATZEN COASTZ SPARE ...
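The l2_flags variable packs several quality flags into one integer per pixel. In the dataset's attributes, flag_masks starts 1 2 4 8 and flag_meanings starts ATMFAIL LAND PRODWARN HIGLINT HILT HISATZEN COASTZ SPARE, but both are truncated in the printed output. This sketch assumes the i-th name corresponds to mask 2**i, which is consistent with the visible masks. Under that assumption, a pixel's flags can be decoded with a bitwise AND. The helpers decode_l2_flags and is_clean are hypothetical, and so is the choice of flags to reject.

```python
# Flag names as listed in the l2_flags attributes; only the first eight are visible.
# Assumption: the i-th name uses bit mask 2**i (consistent with masks 1, 2, 4, 8, ...).
FLAG_NAMES = ["ATMFAIL", "LAND", "PRODWARN", "HIGLINT", "HILT", "HISATZEN", "COASTZ", "SPARE"]


def decode_l2_flags(value):
    """Return the names of the flags whose bits are set in an l2_flags value."""
    return [name for bit, name in enumerate(FLAG_NAMES) if value & (1 << bit)]


def is_clean(value, reject=("ATMFAIL", "LAND", "HIGLINT")):
    """True if none of the rejected flags are set (the reject list is illustrative)."""
    return not set(decode_l2_flags(value)) & set(reject)


print(decode_l2_flags(2 | 8))  # ['LAND', 'HIGLINT']
print(is_clean(4))  # True: only PRODWARN is set
```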
Visualize PACE BGC data
Since the dataset is not gridded, we need to transform it into gridded data before we can visualize it. We can use the grid_pace_bgc function to transform the dataset into a gridded format.
First, transform the chlor_a variable into a gridded format:
chlor_a = hypercoast.grid_pace_bgc(dataset, variable="chlor_a", method="linear")
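Here grid_pace_bgc is called with method="linear". That suggests linear interpolation of the scattered swath samples onto a regular latitude/longitude grid, and "linear" is also one of the methods offered by scipy.interpolate.griddata. For a simpler, dependency-free illustration of regridding, the hypothetical grid_nearest helper uses nearest-neighbour assignment instead of linear interpolation. Each grid node takes the value of the closest swath sample, and nodes with no sample nearby stay NaN. Distances are measured in plain degrees, which ignores how longitude degrees shrink with latitude. That is acceptable for a sketch but not for production use.

```python
import math


def grid_nearest(lats, lons, values, lat_axis, lon_axis, max_dist=0.05):
    """Nearest-neighbour regridding of scattered samples onto a regular lat/lon grid.

    Each grid node takes the value of the closest sample (distance in plain degrees);
    nodes with no sample within max_dist degrees are left as NaN.
    """
    grid = []
    for glat in lat_axis:
        row = []
        for glon in lon_axis:
            best_d, best_v = math.inf, math.nan
            # Brute-force search over all samples: fine for a sketch, slow for a real swath.
            for plat, plon, v in zip(lats, lons, values):
                d = math.hypot(plat - glat, plon - glon)
                if d < best_d:
                    best_d, best_v = d, v
            row.append(best_v if best_d <= max_dist else math.nan)
        grid.append(row)
    return grid


# Three synthetic samples and a 2 x 3 target grid.
lats, lons, values = [30.0, 30.0, 30.1], [-90.0, -89.9, -90.0], [1.0, 2.0, 3.0]
grid = grid_nearest(lats, lons, values, lat_axis=[30.0, 30.1], lon_axis=[-90.0, -89.9, -89.5])
print(grid)  # [[1.0, 2.0, nan], [3.0, nan, nan]]
```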
Plot the gridded Chlorophyll Concentration data:
chlor_a.plot(vmin=0, vmax=20, cmap="jet", size=6)
Plot the gridded Phytoplankton Carbon data:
carbon_phyto = hypercoast.grid_pace_bgc(
dataset, variable="carbon_phyto", method="linear"
)
carbon_phyto.plot(vmin=0, vmax=120, cmap="jet", size=6)
Plot the gridded Particulate Organic Carbon data:
poc = hypercoast.grid_pace_bgc(dataset, variable="poc", method="linear")
poc.plot(vmin=0, vmax=1000, cmap="jet")
Plot the gridded BGC data on an interactive map.
m = hypercoast.Map()
m.add_basemap("Hybrid")
m.add_raster(chlor_a, layer_name="Chlorophyll-a", colormap="jet", vmin=0, vmax=20)
m.add_raster(
carbon_phyto, layer_name="Phytoplankton Carbon", colormap="plasma", vmin=0, vmax=120
)
m.add_raster(
poc, layer_name="Particulate Organic Carbon", colormap="coolwarm", vmin=0, vmax=1000
)
m.add_layer_manager()
m.add_colormap(cmap="jet", vmin=0, vmax=20, label="Chlorophyll-a (mg/m3)")
m.add_colormap(cmap="plasma", vmin=0, vmax=120, label="Phytoplankton Carbon (mg/m3)")
m.add_colormap(
cmap="coolwarm", vmin=0, vmax=1000, label="Particulate Organic Carbon (mg/m3)"
)
m.set_center(-80.7382, 26.5295, zoom=6)
m
PACE Chlorophyll Level 3 data
PACE Level 3 data products are gridded products derived from Level 2 data. One of the most common Level 3 data products is the Chlorophyll-Carotenoid Index (CCI) dataset.
Let's download some daily PACE Chlorophyll Level 3 data for the demonstration.
temporal = ("2024-07-30", "2024-08-15")
results = hypercoast.search_pace_chla(temporal=temporal)
hypercoast.download_nasa_data(results, "chla")
The downloaded datasets are in the chla directory, which contains 17 daily files of CCI data in netCDF format. The data cover the period from 2024-07-30 to 2024-08-15.
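The figure of 17 files comes from counting both endpoints of the date range: July 30 and 31, plus August 1 through 15. Here is a quick check with the standard library, using a hypothetical daily_dates helper:

```python
from datetime import date, timedelta


def daily_dates(start, end):
    """List every date from start to end, inclusive, as ISO strings."""
    first, last = date.fromisoformat(start), date.fromisoformat(end)
    return [(first + timedelta(days=k)).isoformat() for k in range((last - first).days + 1)]


dates = daily_dates("2024-07-30", "2024-08-15")
print(len(dates), dates[0], dates[-1])  # 17 2024-07-30 2024-08-15
```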
files = "chla/*.nc"  # glob pattern matching every netCDF file in the chla directory
Load all the data files in the chla directory as an xarray DataArray.
array = hypercoast.read_pace_chla(files)
# array
Select a date and visualize the chlorophyll-a concentration data with Matplotlib.
hypercoast.viz_pace_chla(array, date="2024-07-30", cmap="jet", size=6)
If the date is not specified, the data are averaged over the entire time range.
hypercoast.viz_pace_chla(array, cmap="jet", size=6)
To visualize the data interactively, we can either select a single date or aggregate the data over a time range.
First, let's select a single date from the data array:
single_array = array.sel(date="2024-07-30")
# single_array
Convert the data array to an image that can be displayed on an interactive map.
single_image = hypercoast.pace_chla_to_image(single_array)
Create an interactive map and display the image on the map.
m = hypercoast.Map(center=[40, -100], zoom=4)
m.add_basemap("Hybrid")
m.add_raster(
single_image,
cmap="jet",
vmin=-1,
vmax=2,
layer_name="Chlorophyll a",
zoom_to_layer=False,
)
label = "Chlorophyll Concentration [lg(mg m^-3)]"
m.add_colormap(cmap="jet", vmin=-1, vmax=2, label=label)
m
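The colour range from vmin=-1 to vmax=2 only makes sense on a logarithmic scale. This sketch assumes the mapped values are base-10 logarithms of chlorophyll-a concentration. That is an inference from the range, not something the notebook states. Under that assumption, the colour limits correspond to 0.1 and 100 mg m^-3.

```python
import math

# Assumption: the mapped values are log10(chlorophyll-a in mg m^-3).
vmin, vmax = -1, 2
low, high = 10 ** vmin, 10 ** vmax
print(low, high)  # 0.1 100

# Round trip for an arbitrary example concentration.
chl = 5.0  # mg m^-3
log_value = math.log10(chl)
print(round(log_value, 3))  # 0.699
```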
The daily image does not have global coverage. To visualize the data globally, we can aggregate the data over a time range.
mean_array = array.mean(dim="date")
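Averaging fills gaps because the mean skips missing pixels. For floating-point data, xarray's DataArray.mean defaults to skipna=True, so a pixel observed on any date gets a value. The hypothetical nanmean_over_dates helper spells out that per-pixel logic, with plain Python lists standing in for the date-stacked array.

```python
import math


def nanmean_over_dates(stack):
    """Average same-shaped 2-D grids pixel by pixel, ignoring NaNs.

    Pixels that are NaN on every date stay NaN (mirrors skipna=True).
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            valid = [grid[i][j] for grid in stack if not math.isnan(grid[i][j])]
            row.append(sum(valid) / len(valid) if valid else math.nan)
        out.append(row)
    return out


nan = math.nan
day1 = [[0.5, nan], [nan, nan]]
day2 = [[1.5, 2.0], [nan, nan]]
day3 = [[nan, 4.0], [1.0, nan]]
print(nanmean_over_dates([day1, day2, day3]))  # [[1.0, 3.0], [1.0, nan]]
```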
Convert the aggregated data array to an image that can be displayed on an interactive map.
image = hypercoast.pace_chla_to_image(mean_array)
Create an interactive map and display the image on the map.
m = hypercoast.Map(center=[40, -100], zoom=4)
m.add_basemap("Hybrid")
m.add_raster(
image, cmap="jet", vmin=-1, vmax=2, layer_name="Chlorophyll a", zoom_to_layer=False
)
label = "Chlorophyll Concentration [lg(mg m^-3)]"
m.add_colormap(cmap="jet", vmin=-1, vmax=2, label=label)
m
Hypoxia Cruise data
The Hypoxia Cruise collected water quality data in the Gulf of Mexico from July 21 to August 2, 2024. In this section, we will visualize the cruise sampling locations.
First, let's download an Excel file containing the cruise sampling locations.
url = "https://github.com/opengeos/datasets/releases/download/hypercoast/Hypoxia_Data_Sheet.xlsx"
xls_path = "data/Hypoxia_Data_Sheet.xlsx"
hypercoast.download_file(url, xls_path, overwrite=True)
df = pd.read_excel(xls_path)
df.head()
| | Station | Station.1 | Time | Date | Lon | Lat | Depth (m) | Secchi (m) | Salinity | Water Temp | ... | Absorption | CDOM | LISST | Nano | Surface \npH | Sufface \nO2 | Bottom \nO2 | FL-ECO | FL-CDOM | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | River stations | NaN | NaN | NaT | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | R1 | St1 | 09:39:00 | 2024-07-21 | -89.45114 | 28.89887 | NaN | 2 | 28.71 | 30.74161 | ... | 125ml | yes | yes | no | NaN | NaN | NaN | NaN | NaN | River Mouth |
| 2 | R2 | St2 | 09:47:00 | 2024-07-21 | -89.45306 | 28.90001 | NaN | 2 | 25.91325 | 30.91475 | ... | 150ml | yes | yes | yes | NaN | NaN | NaN | NaN | NaN | River plume, seaside |
| 3 | R3 | St3 | 09:59:00 | 2024-07-21 | -89.43833 | 28.89487 | NaN | 0.75 | 24.44862 | 30.59565 | ... | 100ml | yes | yes | yes | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | R4 | St4 | 10:13:00 | 2024-07-21 | -89.43162 | 28.90630 | NaN | 0.5 | 8.34838 | 30.13989 | ... | 100ml | yes | yes | yes | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 22 columns
Filter the data to select only the sampling locations with latitude and longitude coordinates.
df_filtered = df.dropna(subset=["Lon", "Lat"]).reset_index(drop=True)
df_filtered.head()
| | Station | Station.1 | Time | Date | Lon | Lat | Depth (m) | Secchi (m) | Salinity | Water Temp | ... | Absorption | CDOM | LISST | Nano | Surface \npH | Sufface \nO2 | Bottom \nO2 | FL-ECO | FL-CDOM | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | R1 | St1 | 09:39:00 | 2024-07-21 | -89.45114 | 28.89887 | NaN | 2 | 28.71 | 30.74161 | ... | 125ml | yes | yes | no | NaN | NaN | NaN | NaN | NaN | River Mouth |
| 1 | R2 | St2 | 09:47:00 | 2024-07-21 | -89.45306 | 28.90001 | NaN | 2 | 25.91325 | 30.91475 | ... | 150ml | yes | yes | yes | NaN | NaN | NaN | NaN | NaN | River plume, seaside |
| 2 | R3 | St3 | 09:59:00 | 2024-07-21 | -89.43833 | 28.89487 | NaN | 0.75 | 24.44862 | 30.59565 | ... | 100ml | yes | yes | yes | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | R4 | St4 | 10:13:00 | 2024-07-21 | -89.43162 | 28.90630 | NaN | 0.5 | 8.34838 | 30.13989 | ... | 100ml | yes | yes | yes | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | R5 | St5 | 10:58:00 | 2024-07-21 | -89.37324 | 28.98095 | NaN | 0.5 | 2.4625 | 30.3054 | ... | 100ml | yes | yes | yes | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 22 columns
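dropna(subset=["Lon", "Lat"]) drops every row in which either column is missing (pandas' default how="any"). This removes section-header rows such as "River stations", which carry no coordinates. The same rule is shown below in plain Python, using a hypothetical drop_missing helper and a few rows copied from the tables above.

```python
import math


def drop_missing(rows, subset=("Lon", "Lat")):
    """Keep rows whose subset columns are all present and not NaN,
    like DataFrame.dropna(subset=[...]) with the default how="any"."""

    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    return [r for r in rows if not any(missing(r.get(c)) for c in subset)]


rows = [
    {"Station": "River stations", "Lon": math.nan, "Lat": math.nan},
    {"Station": "R1", "Lon": -89.45114, "Lat": 28.89887},
    {"Station": "R2", "Lon": -89.45306, "Lat": 28.90001},
]
print([r["Station"] for r in drop_missing(rows)])  # ['R1', 'R2']
```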
Download the KML file containing the cruise path.
url = (
"https://github.com/opengeos/datasets/releases/download/hypercoast/Hypoxia_Path.kml"
)
kml_path = "data/Hypoxia_Path.kml"
hypercoast.download_file(url, kml_path)
We will use the PACE AOP dataset acquired on July 30, 2024, to visualize the cruise sampling locations. The dataset should have been downloaded in the previous section.
filepath = "data/PACE_OCI.20240730T181157.L2.OC_AOP.V2_0.NRT.nc"
Read the PACE AOP dataset as an xarray Dataset.
dataset = hypercoast.read_pace(filepath)
# dataset
Visualize the cruise sampling locations and PACE data on the map.
m = hypercoast.Map()
m.add_basemap("Hybrid")
wavelengths = [450, 550, 650]
m.add_pace(
dataset, wavelengths, indexes=[3, 2, 1], vmin=0, vmax=0.02, layer_name="PACE"
)
m.add("spectral")
style = {"weight": 2, "color": "red"}
m.add_kml(kml_path, style=style, layer_name="Hypoxia Path", info_mode=None)
m.add_points_from_xy(
df_filtered,
x="Lon",
y="Lat",
max_cluster_radius=50,
layer_name="Hypoxia Data Points",
)
m.set_center(-91.46118, 28.89758, zoom=8)
m
Visualize in-situ data
This section demonstrates how to visualize in-situ data on the map. First, let's download a hypothetical in-situ dataset.
url = "https://github.com/opengeos/datasets/releases/download/hypercoast/pace_sample_points.csv"
data = pd.read_csv(url)
data.head()
| | band | wavelength | (30.1926 -90.1318) | (30.1594 -90.2856) | (29.3295 -92.3071) | (28.8783 -90.4559) | (30.5481 -87.9840) | (29.5305 -85.0671) | (28.8254 -85.6659) | (29.4587 -83.8477) | (26.8878 -87.7643) | (24.6570 -86.4954) | (26.8045 -82.4854) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 339.0 | NaN | NaN | 0.000100 | NaN | NaN | NaN | NaN | NaN | 0.001718 | 0.000710 | NaN |
| 1 | 1 | 341.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.001554 | 0.000846 | NaN |
| 2 | 2 | 344.0 | NaN | NaN | 0.001816 | NaN | NaN | NaN | NaN | NaN | 0.004047 | 0.003598 | NaN |
| 3 | 3 | 346.0 | 0.000346 | NaN | 0.002393 | NaN | NaN | NaN | NaN | NaN | 0.005045 | 0.004646 | NaN |
| 4 | 4 | 348.0 | NaN | NaN | 0.001578 | NaN | NaN | NaN | NaN | NaN | 0.004258 | 0.003984 | NaN |
Again, we will use the PACE AOP dataset acquired on July 30, 2024, to visualize the in-situ data. The dataset should have been downloaded in the previous section.
filepath = "data/PACE_OCI.20240730T181157.L2.OC_AOP.V2_0.NRT.nc"
Read the PACE dataset as an xarray Dataset.
dataset = hypercoast.read_pace(filepath)
Visualize the in-situ data on the map.
m = hypercoast.Map(center=[27.235094, -87.791748], zoom=6)
m.add_basemap("Hybrid")
wavelengths = [450]
m.add_pace(dataset, wavelengths, colormap="jet", vmin=0, vmax=0.02, layer_name="PACE")
m.add_colormap(cmap="jet", vmin=0, vmax=0.02, label="Reflectance")
m.add("spectral")
m.add_field_data(
data,
x_col="wavelength",
y_col_prefix="(",
x_label="Wavelength (nm)",
y_label="Reflectance",
use_marker_cluster=True,
)
m.set_center(-87.791748, 27.235094, zoom=6)
m
Click on any marker to display the in-situ data.
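The in-situ CSV stores one spectrum per column, and each spectrum column is named "(lat lon)". Passing y_col_prefix="(" to add_field_data presumably tells it which columns hold spectra, but that is an assumption. Below is a minimal sketch of recovering point coordinates from such headers, using a hypothetical parse_point_columns helper that is not HyperCoast's implementation.

```python
import csv
import io


def parse_point_columns(header, prefix="("):
    """Return {column_name: (lat, lon)} for columns named like '(30.1926 -90.1318)'."""
    points = {}
    for name in header:
        if name.startswith(prefix):
            lat_str, lon_str = name.strip("()").split()
            points[name] = (float(lat_str), float(lon_str))
    return points


# First lines of a CSV shaped like the sample above (two point columns only).
sample = "band,wavelength,(30.1926 -90.1318),(29.3295 -92.3071)\n0,339.0,,0.0001\n"
header = next(csv.reader(io.StringIO(sample)))
print(parse_point_columns(header))
```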
Analyze PACE data
To analyze the PACE data with algorithms such as K-means clustering, principal component analysis (PCA), or Spectral Angle Mapper (SAM), follow the notebook at https://hypercoast.org/examples/pace_cyano.
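As a taste of one of the algorithms mentioned above, SAM treats each spectrum as a vector. It measures the angle between a pixel spectrum a and a reference spectrum b as θ = arccos(a·b / (|a||b|)). Smaller angles mean more similar spectral shapes, and the angle ignores overall brightness. The sketch below is an independent minimal implementation with hypothetical function names and made-up reference spectra, not the code used in the linked notebook.

```python
import math


def spectral_angle(a, b):
    """Spectral angle (radians) between two spectra sampled at the same wavelengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding cannot push the value outside acos's domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cos_theta)


def classify_sam(pixel, references):
    """Assign the pixel to the reference spectrum with the smallest angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))


refs = {"clear": [0.010, 0.006, 0.001], "turbid": [0.004, 0.008, 0.006]}
print(classify_sam([0.020, 0.012, 0.002], refs))  # clear (a brighter copy of "clear")
```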