3. Watching a solar eclipse using an OOI moored echosounder

Jupyter notebook accompanying the manuscript:

Echopype: A Python library for interoperable and scalable processing of ocean sonar data for biological information
Authors: Wu-Jung Lee, Emilio Mayorga, Landung Setiawan, Kavin Nguyen, Imran Majeed, Valentina Staneva

3.1. Introduction

3.1.1. Goals

  • Illustrate a common workflow for echosounder data conversion, calibration and use. This workflow leverages the standardization applied by echopype and the power, ease of use and familiarity of libraries in the scientific Python ecosystem.

  • Demonstrate how easily echosounder data can be combined with data from a different instrument in a single computing environment. Without echopype, additional wrangling across more than one software system is needed to achieve the same visualization and comparison.

3.1.2. Description

This notebook uses EK60 echosounder data from the U.S. Ocean Observatories Initiative (OOI) to illustrate a common workflow for data conversion, combination, calibration and analysis using echopype, as well as the data interoperability it enables.

We will use data from the OOI Oregon Offshore Cabled Shallow Profiler Mooring collected on August 20-21, 2017. These were the day before and the day of a solar eclipse, during which the reduced sunlight affected the regular diel vertical migration (DVM) patterns of marine life. This change was directly observed using the upward-looking echosounder mounted on this mooring platform, which happened to be within the totality zone. The effect of the solar eclipse was clearly seen by aligning and comparing the echosounder observations with solar radiation data collected by the Bulk Meteorology Instrument Package located on the nearby Coastal Endurance Oregon Offshore Surface Mooring, also maintained by the OOI.

The data used are 19 .raw files with a total volume of approximately 1 GB. With echopype functionality, the raw data files hosted on the OOI Raw Data Archive (an HTTP server) are directly parsed and organized into a standardized representation following the SONAR-netCDF4 v1.0 convention, and stored in the cloud-optimized Zarr format. The individual converted files are later combined into a single entity that can be easily explored and manipulated.

3.1.3. Outline

  1. Establish connection with the OOI Raw Data Archive and generate a list of target EK60 .raw files.

  2. Process the archived raw files with echopype: convert and combine into a single quantity (an EchoData object) in a standardized format.

  3. Obtain solar radiation data from an OOI THREDDS server.

  4. Plot the echosounder and solar radiation data together to visualize the zooplankton response to a solar eclipse.

3.1.4. Running the notebook

This notebook can be run with a conda environment created using the conda environment file https://github.com/OSOceanAcoustics/echopype-examples/blob/main/binder/environment.yml. The notebook creates a directory ./exports and saves all generated Zarr and netCDF files under it.

3.1.5. Warning

The compute_MVBS step in this notebook is not efficient for lazy-loaded data with echopype version 0.6.3. We plan to address this issue soon.

3.1.6. Note

We encourage importing echopype as ep for consistency.

from pathlib import Path
import itertools as it
import datetime as dt
from dateutil import parser as dtparser

import fsspec
import xarray as xr
import matplotlib.pyplot as plt
import hvplot.xarray

import echopype as ep

import warnings
warnings.simplefilter("ignore", category=DeprecationWarning)

3.2. Establish connection with the OOI Raw Data Archive and generate list of target EK60 .raw files

Access and inspect the publicly accessible OOI Raw Data Archive (an HTTP server) as if it were a local file system. This is done through the Python fsspec file system and bytes storage interface. We will use the glob method of the fsspec file system object (fs.glob) to list the EK60 .raw data files in the archive, then filter on file names for the target dates of interest.

fs = fsspec.filesystem('https')
ooi_raw_url = (
    "https://rawdata.oceanobservatories.org/files/"
    "CE04OSPS/PC01B/ZPLSCB102_10.33.10.143/2017/08"
)

Now let’s specify the range of dates we will be pulling data from. Note that the data filenames contain the time information, and these times are in UTC.

def in_range(raw_file: str, start: dt.datetime, end: dt.datetime) -> bool:
    """Check if file url is in datetime range"""
    file_name = Path(raw_file).name
    # fuzzy=True lets dateutil pick the timestamp out of the surrounding file-name text
    file_datetime = dtparser.parse(file_name, fuzzy=True)
    return start <= file_datetime <= end


# Target time range (UTC)
start_datetime = dt.datetime(2017, 8, 21, 7, 0)
end_datetime = dt.datetime(2017, 8, 22, 7, 0)
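The fuzzy date parsing above relies on dateutil recognizing the timestamp inside the file name. As a minimal sketch of a more explicit alternative, the timestamp can be extracted with a regular expression. The file name pattern `<prefix>-DYYYYMMDD-THHMMSS.raw` and the example name below are assumptions made for illustration, and `parse_file_datetime` is a hypothetical helper, not part of echopype.

```python
import re
from datetime import datetime

# Assumed pattern: "...-DYYYYMMDD-THHMMSS.raw" (not confirmed by the archive listing here)
_TIMESTAMP_RE = re.compile(r"D(\d{8})-T(\d{6})")


def parse_file_datetime(file_name: str) -> datetime:
    """Extract the (UTC) timestamp embedded in a .raw file name."""
    match = _TIMESTAMP_RE.search(file_name)
    if match is None:
        raise ValueError(f"No timestamp found in {file_name!r}")
    # Join the date and time digit groups and parse them in one step
    return datetime.strptime("".join(match.groups()), "%Y%m%d%H%M%S")


example_name = "OOI-D20170821-T163049.raw"  # hypothetical file name
parsed = parse_file_datetime(example_name)
print(parsed)
```

An explicit pattern fails loudly when a file name does not match, whereas fuzzy parsing may silently pick up unrelated digits.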

On the OOI Raw Data Archive, the monthly folder is further split to daily folders, so we can simply grab data from the desired days.

desired_day_urls = [f"{ooi_raw_url}/{day:02d}" for day in range(start_datetime.day, end_datetime.day + 1)]
desired_day_urls
['https://rawdata.oceanobservatories.org/files/CE04OSPS/PC01B/ZPLSCB102_10.33.10.143/2017/08/21',
 'https://rawdata.oceanobservatories.org/files/CE04OSPS/PC01B/ZPLSCB102_10.33.10.143/2017/08/22']
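Building the folder list from day numbers works here because both days fall in the same month. A minimal sketch of a variant that also handles month boundaries is shown below. It assumes the archive keeps the year/month/day folder layout seen in the URLs above, and `daily_folder_urls` is a hypothetical helper name.

```python
from datetime import date, timedelta


def daily_folder_urls(base_url: str, start: date, end: date) -> list[str]:
    """Return one folder URL per calendar day from start to end (inclusive)."""
    n_days = (end - start).days
    days = (start + timedelta(days=i) for i in range(n_days + 1))
    # Zero-pad month and day to match the assumed YYYY/MM/DD folder layout
    return [f"{base_url}/{d.year}/{d.month:02d}/{d.day:02d}" for d in days]


# Instrument-level URL without the year/month part
base = (
    "https://rawdata.oceanobservatories.org/files/"
    "CE04OSPS/PC01B/ZPLSCB102_10.33.10.143"
)
urls = daily_folder_urls(base, date(2017, 8, 31), date(2017, 9, 1))
for url in urls:
    print(url)
```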

Grab all raw files within daily folders by using the filesystem glob, just like the Linux glob.

all_raw_file_urls = it.chain.from_iterable([fs.glob(f"{day_url}/*.raw") for day_url in desired_day_urls])
desired_raw_file_urls = list(filter(
    lambda raw_file: in_range(
        raw_file, 
        start_datetime-dt.timedelta(hours=3),  # 3 hour buffer to select files
        end_datetime+dt.timedelta(hours=3)
    ), 
    all_raw_file_urls
))


print(f"There are {len(desired_raw_file_urls)} raw files within the specified datetime range.")
There are 19 raw files within the specified datetime range.
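The 3-hour buffer used above is a simple way to include files that start before the time window but may still contain data inside it. As a sketch of an alternative, and assuming each file name carries the time of the file's first ping and that each file runs until the next file starts, the selection can be written as an overlap test. The start times below are made up for illustration and `files_overlapping` is a hypothetical helper.

```python
from datetime import datetime


def files_overlapping(start_times, window_start, window_end):
    """Return the start times of files whose assumed time span overlaps the window.

    Each file is assumed to span from its own start time to the start time
    of the next file; the last file is treated as open-ended.
    """
    ordered = sorted(start_times)
    selected = []
    for i, file_start in enumerate(ordered):
        file_end = ordered[i + 1] if i + 1 < len(ordered) else datetime.max
        if file_start <= window_end and file_end > window_start:
            selected.append(file_start)
    return selected


# Hypothetical file start times (UTC)
starts = [
    datetime(2017, 8, 21, 4, 57),
    datetime(2017, 8, 21, 6, 36),
    datetime(2017, 8, 21, 8, 15),
]
picked = files_overlapping(
    starts, datetime(2017, 8, 21, 7, 0), datetime(2017, 8, 21, 8, 0)
)
print(picked)
```

In this made-up case only the file starting at 06:36 is selected, even though its start time lies outside the window. A fixed buffer is simpler but may pull in a few extra files.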

3.3. Process the archived raw files with echopype

3.3.1. Examine the workflow by processing just one file

Let’s first test the echopype workflow by converting and processing 1 file from the above list.

We will use ep.open_raw to directly read in a raw data file from the OOI HTTP server.

The type of sonar needs to be specified as an input argument. The echosounders on the OOI Regional Cabled Array are Simrad EK60 echosounders. All other uncabled echosounders are the Acoustic Zooplankton Fish Profiler (AZFP) manufactured by ASL Environmental Sciences. Echopype supports both of these and other instruments (see the echopype documentation for details).

3.3.2. Converting from raw data files to a standardized data format

Here we already know the URL of the one file on the HTTP server:

echodata = ep.open_raw(raw_file=desired_raw_file_urls[0], sonar_model="ek60")

Here echopype reads, parses, and converts the content of the raw file in memory, and gives you a nice representation of the converted file below as a Python EchoData object.

echodata
EchoData: standardized raw data from Internal Memory
    • <xarray.Dataset>
      Dimensions:  ()
      Data variables:
          *empty*
      Attributes:
          conventions:                 CF-1.7, SONAR-netCDF4-1.0, ACDD-1.3
          keywords:                    EK60
          sonar_convention_authority:  ICES
          sonar_convention_name:       SONAR-netCDF4
          sonar_convention_version:    1.0
          summary:                     
          title:                       
          date_created:                2017-08-21T04:57:17Z
          survey_name:                 

      • <xarray.Dataset>
        Dimensions:                 (channel: 3, time1: 5923)
        Coordinates:
          * channel                 (channel) <U39 'GPT  38 kHz 00907208dd13 5-1 OOI....
          * time1                   (time1) datetime64[ns] 2017-08-21T04:57:17.328999...
        Data variables:
            absorption_indicative   (channel, time1) float64 0.009785 ... 0.05269
            sound_speed_indicative  (channel, time1) float64 1.494e+03 ... 1.494e+03
            frequency_nominal       (channel) float64 3.8e+04 1.2e+05 2e+05

      • <xarray.Dataset>
        Dimensions:              (time1: 1, channel: 3, time2: 5923, time3: 5923)
        Coordinates:
          * time1                (time1) datetime64[ns] NaT
          * channel              (channel) <U39 'GPT  38 kHz 00907208dd13 5-1 OOI.38|...
          * time2                (time2) datetime64[ns] 2017-08-21T04:57:17.328999936...
          * time3                (time3) datetime64[ns] 2017-08-21T04:57:17.328999936...
        Data variables: (12/20)
            latitude             (time1) float64 nan
            longitude            (time1) float64 nan
            sentence_type        (time1) float64 nan
            pitch                (channel, time2) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
            roll                 (channel, time2) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
            vertical_offset      (channel, time2) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
            ...                   ...
            MRU_rotation_y       (channel) float64 nan nan nan
            MRU_rotation_z       (channel) float64 nan nan nan
            position_offset_x    (channel) float64 nan nan nan
            position_offset_y    (channel) float64 nan nan nan
            position_offset_z    (channel) float64 nan nan nan
            frequency_nominal    (channel) float64 3.8e+04 1.2e+05 2e+05

        • <xarray.Dataset>
          Dimensions:        (time1: 1)
          Coordinates:
            * time1          (time1) datetime64[ns] 2017-08-21T04:57:17.328999936
          Data variables:
              NMEA_datagram  (time1) <U22 '$SDVLW,0.000,N,0.000,N'
          Attributes:
              description:  All NMEA sensor datagrams

      • <xarray.Dataset>
        Dimensions:           (filenames: 1)
        Coordinates:
          * filenames         (filenames) int64 0
        Data variables:
            source_filenames  (filenames) <U119 'https://rawdata.oceanobservatories.o...
        Attributes:
            conversion_software_name:     echopype
            conversion_software_version:  0.6.3
            conversion_time:              2022-10-19T01:55:19Z
            duplicate_ping_times:         0

      • <xarray.Dataset>
        Dimensions:           (beam_group: 1)
        Coordinates:
          * beam_group        (beam_group) <U11 'Beam_group1'
        Data variables:
            beam_group_descr  (beam_group) <U131 'contains backscatter power (uncalib...
        Attributes:
            sonar_manufacturer:      Simrad
            sonar_model:             EK60
            sonar_serial_number:     
            sonar_software_name:     ER60
            sonar_software_version:  2.4.3
            sonar_type:              echosounder

        • <xarray.Dataset>
          Dimensions:                        (channel: 3, ping_time: 5923, beam: 1,
                                              range_sample: 1072)
          Coordinates:
            * channel                        (channel) <U39 'GPT  38 kHz 00907208dd13 5...
            * ping_time                      (ping_time) datetime64[ns] 2017-08-21T04:5...
            * range_sample                   (range_sample) int64 0 1 2 ... 1069 1070 1071
            * beam                           (beam) <U1 '1'
          Data variables: (12/25)
              frequency_nominal              (channel) float64 3.8e+04 1.2e+05 2e+05
              beam_type                      (channel, ping_time) int64 0 0 0 0 ... 0 0 0
              beamwidth_twoway_alongship     (channel, ping_time, beam) float64 7.1 ......
              beamwidth_twoway_athwartship   (channel, ping_time, beam) float64 7.1 ......
              beam_direction_x               (channel, ping_time, beam) float64 0.0 ......
              beam_direction_y               (channel, ping_time, beam) float64 0.0 ......
              ...                             ...
              count                          (channel, ping_time) float64 1.072e+03 ......
              offset                         (channel, ping_time) float64 0.0 0.0 ... 0.0
              transmit_mode                  (channel, ping_time) float64 0.0 0.0 ... 0.0
              backscatter_r                  (channel, ping_time, range_sample, beam) float32 ...
              angle_athwartship              (channel, ping_time, range_sample, beam) float64 ...
              angle_alongship                (channel, ping_time, range_sample, beam) float64 ...
          Attributes:
              beam_mode:              vertical
              conversion_equation_t:  type_3

      • <xarray.Dataset>
        Dimensions:            (channel: 3, pulse_length_bin: 5)
        Coordinates:
          * channel            (channel) <U39 'GPT  38 kHz 00907208dd13 5-1 OOI.38|20...
          * pulse_length_bin   (pulse_length_bin) int64 0 1 2 3 4
        Data variables:
            frequency_nominal  (channel) float64 3.8e+04 1.2e+05 2e+05
            sa_correction      (channel, pulse_length_bin) float64 0.0 0.0 ... 0.0 0.0
            gain_correction    (channel, pulse_length_bin) float64 24.0 26.0 ... 25.0
            pulse_length       (channel, pulse_length_bin) float64 0.000256 ... 0.001024

The EchoData object can be saved to either the netCDF4 or Zarr format through the to_netcdf or to_zarr methods.

# Create directories for files generated in this notebook.
base_dpath = Path('./exports')
base_dpath.mkdir(exist_ok=True)

output_dpath = Path(base_dpath / 'ooimooring_onefiletest')
output_dpath.mkdir(exist_ok=True)
# Save to netCDF format
echodata.to_netcdf(save_path=output_dpath, overwrite=True)
# Save to zarr format
echodata.to_zarr(save_path=output_dpath, overwrite=True)

3.3.3. Basic echo processing

At present echopype supports basic processing functionalities including calibration (from raw instrument data records to volume backscattering strength, \(S_V\)), denoising, and computing mean volume backscattering strength, \(\overline{S_V}\) or \(\text{MVBS}\). The EchoData object can be passed into various calibration and preprocessing functions without having to write out any intermediate files.

Here we demonstrate calibration to obtain \(S_V\). For EK60 data, by default the function uses environmental (sound speed and absorption) and calibration parameters stored in the data file. Users can optionally specify other parameter choices.

# Compute volume backscattering strength (Sv) from raw data
ds_Sv = ep.calibrate.compute_Sv(echodata)

The computed Sv is stored with other variables used in the calibration operation.

ds_Sv
<xarray.Dataset>
Dimensions:                (channel: 3, ping_time: 5923, range_sample: 1072,
                            filenames: 1, time3: 5923)
Coordinates:
  * channel                (channel) <U39 'GPT  38 kHz 00907208dd13 5-1 OOI.3...
  * ping_time              (ping_time) datetime64[ns] 2017-08-21T04:57:17.328...
  * range_sample           (range_sample) int64 0 1 2 3 ... 1068 1069 1070 1071
  * filenames              (filenames) int64 0
  * time3                  (time3) datetime64[ns] 2017-08-21T04:57:17.3289999...
Data variables:
    Sv                     (channel, ping_time, range_sample) float64 3.839 ....
    echo_range             (channel, ping_time, range_sample) float64 0.0 ......
    frequency_nominal      (channel) float64 3.8e+04 1.2e+05 2e+05
    sound_speed            (channel, ping_time) float64 1.494e+03 ... 1.494e+03
    sound_absorption       (channel, ping_time) float64 0.009785 ... 0.05269
    sa_correction          (ping_time, channel) float64 0.0 0.0 0.0 ... 0.0 0.0
    gain_correction        (ping_time, channel) float64 26.5 25.0 ... 25.0 25.0
    equivalent_beam_angle  (channel, ping_time) float64 -20.6 -20.6 ... -20.7
    source_filenames       (filenames) <U119 'https://rawdata.oceanobservator...
    water_level            (channel, time3) float64 0.0 0.0 0.0 ... 0.0 0.0 0.0
Attributes:
    processing_software_name:     echopype
    processing_software_version:  0.6.3
    processing_time:              2022-10-19T01:55:25Z
    processing_function:          calibrate.compute_Sv
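To give a sense of what the calibration step involves, here is a minimal sketch of the standard narrowband (EK60-style) conversion from received power to Sv. It is written from the commonly published form of the equation and is not echopype's implementation. The received power and transmit power values are made up; the remaining numbers are taken from the outputs shown above.

```python
import math


def power_to_sv(power_db, range_m, *, sound_speed, absorption, frequency,
                transmit_power, pulse_length, gain_db, psi_db, sa_correction_db):
    """Convert received power (dB re 1 W) to Sv (dB re 1 m^-1) at a given range."""
    wavelength = sound_speed / frequency
    gain_lin = 10 ** (gain_db / 10)   # transducer gain, linear
    psi_lin = 10 ** (psi_db / 10)     # equivalent beam angle, linear (steradians)
    # System-dependent calibration term
    cal_term = 10 * math.log10(
        transmit_power * gain_lin**2 * wavelength**2
        * sound_speed * pulse_length * psi_lin / (32 * math.pi**2)
    )
    spreading_loss = 20 * math.log10(range_m)     # two-way spherical spreading
    absorption_loss = 2 * absorption * range_m    # two-way absorption
    return power_db + spreading_loss + absorption_loss - cal_term - 2 * sa_correction_db


params = dict(
    sound_speed=1494.0,     # m/s, from the Environment group above
    absorption=0.009785,    # dB/m at 38 kHz, from the Environment group above
    frequency=38000.0,      # Hz
    transmit_power=2000.0,  # W, hypothetical value
    pulse_length=0.001024,  # s
    gain_db=26.5,           # dB
    psi_db=-20.6,           # dB re 1 steradian
    sa_correction_db=0.0,   # dB
)
sv_10m = power_to_sv(-100.0, 10.0, **params)    # -100 dB is a made-up received power
sv_100m = power_to_sv(-100.0, 100.0, **params)
print(sv_10m, sv_100m)
```

For the same received power, the range-dependent terms alone raise Sv between 10 m and 100 m by 20 dB of spreading plus the two-way absorption over the extra 90 m.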

3.3.4. Quickly visualize the result

The default xarray visualization functions are useful in getting a quick sense of the data.

First replace the channel dimension and coordinate with the frequency_nominal variable containing actual frequency values. Note that this step is possible only because there are no duplicated frequencies present.

ds_Sv = ep.consolidate.swap_dims_channel_frequency(ds_Sv)
ds_Sv.Sv.sel(frequency_nominal=200000).plot.pcolormesh(
    x='ping_time', cmap = 'jet', vmin=-80, vmax=-30
);
[Figure: echogram of Sv at 200 kHz, with ping_time on the horizontal axis and range_sample on the vertical axis]

Note that the vertical axis is range_sample. This is the bin (or sample) number as recorded in the data. A separate data variable in ds_Sv contains the physical range (echo_range) from the transducer in meters. echo_range has the same dimension as Sv and may not be uniform across all frequency channels or pings, depending on the echosounder setting during data collection.
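The requirement mentioned above, that swapping the channel dimension for frequency_nominal only works when no frequency is duplicated, can be illustrated with a small check in plain Python. This is a sketch of the idea only, not what swap_dims_channel_frequency does internally. The channel labels are shortened, hypothetical stand-ins for the long channel strings in the dataset.

```python
def channel_to_frequency(channels, frequencies):
    """Map channel labels to nominal frequencies, refusing duplicated frequencies."""
    if len(set(frequencies)) != len(frequencies):
        raise ValueError("Duplicated frequencies: cannot index channels by frequency")
    return dict(zip(channels, frequencies))


channels = ["ch_38kHz", "ch_120kHz", "ch_200kHz"]  # hypothetical short labels
frequencies = [38000.0, 120000.0, 200000.0]        # Hz, as in frequency_nominal above
mapping = channel_to_frequency(channels, frequencies)
print(mapping)
```

If two channels shared a nominal frequency, selecting by frequency would be ambiguous, which is why the check raises an error in that case.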

3.3.5. Convert multiple files and combine into a single EchoData object

Now that we have verified that echopype works for a single file, let’s proceed to process all sonar data from August 20-21, 2017.

First, convert all desired files from the OOI HTTP server to a local directory ./exports/ooimooring_allfiles.

# Create a directory for all files
output_dpath = Path(base_dpath / 'ooimooring_allfiles')
output_dpath.mkdir(exist_ok=True)

%%time
for raw_file_url in desired_raw_file_urls:
    # Read and convert, resulting in echodata object
    ed = ep.open_raw(raw_file=raw_file_url, sonar_model='ek60', offload_to_zarr=True)
    ed.to_zarr(save_path=output_dpath, overwrite=True)
CPU times: user 1min 18s, sys: 22.2 s, total: 1min 40s
Wall time: 1min 11s

Then, assemble a list of EchoData objects from the converted files. Note that by default the files are lazy-loaded and only metadata are read into memory until more operations are executed.

# Use fsspec locally to assemble a list of converted files
fs_local = fsspec.filesystem('file')
ed_list = []
for converted_file in fs_local.glob(str(output_dpath / "*.zarr")):
    ed_list.append(ep.open_converted(converted_file))

Combine all the opened files into a single EchoData object linked to a (lazy-loaded) Zarr file on disk.

ed = ep.combine_echodata(ed_list, zarr_path=str(base_dpath / "ed_combined.zarr"), overwrite=True)

3.3.6. Calibrate the combined EchoData and visualize the mean Sv

The single EchoData object is convenient to use for content inspection and calibration. First, compute Sv.

ds_Sv = ep.calibrate.compute_Sv(ed).compute()

Next, compute the mean Sv (MVBS) with coherent dimensions along physically meaningful echo_range (in meters) and ping_time from the calibrated data. This processed dataset is easy to visualize. The averaging bin size along ping_time is specified using a time series offset alias (for example, '10s').

Note that we use .compute() in the cell above to load the Sv data into memory. This is because the current implementation of compute_MVBS is not efficient for lazy-loaded data. This limitation will be addressed in a future release.

%%time
ds_MVBS = ep.preprocess.compute_MVBS(
    ds_Sv, 
    range_meter_bin=0.2,  # 0.2 meters
    ping_time_bin='10s'   # 10 seconds
)
CPU times: user 2min 35s, sys: 7.24 s, total: 2min 42s
Wall time: 2min 42s
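The averaging behind MVBS can be sketched in a few lines of plain Python: Sv values are converted from dB to the linear domain, averaged within each (time, range) bin, and converted back to dB. This is an illustration of the idea with made-up samples, not echopype's implementation; `mean_sv_db` and `bin_average` are hypothetical helper names.

```python
import math
from collections import defaultdict


def mean_sv_db(sv_db_values):
    """Average Sv values in the linear domain and return the result in dB."""
    linear = [10 ** (v / 10) for v in sv_db_values]
    return 10 * math.log10(sum(linear) / len(linear))


def bin_average(samples, range_bin, time_bin):
    """Group (time_s, range_m, sv_db) samples into bins and average each bin."""
    cells = defaultdict(list)
    for t, r, sv in samples:
        key = (int(t // time_bin), int(r // range_bin))  # (time bin index, range bin index)
        cells[key].append(sv)
    return {key: mean_sv_db(values) for key, values in cells.items()}


# Made-up samples: (seconds since start, range in meters, Sv in dB)
samples = [(0.0, 0.05, -60.0), (5.0, 0.15, -80.0), (12.0, 0.05, -70.0)]
mvbs = bin_average(samples, range_bin=0.2, time_bin=10.0)
print(mvbs)
```

Averaging in the linear domain weights strong echoes more heavily than a plain mean of the dB values would, so the first bin comes out well above -70 dB.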

The resulting MVBS Dataset has a coherent echo_range coordinate across all frequencies.

ds_MVBS
<xarray.Dataset>
Dimensions:            (ping_time: 11017, channel: 3, echo_range: 1023)
Coordinates:
  * ping_time          (ping_time) datetime64[ns] 2017-08-21T04:57:10 ... 201...
  * channel            (channel) <U39 'GPT  38 kHz 00907208dd13 5-1 OOI.38|20...
  * echo_range         (echo_range) float64 0.0 0.2 0.4 ... 204.0 204.2 204.4
Data variables:
    Sv                 (channel, ping_time, echo_range) float64 10.29 ... -53.92
    frequency_nominal  (channel) float64 3.8e+04 1.2e+05 2e+05
Attributes:
    processing_software_name:     echopype
    processing_software_version:  0.6.3
    processing_time:              2022-10-19T02:00:21Z
    processing_function:          preprocess.compute_MVBS

3.3.7. Visualize MVBS interactively using hvPlot

To visualize, invert the range axis since the echosounder is upward-looking from a platform at approximately 200 m water depth.

ds_MVBS = ds_MVBS.assign_coords(depth=("echo_range", ds_MVBS["echo_range"].values[::-1]))
ds_MVBS = ds_MVBS.swap_dims({'echo_range': 'depth'})  # set depth as data dimension

Then replace the channel dimension and coordinate with the frequency_nominal variable containing actual frequency values. Note that this step is possible only when there are no duplicated frequencies present.

ds_MVBS = ep.consolidate.swap_dims_channel_frequency(ds_MVBS)
ds_MVBS["Sv"].sel(frequency_nominal=200000).hvplot.image(
    x='ping_time', y='depth', 
    color='Sv', rasterize=True, 
    cmap='jet', clim=(-80, -30),
    xlabel='Time (UTC)',
    ylabel='Depth (m)'
).options(width=800, invert_yaxis=True)

Note that the reflection from the sea surface shows up at a location below the depth of 0 m. This is because we have not corrected for the actual depth of the platform on which the echosounder is mounted. In addition, the actual sound speed at the time of data collection (which affects the calculated range) could differ from the user-defined sound speed stored in the data file. More accurate platform depth information can be obtained using data from the CTD collocated on the moored platform.
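The two corrections described above can be sketched as follows. Range is proportional to sound speed, so a range computed with the stored sound speed can be rescaled by the ratio of actual to stored sound speed, and depth then follows by subtracting the range from the platform depth. The platform depth and actual sound speed below are hypothetical values, the sound speed is assumed constant over the water column, and `corrected_depth` is a hypothetical helper.

```python
def corrected_depth(echo_range_m, platform_depth_m, c_stored, c_actual):
    """Convert range from an upward-looking transducer to depth below the surface."""
    # Range scales linearly with sound speed (range = sound speed * travel time / 2)
    scaled_range = echo_range_m * (c_actual / c_stored)
    return platform_depth_m - scaled_range


platform_depth = 195.0  # m, hypothetical; in practice taken from the collocated CTD
c_stored = 1494.0       # m/s, sound speed stored in the data file
c_actual = 1485.0       # m/s, hypothetical measured value

depth_at_100m_range = corrected_depth(100.0, platform_depth, c_stored, c_actual)
print(depth_at_100m_range)
```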

3.4. Obtain solar radiation data from an OOI THREDDS server

Now that we have the sonar data ready, the next step is to pull solar radiation data collected by a nearby surface mooring also maintained by the OOI. The Bulk Meteorology Instrument Package is located on the Coastal Endurance Oregon Offshore Surface Mooring.

Note: an earlier version of this notebook used the same dataset but pulled from the National Data Buoy Center (NDBC). We thank the Rutgers OOI Data Lab for pointing out the direct data source in one of the data nuggets.

metbk_url = (
    "http://thredds.dataexplorer.oceanobservatories.org/thredds/dodsC/ooigoldcopy/public/"
    "CE04OSSM-SBD11-06-METBKA000-recovered_host-metbk_a_dcl_instrument_recovered/"
    "deployment0004_CE04OSSM-SBD11-06-METBKA000-recovered_host-metbk_a_dcl_instrument_recovered_20170421T022518.003000-20171013T154805.602000.nc#fillmismatch"
)
metbk_ds = (
    xr.open_dataset(metbk_url)
    .swap_dims({'obs': 'time'})
    .drop_vars('obs')
    .sel(time=slice(start_datetime, end_datetime))[['shortwave_irradiance']]
)
metbk_ds.time.attrs.update({'long_name': 'Time', 'units': 'UTC'})

metbk_ds
<xarray.Dataset>
Dimensions:               (time: 1441)
Coordinates:
  * time                  (time) datetime64[ns] 2017-08-21T07:00:08.232999936...
Data variables:
    shortwave_irradiance  (time) float32 2.2 2.3 2.2 2.2 2.2 ... 2.4 2.3 2.3 2.4
Attributes: (12/73)
    node:                               SBD11
    comment:                            
    publisher_email:                    
    sourceUrl:                          http://oceanobservatories.org/
    collection_method:                  recovered_host
    stream:                             metbk_a_dcl_instrument_recovered
    ...                                 ...
    geospatial_vertical_positive:       down
    lat:                                44.36555
    lon:                                -124.9407
    DODS.strlen:                        14
    DODS.dimName:                       string14
    DODS_EXTRA.Unlimited_Dimension:     obs

3.5. Combine sonar observation with solar radiation measurements

We can finally put everything together and figure out the impact of the eclipse-driven reduction in sunlight on marine zooplankton!

metbk_plot = metbk_ds.hvplot.line(
    x='time', y='shortwave_irradiance'
).options(width=800, height=120, logy=True, xlim=(start_datetime, end_datetime))

mvbs_plot = ds_MVBS["Sv"].sel(frequency_nominal=200000).hvplot.image(
    x='ping_time', y='depth', 
    color='Sv', rasterize=True, 
    cmap='jet', clim=(-80, -30),
    xlabel='Time (UTC)',
    ylabel='Depth (m)'
).options(width=800, invert_yaxis=True, xlim=(start_datetime, end_datetime))
(metbk_plot + mvbs_plot).cols(1)

Look how the dip in the solar radiation reading matches exactly with the upward-moving “blip” at around 17:21 UTC on August 21, 2017 (about 10:21 AM local time). During the solar eclipse, the animals were fooled by the temporary masking of the sun and behaved as if it were getting dark at dusk!
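All times in the plots are in UTC. A minimal sketch for converting them to local time at the mooring is given below, assuming Pacific Daylight Time (a fixed offset of UTC-7) was in effect. It shows that the plotted window, 07:00 UTC to 07:00 UTC, runs from local midnight to local midnight. `utc_to_pdt` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset time zone; assumes daylight saving time (PDT) is in effect
PDT = timezone(timedelta(hours=-7), name="PDT")


def utc_to_pdt(naive_utc: datetime) -> datetime:
    """Interpret a naive datetime as UTC and convert it to PDT."""
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(PDT)


window_start_local = utc_to_pdt(datetime(2017, 8, 21, 7, 0))
window_end_local = utc_to_pdt(datetime(2017, 8, 22, 7, 0))
print(window_start_local, window_end_local)
```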

3.6. Package versions

import datetime
print(f"echopype: {ep.__version__}, xarray: {xr.__version__}, fsspec: {fsspec.__version__}, "
      f"hvplot: {hvplot.__version__}")

print(f"\n{datetime.datetime.utcnow()} +00:00")
echopype: 0.6.3, xarray: 2022.3.0, fsspec: 2022.8.2, hvplot: 0.8.1

2022-10-19 02:00:27.942388 +00:00