Ecosystem Characterization in the US North East Coast using Big Data and Machine Learning

Lead PI: Themistoklis Sapsis · 02/2021 - 01/2023

Project number: 2021-R/RRHC-003

Objectives: Our overall goal is to develop a regional ocean acidification (OA) model, similar to the regional aragonite models for the US West Coast but with uncertainty quantification, so that it can be used for efficient (re)allocation of measurement stations in future operation planning. More specifically, our objectives are:
Objective 1: 3D Monitoring and Forecasting of Temperature and Salinity Maps.
Objective 2: 3D Monitoring and Forecasting of DIC, Aragonite and pH.
Objective 3: Diverse, Easily-Accessible Web-Based Databases and Tools.

Methodology: Our machine learning methodology is based on a flexible Bayesian network and a philosophy that leaves no data behind: it incorporates opportunistic measurements and drifter data in addition to satellite data and data from MWRA stations and buoys. Our approach is informed by physics and leverages new machine learning techniques. The key idea is to systematically identify and exploit, across the multiple and diverse spatio-temporal measurements, all existing (in general nonlinear) correlations between state variables, to obtain their functional relationships implicitly in a multi-dimensional space, and at the same time to quantify the uncertainty of the data-based predictions.
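
As a rough illustration of combining heterogeneous measurements while tracking uncertainty, the sketch below fuses noisy readings from several sources with a conjugate Gaussian update and then flags the least-constrained location. It is a toy scalar example, not the project's Bayesian network: the function name fuse, the cell names, the readings, and the noise variances are all hypothetical and chosen only for illustration.

```python
# Illustrative only: conjugate Gaussian fusion of noisy measurements.
# Every name, value, and noise level here is a hypothetical assumption,
# not data or code from the project.

def fuse(prior_mean, prior_var, observations):
    """Update a Gaussian prior with independent Gaussian observations.

    observations: iterable of (value, noise_variance) pairs.
    Returns the posterior (mean, variance).
    """
    precision = 1.0 / prior_var
    weighted_sum = prior_mean / prior_var
    for value, noise_var in observations:
        # Precise sources (small noise variance) get a larger weight.
        precision += 1.0 / noise_var
        weighted_sum += value / noise_var
    post_var = 1.0 / precision
    return weighted_sum * post_var, post_var


# Hypothetical temperature readings (deg C) at two grid cells.
prior = (10.0, 4.0)  # broad prior: mean 10, variance 4
cells = {
    # (value, noise variance) for an assumed buoy, satellite pass, drifter
    "cell_A": [(11.2, 0.05), (11.6, 1.0), (10.9, 0.5)],
    # assumed satellite pass only
    "cell_B": [(9.4, 1.0)],
}

posterior = {name: fuse(*prior, obs) for name, obs in cells.items()}

# The cell with the largest posterior variance is the least constrained,
# which makes it a natural candidate for an additional measurement.
next_cell = max(posterior, key=lambda name: posterior[name][1])

for name, (mean, var) in posterior.items():
    print(f"{name}: mean={mean:.2f}, variance={var:.3f}")
print("suggested next measurement:", next_cell)
```

In this toy setting every source contributes according to its precision, so even a noisy opportunistic reading tightens the estimate, and the posterior variance gives a simple criterion for where a new station would be most informative. A spatial model with nonlinear correlations between variables would need considerably more machinery than this.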

Rationale: Relying on a single source of data is insufficient for building monitoring and forecast models in our region because of the cost associated with measurements. High-fidelity physics-based modeling alone will also remain elusive for the foreseeable future, especially for non-conservative variables such as dissolved inorganic carbon, pH, and pCO2, which are governed by complex transport, chemical, and biological processes. We propose to develop a regional OA model that utilizes all the available sources of data, including historical data, and leverages physical laws, possibly in simplified form.