In light of growing interest in data-driven methods for oceanic, atmospheric, and climate sciences, this work focuses on the field of data assimilation and presents the analog data assimilation (AnDA). The proposed framework produces a reconstruction of the system dynamics in a fully data-driven manner where no explicit knowledge of the dynamical model is required. Instead, a representative catalog of trajectories of the system is assumed to be available. Based on this catalog, the analog data assimilation combines the nonparametric sampling of the dynamics using analog forecasting methods with ensemble-based assimilation techniques. This study explores different analog forecasting strategies and derives both ensemble Kalman and particle filtering versions of the proposed analog data assimilation approach. Numerical experiments are examined for two chaotic dynamical systems: the Lorenz-63 and Lorenz-96 systems. The performance of the analog data assimilation is discussed with respect to classical model-driven assimilation. A Matlab toolbox and Python library of the AnDA are provided to help further research building upon the present findings.
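The analog forecasting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the AnDA toolbox implementation: the function name `analog_forecast`, the Gaussian kernel, the median-distance bandwidth, and the toy linear catalog are all hypothetical choices made for the example. It shows a locally constant forecast, in which the successors of the nearest catalog states are averaged with distance-based weights.

```python
import numpy as np

def analog_forecast(x, catalog_analogs, catalog_successors, k=10):
    """Locally constant analog forecast (illustrative sketch).

    Returns the kernel-weighted mean of the successors of the k catalog
    states closest to x.
    """
    # Euclidean distance from the current state to every catalog state.
    dists = np.linalg.norm(catalog_analogs - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # Gaussian kernel weights; the median-distance bandwidth is an assumed choice.
    scale = np.median(dists[nearest]) + 1e-12
    weights = np.exp(-(dists[nearest] / scale) ** 2)
    weights /= weights.sum()
    return weights @ catalog_successors[nearest]

# Toy catalog generated by a known linear map x_{t+1} = 0.9 * x_t
# (a stand-in for trajectories of a real system such as Lorenz-63).
rng = np.random.default_rng(0)
analogs = rng.uniform(-1.0, 1.0, size=(5000, 2))
successors = 0.9 * analogs

current_state = np.array([0.3, -0.2])
forecast = analog_forecast(current_state, analogs, successors, k=10)
```

In an ensemble setting, a function of this kind would be applied to each member in place of a numerical model integration, with the filter update left unchanged.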
Data assimilation leads naturally to a Bayesian formulation in which the posterior probability distribution of the system state, given all the observations on a time window of interest, plays a central conceptual role. The aim of this paper is to use this Bayesian posterior probability distribution as a gold standard against which to evaluate various commonly used data assimilation algorithms. A key aspect of geophysical data assimilation is the high dimensionality and limited predictability of the computational model. This paper examines the two-dimensional Navier-Stokes equations in a periodic geometry, which have these features and yet are tractable for explicit and accurate computation of the posterior distribution by state-of-the-art statistical sampling techniques. The commonly used algorithms evaluated, with performance quantified by the relative error in reproducing moments of the posterior, are four-dimensional variational data assimilation (4DVAR) and a variety of sequential filtering approximations based on three-dimensional variational data assimilation (3DVAR) and on extended and ensemble Kalman filters. The primary conclusions are that, under the assumption of a well-defined posterior probability distribution, (i) with appropriate parameter choices, approximate filters can perform well in reproducing the mean of the desired probability distribution, (ii) they do not perform as well in reproducing the covariance, and (iii) the error is compounded by the need to modify the covariance in order to induce stability. Thus, filters can be a useful tool in predicting mean behavior but should be viewed with caution as predictors of uncertainty. These conclusions are intrinsic to the algorithms when assumptions underlying them are not valid and will not change if the model complexity is increased.
In the last decade, ensemble-based methods have been widely investigated and applied for data assimilation of flow problems associated with atmospheric physics and petroleum reservoir history matching. This paper focuses entirely on the reservoir history-matching problem. Among the ensemble-based methods, the ensemble Kalman filter (EnKF) is the most popular for history-matching applications. However, the recurrent simulation restarts required in the EnKF sequential data assimilation process may prevent the use of EnKF when the objective is to incorporate the history matching in an integrated geo-modeling workflow. In this situation, the ensemble smoother (ES) is a viable alternative. However, because ES computes a single global update, it may not result in acceptable data matches; therefore, the development of efficient iterative forms of ES is highly desirable. In this paper, we propose to assimilate the same data multiple times with an inflated measurement error covariance matrix in order to improve the results obtained by ES. This method is motivated by the equivalence between single and multiple data assimilation for the linear-Gaussian case. We test the proposed method on three synthetic reservoir history-matching problems. Our results show that the proposed method provides better data matches than those obtained with standard ES and EnKF, at a computational cost comparable to that of EnKF.
► We introduce a new iterative ensemble smoother for data assimilation (ES-MDA).
► ES-MDA is consistent with the Kalman filter for the linear-Gaussian case.
► ES-MDA resulted in significantly better data matches than EnKF and ES.
► The computational cost of ES-MDA is comparable with EnKF for history matching.
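The equivalence between single and multiple data assimilation in the linear-Gaussian case can be checked numerically. The sketch below is an illustration under stated assumptions rather than the ensemble method itself: it uses the exact Kalman update on a mean and covariance (no ensemble, no perturbed observations), the function names and the toy matrices are hypothetical, and it relies on the standard condition that the inflation coefficients satisfy sum(1/alpha_i) = 1, which is not spelled out in the abstract.

```python
import numpy as np

def kalman_update(mean, cov, H, R, y):
    """Kalman analysis step for a linear observation y = H x + noise ~ N(0, R)."""
    S = H @ cov @ H.T + R                      # innovation covariance
    K = cov @ H.T @ np.linalg.inv(S)           # Kalman gain
    new_mean = mean + K @ (y - H @ mean)
    new_cov = (np.eye(len(mean)) - K @ H) @ cov
    return new_mean, new_cov

def multiple_data_assimilation(mean, cov, H, R, y, alphas):
    """Assimilate the same data len(alphas) times, inflating R by alpha each time."""
    if not np.isclose(sum(1.0 / a for a in alphas), 1.0):
        raise ValueError("inflation coefficients must satisfy sum(1/alpha) = 1")
    for a in alphas:
        mean, cov = kalman_update(mean, cov, H, a * R, y)
    return mean, cov

# Hypothetical toy problem: three state variables, two observations.
prior_mean = np.array([1.0, -1.0, 0.5])
prior_cov = np.array([[2.0, 0.3, 0.0],
                      [0.3, 1.0, 0.2],
                      [0.0, 0.2, 1.5]])
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
R = np.diag([0.5, 0.2])
y = np.array([1.4, 0.1])

single_mean, single_cov = kalman_update(prior_mean, prior_cov, H, R, y)
mda_mean, mda_cov = multiple_data_assimilation(
    prior_mean, prior_cov, H, R, y, alphas=[4.0, 4.0, 4.0, 4.0]
)
```

Each inflated update adds a fraction 1/alpha of the observation information to the posterior precision, so the fractions sum to one full update. For nonlinear problems the two routes differ, which is where several smaller updates may help.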
Representation, representativity, representativeness error, forward interpolation error, forward model error, observation‐operator error, aggregation error and sampling error are all terms used to refer to components of observation error in the context of data assimilation. This article is an attempt to consolidate the terminology that has been used in the earth sciences literature and was suggested at a European Space Agency workshop held in Reading in April 2014. We review the state of the art and, through examples, motivate the terminology. In addition to a theoretical framework, examples from application areas of satellite data assimilation, ocean reanalysis and atmospheric chemistry data assimilation are provided. Diagnosing representation‐error statistics as well as their use in state‐of‐the‐art data assimilation systems is discussed within a consistent framework.
We show that modifying a Bayesian data assimilation scheme by incorporating kinematically-consistent displacement corrections produces a scheme that is demonstrably better at estimating partially observed state vectors in a setting where feature information is important. While the displacement transformation is generic, here we implement it within an ensemble Kalman Filter framework and demonstrate its effectiveness in tracking stochastically perturbed vortices.
This paper reviews the development of the ensemble Kalman filter (EnKF) for atmospheric data assimilation. Particular attention is devoted to recent advances and current challenges. The distinguishing properties of three well-established variations of the EnKF algorithm are first discussed. Given the limited size of the ensemble and the unavoidable existence of errors whose origin is unknown (i.e., system error), various approaches to localizing the impact of observations and to accounting for these errors have been proposed. However, challenges remain; for example, with regard to localization of multiscale phenomena (both in time and space). For the EnKF in general, but higher-resolution applications in particular, it is desirable to use a short assimilation window. This motivates a focus on approaches for maintaining balance during the EnKF update. Also discussed are limited-area EnKF systems, in particular with regard to the assimilation of radar data and applications to tracking severe storms and tropical cyclones. It seems that relatively less attention has been paid to optimizing EnKF assimilation of satellite radiance observations, the growing volume of which has been instrumental in improving global weather predictions. There is also a tendency at various centers to investigate and implement hybrid systems that take advantage of both the ensemble and the variational data assimilation approaches; this poses additional challenges and it is not clear how it will evolve. It is concluded that, despite more than 10 years of operational experience, there are still many unresolved issues that could benefit from further research.
Although remote sensing data are often plentiful, they do not usually satisfy the users’ needs directly. Data assimilation is required to extract information about geophysical fields of interest from the remote sensing observations and to make the data more accessible to users. Remote sensing may provide, for example, measurements of surface soil moisture, snow water equivalent, snow cover, or land surface (skin) temperature. Data assimilation can then be used to estimate variables that are not directly observed from space but are needed for applications, for instance root zone soil moisture or land surface fluxes. The paper provides a brief introduction to modern data assimilation methods in the Earth sciences, their applications, and pertinent research questions. Our general overview is readily accessible to hydrologic remote sensing scientists. Within the general context of Earth science data assimilation, we point to examples of the assimilation of remotely sensed observations in land surface hydrology.
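How an observation of one variable can inform a variable that is not observed may be illustrated with a two-variable Kalman update. This is a minimal sketch with hypothetical numbers: the state holds surface and root zone soil moisture, only the surface value is observed, and the background error covariance `B` (an assumed matrix) carries the cross-covariance that spreads the correction to the root zone.

```python
import numpy as np

# State vector: [surface soil moisture, root zone soil moisture] (hypothetical values).
background = np.array([0.20, 0.30])

# Assumed background error covariance; the off-diagonal term links the two layers.
B = np.array([[0.0016, 0.0008],
              [0.0008, 0.0009]])

H = np.array([[1.0, 0.0]])   # only the surface layer is observed
R = np.array([[0.0016]])     # assumed observation error variance
y = np.array([0.28])         # hypothetical remotely sensed surface value

# Kalman gain and analysis: the root zone is updated through the cross-covariance.
K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
analysis = background + K @ (y - H @ background)
```

With these numbers the gain is 0.5 for the surface and 0.25 for the root zone, so the 0.08 innovation moves the analysis to 0.24 and 0.32 respectively. If the off-diagonal term of `B` were zero, the root zone estimate would not change.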
The Data Assimilation Research Testbed (DART) is an open-source community facility for data assimilation education, research, and development. DART's ensemble data assimilation algorithms, careful software engineering, and diagnostic tools allow atmospheric scientists, oceanographers, hydrologists, chemists, and other geophysicists to build state-of-the-art data assimilation systems with unprecedented ease. For global numerical weather prediction, DART produces ensemble-mean analyses comparable to analyses from major centers while also providing initial conditions for ensemble predictions. In addition, DART supports more novel assimilation applications like parameter estimation, sensitivity analysis, observing system design, and smoothing. Implementing basic systems for large models requires only a few person-weeks; comprehensive systems have been built in a few months. Incorporating new observation types is also straightforward, requiring only a forward operator mapping between a model's state and an observation's expected value. Forward operators for standard, in situ observations and novel types, like GPS radio occultation soundings, are available. DART algorithms scale well on a variety of parallel architectures, allowing large data assimilation problems to be studied. DART also includes many low-order models and an ensemble assimilation tutorial appropriate for undergraduate and graduate instruction.
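The idea of a forward operator, a mapping from a model's state to an observation's expected value, can be sketched in a few lines. The example below is not DART code and does not use any DART interface: the function name, the one-dimensional grid, and the choice of linear interpolation are assumptions made purely for illustration.

```python
import numpy as np

def interpolation_forward_operator(state, grid, obs_locations):
    """Expected observation values obtained by linearly interpolating
    the model state from grid points to the observation locations."""
    return np.interp(obs_locations, grid, state)

# Hypothetical one-dimensional model state defined on three grid points.
grid = np.array([0.0, 1.0, 2.0])
state = np.array([0.0, 10.0, 20.0])

# Two observations located between grid points.
obs_locations = np.array([0.5, 1.75])
expected = interpolation_forward_operator(state, grid, obs_locations)
```

In an ensemble system, an operator like this would be applied to every ensemble member to produce the prior ensemble of expected observations that the assimilation algorithm compares with the measured values.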