In this paper, we propose a new methodology to automatically find a model that fits an experimental variogram. Starting with a linear combination of some basic authorized structures (for instance, spherical and exponential), a numerical algorithm is used to compute the parameters that minimize a distance between the model and the experimental variogram. The initial values are chosen automatically and the algorithm is iterative. After this first step, parameters with a negligible influence are discarded from the model and the more parsimonious model is estimated by running the numerical algorithm again. This process is iterated until no more parameters can be discarded. A procedure based on a profiled cost function is also developed in order to use the numerical algorithm for multivariate data sets (possibly with many variables) modeled in the scope of a linear model of coregionalization. The efficiency of the method is illustrated on several examples (including variogram maps) and on two multivariate cases.
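The core fitting step described above can be illustrated with a minimal sketch: a least-squares fit of a single spherical structure to an experimental variogram, with initial values chosen automatically from the data. This is only an illustration of the idea (the paper's iterative model-pruning and profiled cost function are not reproduced), and the synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def spherical(h, sill, a):
    """Spherical variogram model: rises to `sill` at range `a`, then stays flat."""
    h = np.asarray(h, dtype=float)
    r = np.clip(h / a, 0.0, 1.0)
    return sill * (1.5 * r - 0.5 * r**3)

# Synthetic "experimental" variogram from a known model (sill = 2, range = 10)
lags = np.linspace(0.5, 15.0, 30)
gamma_exp = spherical(lags, 2.0, 10.0)

# Automatic initial values: maximum experimental value and maximum lag
p0 = [gamma_exp.max(), lags.max()]
(sill_hat, a_hat), _ = curve_fit(spherical, lags, gamma_exp, p0=p0)
```

On these noise-free data the fitted parameters recover the generating sill and range.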

In the plane, consider a compact set with nonempty interior and piecewise C¹ boundary such that any line segment crosses this boundary at only finitely many points. Then the second derivatives of the geometric covariogram of the compact set at the origin are related to geometrical characteristics of the set boundary. Specifically, the mean second derivative is zero when the boundary has no singular point, it is positive when the boundary has finitely many vertices but no cusp, and it is infinite when the boundary has at least one cusp. Similar results hold for a stationary random closed set with nonempty interior and piecewise C¹ boundary such that any line segment crosses this boundary at almost surely finitely many points. For a set with almost surely no singular boundary point, the mean second derivative of its indicator variogram at the origin is zero. In contrast, when there is a positive probability that, in a given bounded domain of the plane, the boundary has finitely many singular points, this mean second derivative is negative, finite if all the singular points are vertices and infinite if some singular points are cusps.

Linear mixing models of compositional data have been developed in various branches of the earth sciences (e.g., geochemistry, petrology, mineralogy, sedimentology) for the purpose of summarizing variation among a series of observations in terms of proportional contributions of (theoretical) end members. Methods of parameter estimation range from relatively straightforward normative partitioning by (nonnegative) least squares, to more sophisticated bilinear inversion techniques. Solving the bilinear mixing problem involves the estimation of both mixing proportions and end-member compositions from the data. Normative partitioning, also known as linear unmixing, thus can be regarded as a special case of bilinear unmixing with (supposedly) known end members. Previous attempts to model linear mixing processes are reviewed briefly, and a new iterative strategy for solving the bilinear problem is developed. This end-member modeling algorithm is more robust and has better convergence properties than previously proposed numerical schemes. The bilinear unmixing solution is intrinsically nonunique, unless additional constraints on the model parameters are introduced. In situations where no a priori knowledge is available, the concept of an “optimal” solution may be used. This concept is based on the trade-off between mathematical and geological feasibility, two seemingly contradictory but equally desirable requirements of the unmixing solution.
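The normative-partitioning special case (linear unmixing with known end members) can be sketched with nonnegative least squares. The end-member matrix below is hypothetical and serves only to show the mechanics; the bilinear algorithm of the paper, which also estimates the end members, is not reproduced.

```python
import numpy as np
from scipy.optimize import nnls

# Hypothetical end-member compositions (rows: 3 end members, columns: 4 parts)
E = np.array([[0.70, 0.20, 0.05, 0.05],
              [0.10, 0.50, 0.30, 0.10],
              [0.05, 0.10, 0.25, 0.60]])

# A sample mixed from the end members with known proportions
m_true = np.array([0.5, 0.3, 0.2])
x = m_true @ E

# Normative partitioning: minimize ||E^T m - x|| subject to m >= 0,
# then renormalize so the estimated proportions sum to one
m_hat, residual = nnls(E.T, x)
m_hat /= m_hat.sum()
```

Because the sample is an exact mixture and the end members are linearly independent, the nonnegative least-squares solution recovers the true proportions.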

Geometry in the simplex has been developed in the last 15 years mainly based on the contributions due to J. Aitchison. The main goal was to develop analytical tools for the statistical analysis of compositional data. Our present aim is to get a further insight into some aspects of this geometry in order to clarify the way for more complex statistical approaches. This is done by way of orthonormal bases, which allow for a straightforward handling of geometric elements in the simplex. The transformation into real coordinates preserves all metric properties and is thus called isometric logratio transformation (ilr). An important result is the decomposition of the simplex, as a vector space, into orthogonal subspaces associated with nonoverlapping subcompositions. This gives the key to join compositions with different parts into a single composition by using a balancing element. The relationship between ilr transformations and the centered-logratio (clr) and additive-logratio (alr) transformations is also studied. Exponential growth or decay of mass is used to illustrate compositional linear processes, parallelism and orthogonality in the simplex.
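The isometry property named above can be checked numerically with a minimal sketch: one standard orthonormal basis yields ilr coordinates whose Euclidean norm equals that of the clr transform, which is the Aitchison norm of the composition. The basis chosen here is one common convention, not necessarily the one used in the paper.

```python
import numpy as np

def clr(x):
    """Centered logratio: log of the parts minus the mean of the logs."""
    lx = np.log(np.asarray(x, dtype=float))
    return lx - lx.mean()

def ilr(x):
    """Isometric logratio coordinates in one standard orthonormal basis:
    z_i = sqrt(i/(i+1)) * ln( gm(x_1..x_i) / x_{i+1} ), i = 1..D-1."""
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = np.empty(D - 1)
    for i in range(1, D):
        gm = np.exp(np.log(x[:i]).mean())   # geometric mean of the first i parts
        z[i - 1] = np.sqrt(i / (i + 1)) * np.log(gm / x[i])
    return z

comp = np.array([0.1, 0.3, 0.6])
```

For any composition, `np.linalg.norm(ilr(comp))` and `np.linalg.norm(clr(comp))` agree, which is exactly the metric-preservation property that motivates the name "isometric".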

In many earth sciences applications, the geological objects or structures to be reproduced are curvilinear, e.g., sand channels in a clastic reservoir. Their modeling requires multiple-point statistics involving jointly three or more points at a time, much beyond the traditional two-point variogram statistics. Actual data from the field being modeled, particularly if it is subsurface, are rarely enough to allow inference of such multiple-point statistics. The approach proposed in this paper consists of borrowing the required multiple-point statistics from training images depicting the expected patterns of geological heterogeneities. Several training images can be used, reflecting different scales of variability and styles of heterogeneities. The multiple-point statistics inferred from these training image(s) are exported to the geostatistical numerical model where they are anchored to the actual data, both hard and soft, in a sequential simulation mode. The algorithm and code developed are tested for the simulation of a fluvial hydrocarbon reservoir with meandering channels. The methodology proposed appears to be simple (multiple-point statistics are scanned directly from training images), general (any type of random geometry can be considered), and fast enough to handle large 3D simulation grids.
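The inference step, scanning a training image for multiple-point statistics, can be sketched in a few lines: for each node, record the joint data event formed by its four neighbours together with the central value, then estimate the conditional facies probability from the resulting counts. The tiny training image below is hypothetical, and the full sequential simulation with hard and soft data anchoring is not reproduced.

```python
import numpy as np
from collections import defaultdict

# Tiny binary training image (1 = channel facies, 0 = background)
ti = np.array([[0, 0, 1, 1, 0],
               [0, 1, 1, 0, 0],
               [1, 1, 0, 0, 0],
               [1, 0, 0, 0, 1],
               [0, 0, 0, 1, 1]])

# Scan the interior nodes: for each, record the 4-neighbour data event
# and the central value -> empirical multiple-point statistics
counts = defaultdict(lambda: np.zeros(2))
for i in range(1, ti.shape[0] - 1):
    for j in range(1, ti.shape[1] - 1):
        event = (ti[i-1, j], ti[i+1, j], ti[i, j-1], ti[i, j+1])
        counts[event][ti[i, j]] += 1

def p_channel(event):
    """P(center = 1 | 4-neighbour data event), from training-image counts."""
    c = counts[event]
    return c[1] / c.sum() if c.sum() else None
```

In a sequential simulation, such conditional probabilities would be looked up at each visited node and sampled from; here the 3x3 interior yields nine data events.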

Amalgamation of parts of a composition has been extensively used as a technique of analysis to achieve reduced dimension, as was discussed during the CoDaWork'03 meeting (Girona, Spain, 2003). It was shown to be a non-linear operation in the simplex that does not preserve distances under perturbation. The discussion motivated the introduction in the present paper of concepts such as group of parts, balance between groups, and sequential binary partition, which are intended to provide tools of compositional data analysis for dimension reduction. Key concepts underlying this development are the established tools of subcomposition, coordinates in an orthogonal basis of the simplex, balancing element and, in general, the Aitchison geometry in the simplex. Main new results are: a method to analyze grouped parts of a compositional vector through the adequate coordinates in an ad hoc orthonormal basis; and the study of balances of groups of parts (inter-group analysis) as an orthogonal projection similar to that used in standard subcompositional analysis (intra-group analysis). A simulated example compares results when testing equal centers of two populations using amalgamated parts and balances; it shows that, in certain circumstances, results from both analyses can disagree.
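The balance between two groups of parts admits a short numerical sketch. The formula used here, sqrt(rs/(r+s)) times the log-ratio of the groups' geometric means, is the standard balance coordinate of a binary partition; the index convention is an assumption for illustration.

```python
import numpy as np

def balance(x, group1, group2):
    """Balance between two non-overlapping groups of parts of a composition:
    sqrt(r*s/(r+s)) * ln( gm(group1) / gm(group2) ),
    where r and s are the group sizes and gm is the geometric mean."""
    x = np.asarray(x, dtype=float)
    g1, g2 = x[list(group1)], x[list(group2)]
    r, s = len(g1), len(g2)
    gm1 = np.exp(np.log(g1).mean())
    gm2 = np.exp(np.log(g2).mean())
    return np.sqrt(r * s / (r + s)) * np.log(gm1 / gm2)

comp = np.array([0.2, 0.3, 0.5])
```

Unlike amalgamation, the balance depends only on ratios of parts, so it is unchanged when the composition is rescaled, and it vanishes for the neutral composition.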

An entirely new approach to stochastic simulation is proposed through the direct simulation of patterns. Unlike pixel-based (single grid cells) or object-based stochastic simulation, pattern-based simulation simulates by pasting patterns directly onto the simulation grid. A pattern is a multi-pixel configuration identifying a meaningful entity (a puzzle piece) of the underlying spatial continuity. The methodology relies on the use of a training image from which the pattern set (database) is extracted. The use of training images is not new. The concept of a training image is extensively used in simulating Markov random fields or for sequentially simulating structures using multiple-point statistics. Both these approaches rely on extracting statistics from the training image, then reproducing these statistics in multiple stochastic realizations, at the same time conditioning to any available data. The proposed approach does not rely, explicitly, on either a statistical or probabilistic methodology. Instead, a sequential simulation method is proposed that borrows heavily from the pattern recognition literature and simulates by pasting at each visited location along a random path a pattern that is compatible with the available local data and any previously simulated patterns. This paper discusses the various implementation details to accomplish this idea. Several 2D illustrative as well as realistic and complex 3D examples are presented to showcase the versatility of the proposed algorithm.
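The pattern-selection step at each visited node can be sketched as a nearest-pattern lookup: among the patterns extracted from the training image, pick the one most compatible with the informed pixels of the local data event. The pattern database and the L1 distance below are hypothetical simplifications of the implementation details discussed in the paper.

```python
import numpy as np

# Hypothetical database of 3x3 binary patterns lifted from a training image
patterns = np.array([
    [[0, 0, 0], [1, 1, 1], [0, 0, 0]],   # horizontal channel piece
    [[0, 1, 0], [0, 1, 0], [0, 1, 0]],   # vertical channel piece
    [[0, 0, 0], [0, 0, 0], [0, 0, 0]],   # background
], dtype=float)

# Local data event at the visited node: NaN marks uninformed pixels
event = np.array([[np.nan, np.nan, np.nan],
                  [1.0,    np.nan, 1.0],
                  [np.nan, np.nan, np.nan]])

def most_compatible(event, patterns):
    """Index of the database pattern closest to the event, comparing only
    informed (non-NaN) pixels with an L1 distance."""
    mask = ~np.isnan(event)
    dists = [np.abs(p[mask] - event[mask]).sum() for p in patterns]
    return int(np.argmin(dists))
```

The selected pattern would then be pasted onto the grid, its pixels becoming conditioning data for subsequently visited nodes along the random path.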

Multiple-point simulation, as opposed to simulation one point at a time, operates at the pattern level using a priori structural information. To reduce the dimensionality of the space of patterns, we propose a multi-point filtersim algorithm that classifies structural patterns using selected filter statistics. The pattern filter statistics are specific linear combinations of pattern pixel values that represent directional mean, gradient, and curvature properties. Simulation proceeds by sampling from pattern classes selected by conditioning data.
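The filter statistics can be sketched as follows: each score is a linear combination of pattern pixel values, here reduced to one direction (east-west). The particular filter weights and normalizations below are assumptions for illustration, not the exact filters of the algorithm.

```python
import numpy as np

def filter_scores(pattern):
    """Three east-west filter statistics of an n x n pattern, each a linear
    combination of pixel values: mean, gradient, and curvature."""
    pattern = np.asarray(pattern, dtype=float)
    n = pattern.shape[1]
    u = np.arange(n) - (n - 1) / 2.0          # centered column offsets
    mean_w = np.ones(n) / n                   # averaging filter
    grad_w = u / np.abs(u).sum()              # odd (gradient) filter
    curv_w = u**2 - (u**2).mean()             # even (curvature) filter
    col = pattern.mean(axis=0)                # column averages of the pattern
    return col @ mean_w, col @ grad_w, col @ curv_w
```

A linear east-west ramp, for instance, has a nonzero mean and gradient score but zero curvature score, so patterns with distinct geometry land in distinct filter-score classes.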

We take stock of the present position of compositional data analysis, of what has been achieved in the last 20 years, and then make suggestions as to what may be sensible avenues of future research. We take an uncompromisingly applied mathematical view, that the challenge of solving practical problems should motivate our theoretical research; and that any new theory should be thoroughly investigated to see if it may provide answers to previously abandoned practical considerations.

Flow properties of reservoir rocks can be computed from an accurate depiction of the porosity network in three dimensions available from synchrotron X-ray microtomography. In order to relate computed transport properties to the input dataset, the complex pore networks must be described statistically. A porous media description was deemed adequate if a synthetic medium, possessing similar transport properties, could be generated from acquired statistical information. Synthetic media, based upon Berea sandstone extended variogram statistics, were generated with an actual slice from a 3-D microtomographic image as conditioning data. Control of local porosity variation was observed to be important in the stochastic simulation of porous media by the simulated annealing method, as inclusion of this higher order constraint data reproduced natural variations observed in pore-size distributions. Realizations with the traditional variogram as the only target in the objective function did not honor pore-size distribution information. Permeability estimates by the lattice Boltzmann method indicated that the proper level of interconnectivity was not achieved during geostatistical modeling with only two-point spatial statistics. Connectedness information, readily available from primary drainage capillary pressure data, forced permeability estimates of synthetic media in the direction of the permeability computed for the parent microtomographic image of Berea sandstone. As a result of this study, it was concluded that global spatial correlation statistics, for example, the traditional variogram, must be supplemented with local variability and connectivity information to adequately characterize a three-dimensional property distribution for fluid transport.
Extended porosity spatial correlation structure, extracted from standard imaging techniques, and a capillary pressure drainage curve are perhaps sufficient to characterize a system in terms of pore size, connectedness, and permeability. However, more rapid algorithms are needed to introduce porosimetry information as standard practice in stochastic modeling by the simulated annealing method.

A modeling method that takes into account known points on a geological interface and plane orientation data, such as stratification or foliation planes, is described and tested. The orientation data do not necessarily belong to one of the interfaces but are assumed to sample the main anisotropy of a geological formation, as is common in geological situations. The problem is to determine the surfaces that pass through the known points on interfaces and that are compatible with the orientation data. The method is based on the interpolation of a scalar field defined in space whose gradient is orthogonal to the orientations, given that some points share the same but unknown scalar value (points on the same interface) and that the scalar gradient is known at the other points (foliations). The modeled interfaces are represented as isovalues of the interpolated field. Preliminary two-dimensional tests carried out with different covariance models demonstrate the validity of the method, which is easily transposable to three dimensions.

The statistical analysis of compositional data based on logratios of parts is not suitable when zeros are present in a data set. Nevertheless, if there is interest in using this modeling approach, several strategies have been published in the specialized literature which can be used. In particular, substitution or imputation strategies are available for rounded zeros. In this paper, existing nonparametric imputation methods—both for the additive and the multiplicative approach—are reviewed, and essential properties of the multiplicative method are given. For missing values, a generalization of the multiplicative approach is proposed.
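The multiplicative replacement of rounded zeros can be sketched in a few lines: zeros are set to a small threshold value while the nonzero parts are shrunk multiplicatively, which keeps the composition closed and, crucially for logratio analysis, leaves the ratios between nonzero parts untouched. The threshold value below is an arbitrary illustration.

```python
import numpy as np

def multiplicative_replacement(x, delta=1e-3):
    """Multiplicative replacement of rounded zeros in a composition closed
    to 1: zeros become `delta`; nonzero parts are scaled by the factor
    (1 - delta * number_of_zeros), preserving their mutual ratios."""
    x = np.asarray(x, dtype=float)
    zero = (x == 0)
    return np.where(zero, delta, x * (1.0 - delta * zero.sum()))

comp = np.array([0.5, 0.3, 0.0, 0.2])
r = multiplicative_replacement(comp)
```

Ratio preservation is the essential property: an additive replacement (adding a constant to every part) would distort all the logratios, while the multiplicative version distorts only those involving the replaced zeros.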

Consider the assessment of any unknown event A through its conditional probability P(A | B,C) given two data events B, C of different sources. Each event could involve many locations jointly, but the two data events are assumed such that the probabilities P(A | B) and P(A | C) can be evaluated. The challenge is to recombine these two partially conditioned probabilities into a model for P(A | B,C) without having to assume independence of the two data events B and C. The probability P(A | B,C) is then used for estimation or simulation of the event A. In the presence of actual data dependence, the combination algorithm provided by the traditional conditional independence hypothesis is shown to be nonrobust, leading to various inconsistencies. An alternative based on a permanence of updating ratios is proposed, which guarantees all limit conditions even in the presence of complex data interdependence. The resulting recombination formula is extended to any number n of data events and a paradigm is offered to introduce formal data interdependence.
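The two-data-event case of the permanence-of-ratios combination can be sketched directly. Writing a = (1-P(A))/P(A), b = (1-P(A|B))/P(A|B) and c = (1-P(A|C))/P(A|C), the model sets x = bc/a and returns P(A|B,C) = 1/(1+x) = a/(a+bc); this is a minimal sketch of the combination rule, not of the full n-event extension.

```python
def combine(p_a, p_ab, p_ac):
    """Permanence-of-ratios combination of P(A|B) and P(A|C) into P(A|B,C),
    given the prior P(A). All three inputs must lie strictly in (0, 1],
    with p_a < 1 so the prior odds ratio is defined."""
    a = (1 - p_a) / p_a        # prior "distance to certainty" of A
    b = (1 - p_ab) / p_ab      # updated by data event B
    c = (1 - p_ac) / p_ac      # updated by data event C
    return a / (a + b * c)
```

The limit conditions mentioned in the abstract hold by construction: if either datum makes A certain (b = 0 or c = 0) the combined probability is 1, and if one datum is uninformative (P(A|B) = P(A)) the result collapses to the other conditional probability.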

Building of models in the Earth Sciences often requires the solution of an inverse problem: some unknown model parameters need to be calibrated with actual measurements. In most cases, the set of measurements cannot completely and uniquely determine the model parameters; hence multiple models can describe the same data set. Bayesian inverse theory provides a framework for solving this problem. Bayesian methods rely on the fact that the conditional probability of the model parameters given the data (the posterior) is proportional to the likelihood of observing the data and a prior belief expressed as a prior distribution of the model parameters. In case the prior distribution is not Gaussian and the relation between data and parameters (forward model) is strongly non-linear, one has to resort to iterative samplers, often Markov chain Monte Carlo methods, for generating samples that fit the data likelihood and reflect the prior model statistics. While theoretically sound, such methods can be slow to converge, and are often impractical when the forward model is CPU-demanding. In this paper, we propose a new sampling method that makes it possible to sample from a variety of priors and to condition model parameters to a variety of data types. The method does not rely on the traditional Bayesian decomposition of the posterior into likelihood and prior; instead it uses so-called pre-posterior distributions, i.e. the probability of the model parameters given some subset of the data. The use of pre-posteriors makes it possible to decompose the data into so-called “easy data” (or linear data) and “difficult data” (or nonlinear data). The method relies on fast non-iterative sequential simulation to generate model realizations.
The difficult data is matched by perturbing an initial realization using a perturbation mechanism termed “probability perturbation.” The probability perturbation method moves the initial guess closer to matching the difficult data, while maintaining the prior model statistics and the conditioning to the linear data. Several examples are used to illustrate the properties of this method.
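The perturbation rule at the heart of the method can be sketched as a blend of the current realization's indicator with the prior marginal probability, controlled by a single perturbation parameter. This is a minimal sketch of that rule only; the outer optimization of the perturbation parameter against the difficult data, and the conditioning to linear data, are not reproduced.

```python
def perturbed_probability(i0, p_prior, r):
    """Probability-perturbation blend: i0 is the indicator (0 or 1) of the
    current realization at a node, p_prior the prior marginal probability,
    and r in [0, 1] the perturbation parameter. r = 0 freezes the current
    realization; r = 1 discards it and falls back to the prior."""
    return (1.0 - r) * i0 + r * p_prior
```

Drawing a new realization from these blended probabilities moves the model gradually away from the initial guess; the mismatch with the difficult data is then reduced by tuning r between the two extremes.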

Any interpolation, any hand contouring or digital drawing of a map or a numerical model necessarily calls for a prior model of the multiple-point statistics that link the data to the unsampled nodes, and those unsampled nodes to one another. That prior model can be implicit, poorly defined as in hand contouring; it can be explicit through an algorithm as in digital mapping. The multiple-point statistics involved go well beyond single-point histogram and two-point covariance models; the challenge is to define algorithms that can control more of such statistics, particularly those that most impact the utilization of the resulting maps beyond their visual appearance. The newly introduced multiple-point simulation (mps) algorithms borrow the high order statistics from a visually and statistically explicit model, a training image. It is shown that mps can simulate realizations with high entropy character as well as traditional Gaussian-based algorithms, while offering the flexibility of considering alternative training images with various levels of low entropy (organized) structures. The impact on flow performance (spatial connectivity) of choosing a wrong training image among many sharing the same histogram and variogram is demonstrated.

The spatial referencing of river channels is complicated by their meandering planform, which dictates that Euclidean distance in a Cartesian reference frame is not an appropriate metric. Channel-fitted coordinate systems are thus widely used in application-oriented geostatistics as well as theoretical fluid mechanics, where flow patterns are described in terms of a streamwise axis s along the channel centerline and an axis n normal to that centerline. A means of transforming geographic (x, y) coordinates to their equivalents in the (s, n) space and vice versa is needed to relate the two frames of reference, and this paper describes a pair of transformation algorithms that are explicitly intended for reach-scale studies of modern rivers. The forward transformation from Cartesian to channel-fitted coordinates involves parametric description of the centerline using cubic splines, calculation of centerline normal vectors and curvature using results from differential geometry, and an efficient local search to find in-channel data points and compute their (s, n) coordinates. The inverse transformation finds the nearest vertices of a discretized centerline and uses a finite difference approximation to the streamwise rates of change of the centerline's Cartesian coordinates to obtain the geographic equivalent of a point in the (s, n) space. The performance of these algorithms is evaluated using: (i) field data from a gravel-bed river to examine the effects of initial centerline digitization and subsequent filtering; and (ii) analytically-defined centerlines and simulated coordinates to assess transformation accuracy and sensitivity to centerline curvature and discretization. 
Any discrepancy between a point's known coordinates in one frame of reference and the coordinates produced via transformation from the other coordinate system constitutes a transformation error, and our results indicate that these errors are 2–4% and 0.2–0.5% of the channel width for the field case and simulated centerlines, respectively. The primary sources of transformation error are the initial digitization of the centerline and the relationship between centerline curvature and discretization.
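The forward transformation can be sketched for a polyline centerline: find the nearest centerline segment, project the point onto it, and read off s as the arclength to the foot of the projection and n as the signed normal offset. This is a simplified sketch (linear segments and a brute-force search rather than the paper's cubic splines and efficient local search).

```python
import numpy as np

def to_sn(point, centerline):
    """Project a 2-D point onto a polyline centerline and return (s, n):
    s = arclength to the foot of the projection, n = signed normal offset
    (positive to the left of the downstream direction)."""
    p = np.asarray(point, dtype=float)
    cl = np.asarray(centerline, dtype=float)
    seg = cl[1:] - cl[:-1]
    seg_len = np.linalg.norm(seg, axis=1)
    s0 = np.concatenate([[0.0], np.cumsum(seg_len)])   # arclength at vertices
    best_d, best_sn = np.inf, None
    for k in range(len(seg)):
        t = np.clip((p - cl[k]) @ seg[k] / seg_len[k]**2, 0.0, 1.0)
        foot = cl[k] + t * seg[k]
        d = np.linalg.norm(p - foot)
        if d < best_d:
            v = p - cl[k]
            crossz = seg[k, 0] * v[1] - seg[k, 1] * v[0]  # sign of the offset
            best_d, best_sn = d, (s0[k] + t * seg_len[k], np.sign(crossz) * d)
    return best_sn

s, n = to_sn((1.5, 0.5), [(0, 0), (1, 0), (2, 0)])
```

For a straight centerline along the x-axis, a point half a channel-width above x = 1.5 maps to s = 1.5 and n = +0.5, as expected.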

Geostatistical space–time models are used increasingly for addressing environmental problems, such as monitoring acid deposition or global warming, and forecasting precipitation or stream flow. Each discipline approaches the problem of joint space–time modeling from its own perspective, a fact leading to a significant amount of overlapping models and, possibly, confusion. This paper attempts an annotated survey of models proposed in the literature, stating contributions and pinpointing shortcomings. Stochastic models that extend spatial statistics (geostatistics) to include the additional time dimension are presented with a common notation to facilitate comparison. Two conceptual viewpoints are distinguished: (1) approaches involving a single spatiotemporal random function model, and (2) approaches involving vectors of space random functions or vectors of time series. Links between these two viewpoints are then revealed; advantages and shortcomings are highlighted. Inference from space–time data is revisited, and assessment of joint space–time uncertainty via stochastic imaging is suggested.

Methods reported in the literature for rock fracture simulations include approaches based on stochastic geometry, multiple-point statistics and a combination of geostatistics for fracture density and object-based modelling for fracture geometries. The advantages and disadvantages of each of these approaches are discussed with examples. By way of review, the authors begin with the geostatistical indicator simulation method, based on the truncated Gaussian algorithm; this is followed by multiple-point statistical simulation and then the stochastic geometry approach, which is based on marked point process simulation. A new approach, based on pluriGaussian structural simulation, is then introduced. The new approach incorporates in the simulation the spatial correlation between different sets of fractures, which, in general, is very difficult, if not impossible, to accomplish in the three methods reviewed. Each simulation method is summarised together with detailed simulation procedures for each. A published two-dimensional fracture dataset is used as a means of assessing the performance of each simulation method and of demonstrating the concepts discussed in the text.

This is the first in a pair of papers in which we present image-processing-based procedures for the measurement of fluvial gravels. The spatial and temporal resolution of surface grain-size characterization is constrained by the time-consuming and costly nature of traditional measurement techniques. Several groups have developed image-processing-based procedures, but none have demonstrated the transferability of these techniques between sites with different lithological, clast form and textural characteristics. Here we focus on image-processing procedures for identifying and measuring image objects (i.e. grains); the second paper examines the application of such procedures to the measurement of fluvially deposited gravels. Four image-segmentation procedures are developed, each having several internal parameters, giving a total of 416 permutations. These are executed on 39 images from three field sites at which the clasts have contrasting physical properties. The performance of each procedure is evaluated against a sample of manually digitized grains in the same images, by comparing three derived statistics. The results demonstrate that it is relatively straightforward to develop procedures that satisfactorily identify objects in any single image or a set of images with similar sedimentary characteristics. However, the optimal procedure is that which gives consistently good results across sites with dissimilar sedimentary characteristics. We show that neighborhood-based operations are the most powerful, and a morphological bottom-hat transform with a double threshold is optimal. It is demonstrated that its performance approaches that of the procedures giving the best results for individual sites. Overall, it outperforms previously published methods and improvements to them.

The variogram is a critical input to geostatistical studies: (1) it is a tool to investigate and quantify the spatial variability of the phenomenon under study, and (2) most geostatistical estimation or simulation algorithms require an analytical variogram model, which they will reproduce with statistical fluctuations. In the construction of numerical models, the variogram reflects some of our understanding of the geometry and continuity of the variable, and can have a very important impact on predictions from such numerical models. The principles of variogram modeling are developed and illustrated with a number of practical examples. A three-dimensional interpretation of the variogram is necessary to fully describe geologic continuity. Directional continuity must be described simultaneously to be consistent with principles of geological deposition and for a legitimate measure of spatial variability for geostatistical modeling algorithms. Interpretation principles are discussed in detail. Variograms are modeled with particular functions for reasons of mathematical consistency. Used correctly, such variogram models account for the experimental data, geological interpretation, and analogue information. The steps in this essential data integration exercise are described in detail through the introduction of a rigorous methodology.
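The three-dimensional, direction-dependent modeling advocated above can be sketched with a geometrically anisotropic variogram: the lag vector is rotated into the principal axes of continuity, scaled by the directional ranges, and the isotropic structure is evaluated at the reduced distance. The rotation convention and parameterization below are one common choice, assumed for illustration.

```python
import numpy as np

def spherical(h):
    """Unit-sill, unit-range spherical structure."""
    h = np.minimum(np.asarray(h, dtype=float), 1.0)
    return 1.5 * h - 0.5 * h**3

def gamma_aniso(lag, ranges, azimuth_deg=0.0, sill=1.0, nugget=0.0):
    """Variogram with geometric anisotropy: rotate the 3-D lag vector into
    the principal axes (rotation about the vertical by `azimuth_deg`),
    divide componentwise by the directional ranges, and evaluate the
    spherical structure at the reduced distance."""
    a = np.radians(azimuth_deg)
    rot = np.array([[np.cos(a),  np.sin(a), 0.0],
                    [-np.sin(a), np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    hr = rot @ np.asarray(lag, dtype=float) / np.asarray(ranges, dtype=float)
    return nugget + sill * spherical(np.linalg.norm(hr))
```

With ranges (10, 5, 2), the model reaches the sill at a lag of 10 along the major axis but already at 5 across it, expressing the directional continuity that a single isotropic model cannot.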