ComDim analysis was designed to assess the relationships between individuals and variables within a multiblock setting where several variables, organized in blocks, are measured on the same individuals. An overview of this method is presented together with some of its properties. Furthermore, we discuss a new extension of the method to the case of (K+1) datasets. More precisely, the aim is to explore the relationships between a response dataset and K other datasets. This latter strategy of analysis is illustrated with a case study involving Time-Domain Nuclear Magnetic Resonance data, and the outcomes are compared with those of Multiblock Partial Least Squares (PLS) regression.
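As a rough illustration of the unsupervised ComDim idea summarized above, the NumPy sketch below alternates between extracting a common component from a weighted sum of block cross-product matrices and updating the block weights (saliences), then deflates the blocks. It assumes each block is column-centred and scaled to unit Frobenius norm. The function name `comdim`, its parameters and the toy data are illustrative assumptions, and the (K+1) extension discussed in the abstract is not covered.

```python
import numpy as np

def comdim(blocks, n_components=2, n_iter=500, tol=1e-10):
    """Sketch of an iterative ComDim-style procedure (unsupervised case only)."""
    # Assumed preprocessing: column-centre each block, scale to unit Frobenius norm.
    Xs = []
    for X in blocks:
        Xc = np.asarray(X, dtype=float)
        Xc = Xc - Xc.mean(axis=0)
        Xs.append(Xc / np.linalg.norm(Xc))
    n = Xs[0].shape[0]
    scores = np.zeros((n, n_components))
    saliences = np.zeros((len(Xs), n_components))
    for d in range(n_components):
        Ws = [X @ X.T for X in Xs]          # block cross-product matrices
        lam = np.ones(len(Xs))              # initial saliences
        for _ in range(n_iter):
            W = sum(l * Wk for l, Wk in zip(lam, Ws))
            # eigh returns eigenvalues in ascending order: keep the last eigenvector.
            q = np.linalg.eigh(W)[1][:, -1]
            new_lam = np.array([q @ Wk @ q for Wk in Ws])
            converged = np.max(np.abs(new_lam - lam)) < tol
            lam = new_lam
            if converged:
                break
        scores[:, d] = q
        saliences[:, d] = lam
        # Deflate every block with respect to the common component q.
        Xs = [X - np.outer(q, q @ X) for X in Xs]
    return scores, saliences

# Toy data (hypothetical): three blocks sharing one latent variable plus noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(20, 1))
blocks = [common @ rng.normal(size=(1, p)) + 0.1 * rng.normal(size=(20, p))
          for p in (5, 4, 6)]
scores, saliences = comdim(blocks, n_components=2)
```

Each column of `saliences` indicates how strongly every block contributes to the corresponding common dimension, which is the quantity usually inspected when interpreting such an analysis.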
The STATIS method has been successfully applied to the analysis of sensory profiling data and other kinds of data in sensometrics. We discuss its use and benefits and compare its outcomes to those of alternative methods for the analysis of multiblock data arising in situations such as projective mapping and free sorting experiments. More importantly, a method for clustering a collection of datasets measured on the same individuals, called CLUSTATIS, is introduced. It is based on the optimization of a criterion and consists of a hierarchical cluster analysis and a partitioning algorithm akin to the K-means algorithm. The procedure of analysis can be seen as an extension of the cluster analysis of variables around latent components (CLV, Vigneau & Qannari, 2003) to the case of blocks of variables. Alongside the determination of the clusters, a latent configuration is determined by the STATIS method. The interest of CLUSTATIS in sensometrics is discussed and illustrated on the basis of two case studies pertaining to projective mapping (also called Napping) and free sorting tasks, respectively.
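To make the STATIS step concrete, here is a minimal NumPy sketch of the usual construction: the RV coefficient between block cross-product matrices, block weights taken from the leading eigenvector of the RV matrix, and a weighted compromise. It does not implement the CLUSTATIS clustering itself. The function names, the normalisation of the weights to sum to one, and the toy data are assumptions made for illustration.

```python
import numpy as np

def rv_coefficient(W1, W2):
    """RV coefficient between two symmetric cross-product matrices."""
    # For symmetric matrices, trace(W1 @ W2) equals the sum of elementwise products.
    return np.sum(W1 * W2) / np.sqrt(np.sum(W1 * W1) * np.sum(W2 * W2))

def statis_compromise(blocks):
    """Sketch of STATIS: RV matrix, block weights and compromise matrix."""
    Ws = []
    for X in blocks:
        Xc = np.asarray(X, dtype=float)
        Xc = Xc - Xc.mean(axis=0)
        W = Xc @ Xc.T
        Ws.append(W / np.linalg.norm(W))    # scale each W to unit Frobenius norm
    K = len(Ws)
    RV = np.array([[rv_coefficient(Ws[i], Ws[j]) for j in range(K)]
                   for i in range(K)])
    # Leading eigenvector of the RV matrix (eigh sorts eigenvalues ascending).
    alpha = np.abs(np.linalg.eigh(RV)[1][:, -1])
    alpha = alpha / alpha.sum()             # assumed normalisation: weights sum to 1
    compromise = sum(a * W for a, W in zip(alpha, Ws))
    return RV, alpha, compromise

# Toy data (hypothetical): block 2 is a rotated copy of block 1, block 3 is unrelated.
rng = np.random.default_rng(1)
X1 = rng.normal(size=(15, 4))
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))
X2 = X1 @ R
X3 = rng.normal(size=(15, 3))
RV, alpha, compromise = statis_compromise([X1, X2, X3])
```

In this toy setting the rotated copy has an RV coefficient of 1 with the original block, and the two agreeing blocks receive larger weights than the unrelated one.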
ComDim (Common Dimensions) analysis was initially introduced within the context of sensometrics to analyze conventional and free choice sensory profiling data, and more generally multiblock datasets. Thereafter, it has gained some popularity in chemometrics and has been extended in different ways to meet specific needs. Recently, this strategy of analysis has been adapted to the supervised case, under the name of P-ComDim. Going further, we propose herein to extend ComDim to Path-ComDim where the datasets at hand are assumed to have a specific pattern of directed relations among them reflecting, for instance, a chain of influence. The aim of Path-ComDim is to analyze these datasets taking into account the structural connections among them. After a brief review of alternative path modeling approaches, Path-ComDim is detailed encompassing both methodological and algorithmic aspects. In the particular case of a single block to be predicted, it is shown that Path-ComDim is equivalent to P-ComDim analysis. Path-ComDim analysis is illustrated on the basis of a case study involving instrumental, sensory and preference data. Finally, the outcomes are compared to those obtained from alternative path modeling methods.
This paper reports the analysis of a multiblock environmental dataset consisting of 176 samples collected in Islamabad, Pakistan, between February 2006 and August 2007. The concentrations of 32 elements in each sample were measured using Proton Induced X-ray Emission, and black carbon was also determined, for both coarse and fine particulate matter. Six meteorological parameters were also recorded, namely maximum and minimum daily temperatures, humidity, rainfall, windspeed and pressure. The data were explored using Principal Components Analysis (PCA), Partial Least Squares (PLS), Consensus PCA, Multiblock PLS, Mantel test, Procrustes analysis and the coefficient. Seasonal trends can be identified and interpreted. Based on the PLS models, the elemental composition of the airborne particulate matter (APM) can be used to predict meteorological parameters. The results from block similarity measures show that fine APM resembles meteorological parameters better than coarse APM. Multiblock PLS models, however, do not outperform classical PLS regression (PLSR). This paper also demonstrates the potential of the multiblock approach in environmental monitoring.
We address the problem of analyzing one or several blocks of variables measured on the same individuals which are divided into several groups. In this framework, we focus on the within-group analysis. For the case of a single dataset, we consider multigroup Principal Component Analysis proposed by several authors (Levin; Krzanowski; Kiers and Ten Berge). A new optimization criterion which characterizes this method and an extension to the case of multiblock datasets are presented. The method is illustrated on the basis of a dataset pertaining to sensory analysis.
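One common way to carry out a within-group analysis of a single dataset is to centre each group on its own mean and then extract common loadings from the pooled within-group covariance matrix. The NumPy sketch below follows that recipe. It is only an assumed reading of multigroup PCA, not the new criterion or the multiblock extension presented in the paper, and the function name `multigroup_pca` and the toy data are hypothetical.

```python
import numpy as np

def multigroup_pca(X, groups, n_components=2):
    """Sketch: PCA of the pooled within-group covariance (common loadings)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    Xw = X.copy()
    # Remove each group's own mean so that only within-group variation remains.
    for g in np.unique(groups):
        mask = groups == g
        Xw[mask] -= X[mask].mean(axis=0)
    S = Xw.T @ Xw / X.shape[0]               # pooled within-group covariance
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1][:n_components]
    loadings = vecs[:, order]                # loadings shared by all groups
    scores = Xw @ loadings                   # within-group scores
    return loadings, scores, vals[order]

# Toy data (hypothetical): three groups of ten individuals with different means.
rng = np.random.default_rng(2)
groups = np.repeat([0, 1, 2], 10)
X = rng.normal(size=(30, 4)) + groups[:, None] * 3.0
loadings, scores, eigvals = multigroup_pca(X, groups)

# Shifting one group's mean leaves the within-group structure unchanged.
X_shifted = X.copy()
X_shifted[groups == 1] += 5.0
_, _, eigvals_shifted = multigroup_pca(X_shifted, groups)
```

Because group means are removed before the decomposition, differences in level between groups do not influence the common loadings, which is the point of focusing on the within-group part.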
The integration of multiple data sources has emerged as a pivotal aspect to assess complex systems comprehensively. This new paradigm requires the ability to separate common and redundant from specific and complementary information during the joint analysis of several data blocks. However, inherent problems encountered when analysing single tables are amplified with the generation of multiblock datasets. Finding the relationships between data layers of increasing complexity constitutes therefore a challenging task. In the present work, an algorithm is proposed for the supervised analysis of multiblock data structures. It associates the advantages of interpretability from the orthogonal partial least squares (OPLS) framework and the ability of common component and specific weights analysis (CCSWA) to weight each data table individually in order to grasp its specificities and handle efficiently the different sources of Y-orthogonal variation. Three applications are proposed for illustration purposes. A first example refers to a quantitative structure-activity relationship study aiming to predict the binding affinity of flavonoids toward the P-glycoprotein based on physicochemical properties. A second application concerns the integration of several groups of sensory attributes for overall quality assessment of a series of red wines. A third case study highlights the ability of the method to combine very large heterogeneous data blocks from Omics experiments in systems biology. Results were compared to the reference multiblock partial least squares (MBPLS) method to assess the performance of the proposed algorithm in terms of predictive ability and model interpretability. In all cases, ComDim-OPLS was demonstrated as a relevant data mining strategy for the simultaneous analysis of multiblock structures by accounting for specific variation sources in each dataset and providing a balance between predictive and descriptive purpose.
In this article, we extend the scope of the first paper of this series, which was dedicated to the analysis of advanced single-block regression methods (Rendall et al., 2016), to the class of multiblock regression approaches. The datasets contemplated for developing the multiblock approaches are the same as in the former publication: volatile, polyphenol and organic acid compositions and UV–Vis spectra. The context is still the prediction of the ageing time of one of the finest Portuguese fortified wines, Madeira Wine, but now the data collected from the different analytical sources are explored simultaneously, in a more structured and informative way, through multiblock methodologies. The goal of this paper is to provide a critical assessment of a rich variety of multiblock regression methods, namely: Concatenated PLS, Multiblock PLS (MBPLS), Hierarchical PLS (HPLS), Network-Induced Supervised Learning (NI-SL) and Sequential Orthogonalised Partial Least Squares (SO-PLS). A comparison of block scaling methods was also undertaken for the Concatenated PLS algorithm, and new block scaling methods were proposed that led to better prediction performance. This study explores and reveals the potential advantages of applying multiblock methods for fusing datasets from different sources, from both the predictive and interpretability perspectives.
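Block scaling matters for Concatenated PLS because a block with many variables (a UV–Vis spectrum, say) would otherwise dominate a block with only a few. The NumPy sketch below shows one widely used scheme, assumed here purely for illustration: autoscale every variable, then divide each block by the square root of its number of variables so that all blocks carry the same total sum of squares. It is not one of the new scaling methods proposed in the paper, and the function name and block sizes are hypothetical.

```python
import numpy as np

def block_scale_and_concatenate(blocks):
    """Sketch: autoscale variables, equalise block weights, then concatenate."""
    scaled = []
    for X in blocks:
        X = np.asarray(X, dtype=float)
        # Autoscale: zero mean and unit (sample) standard deviation per variable.
        Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
        # Divide by sqrt(number of variables) so each block has equal total variance.
        scaled.append(Z / np.sqrt(Z.shape[1]))
    return np.hstack(scaled), scaled

# Toy data (hypothetical): a wide "spectral" block and two narrow blocks.
rng = np.random.default_rng(3)
n = 25
blocks = [rng.normal(size=(n, 200)), rng.normal(size=(n, 8)), rng.normal(size=(n, 5))]
X_concat, scaled_blocks = block_scale_and_concatenate(blocks)
block_ss = [float(np.sum(B ** 2)) for B in scaled_blocks]
```

After this scaling, every entry of `block_ss` equals `n - 1`, so the concatenated matrix can be passed to any PLS routine without the widest block overwhelming the others.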
For the purpose of exploring and modelling the relationships between a dataset and several datasets, multiblock Partial Least Squares is a widely-used regression technique. It is designed as an extension of PLS which aims at linking two datasets. In the same vein, we propose an extension of Redundancy Analysis to the multiblock setting. We show that PLS and multiblock Redundancy Analysis aim at maximizing the same criterion but the constraints are different. From the solutions of both these approaches, it turns out that they are the two end points of a continuum approach that we propose to investigate.
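The idea of a continuum can be illustrated in the simplest setting of one predictor block and a single response. Both PLS and Redundancy Analysis maximise the squared covariance between the component Xw and y, but PLS constrains the norm of w whereas Redundancy Analysis constrains the variance of Xw. The NumPy sketch below interpolates between the two constraints with a tuning parameter `gamma`. This parametrisation, the function name and the toy data are assumptions, and the multiblock formulation of the paper is not reproduced.

```python
import numpy as np

def continuum_weights(X, y, gamma):
    """Sketch: maximise cov(Xw, y)^2 subject to w' M w = 1, where
    M = gamma * I + (1 - gamma) * X'X / n.
    gamma = 1 gives the PLS direction, gamma = 0 the Redundancy Analysis one."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    yc = np.asarray(y, dtype=float)
    yc = yc - yc.mean()
    n, p = Xc.shape
    M = gamma * np.eye(p) + (1.0 - gamma) * (Xc.T @ Xc) / n
    # The maximiser is proportional to M^{-1} X'y.
    w = np.linalg.solve(M, Xc.T @ yc)
    # Rescaled to unit length here only to make directions comparable.
    return w / np.linalg.norm(w)

# Toy data (hypothetical).
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 5))
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3]) + 0.1 * rng.normal(size=40)

w_pls = continuum_weights(X, y, gamma=1.0)   # proportional to X'y
w_ra = continuum_weights(X, y, gamma=0.0)    # proportional to the least-squares solution
w_mid = continuum_weights(X, y, gamma=0.5)   # an intermediate compromise
```

Intermediate values of `gamma` act like a ridge-type regularisation of the Redundancy Analysis solution, which is one way of reading the statement that the two methods are end points of a continuum.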
There is a growing need to analyze datasets characterized by several sets of variables observed on a single set of observations. Such complex but structured datasets are known as multiblock datasets, and their analysis requires the development of new and flexible tools. For this purpose, Kernel Generalized Canonical Correlation Analysis (KGCCA) is proposed; it offers a general framework for multiblock data analysis that takes into account a graph of connections between blocks. It appears that KGCCA subsumes, with a single monotonically convergent algorithm, a remarkably large number of well-known and new methods as particular cases. KGCCA is applied to a simulated -block dataset and a real molecular biology dataset that combines Gene Expression data, Comparative Genomic Hybridization data and a qualitative phenotype measured for a set of children with glioma. KGCCA is available on CRAN as part of the RGCCA package.