We present the response‐oriented sequential alternation (ROSA) method for multiblock data analysis. ROSA is a novel and transparent multiblock extension of the partial least squares regression (PLSR). According to a “winner takes all” approach, each component of the model is calculated from the block of predictors that most reduces the current residual error. The suggested algorithm is computationally fast compared with other multiblock methods because orthogonal scores and loading weights are calculated without deflation of the predictor blocks. Therefore, it can work effectively even with a large number of blocks included. The ROSA method is invariant to block scaling and ordering. The ROSA model has the same attributes (vectors of scores, loadings, and loading weights) as PLSR and is identical to PLSR modeling for the case with only one block of predictors. A quick data fusion method for many data blocks is proposed as a generalization of partial least squares regression that can also be interpreted as a generalization of variable selection. Improved interpretability is accomplished through a single‐block focus. Stability and speed are ensured through Gram–Schmidt steps and minimal deflations.
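The winner-takes-all selection described above can be illustrated with a minimal sketch. It is a simplified reading of the abstract, not the authors' implementation: it assumes a single response, picks the block whose candidate score most reduces the residual sum of squares, and orthogonalizes scores by a Gram-Schmidt step instead of deflating the predictor blocks. The function name `rosa_sketch` and all variable names are hypothetical, and the handling of loading weights and multiple responses is omitted.

```python
import numpy as np

def rosa_sketch(blocks, y, n_components):
    """Simplified winner-takes-all component selection (single response).

    Assumption: this illustrates the idea in the abstract only; the
    published method also computes loadings and loading weights.
    """
    y = np.asarray(y, dtype=float)
    # Center each block; the blocks themselves are never deflated.
    blocks = [np.asarray(X, dtype=float) - np.mean(X, axis=0) for X in blocks]
    residual = y - y.mean()
    scores = np.zeros((residual.shape[0], 0))
    winners = []
    for _ in range(n_components):
        best = None
        for k, X in enumerate(blocks):
            w = X.T @ residual                  # candidate loading weights
            t = X @ w                           # candidate score vector
            t = t - scores @ (scores.T @ t)     # Gram-Schmidt against earlier scores
            norm = np.linalg.norm(t)
            if norm < 1e-12:                    # block has nothing new to offer
                continue
            t = t / norm
            new_residual = residual - t * (t @ residual)
            sse = new_residual @ new_residual
            if best is None or sse < best[0]:   # keep the block with the smallest error
                best = (sse, k, t, new_residual)
        if best is None:
            break
        _, k, t, residual = best
        scores = np.column_stack([scores, t])
        winners.append(k)
    return scores, winners, residual
```

Because each candidate score is normalized before the comparison, rescaling a whole block leaves its candidate unchanged in this sketch, which is consistent with the scaling invariance claimed in the abstract.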
The multiblock partial least squares (MB-PLS) method has been proposed for modeling data sets with a large number of variables and for making the model more interpretable. In MB-PLS, the variables are split into several blocks containing different information, and the relative importance of the blocks is reflected by the super-weights of the MB-PLS model. In this paper, a weighted MB-PLS method coupled with the discrete wavelet transform (DWT) is proposed for modeling near-infrared (NIR) spectra. In this method, the spectra are decomposed into blocks by DWT, and the relative importance of the blocks is estimated by both the super-weights and the block-weights, the latter determined by the prediction error of the sub-models in cross-validation. This provides a practical way to separate the variables for MB-PLS and allows the relative contribution of the variable blocks to the prediction to be modulated adaptively. To validate the performance of the method, two industrial NIR data sets, one of tobacco powder and one of tobacco lamina fragments, are investigated. The root-mean-square error of prediction (RMSEP), the residual predictive deviation (RPD), and the correlation coefficient show that the weighted MB-PLS coupled with DWT gives better predictive accuracy and interpretability than the ordinary PLS and MB-PLS methods.
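The decomposition of spectra into blocks by DWT might look like the following sketch. The abstract does not state the wavelet family or the decomposition level, so the Haar wavelet, the padding rule for odd lengths, and the function name `haar_dwt_blocks` are assumptions chosen for brevity. The block-weights derived from cross-validation errors are not reproduced here because their formula is not given.

```python
import numpy as np

def haar_dwt_blocks(spectra, levels):
    """Split each spectrum (one per row) into Haar wavelet coefficient blocks.

    Returns [detail_1, ..., detail_L, approximation_L]; each element can
    serve as one predictor block in a multiblock model.
    Assumption: Haar wavelet; the source does not name the wavelet used.
    """
    approx = np.asarray(spectra, dtype=float)
    blocks = []
    for _ in range(levels):
        if approx.shape[1] % 2:
            # Pad odd lengths by repeating the last point (an assumption).
            approx = np.hstack([approx, approx[:, -1:]])
        even, odd = approx[:, 0::2], approx[:, 1::2]
        blocks.append((even - odd) / np.sqrt(2.0))  # detail coefficients
        approx = (even + odd) / np.sqrt(2.0)        # smoothed approximation
    blocks.append(approx)
    return blocks
```

For spectra with 8 points and `levels=2`, this yields three blocks with 4, 2, and 2 columns; with even lengths the orthonormal Haar step preserves the total sum of squares across blocks.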
This paper presents two novel statistical analyses of multiblock data using the R language. They are designed for data organized in (K + 1) blocks (i.e., tables), consisting of a block of response variables to be explained by a large number of explanatory variables that are divided into K meaningful blocks. All the variables, explanatory and dependent, are measured on the same individuals. Two multiblock methods, both useful in practice, are included, namely multiblock partial least squares regression and multiblock principal component analysis with instrumental variables. The proposed methods are included within the ade4 package, which is widely used thanks to its great variety of multivariate methods. These methods are available both to statisticians and to users from various fields, in the sense that all the values derived from the multiblock processing are available. Some relevant interpretation tools are also developed. Finally, the main results are summarized using overall graphical displays. This paper is organized following the different steps of a standard multiblock process, each corresponding to specific R functions. All these steps are illustrated by the analysis of real epidemiological datasets.
ComDim analysis was designed to assess the relationships between individuals and variables within a multiblock setting where several variables, organized in blocks, are measured on the same individuals. An overview of this method is presented together with some of its properties. Furthermore, we discuss a new extension of the method of analysis to the case of (K+1) datasets. More precisely, the aim is to explore the relationships between a response dataset and K other datasets. An illustration of this latter strategy of analysis on the basis of a case study involving Time Domain ‐ Nuclear Magnetic Resonance data is outlined and the outcomes are compared with those of Multiblock Partial Least Squares regression.
Antimicrobial use in pig farming is influenced by a range of risk factors, including herd characteristics, biosecurity level, farm performance, occurrence of clinical signs and vaccination scheme, as well as farmers' attitudes and habits towards antimicrobial use. So far, the effect of these risk factors has been explored separately. Using an innovative method called multi-block partial least-squares regression, this study aimed to investigate, in a sample of 207 farrow-to-finish farms from Belgium, France, Germany and Sweden, the relative importance of the six above-mentioned categories or 'blocks' of risk factors for antimicrobial use in pig production. Four separate country models were developed; they showed that all six blocks provided a useful contribution to explaining antimicrobial use in at least one country. The occurrence of clinical signs, especially of respiratory and nervous diseases in fatteners, was one of the largest contributing blocks in all four countries, whereas the effect of the other blocks differed between countries. In terms of risk management, this suggests that a holistic and country-specific mitigation strategy is likely to be more effective. However, further research is needed to validate our findings in larger and more representative samples, as well as in other countries.
Quality control (QC) in the pharmaceutical industry is a key activity in ensuring medicines have the required quality, safety and efficacy for their intended use. QC departments at pharmaceutical companies are responsible for release testing of all final products and also of all incoming raw materials. Near-infrared spectroscopy (NIRS) and Raman spectroscopy are important techniques for fast and accurate identification and qualification of pharmaceutical samples. Tablets containing two different active pharmaceutical ingredients (APIs), bisoprolol and hydrochlorothiazide, in different commercially available dosages were analysed using Raman and NIR spectroscopy. The goal was to define multivariate models based on each vibrational spectroscopy to discriminate between different dosages (identity) and predict their dosage (semi-quantitative). Furthermore, the combination of the two spectroscopic techniques was investigated. For this purpose, two different multiblock techniques based on PLS were applied: multiblock PLS (MB-PLS) and sequential-orthogonalised PLS (SO-PLS). NIRS showed better results than Raman spectroscopy for both identification and quantitation. The multiblock techniques investigated showed that each spectroscopy contains information not present in or not captured by the other spectroscopic technique, thus demonstrating that there is a potential benefit in their combined use for both identification and quantitation purposes.
For human beings, the mouth is the first organ to perceive food and the different signalling events associated with food breakdown. These events are very complex and, as such, their description necessitates combining different data sets. This study proposed an integrated approach to understand the relative contribution of the main food oral processing events involved in aroma release during cheese consumption. In vivo aroma release was monitored on forty-eight subjects who were asked to eat four different model cheeses varying in fat content and firmness and flavoured with ethyl propanoate and nonan-2-one. A multiblock partial least squares regression was performed to explain aroma release from the different physiological data sets (masticatory behaviour, bolus rheology, saliva composition and flux, mouth coating and bolus moistening). This statistical approach showed that aroma release was mostly explained by masticatory behaviour regardless of the cheese and the aroma, with a specific influence of mean amplitude on aroma release after swallowing. Aroma release from the firmer cheeses was explained mainly by bolus rheology. The persistence of hydrophobic compounds in the breath was mainly explained by bolus spreadability, in close relation with bolus moistening. Resting saliva contributed poorly to the analysis, whereas the composition of stimulated saliva was negatively correlated with aroma release, mostly for soft cheeses, when significant.
This study investigates flux decline in ultrafiltration as a capacity measure for the process. Continuous ultrafiltration is a multistage process in which considerable coupling between the stages is expected, owing to similar settings on the subsequent recirculation loops and to recirculation of parts of the process streams. To explore the flux decline issue from an engineering perspective, two ways of organizing process signals into logical blocks are identified and used in a multiblock partial least-squares regression: (1) the “physical location” of the sensors on the process layout and (2) “engineering type of tags”. Abnormal runs are removed iteratively from the original data set, and the multiblock parameters are then calculated from the optimized regression model to determine the role of the different data building units in flux prediction. Both blocking alternatives are interpreted side by side, offering a compact overview of the most important sections related to the flux decline. In this way, one can zoom in on smaller sections of the process, which reveals optimization potential.