Big Data has drawn considerable attention from researchers in the information sciences and from policy and decision makers in governments and enterprises. Because the growth of information has outpaced Moore’s Law since the beginning of this century, excessive data is causing serious difficulties. At the same time, a great deal of potential and highly useful value is hidden in this huge volume of data. A new scientific paradigm has emerged, data-intensive scientific discovery (DISD), also known as Big Data problems. Many fields and sectors, ranging from economic and business activities to public administration, and from national security to scientific research in many areas, are involved with Big Data problems. On the one hand, Big Data is extremely valuable for raising productivity in businesses and enabling evolutionary breakthroughs in scientific disciplines, which gives us many opportunities to make great progress in many fields. There is no doubt that future competition in business productivity and technologies will converge on Big Data exploration. On the other hand, Big Data also brings many challenges, such as difficulties in data capture, data storage, data analysis and data visualization. This paper aims to give a close-up view of Big Data, including Big Data applications, Big Data opportunities and challenges, and the state-of-the-art techniques and technologies currently adopted to deal with Big Data problems. We also discuss several underlying methodologies for handling the data deluge, for example, granular computing, cloud computing, bio-inspired computing, and quantum computing.
Distributed networked control systems have attracted intense attention from both academia and industry due to the multidisciplinary nature among the areas of communication networks, computer science and control. With ever-increasing research trends in these areas, it is desirable to review recent advances and to identify methodologies for distributed networked control systems. This paper presents a brief overview of such systems regarding system configurations, challenging issues and methodologies. First, networked control systems are introduced and their prevalent configurations including centralized, decentralized and distributed structures are outlined. Second, an emphasis is laid on a number of challenging issues from the analysis and synthesis of distributed networked control systems. More specifically, these challenging issues are identified through three integrated aspects: communication, computation and control. Third, different methodologies in the literature for distributed networked control systems are reviewed and categorized based on three pairs: undirected and directed graphs, fixed and time-varying topologies, and time-triggered and event-triggered mechanisms. Finally, concluding remarks are drawn and some potential research directions are suggested.
AdaBoost is a popular method for vehicle detection, but the training process is quite time-consuming. In this paper, a rapid learning algorithm is proposed to tackle this weakness of AdaBoost for vehicle classification. Firstly, an algorithm for computing the Haar-like feature pool on a 32 × 32 grayscale image patch by using all simple and rotated Haar-like prototypes is introduced to represent a vehicle’s appearance. Then, a fast training approach for the weak classifier is presented by combining a sample’s feature value with its class label. Finally, a rapid incremental learning algorithm of AdaBoost is designed to significantly improve the performance of AdaBoost. Experimental results demonstrate that the proposed approaches not only speed up the training and incremental learning processes of AdaBoost, but also yield better or competitive vehicle classification accuracies compared with several state-of-the-art methods, showing their potential for real-time applications.
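As a rough illustration of the Haar-like features mentioned above, the following sketch evaluates one upright two-rectangle (edge) feature on a 32 × 32 patch. It assumes the standard integral-image (summed-area table) technique, which the abstract does not mention explicitly; the function names are hypothetical, and the rotated prototypes and the paper's feature-pool enumeration are not covered.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table with a zero first row and column."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left corner (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_edge_feature(ii, x, y, w, h):
    """Upright edge feature: left half minus right half (w is assumed to be even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Synthetic 32 x 32 grayscale patch: left half bright (255), right half dark (0).
patch = [[255 if x < 16 else 0 for x in range(32)] for _ in range(32)]
ii = integral_image(patch)
feature_value = two_rect_edge_feature(ii, 0, 0, 32, 32)  # 16 * 32 * 255 = 130560
```

Once the table is built, each rectangle sum costs four lookups regardless of its size, which is why this representation is commonly paired with boosted classifiers.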
Cloud computing exhibits remarkable potential to provide cost-effective, easy-to-manage, elastic, and powerful resources on the fly over the Internet. It increases the capabilities of hardware resources through optimal and shared utilization. These features encourage organizations and individual users to shift their applications and services to the cloud. Even critical infrastructure, for example, power generation and distribution plants, is being migrated to the cloud computing paradigm. However, the services provided by third-party cloud service providers entail additional security threats. The migration of users’ assets (data, applications, etc.) outside their administrative control, into a shared environment where numerous users are collocated, escalates the security concerns. This survey details the security issues that arise from the very nature of cloud computing. Moreover, it presents recent solutions from the literature to counter these security issues. Furthermore, a brief view of security vulnerabilities in mobile cloud computing is also given. Finally, open issues and future research directions are discussed.
Metaheuristics are widely recognized as efficient approaches for many hard optimization problems. This paper provides a survey of some of the main metaheuristics. It outlines the components and concepts that are used in various metaheuristics in order to analyze their similarities and differences. The classification adopted in this paper differentiates between single solution based metaheuristics and population based metaheuristics. The literature survey is accompanied by the presentation of references for further details, including applications. Recent trends are also briefly discussed.
Training classifiers on datasets that suffer from imbalanced class distributions is an important problem in data mining. This issue occurs when the number of examples representing the class of interest is much lower than that of the other classes. Its presence in many real-world applications has attracted growing attention from researchers. We briefly review the many issues this problem raises in machine learning and its applications by introducing the characteristics of the imbalanced dataset scenario in classification, presenting the specific metrics for evaluating performance in class-imbalanced learning, and enumerating the proposed solutions. In particular, we describe preprocessing, cost-sensitive learning, and ensemble techniques, and carry out an experimental study to contrast these approaches in an intra- and inter-family comparison. We then discuss thoroughly the main issues related to using data intrinsic characteristics in this classification problem. This will help to improve the current models with respect to: the presence of small disjuncts, the lack of density in the training data, the overlapping between classes, the identification of noisy data, the significance of the borderline instances, and the dataset shift between the training and the test distributions. Finally, we introduce several approaches and recommendations to address these problems in conjunction with imbalanced data, and we show some experimental examples of the behavior of the learning algorithms on data with such intrinsic characteristics.
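To illustrate why imbalanced problems call for their own metrics, the sketch below computes accuracy alongside sensitivity, specificity, the geometric mean (G-mean), and the F1 score. The abstract does not name the metrics it covers, so this particular selection is an assumption, and the label vectors are synthetic.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix metrics for a binary problem; `positive` marks the minority class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on the majority class
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
        "f1": 2 * precision * sensitivity / denom if denom else 0.0,
    }


# Synthetic example: 95 negatives and 5 positives, classifier always predicts negative.
y_true = [0] * 95 + [1] * 5
always_negative = imbalance_metrics(y_true, [0] * 100)
```

In this example the trivial classifier reaches an accuracy of 0.95 while its G-mean and F1 are both 0, which is the kind of gap that motivates class-aware metrics.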
Swarm intelligence is a research field that models the collective intelligence in swarms of insects or animals. Many algorithms that simulate these models have been proposed in order to solve a wide range of problems. The Artificial Bee Colony algorithm is one of the most recent swarm intelligence based algorithms and simulates the foraging behaviour of honey bee colonies. In this work, modified versions of the Artificial Bee Colony algorithm are introduced and applied to solve real-parameter optimization problems efficiently.
Countering cyber threats, especially attack detection, is a challenging area of research in the field of information assurance. Intruders use polymorphic mechanisms to disguise the attack payload and evade detection techniques. Many supervised and unsupervised learning approaches from the fields of machine learning and pattern recognition have been used to increase the efficacy of intrusion detection systems (IDSs). Supervised learning approaches use only labeled samples to train a classifier, but obtaining sufficient labeled samples is cumbersome and requires the efforts of domain experts. Unlabeled samples, however, can easily be obtained in many real-world problems. In contrast to supervised learning approaches, semi-supervised learning (SSL) addresses this issue by considering a large number of unlabeled samples together with the labeled samples to build a better classifier. This paper proposes a novel fuzziness-based semi-supervised learning approach that uses unlabeled samples together with a supervised learning algorithm to improve the classifier’s performance for IDSs. A single hidden layer feed-forward neural network (SLFN) is trained to output a fuzzy membership vector, and the unlabeled samples are categorized (into low, mid, and high fuzziness categories) using the fuzzy quantity. The classifier is retrained after incorporating each category separately into the original training set. The experimental results of this intrusion detection technique on the NSL-KDD dataset show that unlabeled samples belonging to the low and high fuzziness groups make major contributions to improving the classifier’s performance compared to existing classifiers such as naive Bayes, support vector machine, and random forests.
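The categorization step can be sketched as follows. The abstract gives neither the fuzziness formula nor the category thresholds, so this sketch assumes a commonly used entropy-style fuzziness measure for a membership vector and an equal-size (tertile) split. Both choices, and the function names, are illustrative assumptions.

```python
import math


def fuzziness(membership):
    """Entropy-style fuzziness of a membership vector with values in [0, 1].

    Returns 0 for a crisp vector and reaches its maximum, ln 2, when every value is 0.5.
    """
    total = 0.0
    for mu in membership:
        for p in (mu, 1.0 - mu):
            if p > 0.0:  # treat 0 * log(0) as 0
                total -= p * math.log(p)
    return total / len(membership)


def split_by_fuzziness(vectors):
    """Split sample indices into low/mid/high groups of near-equal size (assumed tertiles)."""
    order = sorted(range(len(vectors)), key=lambda i: fuzziness(vectors[i]))
    n = len(order)
    a, b = n // 3, 2 * n // 3
    return {"low": order[:a], "mid": order[a:b], "high": order[b:]}


# Hypothetical classifier outputs for six unlabeled samples (two classes).
outputs = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1], [0.7, 0.3], [0.99, 0.01], [0.6, 0.4]]
groups = split_by_fuzziness(outputs)
```

Under the approach described above, each group would then be added to the original training set separately before retraining. That retraining loop is omitted here.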
An efficient optimization method called ‘Teaching–Learning-Based Optimization (TLBO)’ is proposed in this paper for finding global solutions to large-scale non-linear optimization problems. The proposed method is based on the influence of a teacher on the output of learners in a class. The basic philosophy of the method is explained in detail. The effectiveness of the method is tested on many benchmark problems with different characteristics, and the results are compared with those of other population-based methods.
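A minimal sketch of a teacher phase and a learner phase in the spirit of TLBO is given below. The update rules follow the form in which the algorithm is commonly presented, not necessarily every detail of the paper; the function name `tlbo`, its parameters, and the sphere test function are assumptions made for illustration.

```python
import random


def tlbo(objective, bounds, pop_size=20, iterations=100, seed=0):
    """Minimize `objective` over box `bounds`; returns (best_point, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)

    def clip(x):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]

    def try_replace(i, candidate):
        # Greedy acceptance: keep the candidate only if it improves learner i.
        candidate = clip(candidate)
        f = objective(candidate)
        if f < fit[i]:
            pop[i], fit[i] = candidate, f

    for _ in range(iterations):
        # Teacher phase: move learners toward the best solution, away from the class mean.
        teacher = pop[min(range(pop_size), key=fit.__getitem__)]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.randint(1, 2)  # teaching factor, 1 or 2
            try_replace(i, [pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])
                            for d in range(dim)])
        # Learner phase: each learner interacts with one randomly chosen peer.
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            sign = 1.0 if fit[i] < fit[j] else -1.0  # step away from a worse peer, toward a better one
            try_replace(i, [pop[i][d] + sign * rng.random() * (pop[i][d] - pop[j][d])
                            for d in range(dim)])

    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best]


def sphere(x):
    """Simple benchmark function with its global minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = tlbo(sphere, [(-5.0, 5.0)] * 3, iterations=50, seed=1)
```

Apart from population size and iteration count, the sketch has no algorithm-specific tuning parameters, which is often cited as an attraction of this family of methods.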
The hesitant fuzzy linguistic term sets (HFLTSs), which can be used to represent an expert’s hesitant preferences when assessing a linguistic variable, increase the flexibility of eliciting and representing linguistic information. The HFLTSs have attracted a lot of attention recently due to their distinguished power and efficiency in representing uncertainty and vagueness within the process of decision making. To enhance and extend the applicability of HFLTSs, this paper investigates and develops different types of distance and similarity measures for HFLTSs. The paper first proposes a family of distance and similarity measures between two HFLTSs. Then a variety of weighted or ordered weighted distance and similarity measures between two collections of HFLTSs are proposed and analyzed for discrete and continuous cases respectively. After that, the application of these measures to multi-criteria decision making problems is given. Based on the proposed distance and similarity measures, the satisfaction degrees for different alternatives are established and are then used to rank alternatives in multi-criteria decision making. Finally a practical example concerning the evaluation of the quality of movies is given to illustrate the applicability and advantage of the proposed approach and the differences between the proposed distance and similarity measures.
Cloud computing has emerged as a new computing paradigm that aims to provide reliable, customized, and quality-of-service-guaranteed computation environments for cloud users. Applications and databases are moved to large centralized data centers. Because of resource virtualization, global replication and migration, and the physical absence of data and machines in the cloud, the data stored in the cloud and the computation results may not be well managed or fully trusted by cloud users. Most previous work on cloud security focuses on storage security and does not consider computation security together with it. In this paper, we propose a privacy cheating discouragement and secure computation auditing protocol, SecCloud, which is a first protocol bridging secure storage and secure computation auditing in the cloud and achieving privacy cheating discouragement through designated verifier signatures, batch verification, and probabilistic sampling techniques. A detailed analysis is given to obtain an optimal sampling size that minimizes the cost. Another major contribution of this paper is a practical secure-aware cloud computing experimental environment, built as a test bed to implement SecCloud. Further experimental results demonstrate the effectiveness and efficiency of the proposed SecCloud.
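The idea behind probabilistic sampling in auditing can be sketched with a generic calculation. If `bad` of `n` stored or computed items are incorrect and the auditor checks `t` items chosen uniformly without replacement, the chance of catching at least one bad item follows the hypergeometric distribution. This is not the SecCloud analysis itself, which the abstract does not detail; the function names and the "smallest sample meeting a target" criterion are assumptions.

```python
from math import comb


def detection_probability(n, bad, t):
    """P(at least one of `bad` incorrect items appears in a uniform sample of t out of n)."""
    # comb(n - bad, t) is 0 when t > n - bad, so detection is then certain.
    return 1.0 - comb(n - bad, t) / comb(n, t)


def min_sample_size(n, bad, target):
    """Smallest t whose detection probability reaches `target`; n if none does."""
    for t in range(n + 1):
        if detection_probability(n, bad, t) >= target:
            return t
    return n


# With 10 items of which 1 is incorrect, checking 3 items catches it with probability 0.3.
p = detection_probability(10, 1, 3)
```

A cost-minimizing sample size, as mentioned in the abstract, would additionally weigh the cost of each check against the cost of undetected cheating. That trade-off is not modeled in this sketch.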
The evaluation of clustering algorithms is intrinsically difficult because of the lack of objective measures. Since the evaluation of clustering algorithms normally involves multiple criteria, it can be modeled as a multiple criteria decision making (MCDM) problem. This paper presents an MCDM-based approach to rank a selection of popular clustering algorithms in the domain of financial risk analysis. An experimental study is designed to validate the proposed approach using three MCDM methods, six clustering algorithms, and eleven cluster validity indices over three real-life credit risk and bankruptcy risk data sets. The results demonstrate the effectiveness of MCDM methods in evaluating clustering algorithms and indicate that the repeated-bisection method leads to good 2-way clustering solutions on the selected financial risk data sets.
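To make the idea of ranking algorithms by several criteria concrete, here is a minimal sketch of TOPSIS, one widely used MCDM method. The abstract does not say which three MCDM methods the study uses, so the choice of TOPSIS is an assumption, and the score matrix below is purely hypothetical and not taken from the paper.

```python
import math


def topsis(matrix, weights, benefit):
    """Return one closeness score in [0, 1] per alternative (row); higher is better.

    matrix[i][j] is the value of criterion j for alternative i; benefit[j] is True when
    larger values of criterion j are better. Assumes no all-zero column and at least
    two distinct alternatives.
    """
    n_crit = len(weights)
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*v))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        scores.append(d_neg / (d_pos + d_neg))
    return scores


# Hypothetical validity-index values for three clustering algorithms:
# column 0 is a higher-is-better index, column 1 a lower-is-better index.
scores = topsis([[0.62, 1.10], [0.55, 0.95], [0.40, 1.60]],
                weights=[0.5, 0.5], benefit=[True, False])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

The third alternative is worst on both criteria, so it coincides with the anti-ideal point and receives a score of 0.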
The electrocardiogram (ECG) is a useful tool for diagnosing various cardiovascular diseases (CVDs) such as myocardial infarction (MI). The ECG records the heart's electrical activity, and these signals reflect abnormal activity of the heart. However, ECG signals are challenging to interpret visually because of their small amplitude and short duration. Therefore, we propose a novel approach to detect MI automatically from ECG signals. In this study, we implemented a convolutional neural network (CNN) algorithm for the automated detection of normal and MI ECG beats (with noise and without noise). We achieved an average accuracy of 93.53% and 95.22% using ECG beats with noise and without noise, respectively. Further, no feature extraction or selection is performed in this work. Hence, our proposed algorithm can accurately detect unknown ECG signals even in the presence of noise. This system could therefore be introduced in clinical settings to aid clinicians in the diagnosis of MI.
This paper investigates the problem of network-based leader-following consensus of nonlinear multi-agent systems via distributed impulsive control. First, by taking network-induced delays into account, a nonlinear system with delayed impulses is formulated. Then, a general consensus criterion is derived and several special cases of network-induced delays and network topologies are discussed. Moreover, sufficient conditions on the design of the sampling period, pinned nodes and the coupling strength are provided. The effects of the coupling strength and pinning strategy are further explored for multi-agent systems with an undirected communication graph. Finally, two examples are given to verify the theoretical results.
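A heavily simplified sketch of leader-following consensus under impulsive control is shown below. It keeps only the impulsive jump applied at each sampling instant and assumes scalar states, a constant leader, no dynamics between impulses, and no network-induced delay, all of which the paper treats in more generality. The graph, the pinned node, and the coupling strength are hypothetical choices.

```python
def impulsive_step(states, leader, adjacency, pinned, coupling):
    """Apply one impulse: each follower jumps toward its neighbors and, if pinned, the leader."""
    new_states = []
    for i, x_i in enumerate(states):
        neighbor_term = sum(a * (x_j - x_i) for a, x_j in zip(adjacency[i], states))
        pin_term = (leader - x_i) if i in pinned else 0.0
        new_states.append(x_i + coupling * (neighbor_term + pin_term))
    return new_states


# Three followers on an undirected path graph 0 - 1 - 2; only node 0 sees the leader.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
leader = 1.0
states = [4.0, -2.0, 7.0]
for _ in range(200):
    states = impulsive_step(states, leader, adjacency, pinned={0}, coupling=0.3)
```

In this linear setting the tracking error is multiplied at each impulse by I - c(L + D), where L is the graph Laplacian, D marks the pinned nodes, and c is the coupling strength. Convergence therefore depends jointly on the coupling strength and the pinning choice, which echoes the design conditions the abstract refers to.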
When expressing preferences in a qualitative setting, several possible linguistic terms with different weights (represented by probabilities) may be considered at the same time. The probability distribution is usually hard to provide completely, and ignorance may exist. In this paper, we first propose a novel concept called the probabilistic linguistic term set (PLTS) to serve as an extension of the existing tools. Then we put forward some basic operational laws and aggregation operators for PLTSs. After that, we develop an extended TOPSIS method and an aggregation-based method for multi-attribute group decision making (MAGDM) with probabilistic linguistic information, and apply them to a practical case concerning strategy initiatives. Finally, the strengths and weaknesses of our methods are clarified by comparing them with some similar techniques. (C) 2016 Elsevier Inc. All rights reserved.
In recent years, various heuristic optimization methods have been developed. Many of these methods are inspired by swarm behaviors in nature. In this paper, a new optimization algorithm based on the law of gravity and mass interactions is introduced. In the proposed algorithm, the searcher agents are a collection of masses which interact with each other based on the Newtonian gravity and the laws of motion. The proposed method has been compared with some well-known heuristic search methods. The obtained results confirm the high performance of the proposed method in solving various nonlinear functions.
In this paper, we propose a variety of distance measures for hesitant fuzzy sets, based on which the corresponding similarity measures can be obtained. We investigate the connections of the aforementioned distance measures and further develop a number of hesitant ordered weighted distance measures and hesitant ordered weighted similarity measures. They can alleviate the influence of unduly large (or small) deviations on the aggregation results by assigning them low (or high) weights. Several numerical examples are provided to illustrate these distance and similarity measures.
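One such measure can be sketched as a normalized Hamming distance between two hesitant fuzzy elements, together with an ordered weighted variant over collections. The abstract does not spell out the formulas, so the sketch assumes the common convention of sorting membership values and extending the shorter element with its largest value (optimistic) or smallest value (pessimistic). The function names are hypothetical.

```python
def extend(h, length, optimistic=True):
    """Sort values in descending order, padding the element up to `length`."""
    filler = max(h) if optimistic else min(h)
    return sorted(list(h) + [filler] * (length - len(h)), reverse=True)


def hamming_distance(h1, h2, optimistic=True):
    """Normalized Hamming distance between two hesitant fuzzy elements."""
    length = max(len(h1), len(h2))
    a, b = extend(h1, length, optimistic), extend(h2, length, optimistic)
    return sum(abs(x - y) for x, y in zip(a, b)) / length


def similarity(h1, h2, optimistic=True):
    """Similarity obtained from the distance as 1 - d."""
    return 1.0 - hamming_distance(h1, h2, optimistic)


def ordered_weighted_distance(set_a, set_b, weights, optimistic=True):
    """Weight the element-wise distances after sorting them in descending order."""
    distances = sorted((hamming_distance(a, b, optimistic) for a, b in zip(set_a, set_b)),
                       reverse=True)
    return sum(w * d for w, d in zip(weights, distances))


d_opt = hamming_distance([0.2, 0.4, 0.5], [0.3, 0.6])                    # optimistic padding
d_pes = hamming_distance([0.2, 0.4, 0.5], [0.3, 0.6], optimistic=False)  # pessimistic padding
d_ow = ordered_weighted_distance([[0.2, 0.4], [0.9]], [[0.3, 0.5], [0.1]], [0.2, 0.8])
```

In the last call the largest deviation (0.8) receives the small weight 0.2, which illustrates how ordered weights can dampen the influence of an unduly large deviation on the aggregated result.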
The complexity and impact of many real-world decision making problems make it necessary to consider multiple points of view, giving rise to group decision making problems in which a group of experts provide their preferences to reach a solution. Uncertainty is often present in such complex problems, and although linguistic information has been used successfully to manage it, the results are sometimes limited because the linguistic models use single-valued, predefined terms that restrict the richness with which experts can freely elicit their preferences. Experts often hesitate between different linguistic terms and require richer expressions to express their knowledge more accurately. However, linguistic group decision making approaches do not provide any model that makes the elicitation of linguistic preferences more flexible in such hesitant situations. This paper proposes a new linguistic group decision model that facilitates the elicitation of flexible and rich linguistic expressions, in particular comparative linguistic expressions, which are close to human beings’ cognitive models for expressing linguistic preferences, based on hesitant fuzzy linguistic term sets and context-free grammars. The model defines the group decision process and the operators and tools needed to manage such linguistic expressions.