With the rapid development of computer network technology and the wide application of the Internet in people's work and lives, the large volume of malicious software has become a serious threat to the normal operation of society and to personal privacy and security. In the field of malware analysis, increasing attention has been paid to a new approach: automatic analysis of malware behavior using machine learning and data mining techniques. At present, this approach is mainly based on user-mode behavior. However, a user-mode monitoring system cannot capture low-level behavior such as loading drivers or calling kernel functions. This can yield an incomplete set of malware behaviors and can even reduce the accuracy of the analysis. To address this problem, we study a malware analysis method based on machine learning algorithms and the run-time behavior of malware. In this thesis, we propose and implement a malware analysis method based on kernel-level monitoring and a clustering algorithm. The main work of this thesis is as follows: (1) We survey the advantages and disadvantages of different malware analysis techniques and propose a malware analysis method based on kernel-level monitoring and a clustering algorithm. (2) At the core of the system, we study methods for monitoring malicious software behavior, and design and implement an automatic in-depth monitoring system. (3) We study how to extract kernel-level malware behavior, design a kernel function mapping table, and propose a malware behavior representation model. (4) Based on a hierarchical clustering algorithm, we perform cluster analysis on a large number of malware samples. The experimental results show that this malware analysis method, based on kernel-level monitoring and clustering, can automatically analyze malware samples, obtaining malicious software kernel f
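Point (4), clustering samples by their kernel-behavior profiles, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the sample names, the kernel-function features, the Jaccard distance, single linkage, and the merge threshold are all assumptions chosen for the sketch.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of observed kernel behaviors."""
    union = len(a | b)
    return 1.0 - len(a & b) / union if union else 0.0

def single_linkage(samples, threshold):
    """Naive agglomerative (single-linkage) clustering of behavior sets.

    samples: dict mapping sample name -> set of behavior features.
    Repeatedly merges the closest pair of clusters whose single-linkage
    distance is within the threshold; returns a list of clusters.
    """
    clusters = [{name} for name in samples]
    while True:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(jaccard_distance(samples[a], samples[b])
                        for a in clusters[i] for b in clusters[j])
                if d <= threshold and (best is None or d < best[0]):
                    best = (d, i, j)
        if best is None:
            return clusters
        _, i, j = best
        clusters[i] |= clusters[j]   # merge the closest pair
        del clusters[j]

# Invented behavior sets: two driver-loading samples and one benign-looking one.
behaviors = {
    "mal_a": {"NtLoadDriver", "NtCreateFile", "NtWriteFile"},
    "mal_b": {"NtLoadDriver", "NtCreateFile", "NtDeleteKey"},
    "benign": {"NtReadFile"},
}
groups = single_linkage(behaviors, threshold=0.6)
```

With these toy sets, the two driver-loading samples (Jaccard distance 0.5) merge into one cluster, while the benign sample stays alone.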
As sequencing technologies continue to develop toward higher throughput and lower cost, more and more research institutions and individuals are able to perform independent sequencing and data analysis. At the same time, the rapid increase in the amount of sequencing data makes deeper and larger-scale data analysis possible. In transcriptomics research, RNA-Seq, which is based on next-generation sequencing technology and can quickly and easily obtain transcriptome snapshots of biological samples in a specific state, has become one of the standard analyses in bioinformatics and plays an increasingly important role in studying the mechanisms and development of cancer. For cervical cancer, one of the most common gynecological malignancies, a large number of omics studies have been devoted to depicting its molecular characteristics, but no studies on the mechanisms of progression and regression of cervical cancer have been published. In this study, we analyzed 112 cervical RNA-Seq cases to find the differences between progression and regression at the transcriptome level. In the meantime, based on an in-depth understanding of RNA-Seq data analysis, we developed a general RNA-Seq data analysis pipeline to facilitate the study. The main work of this article includes two aspects: 1) RNA-Seq data analysis of cervical cases. RNA-Seq data of 112 cervical cases are processed and analyzed from the following perspectives: data preprocessing, quality control of sequencing data, alignment, alignment quality control, gene expression quantification, differential expression analysis, enrichment analysis, and fusion gene analysis. 2) Design and development of an RNA-Seq data analysis pipeline. The diversity of analysis tools has increased the difficulty of RNA-Seq data analysis; we designed and developed an RNA-Seq data analysis pipeline based on our data analysis experience to facilitate the study.
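The differential expression step named above can be illustrated with a bare log2 fold-change computation between two sample groups. This is a toy sketch with invented gene, sample, and count values; a real pipeline would use a statistical framework (dispersion estimation, multiple-testing correction) rather than a raw fold change.

```python
import math

def log2_fold_changes(expr, group_a, group_b, pseudo=1.0):
    """Per-gene log2 fold change between two sample groups.

    expr: dict gene -> dict sample -> normalized count.
    A pseudocount avoids division by (or log of) zero.
    """
    result = {}
    for gene, counts in expr.items():
        mean_a = sum(counts[s] for s in group_a) / len(group_a)
        mean_b = sum(counts[s] for s in group_b) / len(group_b)
        result[gene] = math.log2((mean_a + pseudo) / (mean_b + pseudo))
    return result

# Invented counts: "prog*" = progression samples, "regr*" = regression samples.
expr = {
    "GENE1": {"prog1": 30.0, "prog2": 34.0, "regr1": 7.0, "regr2": 9.0},
    "GENE2": {"prog1": 5.0, "prog2": 5.0, "regr1": 5.0, "regr2": 5.0},
}
fc = log2_fold_changes(expr, ["prog1", "prog2"], ["regr1", "regr2"])
```

Here GENE1 is clearly up in the progression group (log2 fold change about 1.87), while GENE2 is unchanged (exactly 0).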
The question of whether technical analysis is effective confounds not only academic researchers but also practitioners in the market, and the debate over its efficacy has long been a hot topic in financial theory. Against this background, this paper explores the effectiveness of typical technical analysis methods in the Chinese commodity futures market and the methods for testing them. In terms of strategies, this paper starts from technical analysis theory and selects six major technical analysis strategies, such as common trend-following strategies, mean-reversion strategies, and volume-based strategies. In terms of research objects, this paper selects three major futures contracts, one from each of the three futures exchanges in China. Given the short-term operational characteristics of technical analysis, this paper uses high-frequency trading data on a 5-minute cycle. To address asynchronous transactions, commissions, slippage, and related problems, this paper builds a simulated trading system consisting of four modules: a signal generation module, a transaction execution module, a transaction cost and slippage processing module, and a statistical analysis module. To test the effectiveness of the strategies scientifically, this paper uses Monte Carlo simulation and non-parametric statistical techniques to improve on traditional test indicators. After the out-of-sample test and the slippage impact test, the following conclusions are obtained: Firstly, the statistical analysis shows that the technical analysis strategies are effective in the Chinese commodity futures market. Even after deducting one point of slippage, some of the technical analysis strategies still show strong effectiveness and robustness. This also suggests that the commodity futures market in China has not yet reached weak-form market efficiency, and that market efficiency varies across exchanges. Secondly, in terms of technical analysis strategies, the sim
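The signal generation module for a trend strategy can be sketched as a moving-average crossover. This is an illustrative toy, not the paper's strategy set: the window lengths and price series are invented, and a real 5-minute system would use much longer windows plus the cost/slippage modules described above.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    return [sum(values[i - window + 1:i + 1]) / window if i >= window - 1 else None
            for i in range(len(values))]

def crossover_signals(prices, fast=2, slow=3):
    """+1 = long, -1 = short, 0 = flat, from a fast/slow SMA crossover."""
    f, s = sma(prices, fast), sma(prices, slow)
    out = []
    for i in range(len(prices)):
        if f[i] is None or s[i] is None:
            out.append(0)                      # not enough history yet
        elif f[i] > s[i]:
            out.append(1)                      # fast above slow: trend up
        elif f[i] < s[i]:
            out.append(-1)                     # fast below slow: trend down
        else:
            out.append(0)
    return out

prices = [10, 11, 12, 11, 10, 9]               # toy 5-minute closes
signals = crossover_signals(prices)             # [0, 0, 1, 1, -1, -1]
```

The toy series first trends up (long signals) and then rolls over (short signals); in the paper's framework these signals would feed the execution and cost modules before any statistics are computed.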
Software vulnerabilities are an important manifestation of software security issues. With the increasing complexity of systems and software and the expansion of application scenarios, software vulnerabilities are growing in number, breadth of impact, and severity of harm. Research on software vulnerabilities has always been a hot spot in industry and academia. To improve software security, security researchers have put forward new methods, techniques, and theories focused on the discovery, analysis, and defense of software vulnerabilities. Meanwhile, attackers are also advancing their own vulnerability discovery and exploitation techniques. In this "battle of attack and defense", finding and analyzing vulnerabilities in a timely and efficient way is the key to gaining the initiative. In real-world vulnerability work, many target programs are closed-source commercial software. In the absence of source code, reverse analysis has become an important basis for software vulnerability research. Dynamic reverse analysis takes the binary code of a program as its analysis object: by dynamically executing the target program it recovers high-level semantics, and by combining this with dynamic information such as memory state and program behavior during execution it analyzes the software more accurately. This paper focuses on applying dynamic reverse analysis to software vulnerability analysis, with studies in four aspects: key function reversing, software fuzz testing, dynamic vulnerability discovery, and vulnerability threat analysis. First, we solve the problem of missing semantics for key functions in binary code; then we use fuzz testing and dynamic analysis methods to discover vulnerabilities; finally, we propose an exploit-oriented vulnerability threat analysis method. The main contributions of the paper are as follows: (1) We propose a reverse method of key functions b
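The fuzz-testing idea mentioned above can be reduced to a minimal mutation loop: mutate a seed input, run the target, and keep inputs that crash it. Everything here is a stand-in assumption for the sketch: the byte-flip mutator, the iteration count, and the toy "vulnerable" parser; real fuzzers instrument the binary and track coverage.

```python
import random

def mutate(data, rng):
    """Minimal mutation operator: XOR one random byte with a random mask."""
    i = rng.randrange(len(data))
    return data[:i] + bytes([data[i] ^ rng.randrange(1, 256)]) + data[i + 1:]

def fuzz(target, seed, iterations=200):
    """Minimal mutation-based fuzz loop: feed mutated seeds to `target`
    and collect inputs that raise exceptions (stand-ins for crashes)."""
    rng = random.Random(0)                 # fixed seed for reproducibility
    crashes = []
    for _ in range(iterations):
        data = mutate(seed, rng)
        try:
            target(data)
        except Exception:
            crashes.append(data)
    return crashes

def toy_parser(buf):
    # Toy "vulnerable" target: any byte above 0x7F triggers a fault.
    for b in buf:
        if b > 0x7F:
            raise ValueError("simulated crash")

found = fuzz(toy_parser, b"\x00\x01\x02\x03")
```

Every collected input contains a high byte, i.e. exactly the condition the toy target mishandles; triaging such crashing inputs is where the paper's threat analysis would begin.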
Single-cell sequencing technology enables sequence analysis at the single-cell level and can detect cell heterogeneity. It is widely used in many studies, such as identifying tumor cell subsets and clarifying the mechanisms of tumor drug resistance. Single-cell sequencing data, with its high transcript dropout, high sequencing noise, and high biological variation, differs from traditional sequencing data, so a corresponding analysis workflow needs to be developed to exploit it fully. In this paper, using the Python and R languages and integrating current single-cell sequencing analysis tools, we developed software for whole-process analysis of the single-cell transcriptome sequencing data common in our laboratory's hematological work, named scrap (Single-Cell RNA-seq Analysis Pipeline). This software integrates sequence mapping, counting, quality control, normalization, differential gene identification, full transcriptome splicing, transcript diversity analysis, protein network analysis, and other functions, and can also visualize the results. It is fast and convenient to operate. We used scrap to analyze the single-cell transcriptome data of 45 artificial stem cells and 45 primary hematopoietic stem cells from our laboratory and identified differentially expressed genes between the two groups of cells, KEGG pathway enrichment information, and hub genes. These results were in line with biological facts, indicating that scrap can quickly and intuitively analyze the data and provide a reliable basis for follow-up experiments.
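The quality-control stage of such a pipeline typically discards cells with too few detected genes or too few total counts (one symptom of the transcript dropout mentioned above). Below is a toy sketch of that filter, not scrap's actual code: cell names, counts, and thresholds are invented, and real data would use far larger thresholds plus criteria like mitochondrial fraction.

```python
def qc_filter(matrix, min_genes=2, min_counts=10):
    """Keep cells expressing at least `min_genes` genes with a total count
    of at least `min_counts`.

    matrix: dict cell -> dict gene -> count (a tiny stand-in for a
    cell-by-gene count matrix).
    """
    kept = {}
    for cell, counts in matrix.items():
        genes_detected = sum(1 for c in counts.values() if c > 0)
        total = sum(counts.values())
        if genes_detected >= min_genes and total >= min_counts:
            kept[cell] = counts
    return kept

cells = {
    "cell1": {"A": 6, "B": 8, "C": 0},
    "cell2": {"A": 1, "B": 0, "C": 0},   # too few genes and counts: dropped
    "cell3": {"A": 5, "B": 3, "C": 4},
}
passed = qc_filter(cells)
```

Only cell1 and cell3 survive the filter; downstream normalization and differential expression then operate on the retained cells.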
With the development of technology, data is growing rapidly in scientific research, Internet applications, and other fields. As more and more data analysis algorithms are applied to extract research and commercial value from big data, the traditional SQL database performs worse than the newer big data analysis systems. However, compared with big data analysis systems, the traditional SQL database, as an early data storage system with a strong ecosystem, is more mature and easier to operate and maintain, while big data analysis systems require more expertise and are not compatible with traditional architectures built around the SQL database. Therefore, improving the data analysis performance of the traditional SQL database matters both for analyzing large-scale structured data and for reducing enterprise costs. In this paper, we propose a new solution that integrates the SQL database with a big data analysis system to extend the SQL database's analytical capability. The core of the solution is to establish an interoperability mechanism between heterogeneous big data analysis systems. To this end, we design and implement interoperability middleware that connects the SQL database and the big data analysis system. Based on the middleware, we build a heterogeneous analysis system that uses the big data system as the execution engine for data analysis. We analyze the differences between the SQL database and the big data analysis system in structure and data format. To overcome the difficulties these differences cause, we improve the middleware in four aspects: the communication protocol, interface design, data transmission, and data processing. For example, the middleware defines a general message type that abstracts away the differences between the SQL database and the big data analysis system.
The middleware provides a series of interfaces to ensure the interoperability of the heterogeneous big data analysis system. To improve the versatility of data transmission, the middleware defines a system-independent data transmission process based on the data transmission interface. For extensibility, a data-processing module is defined in the middleware. Finally, based on this interoperability middleware for heterogeneous big data analysis systems, we choose PostgreSQL and Spark to build a heterogeneous analysis system and demonstrate its usability and efficiency when executing complex data analysis. We then use this system to recommend information to bus Wi-Fi users, which further validates its efficiency and practicability.
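The "general message type" idea can be sketched as a system-neutral envelope: a fixed header (type, source, target) plus an opaque payload, so the SQL side and the Spark side need not understand each other's native wire formats. This is purely an illustrative assumption about what such an envelope might look like, not the middleware's actual protocol; all field names are invented.

```python
import json

def make_message(msg_type, payload, source, target):
    """Serialize a hypothetical general message envelope.

    The header fields are system-neutral; `payload` carries the
    system-specific body and is kept opaque by the middleware.
    """
    return json.dumps({
        "type": msg_type,      # e.g. "query", "result", "ack"
        "source": source,      # originating system
        "target": target,      # receiving system
        "payload": payload,
    })

def parse_message(raw):
    """Deserialize and validate the envelope before dispatch."""
    msg = json.loads(raw)
    for field in ("type", "source", "target", "payload"):
        if field not in msg:
            raise ValueError(f"missing field: {field}")
    return msg

raw = make_message("query", {"sql": "SELECT count(*) FROM logs"},
                   source="postgresql", target="spark")
msg = parse_message(raw)
```

A dispatcher on either side only inspects the header to route the message; the data-processing module would then translate the payload between the PostgreSQL and Spark representations.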
The number of malware samples is increasing by more than 100 million a year, causing huge economic and property losses. How to combat malware effectively is a security issue that continuously receives researchers' attention. Compared with static analysis, dynamic analysis is unaffected by code protection technologies such as packing and obfuscation; it can detect new malware and is the preferred method for analyzing malicious code. The comprehensiveness of sample behavior data is one of the key factors affecting the accuracy of dynamic analysis, and insufficient simulation of the dynamic analysis environment is an important factor preventing a dynamic analysis system from fully acquiring that data. The insufficiency of dynamic analysis environment simulation mainly involves three aspects: 1) anti-virtualization: when a sample detects that it is running in a virtual environment, it hides its malicious behavior; 2) insufficient simulation of the network environment: the simulated network services do not return the resources the sample actually requests; 3) insufficient simulation of the user interaction environment: without user operations to drive it, the sample cannot continue to run. To solve the problem of insufficient simulation of the dynamic analysis environment, this paper studies high-fidelity malware dynamic analysis environment technology. It increases the fidelity of the dynamic analysis environment and enhances the analysis system's ability to obtain sample behavior data by countering anti-virtualization techniques and improving environment and user interaction simulation. The main research contents and achievements include the following four aspects: 1) This paper investigated existing anti-virtualization methods and conducted systematic anti-virtualization countermeasures from three aspects: netwo
With the arrival of the global big data era, people pay more and more attention to data use and data analysis. China is currently in a period of rapid smart city construction. Realizing intelligent city management requires collecting all kinds of data, such as energy, traffic, and environmental data. It also requires spatial location support, because a city is a huge geographical space for human activities, so thematic attribute data must be associated with spatial data. Most importantly, data analysis is needed to discover temporal and spatial patterns in the data. This paper studies the key technologies of spatiotemporal data fluctuation ("pulsation") analysis based on a GIS platform. On a GIS data aggregation platform, the collection, storage, association, unification, and centralized management of thematic and spatial data are completed for the fields of energy, transportation, environment, and so on. The overall technical framework of pulsation analysis is then completed by organizing and integrating its key technologies. Finally, this paper performs pulsation analysis in the fields of energy, transportation, and environment based on data from the platform. In the energy pulsation analysis, an ARMA time series model is established to predict residents' gas consumption, and the energy data is visualized. In the traffic pulsation analysis, buffer analysis of bus stations and passenger flow statistics in the Sino-Singapore Tianjin Eco-city are used to make suggestions for improvement, and a short-term traffic prediction model based on a stacked autoencoder is established; its validity is confirmed on Shanghai metro card data. In the environment pulsation analysis, this paper studies vegetation coverage based on the NDVI index of the Sino-Singapore Tianjin Ec
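The time-series prediction step can be illustrated with the simplest member of the ARMA family, an AR(1) model fitted by least squares. This is a toy sketch under stated assumptions: the "gas consumption" numbers are invented, and the abstract's actual model (a full ARMA fit) would be estimated with a statistics library rather than this closed-form AR(1).

```python
def fit_ar1(series):
    """Least-squares fit of an AR(1) model: x_t = c + phi * x_{t-1}."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    phi = cov / var
    c = mean_y - phi * mean_x
    return c, phi

def forecast(series, steps, c, phi):
    """Iterate the fitted recurrence forward from the last observation."""
    out, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out

# Invented monthly gas-consumption-like series.
gas = [100.0, 104.0, 103.0, 107.0, 106.0, 110.0]
c, phi = fit_ar1(gas)               # phi = 0.5, c = 54.0 for this series
pred = forecast(gas, 2, c, phi)     # [109.0, 108.5]
```

The two-step forecast simply applies the fitted recurrence twice; a production model would also include moving-average terms and seasonal effects.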
Since the Android operating system was released in 2008, it has rapidly developed to become the mobile operating system with the highest market share. The number of Android applications is huge, but their quality poses a big problem. Mobile phones hold a large amount of private personal information, and if a user installs an application with a vulnerability, that vulnerability is likely to be exploited by malicious attackers, resulting in privacy leaks and property losses. Since Android application markets do not provide an effective security detection mechanism, this paper aims to propose a comprehensive and effective method for detecting Android application vulnerabilities to improve application security. This paper analyzes existing research on Android application analysis, which comprises static analysis and dynamic analysis. Static analysis does not need to actually run the application; it is efficient, achieves high code coverage, and is suitable for detecting large-scale sample sets, so this paper adopts the static analysis approach. Among static analyses of Android application vulnerabilities, data-flow-based analysis has the highest accuracy. However, existing data-flow analyses are each designed for only a certain type of vulnerability, and their practical effectiveness remains to be proven through further evaluation. Tools such as Androbugs provide a complete vulnerability detection scheme but do not provide data-flow-level detection, which leads to inaccurate results for some specific vulnerabilities. To fill the gap in complete, data-flow-based vulnerability detection methods, this paper proposes an Android vulnerability detec
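The core of data-flow-based vulnerability detection is taint propagation: mark data from sensitive source APIs as tainted and report when it reaches a dangerous sink. Below is a deliberately tiny sketch over straight-line statements, not the paper's analysis: the statement encoding, the `getDeviceId` source, and the `sendTextMessage` sink are illustrative assumptions, and real Android analyses work over inter-procedural control-flow graphs.

```python
def propagate_taint(statements, taint_sources, sinks):
    """Toy intra-procedural taint propagation.

    statements: ordered list of (target, dependencies) pairs; a target
    becomes tainted if any dependency is tainted. Returns the sinks
    reached by tainted data, in order.
    """
    tainted = set(taint_sources)
    leaks = []
    for target, deps in statements:
        if any(d in tainted for d in deps):
            tainted.add(target)
            if target in sinks:
                leaks.append(target)
    return leaks

# imei flows from a source API into an SMS sink via string concatenation.
stmts = [
    ("imei", ["getDeviceId"]),       # result of a source API call
    ("msg", ["imei", "prefix"]),     # concatenation keeps the taint
    ("sendTextMessage", ["msg"]),    # sink receives tainted data
]
leaks = propagate_taint(stmts, {"getDeviceId"}, {"sendTextMessage"})
```

The sketch flags the `sendTextMessage` sink because the device identifier reaches it through `msg`; removing `"imei"` from the second statement would make the program clean.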
Independent Component Analysis (ICA) is a common data-driven analysis method for functional Magnetic Resonance Imaging (fMRI). The uncertainty in component order in ICA makes multi-subject data analysis difficult, and current algorithms suffer from problems such as ignoring individual variability or mismatching components across individuals. To address these problems, we propose a multi-subject ICA algorithm based on the spatial consistency of individual components. Using a novel cost function and genetic optimization, the algorithm simultaneously decomposes individual data while regularizing the decomposition with the interdependence of components across individuals. Comparisons between the new algorithm and existing methods on simulated data showed that the new algorithm preserves individual variability in brain components while stably matching components across subjects. This approach can obtain stable connectivity networks while revealing individual variability, and thus provides a promising tool for discovering neuroimaging markers of mental disorders.
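The component-mismatch problem the abstract targets arises because ICA returns components in arbitrary order and with arbitrary sign. A baseline remedy, sketched below, greedily matches each subject's components to a reference by absolute spatial correlation. This is an illustrative baseline only, with invented toy maps; it is not the proposed algorithm, which instead couples the decompositions themselves via a cost function and genetic optimization.

```python
def correlation(a, b):
    """Pearson correlation between two equal-length spatial maps."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def match_components(reference, subject):
    """Greedy one-to-one matching by absolute spatial correlation.

    ICA sign and order are arbitrary, so |r| is used. Returns, for each
    reference component, the index of the matched subject component.
    """
    used, matches = set(), []
    for ref in reference:
        best_j, best_r = None, -1.0
        for j, comp in enumerate(subject):
            if j in used:
                continue
            r = abs(correlation(ref, comp))
            if r > best_r:
                best_j, best_r = j, r
        used.add(best_j)
        matches.append(best_j)
    return matches

reference = [[1, 0, 1, 0], [0, 0, 1, 1]]
# Subject's components come back reordered, rescaled, and sign-flipped.
subject = [[0, 0, -1, -1], [2, 0, 2, 0]]
order = match_components(reference, subject)   # [1, 0]
```

The matcher correctly pairs each reference map with its reordered, flipped counterpart; the abstract's point is that doing this post hoc, per subject, can distort individual variability, which motivates the joint decomposition approach.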