Handbook of Statistics : Data Mining and Data Visualization

This book focuses on dealing with large-scale data, a field commonly referred to as data mining. The book is divided into three sections. The first deals with an introduction to statistical aspects of data mining and machine learning and includes applications to text analysis, computer intrusion detection, and hiding of information in digital files. The second section focuses on a variety of statistical methodologies that have proven to be effective in data mining applications. These include clustering, classification, multivariate density estimation, tree-based methods, pattern recognition, o
eBook, English, 2014
Elsevier Science, Amsterdam, 2014
1 online resource (660 pages).
9780080459400, 0080459404
1055796009
Front cover; copyright; front matter; Preface; Table of contents; Contributors; body; 1. Statistical Data Mining; Introduction 1; Computational complexity; Order of magnitude considerations; Feasibility limits due to CPU performance; Feasibility limits due to file transfer performance; Feasibility limits due to visual resolution; The computer science roots of data mining; Knowledge discovery in databases and data mining; Association rules; Data preparation; Missing values and outliers; Quantization; Databases; SQL; Data cubes and OLAP; Statistical methods for data mining; Density estimation. Cluster analysis; Hierarchical clustering; The number of groups problem; Artificial neural networks; The biological basis; Functioning of an artificial neural network; Back propagation; Visual data mining; The four stages of data graphics; Graphics constructs for visual data mining; Example 1: PRIM 7 data; Example 2: iterative denoising with hyperspectral data; Streaming data; Recursive analytic formulations; Counts, moments and densities; Evolutionary graphics; Waterfall diagrams and transient geographic mapping; Block-recursive plots and conditional plots; A final word; Acknowledgements 1. Strong patterns vs. complete and consistent rules; Ruleset visualization via concept association graphs; Integration of knowledge generation operators; Summary 2; Acknowledgements 2; References 2; 3. Mining Computer Security Data; Introduction 3; Basic TCP/IP; Overview of networking; The threat; Probes and scans; Denial of service attacks; Gaining access; Network monitoring; TCP sessions; Signatures versus anomalies; User profiling; Program profiling; Conclusions 3; References 3; 4. Data Mining of Text Files; 4. Introduction and background. Natural language processing at the word and sentence level; Hidden Markov models; Probabilistic context-free grammars; Word sense disambiguation; Supervised disambiguation; Unsupervised disambiguation; Approaches beyond the word and sentence level; Information retrieval; Vector space model; Generic implementation; Using term weights; Latent Semantic Indexing (LSI); Other approaches; The bigram proximity matrix; Measures of semantic similarity; Matching coefficient; Jaccard coefficient; Ochiai measure (also called cosine); L1 distance; Information radius measure (IRad); Document classification via supervised learning