Handbook of Statistics : Data Mining and Data Visualization

Authors: C. R. Rao, E. J. Wegman, J. L. Solka

Summary: This book focuses on handling large-scale data, a field commonly referred to as data mining. The book is divided into three sections. The first introduces the statistical aspects of data mining and machine learning and includes applications to text analysis, computer intrusion detection, and the hiding of information in digital files. The second section focuses on a variety of statistical methodologies that have proven effective in data mining applications. These include clustering, classification, multivariate density estimation, tree-based methods, pattern recognition, …

eBook, English, 2014


Publisher: Elsevier Science, Amsterdam, 2014

Series:

Handbook of statistics (Amsterdam, Netherlands), v. 24

Physical Description: 1 online resource (660 pages).

ISBN:

9780080459400, 0080459404

OCLC Number / Unique Identifier: 1055796009

Subjects:

Data mining

Data mining -- Statistical methods

Additional Physical Form Entry:

Print version:

Handbook of Statistics : Data Mining and Data Visualization.

Rao, C.R.

Contents:

Front cover; copyright; front matter; Preface; Table of contents; Contributors; body; 1. Statistical Data Mining; Introduction 1; Computational complexity; Order of magnitude considerations; Feasibility limits due to CPU performance; Feasibility limits due to file transfer performance; Feasibility limits due to visual resolution; The computer science roots of data mining; Knowledge discovery in databases and data mining; Association rules; Data preparation; Missing values and outliers; Quantization; Databases; SQL; Data cubes and OLAP; Statistical methods for data mining; Density estimation; Cluster analysis; Hierarchical clustering; The number of groups problem; Artificial neural networks; The biological basis; Functioning of an artificial neural network; Back propagation; Visual data mining; The four stages of data graphics; Graphics constructs for visual data mining; Example 1: PRIM 7 data; Example 2: iterative denoising with hyperspectral data; Streaming data; Recursive analytic formulations; Counts, moments and densities; Evolutionary graphics; Waterfall diagrams and transient geographic mapping; Block-recursive plots and conditional plots; A final word; Acknowledgements 1; …; Strong patterns vs. complete and consistent rules; Ruleset visualization via concept association graphs; Integration of knowledge generation operators; Summary 2; Acknowledgements 2; References 2; 3. Mining Computer Security Data; Introduction 3; Basic TCP/IP; Overview of networking; The threat; Probes and scans; Denial of service attacks; Gaining access; Network monitoring; TCP sessions; Signatures versus anomalies; User profiling; Program profiling; Conclusions 3; References 3; 4. Data Mining of Text Files; Introduction and background; Natural language processing at the word and sentence level; Hidden Markov models; Probabilistic context-free grammars; Word sense disambiguation; Supervised disambiguation; Unsupervised disambiguation; Approaches beyond the word and sentence level; Information retrieval; Vector space model; Generic implementation; Using term weights; Latent Semantic Indexing (LSI); Other approaches; The bigram proximity matrix; Measures of semantic similarity; Matching coefficient; Jaccard coefficient; Ochiai measure (also called cosine); L1 distance; Information radius measure (IRad)

Notes:

Document classification via supervised learning

More Information:

Ebook Library