Applied data mining for business and industry

Paolo Giudici, Silvia Figini.

Giudici, Paolo

Chichester, U.K. : Wiley, c2009.

2nd ed.

Book

Print

Where to find it

Information & Library Science Library

Call Number: QA76.9.D343 G75 2009

Status: Available

Summary

The increasing availability of data in our current, information-overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent and application-oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications.

Introduces data mining methods and applications.
Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods.
Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining.
Features detailed case studies based on applied projects within industry.
Incorporates discussion of data mining software, with case studies analysed using R.
Is accessible to anyone with a basic knowledge of statistics or data analysis.
Includes an extensive bibliography and pointers to further reading within the text.

Applied Data Mining for Business and Industry, 2nd edition, is aimed at advanced undergraduate and graduate students of data mining, applied statistics, database management, computer science and economics. The case studies will provide guidance to professionals working in industry on projects involving large volumes of data, such as customer relationship management, web design, risk management, marketing, economics and finance.

Content provided by Syndetic Solutions, Inc. Terms of Use

Contents

1 Introduction p. 1
Part I Methodology p. 5
2 Organisation of the data p. 7
2.1 Statistical units and statistical variables p. 7
2.2 Data matrices and their transformations p. 9
2.3 Complex data structures p. 10
2.4 Summary p. 11
3 Summary statistics p. 13
3.1 Univariate exploratory analysis p. 13
3.1.1 Measures of location p. 13
3.1.2 Measures of variability p. 15
3.1.3 Measures of heterogeneity p. 16
3.1.4 Measures of concentration p. 17
3.1.5 Measures of asymmetry p. 19
3.1.6 Measures of kurtosis p. 20
3.2 Bivariate exploratory analysis of quantitative data p. 22
3.3 Multivariate exploratory analysis of quantitative data p. 25
3.4 Multivariate exploratory analysis of qualitative data p. 27
3.4.1 Independence and association p. 28
3.4.2 Distance measures p. 29
3.4.3 Dependency measures p. 31
3.4.4 Model-based measures p. 32
3.5 Reduction of dimensionality p. 34
3.5.1 Interpretation of the principal components p. 36
3.6 Further reading p. 39
4 Model specification p. 41
4.1 Measures of distance p. 42
4.1.1 Euclidean distance p. 43
4.1.2 Similarity measures p. 44
4.1.3 Multidimensional scaling p. 46
4.2 Cluster analysis p. 47
4.2.1 Hierarchical methods p. 49
4.2.2 Evaluation of hierarchical methods p. 53
4.2.3 Non-hierarchical methods p. 55
4.3 Linear regression p. 57
4.3.1 Bivariate linear regression p. 57
4.3.2 Properties of the residuals p. 60
4.3.3 Goodness of fit p. 62
4.3.4 Multiple linear regression p. 63
4.4 Logistic regression p. 67
4.4.1 Interpretation of logistic regression p. 68
4.4.2 Discriminant analysis p. 70
4.5 Tree models p. 71
4.5.1 Division criteria p. 73
4.5.2 Pruning p. 74
4.6 Neural networks p. 76
4.6.1 Architecture of a neural network p. 79
4.6.2 The multilayer perceptron p. 81
4.6.3 Kohonen networks p. 87
4.7 Nearest-neighbour models p. 89
4.8 Local models p. 90
4.8.1 Association rules p. 90
4.8.2 Retrieval by content p. 96
4.9 Uncertainty measures and inference p. 96
4.9.1 Probability p. 97
4.9.2 Statistical models p. 99
4.9.3 Statistical inference p. 103
4.10 Non-parametric modelling p. 109
4.11 The normal linear model p. 112
4.11.1 Main inferential results p. 113
4.12 Generalised linear models p. 116
4.12.1 The exponential family p. 117
4.12.2 Definition of generalised linear models p. 118
4.12.3 The logistic regression model p. 125
4.13 Log-linear models p. 126
4.13.1 Construction of a log-linear model p. 126
4.13.2 Interpretation of a log-linear model p. 128
4.13.3 Graphical log-linear models p. 129
4.13.4 Log-linear model comparison p. 132
4.14 Graphical models p. 133
4.14.1 Symmetric graphical models p. 135
4.14.2 Recursive graphical models p. 139
4.14.3 Graphical models and neural networks p. 141
4.15 Survival analysis models p. 142
4.16 Further reading p. 144
5 Model evaluation p. 147
5.1 Criteria based on statistical tests p. 148
5.1.1 Distance between statistical models p. 148
5.1.2 Discrepancy of a statistical model p. 150
5.1.3 Kullback-Leibler discrepancy p. 151
5.2 Criteria based on scoring functions p. 153
5.3 Bayesian criteria p. 155
5.4 Computational criteria p. 156
5.5 Criteria based on loss functions p. 159
5.6 Further reading p. 162
Part II Business case studies p. 163
6 Describing website visitors p. 165
6.1 Objectives of the analysis p. 165
6.2 Description of the data p. 165
6.3 Exploratory analysis p. 167
6.4 Model building p. 167
6.4.1 Cluster analysis p. 168
6.4.2 Kohonen networks p. 169
6.5 Model comparison p. 171
6.6 Summary report p. 172
7 Market basket analysis p. 175
7.1 Objectives of the analysis p. 175
7.2 Description of the data p. 176
7.3 Exploratory data analysis p. 178
7.4 Model building p. 181
7.4.1 Log-linear models p. 181
7.4.2 Association rules p. 184
7.5 Model comparison p. 186
7.6 Summary report p. 191
8 Describing customer satisfaction p. 193
8.1 Objectives of the analysis p. 193
8.2 Description of the data p. 194
8.3 Exploratory data analysis p. 194
8.4 Model building p. 197
8.5 Summary p. 201
9 Predicting credit risk of small businesses p. 203
9.1 Objectives of the analysis p. 203
9.2 Description of the data p. 203
9.3 Exploratory data analysis p. 205
9.4 Model building p. 206
9.5 Model comparison p. 209
9.6 Summary report p. 210
10 Predicting e-learning student performance p. 211
10.1 Objectives of the analysis p. 211
10.2 Description of the data p. 212
10.3 Exploratory data analysis p. 212
10.4 Model specification p. 214
10.5 Model comparison p. 217
10.6 Summary report p. 218
11 Predicting customer lifetime value p. 219
11.1 Objectives of the analysis p. 219
11.2 Description of the data p. 220
11.3 Exploratory data analysis p. 221
11.4 Model specification p. 223
11.5 Model comparison p. 224
11.6 Summary report p. 225
12 Operational risk management p. 227
12.1 Context and objectives of the analysis p. 227
12.2 Exploratory data analysis p. 228
12.3 Model building p. 230
12.4 Model comparison p. 232
12.5 Summary conclusions p. 235
References p. 237
Index p. 243
