Advanced Statistics and Data Mining Summerschool

Madrid, July 6th to 17th, 2009
















This summer school is organized by the Artificial Intelligence Department of the Computer Science Faculty of the Univ. Politécnica de Madrid. It continues the summer school organized by Univ. San Pablo - CEU for the previous three years, making this its 4th edition. It is an intensive course that introduces attendees to the theoretical foundations, as well as the practical applications, of modern statistical analysis techniques currently in use. The summer school lasts 2 weeks and is divided into 18 courses. Each course comprises 8 theoretical classes and 7 practical classes in which each technique is put into practice with a computer program. Students may register only for the courses of their interest.

Academic interest: this course complements the background of students from a variety of disciplines with the theoretical and practical fundamentals of the modern techniques employed in the analysis and modelling of large data sets. Its academic interest is high since there are no specific university curricula covering these techniques.

Scientific interest: scientists in most fields (engineering, life sciences, economics, etc.) face the problem of extracting conclusions from a set of experimental data. This course gives experimentalists the resources to select the appropriate analysis technique and apply it to their specific problem.

Professional interest: modern data analysis is widely applied in industry, being needed in nearly all disciplines. It is also in high demand in the job market: a search as of March 2008 retrieves more than 5000 offers for “data analysis”, more than 1784 offers for “data mining”, and 438 offers for “statistical consultant”.



The goal of this summer school is to complement the technical background of attendees in the field of data analysis and modelling. The course is open to any student or professional wanting to broaden their knowledge of a topic increasingly involved in nearly all productive areas (Computer Science, Engineering, Pharmacy, Medicine, Economics, Statistics, etc.).

A second objective is to acquaint students with a set of computational tools in which to try the techniques studied during the course on practical problems, either brought by the students themselves or proposed by the summer school professors.

Note that the summer school is on advanced techniques, and the courses provide insight into modern methods that, nearly by definition, are not mathematically trivial. Although the emphasis is placed on their use rather than on the underlying mathematics, attendees should not be afraid or surprised to see some mathematics. Teachers will make the course content accessible to students of all backgrounds. To make the most of the course, students are expected to be familiar with the concepts listed as "prerequisites" and are encouraged to read the "before attending" documents.



All classes will be given in English. Courses 1, 2 and 3; 4, 5 and 6; etc. are given simultaneously; therefore, a student cannot register for two simultaneous courses.


Week 1 (July 6th - July 10th, 2009)


Course 1: Bayesian networks (15 h), Practical sessions: Hugin, Elvira, Weka, LibB (see sample)

                  Prof. Mª Concepción Bielza, Pedro Larrañaga (Univ. Politécnica de Madrid)

                  Theory: Block 6, Room 6105; Practice: Block 4, Room Los Verdes

Course 2: Multivariate data analysis (15 h), Practical sessions: R (see sample)

                  Prof. Carlos Óscar S. Sorzano (Univ. San Pablo CEU, CSIC)

                  Theory: Block 6, Room 6101; Practice: Block 4, Room Monje

Course 3: Dimensionality reduction (15 h), Practical sessions: MATLAB (see sample)

                  Prof. Alberto Pascual Montano (CSIC)


Course 4: Supervised pattern recognition (Classification) (15 h), Practical sessions: Weka (see sample)

                  Prof. Pedro Larrañaga (Univ. País Vasco)

                  Theory: Block 6, Room 6105; Practice: Block 4, Room Los Verdes

Course 5: Introduction to MATLAB (15h), Practical sessions: MATLAB

                  Prof. Rubén Armañanzas (Univ. Politécnica Madrid)

Course 6: Data Mining: a practical perspective (15 h), Practical sessions: Weka, R, MATLAB

                  Prof. Alberto Pascual Montano (CSIC), Miguel Vázquez, Mariana Lara, Pedro Carmona  (Univ. Complutense)

                  Theory: Block 6, Room 6101; Practice: Block 4, Room Monje


Course 7: Time series analysis (15 h), Practical sessions: R (see sample)

                  Prof. Carlos Óscar S. Sorzano (Univ. San Pablo CEU)

                  Theory: Block 6, Room 6105; Practice: Block 4, Room Los Verdes

Course 8: Neural networks (15 h), Practical sessions: MATLAB (see sample)

                  Prof. Santiago Falcón (Unión Fenosa)

Course 9: Introduction to SPSS (15h), Practical sessions: SPSS

                  Prof. Concepción Bielza, Antonio Jiménez (Univ. Politécnica Madrid)


Week 2 (July 13th - July 17th, 2009)


Course 10: Regression (15 h), Practical sessions: SPSS (see sample)

                  Prof. Carlos Rivero Rodríguez (Univ. Complutense de Madrid)

Course 11: Practical Statistical Questions (15 h), Practical sessions: study of cases (without computer) (see sample)

                  Prof. Carlos Óscar S. Sorzano (Univ. San Pablo CEU,CSIC)

                  Theory: Block 6, Room 6105

Course 12: Missing data and outliers, Practical sessions: R

                  Prof. Román Mínguez (Univ. Castilla-La Mancha)


Course 13: Hidden Markov Models (15 h), Practical sessions: HTK (see sample)

                  Prof. Agustín Álvarez (Univ. Politécnica de Madrid)

                  Theory: Block 6, Room 6105; Practice: Block 4, Room Los Verdes

Course 14: Statistical inference (15 h), Practical sessions: SPSS (see sample)

                  Prof. Román Mínguez (Univ. Castilla-La Mancha)

                  Theory: Block 6, Room 6101; Practice: Block 4, Room Monje

Course 15: Feature subset selection (15 h), Practical sessions: Weka, R, MATLAB

                  Prof. Rubén Armañanzas, Víctor Robles, Pedro Larrañaga (Univ. Politécnica de Madrid)


Course 16: Introduction to R (15h), Practical sessions: R

                  Prof. Pedro Carmona (Univ. Complutense Madrid)

                  Theory: Block 6, Room 6101; Practice: Block 4, Room Monje

Course 17: Unsupervised pattern recognition (clustering) (15 h), Practical sessions: MATLAB (see sample)

                  Prof. Carlos Oscar S. Sorzano (Univ. San Pablo CEU, CSIC)

                  Theory: Block 6, Room 6105; Practice: Block 4, Room Los Verdes

Course 18: Evolutionary computation (15 h), Practical sessions: MATLAB

                  Prof. Daniel Manrique, Roberto Santana (Univ. Politécnica de Madrid)

Click here for a more detailed program


Dates and Times


  • Week 1: July 6th - July 10th, 2009
  • Week 2: July 13th - July 17th, 2009

The timetable of each week is as follows:



Monday-Friday, Week 1        Monday-Friday, Week 2

Courses 1, 2 and 3           Courses 10, 11 and 12
Courses 4, 5 and 6           Courses 13, 14 and 15
Courses 7, 8 and 9           Courses 16, 17 and 18



The summer school takes place at the Computer Science Faculty of the Univ. Politécnica de Madrid, located in Urb. Montepríncipe, Boadilla del Monte (Metro: Montepríncipe on Metro Ligero line 3, reached from Colonia Jardín on line 10; buses 571 and 573). Click here for a detailed map of the area. Note that we do not provide lodging for students from outside Madrid; students are recommended to stay in Madrid and commute daily to the Faculty (about 40 minutes depending on location). In the following links you will find useful lodging information in Spanish and English.

In Spanish it might be useful to ask for the "Facultad de Informática de la Universidad Politécnica de Madrid". This link also contains useful information for reaching the building.


Diplomas and Academic recognition

All students will obtain an Attendance Diploma, and 1 ECTS credit (European Credit Transfer System) will be awarded.



The price of each course in the summer school is 150 euros. This fee includes attendance at lectures and educational materials.

To apply for the summer school, students must send an e-mail to the organizers asking for course availability. After confirmation from the organizers, they should make the payment and send another e-mail stating name, e-mail, telephone, institution, nationality (note that depending on your nationality you may need a visa to enter Spain), the courses paid for, and a copy (PDF) of the payment receipt. Courses have a maximum attendance of 40 people, and places will be allocated in strict order of payment date. Courses with fewer than 6 people will be cancelled.

The payment shall be made by bank transfer to the following account:

To: Títulos propios de la UPM
Subject: Course identifier+Your name
Address: Madrid

Bank name: Banco Bilbao Vizcaya Argentaria (BBVA)
Bank Address: c/Alcalá, 16
Account Number: 0182 2370 48 0201516245
IBAN: ES0501822370480201516245

For official invoices, the fiscal data are:

Universidad Politécnica de Madrid
CIF: Q2818015F




Coordinators of the course:

Pedro Larrañaga
Facultad de Informática
Univ. Politécnica de Madrid
Campus Urb. Montepríncipe s/n
28660 Boadilla del Monte, Madrid

Tel.: +34 91 336 7443
Fax: +34 91 352 4819

Carlos Oscar Sánchez Sorzano
Escuela Politécnica Superior
Univ. San Pablo - CEU
Campus Urb. Montepríncipe s/n
28668 Boadilla del Monte, Madrid

Tel.: +34 91 372 4034
Fax: +34 91 372 4049


Collaborating institutions

Red temática de Investigación Cooperativa de Centros de Cáncer (RTICCC)


Detailed program


Course 1

Bayesian networks


1. Bayesian networks basics.
  1.1. Reasoning with uncertainty
  1.2. Probabilistic conditional independence
  1.3 Correspondence between graph and model: D-separation
  1.4 Probabilistic graphical models
  1.5 Bayesian networks: properties
  1.6 Building Bayesian networks (examples)

2. Inference in Bayesian networks.
  2.1 Queries in Bayesian networks: deductive, diagnostic and intercausal reasoning
  2.2 Exact inference:
     - Brute force approach
     - Variable elimination
     - Message passing
  2.3 Searching for explanations: abduction
  2.4 Approximate inference

3 Learning Bayesian networks from data
  3.1 Introduction
  3.2 Factorization of the joint probability distribution
     - Methods based on testing conditional independence
     - Methods based on score + search
     - Hybrid methods
     - Applications
  3.3 Bayesian classifiers
     - Introduction
     - Naive Bayes
     - Seminaive Bayes
     - Tree augmented network
     - K- dependence network
     - Markov Blanket
     - Applications
  3.4 Clustering
     - Introduction
     - The EM algorithm
     - The structural EM algorithm
     - Applications

Practical demonstration: Hugin, Elvira, Weka, BayesBuilder
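Although the practical sessions use Hugin, Elvira, Weka and BayesBuilder, the factorization and brute-force inference ideas of sections 1.4 and 2.2 can be sketched in a few lines of Python. The three-node network below (Cloudy, Sprinkler, Rain) and its probability tables are invented for illustration, not taken from the course materials:

```python
from itertools import product

# A tiny Bayesian network: Cloudy -> Sprinkler, Cloudy -> Rain.
# The joint distribution factorizes as P(C) * P(S|C) * P(R|C).
P_C = {True: 0.5, False: 0.5}
P_S = {True: {True: 0.1, False: 0.9},    # P(S=s | C=c), keyed as P_S[c][s]
       False: {True: 0.5, False: 0.5}}
P_R = {True: {True: 0.8, False: 0.2},    # P(R=r | C=c), keyed as P_R[c][r]
       False: {True: 0.2, False: 0.8}}

def joint(c, s, r):
    """P(C=c, S=s, R=r), read straight off the factorization."""
    return P_C[c] * P_S[c][s] * P_R[c][r]

def posterior_cloudy(rain):
    """Brute-force P(C=True | R=rain): sum the joint over the hidden S."""
    num = sum(joint(True, s, rain) for s in (True, False))
    den = sum(joint(c, s, rain) for c, s in product((True, False), repeat=2))
    return num / den
```

With these tables, observing rain raises the posterior of Cloudy from 0.5 to 0.8. Enumerating the joint is exponential in the number of variables, which is why the course covers variable elimination and message passing.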


Learning Bayesian Networks by Richard E. Neapolitan. Prentice Hall; 1st edition (2003)
Expert Systems and Probabilistic Network Models by E. Castillo, J. Gutierrez, and A. Hadi. Springer-Verlag (1997)
Bayesian Networks and Decision Graphs (second edition) Finn V. Jensen and Thomas D. Nielsen. Springer Verlag (2007)


Attendees are expected to be familiar with basic notions of probability and graphs.

Readings before coming

Attendees will benefit more from the course if they read the following before coming:


Course 2

Multivariate data analysis


1. Introduction
   1.1. Types of variables
   1.2. Types of analysis and technique selection
   1.3. Descriptors (mean, covariance matrix)
   1.4. Variability and distance
   1.5. Linear dependence

2. Data Examination
   2.1. Graphical examination of the data
   2.2. Missing Data
   2.3. Outliers
   2.4. Assumptions of multivariate analysis

3. Principal component analysis (PCA)
   3.1. Introduction
   3.2. Component computation
   3.3. Example
   3.4. Properties
   3.5. Extensions

4. Factor Analysis
   4.1. Introduction
   4.2. Factor computation
   4.3. Example
   4.4. Extensions

5. Multidimensional Scaling (MDS)
   5.1. Introduction
   5.2. Metric scaling
   5.3. Example
   5.4. Non metric scaling
   5.5. Extensions

6. Correspondence analysis
   6.1. Introduction
   6.2. Projection search
   6.3. Example
   6.4. Extensions

7. Multivariate Analysis of Variance (MANOVA)
   7.1. Introduction
   7.2. Computations (1-way)
   7.3. Computations (2-way)
   7.4. Post-hoc tests

8. Canonical correlation
   8.1. Introduction
   8.2. Construction of the canonical variables
   8.3. Example
   8.4. Extensions

Practical demonstration: R
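Although the practical sessions use R, the component computation of section 3.2 can be sketched in a few lines of Python/NumPy (the data are synthetic and purely illustrative): PCA centres the data and projects it onto the eigenvectors of the covariance matrix, ordered by decreasing eigenvalue.

```python
import numpy as np

def pca(X, k):
    """Project X onto its first k principal components."""
    Xc = X - X.mean(axis=0)             # centre the data
    C = np.cov(Xc, rowvar=False)        # sample covariance matrix
    vals, vecs = np.linalg.eigh(C)      # eigh: C is symmetric
    order = np.argsort(vals)[::-1]      # sort by decreasing variance
    return Xc @ vecs[:, order[:k]], vals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[:, 2] = X[:, 0] + X[:, 1]             # exact linear dependence: rank 2
scores, variances = pca(X, 2)
```

Because the third variable is a linear combination of the first two, the smallest eigenvalue is numerically zero and two components capture all of the variance; this connects directly to the linear-dependence discussion of section 1.5.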


Multivariate Data Analysis (6th Edition) by Joseph F. Hair, Bill Black, Barry Babin, Rolph E. Anderson, Ronald L. Tatham. Prentice Hall; 6th edition (2005)

Análisis de datos multivariantes by Daniel Peña. McGraw-Hill (2002)


Attendees are expected to be familiar with univariate statistics, the univariate Gaussian, ANOVA, and correlation.

Readings before coming

Attendees will benefit more from the course if they read the following before coming:


Course 3

Dimensionality reduction


1. Introduction:
   1.1. Why dimensionality reduction
   1.2. Curse of dimensionality
   1.3. Feature selection vs. feature extraction
   1.4. Linear vs. non-linear
   1.5. Accuracy vs. Interpretation

2. Matrix factorization methods
   2.1. Principal Component Analysis
   2.2. Singular Value Decomposition
   2.3. Factor analysis
   2.4. Non-negative matrix factorization
   2.5. Non-negative tensor factorization
   2.6. Independent Component Analysis and Blind Source Separation techniques

3. Clustering methods
   3.1. Motivation and theoretical basis
   3.2. K-means
   3.3. Fuzzy c-means
   3.4. Hierarchical clustering

4. Projection methods
   4.1. Random mapping
   4.2. Sammon mapping
   4.3. Self-organizing maps
   4.4. Isomap
   4.5. Locally linear embedding (LLE)

5. Applications
   5.1. Pattern recognition
   5.2. Image classification
   5.3. Gene expression analysis
   5.4. Text mining

6. Practical exercises
   6.1. Image classification
   6.2. Gene expression analysis
   6.3. Scientific text analysis

Practical Demonstration: MATLAB and Web applications
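The practical sessions use MATLAB, but the K-means method of section 3.2 is short enough to sketch in Python/NumPy. The two well-separated blobs below are synthetic data invented for the example:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Lloyd's algorithm: alternate nearest-centre assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centre
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centre as the mean of its points (keep it if empty)
        centers = np.array([X[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    return labels, centers

rng = np.random.default_rng(1)
blob_a = rng.normal(loc=0.0, scale=0.3, size=(50, 2))
blob_b = rng.normal(loc=5.0, scale=0.3, size=(50, 2))
X = np.vstack([blob_a, blob_b])
labels, centers = kmeans(X, 2)
```

On such well-separated data the algorithm recovers the two blobs exactly; the fuzzy c-means and hierarchical variants of sections 3.3 and 3.4 relax the hard assignment step.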


Mitchell, T. (1997) Machine Learning, McGraw-Hill.
Russell, S., Norvig, P. (2003) Artificial Intelligence: A Modern Approach, 2nd Ed. Prentice Hall.


Attendees are expected to be familiar with basic statistical concepts.

Reading before coming

Attendees will benefit more from this course if they read the following before coming:


Course 4

Supervised pattern recognition (Classification)


1. Introduction
   1.1. Supervised classification
   1.2. Semisupervised classification
   1.3. Partially supervised classification
   1.4. Unsupervised classification

2. Assessing the Performance of Supervised Classification Algorithms
   2.1. Error generalization
   2.2. Area under the ROC curve
   2.3. Brier score
   2.4. Holdout method
   2.5. k-fold cross validation
   2.6. Bootstrapping

3. Classification techniques
   3.1. Discriminant analysis
   3.2. Classification trees
   3.3. Nearest neighbour classifier
   3.4. Logistic regression
   3.5. Bayesian network classifiers
   3.6. Neural network classifiers
   3.7. Support Vector Machines (SVM)

4. Combining Classifiers
   4.1. Hybridizing classifiers
   4.2. Basic methods:
      - Fusion of label outputs
      - Fusion of continuous-valued outputs
      - Stacked generalization
      - Cascading
   4.3. Advanced methods:
      - Bagging
      - Randomization
      - Boosting
      - Hybrid classifiers

5. Comparing Supervised Classification Algorithms
   5.1. Two classifiers in the same database
   5.2. More than two classifiers in the same database
   5.3. Two classifiers in multiple databases
   5.4. More than two classifiers in multiple databases

Practical demonstration: WEKA, SPSS
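The practical sessions use WEKA and SPSS, but the k-fold cross-validation estimate of section 2.5 is easy to sketch in Python/NumPy. Here it wraps a nearest-neighbour classifier (section 3.3) on synthetic, well-separated data invented for the example:

```python
import numpy as np

def one_nn_predict(Xtr, ytr, Xte):
    """1-nearest-neighbour: each test point takes its closest training label."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    return ytr[d.argmin(axis=1)]

def kfold_error(X, y, k=5, seed=0):
    """Average test error over k folds of a random partition of the data."""
    idx = np.random.default_rng(seed).permutation(len(X))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = one_nn_predict(X[train], y[train], X[test])
        errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 0.3, size=(50, 2)),
               rng.normal(5.0, 0.3, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)
error = kfold_error(X, y)
```

Because the two classes are far apart, the estimated error is essentially zero here; on real data this k-fold estimate is the standard basis for the classifier comparisons of section 5.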


Statistical Pattern Recognition by Andrew R. Webb. John Wiley & Sons; 2nd edition (2002)

Kuncheva, L. (2004) Combining Pattern Classifiers, Wiley.

Readings before coming

Recommended listening: “An Introduction to Pattern Classification” by E. Yom Tov, IBM Haifa Research Lab at


Course 5

Introduction to MATLAB


1. Overview of the Matlab suite
   1.1. History
   1.2. Elements and use of the Graphical User Interface
   1.3. Editors
   1.4. The help system
   1.5. Computing with Matlab
   1.6. Matlab peculiarities

2. Data structures and files
   2.1. Arrays
   2.2. Matrices
   2.3. Cell arrays
   2.4. Structure arrays
   2.5. Operations using arrays/matrices
   2.6. Importing / Exporting data

3. Programming in Matlab
   3.1. Types of files
   3.2. Scripts and .m code
   3.3. Functions and design conventions
   3.4. Operators
   3.5. Imperative statements
   3.6. Complex structures
   3.7. Toolboxes

4. Visualization tools
   4.1. Plots and subplots
   4.2. Property editor
   4.3. Command line customization
   4.4. Advanced plots
   4.5. Exporting figures

5. Some applications in pattern recognition
   5.1. Clustering
   5.2. Feature selection
   5.3. Supervised classification

Practical demonstration: MATLAB


Introduction to Matlab 7 for Engineers. William Palm III. McGraw-Hill Science/Engineering/Math, 2004. ISBN 978-0072548181.


Attendees are expected to be familiar with imperative programming and basic pattern recognition concepts. Familiarity with mathematical software and spreadsheets is also desirable.

Readings before coming

Attendees will start the course with a precise idea of the Matlab suite by watching the following three videos:


Course 6

Data mining: a practical perspective


1. Introduction to Data Mining and Knowledge Discovery
   1.1. What is data mining and knowledge discovery?
   1.2. Preparing data for mining
   1.3. Data warehouses
   1.4. What kind of patterns can be mined?
   1.5. Data cleaning and transformation
   1.6. Online Analytical Processing (OLAP)
   1.7. Visual data mining
   1.8. Practical examples

2. Prediction in data mining
   2.1. Regression of one dependent variable
   2.2. Predictor variable selection and regularization
   2.3. Non-linear fits
   2.4. Model assessment
   2.5. Real data examples

3. Classification
   3.1. Two class classification
   3.2. Evaluation of classification models
   3.3. Interpretability. Support Vector Machines vs Random Forest
   3.4. Multiclass classification approaches
   3.5. Unsupervised classification and Clustering
   3.6. Real data examples

4. Association studies
   4.1. Introduction
   4.2. Frequent Itemset Mining
   4.3. Association Rules
   4.4. Application areas and Hands-on

5. Data mining in free-form texts: text mining
   5.1. Why is text mining needed?
   5.2. What is natural language processing?
   5.3. Words, syntax and semantics
   5.4. Information retrieval
   5.5. Information extraction
   5.6. Text categorization
   5.7. Corpora
   5.8. Applications

Practical demonstration: Biomedical text mining online tools, open source data mining applications, R, Weka, MATLAB
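As a flavour of sections 4.2 and 4.3, here is a deliberately small Python sketch of frequent-itemset mining and rule confidence. The market baskets are invented, and the level-wise search omits Apriori's subset-pruning step for brevity:

```python
# Hypothetical market-basket data: each row is one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "beer", "eggs"},
    {"milk", "beer", "cola"},
    {"bread", "milk", "beer"},
    {"bread", "milk", "cola"},
]

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def apriori(min_sup):
    """Level-wise search for all itemsets with support >= min_sup."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    frequent = list(level)
    while level:
        # grow candidates by one item (simplified: no subset pruning)
        candidates = {a | b for a in level for b in level if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= min_sup]
        frequent += level
    return frequent

def confidence(lhs, rhs):
    """Confidence of the association rule lhs -> rhs."""
    return support(lhs | rhs) / support(lhs)
```

With `min_sup = 0.6` the frequent itemsets are {bread}, {milk}, {beer} and {bread, milk}, and the rule bread → milk has confidence 0.75.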


• Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management. Michael J. A. Berry, Gordon S. Linoff. Wiley. Second Edition.
• Data Mining: Concepts and Techniques (The Morgan Kaufmann Series in Data Management Systems). Jiawei Han, Micheline Kamber. Academic Press.
• Data Mining: Practical machine learning tools and techniques with java implementations. IH Witten, E Frank Morgan Kaufmann, San Francisco, 2005
• The Elements of Statistical Learning. Hastie, Tibshirani and Friedman - Springer-Verlag, 2008
• Bart Goethals: Survey on Frequent Pattern Mining. 
• R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 207-216, Washington D.C., May 1993.
• Natural language processing for online applications: text retrieval, extraction and categorization. Peter Jackson and Isabelle Moulinier. John Benjamins Publishing Company. 2nd revised edition. 2002.
• Text mining for Biology and Biomedicine. Sophia Ananiadou and John McNaught. Artech House Publisher. 2005.
• A Survey of Current Work in Biomedical Text Mining. Aaron M. Cohen and William R. Hersh. Briefings in Bioinformatics. 6(1):57-71. 2005.


Course 7

Time series analysis


1. Introduction:
    1.1. Areas of applications
    1.2. Objectives of time series analysis
    1.3. Components of time series
    1.4. Descriptive analysis
    1.5. Distributional properties:  independence, autocorrelation, stationarity.
    1.6. Detection and removal of outliers

2. Trend and seasonal component analysis
   2.1. Linear and non-linear regression
   2.2. Polynomial fitting
   2.3. Cubic spline fitting
   2.4. Fourier representation of a sequence
   2.5. Spectral representation of stationary processes
   2.6. Detrending and filtering

3. Probability models for time series:
   3.1. Random walk
   3.2. Autoregressive model (AR)
   3.3. Moving Average model (MA)
   3.4. Mixed models (ARMA, ARIMA, FARIMA, SARIMA, Box-Jenkins, ARMAX)
   3.5. System identification and model families
   3.6. Generalized Autoregressive Conditional Heteroscedasticity (GARCH)
   3.7. Parameter estimation
   3.8. Order selection
   3.9. Model checking

4. Forecasting and Data mining
   4.1. Optimal forecasts
   4.2. Forecasts for ARMA models
   4.3. Analysis of the prediction error
   4.4. State-space modelling
   4.5. Mining of seasonal trends
   4.6. Frequently occurring patterns
   4.7. Connections between different time-series

Practical demonstration: R
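The practical sessions use R, but the AR model of section 3.2 and the parameter estimation of section 3.7 can be sketched in Python/NumPy. The true coefficient below is an arbitrary choice for the simulation:

```python
import numpy as np

# Simulate an AR(1) process x_t = phi * x_{t-1} + eps_t with Gaussian noise.
rng = np.random.default_rng(42)
phi_true = 0.7
n = 5000
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()

# Conditional least squares: regress x_t on x_{t-1}.
phi_hat = float((x[1:] @ x[:-1]) / (x[:-1] @ x[:-1]))
```

With 5000 observations the estimate lands close to 0.7. The same regression idea extends to higher-order AR(p) models, while MA terms require the iterative estimation methods discussed in the course.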



The Analysis of Time Series: An Introduction, by Chris Chatfield. Chapman & Hall/CRC; 6th edition (2003)
Time series analysis, by James D. Hamilton.  Princeton University Press (1994)
Handbook of Time Series Analysis, Signal Processing, and Dynamics  by D. S.G. Pollock, Richard C. Green, Truong Nguyen. Academic Press (1999)


Attendees are expected to be familiar with the concepts of correlation, regression, and inference.

Readings before coming

Attendees will benefit more from the course if they read the following before coming:


Course 8

Neural networks


1. Introduction to the biological models. Nomenclature.
   1.1. The biological model
   1.2. Artificial neural networks
   1.3. Notation
   1.4. The neural model
   1.5. Architecture of neural networks
   1.6. Learning mechanisms
   1.7. First examples and geometrical representation.

2. Perceptron networks
   2.1.  Perceptron architecture
   2.2. Decision contour
   2.3. Learning rules
   2.4. Classification examples

3. The Hebb rule.
   3.1. Linear associator and the Hebb rule
   3.2. Pseudoinverse rule

4. Foundations of multivariate optimization. Numerical optimization
   4.1 Mathematical foundations
   4.2. Function optimization: minimization
   4.3. Gradient algorithms
      4.3.1. Gradient method
      4.3.2. Conjugate gradient method

5. The Widrow-Hoff rule
   5.1. Mathematical foundations
   5.2. Widrow-Hoff algorithm
   5.3. Examples

6. Backpropagation algorithm
   6.1. Backpropagation
   6.2. Backpropagation algorithm
   6.3. Examples

7. Practical data modelling with neural networks

Practical demonstration: MATLAB Neural network toolbox
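The practical demonstrations use the MATLAB Neural Network Toolbox; as a language-neutral complement, this plain-Python sketch implements the perceptron learning rule of section 2.3 on the (linearly separable) AND problem:

```python
def predict(w, b, x):
    """Hard-threshold neuron: fire if the weighted sum plus bias is positive."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(data, epochs=20, lr=1.0):
    """Perceptron rule: nudge weights by (target - output) times the input."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = y - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# The AND gate is linearly separable, so the perceptron converges on it.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train(data)
```

By the perceptron convergence theorem the rule finds a separating line in finitely many updates; for non-separable problems one needs the multilayer networks and backpropagation of section 6.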


Neural Networks: A Comprehensive Foundation by Simon Haykin. Prentice Hall; 2nd edition (1998)


Course 9

Introduction to SPSS


1. Introduction
   1.1. Menu structure
   1.2. Getting help
   1.3. Basic operations with .sav data: read, save, Data Editor
   1.4. Running an analysis
   1.5. Using the Viewer
   1.6. Data transformations: functions, random number generators, recoding
   1.7. File handling: sort cases and variables, merging and split files, select cases, restructuring data

2. Describing data
   2.1. Frequencies: statistics, charts
   2.2. Descriptive: options
   2.3. Explore: statistics, plots
   2.4. Interactive plots
   2.5. Contingency tables

3. Statistical inference
   3.1. Confidence intervals with Explore and error bars
   3.2. Compare means: independent samples and paired samples T test
   3.3. Test whether the mean differs from a constant: One sample T test
   3.4. Nonparametric tests: chi-square, binomial, runs, Kolmogorov-Smirnov, two-independent-samples, two-related-samples, several independent samples, several related samples

4. Time series
   4.1. Time series creation and transformations

5. Sampling
   5.1. Complex Samples option

6. Classification and regression
   6.1. Discriminant analysis, logistic regression and decision trees
   6.2. Variable selection methods in regression
   6.3. Plots, statistics, options and Curve Estimation

Practical demonstration: SPSS


J. Pallant (2007) “SPSS Survival Manual: A Step by Step Guide to Data Analysis Using SPSS for Windows (Version 15)”. Open University Press.
SPSS 16.0 Student Version for Windows (CD). SPSS Inc.
Tutorials and manuals provided by SPSS.


Assumed familiarity with the basics of probability and inference

Reading before coming

Students will benefit from taking a look at the following videos:


Course 10

Regression



1. Introduction  
   1.1. A brief historical framework
   1.2. Some applications and examples of Regression Analysis
   1.3. Specification of a Regression Model
   1.4. Organization of the Regression Analysis

2. Simple Linear Regression Model
   2.1. Introduction and examples
   2.2. Data graphs
   2.3. Specification of a Linear Regression Model
   2.4. Parameter estimation
   2.5. Inference on the model parameters
     2.5.1. Hypothesis testing
     2.5.2. Confidence intervals
   2.6. Prediction of new observations
   2.7. Coefficient of Determination
   2.8. Correlation
   2.9. Examples

3. Measures of model adequacy
   3.1. Residual analysis
   3.2. Outliers
   3.3. Transformations
     3.3.1. To a straight line
     3.3.2. To stabilize variance
   3.4. Methods to select a transformation

4. Multiple Linear Regression
   4.1. Introduction and examples
   4.2. Interpretation of regression coefficients
   4.3. Parameter estimation
     4.3.1. Ordinary Least Squares estimation
     4.3.2. Geometrical interpretation
   4.4. Inadequacy of some data graphs in Multiple Linear Regression
   4.5. Inference on the model parameters
     4.5.1. Hypothesis testing: significance of regression; general linear hypothesis
     4.5.2. Confidence intervals
     4.5.3. Simultaneous inference
   4.6. Prediction of new observations
   4.7. Standardized regression coefficients
   4.8. Multiple correlation coefficient

5. Regression Diagnostics and model violations
   5.1. Some types of residuals
   5.2. Residuals plot
   5.3. Linearity assumption
   5.4. Normality assumption
   5.5. Influence diagnostics
   5.6. Multicollinearity
   5.7. Additional predictors
   5.8. Heteroskedasticity and autocorrelation

6. Polynomial regression
   6.1. Introduction
   6.2. Polynomial model in one variable
   6.3. Polynomial model in more than one variable

7. Variable selection
   7.1. Consequences of model misspecification
   7.2. Evaluation of subset regression models
   7.3. All possible regressions
   7.4. Stepwise regression

8. Indicator variables as regressors
   8.1. Use of indicator variables
   8.2. Qualitative data
   8.3. Interaction
   8.4. Indicator response variables

9. Logistic regression
   9.1. Introduction and examples
   9.2. Parameter estimation: maximum likelihood and nonlinear least squares
   9.3. Measures of model adequacy
   9.4. Inference
   9.5. Regressor selection

10. Nonlinear Regression
   10.1. Model specification
   10.2. Iterative estimation: nonlinear least squares
   10.3. Linear approximation and normal approximation
   10.4. Inference

Practical demonstration: SPSS
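The practical sessions use SPSS, but the closed-form least-squares estimates of chapter 2 fit in a few lines of Python. The toy data are invented to lie near the line y = 2x:

```python
def ols(x, y):
    """Simple linear regression: slope, intercept and R^2 in closed form."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope: Sxy / Sxx
    b0 = ym - b1 * xm                   # intercept through the mean point
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - ym) ** 2 for yi in y)
    return b0, b1, 1 - ss_res / ss_tot  # R^2: coefficient of determination

x = [1, 2, 3, 4, 5]
y = [2.1, 4.1, 5.9, 8.1, 9.8]
b0, b1, r2 = ols(x, y)
```

Here the slope comes out as 1.94 with a coefficient of determination (section 2.7) close to 1; the residuals feed directly into the adequacy checks of chapter 3.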


Ryan, T. P. (1997) Modern Regression Methods. New York: Wiley
Draper, N. R. and Smith, H. (1998) Applied Regression Analysis. Third edition. New York: Wiley
Greene, W. H. (2007) Econometric Analysis. Prentice Hall
Seber, G. A. F. and Lee, A. J. (2003) Linear Regression Analysis. Second edition. New Jersey: Wiley
Montgomery, D. C. and Peck, E. A. (1992) Introduction to Linear Regression Analysis. Second edition. New York: Wiley
Chatterjee, S., Hadi, A. S. and Price, B. (2000) Regression Analysis by Example. Third edition. New York: Wiley
Goldberger, A. S. (1998) Introductory Econometrics. Harvard University Press


Attendees are expected to be familiar with basic concepts of Probability and Statistical Inference: probability, random variables, discrete and continuous distributions, the normal distribution, random samples, maximum likelihood estimators, confidence intervals, hypothesis testing…

Readings before coming

Attendees will benefit more from the course if they read the following before coming:


Course 11

Practical statistical questions


1. I would like to know the intuitive definition and use of …: The basics
  1.1. Descriptive vs inferential statistics
  1.2. Statistic vs parameter. What is a sampling distribution?
  1.3. Types of variables
  1.4. Parametric vs non-parametric statistics
  1.5. What to measure? Central tendency, differences, variability, skewness and kurtosis, association
  1.6. Use and abuse of the normal distribution
  1.7. Is my data really independent?

2. How do I collect the data? Experimental design
  2.1. Methodology
  2.2. Design types
  2.3. Basics of experimental design
  2.4. Some designs: Randomized Complete Blocks, Balanced Incomplete Blocks, Latin squares, Graeco-Latin squares, Full 2^k factorial, Fractional 2^(k-p) factorial
  2.5. What is a covariate?

3. Now I have data, how do I extract information? Parameter estimation
  3.1. How to estimate a parameter of a distribution?
  3.2. How to report on a parameter of a distribution? What are confidence intervals?
  3.3. What if my data is “contaminated”? Robust statistics

4. Can I see any interesting association between two variables, two populations, …?
  4.1. What are the different measures available?
  4.2. Use and abuse of the correlation coefficient
  4.3. How can I use models and regression to improve my measure?

5. How can I know if what I see is “true”? Hypothesis testing
  5.1. The basics
    5.1.a. What is a hypothesis test?
    5.1.b. What is the statistical power?
    5.1.c. What is a p-value? How to use it?
    5.1.d. What is the effect size?
    5.1.e. What is the relationship between sample size, sampling error, effect size and power?
    5.1.f. What are the assumptions of hypothesis testing?
  5.2. How to select the appropriate statistical test
    5.2.a. Tests about a population central tendency
    5.2.b. Tests about a population variability
    5.2.c. Tests about a population parameter
    5.2.d. Tests about differences between populations
    5.2.e. Tests about the ordering of data
    5.2.f. Tests about distributions
    5.2.g. Tests about correlation/association measures
  5.3. Multiple testing
  5.4. Permutation tests
  5.5. Words of caution

6. How many samples do I need for my test?: Sample size
  6.1. Basic formulas for different distributions
  6.2. Formulas for samples with different costs
  6.3. What if I cannot get more samples? Resampling: Bootstrapping, jackknife

7. Can I deduce a model for my data?
  7.1. What kind of models are available?
  7.2. How to select the appropriate model?
  7.3. Analysis of Variance as a model
    7.3.a. What is ANOVA?
    7.3.b. What is ANCOVA?
    7.3.c. How do I use them with the pretest and the posttest designs?
    7.3.d. What are planned and post-hoc comparisons?
    7.3.e. What are fixed effects and random effects?
    7.3.f. When should I use Multivariate ANOVA (MANOVA)?
  7.4. Regression as a model
    7.4.a. What are the assumptions of regression?
    7.4.b. Are there other kinds of regression?
    7.4.c. How reliable are the coefficients? Confidence intervals
    7.4.d. How reliable are the coefficients? Validation
    7.4.e. When should I use nonlinear regression?

Practical sessions: case studies from different fields (economics, biology, engineering, computer science, ...)



David J. Sheskin. Handbook of Parametric and Nonparametric Statistical Procedures. Chapman & Hall/CRC (2007)
R. R. Newton, K. E. Rudestam. Your Statistical Consultant: Answers to Your Data Analysis Questions. Sage Publications, Inc (1999)
G. van Belle. Statistical Rules of Thumb. Wiley-Interscience (2002)
P. I. Good, J. W. Hardin. Common Errors in Statistics (and How to Avoid Them). Wiley-Interscience (2006)


Attendees are expected to be familiar with the basics of probability, experimental design, hypothesis testing, regression and ANOVA.

Readings before coming

Attendees will benefit more from the course if they read the following before attending:


Course 12

Missing data and outliers


1. Missing Data
   1.1. Introduction
   1.2. Typology of missing data
      1.2.1. Missing completely at random (MCAR)
      1.2.2. Missing at random (MAR)
      1.2.3. Non-ignorable missingness: dependence on unobserved predictors or on the missing value itself
   1.3. Simple missing-data methods
      1.3.1. Missing-data methods that discard data
      1.3.2. Missing-data methods that retain all the data
   1.4. Imputation Methods
      1.4.1. Single imputation: regression imputation, hot-deck imputation
      1.4.2. Multiple imputation: iterative regression imputation, likelihood-based approach (EM algorithm), Markov Chain Monte Carlo (MCMC)
   1.5. Diagnostics and Overimputing.
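The simplest of the imputation methods in 1.4, regression imputation, can be sketched in a few lines (illustrative Python, not the course's R software; `regression_impute` is our name, and missing values are encoded as `None`):

```python
def regression_impute(x, y):
    """Fill missing y values (None) with least-squares predictions from x."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mx = sum(xi for xi, _ in obs) / n
    my = sum(yi for _, yi in obs) / n
    # ordinary least-squares slope and intercept on the observed pairs
    sxy = sum((xi - mx) * (yi - my) for xi, yi in obs)
    sxx = sum((xi - mx) ** 2 for xi, _ in obs)
    b = sxy / sxx
    a = my - b * mx
    return [yi if yi is not None else a + b * xi for xi, yi in zip(x, y)]

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, None, 8.1, 9.9]   # one missing observation
y_complete = regression_impute(x, y)
```

Single imputation like this understates the uncertainty of the filled-in values, which is precisely what the multiple-imputation methods of 1.4.2 address.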

2. Outliers and Robust Statistics
   2.1. Introduction
   2.2. Typology of outliers
      2.2.1. Additive Outliers
      2.2.2. Level Shifts
      2.2.3. Innovational Outliers
   2.3. Influence measures
   2.4. Robust methods
      2.4.1. M-estimates of location and scale
      2.4.2. Robust inference
      2.4.3. Huber estimators for regression
      2.4.4. Some robust techniques for multivariate analysis and time series
      2.4.5. Software based on R.
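The M-estimates of location in 2.4.1 can be illustrated with a Huber estimate computed by iterative reweighting around the median, with a MAD-based scale (Python sketch for self-containment, though the course software is R; `huber_location` and the constants are our choices):

```python
def huber_location(data, k=1.345, tol=1e-8):
    """Huber M-estimate of location via iteratively reweighted means."""
    data = sorted(data)
    n = len(data)
    med = data[n // 2] if n % 2 else (data[n // 2 - 1] + data[n // 2]) / 2
    # robust scale: median absolute deviation, rescaled for the normal
    dev = sorted(abs(v - med) for v in data)
    mad = dev[n // 2] if n % 2 else (dev[n // 2 - 1] + dev[n // 2]) / 2
    scale = 1.4826 * mad
    mu = med
    for _ in range(100):                      # IRLS iterations
        # points more than k scale units away get downweighted
        w = [1.0 if abs(v - mu) <= k * scale else k * scale / abs(v - mu)
             for v in data]
        new_mu = sum(wi * vi for wi, vi in zip(w, data)) / sum(w)
        if abs(new_mu - mu) < tol:
            break
        mu = new_mu
    return new_mu

clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]
est = huber_location(clean + [100.0])         # one gross outlier
```

Unlike the sample mean, which the single outlier drags to about 20, the M-estimate stays close to 10.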

Practical demonstration: R


Course 13

Hidden Markov Models


1. Introduction
   1.1. Introduction to Hidden Markov Models
   1.2. Hidden Markov Models definition
   1.3. Application of HMMs to speech recognition
   1.4. Overview of quantization

2. Discrete Hidden Markov Models
   2.1. Presentation of Discrete HMMs: model description
   2.2. HMM simple examples

3. Basic algorithms for Hidden Markov Models.
   3.1. Forward-backward
   3.2. Viterbi decoding
   3.3. Baum-Welch reestimation
   3.4. Practical issues for the implementation
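The Viterbi decoding of 3.2 can be sketched for a discrete HMM (illustrative Python rather than the HTK toolkit used in the demonstration; the weather/activity example is a standard textbook toy problem):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM."""
    # delta[s]: probability of the best path ending in state s
    delta = {s: start_p[s] * emit_p[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev = delta
        delta, ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda r: prev[r] * trans_p[r][s])
            delta[s] = prev[best_prev] * trans_p[best_prev][s] * emit_p[s][o]
            ptr[s] = best_prev
        back.append(ptr)
    # backtrack from the best final state
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}
path = viterbi(("walk", "shop", "clean"), states, start_p, trans_p, emit_p)
```

A practical implementation (3.4) would work in log probabilities to avoid underflow on long observation sequences.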

4. Semicontinuous Hidden Markov Models
   4.1. Overview, advantages and disadvantages
   4.2. Formulae modification in the basic algorithms

5. Continuous Hidden Markov Models.
   5.1. Overview, advantages and disadvantages
   5.2. Formulae modification in the basic algorithms
   5.3. Multi-Gaussian modeling

6. Unit selection and clustering
   6.1. Considerations for unit selection in Hidden Markov Models
   6.2. Parameter sharing
   6.3. Unit clustering in HMMs

7. Speaker and Environment Adaptation for HMMs
   7.1. Adaptation modes
   7.2. Maximum Likelihood Linear Regression (MLLR)
   7.3. Maximum a Posteriori (MAP)
   7.4. Rapid adaptation

8. Other applications of HMMs
   8.1. HMMs for alignment and gene finding in DNA
   8.2. HMMs for handwritten word recognition
   8.3. HMMs for lip reading
   8.4. HMMs for stroke recognition in tennis videos
   8.5. HMMs for electricity market modelling

Practical demonstration: An open-source solution for HMM modeling: The HTK toolkit from Cambridge University.


Hidden Markov Models for Speech Recognition. X. D. Huang, Y. Ariki, M. A. Jack. Edinburgh University Press, 1990.
Spoken Language Processing. Huang, X., Acero, A., Hon, H.W. Ed. Prentice Hall, New Jersey, 2001.


Attendees should be familiar with the basics of pattern recognition, the multivariate Gaussian distribution and dynamic programming.

Readings before coming

Attendees will benefit more from the course if they read the following before coming:


Course 14

Statistical inference


1. Introduction
   1.1. The general problem of Statistical inference
   1.2. Deduction vs induction
   1.3. Statistics and Probability
   1.4. Estimation
   1.5. Hypothesis Testing
   1.6. Decision Theory
   1.7. Examples

2. Some basic statistical tests
   2.1. Cross tabulation
   2.2. Chi Square test
   2.3. Nominal data cross tabulation tests
   2.4. Ordinal data cross tabulation tests
   2.5. Nominal by scale test
   2.6. Concordance measures
   2.7. t-test for comparing means: paired and independent samples
   2.8. Non-parametric versions
   2.9. One-way ANOVA and its non-parametric version
   2.10. Comparing variances of two samples, the F distribution
   2.11. Correlations and partial correlations
   2.12. Regression and non-linear regression
   2.13. Kolmogorov-Smirnov test
   2.14. Runs test
   2.15. Randomized tests

3. Multiple testing
   3.1. The Šidák correction
   3.2. The Bonferroni correction
   3.3. Holm's step-wise correction
   3.4. The False Discovery Rate
   3.5. Permutation correction
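Two of the corrections in section 3 fit in a few lines each (illustrative Python, though the course demonstration uses SPSS; function names are ours):

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when m * p_i <= alpha (m = number of tests)."""
    m = len(pvals)
    return [p * m <= alpha for p in pvals]

def holm(pvals, alpha=0.05):
    """Holm's step-down procedure: uniformly more powerful than Bonferroni."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvals[i] * (m - rank) > alpha:
            break                      # stop at the first non-rejection
        reject[i] = True
    return reject

pvals = [0.001, 0.011, 0.02, 0.04, 0.3]
rej_bonf = bonferroni(pvals)
rej_holm = holm(pvals)
```

On this example Holm rejects the second hypothesis (0.011 x 4 = 0.044 <= 0.05) where Bonferroni does not (0.011 x 5 = 0.055 > 0.05).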

4. Introduction to bootstrapping
   4.1. Parametric bootstrapping
   4.2. Nonparametric bootstrapping

Practical demonstration: SPSS


Essentials of Statistical Inference, G. A. Young and R. L. Smith. Cambridge University Press
Statistical Inference, S.D. Silvey. Chapman & Hall Monographs on Statistics and Applied Probability, 7
Statistical Inference, Adelchi Azzalini. Chapman & Hall Monographs on Statistics and Applied Probability, 68
Applied Linear Statistical Models, Neter et al. McGraw-Hill


The student is assumed to be familiar with the basics of probability, random variables and probability distributions (binomial, Poisson, normal, Student's t, chi-square and F), and with the concepts of random sampling and estimators.

Readings before coming

Students will benefit more from the course if they read the following before attending:


Course 15

Feature subset selection


1. Introduction

2. Filter approaches
   2.1. Introduction
   2.2. Univariate filters: parametric (t-test, ANOVA, Bayesian, regression) and model-free (Wilcoxon rank-sum, BSS/WSS, rank products, random permutations, TNoM)
   2.3. Multivariate filters: bivariate, CFS, MRMR, USC, Markov blanket
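The simplest univariate filter of 2.2, ranking features by a two-sample t statistic, can be sketched as follows (illustrative Python; the course exercises use Weka, MATLAB and R, and the function names here are ours):

```python
def t_score(a, b):
    """Absolute two-sample t statistic (pooled variance) for one feature."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((v - ma) ** 2 for v in a) / (na - 1)
    vb = sum((v - mb) ** 2 for v in b) / (nb - 1)
    sp = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return abs(ma - mb) / (sp * (1 / na + 1 / nb)) ** 0.5

def univariate_filter(X, labels, k):
    """Keep the k features with the largest |t| between the two classes."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, c in zip(X, labels) if c == 0]
        b = [row[j] for row, c in zip(X, labels) if c == 1]
        scores.append((t_score(a, b), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])

# feature 0 separates the classes; feature 1 is pure noise
X = [[1.0, 5.0], [1.2, 4.0], [0.9, 6.0],
     [3.0, 5.5], [3.2, 4.5], [2.9, 5.2]]
labels = [0, 0, 0, 1, 1, 1]
selected = univariate_filter(X, labels, k=1)
```

Because each feature is scored in isolation, such filters are fast but blind to the feature interactions that the multivariate filters of 2.3 try to capture.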

3. Wrapper methods: sequential search, genetic algorithms, EDAs

4. Embedded methods: random forest, weight vector of SVM, weights of logistic regression, regularization

5. Practical exercises: Weka, Spider (MATLAB), SVM and Kernel Methods (MATLAB), SAM (R), GALGO (R), EDGE (R)

Practical demonstration: MATLAB, Weka, R


H. Liu, H. Motoda (2008). Computational Methods of Feature Selection. Chapman and Hall/CRC

H. Liu, H. Motoda (1998). Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers.

Y. Saeys, I. Inza, P. Larrañaga (2007). A review of feature selection techniques in bioinformatics. Bioinformatics, 23(19), 2507-2517.


Course 16

Introduction to R


1. Introduction
2. An introductory R session
   2.1. The R Environment
   2.2. Getting help
   2.3. R Packages
   2.4. Editors for R scripts
   2.5. Simple manipulations and basic commands
3. Data in R
   3.1. Objects in R
   3.2. Basic operators
   3.3. Data access and manipulation
4. Importing/Exporting data
5. Programming in R
   5.1. Loops and conditionals
   5.2. Writing R functions
   5.3. Debugging
6. R Graphics
   6.1. Types of graphs
   6.2. The plot function
   6.3. Editing graphs
   6.4. Trellis graphics
7. Statistical Functions in R
8. R session 1: Feature selection
9. R session 2: Clustering Analysis
10. R session 3: Survival Analysis


• Jason Owen: “The R Guide”
• W. N. Venables and D. M. Smith: An Introduction to R
• Brian S. Everitt and Torsten Hothorn: A Handbook of Statistical Analyses Using R
• Tutorials and manuals provided by R users at the R web site


Course 17

Unsupervised pattern recognition (Clustering)


1. Introduction
   1.1. Problem formulation
   1.2. Types of features
   1.3. Feature extraction
   1.4. Graphical examination
   1.5. Data quality
   1.6. Distance measures
   1.7. Preprocessing
   1.8. Data reduction
   1.9. Types of clustering

2. Prototype-based clustering
   2.1. K-means: problem formulation
   2.2. K-means: suboptimal solution
   2.3. K-means: initialization
   2.4. K-means: limitations
   2.5. ISODATA
   2.6. Fuzzy K-means
   2.7. Mixture models (EM algorithm)
   2.8. Self-Organizing Maps
   2.9. Extensions
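The K-means formulation of 2.1-2.4 can be illustrated with Lloyd's algorithm on 2-D points (Python sketch rather than the MATLAB used in the demonstration; names and the toy data are ours):

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for K-means on 2-D points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p[0] - centers[c][0]) ** 2
                                            + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        # update step: move each center to its cluster mean
        new_centers = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)]
        if new_centers == centers:
            break                      # converged to a local optimum
        centers = new_centers
    return centers, clusters

# two well-separated blobs
pts = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
       (5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
centers, clusters = kmeans(pts, k=2)
```

The result is only a local optimum of the within-cluster sum of squares, which is why initialization (2.3) matters in practice.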

3. Density-based clustering
   3.1. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
   3.2. Grid-clustering
   3.3. DENCLUE (density-based clustering)
   3.4. More algorithms

4. Graph-based clustering
   4.1. Hierarchical clustering: introduction
   4.2. Hierarchical clustering: locally optimal algorithm
   4.3. Hierarchical clustering: linking comparison
   4.4. Chameleon
   4.5. Hybrid graph-density based clustering: SNN-DBSCAN
   4.6. Extensions

5. Cluster evaluation
   5.1. Clustering tendency
   5.2. Unsupervised cluster evaluation
   5.3. Supervised cluster evaluation
   5.4. Determining the number of clusters

6. Miscellanea
   6.1. Categorical clustering
   6.2. Conceptual clustering
   6.3. Subspace clustering
   6.4. Information theoretic clustering
   6.5. Ensemble/Consensus clustering
   6.6. Semisupervised clustering
   6.7. Clustering with obstacles
   6.8. Biclustering, coclustering, two-mode clustering
   6.9. Turning a supervised classification algorithm into a clustering algorithm

Practical demonstration: MATLAB


Data Mining: Concepts and Techniques by Jiawei Han, Micheline Kamber. Morgan Kaufmann (2000)

Principles of Data Mining by David Hand, Heikki Mannila, Padhraic Smyth. MIT Press (2001)


Course 18

Evolutionary computation


1. Introduction
   1.1. Genetic algorithms
   1.2. Genetic programming
   1.3. Robust intelligent systems
   1.4. Self-adapting intelligent systems
   1.5. Sub-symbolic knowledge representation: artificial neural networks
   1.6. Symbolic knowledge representation: rule and fuzzy-rule based systems

2. Genetic algorithms
   2.1. How do they work?
   2.2. Main features
   2.3. Problem codification
   2.4. Convergence
   2.5. Reproduction operators
   2.6. Mutation operators
   2.7. Crossover operators for finite alphabets
   2.8. Crossover operators for real-number codification methods
   2.9. Individual replacement
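The operators of 2.5-2.9 fit together in a minimal generational GA, sketched here on the one-max toy problem (illustrative Python rather than the MATLAB/GeLi library of the demonstration; all names and parameter values are our choices):

```python
import random

def run_ga(fitness, n_bits, pop_size=30, n_gen=60, p_mut=0.02, seed=0):
    """Minimal generational GA: binary coding, tournament selection,
    one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]

    def tournament():
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(n_gen):
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n_bits)                 # one-point crossover
            nxt.append([1 - g if rng.random() < p_mut else g  # bit-flip mutation
                        for g in p1[:cut] + p2[cut:]])
        pop = nxt
    return max(pop, key=fitness)

# one-max: maximize the number of ones in the chromosome
best_ga = run_ga(fitness=sum, n_bits=20)
```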

3. Genetic programming
   3.1. Grammar-guided genetic programming
   3.2. Initialization methods
   3.3. How initialization method influences the evolution process
   3.4. Classic crossover operators
   3.5. The grammar-based crossover operator
   3.6. Mutation operators

4. Robust and self-adapting intelligent systems
   4.1. Fitness function
   4.2. Binary direct codification method of neural architectures
   4.3. Grammar codification method
   4.4. Basic neural architectures codification method
   4.5. Training neural networks with genetic algorithms
   4.6. Designing neural architectures with genetic programming
   4.7. Rule and fuzzy rule based systems
   4.8. Real-world applications

5. Introduction to Estimation of Distribution Algorithms
   5.1. Probabilistic modelling in optimization
   5.2. Components of an EDA
   5.3. EDAs based on univariate probabilistic models
   5.4. EDAs based on multivariate probabilistic models
   5.5. EDAs for problems with discrete representation
   5.6. EDAs for problems with continuous and mixed representation
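The univariate EDAs of 5.3 replace crossover and mutation with a learned probabilistic model; UMDA is the simplest instance and can be sketched as follows (illustrative Python; names, the clipping constants and the one-max problem are our choices):

```python
import random

def run_umda(fitness, n_bits, pop_size=60, n_sel=20, n_gen=40, seed=0):
    """Univariate Marginal Distribution Algorithm: sample from independent
    Bernoulli marginals, select the best, re-estimate the marginals."""
    rng = random.Random(seed)
    prob = [0.5] * n_bits                      # initial uniform model
    best = None
    for _ in range(n_gen):
        pop = [[1 if rng.random() < p else 0 for p in prob]
               for _ in range(pop_size)]
        elite = sorted(pop, key=fitness, reverse=True)[:n_sel]
        if best is None or fitness(elite[0]) > fitness(best):
            best = elite[0]
        # marginal frequency of each bit in the elite, clipped away from 0/1
        prob = [min(0.95, max(0.05, sum(ind[j] for ind in elite) / n_sel))
                for j in range(n_bits)]
    return best

best_umda = run_umda(fitness=sum, n_bits=20)
```

The multivariate EDAs of 5.4 generalize exactly this loop by learning dependencies between variables instead of independent marginals.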

6. Improvements, extensions and applications of EDAs
   6.1. Parallel EDAs
   6.2. Fitness partial evaluation using probabilistic models
   6.3. Multi-objective optimization using EDAs
   6.4. Hybrid EDAs
   6.5. Applications in Bioinformatics
   6.6. Applications in Robotics
   6.7. Applications in Engineering and Scheduling

7. Current research in EDAs
   7.1. Information extraction from probabilistic models learned by EDAs
   7.2. EDA approaches to highly complicated, constrained, mixed and other difficult problems
   7.3. Addition of prior knowledge about the problem domain into the search

Practical Demonstration: MATLAB and GeLi library


  • A.E. Eiben, J.E. Smith. Introduction to Evolutionary Computing. Springer-Verlag Berlin Heidelberg New York (2003)
  • M. Mitchell. An Introduction to Genetic Algorithms. MIT Press (1998)
  • W.B. Langdon, R. Poli. Foundations of Genetic Programming. Springer-Verlag Berlin Heidelberg New York (2002).
  • P. Larrañaga, J. A. Lozano (Eds.). Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Genetic Algorithms and Evolutionary Computation, Vol. 2 (2001).


Attendees are expected to be familiar with neural networks, the structure of knowledge-based systems and basic statistical concepts, and to have attended the course on Bayesian networks.

Readings before coming

Attendees will benefit more from the course if they read the following before attending:

Daniel Manrique and Juan Ríos: Artificial Neural Networks.
Martin Pelikan, David E. Goldberg, Fernando G. Lobo: A Survey of Optimization by Building and Using Probabilistic Models. Computational Optimization and Applications, 21(1):5-20 (January 2002).