69703 - Statistical Analysis of Data in Nuclear and Subnuclear Physics

Academic Year 2017/2018

  • Modules: Maximiliano Sioli (Module 1), Tommaso Chiarusi (Module 2), Gabriele Sirri (Module 3)
  • Teaching Mode: Traditional lectures (Module 1), Traditional lectures (Module 2), Traditional lectures (Module 3)
  • Campus: Bologna
  • Course: Second cycle degree programme (LM) in Physics (code 8025)

Learning outcomes

By the end of the course, students will know the main statistical tools used in high energy physics, in both accelerator and non-accelerator experiments. The course is complemented by exercise and laboratory sessions.

Course contents

The course is structured in three modules: theory, exercises, and a hands-on laboratory.

Detailed program:

Concept of probability: axiomatic, combinatorial, frequentist and subjective. Conditional probability. Statistical independence. Bayes' theorem.

Random variables and probability density functions. Multivariate distributions. Marginal and conditional densities. Functions of random variables. Distribution moments: expectation value, variance, covariance. Error propagation in the presence of correlated variables. Examples of probability distributions: Binomial, Multinomial, Poisson, Exponential, Normal (multivariate), Chi-square, Breit-Wigner, Landau.

Characteristic functions and their applications. Central Limit Theorem.

Statistical inference. Fisher information. Test statistics and sufficient test statistics.

Monte Carlo method: convergence criteria, law of large numbers, calculation of integrals and their uncertainties. Variance reduction. Random number generators. Sampling a generic distribution.
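As an illustration of the "calculation of integrals and their uncertainties" item, here is a minimal sketch of plain Monte Carlo integration. It is not taken from the course material: the function name `mc_integrate`, the choice of integrand (sin x on [0, π], whose exact integral is 2), the sample size and the seed are all assumptions made for the example. The statistical uncertainty is estimated from the sample variance and falls as 1/sqrt(n).

```python
import math
import random


def mc_integrate(f, a, b, n, seed=12345):
    """Estimate the integral of f over [a, b] by plain Monte Carlo sampling.

    Hypothetical helper for illustration only.
    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        y = f(rng.uniform(a, b))  # evaluate f at a uniformly sampled point
        total += y
        total_sq += y * y
    mean = total / n
    # Unbiased sample variance of the sampled f values
    var = (total_sq / n - mean * mean) * n / (n - 1)
    estimate = (b - a) * mean
    std_error = (b - a) * math.sqrt(var / n)
    return estimate, std_error


estimate, std_error = mc_integrate(math.sin, 0.0, math.pi, 100_000)
print(f"integral = {estimate:.4f} +/- {std_error:.4f}")
```

The printed estimate should be compatible with the exact value 2 within a few standard errors. Variance reduction techniques, listed in the same item, aim to shrink `std_error` for a fixed number of samples.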

Generalities on statistical estimators. Test statistics and estimators. Estimators for the expectation value, variance and correlation. Variance of the estimators. The maximum likelihood method. Score and Fisher information. Multi-parametric estimator uncertainties with correlations. Extended Maximum Likelihood. Bayesian estimators, Jeffreys priors. Least squares method.

Hypothesis testing. Simple hypotheses. Efficiency and power of the test. Neyman-Pearson lemma. Linear test, Fisher's discriminant. Multivariate methods: Neural Networks, Boosted Decision Trees, k-Nearest Neighbors. Statistical significance. P-values. Look-Elsewhere Effect. Chi-square method for hypothesis testing.

Exact methods for the construction of confidence intervals. Gauss and Poisson case. Unified approach. Bayesian method. CLs method. Systematic errors and nuisance parameters in the calculation of confidence intervals. Frequentist and Bayesian methods. Asymptotic properties.

Lab: Elements of C++ and ROOT. RooFit Workspace, Factory, composite models, multi-dimensional models. Use of RooStats to compute confidence intervals: Profile Likelihood, Feldman-Cousins, Bayesian intervals, with and without nuisance parameters. Use of TMVA as a classifier, description of TMVAGui.

Readings/Bibliography

Basic texts:
  • Glen Cowan, Statistical Data Analysis, Oxford Univ. Press, 1998
  • Frederick James, Statistical Methods in Experimental Physics, World Scientific, 2007

In-depth texts:

  • O. Behnke et al., Data Analysis in High Energy Physics: A Practical Guide to Statistical Methods, Wiley, 2013
  • A. G. Frodesen, O. Skjeggestad, H. Toft, Probability and Statistics in Particle Physics, Universitetsforlaget, 1979

Bayesian statistics:

  • G. D'Agostini, Bayesian reasoning in data analysis - A critical introduction, World Scientific Publishing, 2003

Monte Carlo method:

  • A. Rotondi et al., Probabilità, Statistica e Simulazione, Springer, 2012

Teaching methods

Lectures (blackboard and slides), exercises (blackboard) and laboratory sessions in which statistical tools are used to solve practical problems.

Assessment methods

Oral examination. Admission to the examination requires completion of the laboratory tests, although these do not affect the final grade. Dates must be arranged with the teachers (email maximiliano.sioli@unibo.it). Master's students will be examined on the whole course, while Ph.D. students will be asked to prepare a short seminar (30 min) about their activity, focusing on the statistical treatment of the data.

Teaching tools

Lecture notes are available on AMS Campus.

Office hours

See the website of Maximiliano Sioli

See the website of Tommaso Chiarusi

See the website of Gabriele Sirri