This course establishes a foundation in applied statistics and data science for those interested in pursuing data-driven research. Examples may come from any area of science, but the course places special emphasis on modern biological problems and data sets. Topics may include data wrangling, data exploration and visualization, statistical programming, reproducible data analysis, likelihood-based inference, Bayesian inference, the bootstrap, the EM algorithm, regularization, statistical modeling, principal components analysis, latent variable modeling, multiple hypothesis testing, and causal inference. The statistical programming language R is used extensively to explore methods and analyze data.
SML 201 is an introduction to the burgeoning field of data science, which is primarily concerned with data-driven discovery and with using data as a tool for research and technology development. We cover approaches and techniques for obtaining, organizing, exploring, and analyzing data, as well as for creating tools based on data. Elements of statistics, machine learning, and statistical computing form the basis of the course content. We consider applications in the natural sciences, social sciences, and engineering. Note: I no longer teach this course, but I will continue to make the materials available.