We have a lot of data. How can we understand it? How can we extract meaningful insight from it? Around the globe, researchers in academia and industry are using Machine Learning to answer these questions. Artificial Intelligence tools provide big advantages to the scientists who use them. This workshop (also known as “The Machine Atelier”) aims to bring us up to date with the current state-of-the-art computational and mathematical tools that are useful in Molecular Biology, Physics, and other sciences.

The challenge for contemporary science is moving from data production to knowledge discovery. Ideas such as Data Mining and Big Data are popular today, but their origins go back more than 30 years, to the early development of Artificial Intelligence and Neural Networks. Today computers are several orders of magnitude more powerful, data is abundant, and collecting and storing it is cheap and easy.

Although these techniques can be used in a wide variety of subjects, we will focus on scientific applications, mainly in Molecular Biology and Physics.

Work plan

We meet every Monday at 17:00 in the MBG II room, at the Science Faculty. Bring your computer. We will use R and Python, and discuss the mathematics behind the tools.

The idea is to follow the book “Hands-On Machine Learning with Scikit-Learn and TensorFlow” by Aurélien Géron, together with some exercises in R. We will also discuss some of the underlying mathematical theory, so that we really understand the tools.

You can preview the book on Google Books. I will keep a small library on the bookshelf in my office, so anybody can consult the books. They include books about Python and computing techniques, some math books about Pattern Recognition and Classification, and some titles on the ethical and philosophical implications of Machine Learning.

Topics

This is the initial plan. We may add more topics later, inspired by the textbook's table of contents.

Unsupervised classification, also known as Clustering or Pattern recognition
- K-means
  - distance
- Self-Organizing Maps (Kohonen networks)
- Hierarchical clustering
  - average, single, and complete linkage
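
To make the first clustering topic concrete, here is a minimal sketch of K-means with Euclidean distance. It is only an illustration, not the workshop's reference implementation: in the sessions we expect to rely on Scikit-learn, while this version uses only the Python standard library so that the two steps (assign, then update) stay visible. The function names `euclidean` and `kmeans`, the toy points, and the parameters `n_iter` and `seed` are assumptions made for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, n_iter=100, seed=0):
    """Basic K-means (Lloyd's algorithm). Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: euclidean(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        # Stop when the assignment no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Which group is called 0 and which is called 1 depends on the random start, so only the grouping itself is meaningful. Swapping `euclidean` for another distance function is a simple way to explore how the choice of distance affects the clusters.
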
Supervised classification, also known as Machine Learning or Pattern classification
- K-nearest neighbors
- Bayesian classification
- Linear models
  - Probit
  - Logit
- Classification and Regression Trees
- Support Vector Machines
- Neural Networks
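
As an illustration of the first supervised method in the list, the following is a minimal sketch of a K-nearest neighbors classifier. It assumes Euclidean distance and a simple majority vote, and it uses only the Python standard library (Python 3.8 or later for `math.dist`). The function name `knn_predict`, the toy points, and the labels "low" and "high" are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Sort the training examples by Euclidean distance to the query point.
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Keep the labels of the k closest examples and take the most common one.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled training data: two groups of 2-D points.
train_points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
                (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["low", "low", "low", "high", "high", "high"]

prediction_near_low = knn_predict(train_points, train_labels, (1.1, 0.9))
prediction_near_high = knn_predict(train_points, train_labels, (4.9, 5.0))
print(prediction_near_low, prediction_near_high)
```

Unlike the clustering methods above, this classifier needs labeled examples. There is no training step: all the work happens at prediction time, and the main choices are the value of `k` and the distance function.
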

Tools

We will use publicly available machine learning libraries for Python, including:

- Scikit-learn: general purpose machine learning library
- Keras: deep learning Python library
- TensorFlow: numerical computation library for deep learning

Learning Machine Learning

With applications in Molecular Biology, Physics, and other Sciences
