The ever-growing use of high-throughput technologies in biology is revolutionizing the life sciences and profoundly changing their practices. Scripting languages are used daily in life science labs to mine the huge data sets produced by high-throughput devices. This two-week course gives participants basic knowledge of Python and of state-of-the-art machine learning methods so that they can analyze their own data sets.
Description:
This course is intended for PhD students, engineers and research scientists who wish to acquire skills in scientific programming. Throughout the course, we will use the Python language to lead participants from the basics of computer programming to more advanced topics such as practical machine learning.
The course is divided into three main topics. We first introduce students to the basics of programming using the Python language. Then, we discuss different data formats and data wrangling techniques using state-of-the-art scientific packages (e.g., the pandas library). Finally, we tour machine learning algorithms using the scikit-learn package.
Syllabus
Chapter 1: Python basics (10 hours)
- General introduction
- Programming (iteration, conditions)
- Python data structures and types
- Functional programming
- Overview of OOP (object-oriented programming) in Python
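To give a flavour of the Chapter 1 topics, here is a minimal sketch that touches iteration, conditions, data structures, functional style and a small class. The toy sequences and every name in it (`sequences`, `gc_content`, `Sequence`) are hypothetical and chosen only for illustration; they are not taken from the course material.

```python
from dataclasses import dataclass

# Hypothetical toy data: sequence identifiers mapped to DNA strings.
sequences = {
    "seq1": "ATGCGC",
    "seq2": "ATATAT",
    "seq3": "GGGCCC",
}


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA string."""
    if not seq:  # condition: guard against empty input
        return 0.0
    gc = 0
    for base in seq:  # iteration over the characters of a string
        if base in "GC":
            gc += 1
    return gc / len(seq)


# Data structures: a dict comprehension builds a new mapping.
gc_by_id = {name: gc_content(seq) for name, seq in sequences.items()}

# Functional style: filter keeps the GC-rich identifiers, sorted orders them.
gc_rich = sorted(filter(lambda name: gc_by_id[name] > 0.5, gc_by_id))


@dataclass
class Sequence:
    """Minimal class bundling a sequence with a method that operates on it."""

    name: str
    bases: str

    def gc_content(self) -> float:
        # Delegates to the module-level function defined above.
        return gc_content(self.bases)


records = [Sequence(name, seq) for name, seq in sequences.items()]
```

The same computation appears three times on purpose (loop, comprehension, method) so that the styles can be compared side by side.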
Chapter 2: Data handling (20 hours)
- Introduction to Data Modelling
- Fast and efficient DataFrames in Python (with pandas)
- Data visualization in Python (with matplotlib)
Chapter 3: Machine learning with scikit-learn (20 hours)
- Supervised classification (decision trees, random forests, linear approaches, SVM)
- Regression (Ordinary Least Squares Regression, Elastic Net, Lasso)
- Neural networks and deep learning
- Clustering (k-means, hierarchical clustering)
- Model selection (cross-validation, bootstrap, AIC, BIC)
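As an illustration of how the Chapter 3 topics fit together, the sketch below compares two of the listed classifiers using 5-fold cross-validation in scikit-learn. The data set is synthetic (generated with `make_classification`) and stands in for a real one; the sample counts, hyperparameters and the `models` and `scores` names are assumptions made for this example only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real data set (an assumption for illustration).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=5, random_state=0
)

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# 5-fold cross-validation: each model is fitted on four folds and scored
# (accuracy by default for classifiers) on the held-out fold, five times.
scores = {
    name: cross_val_score(model, X, y, cv=5) for name, model in models.items()
}

for name, fold_scores in scores.items():
    print(f"{name}: mean accuracy = {fold_scores.mean():.3f}")
```

Comparing mean held-out scores in this way is one simple form of model selection; the bootstrap and information criteria such as AIC and BIC address the same question with different assumptions.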