Target Audience: PhD students, postdoctoral fellows, research staff and PIs working in any field
related to microbiology that requires understanding or predicting gene function.
Summary: The workshop presents methods that allow experimental scientists without computer
programming skills to make efficient use of the genomic and post-genomic data freely available
on the web, with practical examples taken mainly from the field of microbial metabolism and
regulation. The main focus of the workshop is on predicting gene function. Workshop attendees
can come with a specific problem that can be solved by comparative genomic methods, or with a
protein family of interest. A detailed schedule is provided below.
Instructors
Valérie de Crécy-Lagard, U. of Florida
Stéphane Descorps-Declere, Institut Pasteur
Alexandra Calteau, Genoscope
David Roche, Genoscope
Rémi Zallot, U. of Illinois
TO SIGN UP: https://c3bi.pasteur.fr/?p=10025
Detailed Schedule
Day 1:
Module 1: Basic bioinformatics tools to gather and manipulate data. This module brings everyone up to
date on the basic tools that will be used routinely in the course.
● Extracting genes, genomes, proteins, families (NCBI, Uniprot, Patric, IMG, Microscope)
● Complex queries (NCBI, Uniprot, Microscope)
● Sequence similarity tools: advanced BLAST, multiple alignments and sequence logos
● Reformatting, ID mapping
● A small set of command-line and regular-expression tools to help deal with large data sets
Day 2:
Module 2: Linking genes to pathways and pathways to genes. This module will focus on pathway databases,
metabolic reconstruction and models, and how mapping a gene to a pathway, or more generally to a biological
system, can ground-truth a functional annotation.
● Issues with automatic functional annotations
● Pathway reconstruction and identification of pathway holes (KEGG, BioCyc, IMG, Patric,
Microscope)
Module 3: Non-homology-based association methods to find candidates for globally and locally missing
genes
● Tools to identify gene fusions and synteny/physical clustering
● Tools to look at phylogenetic distribution patterns and run phylogenetic distribution queries
● Graphics tools to visualize and represent physical clusters
Module 4: Whole genome comparisons
● Pathogenicity islands, pangenomes, secondary metabolite clusters
Day 3:
Module 5: Regulation-based associations. This module focuses on identifying regulatory sites, predicting
regulatory networks, mining transcriptome data, and generating heatmaps and Venn diagrams. The different
pipelines available for transcriptomic data will also be presented.
Module 6: Associations based on experimental data. This module focuses on mining other types of
experimental data: phenotype/fitness, protein interactions and complexes, localization, and metabolomics.
Day 4:
Module 7: Extracting and comparing structures using PDB and NCBI resources.
Module 8: Paralogs, a blessing and a curse. This module will focus on the tools and strategies used to
disambiguate paralog families. This includes basic phylogenetic tree building, paralog separation tools,
building and comparing logos, and iTOL.
Day 5:
Module 9: Building Sequence Similarity Networks to analyze protein families.
Using the Enzyme Function Initiative platform to build Sequence Similarity Networks and Gene Neighborhood
Networks (https://enzymefunction.org/)
Module 10: Putting it together with student examples. During the five days and on the last day of class,
students will have the opportunity to work on a protein family of their choice with the help of the instructors.
*The order and content of the modules may still be slightly modified.