BigData Workshop

100 hours / 12 ECTS


High-throughput sequencing data analysis

High-throughput sequencing has already revolutionized modern biology and is spreading to an ever wider range of disciplines (medicine, environment, ecology ...).


Biologists, ecologists and physicians now need to acquire the basics of handling this kind of big data.


The Big Data workshop is for biologists and physicians who have never worked with this type of data. Through this workshop, students will gain the basics of manipulation, processing, and statistical testing needed to independently analyze high-throughput sequencing data.


Applications in applied sciences: large-scale genotyping of populations (23andMe, MyHeritage, forensic science ...), personalized medicine (tumor genotyping, characterization of genetic diseases), environment and adaptation, etc.


Applications in basic sciences: genomics (DNA-seq), transcriptomics (RNA-seq), epigenomics (ChIP-seq, ATAC-seq ...), 3D genome analysis (4C-seq, Hi-C) ...


Thanks to technological advances in cell isolation and new sequencing methods, large-scale single-cell analysis is now emerging. This new revolution brings the methodologies of population genetics to the fore and requires specific statistical analyses.

BigData workshop

Currently, the fields of genomics, epigenomics and evolution face very large data-processing demands as well as new analytical techniques.

The objective of this 3-week practical course is to familiarize students with the processing and exploitation of this type of data. First, the most commonly used tools will be presented.

Then, the dataset on which the students will work during the workshop will be presented in a lecture. This lecture will set out the general context of the analysis, the methods used to process the data, and the biological questions asked. The second part of the workshop will be devoted to processing these data. During this phase, students will receive individual support from our team of bioinformaticians and biostatisticians.

Finally, the last part of the Big Data workshop will be reserved for the statistical analysis of the data, in particular the choice of the most relevant statistical tests. In parallel, lectures will be scheduled on the information that can be drawn from this type of data (study of polymorphism, genome function, epigenomic analysis ...). In addition, time will be set aside for group discussion in the form of forums (on methods, interpretations, the potential of these approaches ...).

Description of the course

The Big Data workshop is intended for biology and medicine students who want to acquire skills in NGS data manipulation, processing and statistical analysis.

This training is for beginners; no prior computing experience is required.

By the end of this course, students will be able to:

  • Describe the experimental techniques used to perform ChIP-seq experiments

  • Manipulate high-throughput sequencing files. Choose, set parameters for, and execute software packages for data analysis. Perform sequence alignments, filtering, normalization and quality control.

  • Master the different steps of the differential analysis of RNA-seq data to identify differentially expressed genes.

  • Analyze ChIP-seq data and perform peak calling

  • Automate and chain bioinformatics tools to create workflows

  • Choose the appropriate statistical model and R package to analyze and correlate the data sets, according to their structure

  • Perform clustering of the data (hierarchical clustering, PCA ...)

  • Analyze and interpret the experimental results, and formulate conclusions or hypotheses from these data. Discuss biases, limitations and errors

  • Choose the appropriate graphs, and draw figures to visualize high-throughput data

The results of ChIP-seq will be analyzed during the BigData workshop

The Big Data Team

  • Antoine BRANCA

  • Christine DILLMANN

  • Pierre GROGNET

  • Judith LEGRAND

  • Gaëlle LELANDAIS

  • Benoit MOINDROT

The organizers

  • Sébastien BLOYER

  • Pierre CAPY

  • Cécile FAIRHEAD

Figure: Big Data workshop schedule, 2019
