A network of specialised and complementary platforms

The FRANCE GENOMIQUE infrastructure brings together the majority of sequencing and/or bioinformatics platforms in France:

  • the national platforms of Genoscope and CNRGH in Evry, which provide sequencing, genotyping and bioinformatics capacity for very large-scale projects,
  • "local" sequencing platforms, each with its own specific expertise and technologies as well as ad hoc bioinformatics tools,
  • bioinformatics platforms, whether or not associated with a sequencing platform, which are either dedicated to processing sequencing data (primary or secondary analyses) or also specialised in developing innovative data processing tools. Most of the FRANCE GENOMIQUE bioinformatics platforms are also partners of the IFB (Institut Français de Bio-Informatique),
  • the platforms associated with FRANCE GENOMIQUE,
  • the CEA's TGCC, where data storage and processing space is allocated to FRANCE GENOMIQUE partners.
France Génomique platforms
  • National sequencing platforms
  • Local sequencing platforms
  • Bioinformatics platforms
  • Associated platforms
  • TGCC: Très Grand Centre de Calcul

Very high throughput and 3rd generation sequencing instruments

FRANCE GENOMIQUE is equipped with the most efficient and innovative sequencing technologies. The fleet is evolving very rapidly and includes very high-throughput instruments (NovaSeq 6000) and so-called 3rd generation sequencing instruments (single-molecule technology).

FRANCE GENOMIQUE is thus able to respond to all requests for sequencing projects.

A high-performance computing infrastructure

The CEA “Très Grand Centre de Calcul” (TGCC) is an infrastructure dedicated to high-performance computing, capable of hosting petaflop-scale supercomputers and designed on the basis of a data-driven architecture. Within the TGCC, the CCRT has an extension dedicated to the users of the FRANCE GENOMIQUE project.

This e-infrastructure for data storage and processing, implemented by the CEA/DIF teams, gives FRANCE GENOMIQUE users access to several petabytes of medium-term storage (sized for scientific projects lasting several years), connected to several thousand scalar computing cores by a high-performance interconnect. Because the volumes of data to be stored and processed are increasing exponentially, it is also designed to be scalable, with the aim of meeting the future challenges of genomics.

Equipment and capacity

The setup dedicated to FRANCE GENOMIQUE comprises:

  • 180 dual-processor nodes (Intel Sandy Bridge E5-2680, 2.7 GHz, 8 cores) with 128 GB of memory per node, i.e. a total of 2,880 cores (Bull),
  • 2 Bullx S6410 very large memory systems with 2 TB of memory,
  • 9 hybrid blades equipped with NVIDIA Kepler GPUs.
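
The core count quoted above follows directly from the node description. The sketch below reproduces the arithmetic, assuming that "dual processor" means two 8-core sockets per node; the variable names are illustrative, and the aggregate memory figure is derived here rather than stated in the source.

```python
# Figures taken from the equipment list above; names are illustrative only.
NODES = 180
SOCKETS_PER_NODE = 2       # assumption: "dual processor" = two sockets
CORES_PER_SOCKET = 8       # Intel Sandy Bridge E5-2680
MEMORY_PER_NODE_GB = 128

total_cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
# Derived value, not quoted in the source; uses 1 TB = 1024 GB.
total_memory_tb = NODES * MEMORY_PER_NODE_GB / 1024

print(f"Total cores: {total_cores}")
print(f"Aggregate node memory: {total_memory_tb:.1f} TB")
```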

Data hosting is achieved through the following storage configuration:

  • Medium-term storage with a global file system of 5 PB, including 2 PB on disk (hierarchical storage system Lustre + IBM HPSS),
  • Archive system for preliminary data.

Expertise and support

The CEA/DIF teams have developed internationally recognized expertise both in managing very large volumes of data (contributions to open-source developments, management of EOFS, etc.) and in designing and running very large data centres. Assistance and support teams are on hand to help users make the most of the centre's resources.

A dedicated application support team is provided by the national platforms (CEA) on behalf of FRANCE GENOMIQUE.

Main achievements

To characterize a set of 83 protein families of unknown function comprising some 60,000 sequences, Genoscope researchers ran a modelling campaign on the CCRT Titane supercomputer. This phase, which would have required 280,000 hours of computation on a single processor, was completed in only 70 hours on 4,000 processors. From the results, the researchers created a catalogue of specific structural signatures for each of the families studied. This catalogue will provide biochemists with valuable information for discovering new enzymatic activities.
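
The figures in this paragraph are mutually consistent if the workload is assumed to scale perfectly across processors. The sketch below makes that assumption explicit; real campaigns rarely achieve ideal scaling, so this is an illustration of the arithmetic and not a performance model.

```python
# Figures quoted in the text.
sequential_hours = 280_000
processors = 4_000

# Assumption: ideal (linear) scaling, with no parallel overhead.
wall_clock_hours = sequential_hours / processors

# The same workload expressed in years of continuous single-processor time.
sequential_years = sequential_hours / (24 * 365)

print(f"Wall-clock time on {processors} processors: {wall_clock_hours:.0f} hours")
print(f"Equivalent single-processor time: {sequential_years:.1f} years")
```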

Genoscope has been using the TGCC/CCRT computing resources for several years, particularly through the DARI calls for projects. In this context, the TARA OCEANS project has benefited from more than 3.5 million hours of computation to study the diversity of marine organisms. To do this, various sequence analysis tools have been ported to this infrastructure: BLAST, BLAT, InterProScan and CDDsearch. Specific code has been designed and deployed to adapt these tools to the technical constraints of operating on the CCRT machines (massive data parallelization, execution control, error recovery, short unit jobs).
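
The general pattern described here (splitting the data into short unit jobs and recovering from errors) can be sketched as follows. This is not the code deployed at the CCRT: the function names are hypothetical, and `sequence_lengths` is a stand-in for a real tool such as BLAST.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence identifier, sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA-formatted text into (identifier, sequence) pairs."""
    records: List[Record] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_jobs(records: List[Record], batch_size: int) -> Iterator[List[Record]]:
    """Data parallelization: cut the input into small, independent batches."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_with_retry(job: Callable, batch: List[Record], max_attempts: int = 3):
    """Error recovery: rerun a failed batch a limited number of times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return job(batch)
        except Exception as error:  # broad on purpose in this sketch
            last_error = error
    raise RuntimeError(f"batch failed after {max_attempts} attempts") from last_error


def sequence_lengths(batch: List[Record]) -> Dict[str, int]:
    """Stand-in for a real analysis tool; it only measures sequence lengths."""
    return {name: len(seq) for name, seq in batch}


fasta = ">seq1\nACGT\n>seq2\nACGTAC\nGT\n>seq3\nAC\n"
results: Dict[str, int] = {}
for batch in split_into_jobs(parse_fasta(fasta), batch_size=2):
    results.update(run_with_retry(sequence_lengths, batch))
print(results)
```

On a cluster, each batch would typically be submitted as a separate scheduler job instead of being run in a loop; the loop is used here only to keep the example self-contained.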

For further information

Website: www-hpc.cea.fr/, www-ccrt.cea.fr

Platform manager: Pierre Leca

CEA DAM Île-de-France
Bruyères-le-Châtel
91297 Arpajon Cedex

Contact: e-infrastructure@france-genomique.org

Illumina offers high and very high throughput sequencing.

After clonal amplification of short DNA fragments, sequencing by synthesis (SBS) begins: each base emits a unique fluorescence signal when added to the strand being synthesized. The detection of the signal at each incorporation determines the DNA sequence.
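
The principle of reading one base per incorporation can be illustrated with a toy base caller. The sketch assumes one fluorescence channel per base and made-up intensity values. Real base callers also correct for effects such as signal cross-talk and phasing, which are ignored here.

```python
# Toy model: one fluorescence channel per base (an assumption of this sketch).
CHANNELS = ("A", "C", "G", "T")


def call_bases(intensities):
    """Return the base with the strongest signal at each sequencing cycle."""
    sequence = []
    for cycle in intensities:
        strongest = max(range(len(CHANNELS)), key=lambda i: cycle[i])
        sequence.append(CHANNELS[strongest])
    return "".join(sequence)


# Hypothetical intensities for four cycles, in (A, C, G, T) order.
cycles = [
    (0.91, 0.03, 0.04, 0.02),
    (0.05, 0.02, 0.88, 0.05),
    (0.02, 0.04, 0.03, 0.91),
    (0.04, 0.90, 0.03, 0.03),
]
read = call_bases(cycles)
print(read)
```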

MiniSeq

  • Run time: 20 hours
  • Maximum output: 7.5 Gb
  • Maximum reads per run: 44-55 million
  • Maximum read length: 2 × 150 bp
  • Platform equipped: MGX

MiSeq

  • Run time: 4-55 hours
  • Maximum output: 13.2-15 Gb
  • Maximum reads per run: 40-50 million
  • Maximum read length: 2 × 300 bp
  • Platforms equipped: Institut Curie, Genotoul-Get, CNRGH

NextSeq

  • Run time: 29 hours
  • Maximum output: 100-120 Gb
  • Maximum reads per run: up to 800 million
  • Maximum read length: 2 × 150 bp
  • Platforms equipped: CNRGH, TGML, UCAGenomiX, IBENS

HiSeq 2500

  • Run time: < 1-3.5 days
  • Maximum output: 250-300 Gb
  • Maximum reads per run: 4 billion
  • Maximum read length: 2 × 125 bp
  • Platforms equipped: Institut Curie, CNRGH, MGX

HiSeq 3000

  • Run time: < 1-3.5 days
  • Maximum output: 1300-1500 Gb
  • Maximum reads per run: 5 billion
  • Maximum read length: 2 × 150 bp
  • Platforms equipped: Genotoul-Get, GenomEast

HiSeq 4000

  • Run time: < 1-3.5 days
  • Maximum output: 1300-1500 Gb
  • Maximum reads per run: 5 billion
  • Maximum read length: 2 × 150 bp
  • Platforms equipped: Genotoul-Get, GenomEast

HiSeqX

  • Run time: < 3 days
  • Maximum output: 1.6-1.8 Tb
  • Maximum reads per run: 5.3-6 billion
  • Maximum read length: 2 × 150 bp
  • Platform equipped: CNRGH

NovaSeq

  • Run time:
  • Maximum output: 4800-6000 Gb
  • Maximum reads per run: 32-40 billion
  • Maximum read length: 2 × 250 bp
  • Platforms equipped: Institut Curie, Genotoul-Get, CNRGH
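
The output, read count and read length quoted for an instrument are related by a simple product. The sketch below checks this for the NextSeq and MiSeq upper bounds, assuming that the read count includes both mates of a pair. That is an assumption: the specifications above do not say how reads are counted, and not every instrument listed matches under it.

```python
def output_gb(reads: int, read_length_bp: int) -> float:
    """Approximate output in gigabases: number of reads x bases per read."""
    return reads * read_length_bp / 1e9


# Upper bounds quoted above; each mate of a pair is counted as one read
# (an assumption of this sketch).
nextseq_gb = output_gb(800_000_000, 150)
miseq_gb = output_gb(50_000_000, 300)

print(f"NextSeq: {nextseq_gb:.0f} Gb")
print(f"MiSeq:   {miseq_gb:.0f} Gb")
```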

Pacific Biosciences offers, through the Sequel system, real-time long-read sequencing of single molecules without amplification. The technique is called SMRT (Single Molecule, Real-Time) sequencing.

Sequel

SMRT Cell specifications:

  • Run time: 1 day
  • Maximum output: 20 Gb
  • Average read length: up to 30 kb
  • Single-molecule reads: up to 500,000
  • Platform equipped: Institut Pasteur

Sequel II

SMRT Cell specifications:

  • Run time: 30 hours
  • Maximum output: 500 Gb
  • Average read length: up to 80-120 kb
  • Single-molecule reads: up to 4,000,000, with 99.9% accuracy on average
  • Platforms equipped: GeT-PlaGe, Gentyane, Institut Curie

Oxford Nanopore Technologies offers a technology for real-time DNA and RNA sequencing without synthesis or amplification, in which sequencing is carried out through a nanopore subjected to an electric field.

The ionic current differs according to which base (A, T, G or C) is blocking the nanopore. The sequence is identified by measuring how the ionic current passing through the nanopore changes over time.
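
As a deliberately simplified illustration of this principle, the sketch below assigns each measured current value to the base with the nearest reference level. The reference levels are invented for the example. In practice the current depends on several neighbouring bases at once and base calling relies on statistical models, so this is only a toy.

```python
# Hypothetical mean current per base, in picoamperes (invented values).
REFERENCE_LEVELS = {"A": 50.0, "C": 65.0, "G": 80.0, "T": 95.0}


def decode_events(currents):
    """Assign each current measurement to the base with the closest level."""
    bases = []
    for current in currents:
        nearest = min(
            REFERENCE_LEVELS,
            key=lambda base: abs(REFERENCE_LEVELS[base] - current),
        )
        bases.append(nearest)
    return "".join(bases)


# One hypothetical measurement per base passing through the pore.
signal = [51.2, 93.8, 79.1, 66.4, 48.7]
decoded = decode_events(signal)
print(decoded)
```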

MinION

  • Maximum output: 10-30 Gb per flow cell
  • Maximum read length: kilobases to hundreds of kilobases
  • Platforms equipped: Genotoul-Get, CNRGH, IBENS, MGX, Genoscope, Institut Pasteur

GridION

  • Maximum output: 30 Gb per flow cell; 150 Gb for 5 flow cells
  • Maximum read length: kilobases to hundreds of kilobases
  • Platform equipped: Genotoul-Get

PromethION

  • Maximum output: 158 Gb per flow cell; 7.6 Tb for 48 flow cells
  • Maximum read length: kilobases to hundreds of kilobases
  • Platforms equipped: Genotoul-Get, Genoscope, UCAGenomiX, CNRGH
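
The multi-flow-cell totals quoted for GridION and PromethION are the per-flow-cell output multiplied by the number of flow cells run in parallel. A minimal check of that arithmetic, assuming every flow cell reaches its maximum output (the function name is illustrative):

```python
def total_output_gb(per_flow_cell_gb: float, flow_cells: int) -> float:
    """Total output when every flow cell reaches its maximum yield."""
    return per_flow_cell_gb * flow_cells


gridion_gb = total_output_gb(30, 5)
promethion_gb = total_output_gb(158, 48)
promethion_tb = promethion_gb / 1000  # 1 Tb = 1000 Gb

print(f"GridION:    {gridion_gb} Gb")
print(f"PromethION: {promethion_gb} Gb, i.e. about {promethion_tb:.1f} Tb")
```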

The company 10x Genomics has developed an instrument, Chromium, that partially overcomes the limitations of short-read sequencing by synthesis (SBS).

This system uses an emulsion PCR method. The purpose of the emulsion is to encapsulate, in a droplet containing all the necessary reagents, either a few high-molecular-weight DNA molecules (for "synthetic long-read sequencing") or a single cell (for "single-cell sequencing").

This method allows short reads (Illumina) to be assembled using a unique barcoding system, which facilitates phasing analysis and the characterization of chromosome structures. It also makes it possible to study the transcriptome of several thousand individual cells in parallel by RNA-seq.

After preparing the libraries, the sequencing is performed on Illumina machines.
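
The role of the barcode can be sketched as follows: reads that carry the same barcode came from the same droplet, so grouping by barcode recovers which reads belong to the same DNA molecules or the same cell. The barcode length and read layout below are hypothetical and chosen for readability; they are not the 10x Genomics format.

```python
from collections import defaultdict

BARCODE_LENGTH = 4  # hypothetical; chosen for readability only


def group_by_barcode(reads):
    """Group read inserts by the barcode found at the start of each read."""
    groups = defaultdict(list)
    for read in reads:
        barcode, insert = read[:BARCODE_LENGTH], read[BARCODE_LENGTH:]
        groups[barcode].append(insert)
    return dict(groups)


# Invented reads: the first four bases are the droplet barcode.
reads = ["AAAATTGACC", "CCCCGGATTA", "AAAACCGTAA", "CCCCTTTGCA", "AAAAGGCATT"]
groups = group_by_barcode(reads)
for barcode, inserts in groups.items():
    print(barcode, inserts)
```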

Chromium

  • Platforms equipped: CNRGH, UCAGenomiX, Institut Curie, Genotoul-Get, TGML, ENS, MGX