Projects open

Use new sequencing technologies for faithful alignments and improved detection of differentially expressed genes

Status: open

Thesis: Master

Field: Genomics, NGS

Advisor: N. Strieder  

Courses Required: Transcriptomics with RNA-Seq or Sequencing, Practical Bioinformatics II

Objective:

Transcriptome data from a new sequencing platform, PacBio, are now available for the human cell lines MCF7, hESC H1 and GM12878. These data comprise full-length sequences of nearly all transcripts expressed in these cells. In current analysis workflows, Illumina RNA-Seq reads are mapped to the genome and analysed with TopHat and Cufflinks, which predict transcript isoforms from canonical splice sites and thereby produce large numbers of false isoforms. This happens especially for long genes with many exons and for genes containing intronic transposable elements.

A more powerful approach would be to use the PacBio information to build transcript graph models. Using an EM algorithm, short Illumina RNA-Seq reads could then be distributed among the different transcripts according to the abundance of the fragments found. A method similar to StringTie's maximum-flow algorithm could be used here.

Consequently, quantification of transcripts should be markedly improved.
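The EM-based distribution of short reads described above can be illustrated with a minimal sketch. It is not the method to be developed in the thesis, only a simplified textbook-style EM under stated assumptions: each read is already known to be compatible with a set of transcripts, a read is equally likely to originate from any position of a transcript, and the function name `em_abundances`, the transcript names and the read counts are all hypothetical.

```python
from collections import defaultdict


def em_abundances(compat, lengths, n_iter=500, tol=1e-9):
    """Estimate the fraction of reads originating from each transcript.

    compat  : list of lists; compat[i] holds the ids of the transcripts
              that read i is compatible with (hypothetical input format)
    lengths : dict mapping transcript id -> effective length
    """
    transcripts = sorted(lengths)
    # Start from a uniform guess.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}

    for _ in range(n_iter):
        counts = defaultdict(float)

        # E-step: split each read among its compatible transcripts in
        # proportion to current abundance divided by transcript length.
        for hits in compat:
            weights = [theta[t] / lengths[t] for t in hits]
            total = sum(weights)
            if total == 0.0:
                continue
            for t, w in zip(hits, weights):
                counts[t] += w / total

        # M-step: the new abundances are the normalised expected counts.
        assigned = sum(counts.values())
        if assigned == 0.0:
            return theta
        new_theta = {t: counts[t] / assigned for t in transcripts}

        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break

    return theta


# Toy example (made-up numbers): two isoforms of equal length,
# 30 reads unique to A, 10 unique to B, 60 compatible with both.
reads = [["A"]] * 30 + [["B"]] * 10 + [["A", "B"]] * 60
lengths = {"A": 1000, "B": 1000}
theta = em_abundances(reads, lengths)
print(theta)
```

In this toy case the ambiguous reads end up being split in the same 3:1 ratio as the uniquely assigned ones, so the estimates approach 0.75 for A and 0.25 for B. A real implementation would additionally need fragment-length and bias models, which are omitted here.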

 

Therefore, we want to compare a traditional RNA-Seq analysis method based on TopHat/Cufflinks with the new approach, which exploits PacBio transcripts for read alignment. Based on these data we want to quantify the different transcripts from the Illumina RNA-Seq data and search for differentially expressed transcripts and isoforms.

In this way we can measure the information gain that this more exact method provides for transcript isoform quantification.

 

Data: publicly available RNA-Seq data from MCF7 cells, PMID: 26153859 and/or PMID: 24319002

 

First Steps: Analyse the data with standard alignment, TopHat and limma-voom

 

Questions/Tasks: Design a new mapping strategy based on PacBio reads, potentially using StringTie. Quantify isoforms.

Measure the information gain by the new method.

 

Start Reading:

Exploiting single-molecule transcript sequencing for eukaryotic gene prediction. Minoche et al. (2015), PMID: 26328666

 

StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Pertea et al. Nature Biotechnology (2015), PMID: 25690850
