
Projects closed

Predicting proteomes from transcriptomes

Praktikum (Bachelor/Master)

Status: selected

Field: Genomics, Machine Learning

Advisors: Spang, Rehberg

Courses Required: Practical Bioinformatics I

Objective: To make a protein, a cell needs an mRNA. However, gene expression (mRNA quantities) and protein expression (protein quantities) correlate only weakly. Translation is regulated, and so are RNA processing and the degradation of both RNA and protein. All of these processes affect mRNA and protein quantities and weaken the correlation between gene and protein expression. The regulating processes, in turn, might leave their own traces in the expression of other genes. Here we aim to predict the expression of each protein not only from the expression of its coding gene, but from the expression of all genes measured on a microarray.

Data: The International Cancer Genome Consortium (ICGC) provides paired gene and protein expression profiles for thousands of tumor samples.
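A natural baseline, before any multi-gene model, is the correlation of each protein with its own coding mRNA across the samples that appear in both tables. The sketch below is a minimal illustration under stated assumptions: both tables are NumPy arrays with samples in rows, sample identifiers already match across the two tables, and the protein-to-gene mapping (`coding_col`) is supplied by you. The function name is hypothetical and not part of any ICGC tool.

```python
import numpy as np


def coding_mrna_correlation(mrna, mrna_samples, protein, protein_samples, coding_col):
    """Pearson correlation of each protein with its own coding mRNA.

    mrna: (n_mrna_samples, n_genes); protein: (n_protein_samples, n_proteins).
    coding_col[k] is the (assumed, user-supplied) column in `mrna` holding the
    gene that codes for protein k. Only samples present in both tables are used.
    Returns (shared_sample_ids, r) with one correlation per protein.
    """
    m_pos = {s: i for i, s in enumerate(mrna_samples)}
    p_pos = {s: i for i, s in enumerate(protein_samples)}
    shared = sorted(set(m_pos) & set(p_pos))  # pair samples by ID, not by position
    M = np.asarray(mrna, float)[np.ix_([m_pos[s] for s in shared], coding_col)]
    P = np.asarray(protein, float)[[p_pos[s] for s in shared], :]
    Mc, Pc = M - M.mean(axis=0), P - P.mean(axis=0)  # center each column
    r = (Mc * Pc).sum(axis=0) / np.sqrt((Mc ** 2).sum(axis=0) * (Pc ** 2).sum(axis=0))
    return shared, r
```

Pairing by sample ID rather than by row position guards against the two tables listing samples in different orders or covering slightly different sample sets.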

First Steps:

  • Become familiar with the ICGC data sets, choose a tumor type, and download the corresponding data set to work with.
  • Become familiar with Least Angle Regression (LARS, a sparse regression method for high-dimensional data) and the R package lars.
  • For every protein in your data, learn a regression model that predicts the abundance of this protein from the expression of a small set of genes that includes the gene coding for the protein.
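The steps above can be sketched outside R as well. The block below is a minimal NumPy implementation of the plain LAR variant of Least Angle Regression (Efron et al., 2004), followed by a hypothetical per-protein helper. How to guarantee that the coding gene is part of the model is an assumption here: the sketch first fits the coding gene alone by least squares and then lets LARS pick a few further genes to explain the remaining residual. This two-stage scheme is only one possible choice, not the prescribed method; `lars` and `fit_protein_model` are illustrative names, and the R package lars remains the intended tool.

```python
import numpy as np


def lars(X, y, n_steps):
    """Least Angle Regression (plain LAR variant), run for at most n_steps steps.

    X: (n_samples, n_features), y: (n_samples,). Columns are centered and scaled
    to unit norm internally; coefficients are returned on the original scale.
    Returns (coef, intercept, active), where `active` lists features in entry order.
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    norms = np.linalg.norm(X - x_mean, axis=0)
    norms[norms == 0] = 1.0  # avoid division by zero for constant columns
    Xs, r = (X - x_mean) / norms, y - y_mean
    n, p = Xs.shape
    beta, mu, active = np.zeros(p), np.zeros(n), []
    for _ in range(min(n_steps, p, n - 1)):
        c = Xs.T @ (r - mu)  # current correlations with the residual
        inactive = [j for j in range(p) if j not in active]
        active.append(inactive[int(np.argmax(np.abs(c[inactive])))])
        C = np.max(np.abs(c[active]))
        if C < 1e-12:  # residual already fully explained
            active.pop()
            break
        s = np.sign(c[active])
        XA = Xs[:, active] * s
        g = np.linalg.solve(XA.T @ XA, np.ones(len(active)))
        A = 1.0 / np.sqrt(g.sum())
        w = A * g  # weights of the equiangular direction
        u = XA @ w  # unit vector making equal angles with all active columns
        a = Xs.T @ u
        gamma = C / A  # full step: least-squares fit on the active set
        rest = [j for j in range(p) if j not in active]
        if rest:  # shorten the step to where the next feature ties with the active set
            with np.errstate(divide="ignore", invalid="ignore"):
                cand = np.concatenate([(C - c[rest]) / (A - a[rest]),
                                       (C + c[rest]) / (A + a[rest])])
            cand = cand[np.isfinite(cand) & (cand > 1e-12)]
            if cand.size:
                gamma = min(gamma, cand.min())
        mu += gamma * u
        beta[active] += gamma * s * w
    coef = beta / norms
    return coef, y_mean - x_mean @ coef, active


def fit_protein_model(mrna, protein, coding_idx, n_extra=5):
    """Hypothetical helper: protein ~ coding mRNA + up to n_extra other genes.

    mrna: (n_samples, n_genes); protein: (n_samples,) for a single protein.
    Stage 1 (assumption): least-squares fit on the coding gene only.
    Stage 2: LARS on the stage-1 residual, using all other genes.
    """
    x0 = mrna[:, coding_idx]
    slope, icpt = np.polyfit(x0, protein, 1)
    residual = protein - (slope * x0 + icpt)
    others = np.delete(np.arange(mrna.shape[1]), coding_idx)
    coef, intercept, active = lars(mrna[:, others], residual, n_extra)
    return {"coding_slope": slope,
            "intercept": icpt + intercept,
            "extra_genes": others[active].tolist(),  # column indices in mrna
            "extra_coef": coef[active]}
```

When `n_steps` reaches the number of features (and there are more samples than features), LAR ends at the ordinary least-squares fit, which is a convenient sanity check. Limiting `n_extra` to a handful of steps keeps each protein model sparse, so the selected genes stay interpretable.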

Questions: Does this approach improve over predicting protein expression exclusively from the coding mRNA? For which proteins do we get an improvement? Which genes are included in the models to complement the coding mRNA? What functions do these genes have? Can we cluster proteins by the genes used to predict their expression?
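For the last question, one simple option, assumed here rather than taken from the project description, is to compare proteins by the Jaccard similarity of the gene sets their models selected and to group them by single-linkage clustering at a similarity cutoff. Single linkage cut at a threshold is equivalent to taking connected components of the graph that links every pair of proteins whose similarity reaches the cutoff, which the union-find sketch below exploits. The function names, the protein and gene labels in the usage example, and the default cutoff of 0.3 are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two gene sets (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_shared_genes(selected, min_similarity=0.3):
    """Single-linkage clustering of proteins at a Jaccard-similarity cutoff.

    selected: dict mapping protein -> iterable of genes chosen by its model
    (excluding the coding gene). Returns a sorted list of clusters, each a
    sorted list of protein names.
    """
    proteins = sorted(selected)
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, p in enumerate(proteins):
        for q in proteins[i + 1:]:
            if jaccard(selected[p], selected[q]) >= min_similarity:
                parent[find(p)] = find(q)  # merge the two clusters

    groups = {}
    for p in proteins:
        groups.setdefault(find(p), []).append(p)
    return sorted(groups.values())


selected = {"P1": {"g1", "g2", "g3"}, "P2": {"g2", "g3", "g4"},
            "P3": {"g7", "g8"}, "P4": {"g8", "g9"}, "P5": {"g42"}}
print(cluster_by_shared_genes(selected, min_similarity=0.3))
```

Single linkage tends to chain loosely related proteins together; if the resulting clusters look too large, a stricter cutoff or a different linkage rule is worth trying.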

Start reading:

Engelmann et al. PLoS One 2012 (A similar analysis between miRNA and mRNA data)
