Predicting nanoString from Affymetrix data – non-linear regression

Status: open

Praktikum (Bachelor/Master)

Field: Genomics, Machine Learning

Advisors: Spang, Altenbuchinger

Courses Required: Practical Bioinformatics I

Objective: Gene expression profiles from microarrays can be used to diagnose tumors. However, the experimental protocols are demanding and can only be carried out in large medical centers (such as the DKFZ). nanoString is an alternative method for measuring gene expression that is experimentally simpler and can be run even in small hospitals. We want to transfer diagnostic signatures from the microarray platform to the nanoString platform. In a pilot study, we profiled 48 genes in 50 lymphomas using both technologies. Measurements of the same gene on the two platforms show a strong but non-linear dependence. In this project, we want to use machine-learning curve-fitting procedures to predict nanoString data from microarray data.

Data: Paired profiles from both technologies (microarray and nanoString) are available for 50 lymphomas.

First Steps:

Get familiar with the two technologies and the two lymphoma data sets.
Read about predictive non-linear curve-fitting algorithms (e.g., SVM regression) and learn to predict nanoString data from microarray data.
Generate a virtual nanoString data set for all lymphomas (predictions from microarray data). Describe how it resembles and how it differs from the measured nanoString data.
Derive signatures for the diagnosis of the lymphoma subtypes DLBCL-ABC and DLBCL-GCB from the original microarray data and the predicted nanoString data and compare them.
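
The prediction step above could be sketched roughly as follows for a single gene. This is a minimal illustration and not the project's prescribed method: it uses Nadaraya-Watson kernel regression as a simple stand-in for a non-linear learner such as SVM regression, relies only on the Python standard library, and runs on synthetic numbers, because the real paired profiles are not part of this description. The function names, the bandwidth of 0.8, and the sigmoid-shaped toy relationship are all assumptions made for the example. Each "virtual" value is predicted with that sample left out, so no tumor is predicted from its own measurement.

```python
import math
import random


def kernel_predict(x_train, y_train, x_new, bandwidth):
    """Nadaraya-Watson estimate: Gaussian-weighted mean of y_train near x_new."""
    weights = [math.exp(-0.5 * ((x_new - x) / bandwidth) ** 2) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the nearest training point.
        nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
        return y_train[nearest]
    return sum(w * y for w, y in zip(weights, y_train)) / total


def loo_predictions(x, y, bandwidth):
    """Predict every sample from all other samples (leave-one-out)."""
    preds = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        preds.append(kernel_predict(x_rest, y_rest, x[i], bandwidth))
    return preds


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


# Synthetic stand-in for ONE gene measured in 50 tumors (assumption, not real data):
# microarray values on a log2-like scale, nanoString following a saturating curve.
rng = random.Random(0)
microarray = [rng.uniform(4.0, 14.0) for _ in range(50)]
nanostring = [12.0 / (1.0 + math.exp(-(x - 9.0))) + rng.gauss(0.0, 0.3)
              for x in microarray]

# "Virtual" nanoString values for this gene, one per tumor.
virtual_nanostring = loo_predictions(microarray, nanostring, bandwidth=0.8)

# Reference point: predicting each tumor by the mean of all other tumors.
mean_baseline = [(sum(nanostring) - y) / (len(nanostring) - 1) for y in nanostring]

print("kernel regression RMSE:", round(rmse(nanostring, virtual_nanostring), 3))
print("mean baseline RMSE:    ", round(rmse(nanostring, mean_baseline), 3))
```

With the real data, the same loop would be repeated for each of the 48 genes, and the bandwidth (or the hyperparameters of whichever learner is used) would be chosen by cross-validation rather than fixed by hand.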

Questions: Is it possible to predict nanoString data from microarray data? What type of functional relation fits the data best? Do signatures derived from virtual nanoString data differ significantly from original signatures?
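
One possible way to approach the question of which functional relation fits best is to compare a few candidate curve shapes by their leave-one-out prediction error. The sketch below is only an illustration under stated assumptions: the three candidate transforms (identity, logarithm, square) are hypothetical choices, each is fitted as a straight line in the transformed microarray value, and the data are synthetic values generated from a logarithmic relationship, not the lymphoma measurements.

```python
import math
import random


def fit_line(u, y):
    """Ordinary least squares for y = intercept + slope * u (closed form)."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_u) ** 2 for a in u)
    sxy = sum((a - mean_u) * (b - mean_y) for a, b in zip(u, y))
    slope = sxy / sxx
    return mean_y - slope * mean_u, slope


def loo_rmse(x, y, transform):
    """Leave-one-out RMSE of a line fitted to transform(x) versus y."""
    u = [transform(v) for v in x]
    squared_error = 0.0
    for i in range(len(u)):
        intercept, slope = fit_line(u[:i] + u[i + 1:], y[:i] + y[i + 1:])
        squared_error += (y[i] - (intercept + slope * u[i])) ** 2
    return math.sqrt(squared_error / len(u))


# Hypothetical candidate shapes for the microarray -> nanoString relation.
CANDIDATES = {
    "identity": lambda v: v,
    "log": math.log,
    "square": lambda v: v * v,
}

# Synthetic stand-in for one gene in 50 tumors (assumption, not real data).
rng = random.Random(1)
microarray = [rng.uniform(4.0, 14.0) for _ in range(50)]
nanostring = [2.0 + 4.0 * math.log(v) + rng.gauss(0.0, 0.1) for v in microarray]

scores = {name: loo_rmse(microarray, nanostring, f) for name, f in CANDIDATES.items()}
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:>8}: leave-one-out RMSE = {score:.3f}")
```

Scoring on held-out samples rather than on the training fit keeps more flexible shapes from winning merely because they have more freedom to follow noise, which matters with only 50 tumors per gene.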

Start reading:

Masque-Soler et al. Blood 2013

Scott et al. Blood 2013

Geiss et al. Nat. Biotechnology 2008
