RMSS 2017
Simon Rogers
Decomposing Metabolomics Mass Spectrometry Data with Topic Models
The key challenge in the analysis of metabolomics mass spectrometry data is the identification of the ions detected in the mass spectrometer. Fragmentation is the most popular strategy for molecular identification, but it relies upon matching fragment spectra to databases, which have very low coverage: in a typical experiment, <10% of the measured molecules can be matched to database spectra.
In this talk, I will present an approach for the analysis of mass spectrometry fragment data that uses methods developed for the analysis of text (topic models) to extract commonly co-occurring patterns of fragments and losses that can be interpreted as molecular substructures. I will present results that indicate that identification of substructures (topics) is often possible, allowing all molecules that include a given topic to be partially annotated even if they cannot be identified in a traditional manner. In addition, I will show how topics can be linked to molecular intensity and how the approach can be extended to analyse changes in substructure prevalence across groups of samples.
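
To make the text analogy concrete, the sketch below shows one way a fragment spectrum might be turned into a "document" of fragment and loss "words" before any topic model is fitted. It is an illustration under stated assumptions, not the method presented in the talk: the function name, the two-decimal m/z binning, the minimum loss threshold and the use of rounded relative intensity as a word count are all hypothetical choices made for this example.

```python
from collections import Counter


def spectrum_to_document(precursor_mz, peaks, decimals=2, min_loss=10.0):
    """Convert one fragment spectrum into a bag of words (hypothetical helper).

    peaks is a list of (fragment_mz, relative_intensity) pairs, with
    intensities assumed to lie on a 0-100 scale.
    """
    doc = Counter()
    for mz, intensity in peaks:
        # Assumption: rounded relative intensity stands in for a word count.
        count = int(round(intensity))
        if count <= 0:
            continue
        # Fragment word: the fragment m/z, binned by rounding.
        doc[f"fragment_{round(mz, decimals):.{decimals}f}"] += count
        # Loss word: mass difference between precursor and fragment.
        loss = precursor_mz - mz
        if loss >= min_loss:
            doc[f"loss_{round(loss, decimals):.{decimals}f}"] += count
    return doc


# Made-up spectrum, for illustration only.
example = spectrum_to_document(300.0, [(282.0, 50.0), (120.0, 100.0)])
print(example)
```

A collection of such documents, one per fragmented molecule, has the same shape as a word-count corpus, so it could in principle be passed to any standard topic model implementation; topics would then correspond to sets of fragments and losses that tend to occur together.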