Characterization of transient intermediate or transition states is crucial for the description of biomolecular folding pathways, which is, however, difficult in both experiments and computer simulations. Such transient states are typically of low population in simulation samples. Even for simple systems such as RNA hairpins, recently there are mounting debates over the existence of multiple intermediate states. In this paper, we develop a computational approach to explore the relatively low populated transition or intermediate states in biomolecular folding pathways, based on a topological data analysis tool, MAPPER, with simulation data from large-scale distributed computing. The method is inspired by the classical Morse theory in mathematics which characterizes the topology of high-dimensional shapes via some functional level sets. In this paper we exploit a conditional density filter which enables us to focus on the structures on pathways, followed by clustering analysis on its level sets, which helps separate low populated intermediates from high populated folded/unfolded structures. A successful application of this method is given on a motivating example, a RNA hairpin with GCAA tetraloop, where we are able to provide structural evidence from computer simulations on the multiple intermediate states and exhibit different pictures about unfolding and refolding pathways. The method is effective in dealing with high degree of heterogeneity in distribution, capturing structural features in multiple pathways, and being less sensitive to the distance metric than nonlinear dimensionality reduction or geometric embedding methods. The methodology described in this paper admits various implementations or extensions to incorporate more information and adapt to different settings, which thus provides a systematic tool to explore the low-density intermediate states in complex biomolecular folding systems.
The topological data analysis tool, Mapper, was used to analyze simulations of an RNA hairpin and provided structural evidence on multiple intermediate states along the path. Mapper, parameters, and RNA conformaitons to reproduce the results are provided
In this paper, we develop a computational approach to explore the relatively low populated transition or intermediate states in biomolecular folding pathways based on a topological data analysis tool, Mapper. We applied Mapper to simulation data from large-scale distributed computing to provide structural evidence on multiple intermediate states of the unfolding and refolding of an RNA hairpin with a GCAA tetraloop. This project contains Mapper, the RNA conformations (in contact map format), and instructions to reproduce results from this paper.
Motivation: Characterization of transient intermediate or transition states is crucial for the description of biomolecular folding pathways, which is however difficult in both experiments and computer simulations. Such transient states are typically of low population in simulation samples. Even for simple systems such as RNA hairpins, recently there are mounting debates over the existence of multiple intermediate states.
More about Mapper and the computational approach The method is inspired by the classical Morse theory in mathematics which characterizes the topology of high dimensional shapes via some functional level sets. In this paper we exploit a conditional density filter which enables us to focus on the structures on pathways, followed by clustering analysis on its level sets, which helps separate low populated intermediates from high populated uninteresting structures.
The method is effective in dealing with high degree of heterogeneity in distribution, capturing structural features in multiple pathways, and being less sensitive to the distance metric than nonlinear dimensionality reduction or geometric embedding methods. The methodology described in this paper admits various implementations or extensions to incorporate more information and adapt to different settings, which thus provides a systematic tool to explore the low density intermediate states in complex biomolecular folding systems.
Downloads
The Matlab script for Mapper, the RNA conformations(in contact map format), and other supporting files that are needed to reproduce the main results of our publication (shown in Figure 3) are provided. The Mapper tool could be used to cluster conformations from biomolecular simulations.
See all Downloads