On the road to an RNA Science Toolbox

The first versions of Assemble2 (RNAMLView, S2S and Assemble) were graphical tools dedicated to the rendering and manipulation of precomputed RNA structures. Assemble2 is now tightly linked to a server (public or self-deployed) that hosts Web services and can be deployed from a virtual machine. These Web services make it possible to automate RNA computations such as sequence alignment, 2D prediction and 3D modeling. The initial idea was not to be exhaustive in terms of algorithms. In contrast to fully automated approaches, I wanted RNA biologists to find in this tool their own way of working: a first draft of an alignment or structure, followed by hours of manual changes to fit their assumptions and experimental data.

Within the last decade, the RNA field has grown rapidly: new RNA families, new RNA functions, new RNA algorithms, new RNA databases… RNA biologists now have new needs and requirements. They want to compute more, to visualize more, to compare more, and to keep track of all their predictions and modifications. The RNA complexes under study are becoming ever larger and are made up of several partners. Several people can be involved in such studies, and they need to split the work and communicate. RNAs perform various functions, but they are built from sequence, 2D and 3D patterns that recur across the different RNA families. RNA biologists need access to a data warehouse hosted in their lab that lets them store, retrieve and share their data. In short, they need a laboratory information management system (LIMS) dedicated to RNA data: an RNA Science Toolbox. This idea is not new, but its fulfillment could be, especially in the field of RNA.

I had two options to reach this goal: add new features to Assemble2 or further develop the Web server. I chose the second option. So, how will this RNA Science Toolbox be structured?

First, we need to install and configure RNA algorithms to produce data. We also need to install and configure at least one database to store and retrieve the data produced by the users. This is the computing and data layer of our RNA Science Toolbox.
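
As a rough illustration of the data half of this layer, the sketch below stores and retrieves predicted secondary structures with SQLite from the Python standard library. The post does not say which database the toolbox relies on, so SQLite, the table name `predictions`, its columns and the three function names are all assumptions made for the example.

```python
import sqlite3

# Hypothetical schema: one row per secondary structure prediction.
SCHEMA = """
CREATE TABLE IF NOT EXISTS predictions (
    id        INTEGER PRIMARY KEY,
    molecule  TEXT NOT NULL,
    algorithm TEXT NOT NULL,
    sequence  TEXT NOT NULL,
    structure TEXT NOT NULL
)
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_prediction(conn, molecule, algorithm, sequence, structure):
    """Insert one prediction and return its row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO predictions (molecule, algorithm, sequence, structure) "
            "VALUES (?, ?, ?, ?)",
            (molecule, algorithm, sequence, structure),
        )
    return cur.lastrowid


def predictions_for(conn, molecule):
    """Return (algorithm, structure) pairs stored for a molecule, oldest first."""
    rows = conn.execute(
        "SELECT algorithm, structure FROM predictions "
        "WHERE molecule = ? ORDER BY id",
        (molecule,),
    )
    return rows.fetchall()
```

Calling `open_store("lab.db")` instead of the in-memory default would keep the predictions in a file that several users of the same machine could share.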


But having everything installed and configured in the same place is not enough. We need a way to communicate with all these “low-level” tools in a more uniform way. The Bio* projects from the Open Bioinformatics Foundation have provided such communication interfaces for a while. For now, I have developed my own solution in Python, named PyRNA. PyRNA is able to communicate with tools installed locally, but also with public databases (RFAM, NDB,…) and public Web services. PyRNA is lighter and more RNA-oriented than BioPython, but I’m pretty sure that I will delegate the more “classical” bioinformatics tasks to BioPython. This Python library will be the middle layer of the toolbox.
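
To give an idea of what such a uniform interface can look like, here is a minimal sketch that wraps a locally installed RNAfold (from the ViennaRNA package) behind a plain Python function. It is not PyRNA's actual API: the names `Fold`, `parse_rnafold` and `run_rnafold` are hypothetical, and the parser assumes RNAfold's default text output for a single sequence.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class Fold:
    sequence: str
    structure: str  # dot-bracket notation
    energy: float   # kcal/mol, as reported by the tool


def parse_rnafold(output):
    """Parse RNAfold's default text output for a single sequence.

    Assumed shape: an optional FASTA header, the sequence, then the
    structure followed by the energy in parentheses, for example:
        GGGAAACCC
        (((...))) ( -1.20)
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    lines = [line for line in lines if not line.startswith(">")]
    sequence, result = lines[0], lines[1]
    structure, _, energy = result.partition(" ")
    return Fold(sequence, structure, float(energy.strip("() ")))


def run_rnafold(sequence, executable="RNAfold"):
    """Fold a sequence with a locally installed RNAfold."""
    done = subprocess.run(
        [executable, "--noPS"],  # --noPS: do not write a PostScript drawing
        input=sequence + "\n",
        capture_output=True,
        text=True,
        check=True,              # raise if the tool exits with an error
    )
    return parse_rnafold(done.stdout)
```

Keeping the parser separate from the process call means the same `Fold` object could also be built from the output of a remote Web service, which is the kind of uniformity the middle layer is after.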


The full installation and setup of all these components (algorithms, database, Python library,…) is a mess. This is where a virtual machine comes into play. Virtual machines make it possible to deploy a fully configured environment in a few easy steps. The RNA Science Toolbox is based on a Linux distribution and is deployed with the tool Vagrant. If we stopped here, each user would need to install their own toolbox on their computer and manage their own data warehouse. Furthermore, their way of doing RNA science would be restricted to the command line: efficient, but limited when it comes to visualization and interactivity with RNA data.


We can go further. The Python library PyRNA contains an embedded Web server based on Tornado. This Web server already provides Web services (many more will come soon) and will offer interactive and graphical Web pages. These Web pages will make it possible to write interactive notebooks, in the same spirit as IPython, Beaker or Findings. The Web services will make it possible to connect non-Web-based (native) tools like Assemble2. Consequently, a user would be able to write a Web notebook dedicated to the identification of new RNA 3D motifs, store them in the local database and use them through Assemble2 for the construction of a 3D model.
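
To make the idea of a Web service consumed by a native tool more concrete, here is a minimal sketch of a Tornado endpoint that turns a dot-bracket secondary structure into a list of base pairs returned as JSON. It is an illustration only, not one of the services actually shipped with PyRNA: the route `/api/pairs`, the port 8080 and the names `base_pairs`, `make_app` and `serve` are assumptions.

```python
def base_pairs(structure):
    """Convert a dot-bracket string into 1-based (i, j) base pairs."""
    stack, pairs = [], []
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def make_app():
    """Build the Tornado application (requires Tornado to be installed)."""
    import tornado.web  # imported here so base_pairs() works without Tornado

    class PairsHandler(tornado.web.RequestHandler):
        def get(self):
            structure = self.get_argument("structure")
            try:
                pairs = base_pairs(structure)
            except ValueError as err:
                raise tornado.web.HTTPError(400, reason=str(err))
            # Tornado serializes a dict to JSON and sets the content type.
            self.write({"structure": structure, "pairs": pairs})

    return tornado.web.Application([(r"/api/pairs", PairsHandler)])


def serve(port=8080):
    """Start the service and block until the process is stopped."""
    import tornado.ioloop

    make_app().listen(port)
    tornado.ioloop.IOLoop.current().start()
```

With `serve()` running, a native client such as Assemble2 or a notebook page could request `http://localhost:8080/api/pairs?structure=((..))` and read the pairs from the JSON answer.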


Or one user could be responsible for identifying new RNA motifs in order to help others in the lab build 3D models.


With this blog post, I wanted to highlight the ideas that will guide my further developments. The project PyRNA has been renamed RNA Science Toolbox and it now has a Twitter account. In parallel with the development of this toolbox, Assemble2 will also get new features so that the two work in synergy.

First International Workshop on Virtual and Augmented Reality dedicated to Molecular Science (VARMS 2015)

Next week, I will attend the 1st International Workshop on Virtual and Augmented Reality dedicated to Molecular Science (VARMS 2015):

The international IEEE Virtual Reality 2015 conference, through the VARMS workshop, gives researchers an excellent opportunity to:

  • keep up to date with new approaches at the interface between Augmented and Virtual Reality, 3D User Interfaces and Video Games to popularize Molecular Science, both in research and teaching contexts,
  • identify efforts to support the deeper integration of Virtual Reality techniques in the processes and practices of research laboratories and companies in the Molecular Science field, promoting the usefulness and usability of Virtual and Augmented Reality in Molecular Science, implying deep ergonomic analyses and user evaluations in the targeted field,
  • highlight convincing success stories, thereby catalysing the use of Virtual and Augmented Reality in the targeted community, as actual research achievements that lead to decisive results in Molecular Science are still rare.

Here is my poster on Assemble2.

[Image: Assemble2 poster for VARMS 2015]

Use the mfold web server directly from Assemble2

Here is the latest feature I have implemented for the upcoming Assemble2 1.2. We will start with an RNA secondary structure displayed in Assemble2 (and predicted from the sequence stored in the sample file ft3100_from_FANTOM3_project.fasta). I have selected the prediction from the algorithm RNAfold.


Now I would like to submit this sequence to the mfold Web server. Choosing Plugins | mfold Web Server opens the RNA folding form inside Assemble2. Once the web page has loaded, and if a secondary structure is displayed, the name and the sequence are automatically filled in the form.


Once the button “Fold RNA” has been clicked, the sequence is reformatted by the website, indicating that the prediction has been correctly submitted. I have observed that the website is a little less responsive through Assemble2, but nothing dramatic. After a while, you should get the predictions. They are listed on the web page (the blue arrow in the next screen capture), and Assemble2 also lists them in a choice box (the red arrow in the next screen capture).


Now I can use this choice box to select prediction/structure #5 (for example) to see and manipulate it in Assemble2. To do so, I click on the button on the right (the red arrow in the next screen capture). And it’s done!


To sum up, starting from Assemble2 1.2, you will be able to use all the options available from the mfold web server to predict and manipulate your favorite molecule without leaving Assemble2. As you can imagine, more will come, based on the same principle.

Load data directly from RNA databases into Assemble2

Thanks to JavaFX, the upcoming release of Assemble2 will let you load data directly from several RNA databases (I have started with the Nucleic Acid Database and RNACentral). You will be able to browse each website as you are used to. Once you are on a web page describing a 3D structure (for the Nucleic Acid Database) or a sequence (for RNACentral), a button will be activated, indicating that the data can be imported into Assemble2.
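
The decision to activate the import button can be pictured as a simple test on the address of the page being browsed. Assemble2 itself is a Java application, so the Python sketch below only illustrates the logic and is not its actual code. The function name `importable_sequence_id` is hypothetical, and the URL shape (an RNACentral sequence page at `/rna/` followed by an identifier made of `URS` and ten hexadecimal characters, optionally followed by a taxon id) is an assumption about the website.

```python
import re
from urllib.parse import urlparse

# Assumed shape of an RNACentral sequence page:
#   https://rnacentral.org/rna/URS0000000001  (optionally followed by a taxon id)
_URS_PATH = re.compile(r"^/rna/(URS[0-9A-F]{10})(?:[/_](\d+))?/?$")


def importable_sequence_id(url):
    """Return the sequence identifier if the page describes a sequence, else None."""
    parsed = urlparse(url)
    if parsed.hostname not in ("rnacentral.org", "www.rnacentral.org"):
        return None
    match = _URS_PATH.match(parsed.path)
    return match.group(1) if match else None
```

A browser component would call such a function each time the location changes and enable the button whenever it returns an identifier.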