On the road to an RNA Science Toolbox

The first versions of Assemble2 (RNAMLView, S2S and Assemble) were graphical tools dedicated to the rendering and manipulation of precomputed RNA structures. Assemble2 is now tightly linked to a server (public or self-deployed) that hosts Web services and can be deployed from a virtual machine. These Web services automate RNA computations such as sequence alignment, 2D prediction and 3D modeling. The initial idea was not to be exhaustive in terms of algorithms. In contrast to fully automated approaches, I wanted RNA biologists to find in this tool their own way of working: a first draft of an alignment or structure, followed by hours of manual changes to fit their assumptions and experimental data.

Over the last decade, the RNA field has grown rapidly: new RNA families, new RNA functions, new RNA algorithms, new RNA databases… RNA biologists now have new needs and requirements. They want to compute more, visualize more, compare more, and keep track of all their predictions and modifications. The RNA complexes under study are becoming larger and are made up of several partners. Several people can be involved in such studies, and they need to split the work and communicate. RNAs perform various functions, but they are built from sequence, 2D and 3D patterns that recur across RNA families. RNA biologists need access to a data warehouse hosted in their lab, allowing them to store, retrieve and share their data. In short, they need a laboratory information management system (LIMS) dedicated to RNA data: an RNA Science Toolbox. This idea is not new, but its fulfillment could be, especially in the field of RNA.

I had two options to reach this goal: add new features to Assemble2 or further develop the Web server. I chose the second option. So, how will this RNA Science Toolbox be structured?

First, we need to install and configure RNA algorithms to produce data. We also need to install and configure at least one database to store and retrieve the data produced by the users. This is the computing and data layer of our RNA Science Toolbox.

step1

But having everything installed and configured in the same place is not enough. We need a more uniform way to communicate with all these “low-level” tools. The Bio* projects from the Open Bioinformatics Foundation have provided such communication interfaces for a while. For now, I have developed my own solution in Python, named PyRNA. PyRNA is able to communicate with tools installed locally, but also with public databases (RFAM, NDB,…) and public Web services. PyRNA is lighter and more RNA-oriented than BioPython, but I’m pretty sure that I will delegate the more “classical” bioinformatics tasks to BioPython. This Python library will be the middle layer of the toolbox.
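To make the idea of a uniform middle layer more concrete, here is a minimal sketch of how differently behaving command-line tools could be hidden behind one interface. This is not PyRNA's actual API: the names `CommandLineTool`, `ToolResult` and `reverser` are hypothetical, and the wrapped "algorithm" is a small Python one-liner standing in for a real, locally installed RNA program.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Uniform result returned by every wrapped tool (hypothetical type)."""
    tool: str
    output: str


class CommandLineTool:
    """Hypothetical wrapper: sends a sequence to a program on standard input."""

    def __init__(self, name, command):
        self.name = name
        self.command = command  # argument list of the program to launch

    def run(self, sequence):
        # check=True raises CalledProcessError if the program exits with an error.
        completed = subprocess.run(
            self.command,
            input=sequence,
            capture_output=True,
            text=True,
            check=True,
        )
        return ToolResult(tool=self.name, output=completed.stdout.strip())


# Stand-in "algorithm" so the sketch runs anywhere: it prints its input reversed.
# A real setup would put the command of an installed RNA tool here instead.
reverser = CommandLineTool(
    "reverse",
    [sys.executable, "-c", "import sys; print(sys.stdin.read().strip()[::-1])"],
)
```

Calling `reverser.run("GGGAAACCC")` returns a `ToolResult` whose `output` field holds whatever the program printed. The calling code looks the same whichever program sits behind the wrapper.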

step2

The full installation and setup of all these components (algorithms, database, Python library,…) is a mess. This is where a virtual machine comes into play. Virtual machines make it possible to deploy a fully configured environment in a few easy steps. The RNA Science Toolbox is based on a Linux distribution deployed with the tool Vagrant. If we stop here, each user will need to install their own toolbox on their computer and manage their own data warehouse. Furthermore, their way of doing RNA science will be restricted to the command line: efficient, but limited when it comes to visualization and interactivity with RNA data.

step3

We can go further. The Python library PyRNA contains an embedded Web server based on Tornado. This Web server already provides Web services (many more will come soon) and will offer interactive and graphical Web pages. These Web pages will make it possible to write interactive notebooks, in the same spirit as IPython, Beaker or Findings. The Web services will make it possible to connect non-Web-based (native) tools like Assemble2. Consequently, a user would be able to write a Web notebook dedicated to the identification of new RNA 3D motifs, store them in the local database and use them through Assemble2 for the construction of a 3D model.
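As a rough illustration of how a native tool could retrieve stored motifs through a Web service, here is a small sketch. To stay dependency-free it uses Python's standard `http.server` as a stand-in for the Tornado server embedded in PyRNA. The `/motifs/<name>` route, the `MOTIFS` dictionary (a placeholder for the local database, holding made-up demo data) and the function names are all assumptions for the example, not the toolbox's real interface.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory stand-in for the lab's local database (placeholder data).
MOTIFS = {"demo-loop": {"sequence": "GAAA", "kind": "example"}}


class MotifHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Assumed route: /motifs/<name>
        name = self.path.rsplit("/", 1)[-1]
        motif = MOTIFS.get(name)
        status = 200 if motif is not None else 404
        payload = motif if motif is not None else {"error": "not found"}
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def fetch_motif(base_url, name):
    """What a native client could do to retrieve a stored motif."""
    # An empty ProxyHandler makes sure the local request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/motifs/{name}") as response:
        return json.loads(response.read().decode("utf-8"))


def demo():
    server = HTTPServer(("127.0.0.1", 0), MotifHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_motif(f"http://127.0.0.1:{server.server_port}", "demo-loop")
    finally:
        server.shutdown()
        server.server_close()
```

Here `demo()` starts the service on a free local port, queries it as a client would, and returns the decoded JSON record. A notebook page and a native tool could both talk to the same endpoint in this way.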

step4

Or one user could be responsible for identifying new RNA motifs in order to help others in the lab build 3D models.

step5

With this blog post, I wanted to highlight the ideas that will guide my further developments. The PyRNA project has been renamed RNA Science Toolbox and it now has a Twitter account. In parallel with the development of this toolbox, Assemble2 will also get new features so that the two work in synergy.
