The development release for Assemble2 1.0.1 is available

You can download it from this page. The new features and modifications are the following:

  • new color themes based on ColorBrewer,
  • a multiple selection from the secondary structure navigator can be colored in one go,
  • residues selected in the 2D panel keep their original color,
  • in the 2D toolbar, the eye icon lets you hide or display absolute positions,
  • the secondary structure can be centered on the selection,
  • if a recent file is no longer available, it is automatically removed from the “Load Recent…” menu,
  • the algorithm inferring a new 2D and 3D from a structural alignment has been fixed. The 2D structure is inferred first, then the tertiary interactions are added if their residues are conserved between the reference and target sequences,
  • Assemble2 can be configured not to launch Chimera at startup and not to pop up the lateral panels automatically,
  • two new icons have been added in the 2D toolbar: (i) one to clip the 2D to the upper-left corner of the canvas and (ii) one to take a screenshot of the 2D as a PNG file.
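To make the two-step inference from a structural alignment more concrete, here is a minimal Python sketch. It is not Assemble2's code: the function names (`column_map`, `infer_structure`), the 1-based pair tuples and the rule that a base pair is transferred whenever both of its residues are aligned (whatever their nucleotides) are assumptions made for illustration. Tertiary interactions are kept only when both residues are aligned and identical in the reference and target sequences, as described above.

```python
def column_map(ref_aln: str, tgt_aln: str) -> dict[int, int]:
    """Map 1-based reference positions to 1-based target positions sharing a column."""
    if len(ref_aln) != len(tgt_aln):
        raise ValueError("aligned sequences must have the same length")
    mapping, r, t = {}, 0, 0
    for ref_char, tgt_char in zip(ref_aln, tgt_aln):
        if ref_char != "-":
            r += 1
        if tgt_char != "-":
            t += 1
        if ref_char != "-" and tgt_char != "-":
            mapping[r] = t
    return mapping


def infer_structure(ref_aln, tgt_aln, ref_pairs, ref_tertiary):
    """Transfer base pairs first, then conserved tertiary interactions (assumed rules)."""
    mapping = column_map(ref_aln, tgt_aln)
    ref_seq, tgt_seq = ref_aln.replace("-", ""), tgt_aln.replace("-", "")

    # Step 1 (2D): keep a reference pair if both of its residues are aligned.
    pairs = [(mapping[i], mapping[j]) for i, j in ref_pairs
             if i in mapping and j in mapping]

    # Step 2 (3D): keep a tertiary interaction only if both residues are
    # aligned AND identical in the reference and target sequences.
    tertiary = []
    for i, j in ref_tertiary:
        if i in mapping and j in mapping:
            ti, tj = mapping[i], mapping[j]
            if ref_seq[i - 1] == tgt_seq[ti - 1] and ref_seq[j - 1] == tgt_seq[tj - 1]:
                tertiary.append((ti, tj))
    return pairs, tertiary


# Toy example: one deletion in the target and a C->U change at its 3' end.
pairs, tertiary = infer_structure("GGAAACC", "GG-AACU",
                                  ref_pairs=[(1, 7), (2, 6)],
                                  ref_tertiary=[(3, 7), (4, 7), (4, 5)])
print(pairs)     # [(1, 6), (2, 5)]
print(tertiary)  # [(3, 4)]
```

In this toy case the reference pair (1, 7) survives as a G-U pair because step 1 only checks that both residues are aligned; a stricter variant could also require canonical or wobble pairing.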

The stable release for Assemble2 1.0.0 is now available

With some delay, I’m happy to announce the availability of the first stable release of Assemble2. Among all the new features, I would like to highlight the following ones:

The upcoming posts will give you more details about all these new features. Stay tuned!!

Practice Assemble2 with me at the VIZBI conference

On March 4th, the VIZBI 2014 conference will host several tutorials focused on biological data visualization and manipulation. If you’re interested in Assemble2 and in RNA structure, you can attend my tutorial in the afternoon.

I will present the basic principles of RNA architecture and illustrate them with Assemble2. You will also practice with Assemble2 to reveal the folding properties of your RNA molecule. Among the different features available in Assemble2, you will more precisely learn how to:
• combine bioinformatics predictions and experimental data (structure probing, NGS reads,…) to quickly identify the best candidates for a secondary structure,
• identify the consensus secondary structure for a set of related/orthologous RNA molecules,
• annotate a solved tertiary structure as a secondary structure that can be used as a guide for 3D exploration,
• edit an RNA secondary structure interactively to fit your experimental data and/or hypothesis,
• derive a new tertiary structure from a solved one using a user-driven mutagenesis approach (deletion, insertion and/or substitution of single residues),
• use an RNA secondary structure to query a database of recurrent RNA 3D motifs extracted from solved structures,
• construct a 3D model from scratch for an entire RNA molecule or domain.

You will find all the details for registration on the website of the conference. I hope to see you there.

The different ways to predict an RNA secondary structure

Even if you’re not interested in RNA 3D modeling, Assemble2 can be of great help in finding the secondary structure that best fits your experimental data. Assemble2 behaves differently depending on the format of your input file:

  • FASTA files (File -> Load… -> RNA Molecule(s)): Assemble2 will ask you to choose a sequence (if several are available). Then it will compute the folding landscape for this sequence using CONTRAfold, RNAfold and RNAsubopt. If several sequences are available, Assemble2 will also use the mlocarna algorithm to compute a consensus secondary structure. As with all Sankoff-style algorithms, mlocarna can be very costly and time-consuming if you have numerous and/or long sequences. Note that all gap characters in a FASTA file are ignored. If you have several sequences, Assemble2 will only load the molecule you have chosen. You can add the remaining sequences afterwards, using “Right-click -> Import new molecules from a file” from the panel “Aligned RNAs”. If you import these sequences from a FASTA file, they will be added as unaligned sequences. If you import them from a ClustalW file, their gaps will be preserved during the import. If you want Assemble2 to use an alignment directly as input, you have to provide this alignment in the ClustalW or Stockholm format.
  • ClustalW files (File -> Load… -> RNA Alignment): with the classical format, without any structural information, Assemble2 will use the RNAalifold algorithm to predict a consensus 2D from this alignment. Then, it will display the secondary structure for one of the sequences in this alignment. All the other sequences will be listed and displayed in the panels “Structural Alignment” and “Aligned RNAs”. If you add a consensus bracket notation at the end of the file, no computation will be done: Assemble2 will display the secondary structure for one of the sequences in this alignment, inferred from the consensus structure. Example of ClustalW data with structural information:


  • Stockholm files (File -> Load… -> RNA Alignment): same behavior as with a ClustalW file storing structural information,
  • Vienna files (File -> Load… -> RNA Secondary Structure): this format is known as “dot-bracket notation”. It is similar to the FASTA format, with each sequence followed by a bracket notation describing its secondary structure. Consequently, no prediction will be done: the secondary structure described in the file is displayed. Assemble2 behaves as it does with FASTA files: gaps are ignored and only one sequence is loaded if several are available,
  • BPSEQ or CT files (File -> Load… -> RNA Secondary Structure): no secondary structure prediction will be done. Assemble2 will display the secondary structure described in the file. Only one secondary structure can be described in such files.
  • PDB files (File -> Load… -> RNA Tertiary Structure): Assemble2 will use the algorithm RNAVIEW to annotate the 3D structure. Once done, the 2D structure will be displayed in the central panel and the 3D structure will be displayed in Chimera.
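For readers who want to inspect a Vienna file outside Assemble2, the following Python sketch reads a single record and turns its bracket notation into base pairs, using one stack per bracket type. The helper names (`parse_dot_bracket`, `read_vienna`) are hypothetical, and the sketch assumes the record contains no gaps, uses only dots for unpaired residues, and may carry an RNAfold-style free energy after the structure.

```python
# Closing bracket -> matching opening bracket; extra types encode pseudoknots.
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the sorted 1-based base pairs (i, j), i < j, of a dot-bracket string."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for pos, char in enumerate(structure, start=1):
        if char in stacks:                      # opening bracket
            stacks[char].append(pos)
        elif char in BRACKETS:                  # closing bracket
            opened = stacks[BRACKETS[char]]
            if not opened:
                raise ValueError(f"unmatched {char!r} at position {pos}")
            pairs.append((opened.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket(s)")
    return sorted(pairs)


def read_vienna(text: str):
    """Parse one '>name / sequence / structure [energy]' record (assumed layout)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = lines[0][1:] if lines[0].startswith(">") else None
    body = lines[1:] if name is not None else lines
    sequence = body[0]
    structure = body[1].split()[0]  # drop a trailing energy such as "(-1.20)"
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    return name, sequence, parse_dot_bracket(structure)


print(read_vienna(">hairpin\nGGGAAACCC\n(((...))) (-1.20)\n"))
# ('hairpin', 'GGGAAACCC', [(1, 9), (2, 8), (3, 7)])
```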

I have added a new option to configure some of these algorithms (“File -> Configure… -> RNA algorithms”). For now, you can change the number of suboptimal structures computed with RNAsubopt. You can also decide whether or not to use mlocarna to compute the folding landscape (useful if you have too many and/or too long sequences).


Once your data is loaded into Assemble2, you have other ways to generate new secondary structures. You can edit the secondary structure displayed in the central panel to add or remove helices and tertiary interactions interactively. You can also edit the consensus bracket notation in the panel “Structural alignment” by clicking on its characters. This allows you to infer a new secondary structure from this consensus for any sequence in the alignment.
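Inferring a secondary structure for one sequence from a consensus bracket notation can be pictured with the minimal sketch below. It is an assumption-based illustration, not Assemble2's implementation: `project_consensus` is a hypothetical name, only `(` and `)` are treated as pairing characters, `-` is the gap character, and a consensus pair is dropped whenever one of its two columns is a gap in the chosen sequence (no base complementarity check is made).

```python
def project_consensus(aligned_seq: str, consensus: str, gap: str = "-") -> str:
    """Derive an ungapped dot-bracket for one aligned sequence from a consensus.

    Only '(' and ')' in the consensus are treated as paired columns;
    every other consensus character is treated as unpaired.
    """
    if len(aligned_seq) != len(consensus):
        raise ValueError("sequence and consensus must span the same columns")

    # Pair up consensus columns with a stack (0-based column indices).
    stack, partner = [], {}
    for col, char in enumerate(consensus):
        if char == "(":
            stack.append(col)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {col + 1}")
            opening = stack.pop()
            partner[opening], partner[col] = col, opening
    if stack:
        raise ValueError("unmatched '(' in consensus")

    out = []
    for col, residue in enumerate(aligned_seq):
        if residue == gap:
            continue                       # gap columns disappear from the sequence
        mate = partner.get(col)
        if mate is None or aligned_seq[mate] == gap:
            out.append(".")                # unpaired, or partner column is a gap here
        else:
            out.append("(" if mate > col else ")")
    return "".join(out)


print(project_consensus("GG-AAAC-C", "(((...)))"))  # (.....)
```

Editing a consensus character and re-running the projection gives the new structure for every sequence of the alignment at once.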

All these behaviors are summarized in the following sketch and are available with the new daily build (Oct 15, 2013). More algorithms will be added to offer you more options.


Now it’s time to pimp your RNA 3D

Do you remember this post? Of course you do ;-). With the new daily build (Oct 3rd, 2013), a new option has appeared (“Edit -> Colors -> Color 3D”), allowing you to synchronize the colors between the 2D and 3D scenes. As an example, I have opened the sample file 1ehz.pdb and assigned the following qualitative values (available as a sample file named tRNA_domains.txt):


By selecting “Edit -> Colors -> Color 3D”, I can now easily produce a result like this:


You can see that I have slightly improved the way molecular positions are defined in the data file. A qualitative or quantitative value can be linked to:

  • a single position: 7
  • a range of contiguous positions: 7-14
  • several ranges of contiguous positions: 7-14,22-27,56-58

No spaces are allowed in the definition of the molecular positions.
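The three forms listed above can be checked and expanded with a short Python sketch. `parse_positions` is a hypothetical helper, and the sketch assumes that single positions and ranges may be mixed in one comma-separated list, which the description above does not state explicitly.

```python
import re

# One entry ("7") or range ("7-14"), optionally repeated after commas; no spaces.
POSITION_SPEC = re.compile(r"\d+(?:-\d+)?(?:,\d+(?:-\d+)?)*")


def parse_positions(spec: str) -> list[int]:
    """Expand '7', '7-14' or '7-14,22-27,56-58' into a sorted list of positions."""
    if not POSITION_SPEC.fullmatch(spec):
        raise ValueError(f"invalid position definition: {spec!r}")
    positions = set()
    for chunk in spec.split(","):
        start, _, end = chunk.partition("-")
        first, last = int(start), int(end or start)
        if last < first:
            raise ValueError(f"range goes backwards: {chunk!r}")
        positions.update(range(first, last + 1))
    return sorted(positions)


print(parse_positions("7"))                      # [7]
print(parse_positions("56-58,7"))                # [7, 56, 57, 58]
print(len(parse_positions("7-14,22-27,56-58")))  # 17
```

A definition such as "7 - 14" is rejected by the regular expression, matching the no-spaces rule.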

If your RNA structure is large, I suggest selecting subdomains before launching the synchronization. Otherwise, Chimera could get stuck. Here is an example of quantitative data assigned to a large ribosomal RNA:



Enjoy your data mapped on the tertiary structure.

Using the consensus structural mask, you can now more easily identify the helices in a tertiary structure:


Explore the folding landscape of your RNA

Today’s daily build (Oct 2nd, 2013) introduces a new behavior. Until now, when you opened an RNA molecule stored in a FASTA file, Assemble2 computed and displayed a single 2D prediction. Now, it opens a new lateral panel named “2D folds”, gathering all the 2D predictions computed with CONTRAfold and RNAfold, along with a random sample of 20 suboptimal structures computed with RNAsubopt.


You can select a 2D prediction by clicking on the “eye” icon in its upper left corner. Using these icons, you can easily switch between the different folds.


Now, if you assign qualitative or quantitative data to the selected secondary structure, these data will also be automatically assigned to all the 2D predictions displayed in the “2D folds” panel.


This means that you can more easily find the 2D prediction that best fits your experimental data. IMHO, that should be useful!!
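To illustrate what “best fits” could mean numerically, here is a hypothetical scoring scheme; Assemble2 itself lets you compare the folds visually and is not claimed to compute this. With SHAPE-like probing data, a reactivity at or above an arbitrary threshold is read as “unpaired” and a lower one as “paired”, and each fold is scored by the fraction of probed positions it agrees with. The names `agreement` and `best_fold`, the 0.5 threshold and the toy data are assumptions.

```python
def agreement(structure: str, reactivities: dict[int, float],
              threshold: float = 0.5) -> float:
    """Fraction of probed positions whose pairing state matches the data.

    Assumed reading of SHAPE-like data: reactivity >= threshold means
    "unpaired", below threshold means "paired". Positions are 1-based.
    """
    paired = [char != "." for char in structure]
    probed = [pos for pos in reactivities if 1 <= pos <= len(structure)]
    if not probed:
        return 0.0
    matches = sum((reactivities[pos] >= threshold) != paired[pos - 1]
                  for pos in probed)
    return matches / len(probed)


def best_fold(folds: dict[str, str], reactivities: dict[int, float],
              threshold: float = 0.5) -> str:
    """Name of the fold with the highest agreement score."""
    return max(folds, key=lambda name: agreement(folds[name], reactivities, threshold))


# Toy folds and toy reactivities (hypothetical values).
folds = {"fold_A": "(((...)))", "fold_B": "........."}
data = {1: 0.1, 2: 0.1, 5: 0.9, 9: 0.2}
print({name: agreement(s, data) for name, s in folds.items()})
# {'fold_A': 1.0, 'fold_B': 0.25}
print(best_fold(folds, data))  # fold_A
```

A single threshold is a crude simplification; it is only meant to show how a per-fold score could rank the predictions of the “2D folds” panel.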

One more thing: once a 2D structure is selected, open the lateral panel named “Secondary Structure” and right-click on the 2D node. You will see that you can now “Display base pairing probabilities”. These probabilities are computed from all the 2D predictions displayed in the “2D folds” panel.
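One natural way to read these probabilities is as pair frequencies over the set of folds. The sketch below assumes that every fold in the “2D folds” panel counts equally and that structures are plain dot-bracket strings with `(` and `)`; Assemble2's exact computation is not described here, and the function names are hypothetical.

```python
from collections import Counter


def pair_set(structure: str) -> set[tuple[int, int]]:
    """1-based base pairs of a dot-bracket string using only '(' and ')'."""
    stack, pairs = [], set()
    for pos, char in enumerate(structure, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError("unmatched '(' in structure")
    return pairs


def pairing_probabilities(folds: list[str]) -> dict[tuple[int, int], float]:
    """Estimate P(i pairs with j) as the share of folds containing the pair."""
    counts = Counter()
    for fold in folds:
        counts.update(pair_set(fold))
    return {pair: n / len(folds) for pair, n in sorted(counts.items())}


folds = ["((..))", "(....)", "......", "((..))"]
print(pairing_probabilities(folds))  # {(1, 6): 0.75, (2, 5): 0.5}
```

With equal weights, a pair present in every fold gets probability 1.0, while pairs that appear only in a few suboptimal structures stand out as uncertain.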