Even if you’re not interested in RNA 3D modeling, Assemble2 is a great help for finding the secondary structure that best fits your experimental data. Assemble2 behaves differently depending on the format of your input file:
- FASTA files (File -> Load… -> RNA Molecule(s)): Assemble2 will ask you to choose a sequence (if several are available). It will then compute the folding landscape for this sequence using CONTRAfold, RNAfold and RNAsubopt. If several sequences are available, Assemble2 will also use the algorithm mlocarna to compute a consensus secondary structure. As with all Sankoff-style algorithms, mlocarna can be very costly and time-consuming if you have numerous and/or long sequences. Note that all gap characters in a FASTA file are ignored. If the file contains several sequences, Assemble2 will only load the molecule you have chosen. You can add the remaining sequences afterwards, using “Right-click -> Import new molecules from a file” from the panel “Aligned RNAs”. Sequences imported from a FASTA file are added as unaligned sequences; sequences imported from a ClustalW file keep their gaps. If you want Assemble2 to use an alignment directly as input, you have to provide it in the ClustalW or Stockholm format.
- ClustalW files (File -> Load… -> RNA Alignment): with the classical format, without any structural information, Assemble2 will use the algorithm RNAalifold to predict a consensus 2D structure based on this alignment. It will then display the secondary structure for one of the sequences in the alignment. All the other sequences will be listed and displayed in the panels “Structural Alignment” and “Aligned RNAs”. If you add a consensus bracket notation at the end of the file, no computation will be done: Assemble2 will display the secondary structure for one of the sequences in the alignment, inferred from the consensus structure. Example of ClustalW data with structural information:
- Stockholm files (File -> Load… -> RNA Alignment): same behavior as with a ClustalW file storing structural information.
- Vienna files (File -> Load… -> RNA Secondary Structure): this format is known as “dot-bracket notation”. It is like a FASTA format in which each sequence is followed by a bracket notation describing its secondary structure. Consequently, no prediction is done: the secondary structure described in the file is displayed. Otherwise, Assemble2 behaves as with FASTA files: gaps are ignored and only one sequence is loaded if several are available.
- BPSEQ or CT files (File -> Load… -> RNA Secondary Structure): no secondary structure prediction will be done. Assemble2 will display the secondary structure described in the file. Only one secondary structure can be described in such files.
- PDB files (File -> Load… -> RNA Tertiary Structure): Assemble2 will use the algorithm RNAVIEW to annotate the 3D structure. Once done, the 2D structure will be displayed in the central panel and the 3D structure will be displayed in Chimera.
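To make the “dot-bracket notation” used by Vienna files more concrete, here is a minimal Python sketch that turns a bracket string into a list of base pairs. It is only an illustration: the function name `parse_dot_bracket` is made up for this example, it handles plain `(`, `)` and `.` characters only, and it says nothing about how Assemble2 itself reads these files.

```python
def parse_dot_bracket(structure: str) -> list[tuple[int, int]]:
    """Return the base pairs of a dot-bracket string as 0-based (i, j) tuples.

    Hypothetical helper for illustration; not part of Assemble2.
    """
    open_positions = []  # stack of positions of unmatched '('
    pairs = []
    for pos, char in enumerate(structure):
        if char == "(":
            open_positions.append(pos)
        elif char == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {pos}")
            # The most recent '(' pairs with this ')'
            pairs.append((open_positions.pop(), pos))
        elif char != ".":
            raise ValueError(f"unexpected character {char!r} at position {pos}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# A 4-base-pair helix closed by a 4-residue loop
print(parse_dot_bracket("((((....))))"))
```

Each `(` is paired with the closest unmatched `)` to its right, which is why a simple stack is enough for structures without pseudoknots.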
I have added a new option to configure some of these algorithms (“File -> Configure… -> RNA algorithms”). For now, you can change the number of suboptimal structures computed with RNAsubopt. You can also choose whether or not to use mlocarna for the computation of the folding landscape (useful if you have too many and/or too long sequences).
Once your data are loaded into Assemble2, you have other ways to generate new secondary structures. You can edit the secondary structure displayed in the central panel to add or remove helices and tertiary interactions interactively. You can also edit the consensus bracket notation in the panel “Structural Alignment” by clicking on its characters. This allows you to infer a new secondary structure from this consensus for any sequence in the alignment.
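The idea of inferring a secondary structure from a consensus bracket notation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Assemble2’s actual procedure: the name `project_consensus` and the gap characters in `GAP_CHARS` are hypothetical, a consensus pair is simply dropped when one of its two columns is a gap in the chosen sequence, and no check is made that the two residues can really pair.

```python
GAP_CHARS = "-."  # assumed gap symbols in the aligned sequence


def project_consensus(aligned_seq: str, consensus: str) -> tuple[str, str]:
    """Project a consensus bracket notation onto one aligned sequence.

    Returns (ungapped sequence, bracket notation for that sequence).
    Hypothetical helper for illustration; not part of Assemble2.
    """
    if len(aligned_seq) != len(consensus):
        raise ValueError("sequence and consensus must have the same length")

    # Map each non-gap alignment column to its position in the ungapped sequence
    column_to_position = {}
    residues = []
    for column, char in enumerate(aligned_seq):
        if char not in GAP_CHARS:
            column_to_position[column] = len(residues)
            residues.append(char)

    structure = ["."] * len(residues)
    open_columns = []  # stack of columns holding an unmatched '('
    for column, char in enumerate(consensus):
        if char == "(":
            open_columns.append(column)
        elif char == ")":
            if not open_columns:
                raise ValueError(f"unmatched ')' at column {column}")
            open_column = open_columns.pop()
            # Keep the pair only if both partners exist in this sequence
            if open_column in column_to_position and column in column_to_position:
                structure[column_to_position[open_column]] = "("
                structure[column_to_position[column]] = ")"
    if open_columns:
        raise ValueError(f"unmatched '(' at column {open_columns[-1]}")

    return "".join(residues), "".join(structure)


sequence, brackets = project_consensus("GG-AAAC-C", "(((...)))")
print(sequence)
print(brackets)
```

In this example two of the three consensus pairs involve a gap column, so only the outermost pair survives in the structure inferred for that sequence.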
All these behaviors are summarized in the following sketch and are available in the new daily build (Oct 15, 2013). More algorithms will be added to offer you more options.