Plot of the ORFold output

The output table generated by ORFold can then be passed to ORFplot to plot the distribution of HCA scores. Several tables can be provided to compare different HCA score distributions. In that case, ORFplot draws all the distributions on the same plot. The tables must be given with the -tab option.

On the plot, the grey histogram represents the HCA score distribution of a set of globular proteins extracted from [1]. We defined three sequence categories according to their HCA scores: low, intermediate and high HCA score sequences. The category boundaries are set so that 95% of the globular proteins fall into the intermediate HCA score category. Dotted black lines mark these boundaries.
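The category boundaries can be illustrated with a minimal sketch. It assumes that the 5% of globular proteins left outside the intermediate category are split equally between the two tails (2.5th and 97.5th percentiles) and that percentiles are computed by linear interpolation; neither detail is stated above. The helpers `percentile`, `category_boundaries` and `hca_category` are hypothetical and are not part of ORFplot.

```python
import math


def percentile(values, q):
    """Percentile with linear interpolation between closest ranks (0 <= q <= 1)."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def category_boundaries(globular_scores, coverage=0.95):
    """Return (low_cut, high_cut) enclosing `coverage` of the globular scores.

    Assumption: the excluded fraction is split equally between both tails
    (2.5% below low_cut and 2.5% above high_cut when coverage=0.95).
    """
    tail = (1.0 - coverage) / 2.0
    return percentile(globular_scores, tail), percentile(globular_scores, 1.0 - tail)


def hca_category(score, low_cut, high_cut):
    """Assign a sequence to the low, intermediate or high HCA score category."""
    if score < low_cut:
        return "low"
    if score > high_cut:
        return "high"
    return "intermediate"


# Toy scores for illustration only, not the globular protein set from [1].
toy_globular = [i / 10 for i in range(-100, 101)]
low_cut, high_cut = category_boundaries(toy_globular)
print(low_cut, high_cut, hca_category(0.0, low_cut, high_cut))
```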

Each plotted distribution is compared with that of the globular protein set using a two-sample Kolmogorov-Smirnov test. Asterisks on the plot denote the significance level: * p < 0.05, ** p < 0.01, *** p < 0.001. By default, the legend labels are the root names of the input table files. Custom labels can be given with the -names option, in the same order as the table files.
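The comparison and the asterisk code can be sketched with the standard library only. This is an illustration, not ORFplot's implementation: it computes the two-sample Kolmogorov-Smirnov statistic D as the largest gap between the two empirical CDFs, then approximates the p-value with the classic asymptotic series for the Kolmogorov distribution (with a small-sample correction). ORFplot may compute the p-value differently, for example with an exact method from a statistics library. The function names are hypothetical.

```python
import bisect
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    # The gap between two empirical CDFs can only peak at an observed value.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_pvalue(d, n, m, max_terms=100):
    """Asymptotic p-value: Q(lam) = 2 * sum_k (-1)^(k-1) * exp(-2 k^2 lam^2)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    total = 0.0
    for k in range(1, max_terms + 1):
        term = 2.0 * (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
        total += term
        if abs(term) < 1e-10:
            return min(max(total, 0.0), 1.0)
    # No convergence means lam is tiny: the distributions are indistinguishable.
    return 1.0


def significance_stars(p):
    """Map a p-value to the asterisk code used on the plot."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

In practice `sample_a` would hold the HCA scores of one ORFold table and `sample_b` those of the globular protein set; reading the tables is left out because their column layout is not described in this section.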

Please note that the inputs (i.e. the tables output by ORFold) must be stored in the /workdir/orfold/ directory of the container.

    orfplot -tab /workdir/orfold/sequences_Y.tab /workdir/orfold/sequences_X.tab /workdir/orfold/sequences_Z.tab

This command plots the HCA score distributions of the sequences stored in the sequences_Y.tab, sequences_X.tab and sequences_Z.tab files. The legend labels will be sequences_Y, sequences_X and sequences_Z, respectively.

    orfplot -tab /workdir/orfold/sequences_Y.tab /workdir/orfold/sequences_X.tab /workdir/orfold/sequences_Z.tab -names Noncoding Coding Translated

This command plots the same three HCA score distributions, but the legend labels will be "Noncoding", "Coding" and "Translated", respectively.
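The legend rule described above can be sketched as follows. The helper `legend_labels` is hypothetical: it assumes that the "root name" of a table is its file name without directory and extension, and it raises an error when the number of names differs from the number of tables, since labels are matched to tables by position. How ORFplot itself handles a count mismatch is not documented here.

```python
import os


def legend_labels(table_paths, names=None):
    """Hypothetical helper mirroring the documented legend rule."""
    if names is None:
        # Default: root name of each table file, e.g. ".../sequences_Y.tab" -> "sequences_Y".
        return [os.path.splitext(os.path.basename(path))[0] for path in table_paths]
    if len(names) != len(table_paths):
        # Labels are matched to tables by position, so the counts must agree.
        raise ValueError("expected one name per table file")
    return list(names)


tables = [
    "/workdir/orfold/sequences_Y.tab",
    "/workdir/orfold/sequences_X.tab",
    "/workdir/orfold/sequences_Z.tab",
]
print(legend_labels(tables))
print(legend_labels(tables, ["Noncoding", "Coding", "Translated"]))
```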

All plots are saved in the /workdir/orfold/ directory of the container.

Note

If each label is a single word, the names can simply be listed one after the other, as in the example above. However, labels made of several words (e.g. Noncoding sequences - Homo sapiens, Coding sequences - Homo sapiens, Translated sequences - Homo sapiens) must be enclosed in double quotes:

    orfplot -tab /workdir/orfold/sequences_Y.tab /workdir/orfold/sequences_X.tab /workdir/orfold/sequences_Z.tab -names "Noncoding sequences - Homo sapiens" "Coding sequences - Homo sapiens" "Translated sequences - Homo sapiens"
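When the command is assembled from a script, Python's standard `shlex.join` (Python 3.8+) can handle the quoting of multi-word labels. This is a minimal sketch; `build_orfplot_command` is a hypothetical helper, and it only builds the command string without running ORFplot. Note that `shlex.join` quotes with single quotes rather than double quotes; for labels such as these, a POSIX shell treats both forms the same way.

```python
import shlex


def build_orfplot_command(table_paths, names):
    """Build an orfplot command line in which multi-word labels are safely quoted."""
    argv = ["orfplot", "-tab", *table_paths, "-names", *names]
    # shlex.join quotes any argument containing spaces, so each label stays one argument.
    return shlex.join(argv)


tables = [
    "/workdir/orfold/sequences_Y.tab",
    "/workdir/orfold/sequences_X.tab",
    "/workdir/orfold/sequences_Z.tab",
]
labels = [
    "Noncoding sequences - Homo sapiens",
    "Coding sequences - Homo sapiens",
    "Translated sequences - Homo sapiens",
]
print(build_orfplot_command(tables, labels))
```

Alternatively, passing the `argv` list directly to `subprocess.run` without a shell avoids quoting altogether, because each list element is handed to the program as a single argument.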

References

  1. Mészáros B, Erdős G, Dosztányi Z (2018) IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Research 46:W329–W337