Plot of the ORFold output

The output table generated by ORFold can subsequently be given to ORFplot to generate a plot of the HCA score distribution. The user can provide several tables in order to compare different HCA score distributions. In this case, ORFplot draws all the distributions on the same plot (the tables must be given with the --tab option).

The HCA score distribution of a set of globular proteins extracted from [1] is represented by the grey histogram. We defined three sequence categories according to their HCA scores: low, intermediate and high HCA score sequences. The boundaries of these categories are defined so that 95% of the globular proteins fall into the intermediate HCA score bin. Dotted black lines delineate the boundaries of each category.

Each plotted distribution is compared with that of the globular protein set using a Kolmogorov-Smirnov test. Asterisks on the plot denote the level of significance: * p < 0.05, ** p < 0.01, *** p < 0.001. By default, the names used in the legend of the resulting plot are the root names of the input table files. However, the user can supply their own legend names with the --labels option. The names must be given in the same order as the table files.

When running orfplot inside a Singularity container, you may need to set the QT_QPA_PLATFORM=offscreen environment variable to avoid Qt-related errors. This step is not required when using Dock>
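As a minimal sketch, the variable can be set for the current shell session before calling orfplot. This assumes a POSIX-compatible shell and that variables exported in the calling shell are passed on to the orfplot process; neither point is stated in the text above.

```shell
# Assumption: a POSIX-compatible shell, and that variables exported here
# are inherited by the orfplot process started from this session.
export QT_QPA_PLATFORM=offscreen

# Confirm the value that child processes will see.
sh -c 'echo "QT_QPA_PLATFORM=$QT_QPA_PLATFORM"'   # prints QT_QPA_PLATFORM=offscreen
```

Alternatively, standard shell syntax allows the assignment to be written in front of a single command (QT_QPA_PLATFORM=offscreen followed by the orfplot call), which limits the setting to that one invocation.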

orfplot --tab sequences_Y.tab sequences_X.tab sequences_Z.tab --singularity 

This example plots the HCA score distributions of the sequences stored in the sequences_Y.tab, sequences_X.tab and sequences_Z.tab files. The legend labels will be sequences_Y, sequences_X and sequences_Z, respectively.
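The default legend labels can be previewed from the shell before plotting. The sketch below assumes that "root name" means the file name with its .tab extension removed; root_name is a hypothetical helper written for this illustration and is not part of ORFplot.

```shell
# Hypothetical helper (not part of ORFplot): derive the default legend
# label from a table file name, assuming the root name is the file name
# without its .tab extension.
root_name() {
    basename "$1" .tab
}

for table in sequences_Y.tab sequences_X.tab sequences_Z.tab; do
    root_name "$table"
done
# prints:
# sequences_Y
# sequences_X
# sequences_Z
```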

orfplot --tab sequences_Y.tab sequences_X.tab sequences_Z.tab --labels Noncoding Coding Translated --singularity 

This example plots the HCA score distributions of the sequences stored in the sequences_Y.tab, sequences_X.tab and sequences_Z.tab files. The legend labels will be "Noncoding", "Coding" and "Translated", respectively.

Note

If the names consist of single words, the user can write them one after the other, as shown in the example above. However, if the user wishes to use multiple words in a legend label (e.g. Noncoding sequences - Homo sapiens, Coding sequences - Homo sapiens, Translated sequences - Homo sapiens), each label must be enclosed in double quotes.

     orfplot --tab sequences_Y.tab sequences_X.tab sequences_Z.tab --labels "Noncoding sequences - Homo sapiens" "Coding sequences - Homo sapiens" "Translated sequences - Homo sapiens"  --singularity
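The quotes matter because the shell splits unquoted text on spaces before the program sees it. The sketch below illustrates this with count_args, a hypothetical helper written for this example (not part of ORFplot) that reports how many separate arguments the shell passes on.

```shell
# Hypothetical helper (not part of ORFplot): print the number of
# arguments received after the shell has split the command line.
count_args() {
    echo "$#"
}

count_args Noncoding Coding Translated            # prints 3: three single-word labels
count_args "Noncoding sequences - Homo sapiens"   # prints 1: one quoted multi-word label
count_args Noncoding sequences - Homo sapiens     # prints 5: unquoted words are split apart
```

Without quotes, a single multi-word label would therefore arrive as five separate words, and the number of labels would no longer match the number of table files.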

References

  1. Mészáros B, Erdős G, Dosztányi Z (2018) IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Research 46:W329–W337