ORFget is a tool provided with ORFtrack that allows the user to extract the protein and/or nucleotide sequences of specific subsets of ORFs according to their annotation categories (see here for a description of all ORF categories). ORFget works with annotation patterns, so ORFs can easily be selected at different levels of annotation detail.
ORFget has two principal options:
-features_include
: list of motifs used to define the ORFs that will be included in the output fasta file. The sequences whose annotations include these patterns will be retained in the output fasta file.
-features_exclude
: list of motifs used to define the ORFs that will be excluded from the output fasta file. The sequences whose annotations include these patterns will not be written in the output fasta file.
The searched patterns can be specific (for a finer selection) or more general.
Note
For example, the motif "nc", which refers to all noncoding ORFs, appears in the features nc_intergenic, nc_ovp_same-mRNA, nc_ovp_opp-mRNA and nc_ovp_same-tRNA.
As a result, the option -feature_include nc will keep all four feature categories.
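Among the four features listed above:
The option -feature_include nc_ovp will keep: nc_ovp_same-mRNA, nc_ovp_opp-mRNA and nc_ovp_same-tRNA.
-feature_include nc_ovp_same will keep: nc_ovp_same-mRNA and nc_ovp_same-tRNA.
-feature_include mRNA will keep: nc_ovp_same-mRNA and nc_ovp_opp-mRNA.
-feature_exclude opp will eliminate nc_ovp_opp-mRNA and will keep: nc_intergenic, nc_ovp_same-mRNA and nc_ovp_same-tRNA.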
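The selection logic in the examples above can be sketched in a few lines of Python. This is only an illustration, not ORFget's actual code: it assumes plain substring matching of each motif against the feature name, and the way the include and exclude lists are combined is also an assumption. The helper select_features is hypothetical.

```python
# The four noncoding feature names used in the examples above.
FEATURES = [
    "nc_intergenic",
    "nc_ovp_same-mRNA",
    "nc_ovp_opp-mRNA",
    "nc_ovp_same-tRNA",
]


def select_features(features, include=(), exclude=()):
    """Hypothetical helper: filter feature names by motif.

    Assumption: a motif matches when it is a substring of the feature name.
    A feature is kept if it matches at least one include motif (or if no
    include motif is given) and matches none of the exclude motifs.
    """
    kept = []
    for feature in features:
        if include and not any(motif in feature for motif in include):
            continue
        if any(motif in feature for motif in exclude):
            continue
        kept.append(feature)
    return kept


print(select_features(FEATURES, include=["nc_ovp_same"]))
print(select_features(FEATURES, exclude=["opp"]))
```

Trying a motif against the list of feature names in this way, before running the extraction, is a quick check of how broad or narrow a pattern is.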
Note
ORFget can also handle gff files that were not generated by ORFtrack. In this case, the user must make sure that the feature names given to the -feature_include/exclude options correspond to those indicated in the 3rd column of the input gff file.
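To check which feature names a gff file actually contains, the 3rd column can be tallied with a short script. The sketch below is not part of ORFget. It assumes a standard tab-separated gff file in which comment lines start with "#". The function count_gff_features and the example records are hypothetical.

```python
from collections import Counter


def count_gff_features(lines):
    """Count the feature names found in the 3rd column of gff lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip empty lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        columns = line.split("\t")
        if len(columns) >= 3:
            counts[columns[2]] += 1
    return counts


# Made-up records for illustration only; with a real file, pass an open
# file handle instead, e.g. count_gff_features(open("my_annotation.gff")).
example = [
    "##gff-version 3",
    "chrI\tsource\tnc_intergenic\t10\t90\t.\t+\t.\tID=orf1",
    "chrI\tsource\tnc_ovp_same-mRNA\t120\t300\t.\t+\t.\tID=orf2",
    "chrI\tsource\tnc_intergenic\t400\t520\t.\t-\t.\tID=orf3",
]

for feature, n in sorted(count_gff_features(example).items()):
    print(feature, n)
```

The names printed by such a script are the strings that the include and exclude motifs are matched against.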