Process definitions

ATAC-Seq

data:workflow:atacseq workflow-atac-seq (data:reads:fastq reads, data:genome:fasta genome, data:bed promoter, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:boolean tagalign, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff) [Source: v2.0.2]

This ATAC-seq pipeline closely follows the official ENCODE DCC pipeline. It comprises three steps: alignment, pre-peakcall QC, and peak calling (with post-peakcall QC). First, reads are aligned to a genome using the [Bowtie2](http://bowtie-bio.sourceforge.net/index.shtml) aligner. Next, pre-peakcall QC metrics are calculated. The QC report contains the QC metrics proposed by ENCODE 3: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). Finally, peaks are called using [MACS2](https://github.com/taoliu/MACS/). The post-peakcall QC report includes additional QC metrics: number of peaks, fraction of reads in peaks (FRiP), number of reads in peaks, and, if a promoter regions BED file is provided, number of reads in promoter regions, fraction of reads in promoter regions, number of peaks in promoter regions, and fraction of peaks in promoter regions.
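As a rough illustration of the library-complexity metrics mentioned above, the sketch below computes NRF, PBC1 and PBC2 following the ENCODE definitions (distinct locations over total reads, locations with exactly one read over distinct locations, and locations with exactly one read over locations with exactly two reads). The function name and the input representation, one `(chrom, position, strand)` tuple per mapped read, are assumptions made for this example and are not taken from the pipeline's code.

```python
from collections import Counter


def library_complexity(read_locations):
    """Return (NRF, PBC1, PBC2) for an iterable of hashable read locations."""
    counts = Counter(read_locations)
    total_reads = sum(counts.values())
    distinct = len(counts)                            # locations with >= 1 read
    m1 = sum(1 for c in counts.values() if c == 1)    # locations with exactly 1 read
    m2 = sum(1 for c in counts.values() if c == 2)    # locations with exactly 2 reads
    nrf = distinct / total_reads if total_reads else float("nan")
    pbc1 = m1 / distinct if distinct else float("nan")
    pbc2 = m1 / m2 if m2 else float("inf")
    return nrf, pbc1, pbc2


# Hypothetical toy input: four reads, two of them at the same location.
reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 250, "-"), ("chr2", 40, "+")]
nrf, pbc1, pbc2 = library_complexity(reads)
print(f"NRF={nrf:.2f} PBC1={pbc1:.2f} PBC2={pbc2:.2f}")
```

Values close to 1 for NRF and PBC1 indicate a complex library with few duplicated locations.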

reads

label:	Select sample(s)
type:	`data:reads:fastq`

genome

label:	Genome
type:	`data:genome:fasta`

promoter

label:	Promoter regions BED file
type:	`data:bed`
description:	BED file containing promoter regions (for example, TSS ± 1000 bp). Needed to get the number of peaks and reads mapped to promoter regions.
required:	False

alignment.mode

label:	Alignment mode
type:	`basic:string`
description:	End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
default:	`--local`
choices:	end to end mode: `--end-to-end` local: `--local`

alignment.speed

label:	Speed vs. Sensitivity
type:	`basic:string`
default:	`--sensitive`
choices:	Very fast: `--very-fast` Fast: `--fast` Sensitive: `--sensitive` Very sensitive: `--very-sensitive`

alignment.PE_options.use_se

label:	Map as single-ended (for paired-end reads only)
type:	`basic:boolean`
description:	If this option is selected, paired-end reads will be mapped as single-ended and other paired-end options are ignored.
default:	`False`

alignment.PE_options.discordantly

label:	Report discordantly matched read
type:	`basic:boolean`
description:	If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance), the alignment will still be reported. Useful for detecting structural variations.
default:	`True`

alignment.PE_options.rep_se

label:	Report single ended
type:	`basic:boolean`
description:	If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.
default:	`True`

alignment.PE_options.minins

label:	Minimal distance
type:	`basic:integer`
description:	The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.
default:	`0`

alignment.PE_options.maxins

label:	Maximal distance
type:	`basic:integer`
description:	The maximum fragment length for valid paired-end alignments.
default:	`2000`

alignment.start_trimming.trim_5

label:	Bases to trim from 5’
type:	`basic:integer`
description:	Number of bases to trim from the 5’ (left) end of each read before alignment.
default:	`0`

alignment.start_trimming.trim_3

label:	Bases to trim from 3’
type:	`basic:integer`
description:	Number of bases to trim from the 3’ (right) end of each read before alignment.
default:	`0`

alignment.trimming.trim_iter

label:	Iterations
type:	`basic:integer`
description:	Number of iterations.
default:	`0`

alignment.trimming.trim_nucl

label:	Bases to trim
type:	`basic:integer`
description:	Number of bases to trim from 3’ end in each iteration.
default:	`2`

alignment.reporting.rep_mode

label:	Report mode
type:	`basic:string`
description:	Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments
default:	`def`
choices:	Default mode: `def` -k mode: `k` -a mode (very slow): `a`

alignment.reporting.k_reports

label:	Number of reports (for -k mode only)
type:	`basic:integer`
description:	Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first.
default:	`5`
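The alignment options above correspond to standard Bowtie 2 command-line flags. The following sketch shows one way they could be assembled into a command; the function name, argument layout and the mapping itself are assumptions for illustration, and the actual process may build its command differently (the iterative trimming options are not covered here).

```python
def bowtie2_command(index, mate1, mate2=None, *, mode="--local", speed="--sensitive",
                    use_se=False, discordantly=True, rep_se=True,
                    minins=0, maxins=2000, trim_5=0, trim_3=0,
                    rep_mode="def", k_reports=5):
    """Build a Bowtie 2 argument list from the documented option values."""
    cmd = ["bowtie2", mode, speed, "-x", index,
           "--trim5", str(trim_5), "--trim3", str(trim_3)]
    if rep_mode == "k":          # report up to k_reports alignments per read
        cmd += ["-k", str(k_reports)]
    elif rep_mode == "a":        # report all alignments (very slow)
        cmd.append("-a")
    if mate2 is not None and not use_se:
        cmd += ["-1", mate1, "-2", mate2, "-I", str(minins), "-X", str(maxins)]
        if not discordantly:     # suppress discordant pair alignments
            cmd.append("--no-discordant")
        if not rep_se:           # suppress alignments of individual mates
            cmd.append("--no-mixed")
    else:
        # Single-end mapping; paired-end options are ignored.
        reads = mate1 if mate2 is None else f"{mate1},{mate2}"
        cmd += ["-U", reads]
    return cmd


# Hypothetical file names, using the documented defaults.
print(" ".join(bowtie2_command("genome_index", "reads_1.fastq.gz", "reads_2.fastq.gz")))
```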

prepeakqc_settings.q_threshold

label:	Quality filtering threshold
type:	`basic:integer`
default:	`30`

prepeakqc_settings.n_sub

label:	Number of reads to subsample
type:	`basic:integer`
default:	`25000000`

prepeakqc_settings.tn5

label:	TN5 shifting
type:	`basic:boolean`
description:	Tn5 transposon shifting. Shift reads on “+” strand by 4bp and reads on “-” strand by 5bp.
default:	`True`
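A minimal sketch of the Tn5 shift on BED-style (tagAlign) intervals is given below. It follows one common convention in which only the 5’ end of the read (the cut site) is moved: the start of “+” strand reads is increased by 4 bp and the end of “-” strand reads is decreased by 5 bp. Whether the process moves only the 5’ end or the whole interval is an assumption here, and the function name is hypothetical.

```python
def tn5_shift(chrom, start, end, strand):
    """Move the 5' end of a read to account for the Tn5 insertion offset."""
    if strand == "+":
        start += 4          # 5' end of a plus-strand read is its start
    elif strand == "-":
        end -= 5            # 5' end of a minus-strand read is its end
    return chrom, start, end, strand


print(tn5_shift("chr1", 100, 150, "+"))
print(tn5_shift("chr1", 100, 150, "-"))
```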

prepeakqc_settings.shift

label:	User-defined cross-correlation peak strandshift
type:	`basic:integer`
description:	If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
default:	`0`

settings.tagalign

label:	Use tagAlign files
type:	`basic:boolean`
description:	Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
default:	`True`

settings.duplicates

label:	Number of duplicates
type:	`basic:string`
description:	Controls how MACS treats duplicate tags at the exact same location, that is, the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The MACS default is to keep one tag at the same location.
required:	False
hidden:	settings.tagalign
choices:	1: `1` auto: `auto` all: `all`

settings.duplicates_prepeak

label:	Number of duplicates
type:	`basic:string`
description:	Controls how MACS treats duplicate tags at the exact same location, that is, the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The MACS default is to keep one tag at the same location; this field defaults to `all`.
required:	False
hidden:	!settings.tagalign
default:	`all`
choices:	1: `1` auto: `auto` all: `all`

settings.qvalue

label:	Q-value cutoff
type:	`basic:decimal`
description:	The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
required:	False
disabled:	settings.pvalue && settings.pvalue_prepeak

settings.pvalue

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
required:	False
disabled:	settings.qvalue
hidden:	settings.tagalign

settings.pvalue_prepeak

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
disabled:	settings.qvalue
hidden:	!settings.tagalign || settings.qvalue
default:	`0.01`

settings.cap_num

label:	Cap number of peaks by taking top N peaks
type:	`basic:integer`
description:	To keep all peaks set value to 0.
disabled:	settings.broad
default:	`300000`
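The capping step can be pictured as ranking the called peaks and keeping the top N. The sketch below assumes narrowPeak-style rows and ranks by the -log10(p-value) column (index 7); the ranking column and the function name are assumptions for illustration, since the text above does not state which score is used.

```python
def cap_peaks(peaks, cap_num=300000, score_column=7):
    """Keep the cap_num highest-scoring peaks; cap_num == 0 keeps all peaks."""
    if cap_num == 0:
        return list(peaks)
    ranked = sorted(peaks, key=lambda row: float(row[score_column]), reverse=True)
    return ranked[:cap_num]


# Hypothetical narrowPeak rows: chrom, start, end, name, score, strand,
# signalValue, -log10(pvalue), -log10(qvalue), summit offset.
peaks = [
    ("chr1", 100, 300, "peak_1", 50, ".", 4.2, 8.1, 5.0, 90),
    ("chr1", 900, 1200, "peak_2", 120, ".", 9.7, 25.3, 21.0, 140),
    ("chr2", 50, 260, "peak_3", 80, ".", 6.1, 14.9, 11.2, 75),
]
print([p[3] for p in cap_peaks(peaks, cap_num=2)])
```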

settings.mfold_lower

label:	MFOLD range (lower limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. The MACS default of 10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the `--extsize` parameter to continue the peak detection ONLY if `--fix-bimodal` is set.
required:	False

settings.mfold_upper

label:	MFOLD range (upper limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. The MACS default of 10,30 means using all regions that are not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the `--extsize` parameter to continue the peak detection ONLY if `--fix-bimodal` is set.
required:	False

settings.slocal

label:	Small local region
type:	`basic:integer`
description:	Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (`--slocal`) and 10000 bp for the large local region (`--llocal`), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.llocal

label:	Large local region
type:	`basic:integer`
description:	Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (`--slocal`) and 10000 bp for the large local region (`--llocal`), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.extsize

label:	extsize
type:	`basic:integer`
description:	While `--nomodel` is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when `--nomodel` is set or when MACS fails to build the model and `--fix-bimodal` is on.
default:	`150`

settings.shift

label:	Shift
type:	`basic:integer`
description:	Note: this is NOT the legacy `--shiftsize` option, which has been replaced by `--extsize`. You can set an arbitrary shift in bp here; use discretion when setting it to anything other than 0. When `--nomodel` is set, MACS will use this value to move cutting ends (5’) and then apply `--extsize` in the 5’->3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep it at 0 for ChIP-Seq datasets, or to set it to -1 * half of EXTSIZE together with the `--extsize` option for detecting enriched cutting loci, such as in certain DNAseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. The MACS2 default is 0; this process defaults to -75, which is -1 * half of the default extsize of 150.
default:	`-75`
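To make the interplay of shift and extsize concrete, the sketch below models the behaviour described above: the 5’ end is first moved by the shift value (negative values move it in the 3’->5’ direction) and then extended by extsize in the 5’->3’ direction. With the defaults of -75 and 150, the resulting fragment is centered on the cut site on both strands. This is a simplified model written for illustration, not MACS2 code, and the function name is hypothetical.

```python
def macs2_fragment(five_prime, strand, shift=-75, extsize=150):
    """Return (start, end) of the fragment built from a read's 5' end."""
    if strand == "+":
        # On the plus strand, 5'->3' points towards higher coordinates.
        start = five_prime + shift
        return start, start + extsize
    # On the minus strand, 5'->3' points towards lower coordinates.
    end = five_prime - shift
    return end - extsize, end


# A cut site at position 1000 gives the same window on either strand.
print(macs2_fragment(1000, "+"))
print(macs2_fragment(1000, "-"))
```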

settings.band_width

label:	Band width
type:	`basic:integer`
description:	The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
required:	False

settings.nolambda

label:	Use background lambda as local lambda
type:	`basic:boolean`
description:	With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:	`False`

settings.fix_bimodal

label:	Turn on the auto paired-peak model process
type:	`basic:boolean`
description:	Whether to turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the `--extsize` parameter to extend each tag. If not set, MACS will be terminated if building the paired-peak model fails.
default:	`False`

settings.nomodel

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	settings.tagalign
default:	`False`

settings.nomodel_prepeak

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	!settings.tagalign
default:	`True`

settings.down_sample

label:	Down-sample
type:	`basic:boolean`
description:	When set, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible, since random reads are selected each time and the numbers (pileup, p-value, q-value) would change. Consider using the ‘randsample’ script before running MACS2 instead.
default:	`False`

settings.bedgraph

label:	Save fragment pileup and control lambda
type:	`basic:boolean`
description:	If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
default:	`True`

settings.spmr

label:	Save signal per million reads for fragment pileup profiles
type:	`basic:boolean`
disabled:	settings.bedgraph === false
default:	`True`

settings.call_summits

label:	Call summits
type:	`basic:boolean`
description:	MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
default:	`True`

settings.broad

label:	Composite broad regions
type:	`basic:boolean`
description:	When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through `--broad-cutoff`. The maximum length of a broad region is 4 times d from MACS.
disabled:	settings.call_summits === true
default:	`False`

settings.broad_cutoff

label:	Broad cutoff
type:	`basic:decimal`
description:	Cutoff for broad region. This option is not available unless `--broad` is set. If `-p` is set, this is a p-value cutoff; otherwise, it is a q-value cutoff. The MACS2 default is 0.1.
required:	False
disabled:	settings.call_summits === true || settings.broad !== true
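The peak-calling settings above map onto standard `macs2 callpeak` flags. The sketch below shows how a subset of them might be combined, using the defaults that apply when tagAlign files are used. The function name, the selection of options and the precedence rules (a q-value, if given, is used instead of the p-value; `--call-summits` takes precedence over `--broad`) are assumptions derived from the `disabled` conditions listed above, not from the process source.

```python
def macs2_callpeak_command(treatment, name, *, qvalue=None, pvalue=0.01,
                           duplicates="all", nomodel=True, shift=-75, extsize=150,
                           bedgraph=True, spmr=True, call_summits=True,
                           broad=False, broad_cutoff=None):
    """Build a macs2 callpeak argument list from the documented settings."""
    cmd = ["macs2", "callpeak", "-t", treatment, "-n", name,
           "--keep-dup", str(duplicates)]
    if qvalue is not None:
        cmd += ["-q", str(qvalue)]
    elif pvalue is not None:
        cmd += ["-p", str(pvalue)]
    if nomodel:
        cmd += ["--nomodel", "--shift", str(shift), "--extsize", str(extsize)]
    if bedgraph:
        cmd.append("-B")
        if spmr:                      # only meaningful together with -B
            cmd.append("--SPMR")
    if call_summits:
        cmd.append("--call-summits")
    elif broad:
        cmd.append("--broad")
        if broad_cutoff is not None:
            cmd += ["--broad-cutoff", str(broad_cutoff)]
    return cmd


# Hypothetical input file and sample name.
print(" ".join(macs2_callpeak_command("sample.tagAlign.gz", "sample")))
```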

Abstract alignment process

data:alignment abstract-alignment () [Source: v1.0.0]

bam

label:	Alignment file
type:	`basic:file`

bai

label:	Alignment index BAI
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Abstract annotation process

data:annotation abstract-annotation () [Source: v1.0.0]

annot

label:	Uploaded file
type:	`basic:file`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Abstract bed process

data:bed abstract-bed () [Source: v1.0.0]

bed

label:	BED
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Abstract differential expression process

data:differentialexpression abstract-differentialexpression () [Source: v1.0.0]

raw

label:	Differential expression (gene level)
type:	`basic:file`

de_json

label:	Results table (JSON)
type:	`basic:json`

de_file

label:	Results table (file)
type:	`basic:file`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Abstract expression process

data:expression abstract-expression () [Source: v1.0.0]

exp

label:	Normalized expression
type:	`basic:file`

rc

label:	Read counts
type:	`basic:file`
required:	False

exp_json

label:	Expression (json)
type:	`basic:json`

exp_type

label:	Expression type
type:	`basic:string`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Accel Amplicon Pipeline

data:workflow:amplicon workflow-accel (data:reads:fastq:paired reads, data:genome:fasta genome, data:masterfile:amplicon master_file, data:seq:nucleotide adapters, list:data:variants:vcf known_indels, list:data:variants:vcf known_vars, data:variants:vcf dbsnp, basic:integer mbq, basic:integer stand_call_conf, basic:integer min_bq, basic:integer min_alt_bq, list:data:variants:vcf known_vars_db, basic:decimal af_threshold) [Source: v4.0.1]

Processing pipeline to analyse the Accel-Amplicon NGS panel data. The raw amplicon sequencing reads are quality trimmed using Trimmomatic. The quality of the raw and trimmed data is assessed using the FASTQC tool. Quality trimmed reads are aligned to a reference genome using BWA mem. Sequencing primers are removed from the aligned reads using Primerclip. Amplicon performance stats are calculated using the Bedtools coveragebed and Picard CollectTargetedPcrMetrics programs. Prior to variant calling, the alignment file is preprocessed using the GATK IndelRealigner and BaseRecalibrator tools. GATK HaplotypeCaller and Lofreq tools are used to call germline variants. Called variants are annotated using the SnpEff tool. Finally, the amplicon performance metrics and identified variants data are used to generate the PDF analysis report.

reads

label:	Input reads
type:	`data:reads:fastq:paired`

genome

label:	Genome
type:	`data:genome:fasta`

master_file

label:	Experiment Master file
type:	`data:masterfile:amplicon`

adapters

label:	Adapters
type:	`data:seq:nucleotide`
description:	Provide an Illumina sequencing adapters file (.fasta) with adapters to be removed by Trimmomatic.

preprocess_bam.known_indels

label:	Known indels
type:	`list:data:variants:vcf`

preprocess_bam.known_vars

label:	Known variants
type:	`list:data:variants:vcf`

gatk.dbsnp

label:	dbSNP
type:	`data:variants:vcf`

gatk.mbq

label:	Min Base Quality
type:	`basic:integer`
description:	Minimum base quality required to consider a base for calling.
default:	`20`

gatk.stand_call_conf

label:	Min call confidence threshold
type:	`basic:integer`
description:	The minimum phred-scaled confidence threshold at which variants should be called.
default:	`20`

lofreq.min_bq

label:	Min baseQ
type:	`basic:integer`
description:	Skip any base with baseQ smaller than this value.
default:	`20`

lofreq.min_alt_bq

label:	Min alternate baseQ
type:	`basic:integer`
description:	Skip alternate bases with baseQ smaller than this value.
default:	`20`

var_annot.known_vars_db

label:	Known variants
type:	`list:data:variants:vcf`

report.af_threshold

label:	Allele frequency threshold
type:	`basic:decimal`
default:	`0.01`

Align (BWA) and trim adapters

data:alignment:bam:bwatrim align-bwa-trim (data:masterfile:amplicon master_file, data:genome:fasta genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr) [Source: v1.2.2]

Align with BWA mem and trim the SAM output. The process uses the memory-optimized Primertrim tool.

master_file

label:	Master file
type:	`data:masterfile:amplicon`
description:	Amplicon experiment design file that holds the information about the primers to be removed.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

seed_l

label:	Minimum seed length
type:	`basic:integer`
description:	Minimum seed length. Matches shorter than the minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
default:	`19`

band_w

label:	Band width
type:	`basic:integer`
description:	Gaps longer than this will not be found.
default:	`100`

re_seeding

label:	Re-seeding factor
type:	`basic:decimal`
description:	Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
default:	`1.5`

m

label:	Mark shorter split hits as secondary
type:	`basic:boolean`
description:	Mark shorter split hits as secondary (for Picard compatibility)
default:	`False`

scoring.match

label:	Score of a match
type:	`basic:integer`
default:	`1`

scoring.missmatch

label:	Mismatch penalty
type:	`basic:integer`
default:	`4`

scoring.gap_o

label:	Gap open penalty
type:	`basic:integer`
default:	`6`

scoring.gap_e

label:	Gap extension penalty
type:	`basic:integer`
default:	`1`

scoring.clipping

label:	Clipping penalty
type:	`basic:integer`
description:	Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)
default:	`5`

scoring.unpaired_p

label:	Penalty for an unpaired read pair
type:	`basic:integer`
description:	Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty
default:	`9`

reporting.report_all

label:	Report all found alignments
type:	`basic:boolean`
description:	Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.
default:	`False`

reporting.report_tr

label:	Report threshold score
type:	`basic:integer`
description:	Do not output alignments with a score lower than the given number. This option only affects output.
default:	`30`
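As a worked example of how the scoring parameters and the report threshold interact, the sketch below computes a simplified alignment score using the defaults listed above and BWA's affine gap model, in which a gap of length k costs `gap_o + k * gap_e`. Clipping and pairing penalties are left out, and the function names are hypothetical.

```python
def alignment_score(matches, mismatches, gap_lengths=(), *,
                    match=1, missmatch=4, gap_o=6, gap_e=1):
    """Score = matches * A - mismatches * B - sum over gaps of (O + k * E)."""
    gap_cost = sum(gap_o + k * gap_e for k in gap_lengths)
    return matches * match - mismatches * missmatch - gap_cost


def is_reported(score, report_tr=30):
    """Alignments scoring below the threshold are not written to the output."""
    return score >= report_tr


# A 100 bp read with two mismatches: 98 * 1 - 2 * 4 = 90, above the threshold of 30.
score = alignment_score(matches=98, mismatches=2)
print(score, is_reported(score))
```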

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Amplicon report

data:report:amplicon amplicon-report (data:picard:coverage pcr_metrics, data:coverage coverage, data:masterfile:amplicon master_file, list:data:snpeff annot_vars, basic:decimal af_threshold) [Source: v1.0.4]

Create amplicon report.

pcr_metrics

label:	Picard TargetedPcrMetrics
type:	`data:picard:coverage`

coverage

label:	Coverage
type:	`data:coverage`

master_file

label:	Amplicon master file
type:	`data:masterfile:amplicon`

annot_vars

label:	Annotated variants (snpEff)
type:	`list:data:snpeff`

af_threshold

label:	Allele frequency threshold
type:	`basic:decimal`
default:	`0.01`

report

label:	Report
type:	`basic:file`

panel_name

label:	Panel name
type:	`basic:string`

stats

label:	File with sample statistics
type:	`basic:file`

amplicon_cov

label:	Amplicon coverage file (nomergebed)
type:	`basic:file`

variant_tables

label:	Variant tables (snpEff)
type:	`list:basic:file`

Amplicon table

data:varianttable:amplicon amplicon-table (data:masterfile:amplicon master_file, data:coverage coverage, list:data:snpeff annot_vars, basic:boolean all_amplicons, basic:string table_name) [Source: v1.1.0]

Create variant table for use together with the genome browser.

master_file

label:	Master file
type:	`data:masterfile:amplicon`

coverage

label:	Amplicon coverage
type:	`data:coverage`

annot_vars

label:	Annotated variants
type:	`list:data:snpeff`

all_amplicons

label:	Report all amplicons
type:	`basic:boolean`
default:	`False`

table_name

label:	Amplicon table name
type:	`basic:string`
default:	`Amplicons containing variants`

variant_table

label:	Variant table
type:	`basic:json`

Annotate novel splice junctions (regtools)

data:junctions:regtools regtools-junctions-annotate (data:genome:fasta genome, data:annotation:gtf annotation, data:alignment:bam:star alignment_star, data:alignment:bam alignment, data:bed input_bed_junctions) [Source: v0.3.0]

Identify novel splice junctions by using regtools to annotate against a reference. The process accepts a reference genome, a reference genome annotation (GTF), and an input with reads information (a STAR alignment, reads aligned by any other aligner, or junctions in BED12 format). If STAR aligner data is given as input, the process calculates a BED12 file from the STAR ‘SJ.out.tab’ file and annotates all junctions with the ‘regtools junctions annotate’ command. When reads are aligned by another aligner, junctions are extracted with the ‘regtools junctions extract’ tool and then annotated with the ‘regtools junctions annotate’ command. The third option allows the user to provide a BED12 file with junctions directly, which are then annotated. Finally, annotated novel junctions are filtered into a separate output file. More information can be found in the [regtools manual](https://regtools.readthedocs.io/en/latest/).
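The final filtering step can be sketched as follows. The example assumes that the annotated table is tab-separated with a header row and contains a `known_junction` column in which `0` marks junctions absent from the reference annotation; both the column name and the function name are assumptions for illustration and should be checked against the regtools output actually produced.

```python
import csv
import io


def split_novel(annotated_tsv):
    """Return (all_rows, novel_rows) parsed from annotated junctions text."""
    reader = csv.DictReader(io.StringIO(annotated_tsv), delimiter="\t")
    rows = list(reader)
    novel = [row for row in rows if row["known_junction"] == "0"]
    return rows, novel


# Hypothetical, heavily abbreviated annotation table.
example = (
    "chrom\tstart\tend\tname\tstrand\tknown_junction\n"
    "chr1\t1000\t2000\tJUNC1\t+\t1\n"
    "chr1\t3000\t4500\tJUNC2\t+\t0\n"
)
all_rows, novel_rows = split_novel(example)
print(len(all_rows), [row["name"] for row in novel_rows])
```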

genome

label:	Reference genome
type:	`data:genome:fasta`

annotation

label:	Reference genome annotation (GTF)
type:	`data:annotation:gtf`

alignment_star

label:	STAR alignment
type:	`data:alignment:bam:star`
description:	Splice junctions detected by STAR aligner (SJ.out.tab STAR output file). Provide one of the inputs ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
required:	False

alignment

label:	Alignment
type:	`data:alignment:bam`
description:	Aligned reads from which splice junctions are going to be extracted. Provide one of the inputs ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
required:	False

input_bed_junctions

label:	Junctions in BED12 format
type:	`data:bed`
description:	Splice junctions in BED12 format. Provide one of the inputs ‘STAR alignment’, ‘Alignment’ (from any aligner), or ‘Junctions in BED12 format’.
required:	False

novel_splice_junctions

label:	Table of annotated novel splice junctions
type:	`basic:file`

splice_junctions

label:	Table of annotated splice junctions
type:	`basic:file`

novel_sj_bed

label:	Novel splice junctions in BED format
type:	`basic:file`

bed

label:	Splice junctions in BED format
type:	`basic:file`

novel_sj_bigbed_igv_ucsc

label:	Novel splice junctions in BigBed format
type:	`basic:file`
required:	False

bigbed_igv_ucsc

label:	Splice junctions in BigBed format
type:	`basic:file`
required:	False

novel_sj_tbi_jbrowse

label:	Novel splice junctions bed tbi index for JBrowse
type:	`basic:file`

tbi_jbrowse

label:	Bed tbi index for JBrowse
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Archive and make multi-sample report for amplicon data

data:archive:samples:amplicon amplicon-archive-multi-report (list:data data, list:basic:string fields, basic:boolean j) [Source: v0.2.5]

Create an archive of output files. The output folder structure is organized by sample slug and data object’s output-field names. Additionally, create a multi-sample report for the selected samples.

data

label:	Data list
type:	`list:data`

fields

label:	Output file fields
type:	`list:basic:string`

j

label:	Junk paths
type:	`basic:boolean`
description:	Store just names of saved files (junk the path)
default:	`False`

archive

label:	Archive of selected samples and a heatmap comparing them
type:	`basic:file`

Archive samples

data:archive:samples archive-samples (list:data data, list:basic:string fields, basic:boolean j) [Source: v0.3.0]

Create an archive of output files. The output folder structure is organized by sample slug and data object’s output-field names.

data

label:	Data list
type:	`list:data`

fields

label:	Output file fields
type:	`list:basic:string`

j

label:	Junk paths
type:	`basic:boolean`
description:	Store just names of saved files (junk the path)
default:	`False`

archive

label:	Archive
type:	`basic:file`
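The effect of the ‘Junk paths’ option on the archive layout can be illustrated with a small sketch. The exact folder layout used by the process is not spelled out above, so the `<sample slug>/<output field>/<file name>` structure, the function name and the example paths below are assumptions for illustration only.

```python
from pathlib import PurePosixPath


def archive_member_path(sample_slug, field_name, file_path, junk_paths=False):
    """Return the path a file would get inside the archive."""
    filename = PurePosixPath(file_path).name
    if junk_paths:
        # Store just the file name, discarding all directories.
        return filename
    return str(PurePosixPath(sample_slug) / field_name / filename)


print(archive_member_path("sample-1", "bam", "/data/42/alignment.bam"))
print(archive_member_path("sample-1", "bam", "/data/42/alignment.bam", junk_paths=True))
```

Note that with junk paths enabled, files with the same name from different samples would collide in this simplified layout.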

BAM file

data:alignment:bam:upload upload-bam (basic:file src, basic:string species, basic:string build) [Source: v1.5.0]

Import a BAM file (.bam), which is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).

src

label:	Mapping (BAM)
type:	`basic:file`
description:	A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.
validate_regex:	`\.(bam)$`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Build
type:	`basic:string`

bam

label:	Uploaded file
type:	`basic:file`

bai

label:	Index BAI
type:	`basic:file`

stats

label:	Alignment statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

BAM file and index

data:alignment:bam:upload upload-bam-indexed (basic:file src, basic:file src2, basic:string species, basic:string build) [Source: v1.5.0]

Import a BAM file (.bam) and a BAM index (.bam.bai). BAM is the binary format for storing sequence alignment data. This format is described on the [SAM Tools web site](http://samtools.github.io/hts-specs/).

src

label:	Mapping (BAM)
type:	`basic:file`
description:	A mapping file in BAM format.
validate_regex:	`\.(bam)$`

src2

label:	bam index (*.bam.bai file)
type:	`basic:file`
description:	An index file of a BAM mapping file (ending with bam.bai).
validate_regex:	`\.(bam.bai)$`
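The two `validate_regex` patterns can be tried out with Python's `re` module, as in the sketch below. It assumes the patterns are applied as a search against the uploaded file name, which is not stated above. Note that the dot between `bam` and `bai` in the index pattern is unescaped, so it matches any single character.

```python
import re

BAM_PATTERN = re.compile(r"\.(bam)$")
BAI_PATTERN = re.compile(r"\.(bam.bai)$")


def is_valid(pattern, filename):
    """Return True if the file name matches the validation pattern."""
    return pattern.search(filename) is not None


for name in ("sample.bam", "sample.bam.bai", "sample.sam"):
    print(name, is_valid(BAM_PATTERN, name), is_valid(BAI_PATTERN, name))
```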

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Build
type:	`basic:string`

bam

label:	Uploaded file
type:	`basic:file`

bai

label:	Index BAI
type:	`basic:file`

stats

label:	Alignment statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

BBDuk (paired-end)

data:reads:fastq:paired:bbduk bbduk-paired (data:reads:fastq:paired reads, basic:integer min_length, basic:boolean show_advanced, list:data:seq:nucleotide sequences, list:basic:string literal_sequences, basic:integer kmer_length, basic:boolean check_reverse_complements, basic:boolean mask_middle_base, basic:integer min_kmer_hits, basic:decimal min_kmer_fraction, basic:decimal min_coverage_fraction, basic:integer hamming_distance, basic:integer query_hamming_distance, basic:integer edit_distance, basic:integer hamming_distance2, basic:integer query_hamming_distance2, basic:integer edit_distance2, basic:boolean forbid_N, basic:boolean remove_if_either_bad, basic:boolean find_best_match, basic:boolean perform_error_correction, basic:string k_trim, basic:string k_mask, basic:boolean mask_fully_covered, basic:integer min_k, basic:string quality_trim, basic:integer trim_quality, basic:integer trim_poly_A, basic:decimal min_length_fraction, basic:integer max_length, basic:integer min_average_quality, basic:integer min_average_quality_bases, basic:integer min_base_quality, basic:integer min_consecutive_bases, basic:integer trim_pad, basic:boolean trim_by_overlap, basic:boolean strict_overlap, basic:integer min_overlap, basic:integer min_insert, basic:boolean trim_pairs_evenly, basic:integer force_trim_left, basic:integer force_trim_right, basic:integer force_trim_right2, basic:integer force_trim_mod, basic:integer restrict_left, basic:integer restrict_right, basic:decimal min_GC, basic:decimal max_GC, basic:integer maxns, basic:boolean toss_junk, basic:boolean chastity_filter, basic:boolean barcode_filter, list:data:seq:nucleotide barcode_files, list:basic:string barcode_sequences, basic:integer x_min, basic:integer y_min, basic:integer x_max, basic:integer y_max, basic:decimal entropy, basic:integer entropy_window, basic:integer entropy_k, basic:boolean entropy_mask, basic:integer min_base_frequency, basic:boolean nogroup) [Source: v2.3.0]

BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.
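
The bracketed names in the parameter labels below (for example `minlength`, `k`, `ktrim`) are the BBDuk options that each input corresponds to. As a rough illustration of that mapping, the following sketch assembles a `bbduk.sh` argument list from a dictionary of such options. The function name, the file names, and the adapter sequence are hypothetical, and this is not necessarily how the process itself invokes BBDuk.

```python
import shlex


def build_bbduk_command(in1, in2, out1, out2, options):
    """Assemble a bbduk.sh argument list from key=value options (illustrative only)."""
    cmd = ["bbduk.sh", f"in={in1}", f"in2={in2}", f"out={out1}", f"out2={out2}"]
    for key, value in options.items():
        if value is None:
            # Optional input left unset, e.g. maxlength.
            continue
        if isinstance(value, bool):
            # BBDuk flags are written as t/f in the labels, e.g. rcomp=t.
            value = "t" if value else "f"
        elif isinstance(value, (list, tuple)):
            if not value:
                continue
            value = ",".join(value)
        cmd.append(f"{key}={value}")
    return cmd


# Hypothetical option values; keys follow the bracketed names in the labels.
options = {
    "minlength": 10,
    "k": 27,
    "rcomp": True,
    "maskmiddle": True,
    "ktrim": "r",
    "literal": ["AGATCGGAAGAGC"],  # illustrative adapter sequence
    "maxlength": None,
}
command = build_bbduk_command(
    "reads_1.fastq.gz", "reads_2.fastq.gz",
    "trimmed_1.fastq.gz", "trimmed_2.fastq.gz",
    options,
)
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids quoting problems if it is later passed to `subprocess.run`.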

reads

label:	Reads
type:	`data:reads:fastq:paired`

min_length

label:	Minimum length [minlength=10]
type:	`basic:integer`
description:	Reads shorter than the minimum length will be discarded after trimming.
default:	`10`

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

reference.sequences

label:	Sequences [ref]
type:	`list:data:seq:nucleotide`
description:	Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.
required:	False

reference.literal_sequences

label:	Literal sequences [literal]
type:	`list:basic:string`
description:	Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:	False
default:	`[]`

processing.kmer_length

label:	Kmer length [k=27]
type:	`basic:integer`
description:	Kmer length used for finding contaminants. Contaminants shorter than kmer length will not be found. Kmer length must be at least 1.
default:	`27`

processing.check_reverse_complements

label:	Look for reverse complements of kmers in addition to forward kmers [rcomp=t]
type:	`basic:boolean`
default:	`True`

processing.mask_middle_base

label:	Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors [maskmiddle=t]
type:	`basic:boolean`
default:	`True`
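
To illustrate what treating the middle base as a wildcard means, here is a minimal sketch in which the middle position of each kmer is replaced by a placeholder before comparison. The helper name and the use of `N` as the placeholder are assumptions made for illustration. BBDuk's internal representation is not described here.

```python
def mask_middle(kmer, wildcard="N"):
    """Replace the middle base of a kmer with a wildcard symbol."""
    mid = len(kmer) // 2
    return kmer[:mid] + wildcard + kmer[mid + 1:]


reference_kmer = "ACGTACG"
read_kmer = "ACGAACG"  # differs from the reference only at the middle base

# With masking, the two kmers compare as equal despite the mismatch.
print(mask_middle(reference_kmer) == mask_middle(read_kmer))
```

A single sequencing error at the middle position therefore no longer prevents a match, which is the sensitivity gain the label refers to.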

processing.min_kmer_hits

label:	Minimum number of kmer hits [minkmerhits=1]
type:	`basic:integer`
description:	Reads need at least this many matching kmers to be considered as matching the reference.
default:	`1`

processing.min_kmer_fraction

label:	Minimum kmer fraction [minkmerfraction=0.0]
type:	`basic:decimal`
description:	A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.
default:	`0.0`
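
The interaction between 'Minimum number of kmer hits' and 'Minimum kmer fraction' can be sketched as follows. The sketch assumes that a read of length L contains L - k + 1 kmers and that the fractional threshold is rounded up. Both the rounding and the function name are assumptions, not a statement of BBDuk's exact behavior.

```python
import math


def required_kmer_hits(read_length, k=27, min_kmer_hits=1, min_kmer_fraction=0.0):
    """Return the larger of the absolute and the fraction-derived hit thresholds."""
    total_kmers = max(read_length - k + 1, 0)
    hits_from_fraction = math.ceil(min_kmer_fraction * total_kmers)
    return max(min_kmer_hits, hits_from_fraction)


# A 100 bp read has 74 kmers of length 27.
print(required_kmer_hits(100))                         # defaults only
print(required_kmer_hits(100, min_kmer_fraction=0.1))  # fraction dominates
```

With the defaults, a single matching kmer is enough. Once the fraction is raised, the fraction-derived threshold takes over as soon as it exceeds the absolute one.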

processing.min_coverage_fraction

label:	Minimum coverage fraction [mincovfraction=0.0]
type:	`basic:decimal`
description:	A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.
default:	`0.0`

processing.hamming_distance

label:	Maximum Hamming distance for kmers (substitutions only) [hammingdistance=0]
type:	`basic:integer`
default:	`0`

processing.query_hamming_distance

label:	Hamming distance for query kmers [qhdist=0]
type:	`basic:integer`
default:	`0`

processing.edit_distance

label:	Maximum edit distance from reference kmers (substitutions and indels) [editdistance=0]
type:	`basic:integer`
default:	`0`
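
The difference between Hamming distance (substitutions only) and edit distance (substitutions and indels) can be seen in a small example. The two functions below are textbook definitions written for illustration. They show the distance notions themselves, not how BBDuk implements approximate kmer matching.

```python
def hamming_distance(a, b):
    """Count mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,             # deletion from a
                curr[j - 1] + 1,         # insertion into a
                prev[j - 1] + (x != y),  # substitution or match
            ))
        prev = curr
    return prev[-1]


# A one-base shift is expensive under Hamming distance but cheap under edit distance.
print(hamming_distance("ACGTAC", "CGTACA"))
print(edit_distance("ACGTAC", "CGTACA"))
```

This is why a non-zero edit distance tolerates indels that a Hamming distance of the same value would not.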

processing.hamming_distance2

label:	Hamming distance for short kmers when looking for shorter kmers [hammingdistance2=0]
type:	`basic:integer`
default:	`0`

processing.query_hamming_distance2

label:	Hamming distance for short query kmers when looking for shorter kmers [qhdist2=0]
type:	`basic:integer`
default:	`0`

processing.edit_distance2

label:	Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers [editdistance2=0]
type:	`basic:integer`
default:	`0`

processing.forbid_N

label:	Forbid matching of read kmers containing N [forbidn=f]
type:	`basic:boolean`
description:	By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.
default:	`False`

processing.remove_if_either_bad

label:	Remove both sequences of a paired-end read, if either of them is to be removed [removeifeitherbad=t]
type:	`basic:boolean`
default:	`True`

processing.find_best_match

label:	If multiple matches, associate read with sequence sharing most kmers [findbestmatch=t]
type:	`basic:boolean`
default:	`True`

processing.perform_error_correction

label:	Perform error correction with BBMerge prior to kmer operations [ecco=f]
type:	`basic:boolean`
default:	`False`

operations.k_trim

label:	Trimming protocol to remove bases matching reference kmers from reads [ktrim=f]
type:	`basic:string`
default:	`f`
choices:	Don’t trim: `f` Trim to the right: `r` Trim to the left: `l`

operations.k_mask

label:	Symbol to replace bases matching reference kmers [kmask=f]
type:	`basic:string`
description:	Allows any non-whitespace character other than t or f. Processes short kmers on both ends.
default:	`f`

operations.mask_fully_covered

label:	Only mask bases that are fully covered by kmers [maskfullycovered=f]
type:	`basic:boolean`
default:	`False`

operations.min_k

label:	Look for shorter kmers at read tips down to this length when k-trimming or masking [mink=0]
type:	`basic:integer`
description:	-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
default:	`-1`

operations.quality_trim

label:	Trimming protocol to remove bases with quality below the minimum average region quality from read ends [qtrim=f]
type:	`basic:string`
description:	Performed after looking for kmers. If enabled, set also ‘Average quality below which to trim region’.
default:	`f`
choices:	Trim neither end: `f` Trim both ends: `rl` Trim only right end: `r` Trim only left end: `l` Use sliding window: `w`

operations.trim_quality

label:	Average quality below which to trim region [trimq=6]
type:	`basic:integer`
description:	Set trimming protocol to enable this parameter.
disabled:	operations.quality_trim == ‘f’
default:	`6`
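
As a simplified picture of right-end quality trimming, the sketch below removes trailing bases whose Phred score falls below the threshold. It assumes Phred+33 encoded qualities and uses a naive per-base rule. BBDuk's own trimming algorithm works on region quality and may give different results, so treat this only as an illustration of the idea.

```python
def trim_right_by_quality(seq, qual, trimq=6):
    """Naively strip trailing bases with Phred quality below trimq (Phred+33 input)."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - 33 < trimq:
        end -= 1
    return seq[:end], qual[:end]


# 'I' encodes Phred 40, '#' encodes 2, '!' encodes 0.
seq, qual = trim_right_by_quality("ACGTACGT", "IIIII#!!")
print(seq, qual)
```

After trimming, the 'Minimum length' input decides whether the shortened read is kept.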

operations.trim_poly_A

label:	Minimum length of poly-A or poly-T tails to trim on either end of reads [trimpolya=0]
type:	`basic:integer`
default:	`0`

operations.min_length_fraction

label:	Minimum length fraction [mlf=0.0]
type:	`basic:decimal`
description:	Reads shorter than this fraction of original length after trimming will be discarded.
default:	`0.0`

operations.max_length

label:	Maximum length [maxlength]
type:	`basic:integer`
description:	Reads longer than this after trimming will be discarded.
required:	False

operations.min_average_quality

label:	Minimum average quality [minavgquality=0]
type:	`basic:integer`
description:	Reads with average quality (after trimming) below this will be discarded.
default:	`0`

operations.min_average_quality_bases

label:	Number of initial bases to calculate minimum average quality from [maqb=0]
type:	`basic:integer`
description:	Used only if positive.
default:	`0`

operations.min_base_quality

label:	Minimum base quality below which reads are discarded after trimming [minbasequality=0]
type:	`basic:integer`
default:	`0`

operations.min_consecutive_bases

label:	Minimum number of consecutive called bases [mcb=0]
type:	`basic:integer`
default:	`0`

operations.trim_pad

label:	Number of bases to trim around matching kmers [tp=0]
type:	`basic:integer`
default:	`0`

operations.trim_by_overlap

label:	Trim adapters based on where paired-end reads overlap [tbo=f]
type:	`basic:boolean`
default:	`False`

operations.strict_overlap

label:	Adjust sensitivity in ‘Trim adapters based on where paired-end reads overlap’ mode [strictoverlap=t]
type:	`basic:boolean`
default:	`True`

operations.min_overlap

label:	Minimum number of overlapping bases [minoverlap=14]
type:	`basic:integer`
description:	Require this many bases of overlap for detection.
default:	`14`

operations.min_insert

label:	Minimum insert size [mininsert=40]
type:	`basic:integer`
description:	Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.
default:	`40`

operations.trim_pairs_evenly

label:	Trim both sequences of paired-end reads to the minimum length of either sequence [tpe=f]
type:	`basic:boolean`
default:	`False`

operations.force_trim_left

label:	Position from which to trim bases to the left [forcetrimleft=0]
type:	`basic:integer`
default:	`0`

operations.force_trim_right

label:	Position from which to trim bases to the right [forcetrimright=0]
type:	`basic:integer`
default:	`0`

operations.force_trim_right2

label:	Number of bases to trim from the right end [forcetrimright2=0]
type:	`basic:integer`
default:	`0`

operations.force_trim_mod

label:	Modulo to right-trim reads [forcetrimmod=0]
type:	`basic:integer`
description:	Trim reads to the largest multiple of modulo.
default:	`0`
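
Trimming to the largest multiple of the modulo amounts to a short calculation, sketched below. Treating a modulo of 0 as "disabled" is an assumption based on the default value, and the function name is hypothetical.

```python
def force_trim_mod(seq, modulo=0):
    """Right-trim a read so that its length is a multiple of modulo."""
    if modulo <= 0:
        return seq  # assumed to mean "no modulo trimming"
    return seq[: len(seq) - len(seq) % modulo]


read = "A" * 103
print(len(force_trim_mod(read, 5)))
```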

operations.restrict_left

label:	Number of leftmost bases to look in for kmer matches [restrictleft=0]
type:	`basic:integer`
default:	`0`

operations.restrict_right

label:	Number of rightmost bases to look in for kmer matches [restrictright=0]
type:	`basic:integer`
default:	`0`

operations.min_GC

label:	Minimum GC content [mingc=0.0]
type:	`basic:decimal`
description:	Discard reads with lower GC content.
default:	`0.0`

operations.max_GC

label:	Maximum GC content [maxgc=1.0]
type:	`basic:decimal`
description:	Discard reads with higher GC content.
default:	`1.0`
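
The GC filter defined by 'Minimum GC content' and 'Maximum GC content' can be illustrated with a minimal sketch. It computes the GC fraction over all bases of the read. How ambiguous bases such as N are counted is an assumption here.

```python
def gc_fraction(seq):
    """Fraction of bases that are G or C (case-insensitive)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_gc_filter(seq, min_gc=0.0, max_gc=1.0):
    """Keep a read only if its GC fraction lies within [min_gc, max_gc]."""
    return min_gc <= gc_fraction(seq) <= max_gc


print(gc_fraction("GGCCAATT"))
print(passes_gc_filter("GGCCAATT", min_gc=0.6))
```

With the default bounds of 0.0 and 1.0, every read passes, so the filter is effectively off until one of the bounds is changed.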

operations.maxns

label:	Max Ns after trimming [maxns=-1]
type:	`basic:integer`
description:	If non-negative, reads with more Ns than this (after trimming) will be discarded.
default:	`-1`

operations.toss_junk

label:	Discard reads with invalid characters as bases [tossjunk=f]
type:	`basic:boolean`
default:	`False`

header_parsing.chastity_filter

label:	Discard reads that fail Illumina chastity filtering [chastityfilter=f]
type:	`basic:boolean`
description:	Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.
default:	`False`
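
The chastity check described above is a substring test on the read header. A minimal sketch, using made-up headers in the common Illumina layout:

```python
def fails_chastity_filter(header):
    """True if the header marks the read as filtered (' 1:Y:' or ' 2:Y:')."""
    return " 1:Y:" in header or " 2:Y:" in header


# Hypothetical read headers for illustration.
passed = "@SIM:1:FCX:1:15:6329:1045 1:N:0:ACGT"
failed = "@SIM:1:FCX:1:15:6329:1045 2:Y:0:ACGT"

print(fails_chastity_filter(passed))
print(fails_chastity_filter(failed))
```

The leading space in the patterns matters: it anchors the match to the start of the second header field, so a `1:Y:` that happens to occur inside the first field is not mistaken for the filter flag.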

header_parsing.barcode_filter

label:	Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise [barcodefilter=f]
type:	`basic:boolean`
description:	A barcode must be the last part of the read header.
default:	`False`

header_parsing.barcode_files

label:	Barcode sequences [barcodes]
type:	`list:data:seq:nucleotide`
required:	False

header_parsing.barcode_sequences

label:	Literal barcode sequences [barcodes]
type:	`list:basic:string`
description:	Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:	False
default:	`[]`

header_parsing.x_min

label:	Minimum X coordinate [xmin=-1]
type:	`basic:integer`
description:	If positive, discard reads with a smaller X coordinate.
default:	`-1`

header_parsing.y_min

label:	Minimum Y coordinate [ymin=-1]
type:	`basic:integer`
description:	If positive, discard reads with a smaller Y coordinate.
default:	`-1`

header_parsing.x_max

label:	Maximum X coordinate [xmax=-1]
type:	`basic:integer`
description:	If positive, discard reads with a larger X coordinate.
default:	`-1`

header_parsing.y_max

label:	Maximum Y coordinate [ymax=-1]
type:	`basic:integer`
description:	If positive, discard reads with a larger Y coordinate.
default:	`-1`

complexity.entropy

label:	Minimum entropy [entropy=-1.0]
type:	`basic:decimal`
description:	Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.
default:	`-1.0`

complexity.entropy_window

label:	Length of sliding window used to calculate entropy [entropywindow=50]
type:	`basic:integer`
description:	To use the sliding window set minimum entropy in range between 0.0 and 1.0.
default:	`50`

complexity.entropy_k

label:	Length of kmers used to calculate entropy [entropyk=5]
type:	`basic:integer`
default:	`5`
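
To give an intuition for the entropy inputs, the sketch below computes a normalized Shannon entropy over the kmers of one window. The normalization (dividing by the maximum entropy achievable for that number of kmers) is an assumption chosen so that the result lies between 0 and 1. BBDuk's exact formula and the way it aggregates windows may differ.

```python
import math
from collections import Counter


def window_entropy(window, k=5):
    """Normalized Shannon entropy (0..1) of the kmers in one sequence window."""
    kmers = [window[i:i + k] for i in range(len(window) - k + 1)]
    total = len(kmers)
    if total < 2:
        return 0.0
    counts = Counter(kmers)
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())
    # Maximum entropy: every kmer distinct, limited by the 4**k possible kmers.
    max_entropy = math.log2(min(total, 4 ** k))
    return entropy / max_entropy


homopolymer = "A" * 50             # a single repeated kmer
short_repeat = "ACGT" * 12 + "AC"  # only four distinct 5-mers

print(window_entropy(homopolymer))
print(round(window_entropy(short_repeat), 3))
```

Low-complexity sequence such as homopolymers and short tandem repeats scores near the bottom of the range, which is what a minimum-entropy threshold is meant to catch.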

complexity.entropy_mask

label:	Mask low-entropy parts of sequences with N instead of discarding [entropymask=f]
type:	`basic:boolean`
default:	`False`

complexity.min_base_frequency

label:	Minimum base frequency [minbasefrequency=0]
type:	`basic:integer`
default:	`0`

fastqc.nogroup

label:	Disable grouping of bases for reads >50bp [nogroup]
type:	`basic:boolean`
description:	All reports will show data for every base in the read. Enabling this option can cause FastQC to crash on very long reads.
default:	`False`

fastq

label:	Remaining upstream reads
type:	`list:basic:file`

fastq2

label:	Remaining downstream reads
type:	`list:basic:file`

statistics

label:	Statistics
type:	`list:basic:file`

fastqc_url

label:	Upstream quality control with FastQC
type:	`list:basic:file:html`

fastqc_url2

label:	Downstream quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download upstream FastQC archive
type:	`list:basic:file`

fastqc_archive2

label:	Download downstream FastQC archive
type:	`list:basic:file`

BBDuk (single-end)¶

data:reads:fastq:single:bbdukbbduk-single (data:reads:fastq:single reads, basic:integer min_length, basic:boolean show_advanced, list:data:seq:nucleotide sequences, list:basic:string literal_sequences, basic:integer kmer_length, basic:boolean check_reverse_complements, basic:boolean mask_middle_base, basic:integer min_kmer_hits, basic:decimal min_kmer_fraction, basic:decimal min_coverage_fraction, basic:integer hamming_distance, basic:integer query_hamming_distance, basic:integer edit_distance, basic:integer hamming_distance2, basic:integer query_hamming_distance2, basic:integer edit_distance2, basic:boolean forbid_N, basic:boolean find_best_match, basic:string k_trim, basic:string k_mask, basic:boolean mask_fully_covered, basic:integer min_k, basic:string quality_trim, basic:integer trim_quality, basic:integer trim_poly_A, basic:decimal min_length_fraction, basic:integer max_length, basic:integer min_average_quality, basic:integer min_average_quality_bases, basic:integer min_base_quality, basic:integer min_consecutive_bases, basic:integer trim_pad, basic:integer min_overlap, basic:integer min_insert, basic:integer force_trim_left, basic:integer force_trim_right, basic:integer force_trim_right2, basic:integer force_trim_mod, basic:integer restrict_left, basic:integer restrict_right, basic:decimal min_GC, basic:decimal max_GC, basic:integer maxns, basic:boolean toss_junk, basic:boolean chastity_filter, basic:boolean barcode_filter, list:data:seq:nucleotide barcode_files, list:basic:string barcode_sequences, basic:integer x_min, basic:integer y_min, basic:integer x_max, basic:integer y_max, basic:decimal entropy, basic:integer entropy_window, basic:integer entropy_k, basic:boolean entropy_mask, basic:integer min_base_frequency, basic:boolean nogroup)[Source: v2.3.0]

BBDuk combines the most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool. It is capable of quality-trimming and filtering, adapter-trimming, contaminant-filtering via kmer matching, sequence masking, GC-filtering, length filtering, entropy-filtering, format conversion, histogram generation, subsampling, quality-score recalibration, kmer cardinality estimation, and various other operations in a single pass. See [here](https://jgi.doe.gov/data-and-tools/bbtools/bb-tools-user-guide/bbduk-guide/) for more information.

reads

label:	Reads
type:	`data:reads:fastq:single`

min_length

label:	Minimum length [minlength=10]
type:	`basic:integer`
description:	Reads shorter than the minimum length will be discarded after trimming.
default:	`10`

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

reference.sequences

label:	Sequences [ref]
type:	`list:data:seq:nucleotide`
description:	Reference sequences include adapters, contaminants, and degenerate sequences. They can be provided in a multi-sequence FASTA file or as a set of literal sequences below.
required:	False

reference.literal_sequences

label:	Literal sequences [literal]
type:	`list:basic:string`
description:	Literal sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:	False
default:	`[]`

processing.kmer_length

label:	Kmer length [k=27]
type:	`basic:integer`
description:	Kmer length used for finding contaminants. Contaminants shorter than the kmer length will not be found. Kmer length must be at least 1.
default:	`27`

processing.check_reverse_complements

label:	Look for reverse complements of kmers in addition to forward kmers [rcomp=t]
type:	`basic:boolean`
default:	`True`

processing.mask_middle_base

label:	Treat the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors [maskmiddle=t]
type:	`basic:boolean`
default:	`True`

processing.min_kmer_hits

label:	Minimum number of kmer hits [minkmerhits=1]
type:	`basic:integer`
description:	Reads need at least this many matching kmers to be considered matching the reference.
default:	`1`

processing.min_kmer_fraction

label:	Minimum kmer fraction [minkmerfraction=0.0]
type:	`basic:decimal`
description:	A read needs at least this fraction of its total kmers to hit a reference in order to be considered a match. If this and ‘Minimum number of kmer hits’ are set, the greater is used.
default:	`0.0`

processing.min_coverage_fraction

label:	Minimum coverage fraction [mincovfraction=0.0]
type:	`basic:decimal`
description:	A read needs at least this fraction of its total bases to be covered by reference kmers to be considered a match. If specified, ‘Minimum coverage fraction’ overrides ‘Minimum number of kmer hits’ and ‘Minimum kmer fraction’.
default:	`0.0`

processing.hamming_distance

label:	Maximum Hamming distance for kmers (substitutions only) [hammingdistance=0]
type:	`basic:integer`
default:	`0`

processing.query_hamming_distance

label:	Hamming distance for query kmers [qhdist=0]
type:	`basic:integer`
default:	`0`

processing.edit_distance

label:	Maximum edit distance from reference kmers (substitutions and indels) [editdistance=0]
type:	`basic:integer`
default:	`0`

processing.hamming_distance2

label:	Hamming distance for short kmers when looking for shorter kmers [hammingdistance2=0]
type:	`basic:integer`
default:	`0`

processing.query_hamming_distance2

label:	Hamming distance for short query kmers when looking for shorter kmers [qhdist2=0]
type:	`basic:integer`
default:	`0`

processing.edit_distance2

label:	Maximum edit distance from short reference kmers (substitutions and indels) when looking for shorter kmers [editdistance2=0]
type:	`basic:integer`
default:	`0`

processing.forbid_N

label:	Forbid matching of read kmers containing N [forbidn=f]
type:	`basic:boolean`
description:	By default, these will match a reference ‘A’ if ‘Maximum Hamming distance for kmers’ > 0 or ‘Maximum edit distance from reference kmers’ > 0, to increase sensitivity.
default:	`False`

processing.find_best_match

label:	If multiple matches, associate read with sequence sharing most kmers [findbestmatch=t]
type:	`basic:boolean`
default:	`True`

operations.k_trim

label:	Trimming protocol to remove bases matching reference kmers from reads [ktrim=f]
type:	`basic:string`
default:	`f`
choices:	Don’t trim: `f` Trim to the right: `r` Trim to the left: `l`

operations.k_mask

label:	Symbol to replace bases matching reference kmers [kmask=f]
type:	`basic:string`
description:	Allows any non-whitespace character other than t or f. Processes short kmers on both ends.
default:	`f`

operations.mask_fully_covered

label:	Only mask bases that are fully covered by kmers [maskfullycovered=f]
type:	`basic:boolean`
default:	`False`

operations.min_k

label:	Look for shorter kmers at read tips down to this length when k-trimming or masking [mink=0]
type:	`basic:integer`
description:	-1 means disabled. Enabling this will disable treating the middle base of a kmer as a wildcard to increase sensitivity in the presence of errors.
default:	`-1`

operations.quality_trim

label:	Trimming protocol to remove bases with quality below the minimum average region quality from read ends [qtrim=f]
type:	`basic:string`
description:	Performed after looking for kmers. If enabled, set also ‘Average quality below which to trim region’.
default:	`f`
choices:	Trim neither end: `f` Trim both ends: `rl` Trim only right end: `r` Trim only left end: `l` Use sliding window: `w`

operations.trim_quality

label:	Average quality below which to trim region [trimq=6]
type:	`basic:integer`
description:	Set trimming protocol to enable this parameter.
disabled:	operations.quality_trim == ‘f’
default:	`6`

operations.trim_poly_A

label:	Minimum length of poly-A or poly-T tails to trim on either end of reads [trimpolya=0]
type:	`basic:integer`
default:	`0`

operations.min_length_fraction

label:	Minimum length fraction [mlf=0.0]
type:	`basic:decimal`
description:	Reads shorter than this fraction of original length after trimming will be discarded.
default:	`0.0`
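
The two length filters, 'Minimum length' and 'Minimum length fraction', can be read together as in the sketch below. The function name is hypothetical, and the strict comparisons follow the wording "shorter than" in the descriptions.

```python
def keep_after_trimming(original_length, trimmed_length,
                        min_length=10, min_length_fraction=0.0):
    """Decide whether a trimmed read survives both length filters."""
    if trimmed_length < min_length:
        return False
    if trimmed_length < min_length_fraction * original_length:
        return False
    return True


# A 100 bp read trimmed to 40 bp passes the absolute filter
# but fails a 50% length-fraction filter.
print(keep_after_trimming(100, 40))
print(keep_after_trimming(100, 40, min_length_fraction=0.5))
```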

operations.max_length

label:	Maximum length [maxlength]
type:	`basic:integer`
description:	Reads longer than this after trimming will be discarded.
required:	False

operations.min_average_quality

label:	Minimum average quality [minavgquality=0]
type:	`basic:integer`
description:	Reads with average quality (after trimming) below this will be discarded.
default:	`0`

operations.min_average_quality_bases

label:	Number of initial bases to calculate minimum average quality from [maqb=0]
type:	`basic:integer`
description:	Used only if positive.
default:	`0`

operations.min_base_quality

label:	Minimum base quality below which reads are discarded after trimming [minbasequality=0]
type:	`basic:integer`
default:	`0`

operations.min_consecutive_bases

label:	Minimum number of consecutive called bases [mcb=0]
type:	`basic:integer`
default:	`0`

operations.trim_pad

label:	Number of bases to trim around matching kmers [tp=0]
type:	`basic:integer`
default:	`0`

operations.min_overlap

label:	Minimum number of overlapping bases [minoverlap=14]
type:	`basic:integer`
description:	Require this many bases of overlap for detection.
default:	`14`

operations.min_insert

label:	Minimum insert size [mininsert=40]
type:	`basic:integer`
description:	Require insert size of at least this for overlap. Should be reduced to 16 for small RNA sequencing.
default:	`40`

operations.force_trim_left

label:	Position from which to trim bases to the left [forcetrimleft=0]
type:	`basic:integer`
default:	`0`

operations.force_trim_right

label:	Position from which to trim bases to the right [forcetrimright=0]
type:	`basic:integer`
default:	`0`

operations.force_trim_right2

label:	Number of bases to trim from the right end [forcetrimright2=0]
type:	`basic:integer`
default:	`0`

operations.force_trim_mod

label:	Modulo to right-trim reads [forcetrimmod=0]
type:	`basic:integer`
description:	Trim reads to the largest multiple of modulo.
default:	`0`

operations.restrict_left

label:	Number of leftmost bases to look in for kmer matches [restrictleft=0]
type:	`basic:integer`
default:	`0`

operations.restrict_right

label:	Number of rightmost bases to look in for kmer matches [restrictright=0]
type:	`basic:integer`
default:	`0`

operations.min_GC

label:	Minimum GC content [mingc=0.0]
type:	`basic:decimal`
description:	Discard reads with lower GC content.
default:	`0.0`

operations.max_GC

label:	Maximum GC content [maxgc=1.0]
type:	`basic:decimal`
description:	Discard reads with higher GC content.
default:	`1.0`

operations.maxns

label:	Max Ns after trimming [maxns=-1]
type:	`basic:integer`
description:	If non-negative, reads with more Ns than this (after trimming) will be discarded.
default:	`-1`
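
The 'Max Ns after trimming' rule can be expressed as a one-line check. A minimal sketch with a hypothetical function name, where a negative value disables the filter as described above:

```python
def passes_max_ns(seq, maxns=-1):
    """Keep the read unless it has more Ns than maxns (negative disables the filter)."""
    return maxns < 0 or seq.upper().count("N") <= maxns


print(passes_max_ns("ACGNNT"))           # filter disabled
print(passes_max_ns("ACGNNT", maxns=1))  # two Ns exceed the limit
```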

operations.toss_junk

label:	Discard reads with invalid characters as bases [tossjunk=f]
type:	`basic:boolean`
default:	`False`

header_parsing.chastity_filter

label:	Discard reads that fail Illumina chastity filtering [chastityfilter=f]
type:	`basic:boolean`
description:	Discard reads with id containing ‘ 1:Y:’ or ‘ 2:Y:’.
default:	`False`

header_parsing.barcode_filter

label:	Remove reads with unexpected barcodes if barcodes are set, or barcodes containing ‘N’ otherwise [barcodefilter=f]
type:	`basic:boolean`
description:	A barcode must be the last part of the read header.
default:	`False`

header_parsing.barcode_files

label:	Barcode sequences [barcodes]
type:	`list:data:seq:nucleotide`
required:	False

header_parsing.barcode_sequences

label:	Literal barcode sequences [barcodes]
type:	`list:basic:string`
description:	Literal barcode sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:	False
default:	`[]`

header_parsing.x_min

label:	Minimum X coordinate [xmin=-1]
type:	`basic:integer`
description:	If positive, discard reads with a smaller X coordinate.
default:	`-1`

header_parsing.y_min

label:	Minimum Y coordinate [ymin=-1]
type:	`basic:integer`
description:	If positive, discard reads with a smaller Y coordinate.
default:	`-1`

header_parsing.x_max

label:	Maximum X coordinate [xmax=-1]
type:	`basic:integer`
description:	If positive, discard reads with a larger X coordinate.
default:	`-1`

header_parsing.y_max

label:	Maximum Y coordinate [ymax=-1]
type:	`basic:integer`
description:	If positive, discard reads with a larger Y coordinate.
default:	`-1`
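
The coordinate filters act on the X and Y positions stored in the read header. The sketch below assumes the common Illumina header layout `@instrument:run:flowcell:lane:tile:x:y` followed by a space and the read-level fields. The header and the function names are hypothetical, and BBDuk's own header parsing is not shown here.

```python
def parse_xy(header):
    """Extract the X and Y coordinates from an Illumina-style read header."""
    fields = header.split(" ")[0].split(":")
    return int(fields[5]), int(fields[6])


def passes_coordinate_filter(header, x_min=-1, y_min=-1, x_max=-1, y_max=-1):
    """Apply each bound only if it is positive, as in the descriptions above."""
    x, y = parse_xy(header)
    if x_min > 0 and x < x_min:
        return False
    if y_min > 0 and y < y_min:
        return False
    if x_max > 0 and x > x_max:
        return False
    if y_max > 0 and y > y_max:
        return False
    return True


header = "@SIM:1:FCX:1:15:6329:1045 1:N:0:ACGT"  # made-up example
print(parse_xy(header))
print(passes_coordinate_filter(header, x_min=7000))
```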

complexity.entropy

label:	Minimum entropy [entropy=-1.0]
type:	`basic:decimal`
description:	Set between 0 and 1 to filter reads with entropy below that value. Higher is more stringent.
default:	`-1.0`

complexity.entropy_window

label:	Length of sliding window used to calculate entropy [entropywindow=50]
type:	`basic:integer`
description:	To use the sliding window set minimum entropy in range between 0.0 and 1.0.
default:	`50`

complexity.entropy_k

label:	Length of kmers used to calculate entropy [entropyk=5]
type:	`basic:integer`
default:	`5`

complexity.entropy_mask

label:	Mask low-entropy parts of sequences with N instead of discarding [entropymask=f]
type:	`basic:boolean`
default:	`False`

complexity.min_base_frequency

label:	Minimum base frequency [minbasefrequency=0]
type:	`basic:integer`
default:	`0`

fastqc.nogroup

label:	Disable grouping of bases for reads >50bp [nogroup]
type:	`basic:boolean`
description:	All reports will show data for every base in the read. Enabling this option can cause FastQC to crash on very long reads.
default:	`False`

fastq

label:	Remaining reads
type:	`list:basic:file`

statistics

label:	Statistics
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive
type:	`list:basic:file`

BBDuk - STAR - FeatureCounts (3’ mRNA-Seq, paired-end)¶

data:workflow:quant:featurecounts:pairedworkflow-bbduk-star-fc-quant-paired (data:reads:fastq:paired reads, data:genomeindex:star star_index, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string stranded, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference)[Source: v1.1.0]

This 3’ mRNA-Seq pipeline comprises QC, preprocessing, alignment, and quantification steps. Reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by the __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QC steps include downsampling, QoRTs QC analysis, and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.

reads

label:	Paired-end reads
type:	`data:reads:fastq:paired`

star_index

label:	Star index
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

adapters

label:	Adapters
type:	`list:data:seq:nucleotide`
description:	Provide a list of sequencing adapter files (.fasta) to be removed by BBDuk.
required:	False

annotation

label:	Annotation
type:	`data:annotation`

stranded

label:	Select the type of kit used for library preparation.
type:	`basic:string`
choices:	Strand-specific forward: `forward` Strand-specific reverse: `reverse`

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

downsampling.advanced.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.advanced.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False
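
How the “Number of reads”, “Fraction”, and “Seed” inputs relate can be sketched with a small seeded sampler over read indices. This is an illustration only: the function name is hypothetical, the tool used by the workflow is not shown here, and a real sampler may keep each read with a given probability rather than drawing an exact count.

```python
import random


def choose_read_indices(n_total, n_reads=1_000_000, fraction=None, seed=11):
    """Pick which reads to keep; a set fraction overrides the absolute number."""
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in the range [0.0, 1.0]")
        target = round(fraction * n_total)
    else:
        target = min(n_reads, n_total)
    rng = random.Random(seed)  # fixed seed makes the selection reproducible
    return sorted(rng.sample(range(n_total), target))


print(len(choose_read_indices(100, n_reads=10)))
print(len(choose_read_indices(100, n_reads=10, fraction=0.25)))
```

For paired-end data the same indices would be applied to both mate files so that pairs stay in sync, which is why a fixed seed is useful.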

downsampling.advanced.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`

qc.rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

qc.globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

BBDuk - STAR - FeatureCounts (3’ mRNA-Seq, single-end)¶

data:workflow:quant:featurecounts:singleworkflow-bbduk-star-fc-quant-single (data:reads:fastq:single reads, data:genomeindex:star star_index, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string stranded, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference)[Source: v1.1.0]

This 3’ mRNA-Seq pipeline comprises QC, preprocessing, alignment, and quantification steps. Reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by the __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QC steps include downsampling, QoRTs QC analysis, and alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.

reads

label:	Input single-end reads
type:	`data:reads:fastq:single`

star_index

label:	Star index
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

adapters

label:	Adapters
type:	`list:data:seq:nucleotide`
description:	Provide a list of sequencing adapter files (.fasta) to be removed by BBDuk.
required:	False

annotation

label:	Annotation
type:	`data:annotation`

stranded

label:	Select the type of kit used for library preparation.
type:	`basic:string`
choices:	Strand-specific forward: `forward` Strand-specific reverse: `reverse`

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

downsampling.advanced.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.advanced.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

downsampling.advanced.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`

qc.rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

qc.globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

BBDuk - STAR - HTSeq-count (paired-end)¶

data:workflow:rnaseq:htseq:pairedworkflow-bbduk-star-htseq-paired (data:reads:fastq:paired reads, data:genomeindex:star star_index, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string stranded)[Source: v1.0.1]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is well regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.nature.com/articles/nmeth.4106). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.

reads

label:	Paired-end reads
type:	`data:reads:fastq:paired`

star_index

label:	Star index
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

adapters

label:	Adapters
type:	`list:data:seq:nucleotide`
description:	Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.
required:	False

annotation

label:	Annotation
type:	`data:annotation`

stranded

label:	Select the QuantSeq kit used for library preparation.
type:	`basic:string`
choices:	QuantSeq FWD: `yes` QuantSeq REV: `reverse`
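
The two `stranded` values match values accepted by htseq-count's `--stranded` option. A minimal sketch of how such a value could be passed on is shown below; the function name and file names are hypothetical, and this is not the exact command the workflow runs.

```python
import shlex

# Values taken from the `stranded` choices listed above.
STRANDED_CHOICES = {"QuantSeq FWD": "yes", "QuantSeq REV": "reverse"}


def htseq_count_command(bam: str, gtf: str, stranded: str) -> str:
    """Build an illustrative htseq-count command line (hypothetical helper)."""
    if stranded not in STRANDED_CHOICES.values():
        raise ValueError(f"unsupported stranded value: {stranded!r}")
    args = ["htseq-count", "--format=bam", f"--stranded={stranded}", bam, gtf]
    # Quote every argument so paths with spaces stay intact.
    return " ".join(shlex.quote(a) for a in args)
```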

BBDuk - STAR - HTSeq-count (single-end)¶

data:workflow:rnaseq:htseq:singleworkflow-bbduk-star-htseq (data:reads:fastq:single reads, data:genomeindex:star star_index, list:data:seq:nucleotide adapters, data:annotation annotation, basic:string stranded)[Source: v1.0.1]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is well regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.nature.com/articles/nmeth.4106). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.

reads

label:	Input single-end reads
type:	`data:reads:fastq:single`

star_index

label:	Star index
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

adapters

label:	Adapters
type:	`list:data:seq:nucleotide`
description:	Provide a list of sequencing adapters files (.fasta) to be removed by BBDuk.
required:	False

annotation

label:	annotation
type:	`data:annotation`

stranded

label:	Select the QuantSeq kit used for library preparation.
type:	`basic:string`
choices:	QuantSeq FWD: `yes` QuantSeq REV: `reverse`

BBDuk - STAR - featureCounts - QC (paired-end)¶

data:workflow:rnaseq:featurecounts:qcworkflow-bbduk-star-featurecounts-qc-paired (data:reads:fastq:paired reads, list:data:seq:nucleotide adapters, basic:boolean show_advanced, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, data:genomeindex:star genome, basic:boolean show_advanced, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer outFilterScoreMin, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string alignEndsType, basic:string outSAMunmapped, basic:string outSAMattributes, basic:string outSAMattrRGline, data:annotation annotation, basic:boolean show_advanced, basic:string assay_type, data:index:salmon cdna_index, basic:integer n_reads, basic:string feature_class, basic:string feature_type, basic:string id_attribute, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference)[Source: v1.4.0]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is well regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__. Widely adopted in the bioinformatics community, featureCounts computes expression counts in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis. The rRNA contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of the reads in the sample that are derived from the rRNA sequences.

preprocessing.reads

label:	Reads
type:	`data:reads:fastq:paired`

preprocessing.adapters

label:	Adapters
type:	`list:data:seq:nucleotide`
required:	False

preprocessing.show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

preprocessing.custom_adapter_sequences

label:	Custom adapter sequences [literal]
type:	`list:basic:string`
description:	Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:	False
hidden:	!preprocessing.show_advanced
default:	`[]`

preprocessing.kmer_length

label:	K-mer length
type:	`basic:integer`
description:	K-mer length must be smaller than or equal to the length of adapters.
hidden:	!preprocessing.show_advanced
default:	`23`

preprocessing.min_k

label:	Minimum k-mer length at right end of reads used for trimming
type:	`basic:integer`
disabled:	preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
hidden:	!preprocessing.show_advanced
default:	`11`

preprocessing.hamming_distance

label:	Maximum Hamming distance for k-mers
type:	`basic:integer`
hidden:	!preprocessing.show_advanced
default:	`1`

preprocessing.maxns

label:	Max Ns after trimming [maxns=-1]
type:	`basic:integer`
description:	If non-negative, reads with more Ns than this (after trimming) will be discarded.
hidden:	!preprocessing.show_advanced
default:	`-1`

preprocessing.trim_quality

label:	Quality below which to trim reads from the right end
type:	`basic:integer`
description:	Phred algorithm is used, which is more accurate than naive trimming.
hidden:	!preprocessing.show_advanced
default:	`10`

preprocessing.min_length

label:	Minimum read length
type:	`basic:integer`
description:	Reads shorter than minimum read length after trimming are discarded.
hidden:	!preprocessing.show_advanced
default:	`20`
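
The `maxns` and `min_length` filters can be sketched together. This is a minimal illustration of the rules as described above, using a hypothetical `keep_read` function; it is not BBDuk's code.

```python
def keep_read(trimmed_seq: str, maxns: int = -1, min_length: int = 20) -> bool:
    """Decide whether a trimmed read survives the two filters (hypothetical helper)."""
    if len(trimmed_seq) < min_length:
        return False  # shorter than the minimum read length after trimming
    if maxns >= 0 and trimmed_seq.upper().count("N") > maxns:
        return False  # more Ns than allowed; a negative maxns disables this check
    return True
```

With the defaults (`maxns=-1`, `min_length=20`), only the length filter is active.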

alignment.genome

label:	Indexed reference genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

alignment.show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

alignment.unstranded

label:	The data is unstranded
type:	`basic:boolean`
description:	For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR will generate with the --outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.
hidden:	!alignment.show_advanced
default:	`False`

alignment.noncannonical

label:	Remove non-canonical junctions (Cufflinks compatibility)
type:	`basic:boolean`
description:	It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
hidden:	!alignment.show_advanced
default:	`False`

alignment.detect_chimeric.chimeric

label:	Detect chimeric and circular alignments
type:	`basic:boolean`
description:	To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
default:	`False`
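
The 2x75 example above reduces to a check on the shorter segment. A minimal sketch, assuming a hypothetical function name and that a non-positive threshold means detection is off:

```python
def chimeric_alignment_reported(segment_lengths, chim_segment_min=20):
    """Return True if both segments reach the minimum mapped length."""
    if chim_segment_min <= 0:
        return False  # chimeric detection is switched off
    return min(segment_lengths) >= chim_segment_min
```

For 150 mapped bases split as 130 + 20 the check passes, while 135 + 15 fails because the shorter segment is below 20.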

alignment.detect_chimeric.chimSegmentMin

label:	--chimSegmentMin
type:	`basic:integer`
disabled:	detect_chimeric.chimeric != true
default:	`20`

alignment.t_coordinates.quantmode

label:	Output in transcript coordinates
type:	`basic:boolean`
description:	With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
default:	`False`

alignment.t_coordinates.singleend

label:	Allow soft-clipping and indels
type:	`basic:boolean`
description:	By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions, and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
disabled:	t_coordinates.quantmode != true
default:	`False`

alignment.t_coordinates.gene_counts

label:	Count reads
type:	`basic:boolean`
description:	With the --quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The output is a ReadsPerGene.out.tab file with 4 columns, which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
disabled:	t_coordinates.quantmode != true
default:	`False`
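
Reading the four-column table described above could look like the sketch below. The function name, gene IDs, and counts are made up for illustration. The sketch also assumes the file starts with summary rows whose names begin with `N_` (such as `N_unmapped`), which are skipped.

```python
import csv
import io

# Column index per strandedness option, following the column list above.
COLUMN = {"unstranded": 1, "yes": 2, "reverse": 3}

# Hypothetical file content; gene IDs and numbers are illustrative only.
EXAMPLE = (
    "N_unmapped\t10\t10\t10\n"
    "N_multimapping\t5\t5\t5\n"
    "N_noFeature\t3\t40\t4\n"
    "N_ambiguous\t1\t0\t0\n"
    "GENE_A\t50\t2\t48\n"
    "GENE_B\t7\t7\t0\n"
)


def read_gene_counts(handle, strandedness="unstranded"):
    """Return {gene_id: count} for the chosen strandedness column."""
    col = COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # skip empty lines and summary rows
        counts[row[0]] = int(row[col])
    return counts
```

Calling `read_gene_counts(io.StringIO(EXAMPLE), "reverse")` picks column 4 for every gene row.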

alignment.filtering.outFilterType

label:	Type of filtering
type:	`basic:string`
description:	Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
default:	`Normal`
choices:	Normal: `Normal` BySJout: `BySJout`

alignment.filtering.outFilterMultimapNmax

label:	--outFilterMultimapNmax
type:	`basic:integer`
description:	Read alignments will be output only if the read maps to fewer loci than this value; otherwise no alignments will be output (default: 10).
required:	False

alignment.filtering.outFilterMismatchNmax

label:	--outFilterMismatchNmax
type:	`basic:integer`
description:	Alignment will be output only if it has fewer mismatches than this value (default: 10).
required:	False

alignment.filtering.outFilterMismatchNoverLmax

label:	--outFilterMismatchNoverLmax
type:	`basic:decimal`
description:	Maximum number of mismatches per pair relative to read length. For example, with a value of 0.04 and 2x100b reads, the maximum number of mismatches is 0.04*200=8 for the paired read.
required:	False
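
The mismatch limit scales with the combined length of both mates. A minimal sketch of that arithmetic, assuming the product is rounded down to a whole number of mismatches (the function name is hypothetical):

```python
import math


def max_mismatches(ratio: float, read_length: int, mates: int = 2) -> int:
    """Largest mismatch count allowed for a read (pair) of the given length."""
    # The small epsilon guards against floating-point results such as 7.999999.
    return math.floor(ratio * read_length * mates + 1e-9)
```

For 2x100b reads, a ratio of 0.04 allows 8 mismatches and a ratio of 0.06 allows 12.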

alignment.filtering.outFilterScoreMin

label:	--outFilterScoreMin
type:	`basic:integer`
description:	Alignment will be output only if its score is higher than or equal to this value (default: 0).
required:	False

alignment.alignment.alignSJoverhangMin

label:	--alignSJoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:	False

alignment.alignment.alignSJDBoverhangMin

label:	--alignSJDBoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:	False

alignment.alignment.alignIntronMin

label:	--alignIntronMin
type:	`basic:integer`
description:	Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
required:	False

alignment.alignment.alignIntronMax

label:	--alignIntronMax
type:	`basic:integer`
description:	Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False
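
The fallback used when the maximum intron size is 0 can be worked through numerically. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the values 16 and 9 are assumed here to be STAR's defaults for `winBinNbits` and `winAnchorDistNbins`; check the STAR manual for the version in use.

```python
def max_intron_size(align_intron_max: int = 0,
                    win_bin_nbits: int = 16,
                    win_anchor_dist_nbins: int = 9) -> int:
    """Effective maximum intron size under the rule described above."""
    if align_intron_max > 0:
        return align_intron_max  # an explicit limit takes precedence
    # Fallback: (2^winBinNbits) * winAnchorDistNbins
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins
```

With the assumed window values the fallback is 65536 * 9 = 589824 bases.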

alignment.alignment.alignMatesGapMax

label:	--alignMatesGapMax
type:	`basic:integer`
description:	Maximum gap between two mates. If 0, the maximum intron gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

alignment.alignment.alignEndsType

label:	--alignEndsType
type:	`basic:string`
description:	Type of read ends alignment (default: Local).
required:	False
default:	`Local`
choices:	Local: `Local` EndToEnd: `EndToEnd` Extend5pOfRead1: `Extend5pOfRead1` Extend5pOfReads12: `Extend5pOfReads12`

alignment.output_sam_bam.outSAMunmapped

label:	--outSAMunmapped
type:	`basic:string`
description:	Output of unmapped reads in the SAM format.
required:	False
default:	`None`
choices:	None: `None` Within: `Within`

alignment.output_sam_bam.outSAMattributes

label:	--outSAMattributes
type:	`basic:string`
description:	A string of desired SAM attributes, in the order desired for the output SAM.
required:	False
default:	`Standard`
choices:	None: `None` Standard: `Standard` All: `All`

alignment.output_sam_bam.outSAMattrRGline

label:	--outSAMattrRGline
type:	`basic:string`
description:	SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z"
required:	False

quantification.annotation

label:	Annotation
type:	`data:annotation`

quantification.show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

quantification.assay_type

label:	Assay type
type:	`basic:string`
description:	In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
hidden:	!quantification.show_advanced
default:	`non_specific`
choices:	Strand non-specific: `non_specific` Strand-specific forward: `forward` Strand-specific reverse: `reverse` Detect automatically: `auto`
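
The strand rules in the description above can be written as a small predicate. This is a sketch with a hypothetical function name; it assumes `auto` has already been resolved to one of the three concrete assay types before counting.

```python
def strand_compatible(assay_type, read_strand, feature_strand, is_second_mate=False):
    """Return True if a read may overlap a feature under the given assay type."""
    if assay_type == "non_specific":
        return True  # strand is ignored
    if assay_type not in ("forward", "reverse"):
        raise ValueError(f"unresolved assay type: {assay_type!r}")
    same_strand = read_strand == feature_strand
    # Forward: single or first read on the same strand, second mate on the opposite one.
    expected_same = not is_second_mate
    if assay_type == "reverse":
        expected_same = not expected_same  # the rules are reversed
    return same_strand == expected_same
```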

quantification.cdna_index

label:	cDNA index file
type:	`data:index:salmon`
description:	Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
required:	False
hidden:	quantification.assay_type != 'auto'

quantification.n_reads

label:	Number of reads in subsampled alignment file
type:	`basic:integer`
description:	Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
hidden:	quantification.assay_type != 'auto'
default:	`5000000`

quantification.feature_class

label:	Feature class
type:	`basic:string`
description:	Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
hidden:	!quantification.show_advanced
default:	`exon`

quantification.feature_type

label:	Feature type
type:	`basic:string`
description:	The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.
hidden:	!quantification.show_advanced
default:	`gene`
choices:	gene: `gene` transcript: `transcript`

quantification.id_attribute

label:	ID attribute
type:	`basic:string`
description:	GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID are considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
hidden:	!quantification.show_advanced
default:	`gene_id`
choices:	gene_id: `gene_id` transcript_id: `transcript_id` ID: `ID` geneid: `geneid`
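
How the feature class and ID attribute select information from an annotation line can be sketched as below. The example handles GTF-style attributes only (`key "value";`), not GFF3 `key=value` pairs. The function name and the sample line are hypothetical.

```python
import re

# Matches GTF attribute pairs such as: gene_id "G1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')

# Hypothetical GTF record used for illustration.
LINE = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'


def feature_id(gtf_line, feature_class="exon", id_attribute="gene_id"):
    """Return the feature ID of a GTF line, or None if the line is ignored."""
    fields = gtf_line.rstrip("\n").split("\t")
    # The feature class is the 3rd column; all other features are ignored.
    if len(fields) != 9 or fields[2] != feature_class:
        return None
    attributes = dict(ATTR_RE.findall(fields[8]))
    return attributes.get(id_attribute)
```

Lines that share the returned ID would then be treated as parts of the same feature when counts are aggregated.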

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

downsampling.advanced.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.advanced.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

downsampling.advanced.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but with much reduced memory.
default:	`False`

qc.rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

qc.globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

BBDuk - STAR - featureCounts - QC (single-end)¶

data:workflow:rnaseq:featurecounts:qcworkflow-bbduk-star-featurecounts-qc-single (data:reads:fastq:single reads, list:data:seq:nucleotide adapters, basic:boolean show_advanced, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, data:genomeindex:star genome, basic:boolean show_advanced, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer outFilterScoreMin, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string alignEndsType, basic:string outSAMunmapped, basic:string outSAMattributes, basic:string outSAMattrRGline, data:annotation annotation, basic:boolean show_advanced, basic:string assay_type, data:index:salmon cdna_index, basic:integer n_reads, basic:string feature_class, basic:string feature_type, basic:string id_attribute, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference)[Source: v1.4.0]

This RNA-seq pipeline consists of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __BBDuk__, which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Compared to similar tools, BBDuk is well regarded for its computational efficiency. Next, preprocessed reads are aligned by the __STAR__ aligner. At the time of implementation, STAR was considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __featureCounts__. Widely adopted in the bioinformatics community, featureCounts computes expression counts in a computationally efficient manner. All three tools in this workflow support parallelization to accelerate the analysis. The rRNA contamination rate in the sample is determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the rRNA reference sequences. The alignment rate indicates the percentage of the reads in the sample that are derived from the rRNA sequences.

preprocessing.reads

label:	Reads
type:	`data:reads:fastq:single`

preprocessing.adapters

label:	Adapters
type:	`list:data:seq:nucleotide`
required:	False

preprocessing.show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

preprocessing.custom_adapter_sequences

label:	Custom adapter sequences [literal]
type:	`list:basic:string`
description:	Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:	False
hidden:	!preprocessing.show_advanced
default:	`[]`

preprocessing.kmer_length

label:	K-mer length
type:	`basic:integer`
description:	K-mer length must be smaller than or equal to the length of adapters.
hidden:	!preprocessing.show_advanced
default:	`23`

preprocessing.min_k

label:	Minimum k-mer length at right end of reads used for trimming
type:	`basic:integer`
disabled:	preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
hidden:	!preprocessing.show_advanced
default:	`11`
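
The `disabled` and `hidden` expressions for this field translate directly into Python. The sketch below uses a hypothetical function that mirrors the two conditions, given the current values of the related inputs.

```python
def min_k_state(adapters, custom_adapter_sequences, show_advanced):
    """Evaluate the `disabled` and `hidden` conditions listed for preprocessing.min_k."""
    # Disabled when neither adapter files nor custom sequences are given.
    disabled = len(adapters) == 0 and len(custom_adapter_sequences) == 0
    # Hidden unless advanced parameters are shown.
    hidden = not show_advanced
    return {"disabled": disabled, "hidden": hidden}
```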

preprocessing.hamming_distance

label:	Maximum Hamming distance for k-mers
type:	`basic:integer`
hidden:	!preprocessing.show_advanced
default:	`1`

preprocessing.maxns

label:	Max Ns after trimming [maxns=-1]
type:	`basic:integer`
description:	If non-negative, reads with more Ns than this (after trimming) will be discarded.
hidden:	!preprocessing.show_advanced
default:	`-1`

preprocessing.trim_quality

label:	Quality below which to trim reads from the right end
type:	`basic:integer`
description:	Phred algorithm is used, which is more accurate than naive trimming.
hidden:	!preprocessing.show_advanced
default:	`10`
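
To illustrate why a running-sum approach can differ from naive trimming, here is a sketch of one common formulation (the style used by BWA's quality trimming). It is an assumption about the general idea only; BBDuk's own implementation is not reproduced here, and the function name is hypothetical.

```python
def trim_right(qualities, trim_quality=10):
    """Return how many leading bases to keep after right-end quality trimming."""
    best_sum = 0
    running = 0
    keep = len(qualities)
    # Walk from the 3'-end and accumulate (threshold - quality).
    for i in range(len(qualities) - 1, -1, -1):
        running += trim_quality - qualities[i]
        if running < 0:
            break  # high-quality bases now outweigh the low-quality tail
        if running > best_sum:
            best_sum = running
            keep = i  # best cut point seen so far
    return keep
```

For qualities `[30, 30, 12, 5, 5, 11, 5, 5]` and a threshold of 10, this keeps the first 3 bases. Naive trimming, which stops at the first base at or above the threshold, would stop at the base with quality 11 and keep 6.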

preprocessing.min_length

label:	Minimum read length
type:	`basic:integer`
description:	Reads shorter than minimum read length after trimming are discarded.
hidden:	!preprocessing.show_advanced
default:	`20`

alignment.genome

label:	Indexed reference genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

alignment.show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

alignment.unstranded

label:	The data is unstranded
type:	`basic:boolean`
description:	For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR will generate with the --outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the --library-type option. For example, cufflinks --library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.
hidden:	!alignment.show_advanced
default:	`False`

alignment.noncannonical

label:	Remove non-canonical junctions (Cufflinks compatibility)
type:	`basic:boolean`
description:	It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
hidden:	!alignment.show_advanced
default:	`False`

alignment.detect_chimeric.chimeric

label:	Detect chimeric and circular alignments
type:	`basic:boolean`
description:	To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. The --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
default:	`False`

alignment.detect_chimeric.chimSegmentMin

label:	--chimSegmentMin
type:	`basic:integer`
disabled:	detect_chimeric.chimeric != true
default:	`20`

alignment.t_coordinates.quantmode

label:	Output in transcript coordinates
type:	`basic:boolean`
description:	With the --quantMode TranscriptomeSAM option, STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
default:	`False`

alignment.t_coordinates.singleend

label:	Allow soft-clipping and indels
type:	`basic:boolean`
description:	By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions, and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
disabled:	t_coordinates.quantmode != true
default:	`False`

alignment.t_coordinates.gene_counts

label:	Count reads
type:	`basic:boolean`
description:	With the --quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The output is a ReadsPerGene.out.tab file with 4 columns, which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
disabled:	t_coordinates.quantmode != true
default:	`False`

alignment.filtering.outFilterType

label:	Type of filtering
type:	`basic:string`
description:	Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
default:	`Normal`
choices:	Normal: `Normal` BySJout: `BySJout`

alignment.filtering.outFilterMultimapNmax

label:	--outFilterMultimapNmax
type:	`basic:integer`
description:	Read alignments will be output only if the read maps to fewer loci than this value; otherwise no alignments will be output (default: 10).
required:	False

alignment.filtering.outFilterMismatchNmax

label:	--outFilterMismatchNmax
type:	`basic:integer`
description:	Alignment will be output only if it has fewer mismatches than this value (default: 10).
required:	False

alignment.filtering.outFilterMismatchNoverLmax

label:	--outFilterMismatchNoverLmax
type:	`basic:decimal`
description:	Maximum number of mismatches per pair relative to read length. For example, with a value of 0.04 and 2x100b reads, the maximum number of mismatches is 0.04*200=8 for the paired read.
required:	False

alignment.filtering.outFilterScoreMin

label:	--outFilterScoreMin
type:	`basic:integer`
description:	Alignment will be output only if its score is higher than or equal to this value (default: 0).
required:	False

alignment.alignment.alignSJoverhangMin

label:	--alignSJoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:	False

alignment.alignment.alignSJDBoverhangMin

label:	--alignSJDBoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:	False

alignment.alignment.alignIntronMin

label:	--alignIntronMin
type:	`basic:integer`
description:	Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
required:	False
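
The intron/deletion distinction described above is a single threshold comparison. A minimal sketch with a hypothetical function name:

```python
def classify_gap(gap_length: int, align_intron_min: int = 21) -> str:
    """Classify a genomic gap in an alignment by its length."""
    # A gap at least as long as alignIntronMin counts as an intron.
    return "intron" if gap_length >= align_intron_min else "deletion"
```

With the default of 21, a 21-base gap is treated as an intron and a 20-base gap as a deletion.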

alignment.alignment.alignIntronMax

label:	--alignIntronMax
type:	`basic:integer`
description:	Maximum intron size. If 0, the maximum intron size is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

alignment.alignment.alignMatesGapMax

label:	--alignMatesGapMax
type:	`basic:integer`
description:	Maximum gap between two mates. If 0, the maximum intron gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

alignment.alignment.alignEndsType

label:	--alignEndsType
type:	`basic:string`
description:	Type of read ends alignment (default: Local).
required:	False
default:	`Local`
choices:	Local: `Local` EndToEnd: `EndToEnd` Extend5pOfRead1: `Extend5pOfRead1` Extend5pOfReads12: `Extend5pOfReads12`

alignment.output_sam_bam.outSAMunmapped

label:	--outSAMunmapped
type:	`basic:string`
description:	Output of unmapped reads in the SAM format.
required:	False
default:	`None`
choices:	None: `None` Within: `Within`

alignment.output_sam_bam.outSAMattributes

label:	--outSAMattributes
type:	`basic:string`
description:	A string of desired SAM attributes, in the order desired for the output SAM.
required:	False
default:	`Standard`
choices:	None: `None` Standard: `Standard` All: `All`

alignment.output_sam_bam.outSAMattrRGline

label:	--outSAMattrRGline
type:	`basic:string`
description:	SAM/BAM read group line. The first word contains the read group identifier and must start with "ID:", e.g. --outSAMattrRGline ID:xxx CN:yy "DS:z z z"
required:	False
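
A read group value in the form shown above can be assembled programmatically. The sketch below uses a hypothetical helper and assumes that fields containing spaces are wrapped in double quotes, as in the documented example.

```python
def read_group_line(rg_id: str, **fields: str) -> str:
    """Compose a read group string that starts with the mandatory ID field."""
    parts = [f"ID:{rg_id}"]
    for tag, value in fields.items():
        item = f"{tag}:{value}"
        # Quote fields with spaces so they stay a single word on the command line.
        parts.append(f'"{item}"' if " " in item else item)
    return " ".join(parts)
```

For example, `read_group_line("xxx", CN="yy", DS="z z z")` reproduces the `ID:xxx CN:yy "DS:z z z"` form from the description.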

quantification.annotation

label:	Annotation
type:	`data:annotation`

quantification.show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

quantification.assay_type

label:	Assay type
type:	`basic:string`
description:	In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
hidden:	!quantification.show_advanced
default:	`non_specific`
choices:	Strand non-specific: `non_specific` Strand-specific forward: `forward` Strand-specific reverse: `reverse` Detect automatically: `auto`
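
The strand rules described above can be sketched as a small predicate. This is only an illustration of the rules as stated; the function name and arguments are hypothetical, and the `auto` choice is assumed to be resolved to one of the other three values before counting.

```python
def strand_compatible(assay_type, read_strand, feature_strand, mate=1):
    """Return True if a read may be counted for a feature under the given assay.

    mate is 1 for single-end reads and first mates, 2 for second mates.
    """
    if assay_type not in ("non_specific", "forward", "reverse"):
        raise ValueError(f"unsupported assay type: {assay_type}")
    if assay_type == "non_specific":
        return True
    # Forward assay: mate 1 on the feature strand, mate 2 on the opposite strand.
    expected_same = (mate == 1)
    if assay_type == "reverse":
        # Reverse assay: the forward rules are flipped.
        expected_same = not expected_same
    return (read_strand == feature_strand) == expected_same


print(strand_compatible("forward", "+", "+", mate=1))  # True
print(strand_compatible("reverse", "+", "+", mate=1))  # False
```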

quantification.cdna_index

label:	cDNA index file
type:	`data:index:salmon`
description:	Transcriptome index file created using the Salmon indexing tool. The cDNA (transcriptome) sequences used to create the index must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
required:	False
hidden:	quantification.assay_type != 'auto'

quantification.n_reads

label:	Number of reads in subsampled alignment file
type:	`basic:integer`
description:	Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
hidden:	quantification.assay_type != 'auto'
default:	`5000000`

quantification.feature_class

label:	Feature class
type:	`basic:string`
description:	Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
hidden:	!quantification.show_advanced
default:	`exon`

quantification.feature_type

label:	Feature type
type:	`basic:string`
description:	The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.
hidden:	!quantification.show_advanced
default:	`gene`
choices:	gene: `gene` transcript: `transcript`

quantification.id_attribute

label:	ID attribute
type:	`basic:string`
description:	GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
hidden:	!quantification.show_advanced
default:	`gene_id`
choices:	gene_id: `gene_id` transcript_id: `transcript_id` ID: `ID` geneid: `geneid`
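
To illustrate how `feature_class` and `id_attribute` interact, the following sketch extracts the feature ID from a single GTF line. It assumes the usual GTF attribute syntax (`key "value";`); the helper name and the example line are made up for illustration and are not part of the quantification tool.

```python
import re

# GTF attributes look like: gene_id "G1"; transcript_id "T1";
_GTF_ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def feature_id(gtf_line, feature_class="exon", id_attribute="gene_id"):
    """Return the ID attribute of a GTF line, or None if the line is ignored."""
    fields = gtf_line.rstrip("\n").split("\t")
    # Column 3 holds the feature class; all other classes are skipped.
    if len(fields) != 9 or fields[2] != feature_class:
        return None
    attributes = dict(_GTF_ATTRIBUTE.findall(fields[8]))
    return attributes.get(id_attribute)


line = 'chr1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
print(feature_id(line))                                 # G1
print(feature_id(line, id_attribute="transcript_id"))   # T1
```

Lines that share the returned ID would then be treated as parts of the same feature.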

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

downsampling.advanced.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.advanced.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

downsampling.advanced.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`

qc.rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

qc.globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

BBDuk - Salmon - QC (paired-end)¶

data:workflow:rnaseq:salmonworkflow-bbduk-salmon-qc-paired (data:reads:fastq:paired reads, data:index:salmon salmon_index, data:genomeindex:star genome, data:annotation annotation, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference, basic:boolean show_advanced, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:boolean seq_bias, basic:boolean gc_bias, basic:boolean validate_mappings, basic:decimal consensus_slack, basic:decimal min_score_fraction, basic:integer range_factorization_bins, basic:integer min_assigned_frag, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.0.1]

Alignment-free RNA-seq pipeline. The Salmon tool and the tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rates in the sample are determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of the reads in the sample that are of rRNA and globin origin, respectively. Alignment of down-sampled data to a whole-genome reference sequence is used to produce an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.

reads

label:	Select sample(s)
type:	`data:reads:fastq:paired`

salmon_index

label:	Salmon index
type:	`data:index:salmon`

genome

label:	Indexed reference genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

annotation

label:	Annotation
type:	`data:annotation`

rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

preprocessing.adapters

label:	Adapters
type:	`list:data:seq:nucleotide`
required:	False

preprocessing.custom_adapter_sequences

label:	Custom adapter sequences [literal]
type:	`list:basic:string`
description:	Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:	False
default:	`[]`

preprocessing.kmer_length

label:	K-mer length
type:	`basic:integer`
description:	K-mer length must be smaller than or equal to the length of the adapters.
default:	`23`

preprocessing.min_k

label:	Minimum k-mer length at right end of reads used for trimming
type:	`basic:integer`
disabled:	preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
default:	`11`

preprocessing.hamming_distance

label:	Maximum Hamming distance for k-mers
type:	`basic:integer`
default:	`1`

preprocessing.maxns

label:	Max Ns after trimming [maxns=-1]
type:	`basic:integer`
description:	If non-negative, reads with more Ns than this (after trimming) will be discarded.
default:	`-1`

preprocessing.trim_quality

label:	Quality below which to trim reads from the right end
type:	`basic:integer`
description:	The Phred algorithm is used, which is more accurate than naive trimming.
default:	`10`

preprocessing.min_length

label:	Minimum read length
type:	`basic:integer`
description:	Reads shorter than minimum read length after trimming are discarded.
default:	`20`
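
The preprocessing inputs above correspond roughly to BBDuk `key=value` arguments. The sketch below shows one plausible mapping; the exact command assembled by the workflow is not shown in this document, so the choice of `ktrim=r` and `qtrim=r` and the way adapters are passed are assumptions.

```python
def bbduk_options(adapters=(), custom_adapter_sequences=(), kmer_length=23,
                  min_k=11, hamming_distance=1, maxns=-1, trim_quality=10,
                  min_length=20):
    """Build an illustrative list of BBDuk options from the workflow inputs."""
    options = [
        "qtrim=r",                    # quality-trim the right end
        f"trimq={trim_quality}",
        f"minlength={min_length}",
        f"maxns={maxns}",
    ]
    # K-mer trimming only applies when at least one adapter source is given,
    # mirroring the "disabled" condition on min_k.
    if adapters or custom_adapter_sequences:
        options += [
            "ktrim=r",
            f"k={kmer_length}",
            f"mink={min_k}",
            f"hdist={hamming_distance}",
        ]
        if adapters:
            options.append("ref=" + ",".join(adapters))
        if custom_adapter_sequences:
            options.append("literal=" + ",".join(custom_adapter_sequences))
    return options


print(bbduk_options(custom_adapter_sequences=["AGATCGGAAGAGC"]))
```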

quantification.seq_bias

label:	Perform sequence-specific bias correction
type:	`basic:boolean`
default:	`True`

quantification.gc_bias

label:	Perform fragment GC bias correction.
type:	`basic:boolean`
default:	`True`

quantification.validate_mappings

label:	Validate mappings using alignment-based verification.
type:	`basic:boolean`
default:	`True`

quantification.consensus_slack

label:	Consensus slack
type:	`basic:decimal`
description:	The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 if --validateMappings is given and 0 otherwise.
required:	False
hidden:	!quantification.validate_mappings
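
The slack rule can be illustrated with a toy predicate. This is a simplified reading of the description above, not Salmon's internal implementation; the function name and the hit counts are hypothetical.

```python
def is_mapping_candidate(hits_covered, total_hits, consensus_slack=0.2):
    """Keep a transcript if it misses at most consensus_slack of the hits.

    Assumes total_hits > 0.
    """
    if not 0 <= consensus_slack < 1:
        raise ValueError("consensus_slack must be in [0, 1)")
    missed_fraction = (total_hits - hits_covered) / total_hits
    return missed_fraction <= consensus_slack


print(is_mapping_candidate(9, 10, consensus_slack=0.0))  # False
print(is_mapping_candidate(9, 10, consensus_slack=0.2))  # True
```

With a slack of 0, a transcript must cover every hit; with 0.2, a transcript covering 9 of 10 hits remains a candidate.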

quantification.min_score_fraction

label:	Minimum alignment score fraction
type:	`basic:decimal`
description:	The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].
hidden:	!quantification.validate_mappings
default:	`0.65`

quantification.range_factorization_bins

label:	Range factorization bins
type:	`basic:integer`
description:	Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (0) corresponds to the standard rich equivalence classes, and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4.
default:	`4`

quantification.min_assigned_frag

label:	Minimum number of assigned fragments
type:	`basic:integer`
description:	The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.
default:	`10`

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`10000000`

downsampling.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

downsampling.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`
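
The down-sampling inputs can be read as arguments to `seqtk sample`, which takes either a read count or a fraction. The sketch below builds such a command line, assuming the workflow passes the values through unchanged; the function name and the file name are placeholders.

```python
def seqtk_sample_args(fastq, n_reads=10000000, seed=11, fraction=None,
                      two_pass=False):
    """Build an illustrative `seqtk sample` argument list."""
    # A fraction, when set, overrides the absolute number of reads.
    amount = fraction if fraction is not None else n_reads
    args = ["seqtk", "sample", f"-s{seed}"]
    if two_pass:
        args.append("-2")  # two-pass mode: slower, lower memory use
    args += [fastq, str(amount)]
    return args


print(seqtk_sample_args("reads.fastq"))
# ['seqtk', 'sample', '-s11', 'reads.fastq', '10000000']
```

For paired-end data, using the same seed for both mate files keeps the sampled pairs in sync.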

BBDuk - Salmon - QC (single-end)¶

data:workflow:rnaseq:salmonworkflow-bbduk-salmon-qc-single (data:reads:fastq:single reads, data:index:salmon salmon_index, data:genomeindex:star genome, data:annotation annotation, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference, basic:boolean show_advanced, list:data:seq:nucleotide adapters, list:basic:string custom_adapter_sequences, basic:integer kmer_length, basic:integer min_k, basic:integer hamming_distance, basic:integer maxns, basic:integer trim_quality, basic:integer min_length, basic:boolean seq_bias, basic:boolean gc_bias, basic:boolean validate_mappings, basic:decimal consensus_slack, basic:decimal min_score_fraction, basic:integer range_factorization_bins, basic:integer min_assigned_frag, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.0.1]

Alignment-free RNA-seq pipeline. The Salmon tool and the tximport package are used in the quantification step to produce gene-level abundance estimates. The rRNA and globin-sequence contamination rates in the sample are determined using the STAR aligner. Quality-trimmed reads are down-sampled (using the Seqtk tool) and aligned to the genome, rRNA and globin reference sequences. The rRNA and globin-sequence alignment rates indicate the percentage of the reads in the sample that are of rRNA and globin origin, respectively. Alignment of down-sampled data to a whole-genome reference sequence is used to produce an alignment file suitable for Samtools and QoRTs QC analysis. Per-sample analysis results and QC data are summarized by the MultiQC tool.

reads

label:	Select sample(s)
type:	`data:reads:fastq:single`

salmon_index

label:	Salmon index
type:	`data:index:salmon`

genome

label:	Indexed reference genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

annotation

label:	Annotation
type:	`data:annotation`

rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

preprocessing.adapters

label:	Adapters
type:	`list:data:seq:nucleotide`
required:	False

preprocessing.custom_adapter_sequences

label:	Custom adapter sequences [literal]
type:	`list:basic:string`
description:	Custom adapter sequences can be specified by inputting them one by one and pressing Enter after each sequence.
required:	False
default:	`[]`

preprocessing.kmer_length

label:	K-mer length
type:	`basic:integer`
description:	K-mer length must be smaller than or equal to the length of the adapters.
default:	`23`

preprocessing.min_k

label:	Minimum k-mer length at right end of reads used for trimming
type:	`basic:integer`
disabled:	preprocessing.adapters.length === 0 && preprocessing.custom_adapter_sequences.length === 0
default:	`11`

preprocessing.hamming_distance

label:	Maximum Hamming distance for k-mers
type:	`basic:integer`
default:	`1`

preprocessing.maxns

label:	Max Ns after trimming [maxns=-1]
type:	`basic:integer`
description:	If non-negative, reads with more Ns than this (after trimming) will be discarded.
default:	`-1`

preprocessing.trim_quality

label:	Quality below which to trim reads from the right end
type:	`basic:integer`
description:	The Phred algorithm is used, which is more accurate than naive trimming.
default:	`10`

preprocessing.min_length

label:	Minimum read length
type:	`basic:integer`
description:	Reads shorter than minimum read length after trimming are discarded.
default:	`20`

quantification.seq_bias

label:	Perform sequence-specific bias correction
type:	`basic:boolean`
default:	`True`

quantification.gc_bias

label:	Perform fragment GC bias correction.
type:	`basic:boolean`
default:	`False`

quantification.validate_mappings

label:	Validate mappings using alignment-based verification.
type:	`basic:boolean`
default:	`True`

quantification.consensus_slack

label:	Consensus slack
type:	`basic:decimal`
description:	The amount of slack allowed in the quasi-mapping consensus mechanism. Normally, a transcript must cover all hits to be considered for mapping. If this is set to a fraction, X, greater than 0 (and in [0,1)), then a transcript can fail to cover up to (100 * X)% of the hits before it is discounted as a mapping candidate. The default value of this option is 0.2 if --validateMappings is given and 0 otherwise.
required:	False
hidden:	!quantification.validate_mappings

quantification.min_score_fraction

label:	Minimum alignment score fraction
type:	`basic:decimal`
description:	The fraction of the optimal possible alignment score that a mapping must achieve in order to be considered valid - should be in (0,1].
hidden:	!quantification.validate_mappings
default:	`0.65`

quantification.range_factorization_bins

label:	Range factorization bins
type:	`basic:integer`
description:	Factorizes the likelihood used in quantification by adopting a new notion of equivalence classes based on the conditional probabilities with which fragments are generated from different transcripts. This is a more fine-grained factorization than the normal rich equivalence classes. The default value (0) corresponds to the standard rich equivalence classes, and larger values imply a more fine-grained factorization. If range factorization is enabled, a common value to select for this parameter is 4.
default:	`4`

quantification.min_assigned_frag

label:	Minimum number of assigned fragments
type:	`basic:integer`
description:	The minimum number of fragments that must be assigned to the transcriptome for quantification to proceed.
default:	`10`

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`10000000`

downsampling.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

downsampling.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`

BED file¶

data:bedupload-bed (basic:file src, basic:string species, basic:string build)[Source: v1.3.1]

Import a BED file (.bed) which is a tab-delimited text file that defines a feature track. It can have any file extension, but .bed is recommended. The BED file format is described on the [UCSC Genome Bioinformatics web site](http://genome.ucsc.edu/FAQ/FAQformat#format1).

src

label:	BED file
type:	`basic:file`
description:	Upload BED file annotation track. The first three required BED fields are chrom, chromStart and chromEnd.
required:	True
validate_regex:	`\.(bed|narrowPeak)$`
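
The effect of the validation pattern can be checked with a short script. It assumes the pattern is an ordinary alternation between `bed` and `narrowPeak` and that matching is case-sensitive; the platform's regex engine may behave differently, and the file names are examples only.

```python
import re

# Pattern from the validate_regex field, read as: ends in .bed or .narrowPeak
BED_NAME = re.compile(r"\.(bed|narrowPeak)$")


def is_accepted_bed_name(filename):
    """Return True if the file name ends with an accepted extension."""
    return BED_NAME.search(filename) is not None


for name in ("peaks.bed", "sample.narrowPeak", "peaks.bed.gz", "peaks.txt"):
    print(name, is_accepted_bed_name(name))
```

Under this reading, a compressed `peaks.bed.gz` would be rejected because the name does not end in one of the two extensions.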

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Genome build
type:	`basic:string`

bed

label:	BED file
type:	`basic:file`

bed_jbrowse

label:	Bgzip bed file for JBrowse
type:	`basic:file`

tbi_jbrowse

label:	Bed file index for JBrowse
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

BWA ALN¶

data:alignment:bam:bwaalnalignment-bwa-aln (data:genome:fasta genome, data:reads:fastq reads, basic:integer q, basic:boolean use_edit, basic:integer edit_value, basic:decimal fraction, basic:boolean seeds, basic:integer seed_length, basic:integer seed_dist)[Source: v1.5.0]

Read aligner for mapping low-divergent sequences against a large reference genome. Designed for Illumina sequence reads up to 100bp.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

q

label:	Quality threshold
type:	`basic:integer`
description:	Parameter for dynamic read trimming.
default:	`0`

use_edit

label:	Use maximum edit distance (excludes fraction of missing alignments)
type:	`basic:boolean`
default:	`False`

edit_value

label:	Maximum edit distance
type:	`basic:integer`
hidden:	!use_edit
default:	`5`

fraction

label:	Fraction of missing alignments
type:	`basic:decimal`
description:	The fraction of missing alignments given 2% uniform base error rate. The maximum edit distance is automatically chosen for different read lengths.
hidden:	use_edit
default:	`0.04`

seeds

label:	Use seeds
type:	`basic:boolean`
default:	`False`

seed_length

label:	Seed length
type:	`basic:integer`
description:	Take the first X bases of the read as the seed. If X is larger than the query sequence, seeding is disabled. For long reads, this option typically ranges from 25 to 35 when the seed maximum edit distance is 2.
hidden:	!seeds
default:	`35`

seed_dist

label:	Seed maximum edit distance
type:	`basic:integer`
hidden:	!seeds
default:	`2`
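
The inputs above appear to map onto `bwa aln` options, where `-n` accepts either an integer edit distance or a fractional missing-alignment rate. The sketch below shows how the `use_edit` and `seeds` switches could select the arguments; it is an assumption about the mapping, not the process's actual command.

```python
def bwa_aln_args(q=0, use_edit=False, edit_value=5, fraction=0.04,
                 seeds=False, seed_length=35, seed_dist=2):
    """Build an illustrative `bwa aln` argument list from the inputs."""
    # -n is an edit distance when an integer, a fraction when a float.
    n_value = edit_value if use_edit else fraction
    args = ["bwa", "aln", "-q", str(q), "-n", str(n_value)]
    if seeds:
        # -l: seed length, -k: maximum edit distance in the seed
        args += ["-l", str(seed_length), "-k", str(seed_dist)]
    return args


print(bwa_aln_args())
# ['bwa', 'aln', '-q', '0', '-n', '0.04']
print(bwa_aln_args(use_edit=True, seeds=True))
# ['bwa', 'aln', '-q', '0', '-n', '5', '-l', '35', '-k', '2']
```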

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

unmapped

label:	Unmapped reads
type:	`basic:file`
required:	False

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

BWA MEM¶

data:alignment:bam:bwamemalignment-bwa-mem (data:genome:fasta genome, data:reads:fastq reads, basic:integer seed_l, basic:integer band_w, basic:decimal re_seeding, basic:boolean m, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:boolean report_all, basic:integer report_tr)[Source: v2.3.0]

BWA MEM is a read aligner for mapping low-divergent sequences against a large reference genome. It is designed for longer sequences ranging from 70bp to 1Mbp. The algorithm works by seeding alignments with maximal exact matches (MEMs) and then extending seeds with the affine-gap Smith-Waterman algorithm (SW). See [here](http://bio-bwa.sourceforge.net/) for more information.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

seed_l

label:	Minimum seed length
type:	`basic:integer`
description:	Minimum seed length. Matches shorter than minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.
default:	`19`

band_w

label:	Band width
type:	`basic:integer`
description:	Gaps longer than this will not be found.
default:	`100`

re_seeding

label:	Re-seeding factor
type:	`basic:decimal`
description:	Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. A larger value yields fewer seeds, which leads to faster alignment but lower accuracy.
default:	`1.5`

m

label:	Mark shorter split hits as secondary
type:	`basic:boolean`
description:	Mark shorter split hits as secondary (for Picard compatibility)
default:	`False`

scoring.match

label:	Score of a match
type:	`basic:integer`
default:	`1`

scoring.missmatch

label:	Mismatch penalty
type:	`basic:integer`
default:	`4`

scoring.gap_o

label:	Gap open penalty
type:	`basic:integer`
default:	`6`

scoring.gap_e

label:	Gap extension penalty
type:	`basic:integer`
default:	`1`

scoring.clipping

label:	Clipping penalty
type:	`basic:integer`
description:	Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)
default:	`5`

scoring.unpaired_p

label:	Penalty for an unpaired read pair
type:	`basic:integer`
description:	Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty
default:	`9`
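
To make the scoring parameters concrete, the sketch below scores a toy alignment with the defaults listed above. It assumes the affine-gap convention from the BWA manual, where a gap of length k costs gap_o + k * gap_e; the counts in the example are invented.

```python
def alignment_score(matches, mismatches, gap_lengths=(), match=1, mismatch=4,
                    gap_o=6, gap_e=1):
    """Score an alignment from its match, mismatch and gap counts."""
    score = matches * match - mismatches * mismatch
    for length in gap_lengths:
        # Assumed affine gap cost: open penalty plus per-base extension.
        score -= gap_o + length * gap_e
    return score


def unpaired_pair_score(score_read1, score_read2, unpaired_p=9):
    """Score of a read pair treated as unpaired: scoreRead1+scoreRead2-Penalty."""
    return score_read1 + score_read2 - unpaired_p


s1 = alignment_score(matches=95, mismatches=3, gap_lengths=[2])
print(s1)                           # 75
print(unpaired_pair_score(s1, 60))  # 126
```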

reporting.report_all

label:	Report all found alignments
type:	`basic:boolean`
description:	Output all found alignments for single-end or unpaired paired-end reads. These alignments will be flagged as secondary alignments.
default:	`False`

reporting.report_tr

label:	Report threshold score
type:	`basic:integer`
description:	Do not output alignments with a score lower than this value. This option only affects output.
default:	`30`

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

unmapped

label:	Unmapped reads
type:	`basic:file`
required:	False

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

BWA SW¶

data:alignment:bam:bwaswalignment-bwa-sw (data:genome:fasta genome, data:reads:fastq reads, basic:integer match, basic:integer missmatch, basic:integer gap_o, basic:integer gap_e)[Source: v1.4.0]

Read aligner for mapping low-divergent sequences against a large reference genome. Designed for longer sequences ranging from 70bp to 1Mbp. The paired-end mode only works for reads from Illumina short-insert libraries.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

match

label:	Score of a match
type:	`basic:integer`
default:	`1`

missmatch

label:	Mismatch penalty
type:	`basic:integer`
default:	`3`

gap_o

label:	Gap open penalty
type:	`basic:integer`
default:	`5`

gap_e

label:	Gap extension penalty
type:	`basic:integer`
default:	`2`

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

unmapped

label:	Unmapped reads
type:	`basic:file`
required:	False

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Bam split¶

data:alignment:bam:primarybam-split (data:alignment:bam bam, data:sam:header header, data:sam:header header2)[Source: v0.5.0]

Split hybrid bam file into two bam files.

bam

label:	Hybrid alignment bam
type:	`data:alignment:bam`

header

label:	Primary header sam file (optional)
type:	`data:sam:header`
description:	If no header file is provided, the headers will be extracted from the hybrid alignment bam file.
required:	False

header2

label:	Secondary header sam file (optional)
type:	`data:sam:header`
description:	If no header file is provided, the headers will be extracted from the hybrid alignment bam file.
required:	False

bam

label:	Uploaded file
type:	`basic:file`

bai

label:	Index BAI
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Bamliquidator¶

data:bam:plot:bamliquidatorbamliquidator (basic:string analysis_type, list:data:alignment:bam bam, basic:string cell_type, basic:integer bin_size, data:annotation:gtf regions_gtf, data:bed regions_bed, basic:integer extension, basic:string sense, basic:boolean skip_plot, list:basic:string black_list, basic:integer threads)[Source: v0.2.1]

Set of tools for analyzing the density of short DNA sequence read alignments in the BAM file format.

analysis_type

label:	Analysis type
type:	`basic:string`
default:	`bin`
choices:	Bin mode: `bin` Region mode: `region` BED mode: `bed`

bam

label:	BAM File
type:	`list:data:alignment:bam`

cell_type

label:	Cell type
type:	`basic:string`
default:	`cell_type`

bin_size

label:	Bin size
type:	`basic:integer`
description:	Number of base pairs in each bin. The smaller the bin size the longer the runtime and the larger the data files. Default is 100000.
required:	False
hidden:	analysis_type != 'bin'

regions_gtf

label:	Region gff file / Annotation file (.gff|.gtf)
type:	`data:annotation:gtf`
required:	False
hidden:	analysis_type != 'region'

regions_bed

label:	Region bed file / Annotation file (.bed)
type:	`data:bed`
required:	False
hidden:	analysis_type != 'bed'

extension

label:	Extension
type:	`basic:integer`
description:	Extends reads by number of bp
default:	`200`

sense

label:	Mapping strand to gff file
type:	`basic:string`
default:	`.`
choices:	Forward: `+` Reverse: `-` Both: `.`

skip_plot

label:	Skip plot
type:	`basic:boolean`
required:	False

black_list

label:	Black list
type:	`list:basic:string`
description:	One or more chromosome patterns to skip during bin liquidation. Default is to skip any chromosomes that contain any of the following substrings chrUn _random Zv9_ _hap.
required:	False
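
The default black list behaviour described above amounts to a substring filter on chromosome names. A minimal sketch, with an invented list of chromosome names:

```python
# Default substrings named in the description above.
DEFAULT_BLACK_LIST = ("chrUn", "_random", "Zv9_", "_hap")


def chromosomes_to_liquidate(chromosomes, black_list=DEFAULT_BLACK_LIST):
    """Drop chromosomes whose name contains any black-listed substring."""
    return [
        chrom for chrom in chromosomes
        if not any(pattern in chrom for pattern in black_list)
    ]


names = ["chr1", "chrUn_gl000220", "chr6_apd_hap1", "chr1_gl000191_random", "chrX"]
print(chromosomes_to_liquidate(names))  # ['chr1', 'chrX']
```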

threads

label:	Threads
type:	`basic:integer`
description:	Number of threads to run concurrently during liquidation.
default:	`1`

analysis_type

label:	Analysis type
type:	`basic:string`
hidden:	True

output_dir

label:	Output directory
type:	`basic:file`

counts

label:	Counts HDF5 file
type:	`basic:file`

matrix

label:	Matrix file
type:	`basic:file`
required:	False
hidden:	analysis_type != 'region'

summary

label:	Summary file
type:	`basic:file:html`
required:	False
hidden:	analysis_type != 'bin'

Bamplot¶

data:bam:plot:bamplotbamplot (basic:string genome, data:annotation:gtf input_gff, basic:string input_region, list:data:alignment:bam bam, basic:integer stretch_input, basic:string color, basic:string sense, basic:integer extension, basic:boolean rpm, basic:string yscale, list:basic:string names, basic:string plot, basic:string title, basic:string scale, list:data:bed bed, basic:boolean multi_page)[Source: v1.3.1]

Plot a single locus from a bam.

genome

label:	Genome
type:	`basic:string`
choices:	HG19: `HG19` HG18: `HG18` MM8: `MM8` MM9: `MM9` MM10: `MM10` RN6: `RN6` RN4: `RN4`

input_gff

label:	Region string
type:	`data:annotation:gtf`
description:	Enter .gff file.
required:	False

input_region

label:	Region string
type:	`basic:string`
description:	Enter genomic region e.g. chr1:+:1-1000.
required:	False
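
The region string format shown in the example (`chr1:+:1-1000`) can be parsed as chromosome, strand and coordinate range. The sketch below is illustrative only; accepting `.` as a strand value is an assumption, and the `Region` type is not part of Bamplot.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    sense: str
    start: int
    end: int


def parse_region(region):
    """Parse a 'chrom:sense:start-end' string such as 'chr1:+:1-1000'."""
    chrom, sense, span = region.split(":")
    if sense not in ("+", "-", "."):
        raise ValueError(f"unexpected strand: {sense!r}")
    start, end = (int(value) for value in span.split("-"))
    return Region(chrom, sense, start, end)


print(parse_region("chr1:+:1-1000"))
# Region(chrom='chr1', sense='+', start=1, end=1000)
```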

bam

label:	Bam
type:	`list:data:alignment:bam`
description:	bam to plot from
required:	False

stretch_input

label:	Stretch-input
type:	`basic:integer`
description:	Stretch the input regions to a minimum length in bp, e.g. 10000 (for 10kb).
required:	False

color

label:	Color
type:	`basic:string`
description:	Enter a colon separated list of colors e.g. 255,0,0:255,125,0, default samples the rainbow.
default:	`255,0,0:255,125,0`
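
The color string is a colon-separated list of comma-separated RGB triples. A small parser, assuming each component is an integer from 0 to 255:

```python
def parse_colors(color):
    """Parse '255,0,0:255,125,0' into a list of (r, g, b) tuples."""
    colors = []
    for triple in color.split(":"):
        r, g, b = (int(value) for value in triple.split(","))
        if not all(0 <= channel <= 255 for channel in (r, g, b)):
            raise ValueError(f"color component out of range: {triple}")
        colors.append((r, g, b))
    return colors


print(parse_colors("255,0,0:255,125,0"))  # [(255, 0, 0), (255, 125, 0)]
```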

sense

label:	Sense
type:	`basic:string`
description:	Map to forward, reverse or both strands. Default maps to both.
default:	`both`
choices:	Forward: `forward` Reverse: `reverse` Both: `both`

extension

label:	Extension
type:	`basic:integer`
description:	Extends reads by n bp. Default value is 200bp.
default:	`200`

rpm

label:	rpm
type:	`basic:boolean`
description:	Normalizes density to reads per million (rpm). Default is False.
required:	False
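
Reads-per-million normalization is usually a simple rescaling by the total number of mapped reads. The sketch below shows that arithmetic under this assumption; how Bamplot counts mapped reads is not described here, and the numbers are invented.

```python
def to_rpm(density, mapped_reads):
    """Scale raw read densities to reads per million mapped reads."""
    return [value * 1_000_000 / mapped_reads for value in density]


print(to_rpm([10, 20], mapped_reads=2_000_000))  # [5.0, 10.0]
```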

yscale

label:	y scale
type:	`basic:string`
description:	Choose either relative or uniform y axis scaling. Default is relative scaling.
default:	`relative`
choices:	relative: `relative` uniform: `uniform`

names

label:	Names
type:	`list:basic:string`
description:	Enter a comma separated list of names for your bams.
required:	False

plot

label:	Single or multiple plot
type:	`basic:string`
description:	Choose either all lines on a single plot or multiple plots.
default:	`merge`
choices:	single: `single` multiple: `multiple` merge: `merge`

title

label:	Title
type:	`basic:string`
description:	Specify a title for the output plot(s), default will be the coordinate region.
default:	`output`

scale

label:	Scale
type:	`basic:string`
description:	Enter a comma separated list of multiplicative scaling factors for your bams. Default is none.
required:	False

bed

label:	Bed
type:	`list:data:bed`
description:	Add a space-delimited list of bed files to plot.
required:	False

multi_page

label:	Multi page
type:	`basic:boolean`
description:	If flagged will create a new pdf for each region.
default:	`False`

plot

label:	region plot
type:	`basic:file`

BaseSpace file¶

data:filebasespace-file-import (basic:string file_id, basic:secret access_token_secret)[Source: v1.1.0]

Import a file from Illumina BaseSpace.

file_id

label:	BaseSpace file ID
type:	`basic:string`

access_token_secret

label:	BaseSpace access token
type:	`basic:secret`
description:	BaseSpace access token secret handle needed to download the file.

file

label:	File
type:	`basic:file`

Bowtie (Dicty)¶

data:alignment:bam:bowtie1alignment-bowtie (data:genome:fasta genome, data:reads:fastq reads, basic:string mode, basic:integer m, basic:integer l, basic:boolean use_se, basic:integer trim_5, basic:integer trim_3, basic:integer trim_nucl, basic:integer trim_iter, basic:string r)[Source: v1.5.0]

An ultrafast memory-efficient short read aligner.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

mode

label:	Alignment mode
type:	`basic:string`
description:	When the -n option is specified (which is the default), bowtie determines which alignments are valid according to the following policy, which is similar to Maq’s default policy. 1. Alignments may have no more than N mismatches (where N is a number 0-3, set with -n) in the first L bases (where L is a number 5 or greater, set with -l) on the high-quality (left) end of the read. The first L bases are called the “seed”. 2. The sum of the Phred quality values at all mismatched positions (not just in the seed) may not exceed E (set with -e). Where qualities are unavailable (e.g. if the reads are from a FASTA file), the Phred quality defaults to 40. In -v mode, alignments may have no more than V mismatches, where V may be a number from 0 through 3 set using the -v option. Quality values are ignored. The -v option is mutually exclusive with the -n option.
default:	`-n`
choices:	Use qualities (-n): `-n` Use mismatches (-v): `-v`
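
The two alignment policies can be sketched as predicates over the mismatch positions of a candidate alignment. This is a simplified illustration: the limit of 70 for the quality sum is assumed to be Bowtie's default for -e, Bowtie's Maq-style rounding of quality values is ignored, and the function names and example values are hypothetical.

```python
def valid_in_n_mode(mismatch_positions, qualities, n=2, seed_len=28,
                    max_quality_sum=70):
    """Check the -n policy for one alignment.

    mismatch_positions: 0-based offsets from the left (high-quality) end.
    qualities: Phred quality value for every base of the read.
    """
    seed_mismatches = sum(1 for pos in mismatch_positions if pos < seed_len)
    quality_sum = sum(qualities[pos] for pos in mismatch_positions)
    return seed_mismatches <= n and quality_sum <= max_quality_sum


def valid_in_v_mode(mismatch_positions, v=2):
    """Check the -v policy: only the number of mismatches matters."""
    return len(mismatch_positions) <= v


quals = [30] * 36
print(valid_in_n_mode([3, 30], quals))      # True: 1 seed mismatch, sum 60
print(valid_in_n_mode([3, 10, 30], quals))  # False: quality sum is 90
print(valid_in_v_mode([3, 10, 30]))         # False: 3 mismatches > 2
```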

m

label:	Allowed mismatches
type:	`basic:integer`
description:	When used with “Use qualities (-n)”, it is the maximum number of mismatches permitted in the “seed”, i.e. the first L base pairs of the read (where L is set with -l/--seedlen). This may be 0, 1, 2 or 3 and the default is 2. When used with “Use mismatches (-v)”, report alignments with at most <int> mismatches.
default:	`2`

l

label:	Seed length (for -n only)
type:	`basic:integer`
description:	Only for “Use qualities (-n)”. Seed length (-l) is the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
default:	`28`

use_se

label:	Map as single-ended (for paired end reads only)
type:	`basic:boolean`
description:	If this option is selected paired-end reads will be mapped as single-ended.
default:	`False`

start_trimming.trim_5

label:	Bases to trim from 5’
type:	`basic:integer`
description:	Number of bases to trim from the 5’ (left) end of each read before alignment.
default:	`0`

start_trimming.trim_3

label:	Bases to trim from 3’
type:	`basic:integer`
description:	Number of bases to trim from the 3’ (right) end of each read before alignment.
default:	`0`

trimming.trim_nucl

label:	Bases to trim
type:	`basic:integer`
description:	Number of bases to trim from 3’ end in each iteration.
default:	`2`

trimming.trim_iter

label:	Iterations
type:	`basic:integer`
description:	Number of iterations.
default:	`0`

reporting.r

label:	Reporting mode
type:	`basic:string`
description:	Report up to <int> valid alignments per read or pair (-k) (default: 1). Validity of alignments is determined by the alignment policy (combined effects of -n, -v, -l, and -e). If more than one valid alignment exists and the --best and --strata options are specified, then only those alignments belonging to the best alignment “stratum” will be reported. Bowtie is designed to be very fast for small -k but can become significantly slower as -k increases. If you would like to use Bowtie for larger values of -k, consider building an index with a denser suffix-array sample, i.e. specify a smaller -o/--offrate when invoking bowtie-build for the relevant index (see the Performance tuning section for details).
default:	`-a -m 1 --best --strata`
choices:	Report unique alignments: `-a -m 1 --best --strata` Report all alignments: `-a --best` Report all alignments in the best stratum: `-a --best --strata`

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

unmapped

label:	Unmapped reads
type:	`basic:file`
required:	False

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Bowtie2¶

data:alignment:bam:bowtie2alignment-bowtie2 (data:genome:fasta genome, data:reads:fastq reads, basic:string mode, basic:string speed, basic:boolean use_se, basic:boolean discordantly, basic:boolean rep_se, basic:integer minins, basic:integer maxins, basic:integer N, basic:integer L, basic:integer gbar, basic:string mp, basic:string rdg, basic:string rfg, basic:string score_min, basic:integer trim_5, basic:integer trim_3, basic:integer trim_iter, basic:integer trim_nucl, basic:string rep_mode, basic:integer k_reports)[Source: v1.6.0]

Bowtie is an ultrafast, memory-efficient short read aligner. It aligns short DNA sequences (reads) to the human genome at a rate of over 25 million 35-bp reads per hour. Bowtie indexes the genome with a Burrows-Wheeler index to keep its memory footprint small, typically about 2.2 GB for the human genome (2.9 GB for paired-end). See [here](http://bowtie-bio.sourceforge.net/index.shtml) for more information.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

mode

label:	Alignment mode
type:	`basic:string`
description:	End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
default:	`--end-to-end`
choices:	end to end mode: `--end-to-end` local: `--local`

speed

label:	Speed vs. Sensitivity
type:	`basic:string`
description:	A quick setting for aligning fast or accurately. This option is a shortcut for parameters as follows. For --end-to-end: --very-fast = -D 5 -R 1 -N 0 -L 22 -i S,0,2.50; --fast = -D 10 -R 2 -N 0 -L 22 -i S,0,2.50; --sensitive = -D 15 -R 2 -N 0 -L 22 -i S,1,1.15 (default); --very-sensitive = -D 20 -R 3 -N 0 -L 20 -i S,1,0.50. For --local: --very-fast-local = -D 5 -R 1 -N 0 -L 25 -i S,1,2.00; --fast-local = -D 10 -R 2 -N 0 -L 22 -i S,1,1.75; --sensitive-local = -D 15 -R 2 -N 0 -L 20 -i S,1,0.75 (default); --very-sensitive-local = -D 20 -R 3 -N 0 -L 20 -i S,1,0.50.
required:	False
choices:	Very fast: `--very-fast` Fast: `--fast` Sensitive: `--sensitive` Very sensitive: `--very-sensitive`

PE_options.use_se

label:	Map as single-ended (for paired-end reads only)
type:	`basic:boolean`
description:	If this option is selected paired-end reads will be mapped as single-ended and other paired-end options are ignored.
default:	`False`

PE_options.discordantly

label:	Report discordantly matched read
type:	`basic:boolean`
description:	If both mates have unique alignments, but the alignments do not match paired-end expectations (orientation and relative distance) then alignment will be reported. Useful for detecting structural variations.
default:	`True`

PE_options.rep_se

label:	Report single ended
type:	`basic:boolean`
description:	If a paired alignment cannot be found, Bowtie2 tries to find alignments for the individual mates.
default:	`True`

PE_options.minins

label:	Minimal distance
type:	`basic:integer`
description:	The minimum fragment length for valid paired-end alignments. 0 imposes no minimum.
default:	`0`

PE_options.maxins

label:	Maximal distance
type:	`basic:integer`
description:	The maximum fragment length for valid paired-end alignments.
default:	`500`

alignment_options.N

label:	Number of mismatches allowed in seed alignment (N)
type:	`basic:integer`
description:	Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
required:	False

alignment_options.L

label:	Length of seed substrings (L)
type:	`basic:integer`
description:	Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. Default: the --sensitive preset is used by default for end-to-end alignment and --sensitive-local for local alignment. See documentation for details.
required:	False

alignment_options.gbar

label:	Disallow gaps within positions (gbar)
type:	`basic:integer`
description:	Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
required:	False

alignment_options.mp

label:	Maximal and minimal mismatch penalty (mp)
type:	`basic:string`
description:	Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN) * (MIN(Q, 40.0)/40.0)), where Q is the Phred quality value. Default for MX, MN: 6,2.
required:	False
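
As a worked illustration of the penalty formula in this description, the following minimal sketch evaluates it for a single mismatched position. The function name is hypothetical; the defaults follow the stated MX, MN of 6,2.

```python
import math


def mismatch_penalty(q, mx=6, mn=2, ignore_quals=False):
    """Penalty subtracted from the alignment score for one mismatch.

    q is the Phred quality of the mismatched read base. When qualities are
    ignored, the penalty is always MX.
    """
    if ignore_quals:
        return mx
    # MN + floor((MX - MN) * (min(Q, 40) / 40)): grows with base quality.
    return mn + math.floor((mx - mn) * (min(q, 40.0) / 40.0))
```

With the default penalties, a mismatch at a base of quality 20 costs 4, and any quality of 40 or above costs the full 6.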

alignment_options.rdg

label:	Set read gap open and extend penalties (rdg)
type:	`basic:string`
description:	Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
required:	False
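
The gap penalty stated here is a simple affine function of gap length. A minimal sketch, assuming the value is supplied as a comma-separated string such as `5,3` (helper names are hypothetical):

```python
def parse_gap_penalties(value):
    """Split a '<int1>,<int2>' string into (open, extend) integers."""
    gap_open, gap_extend = (int(part) for part in value.split(","))
    return gap_open, gap_extend


def gap_penalty(length, gap_open=5, gap_extend=3):
    """Penalty for a gap of the given length: <int1> + N * <int2>."""
    return gap_open + length * gap_extend
```

Under the default of 5,3, a gap of length 1 costs 8 and a gap of length 3 costs 14.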

alignment_options.rfg

label:	Set reference gap open and extend penalties (rfg)
type:	`basic:string`
description:	Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
required:	False

alignment_options.score_min

label:	Minimum alignment score needed for “valid” alignment (score_min)
type:	`basic:string`
description:	Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
required:	False
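
To make the function notation concrete, here is a small sketch that turns a specification such as `L,0,-0.6` into a callable. It assumes the usual Bowtie 2 convention f(x) = A + B * g(x), with g chosen by the leading letter (C constant, L linear, S square root, G natural logarithm); only the L example is spelled out in the description above, so the other letters are an assumption, and the helper name is hypothetical.

```python
import math


def parse_score_min(spec):
    """Return f(read_length) for a '<type>,<A>,<B>' specification."""
    parts = spec.split(",")
    kind = parts[0]
    const = float(parts[1])
    # A constant function may omit the coefficient.
    coeff = float(parts[2]) if len(parts) > 2 else 0.0
    shapes = {
        "C": lambda x: 0.0,  # constant: f(x) = A
        "L": lambda x: x,    # linear: f(x) = A + B * x
        "S": math.sqrt,      # square root: f(x) = A + B * sqrt(x)
        "G": math.log,       # natural log: f(x) = A + B * ln(x)
    }
    if kind not in shapes:
        raise ValueError(f"unknown function type: {kind!r}")
    shape = shapes[kind]
    return lambda read_length: const + coeff * shape(read_length)


end_to_end_default = parse_score_min("L,-0.6,-0.6")
local_default = parse_score_min("G,20,8")
```

For a 50 bp read, the end-to-end default gives a minimum score of -0.6 + -0.6 * 50 = -30.6, so longer reads tolerate proportionally more penalties.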

start_trimming.trim_5

label:	Bases to trim from 5’
type:	`basic:integer`
description:	Number of bases to trim from the 5’ (left) end of each read before alignment.
default:	`0`

start_trimming.trim_3

label:	Bases to trim from 3’
type:	`basic:integer`
description:	Number of bases to trim from the 3’ (right) end of each read before alignment.
default:	`0`

trimming.trim_iter

label:	Iterations
type:	`basic:integer`
description:	Number of iterations.
default:	`0`

trimming.trim_nucl

label:	Bases to trim
type:	`basic:integer`
description:	Number of bases to trim from 3’ end in each iteration.
default:	`2`

reporting.rep_mode

label:	Report mode
type:	`basic:string`
description:	Default mode: search for multiple alignments, report the best one; -k mode: search for one or more alignments, report each; -a mode: search for and report all alignments
default:	`def`
choices:	Default mode: `def` -k mode: `k` -a mode (very slow): `a`

reporting.k_reports

label:	Number of reports (for -k mode only)
type:	`basic:integer`
description:	Searches for at most X distinct, valid alignments for each read. The search terminates when it can’t find more distinct valid alignments, or when it finds X, whichever happens first. default: 5
default:	`5`

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

unmapped

label:	Unmapped reads
type:	`basic:file`
required:	False

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

ChIP-Seq (Gene Score)¶

data:chipseq:genescorechipseq-genescore (data:chipseq:peakscore peakscore, basic:decimal fdr, basic:decimal pval, basic:decimal logratio)[Source: v1.1.1]

ChIP-Seq analysis - Gene Score (BCM)

peakscore

label:	PeakScore file
type:	`data:chipseq:peakscore`
description:	PeakScore file

fdr

label:	FDR threshold
type:	`basic:decimal`
description:	FDR threshold value (default = 0.00005).
default:	`5e-05`

pval

label:	Pval threshold
type:	`basic:decimal`
description:	Pval threshold value (default = 0.00005).
default:	`5e-05`

logratio

label:	Log-ratio threshold
type:	`basic:decimal`
description:	Log-ratio threshold value (default = 2).
default:	`2.0`

genescore

label:	Gene Score
type:	`basic:file`

ChIP-Seq (Peak Score)¶

data:chipseq:peakscorechipseq-peakscore (data:chipseq:callpeak:macs2 peaks, data:bed bed)[Source: v2.1.0]

ChIP-Seq analysis - Peak Score (BCM)

peaks

label:	MACS2 results
type:	`data:chipseq:callpeak:macs2`
description:	MACS2 results file (NarrowPeak)

bed

label:	BED file
type:	`data:bed`

peak_score

label:	Peak Score
type:	`basic:file`

ChIP-seq (MACS2)¶

data:chipseq:batch:macs2macs2-batch (list:data:alignment:bam alignments, basic:boolean advanced, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff)[Source: v1.0.3]

This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined using treatment-background sample relations. If there are no sample relations defined, each sample is treated individually for the MACS analysis. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak-calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).

alignments

label:	Aligned reads
type:	`list:data:alignment:bam`
description:	Select multiple treatment/background samples.

advanced

label:	Show advanced options
type:	`basic:boolean`
description:	Inspect and modify parameters.
default:	`False`

promoter

label:	Promoter regions BED file
type:	`data:bed`
description:	BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.
required:	False
hidden:	!advanced

tagalign

label:	Use tagAlign files
type:	`basic:boolean`
description:	Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
hidden:	!advanced
default:	`False`

prepeakqc_settings.q_threshold

label:	Quality filtering threshold
type:	`basic:integer`
default:	`30`

prepeakqc_settings.n_sub

label:	Number of reads to subsample
type:	`basic:integer`
default:	`15000000`

prepeakqc_settings.tn5

label:	TN5 shifting
type:	`basic:boolean`
description:	Tn5 transposon shifting. Shift reads on “+” strand by 4bp and reads on “-” strand by 5bp.
default:	`False`
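
The shift described for this option can be sketched on a single interval. This is an illustration only: it assumes BED/tagAlign-style start and end coordinates and assumes that the 5’ end of the read is the coordinate that moves (the start on “+”, the end on “-”). The function name is hypothetical.

```python
def tn5_shift(start, end, strand):
    """Apply the Tn5 offset to one aligned read interval."""
    if strand == "+":
        # Move the 5' end of a plus-strand read 4 bp downstream.
        return start + 4, end
    if strand == "-":
        # Move the 5' end of a minus-strand read 5 bp upstream.
        return start, end - 5
    raise ValueError(f"unexpected strand: {strand!r}")
```

For instance, a plus-strand read spanning 100 to 150 becomes 104 to 150, while a minus-strand read over the same span becomes 100 to 145.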

prepeakqc_settings.shift

label:	User-defined cross-correlation peak strandshift
type:	`basic:integer`
description:	If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
required:	False

settings.duplicates

label:	Number of duplicates
type:	`basic:string`
description:	Controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:	False
hidden:	tagalign
choices:	1: `1` auto: `auto` all: `all`

settings.duplicates_prepeak

label:	Number of duplicates
type:	`basic:string`
description:	Controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:	False
hidden:	!tagalign
default:	`all`
choices:	1: `1` auto: `auto` all: `all`

settings.qvalue

label:	Q-value cutoff
type:	`basic:decimal`
description:	The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
required:	False
disabled:	settings.pvalue && settings.pvalue_prepeak
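
For readers unfamiliar with the procedure named in this description, the textbook Benjamini-Hochberg adjustment can be sketched in plain Python as below. This is a generic illustration of how q-values follow from p-values; it is not taken from the MACS2 code, which may compute them differently in detail, and the function name is hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvalues)
    # Indices of the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues
```

A region would then be called significant when its q-value is at or below the chosen cutoff.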

settings.pvalue

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
required:	False
disabled:	settings.qvalue
hidden:	tagalign

settings.pvalue_prepeak

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
disabled:	settings.qvalue
hidden:	!tagalign || settings.qvalue
default:	`1e-05`

settings.cap_num

label:	Cap number of peaks by taking top N peaks
type:	`basic:integer`
description:	To keep all peaks set value to 0.
disabled:	settings.broad
default:	`500000`
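
The capping behaviour, including the special value 0, might look like the following sketch. The description does not say how peaks are ranked, so sorting by a `score` field is purely an assumption, as are the function and field names.

```python
def cap_peaks(peaks, cap_num=500000):
    """Keep the top cap_num peaks; a cap of 0 keeps every peak.

    peaks: list of dicts with a numeric 'score' key (assumed ranking key).
    """
    if cap_num == 0:
        return list(peaks)
    ranked = sorted(peaks, key=lambda peak: peak["score"], reverse=True)
    return ranked[:cap_num]
```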

settings.mfold_lower

label:	MFOLD range (lower limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:	False

settings.mfold_upper

label:	MFOLD range (upper limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:	False

settings.slocal

label:	Small local region
type:	`basic:integer`
description:	Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.llocal

label:	Large local region
type:	`basic:integer`
description:	Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.extsize

label:	extsize
type:	`basic:integer`
description:	While ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
required:	False

settings.shift

label:	Shift
type:	`basic:integer`
description:	Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize from the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNAseI-Seq datasets. Note, you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
required:	False

settings.band_width

label:	Band width
type:	`basic:integer`
description:	The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
required:	False

settings.nolambda

label:	Use background lambda as local lambda
type:	`basic:boolean`
description:	With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:	`False`

settings.fix_bimodal

label:	Turn on the auto paired-peak model process
type:	`basic:boolean`
description:	Whether to turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if the paired-peak model fails.
default:	`False`

settings.nomodel

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	tagalign
default:	`False`

settings.nomodel_prepeak

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	!tagalign
default:	`True`

settings.down_sample

label:	Down-sample
type:	`basic:boolean`
description:	When set, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible, since random reads are selected each time and the numbers in particular (pileup, pvalue, qvalue) would change. Consider using the ‘randsample’ script before running MACS2 instead.
default:	`False`

settings.bedgraph

label:	Save fragment pileup and control lambda
type:	`basic:boolean`
description:	If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
default:	`True`

settings.spmr

label:	Save signal per million reads for fragment pileup profiles
type:	`basic:boolean`
disabled:	settings.bedgraph === false
default:	`True`

settings.call_summits

label:	Call summits
type:	`basic:boolean`
description:	MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
default:	`False`

settings.broad

label:	Composite broad regions
type:	`basic:boolean`
description:	When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
disabled:	settings.call_summits === true
default:	`False`

settings.broad_cutoff

label:	Broad cutoff
type:	`basic:decimal`
description:	Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise it’s a q-value cutoff. DEFAULT = 0.1
required:	False
disabled:	settings.call_summits === true || settings.broad !== true

ChIP-seq (MACS2-ROSE2)¶

data:chipseq:batch:macs2macs2-rose2-batch (list:data:alignment:bam alignments, basic:boolean advanced, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, basic:integer tss, basic:integer stitch, data:bed mask)[Source: v1.0.3]

This process runs MACS2 in batch mode. MACS2 analysis is triggered for pairs of samples as defined using treatment-background sample relations. If there are no sample relations defined, each sample is treated individually for the MACS analysis. Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and it improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak-calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq). For identification of super enhancers, R2 uses the Rank Ordering of Super-Enhancers algorithm (ROSE2). This takes the peaks called by RSEG for acetylation and calculates the distances between them to judge whether they can be considered super-enhancers. The ranked values can be plotted, and by locating the inflection point in the resulting graph, super-enhancers can be assigned. It can also be used with the MACS calculated data. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.
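
The idea of ranking signal values and locating an inflection point can be sketched as follows. One common way to pick such a point is to rescale both axes to the range 0 to 1 and take the value where a line of slope 1 is tangent to the ranked curve, which is the point minimising the scaled signal minus the scaled rank. This is a generic illustration and is not taken from the ROSE2 code; the function name and the input (a list of per-region signal values) are assumptions.

```python
def super_enhancer_cutoff(signals):
    """Return the signal value at the slope-1 tangent point of the ranked curve."""
    ranked = sorted(signals)
    n = len(ranked)
    if n < 2 or ranked[0] == ranked[-1]:
        raise ValueError("need at least two distinct signal values")
    lowest, highest = ranked[0], ranked[-1]

    def gap(i):
        # Scaled signal minus scaled rank; smallest where the tangent slope is 1.
        return (ranked[i] - lowest) / (highest - lowest) - i / (n - 1)

    return ranked[min(range(n), key=gap)]


signals = [1, 2, 3, 4, 5, 6, 7, 8, 50, 100]
cutoff = super_enhancer_cutoff(signals)
candidates = [s for s in signals if s > cutoff]
```

In this toy input, regions whose signal exceeds the returned cutoff would be the candidates for super-enhancer assignment.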

alignments

label:	Aligned reads
type:	`list:data:alignment:bam`
description:	Select multiple treatment/background samples.

advanced

label:	Show advanced options
type:	`basic:boolean`
description:	Inspect and modify parameters.
default:	`False`

promoter

label:	Promoter regions BED file
type:	`data:bed`
description:	BED file containing promoter regions (TSS ± 1000 bp, for example). Needed to get the number of peaks and reads mapped to promoter regions.
required:	False
hidden:	!advanced

tagalign

label:	Use tagAlign files
type:	`basic:boolean`
description:	Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
hidden:	!advanced
default:	`False`

prepeakqc_settings.q_threshold

label:	Quality filtering threshold
type:	`basic:integer`
default:	`30`

prepeakqc_settings.n_sub

label:	Number of reads to subsample
type:	`basic:integer`
default:	`15000000`

prepeakqc_settings.tn5

label:	TN5 shifting
type:	`basic:boolean`
description:	Tn5 transposon shifting. Shift reads on “+” strand by 4bp and reads on “-” strand by 5bp.
default:	`False`

prepeakqc_settings.shift

label:	User-defined cross-correlation peak strandshift
type:	`basic:integer`
description:	If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
required:	False

settings.duplicates

label:	Number of duplicates
type:	`basic:string`
description:	Controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:	False
hidden:	tagalign
choices:	1: `1` auto: `auto` all: `all`

settings.duplicates_prepeak

label:	Number of duplicates
type:	`basic:string`
description:	Controls the MACS behavior towards duplicate tags at the exact same location, i.e. the same coordinates and the same strand. The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:	False
hidden:	!tagalign
default:	`all`
choices:	1: `1` auto: `auto` all: `all`

settings.qvalue

label:	Q-value cutoff
type:	`basic:decimal`
description:	The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
required:	False
disabled:	settings.pvalue && settings.pvalue_prepeak

settings.pvalue

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
required:	False
disabled:	settings.qvalue
hidden:	tagalign

settings.pvalue_prepeak

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
disabled:	settings.qvalue
hidden:	!tagalign || settings.qvalue
default:	`1e-05`

settings.cap_num

label:	Cap number of peaks by taking top N peaks
type:	`basic:integer`
description:	To keep all peaks set value to 0.
disabled:	settings.broad
default:	`500000`

settings.mfold_lower

label:	MFOLD range (lower limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:	False

settings.mfold_upper

label:	MFOLD range (upper limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The regions must be lower than the upper limit and higher than the lower limit of fold enrichment. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:	False

settings.slocal

label:	Small local region
type:	`basic:integer`
description:	Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.llocal

label:	Large local region
type:	`basic:integer`
description:	Slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000bp for small local region (--slocal), and 10000bp for large local region (--llocal) which captures the bias from a long range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False
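
As a rough illustration of "the maximum lambda as local lambda", the sketch below takes already-computed lambda estimates and picks the largest. How MACS derives each estimate is not covered here, and the function and argument names are hypothetical:

```python
def local_lambda(lambda_bg, lambda_slocal, lambda_llocal, nolambda=False):
    """Pick the lambda used to test a candidate peak.

    lambda_bg      genome-wide background estimate
    lambda_slocal  estimate from the small window (1000 bp by default)
    lambda_llocal  estimate from the large window (10000 bp by default)
    """
    if nolambda:
        # With the nolambda flag, local bias is ignored.
        return lambda_bg
    return max(lambda_bg, lambda_slocal, lambda_llocal)


with_local_bias = local_lambda(0.5, 2.0, 1.2)
background_only = local_lambda(0.5, 2.0, 1.2, nolambda=True)
```

Because the maximum is taken, a spike captured by a very small window raises the local lambda and can suppress an otherwise significant peak.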

settings.extsize

label:	extsize
type:	`basic:integer`
description:	When ‘--nomodel’ is set, MACS uses this parameter to extend reads in 5’->3’ direction to fixed-size fragments. For example, if the size of binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set as 200. This option is only valid when --nomodel is set or when MACS fails to build model and --fix-bimodal is on.
required:	False

settings.shift

label:	Shift
type:	`basic:integer`
description:	Note, this is NOT the legacy --shiftsize option which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to anything other than the default value (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) then apply --extsize from 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved toward 3’->5’ direction, otherwise 5’->3’ direction. Recommended to keep it as default 0 for ChIP-Seq datasets, or -1 * half of EXTSIZE together with --extsize option for detecting enriched cutting loci such as certain DNaseI-Seq datasets. Note, you can’t set values other than 0 if format is BAMPE for paired-end data. Default is 0.
required:	False
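
The interaction of shift and extsize is easier to see with numbers. The sketch below assumes simple integer coordinates and mirrors the arithmetic for the minus strand; it illustrates the description and is not MACS code:

```python
def shifted_fragment(five_prime, strand, shift=0, extsize=200):
    """Return (start, end) of the fragment built from a read's 5' end.

    The 5' end is first moved by `shift` along the read's own 5'->3'
    direction (negative values move it 3'->5'), then extended by
    `extsize` in the 5'->3' direction.
    """
    if strand == "+":
        start = five_prime + shift
        return start, start + extsize
    # Minus strand: 5'->3' runs toward smaller coordinates.
    end = five_prime - shift
    return end - extsize, end


# shift = -1 * half of extsize centers the fragment on the cutting site.
plus = shifted_fragment(1000, "+", shift=-75, extsize=150)
minus = shifted_fragment(1000, "-", shift=-75, extsize=150)
```

Both calls give the interval 925 to 1075, centered on position 1000, which is why this combination is suggested for detecting enriched cutting loci.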

settings.band_width

label:	Band width
type:	`basic:integer`
description:	The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
required:	False

settings.nolambda

label:	Use background lambda as local lambda
type:	`basic:boolean`
description:	With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:	`False`

settings.fix_bimodal

label:	Turn on the auto paired-peak model process
type:	`basic:boolean`
description:	Whether to turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will be terminated if building the paired-peak model fails.
default:	`False`

settings.nomodel

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	tagalign
default:	`False`

settings.nomodel_prepeak

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	!tagalign
default:	`True`

settings.down_sample

label:	Down-sample
type:	`basic:boolean`
description:	When set, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible since random reads are selected each time, so the numbers (pileup, pvalue, qvalue) will change between runs. Consider using the ‘randsample’ script before running MACS2 instead.
default:	`False`

settings.bedgraph

label:	Save fragment pileup and control lambda
type:	`basic:boolean`
description:	If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
default:	`True`

settings.spmr

label:	Save signal per million reads for fragment pileup profiles
type:	`basic:boolean`
disabled:	settings.bedgraph === false
default:	`True`
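
Signal per million reads is a plain rescaling of the pileup. A minimal sketch follows; which read total MACS uses as the denominator (for example, before or after duplicate filtering) is not stated here, so `total_reads` is an assumption supplied by the caller:

```python
def signal_per_million(pileup, total_reads):
    """Scale a raw fragment pileup value to signal per million reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return pileup * 1_000_000 / total_reads


scaled = signal_per_million(pileup=50, total_reads=2_000_000)
```

A pileup of 50 in a library of two million reads corresponds to 25 per million, which makes profiles from libraries of different depth comparable.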

settings.call_summits

label:	Call summits
type:	`basic:boolean`
description:	MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
default:	`False`

settings.broad

label:	Composite broad regions
type:	`basic:boolean`
description:	When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
disabled:	settings.call_summits === true
default:	`False`

settings.broad_cutoff

label:	Broad cutoff
type:	`basic:decimal`
description:	Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1
required:	False
disabled:	settings.call_summits === true \|\| settings.broad !== true

rose_settings.tss

label:	TSS exclusion
type:	`basic:integer`
description:	Enter a distance from TSS to exclude. 0 = no TSS exclusion
default:	`0`

rose_settings.stitch

label:	Stitch
type:	`basic:integer`
description:	Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.
required:	False

rose_settings.mask

label:	Masking BED file
type:	`data:bed`
description:	Mask a set of regions from analysis. Provide a BED of masking regions.
required:	False

Chemical Mutagenesis¶

data:workflow:chemutworkflow-chemut (basic:string analysis_type, data:genome:fasta genome, list:data:alignment:bam parental_strains, list:data:alignment:bam mutant_strains, basic:boolean advanced, basic:boolean br_and_ind_ra, basic:boolean dbsnp, data:variants:vcf known_sites, list:data:variants:vcf known_indels, basic:integer stand_emit_conf, basic:integer stand_call_conf, basic:boolean rf, basic:boolean advanced, basic:integer read_depth)[Source: v0.0.6]

analysis_type

label:	Analysis type
type:	`basic:string`
description:	Choice of the analysis type. Use “SNV” or “INDEL” options to run the GATK analysis only on the haploid portion of the dicty genome. Choose options SNV_CHR2 or INDEL_CHR2 to run the analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).
default:	`snv`
choices:	SNV: `snv` INDEL: `indel` SNV_CHR2: `snv_chr2` INDEL_CHR2: `indel_chr2`
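
The mapping from analysis type to the region-specific GATK arguments can be sketched as below. Only the CHR2 arguments come from the description; the helper name is hypothetical, and the arguments used for the haploid modes are not stated, so the sketch returns nothing extra for them:

```python
CHR2_ARGS = ["-ploidy", "2", "-L", "chr2:2263132-3015703"]
ANALYSIS_TYPES = {"snv", "indel", "snv_chr2", "indel_chr2"}


def region_args(analysis_type):
    """Return the extra arguments implied by the chosen analysis type."""
    if analysis_type not in ANALYSIS_TYPES:
        raise ValueError(f"unknown analysis type: {analysis_type}")
    if analysis_type.endswith("_chr2"):
        return list(CHR2_ARGS)
    # Haploid modes: the exact arguments are not documented here.
    return []


chr2_indel_args = region_args("indel_chr2")
```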

genome

label:	Reference genome
type:	`data:genome:fasta`

parental_strains

label:	Parental strains
type:	`list:data:alignment:bam`

mutant_strains

label:	Mutant strains
type:	`list:data:alignment:bam`

Vc.advanced

label:	Advanced options
type:	`basic:boolean`
required:	False
default:	`False`

Vc.br_and_ind_ra

label:	Do variant base recalibration and indel realignment
type:	`basic:boolean`
required:	False
hidden:	Vc.advanced === false
default:	`False`

Vc.dbsnp

label:	Use dbSNP file
type:	`basic:boolean`
description:	rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. dbSNP is not used in any way for the calculations themselves.
required:	False
hidden:	Vc.advanced === false
default:	`False`

Vc.known_sites

label:	Known sites (dbSNP)
type:	`data:variants:vcf`
required:	False
hidden:	Vc.advanced === false \|\| Vc.br_and_ind_ra === false && Vc.dbsnp === false
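
The `hidden` expression for `Vc.known_sites` mixes `||` and `&&`. Assuming it is evaluated with JavaScript precedence (where `&&` binds tighter than `||`), it can be restated as the following sketch; the function name is hypothetical:

```python
def known_sites_hidden(advanced, br_and_ind_ra, dbsnp):
    """Mirror of: advanced === false || br_and_ind_ra === false && dbsnp === false."""
    return (advanced is False) or (br_and_ind_ra is False and dbsnp is False)


# Shown when advanced options are on and at least one consumer is enabled.
shown_for_dbsnp = not known_sites_hidden(True, False, True)
shown_for_recalibration = not known_sites_hidden(True, True, False)
hidden_without_consumers = known_sites_hidden(True, False, False)
```

In other words, the field appears only when advanced options are enabled and either recalibration/realignment or dbSNP annotation is switched on.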

Vc.known_indels

label:	Known indels
type:	`list:data:variants:vcf`
required:	False
hidden:	Vc.advanced === false \|\| Vc.br_and_ind_ra === false
default:	`[]`

Vc.stand_emit_conf

label:	Emission confidence threshold
type:	`basic:integer`
description:	The minimum confidence threshold (phred-scaled) at which the program should emit sites that appear to be possibly variant.
required:	False
hidden:	Vc.advanced === false
default:	`10`

Vc.stand_call_conf

label:	Calling confidence threshold
type:	`basic:integer`
description:	The minimum confidence threshold (phred-scaled) at which the program should emit variant sites as called. If a site’s associated genotype has a confidence score lower than the calling threshold, the program will emit the site as filtered and will annotate it as LowQual. This threshold separates high confidence calls from low confidence calls.
required:	False
hidden:	Vc.advanced === false
default:	`30`
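
Both thresholds are phred-scaled, so a value Q corresponds to an error probability of 10^(-Q/10). The sketch below converts the scale and applies the two cutoffs as described; the category labels and the handling of values exactly at a threshold are assumptions:

```python
def phred_to_error_probability(q):
    """Convert a phred-scaled confidence to the probability of being wrong."""
    return 10 ** (-q / 10)


def classify_site(confidence, stand_emit_conf=10, stand_call_conf=30):
    """Sort a site into one of three bins using the two thresholds."""
    if confidence < stand_emit_conf:
        return "not emitted"
    if confidence < stand_call_conf:
        return "LowQual"  # emitted, but flagged as filtered
    return "called"


calling_error = phred_to_error_probability(30)
labels = [classify_site(q) for q in (5, 20, 45)]
```

With the defaults, a calling threshold of 30 corresponds to roughly a 1 in 1000 chance that the call is wrong.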

Vc.rf

label:	ReassignOneMappingQuality Filter
type:	`basic:boolean`
description:	This read transformer will change a certain read mapping quality to a different value without affecting reads that have other mapping qualities. This is intended primarily for users of RNA-Seq data handling programs such as TopHat, which use MAPQ = 255 to designate uniquely aligned reads. According to convention, 255 normally designates “unknown” quality, and most GATK tools automatically ignore such reads. By reassigning a different mapping quality to those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset.
required:	False
hidden:	Vc.advanced === false
default:	`False`

Vf.advanced

label:	Advanced options
type:	`basic:boolean`
required:	False
default:	`False`

Vf.read_depth

label:	Read depth cutoff
type:	`basic:integer`
description:	The minimum number of replicate reads required for a variant site to be included.
required:	False
hidden:	Vf.advanced === false
default:	`5`

Convert GFF3 to GTF¶

data:annotation:gtfgff-to-gtf (data:annotation:gff3 annotation)[Source: v0.4.0]

Convert GFF3 file to GTF format.

annotation

label:	Annotation (GFF3)
type:	`data:annotation:gff3`
description:	Annotation in GFF3 format.

annot

label:	Converted GTF file
type:	`basic:file`

annot_sorted

label:	Sorted GTF file
type:	`basic:file`

annot_sorted_idx_igv

label:	IGV index for sorted GTF file
type:	`basic:file`

annot_sorted_track_jbrowse

label:	JBrowse track for sorted GTF
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Convert files to reads (paired-end)¶

data:reads:fastq:pairedfiles-to-fastq-paired (list:data:file src1, list:data:file src2, basic:boolean merge_lanes)[Source: v1.3.0]

Convert FASTQ files to paired-end reads.

src1

label:	Mate1
type:	`list:data:file`

src2

label:	Mate2
type:	`list:data:file`

merge_lanes

label:	Merge lanes
type:	`basic:boolean`
description:	Merge paired-end sample data split into multiple sequencing lanes into a single pair of FASTQ files.
default:	`False`
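
Lane merging for paired-end data amounts to concatenating the per-lane files of each mate in the same lane order, so that read pairs stay in step. A minimal sketch under that assumption follows; the file names and helper functions are hypothetical and this is not the process's actual implementation:

```python
import shutil
import tempfile
from pathlib import Path


def merge_lanes(lane_files, merged_path):
    """Concatenate per-lane FASTQ files, in the given order, into one file."""
    merged_path = Path(merged_path)
    with merged_path.open("wb") as merged:
        for lane in lane_files:
            with Path(lane).open("rb") as handle:
                shutil.copyfileobj(handle, merged)
    return merged_path


def merge_paired_lanes(mate1_lanes, mate2_lanes, out_dir):
    """Merge mate 1 and mate 2 lanes separately, keeping lane order identical."""
    if len(mate1_lanes) != len(mate2_lanes):
        raise ValueError("each lane needs both a mate 1 and a mate 2 file")
    out_dir = Path(out_dir)
    return (
        merge_lanes(mate1_lanes, out_dir / "merged_R1.fastq"),
        merge_lanes(mate2_lanes, out_dir / "merged_R2.fastq"),
    )


# Tiny demonstration with two lanes and one read per lane.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    lanes = {"R1": [], "R2": []}
    for mate in lanes:
        for lane in ("L001", "L002"):
            path = tmp / f"sample_{lane}_{mate}.fastq"
            path.write_text(f"@{lane}_{mate}\nACGT\n+\nIIII\n")
            lanes[mate].append(path)
    merged_r1, merged_r2 = merge_paired_lanes(lanes["R1"], lanes["R2"], tmp)
    merged_r1_headers = [
        line for line in merged_r1.read_text().splitlines() if line.startswith("@")
    ]
```

Copying bytes rather than parsing records keeps the sketch short; it relies on every lane file ending with a newline.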

fastq

label:	Reads file (mate 1)
type:	`list:basic:file`

fastq2

label:	Reads file (mate 2)
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC (Upstream)
type:	`list:basic:file:html`

fastqc_url2

label:	Quality control with FastQC (Downstream)
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive (Upstream)
type:	`list:basic:file`

fastqc_archive2

label:	Download FastQC archive (Downstream)
type:	`list:basic:file`

Convert files to reads (single-end)¶

data:reads:fastq:singlefiles-to-fastq-single (list:data:file src, basic:boolean merge_lanes)[Source: v1.3.0]

Convert FASTQ files to single-end reads.

src

label:	Reads
type:	`list:data:file`
description:	Sequencing reads in FASTQ format

merge_lanes

label:	Merge lanes
type:	`basic:boolean`
description:	Merge sample data split into multiple sequencing lanes into a single FASTQ file.
default:	`False`

fastq

label:	Reads file
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive
type:	`list:basic:file`

Cuffdiff 2.2¶

data:differentialexpression:cuffdiffcuffdiff (list:data:cufflinks:cuffquant case, list:data:cufflinks:cuffquant control, list:basic:string labels, data:annotation annotation, data:genome:fasta genome, basic:boolean multi_read_correct, basic:decimal fdr, basic:string library_type, basic:string library_normalization, basic:string dispersion_method)[Source: v2.3.0]

Cuffdiff finds significant changes in transcript expression, splicing, and promoter use. You can use it to find differentially expressed genes and transcripts, as well as genes that are being differentially regulated at the transcriptional and post-transcriptional level. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffdiff/) and [here](https://software.broadinstitute.org/cancer/software/genepattern/modules/docs/Cuffdiff/7) for more information.

case

label:	Case samples
type:	`list:data:cufflinks:cuffquant`

control

label:	Control samples
type:	`list:data:cufflinks:cuffquant`

labels

label:	Group labels
type:	`list:basic:string`
description:	Define labels for each sample group.
default:	`['control', 'case']`

annotation

label:	Annotation (GTF/GFF3)
type:	`data:annotation`
description:	A transcript annotation file produced by cufflinks, cuffcompare, or other tool.

genome

label:	Run bias detection and correction algorithm
type:	`data:genome:fasta`
description:	Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
required:	False

multi_read_correct

label:	Do initial estimation procedure to more accurately weight reads with multiple genome mappings
type:	`basic:boolean`
default:	`False`

fdr

label:	Allowed FDR
type:	`basic:decimal`
description:	The allowed false discovery rate. The default is 0.05.
default:	`0.05`
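
To show how an allowed FDR is typically applied, the sketch below computes Benjamini-Hochberg adjusted p-values and keeps the tests at or below the threshold. It is a generic illustration of FDR control with made-up p-values, not Cuffdiff's own code:

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values


p_values = [0.01, 0.04, 0.03, 0.20]
q_values = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(q_values) if q <= 0.05]
```

Here only the first test passes at an FDR of 0.05, even though three raw p-values are below 0.05.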

library_type

label:	Library type
type:	`basic:string`
description:	In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
default:	`fr-unstranded`
choices:	fr-unstranded: `fr-unstranded` fr-firststrand: `fr-firststrand` fr-secondstrand: `fr-secondstrand`

library_normalization

label:	Library normalization method
type:	`basic:string`
description:	You can control how library sizes (i.e. sequencing depths) are normalized in Cufflinks and Cuffdiff. Cuffdiff has several methods that require multiple libraries in order to work. Library normalization methods supported by Cufflinks work on one library at a time.
default:	`geometric`
choices:	geometric: `geometric` classic-fpkm: `classic-fpkm` quartile: `quartile`
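
The `geometric` choice refers to scaling libraries by a median-of-ratios size factor in the style of DESeq. The sketch below shows that idea on toy counts; it is an illustration of the technique under that assumption, not Cuffdiff's implementation, and the function name is hypothetical:

```python
import math
from statistics import median


def geometric_size_factors(counts):
    """counts maps library name -> per-gene fragment counts (same gene order)."""
    libraries = list(counts)
    n_genes = len(counts[libraries[0]])
    # Genes with a zero count in any library have no usable geometric mean.
    usable = [g for g in range(n_genes) if all(counts[lib][g] > 0 for lib in libraries)]
    geo_means = {
        g: math.exp(sum(math.log(counts[lib][g]) for lib in libraries) / len(libraries))
        for g in usable
    }
    return {
        lib: median([counts[lib][g] / geo_means[g] for g in usable])
        for lib in libraries
    }


toy_counts = {"lib1": [10, 100, 30, 0], "lib2": [20, 200, 60, 5]}
size_factors = geometric_size_factors(toy_counts)
```

Since `lib2` was sequenced twice as deeply in this toy data, its size factor comes out twice that of `lib1`. The method needs several libraries, which matches the note that some normalization methods require multiple libraries.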

dispersion_method

label:	Dispersion method
type:	`basic:string`
description:	Cuffdiff works by modeling the variance in fragment counts across replicates as a function of the mean fragment count across replicates. Strictly speaking, it models a quantity called dispersion - the variance present in a group of samples beyond what is expected from a simple Poisson model of RNA-Seq. You can control how Cuffdiff constructs its model of dispersion in locus fragment counts. Each condition that has replicates can receive its own model, or Cuffdiff can use a global model for all conditions. All of these policies are identical to those used by DESeq (Anders and Huber, Genome Biology, 2010).
default:	`pooled`
choices:	pooled: `pooled` per-condition: `per-condition` blind: `blind` poisson: `poisson`
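
Dispersion as "variance beyond Poisson" can be made concrete with a method-of-moments estimate. The sketch assumes the negative binomial relation variance = mean + dispersion * mean^2 used by DESeq-style models; Cuffdiff fits its model across many loci, so this single-locus estimate is only illustrative:

```python
from statistics import mean, variance


def excess_dispersion(replicate_counts):
    """Estimate dispersion for one locus from replicate fragment counts."""
    mu = mean(replicate_counts)
    if mu == 0:
        return 0.0
    var = variance(replicate_counts)  # sample variance across replicates
    # A Poisson model predicts var == mu; anything above that is dispersion.
    return max(0.0, (var - mu) / mu ** 2)


overdispersed = excess_dispersion([10, 20, 30])
poisson_like = excess_dispersion([20, 20, 20])
```

The first locus has mean 20 and variance 100, giving a dispersion of 0.2; the second shows no excess variance, so the estimate is clamped to zero.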

raw

label:	Differential expression (gene level)
type:	`basic:file`

de_json

label:	Results table (JSON)
type:	`basic:json`

de_file

label:	Results table (file)
type:	`basic:file`

transcript_diff_exp

label:	Differential expression (transcript level)
type:	`basic:file`

tss_group_diff_exp

label:	Differential expression (primary transcript)
type:	`basic:file`

cds_diff_exp

label:	Differential expression (coding sequence)
type:	`basic:file`

cuffdiff_output

label:	Cuffdiff output
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Cufflinks 2.2¶

data:cufflinks:cufflinkscufflinks (data:alignment:bam alignment, data:annotation annotation, data:genome:fasta genome, data:annotation:gtf mask_file, basic:string library_type, basic:string annotation_usage, basic:boolean multi_read_correct)[Source: v2.2.0]

Cufflinks assembles transcripts, estimates their abundances, and tests for differential expression and regulation in RNA-Seq samples. It accepts aligned RNA-Seq reads and assembles the alignments into a parsimonious set of transcripts. Cufflinks then estimates the relative abundances of these transcripts based on how many reads support each one, taking into account biases in library preparation protocols. See [here](http://cole-trapnell-lab.github.io/cufflinks/) for more information.

alignment

label:	Aligned reads
type:	`data:alignment:bam`

annotation

label:	Annotation (GTF/GFF3)
type:	`data:annotation`
required:	False

genome

label:	Run bias detection and correction algorithm
type:	`data:genome:fasta`
description:	Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
required:	False

mask_file

label:	Mask file
type:	`data:annotation:gtf`
description:	Ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
required:	False

library_type

label:	Library type
type:	`basic:string`
description:	In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
default:	`fr-unstranded`
choices:	fr-unstranded: `fr-unstranded` fr-firststrand: `fr-firststrand` fr-secondstrand: `fr-secondstrand`

annotation_usage

label:	Instruct Cufflinks how to use the provided annotation (GFF/GTF) file
type:	`basic:string`
description:	--GTF-guide - tells Cufflinks to use the supplied reference annotation (GFF) to guide RABT assembly. Reference transcripts will be tiled with faux-reads to provide additional information in assembly. Output will include all reference transcripts as well as any novel genes and isoforms that are assembled. --GTF - tells Cufflinks to use the supplied reference annotation (a GFF file) to estimate isoform expression. It will not assemble novel transcripts, and the program will ignore alignments not structurally compatible with any reference transcript.
default:	`--GTF-guide`
choices:	Use supplied reference annotation to guide RABT assembly (--GTF-guide): `--GTF-guide` Use supplied reference annotation to estimate isoform expression (--GTF): `--GTF`

multi_read_correct

label:	Do initial estimation procedure to more accurately weight reads with multiple genome mappings
type:	`basic:boolean`
description:	Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.
default:	`False`
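
The inputs above correspond to Cufflinks command-line options. The sketch below assembles an argument list from them using the long option names from the Cufflinks manual; how this process actually invokes Cufflinks is not shown here, so the helper and its defaults are hypothetical:

```python
def cufflinks_args(
    alignment,
    annotation=None,
    annotation_usage="--GTF-guide",
    genome=None,
    mask_file=None,
    library_type="fr-unstranded",
    multi_read_correct=False,
):
    """Build a Cufflinks argument list from the process inputs."""
    if annotation_usage not in ("--GTF-guide", "--GTF"):
        raise ValueError("annotation_usage must be --GTF-guide or --GTF")
    args = ["cufflinks", "--library-type", library_type]
    if annotation:
        args += [annotation_usage, annotation]
    if genome:
        args += ["--frag-bias-correct", genome]  # enables bias correction
    if mask_file:
        args += ["--mask-file", mask_file]
    if multi_read_correct:
        args.append("--multi-read-correct")
    args.append(alignment)
    return args


example_args = cufflinks_args("aln.bam", annotation="genes.gtf", multi_read_correct=True)
```

The list form can be passed to `subprocess.run` without shell quoting concerns.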

transcripts

label:	Assembled transcript isoforms
type:	`basic:file`

isoforms_fpkm_tracking

label:	Isoforms FPKM tracking
type:	`basic:file`

genes_fpkm_tracking

label:	Genes FPKM tracking
type:	`basic:file`

skipped_loci

label:	Skipped loci
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Cuffmerge¶

data:annotation:cuffmergecuffmerge (list:data:cufflinks:cufflinks expressions, list:data:annotation:gtf gtf, data:annotation gff, data:genome:fasta genome, basic:integer threads)[Source: v1.4.0]

Cufflinks includes a script called Cuffmerge that you can use to merge together several Cufflinks assemblies. It also handles running Cuffcompare for you, and automatically filters a number of transfrags that are probably artifacts. The main purpose of Cuffmerge is to make it easier to make an assembly GTF file suitable for use with Cuffdiff. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/) for more information.

expressions

label:	Cufflinks transcripts (GTF)
type:	`list:data:cufflinks:cufflinks`
required:	False

gtf

label:	Annotation files (GTF)
type:	`list:data:annotation:gtf`
description:	Annotation files you wish to merge together with Cufflinks produced annotation files (e.g. upload Cufflinks annotation GTF file)
required:	False

gff

label:	Reference annotation (GTF/GFF3)
type:	`data:annotation`
description:	An optional “reference” annotation GTF. The input assemblies are merged together with the reference GTF and included in the final output.
required:	False

genome

label:	Reference genome
type:	`data:genome:fasta`
description:	This argument should point to the genomic DNA sequences for the reference. If a directory, it should contain one fasta file per contig. If a multifasta file, all contigs should be present. The merge script will pass this option to cuffcompare, which will use the sequences to assist in classifying transfrags and excluding artifacts (e.g. repeats). For example, Cufflinks transcripts consisting mostly of lower-case bases are classified as repeats. Note that <seq_dir> must contain one fasta file per reference chromosome, and each file must be named after the chromosome, and have a .fa or .fasta extension
required:	False

threads

label:	Use this many processor threads
type:	`basic:integer`
description:	Use this many threads to align reads. The default is 1.
default:	`1`

annot

label:	Merged GTF file
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Cuffnorm¶

data:cuffnormcuffnorm (list:data:cufflinks:cuffquant cuffquant, data:annotation annotation, basic:boolean useERCC)[Source: v2.2.0]

Cufflinks includes a program, Cuffnorm, that you can use to generate tables of expression values that are properly normalized for library size. Cuffnorm takes a GTF2/GFF3 file of transcripts as input, along with two or more SAM, BAM, or CXB files for two or more samples. See [here](http://cole-trapnell-lab.github.io/cufflinks/cuffnorm/) for more information. Replicate relation needs to be defined for Cuffnorm to account for replicates. If the replicate relation is not defined, each sample will be treated individually.

cuffquant

label:	Cuffquant expression file
type:	`list:data:cufflinks:cuffquant`

annotation

label:	Annotation (GTF/GFF3)
type:	`data:annotation`
description:	A transcript annotation file produced by cufflinks, cuffcompare, or other source.

useERCC

label:	ERCC spike-in normalization
type:	`basic:boolean`
description:	Use ERCC spike-in controls for normalization.
default:	`False`
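
For readers unfamiliar with the unit, FPKM is fragments per kilobase of feature per million mapped fragments. The sketch below shows the plain formula; Cuffnorm additionally applies library-size normalization across samples, which this single-sample calculation does not attempt:

```python
def fpkm(fragments, feature_length_bp, total_mapped_fragments):
    """Fragments per kilobase of feature per million mapped fragments."""
    if feature_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    return fragments * 1e9 / (feature_length_bp * total_mapped_fragments)


gene_fpkm = fpkm(fragments=500, feature_length_bp=2000, total_mapped_fragments=10_000_000)
```

A 2 kb gene with 500 fragments in a library of ten million mapped fragments has an FPKM of 25.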

genes_count

label:	Genes count
type:	`basic:file`

genes_fpkm

label:	Genes FPKM
type:	`basic:file`

genes_attr

label:	Genes attr table
type:	`basic:file`

isoform_count

label:	Isoform count
type:	`basic:file`

isoform_fpkm

label:	Isoform FPKM
type:	`basic:file`

isoform_attr

label:	Isoform attr table
type:	`basic:file`

cds_count

label:	CDS count
type:	`basic:file`

cds_fpkm

label:	CDS FPKM
type:	`basic:file`

cds_attr

label:	CDS attr table
type:	`basic:file`

tss_groups_count

label:	TSS groups count
type:	`basic:file`

tss_groups_fpkm

label:	TSS groups FPKM
type:	`basic:file`

tss_attr

label:	TSS attr table
type:	`basic:file`

run_info

label:	Run info
type:	`basic:file`

raw_scatter

label:	FPKM exp scatter plot
type:	`basic:file`

boxplot

label:	Boxplot
type:	`basic:file`

fpkm_exp_raw

label:	FPKM exp raw
type:	`basic:file`

replicate_correlations

label:	Replicate correlations plot
type:	`basic:file`

fpkm_means

label:	FPKM means
type:	`basic:file`

exp_fpkm_means

label:	Exp FPKM means
type:	`basic:file`

norm_scatter

label:	FPKM exp scatter normalized plot
type:	`basic:file`
required:	False

fpkm_exp_norm

label:	FPKM exp normalized
type:	`basic:file`
required:	False

spike_raw

label:	Spike raw
type:	`basic:file`
required:	False

spike_norm

label:	Spike normalized
type:	`basic:file`
required:	False

R_data

label:	All R normalization data
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Cuffquant 2.2¶

data:cufflinks:cuffquantcuffquant (data:alignment:bam alignment, data:annotation annotation, data:genome:fasta genome, data:annotation:gtf mask_file, basic:string library_type, basic:boolean multi_read_correct)[Source: v1.4.0]

Cuffquant allows you to compute the gene and transcript expression profiles and save these profiles to files that you can analyze later with Cuffdiff or Cuffnorm. See [here](http://cole-trapnell-lab.github.io/cufflinks/manual/) for more information.

alignment

label:	Aligned reads
type:	`data:alignment:bam`

annotation

label:	Annotation (GTF/GFF3)
type:	`data:annotation`

genome

label:	Run bias detection and correction algorithm
type:	`data:genome:fasta`
description:	Provide Cufflinks with a multifasta file (genome file) via this option to instruct it to run a bias detection and correction algorithm which can significantly improve accuracy of transcript abundance estimates.
required:	False

mask_file

label:	Mask file
type:	`data:annotation:gtf`
description:	Ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
required:	False

library_type

label:	Library type
type:	`basic:string`
description:	In cases where Cufflinks cannot determine the platform and protocol used to generate input reads, you can supply this information manually, which will allow Cufflinks to infer source strand information with certain protocols. The available options are listed below. For paired-end data, we currently only support protocols where reads point towards each other: fr-unstranded - Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand; fr-firststrand - Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced; fr-secondstrand - Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
default:	`fr-unstranded`
choices:	fr-unstranded: `fr-unstranded` fr-firststrand: `fr-firststrand` fr-secondstrand: `fr-secondstrand`

multi_read_correct

label:	Do initial estimation procedure to more accurately weight reads with multiple genome mappings
type:	`basic:boolean`
description:	Run an initial estimation procedure that weights reads mapping to multiple locations more accurately.
default:	`False`

cxb

label:	Abundances (.cxb)
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Cuffquant results¶

data:cufflinks:cuffquantupload-cxb (basic:file src, basic:string source, basic:string species, basic:string build, basic:string feature_type)[Source: v1.2.1]

Upload Cuffquant results file (.cxb)

src

label:	Cuffquant file
type:	`basic:file`
description:	Upload Cuffquant results file. Supported extension: *.cxb
required:	True
validate_regex:	`\.(cxb)$`

source

label:	Gene ID database
type:	`basic:string`
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`
default:	`gene`
choices:	gene: `gene` transcript: `transcript` exon: `exon`

cxb

label:	Cuffquant results
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Custom master file¶

data:masterfile:ampliconupload-master-file (basic:file src, basic:string panel_name)[Source: v1.1.1]

This should be a tab delimited file (*.txt). Please check the [example](http://genial.is/amplicon-masterfile) file for details.

src

label:	Master file
type:	`basic:file`
validate_regex:	`\.txt(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`
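
The validation pattern can be tried out directly. The sketch assumes the `\|` sequences in the listing are escaped table pipes standing for plain `|` alternation, and that the pattern is searched anywhere in the file name rather than matched in full; the helper name is hypothetical:

```python
import re

# Same pattern as above, with the escaped pipes read as alternation.
MASTER_FILE_RE = re.compile(
    r"\.txt(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_master_file(filename):
    """True when the name ends in .txt, optionally followed by a listed archive suffix."""
    return MASTER_FILE_RE.search(filename) is not None


accepted = [name for name in ("panel.txt", "panel.txt.gz", "panel.tsv", "panel.txt.xz")
            if is_valid_master_file(name)]
```

Plain `.txt` files and `.txt` files with one of the listed compression suffixes pass; other extensions are rejected.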

panel_name

label:	Panel name
type:	`basic:string`

master_file

label:	Master file
type:	`basic:file`

bedfile

label:	BED file (merged targets)
type:	`basic:file`

nomergebed

label:	BED file (nonmerged targets)
type:	`basic:file`

olapfreebed

label:	BED file (overlap-free targets)
type:	`basic:file`

primers

label:	Primers
type:	`basic:file`

panel_name

label:	Panel name
type:	`basic:string`

Cutadapt (Diagenode CATS, paired-end)¶

data:reads:fastq:paired:cutadaptcutadapt-custom-paired (data:reads:fastq:paired reads)[Source: v1.2.0]

Cutadapt process configured to be used with the Diagenode CATS kits.

reads

label:	NGS reads
type:	`data:reads:fastq:paired`

fastq

label:	Reads file (forward)
type:	`list:basic:file`

fastq2

label:	Reads file (reverse)
type:	`list:basic:file`

report

label:	Cutadapt report
type:	`basic:file`

fastqc_url

label:	Quality control with FastQC (forward)
type:	`list:basic:file:html`

fastqc_url2

label:	Quality control with FastQC (reverse)
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive (forward)
type:	`list:basic:file`

fastqc_archive2

label:	Download FastQC archive (reverse)
type:	`list:basic:file`

Cutadapt (Diagenode CATS, single-end)¶

data:reads:fastq:single:cutadaptcutadapt-custom-single (data:reads:fastq:single reads)[Source: v1.2.0]

Cutadapt process configured to be used with the Diagenode CATS kits.

reads

label:	NGS reads
type:	`data:reads:fastq:single`

fastq

label:	Reads file
type:	`list:basic:file`

report

label:	Cutadapt report
type:	`basic:file`

fastqc_url

label:	Quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive
type:	`list:basic:file`

Cutadapt (paired-end)¶

data:reads:fastq:paired:cutadaptcutadapt-paired (data:reads:fastq:paired reads, data:seq:nucleotide mate1_5prime_file, data:seq:nucleotide mate1_3prime_file, data:seq:nucleotide mate2_5prime_file, data:seq:nucleotide mate2_3prime_file, list:basic:string mate1_5prime_seq, list:basic:string mate1_3prime_seq, list:basic:string mate2_5prime_seq, list:basic:string mate2_3prime_seq, basic:integer times, basic:decimal error_rate, basic:integer min_overlap, basic:boolean match_read_wildcards, basic:integer nextseq_trim, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer max_n, basic:string pair_filter)[Source: v2.3.0]

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).

reads

label:	Select sample(s)
type:	`data:reads:fastq:paired`

adapters.mate1_5prime_file

label:	5 prime adapter file for Mate 1
type:	`data:seq:nucleotide`
required:	False

adapters.mate1_3prime_file

label:	3 prime adapter file for Mate 1
type:	`data:seq:nucleotide`
required:	False

adapters.mate2_5prime_file

label:	5 prime adapter file for Mate 2
type:	`data:seq:nucleotide`
required:	False

adapters.mate2_3prime_file

label:	3 prime adapter file for Mate 2
type:	`data:seq:nucleotide`
required:	False

adapters.mate1_5prime_seq

label:	5 prime adapter sequence for Mate 1
type:	`list:basic:string`
required:	False

adapters.mate1_3prime_seq

label:	3 prime adapter sequence for Mate 1
type:	`list:basic:string`
required:	False

adapters.mate2_5prime_seq

label:	5 prime adapter sequence for Mate 2
type:	`list:basic:string`
required:	False

adapters.mate2_3prime_seq

label:	3 prime adapter sequence for Mate 2
type:	`list:basic:string`
required:	False

adapters.times

label:	Times
type:	`basic:integer`
description:	Remove up to COUNT adapters from each read.
default:	`1`

adapters.error_rate

label:	Error rate
type:	`basic:decimal`
description:	Maximum allowed error rate (no. of errors divided by the length of the matching region).
default:	`0.1`
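
The error rate definition above (number of errors divided by the length of the matching region) can be sketched as a simple check. This is only an illustration of the arithmetic; the function name `is_match_acceptable` is hypothetical and the code is not Cutadapt's own implementation.

```python
def is_match_acceptable(errors: int, match_length: int, error_rate: float = 0.1) -> bool:
    """Check whether an adapter match stays within the allowed error rate.

    The observed rate is the number of errors divided by the length
    of the matching region, as described for the error_rate parameter.
    """
    if match_length <= 0:
        raise ValueError("match_length must be positive")
    return errors / match_length <= error_rate


# With the default rate of 0.1, one error is tolerated in a 10-base match,
# but not in a 9-base match.
print(is_match_acceptable(errors=1, match_length=10))
print(is_match_acceptable(errors=1, match_length=9))
```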

adapters.min_overlap

label:	Minimal overlap
type:	`basic:integer`
description:	Minimum overlap for an adapter match.
default:	`3`

adapters.match_read_wildcards

label:	Match read wildcards
type:	`basic:boolean`
description:	Interpret IUPAC wildcards in reads.
default:	`False`

modify_reads.nextseq_trim

label:	NextSeq-specific quality trimming
type:	`basic:integer`
description:	NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of regular (-q) quality trimming.
required:	False

modify_reads.leading

label:	Quality on 5 prime
type:	`basic:integer`
description:	Remove low quality bases from 5 prime. Specifies the minimum quality required to keep a base.
required:	False

modify_reads.trailing

label:	Quality on 3 prime
type:	`basic:integer`
description:	Remove low quality bases from the 3 prime. Specifies the minimum quality required to keep a base.
required:	False

modify_reads.crop

label:	Crop
type:	`basic:integer`
description:	Cut the specified number of bases from the end of the reads.
required:	False

modify_reads.headcrop

label:	Headcrop
type:	`basic:integer`
description:	Cut the specified number of bases from the start of the reads.
required:	False

filtering.minlen

label:	Min length
type:	`basic:integer`
description:	Drop the read if it is below a specified length.
required:	False

filtering.max_n

label:	Max number of N-s
type:	`basic:integer`
description:	Discard reads having more ‘N’ bases than specified.
required:	False

filtering.pair_filter

label:	Which of the reads have to match the filtering criterion
type:	`basic:string`
description:	Which of the reads in a paired-end read have to match the filtering criterion in order for the pair to be filtered.
default:	`any`
choices:	Any of the reads in a paired-end read have to match the filtering criterion: `any` Both of the reads in a paired-end read have to match the filtering criterion: `both`
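
The interaction between the per-read filters (`minlen`, `max_n`) and `pair_filter` can be sketched as follows. This is a simplified illustration under the assumption that each mate is tested independently and the results are then combined; the function names are hypothetical and this is not Cutadapt's code.

```python
from typing import Optional


def fails_filters(sequence: str, minlen: Optional[int] = None,
                  max_n: Optional[int] = None) -> bool:
    """Return True if a single read matches a filtering criterion."""
    if minlen is not None and len(sequence) < minlen:
        return True  # read is shorter than the minimum length
    if max_n is not None and sequence.upper().count("N") > max_n:
        return True  # read has more 'N' bases than allowed
    return False


def pair_is_discarded(read1: str, read2: str, pair_filter: str = "any",
                      minlen: Optional[int] = None,
                      max_n: Optional[int] = None) -> bool:
    """Combine the per-read results according to pair_filter."""
    fail1 = fails_filters(read1, minlen, max_n)
    fail2 = fails_filters(read2, minlen, max_n)
    if pair_filter == "any":
        return fail1 or fail2
    if pair_filter == "both":
        return fail1 and fail2
    raise ValueError(f"unknown pair_filter: {pair_filter!r}")


# Only the first mate is too short: discarded with "any", kept with "both".
print(pair_is_discarded("ACGT", "ACGTACGTAC", "any", minlen=5))
print(pair_is_discarded("ACGT", "ACGTACGTAC", "both", minlen=5))
```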

fastq

label:	Reads file (forward)
type:	`list:basic:file`

fastq2

label:	Reads file (reverse)
type:	`list:basic:file`

report

label:	Cutadapt report
type:	`basic:file`

fastqc_url

label:	Quality control with FastQC (forward)
type:	`list:basic:file:html`

fastqc_url2

label:	Quality control with FastQC (reverse)
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive (forward)
type:	`list:basic:file`

fastqc_archive2

label:	Download FastQC archive (reverse)
type:	`list:basic:file`

Cutadapt (single-end)¶

data:reads:fastq:single:cutadaptcutadapt-single (data:reads:fastq:single reads, data:seq:nucleotide up_primers_file, data:seq:nucleotide down_primers_file, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:integer polya_tail, basic:integer min_overlap, basic:integer nextseq_trim, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer max_n, basic:boolean match_read_wildcards, basic:integer times, basic:decimal error_rate)[Source: v2.1.0]

Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. More information about Cutadapt can be found [here](http://cutadapt.readthedocs.io/en/stable/).

reads

label:	Select sample(s)
type:	`data:reads:fastq:single`

adapters.up_primers_file

label:	5 prime adapter file
type:	`data:seq:nucleotide`
required:	False

adapters.down_primers_file

label:	3 prime adapter file
type:	`data:seq:nucleotide`
required:	False

adapters.up_primers_seq

label:	5 prime adapter sequence
type:	`list:basic:string`
required:	False

adapters.down_primers_seq

label:	3 prime adapter sequence
type:	`list:basic:string`
required:	False

adapters.polya_tail

label:	Poly-A tail
type:	`basic:integer`
description:	Length of poly-A tail, example - AAAN -> 3, AAAAAN -> 5
required:	False

adapters.min_overlap

label:	Minimal overlap
type:	`basic:integer`
description:	Minimum overlap for an adapter match
default:	`3`

modify_reads.nextseq_trim

label:	NextSeq-specific quality trimming
type:	`basic:integer`
description:	NextSeq-specific quality trimming (each read). Also trims dark cycles appearing as high-quality G bases. This option is mutually exclusive with the use of regular (-q) quality trimming.
required:	False

modify_reads.leading

label:	Quality on 5 prime
type:	`basic:integer`
description:	Remove low quality bases from 5 prime. Specifies the minimum quality required to keep a base. This option is mutually exclusive with the use of NextSeq-specific quality trimming.
required:	False

modify_reads.trailing

label:	Quality on 3 prime
type:	`basic:integer`
description:	Remove low quality bases from the 3 prime. Specifies the minimum quality required to keep a base. This option is mutually exclusive with the use of NextSeq-specific quality trimming.
required:	False

modify_reads.crop

label:	Crop
type:	`basic:integer`
description:	Cut the read to a specified length by removing bases from the end
required:	False

modify_reads.headcrop

label:	Headcrop
type:	`basic:integer`
description:	Cut the specified number of bases from the start of the read
required:	False

filtering.minlen

label:	Min length
type:	`basic:integer`
description:	Drop the read if it is below a specified length
required:	False

filtering.max_n

label:	Max number of N-s
type:	`basic:integer`
description:	Discard reads having more ‘N’ bases than specified.
required:	False

filtering.match_read_wildcards

label:	Match read wildcards
type:	`basic:boolean`
description:	Interpret IUPAC wildcards in reads.
required:	False
default:	`False`

filtering.times

label:	Times
type:	`basic:integer`
description:	Remove up to COUNT adapters from each read.
default:	`1`

filtering.error_rate

label:	Error rate
type:	`basic:decimal`
description:	Maximum allowed error rate (no. of errors divided by the length of the matching region).
default:	`0.1`

fastq

label:	Reads file
type:	`list:basic:file`

report

label:	Cutadapt report
type:	`basic:file`

fastqc_url

label:	Quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive
type:	`list:basic:file`

Cutadapt - STAR - FeatureCounts (3’ mRNA-Seq, single-end)¶

data:workflow:quant:featurecounts:singleworkflow-cutadapt-star-fc-quant-single (data:reads:fastq:single reads, data:genomeindex:star star_index, data:annotation annotation, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference, basic:boolean show_advanced, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.0.1]

This 3’ mRNA-Seq pipeline is comprised of QC, preprocessing, alignment and quantification steps. Reads are preprocessed by __Cutadapt__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats tools are used to report alignment QC metrics. Additional QC steps operate on downsampled reads and include an alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate.

reads

label:	Select sample(s)
type:	`data:reads:fastq:single`

star_index

label:	Genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

annotation

label:	Annotation
type:	`data:annotation`
description:	Genome annotation file (GTF).

rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

cutadapt.quality_cutoff

label:	Reads quality cutoff
type:	`basic:integer`
description:	Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq-specific trim option.
required:	False

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

downsampling.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

downsampling.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`
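
The precedence between the “Number of reads” and “Fraction” down-sampling parameters can be sketched as below. The function `reads_to_keep` is hypothetical and only mirrors the rule stated in the descriptions above; capping at the total read count and rounding are assumptions, and the actual down-sampling tool may behave differently.

```python
from typing import Optional


def reads_to_keep(total_reads: int, n_reads: int = 1_000_000,
                  fraction: Optional[float] = None) -> int:
    """Return how many reads a down-sampling step would retain.

    If fraction is set, it overrides n_reads.
    """
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in the range [0.0, 1.0]")
        return round(total_reads * fraction)
    # Assumption: never request more reads than the input contains.
    return min(n_reads, total_reads)


print(reads_to_keep(4_000_000))                 # uses the default n_reads
print(reads_to_keep(4_000_000, fraction=0.25))  # fraction overrides n_reads
```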

Cutadapt - STAR - FeatureCounts - basic QC (3’ mRNA-Seq, single-end)¶

data:workflow:quant:featurecounts:singleworkflow-cutadapt-star-fc-quant-wo-depletion-single (data:reads:fastq:single reads, data:genomeindex:star star_index, data:annotation annotation, basic:boolean show_advanced, basic:integer quality_cutoff)[Source: v1.0.0]

This 3’ mRNA-Seq pipeline is comprised of QC, preprocessing, alignment and quantification steps. Reads are preprocessed by __Cutadapt__ which removes adapters, trims reads for quality from the 3’-end, and discards reads that are too short after trimming. Preprocessed reads are aligned by __STAR__ aligner. For read-count quantification, the __FeatureCounts__ tool is used. QoRTs QC and Samtools idxstats tools are used to report alignment QC metrics.

reads

label:	Select sample(s)
type:	`data:reads:fastq:single`

star_index

label:	Genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

annotation

label:	Annotation
type:	`data:annotation`
description:	Genome annotation file (GTF).

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

cutadapt.quality_cutoff

label:	Reads quality cutoff
type:	`basic:integer`
description:	Trim low-quality bases from 3’ end of each read before adapter removal. The use of this option will override the use of NextSeq/NovaSeq-specific trim option.
required:	False

Cutadapt - STAR - HTSeq-count (paired-end)¶

data:workflow:rnaseq:htseqworkflow-custom-cutadapt-star-htseq-paired (data:reads:fastq:paired reads, data:genomeindex:star genome, data:annotation:gtf gff, basic:string stranded, basic:boolean advanced, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string mode, basic:string feature_class, basic:string id_attribute, basic:boolean name_ordered)[Source: v1.0.1]

This RNA-seq pipeline is comprised of three steps, preprocessing, alignment, and quantification. First, reads are preprocessed by __cutadapt__ which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.

reads

label:	NGS reads
type:	`data:reads:fastq:paired`

genome

label:	Indexed reference genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool

gff

label:	Annotation (GFF)
type:	`data:annotation:gtf`

stranded

label:	Assay type
type:	`basic:string`
description:	In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
default:	`no`
choices:	Strand non-specific: `no` Strand-specific forward: `yes` Strand-specific reverse: `reverse`
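
The strandedness rules in the description above can be written as a small decision function. This is a minimal sketch of those rules only; the function name and argument names are hypothetical and it is not HTSeq-count's implementation.

```python
def read_is_compatible(read_strand: str, feature_strand: str,
                       stranded: str = "no",
                       is_second_mate: bool = False) -> bool:
    """Decide whether a read can be assigned to a feature given the assay type.

    stranded: "no", "yes" (forward) or "reverse", as in the choices above.
    is_second_mate: True for the second read of a paired-end fragment.
    """
    if stranded not in ("no", "yes", "reverse"):
        raise ValueError(f"unknown assay type: {stranded!r}")
    if stranded == "no":
        return True  # strand is ignored in a non-specific assay

    same_strand = read_strand == feature_strand
    # Forward assay: single reads and first mates on the same strand,
    # second mates on the opposite strand.
    expect_same = not is_second_mate
    if stranded == "reverse":
        expect_same = not expect_same  # rules are reversed
    return same_strand == expect_same


print(read_is_compatible("+", "+", stranded="yes"))
print(read_is_compatible("+", "+", stranded="reverse"))
```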

advanced

label:	Advanced
type:	`basic:boolean`
default:	`False`

star.noncannonical

label:	Remove non-canonical junctions (Cufflinks compatibility)
type:	`basic:boolean`
description:	It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
default:	`False`

star.detect_chimeric.chimeric

label:	Detect chimeric and circular alignments
type:	`basic:boolean`
description:	To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
default:	`False`

star.detect_chimeric.chimSegmentMin

label:	--chimSegmentMin
type:	`basic:integer`
disabled:	!star.detect_chimeric.chimeric
default:	`20`
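
The 2x75 example in the chimeric detection description can be reproduced with a short check: a chimeric alignment is reported only when both segments reach the minimum mapped length. The function below is a hypothetical sketch of that rule, not STAR's implementation.

```python
from typing import Sequence


def chimeric_alignment_reported(segment_lengths: Sequence[int],
                                chim_segment_min: int = 20) -> bool:
    """Return True if every segment is at least chim_segment_min bases long.

    A non-positive chim_segment_min means chimeric detection is switched off.
    """
    if chim_segment_min <= 0:
        return False
    return min(segment_lengths) >= chim_segment_min


# 2x75 reads (150 bases in total) split across two loci.
print(chimeric_alignment_reported([130, 20]))  # both segments long enough
print(chimeric_alignment_reported([135, 15]))  # shorter segment is below 20
```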

star.t_coordinates.quantmode

label:	Output in transcript coordinates
type:	`basic:boolean`
description:	With --quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
default:	`False`

star.t_coordinates.singleend

label:	Allow soft-clipping and indels
type:	`basic:boolean`
description:	By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
disabled:	!star.t_coordinates.quantmode
default:	`False`

star.t_coordinates.gene_counts

label:	Count reads
type:	`basic:boolean`
description:	With --quantMode GeneCounts option STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. STAR outputs the ReadsPerGene.out.tab file with 4 columns, which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
disabled:	!star.t_coordinates.quantmode
default:	`False`
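
As a rough illustration of the four-column ReadsPerGene.out.tab layout described above, the sketch below picks the count column that matches a given assay type. The sample rows are made up for demonstration, the function name is hypothetical, and skipping summary rows whose names start with `N_` is an assumption about the file layout that is not stated in this description.

```python
import csv
import io

# Zero-based column index for each assay type:
# column 2 = unstranded, column 3 = "-s yes", column 4 = "-s reverse".
COLUMN_FOR_ASSAY = {"no": 1, "yes": 2, "reverse": 3}


def read_gene_counts(handle, stranded: str = "no") -> dict:
    """Parse ReadsPerGene.out.tab-like text into {gene_id: count}."""
    column = COLUMN_FOR_ASSAY[stranded]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # assumed summary rows, not genes
        counts[row[0]] = int(row[column])
    return counts


# Made-up example content, for illustration only.
example = "N_unmapped\t5\t5\t5\ngeneA\t10\t9\t1\ngeneB\t4\t0\t4\n"
print(read_gene_counts(io.StringIO(example), stranded="reverse"))
```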

star.filtering.outFilterType

label:	Type of filtering
type:	`basic:string`
description:	Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
default:	`Normal`
choices:	Normal: `Normal` BySJout: `BySJout`

star.filtering.outFilterMultimapNmax

label:	--outFilterMultimapNmax
type:	`basic:integer`
description:	Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
required:	False

star.filtering.outFilterMismatchNmax

label:	--outFilterMismatchNmax
type:	`basic:integer`
description:	Alignment will be output only if it has no more mismatches than this value (default: 10).
required:	False

star.filtering.outFilterMismatchNoverLmax

label:	--outFilterMismatchNoverLmax
type:	`basic:decimal`
description:	Max number of mismatches per pair relative to read length: e.g. with a value of 0.04 and 2x100b reads, the max number of mismatches is 0.04*200=8 for the paired read.
required:	False
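
The mismatch limit implied by this ratio is a plain product of the ratio and the total read length. The sketch below works through the arithmetic for 2x100b reads; the function name is hypothetical, and rounding the product down to a whole number of mismatches is an assumption.

```python
from decimal import Decimal


def max_mismatches(read_length: int, ratio: str, paired: bool = True) -> int:
    """Return the mismatch limit for a read (or read pair).

    ratio is passed as a string so that the decimal arithmetic is exact.
    """
    total_length = read_length * (2 if paired else 1)
    # Assumption: a fractional limit is rounded down.
    return int(Decimal(ratio) * total_length)


print(max_mismatches(100, "0.04"))  # 0.04 * 200
print(max_mismatches(100, "0.06"))  # 0.06 * 200
```

For 2x100b reads, a ratio of 0.04 gives a limit of 8 mismatches and a ratio of 0.06 gives 12.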

star.alignment.alignSJoverhangMin

label:	--alignSJoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:	False

star.alignment.alignSJDBoverhangMin

label:	--alignSJDBoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:	False

star.alignment.alignIntronMin

label:	--alignIntronMin
type:	`basic:integer`
description:	Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
required:	False

star.alignment.alignIntronMax

label:	--alignIntronMax
type:	`basic:integer`
description:	Maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

star.alignment.alignMatesGapMax

label:	--alignMatesGapMax
type:	`basic:integer`
description:	Maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False
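
The expression used in the two descriptions above is 2 raised to the power `winBinNbits`, multiplied by `winAnchorDistNbins`. A minimal sketch of that calculation follows; the default values of 16 and 9 are assumed here to match STAR's defaults and are not stated in this document, so check them against the STAR version in use.

```python
def max_window_span(win_bin_nbits: int = 16, win_anchor_dist_nbins: int = 9) -> int:
    """Compute (2^winBinNbits) * winAnchorDistNbins.

    This is the limit used when alignIntronMax or alignMatesGapMax is 0.
    The default arguments are assumed values, not taken from this document.
    """
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


print(max_window_span())  # 65536 * 9 = 589824
```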

htseq.mode

label:	Mode
type:	`basic:string`
description:	Mode to handle reads overlapping more than one feature. Possible values for <mode> are union, intersection-strict and intersection-nonempty
default:	`union`
choices:	union: `union` intersection-strict: `intersection-strict` intersection-nonempty: `intersection-nonempty`
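
The three overlap modes differ in how the sets of features found at each aligned position are combined. The sketch below follows the set-based definition given in the HTSeq documentation as understood here; it is a simplified illustration with a hypothetical function name, not HTSeq's code, and the `__no_feature` and `__ambiguous` labels are used only as result markers.

```python
from typing import List, Set


def assign_read(position_features: List[Set[str]], mode: str = "union") -> str:
    """Assign a read to a feature from the feature sets at each position.

    position_features holds, for every aligned position of the read,
    the set of features overlapping that position (possibly empty).
    """
    if mode == "union":
        features = set().union(*position_features)
    elif mode == "intersection-strict":
        features = set.intersection(*position_features) if position_features else set()
    elif mode == "intersection-nonempty":
        non_empty = [s for s in position_features if s]
        features = set.intersection(*non_empty) if non_empty else set()
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    if not features:
        return "__no_feature"
    if len(features) == 1:
        return next(iter(features))
    return "__ambiguous"


# A read that lies partly inside gene A and partly outside any feature.
partial = [{"A"}, {"A"}, set()]
for mode in ("union", "intersection-strict", "intersection-nonempty"):
    print(mode, assign_read(partial, mode))
```

In this example only `intersection-strict` leaves the read unassigned, because one of its positions overlaps no feature.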

htseq.feature_class

label:	Feature class
type:	`basic:string`
description:	Feature class (3rd column in GFF file) to be used. All other features will be ignored.
default:	`exon`

htseq.id_attribute

label:	ID attribute
type:	`basic:string`
description:	GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.
default:	`gene_id`

htseq.name_ordered

label:	Use name-ordered BAM file for counting reads
type:	`basic:boolean`
description:	Use name-sorted BAM file for reads quantification. Improves compatibility with larger BAM files, but requires more computational time.
required:	False
default:	`False`

Cutadapt - STAR - HTSeq-count (single-end)¶

data:workflow:rnaseq:htseqworkflow-custom-cutadapt-star-htseq-single (data:reads:fastq:single reads, data:genomeindex:star genome, data:annotation:gtf gff, basic:string stranded, basic:boolean advanced, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string mode, basic:string feature_class, basic:string id_attribute, basic:boolean name_ordered)[Source: v1.0.1]

This RNA-seq pipeline is comprised of three steps, preprocessing, alignment, and quantification. First, reads are preprocessed by __cutadapt__ which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by __STAR__ aligner. At the time of implementation, STAR is considered a state-of-the-art tool that consistently produces accurate results from diverse sets of reads, and performs well even with default settings. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.

reads

label:	NGS reads
type:	`data:reads:fastq:single`

genome

label:	Indexed reference genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool

gff

label:	Annotation (GFF)
type:	`data:annotation:gtf`

stranded

label:	Assay type
type:	`basic:string`
description:	In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
default:	`no`
choices:	Strand non-specific: `no` Strand-specific forward: `yes` Strand-specific reverse: `reverse`

advanced

label:	Advanced
type:	`basic:boolean`
default:	`False`

star.noncannonical

label:	Remove non-canonical junctions (Cufflinks compatibility)
type:	`basic:boolean`
description:	It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
default:	`False`

star.detect_chimeric.chimeric

label:	Detect chimeric and circular alignments
type:	`basic:boolean`
description:	To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
default:	`False`

star.detect_chimeric.chimSegmentMin

label:	--chimSegmentMin
type:	`basic:integer`
disabled:	!star.detect_chimeric.chimeric
default:	`20`

star.t_coordinates.quantmode

label:	Output in transcript coordinates
type:	`basic:boolean`
description:	With --quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
default:	`False`

star.t_coordinates.singleend

label:	Allow soft-clipping and indels
type:	`basic:boolean`
description:	By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
disabled:	!star.t_coordinates.quantmode
default:	`False`

star.t_coordinates.gene_counts

label:	Count reads
type:	`basic:boolean`
description:	With --quantMode GeneCounts option STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. STAR outputs the ReadsPerGene.out.tab file with 4 columns, which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
disabled:	!star.t_coordinates.quantmode
default:	`False`

star.filtering.outFilterType

label:	Type of filtering
type:	`basic:string`
description:	Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
default:	`Normal`
choices:	Normal: `Normal` BySJout: `BySJout`

star.filtering.outFilterMultimapNmax

label:	--outFilterMultimapNmax
type:	`basic:integer`
description:	Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
required:	False

star.filtering.outFilterMismatchNmax

label:	--outFilterMismatchNmax
type:	`basic:integer`
description:	Alignment will be output only if it has no more mismatches than this value (default: 10).
required:	False

star.filtering.outFilterMismatchNoverLmax

label:	--outFilterMismatchNoverLmax
type:	`basic:decimal`
description:	Max number of mismatches per pair relative to read length: e.g. with a value of 0.04 and 2x100b reads, the max number of mismatches is 0.04*200=8 for the paired read.
required:	False

star.alignment.alignSJoverhangMin

label:	--alignSJoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:	False

star.alignment.alignSJDBoverhangMin

label:	--alignSJDBoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:	False

star.alignment.alignIntronMin

label:	--alignIntronMin
type:	`basic:integer`
description:	Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
required:	False

star.alignment.alignIntronMax

label:	--alignIntronMax
type:	`basic:integer`
description:	Maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

star.alignment.alignMatesGapMax

label:	--alignMatesGapMax
type:	`basic:integer`
description:	Maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

htseq.mode

label:	Mode
type:	`basic:string`
description:	Mode to handle reads overlapping more than one feature. Possible values for <mode> are union, intersection-strict and intersection-nonempty
default:	`union`
choices:	union: `union` intersection-strict: `intersection-strict` intersection-nonempty: `intersection-nonempty`

htseq.feature_class

label:	Feature class
type:	`basic:string`
description:	Feature class (3rd column in GFF file) to be used. All other features will be ignored.
default:	`exon`

htseq.id_attribute

label:	ID attribute
type:	`basic:string`
description:	GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.
default:	`gene_id`

htseq.name_ordered

label:	Use name-ordered BAM file for counting reads
type:	`basic:boolean`
description:	Use name-sorted BAM file for reads quantification. Improves compatibility with larger BAM files, but requires more computational time.
required:	False
default:	`False`

Cutadapt - STAR - RSEM (Diagenode CATS, paired-end)¶

data:workflow:rnaseq:rsemworkflow-custom-cutadapt-star-rsem-paired (data:reads:fastq:paired reads, data:genomeindex:star star_index, data:index:expression expression_index, basic:string stranded, basic:boolean advanced, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax)[Source: v1.0.2]

This RNA-seq pipeline is configured to be used with the Diagenode CATS RNA-seq kits. It comprises three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by Cutadapt, which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by the STAR aligner. Finally, RSEM estimates gene and isoform expression levels from the aligned reads.

reads

label:	NGS reads
type:	`data:reads:fastq:paired`

star_index

label:	STAR genome index
type:	`data:genomeindex:star`

expression_index

label:	Gene expression indices
type:	`data:index:expression`

stranded

label:	Assay type
type:	`basic:string`
description:	In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
default:	`no`
choices:	Strand non-specific: `no` Strand-specific forward: `yes` Strand-specific reverse: `reverse`
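
The strand rules described for the `stranded` input can be written as a small predicate. This is a sketch of the rules as stated in the description, not the code the workflow runs; the function name and the `mate` argument (1 for the first or only read, 2 for the second read of a pair) are assumptions made for illustration.

```python
def read_supports_feature(read_strand: str, feature_strand: str,
                          stranded: str = "no", mate: int = 1) -> bool:
    """Return True if a read on read_strand may be counted for the feature."""
    if stranded not in ("no", "yes", "reverse"):
        raise ValueError(f"unknown assay type: {stranded}")
    if stranded == "no":
        # Strand non-specific: the strand is ignored.
        return True
    # Forward assay: mate 1 (or a single read) lies on the feature strand,
    # mate 2 lies on the opposite strand.
    expect_same_strand = (mate == 1)
    if stranded == "reverse":
        # Reverse assay: the forward rules are inverted.
        expect_same_strand = not expect_same_strand
    return (read_strand == feature_strand) == expect_same_strand
```

For example, with `stranded="yes"` a first mate on `+` supports a `+` feature, while with `stranded="reverse"` the same read does not.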

advanced

label:	Advanced
type:	`basic:boolean`
default:	`False`

star.noncannonical

label:	Remove non-canonical junctions (Cufflinks compatibility)
type:	`basic:boolean`
description:	It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
default:	`False`

star.detect_chimeric.chimeric

label:	Detect chimeric and circular alignments
type:	`basic:boolean`
description:	To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
default:	`False`

star.detect_chimeric.chimSegmentMin

label:	--chimSegmentMin
type:	`basic:integer`
disabled:	!star.detect_chimeric.chimeric
default:	`20`
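
The 2x75 example from the chimeric detection description reduces to a comparison of the shorter segment against the threshold. A minimal sketch, with a hypothetical function name:

```python
from typing import Sequence


def passes_chim_segment_min(segment_lengths: Sequence[int],
                            chim_segment_min: int = 20) -> bool:
    """Check whether both chimeric segments are long enough to be reported."""
    if chim_segment_min <= 0:
        # A non-positive value means chimeric detection is switched off.
        return False
    # The shorter segment decides: it must reach the minimum mapped length.
    return min(segment_lengths) >= chim_segment_min


# 2x75 reads give 150 bases in total, split between the two segments.
print(passes_chim_segment_min((130, 20)))  # 130b + 20b
print(passes_chim_segment_min((135, 15)))  # 135b + 15b
```

The first call reports the alignment as acceptable and the second does not, matching the 130 + 20 versus 135 + 15 case in the description.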

star.t_coordinates.quantmode

label:	Output in transcript coordinates
type:	`basic:boolean`
description:	With --quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
default:	`True`

star.t_coordinates.singleend

label:	Allow soft-clipping and indels
type:	`basic:boolean`
description:	By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
disabled:	!star.t_coordinates.quantmode
default:	`False`

star.t_coordinates.gene_counts

label:	Count reads
type:	`basic:boolean`
description:	With the --quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. STAR writes the counts to the ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
disabled:	!star.t_coordinates.quantmode
default:	`False`
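
The four-column layout of ReadsPerGene.out.tab can be read with a few lines of Python. This is a sketch under two assumptions: the file is tab-separated, and its summary rows (such as `N_unmapped`) start with `N_` and should be skipped. The function name, the mapping from `stranded` values to columns, and the sample rows are illustrative, not real data.

```python
import csv
import io
from typing import Dict, TextIO

# Zero-based column index for each strandedness setting
# (column 1 of the file, index 0, holds the gene ID).
COLUMN_BY_STRANDEDNESS = {"no": 1, "yes": 2, "reverse": 3}


def read_gene_counts(handle: TextIO, stranded: str = "no") -> Dict[str, int]:
    """Return {gene_id: count} for the column matching the assay type."""
    column = COLUMN_BY_STRANDEDNESS[stranded]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("N_"):
            continue  # skip blank lines and summary rows
        counts[row[0]] = int(row[column])
    return counts


# Made-up example content, for illustration only.
example = io.StringIO(
    "N_unmapped\t100\t100\t100\n"
    "geneA\t50\t48\t2\n"
    "geneB\t7\t0\t7\n"
)
print(read_gene_counts(example, stranded="yes"))
```

Picking the column by assay type keeps the downstream count table consistent with the `stranded` choice made for the workflow.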

star.filtering.outFilterType

label:	Type of filtering
type:	`basic:string`
description:	Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
default:	`Normal`
choices:	Normal: `Normal` BySJout: `BySJout`

star.filtering.outFilterMultimapNmax

label:	--outFilterMultimapNmax
type:	`basic:integer`
description:	Read alignments will be output only if the read maps to fewer loci than this value, otherwise no alignments will be output (default: 10).
required:	False

star.filtering.outFilterMismatchNmax

label:	--outFilterMismatchNmax
type:	`basic:integer`
description:	Alignment will be output only if it has fewer mismatches than this value (default: 10).
required:	False

star.filtering.outFilterMismatchNoverLmax

label:	--outFilterMismatchNoverLmax
type:	`basic:decimal`
description:	Max number of mismatches per pair relative to read length: for 2x100b, max number of mismatches is 0.04*200=8 for the paired read.
required:	False
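
The arithmetic behind this ratio is easy to check. The sketch below takes the description at face value (ratio times the combined length of both mates, rounded down); the function name is hypothetical and `Decimal` is used only to avoid floating-point rounding surprises.

```python
import math
from decimal import Decimal


def max_mismatches(ratio: str, read_length: int, mates: int = 2) -> int:
    """Largest number of mismatches allowed for a read (pair) at this ratio."""
    total_length = read_length * mates
    return math.floor(Decimal(ratio) * total_length)


print(max_mismatches("0.04", 100))  # 2x100b reads at a ratio of 0.04
print(max_mismatches("0.06", 100))  # 2x100b reads at a ratio of 0.06
```

For 2x100b reads a ratio of 0.04 allows 8 mismatches per pair, and a ratio of 0.06 allows 12.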

star.alignment.alignSJoverhangMin

label:	--alignSJoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:	False

star.alignment.alignSJDBoverhangMin

label:	--alignSJDBoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:	False

star.alignment.alignIntronMin

label:	--alignIntronMin
type:	`basic:integer`
description:	Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
required:	False

star.alignment.alignIntronMax

label:	--alignIntronMax
type:	`basic:integer`
description:	Maximum intron size; if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False
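
When the value is left at 0, the limit comes from the window formula, 2 to the power of winBinNbits, multiplied by winAnchorDistNbins. A minimal sketch of that calculation follows. The values 16 and 9 are assumed here to be the STAR defaults for `winBinNbits` and `winAnchorDistNbins`; check them against the STAR version in use. The function name is hypothetical.

```python
def effective_intron_max(align_intron_max: int = 0,
                         win_bin_nbits: int = 16,
                         win_anchor_dist_nbins: int = 9) -> int:
    """Maximum intron size that applies for the given settings."""
    if align_intron_max > 0:
        # An explicit positive value is used directly.
        return align_intron_max
    # 0 means: derive the limit from the alignment window parameters.
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


print(effective_intron_max())         # 589824 with the assumed window values
print(effective_intron_max(100_000))  # 100000
```

Under these assumed window values, a setting of 0 corresponds to a limit of 589,824 bases.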

star.alignment.alignMatesGapMax

label:	--alignMatesGapMax
type:	`basic:integer`
description:	Maximum gap between two mates; if 0, the max intron gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

Cutadapt - STAR - RSEM (Diagenode CATS, single-end)¶

data:workflow:rnaseq:rsemworkflow-custom-cutadapt-star-rsem-single (data:reads:fastq:single reads, data:genomeindex:star star_index, data:index:expression expression_index, basic:string stranded, basic:boolean advanced, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax)[Source: v1.0.2]

This RNA-seq pipeline is configured to be used with the Diagenode CATS RNA-seq kits. It comprises three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by Cutadapt, which finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from high-throughput sequencing reads. Next, preprocessed reads are aligned by the STAR aligner. Finally, RSEM estimates gene and isoform expression levels from the aligned reads.

reads

label:	NGS reads
type:	`data:reads:fastq:single`

star_index

label:	STAR genome index
type:	`data:genomeindex:star`

expression_index

label:	Gene expression indices
type:	`data:index:expression`

stranded

label:	Assay type
type:	`basic:string`
description:	In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
default:	`no`
choices:	Strand non-specific: `no` Strand-specific forward: `yes` Strand-specific reverse: `reverse`

advanced

label:	Advanced
type:	`basic:boolean`
default:	`False`

star.noncannonical

label:	Remove non-canonical junctions (Cufflinks compatibility)
type:	`basic:boolean`
description:	It is recommended to remove the non-canonical junctions for Cufflinks runs using --outFilterIntronMotifs RemoveNoncanonical.
default:	`False`

star.detect_chimeric.chimeric

label:	Detect chimeric and circular alignments
type:	`basic:boolean`
description:	To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), --chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. --chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used --chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
default:	`False`

star.detect_chimeric.chimSegmentMin

label:	--chimSegmentMin
type:	`basic:integer`
disabled:	!star.detect_chimeric.chimeric
default:	`20`

star.t_coordinates.quantmode

label:	Output in transcript coordinates
type:	`basic:boolean`
description:	With --quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
default:	`True`

star.t_coordinates.singleend

label:	Allow soft-clipping and indels
type:	`basic:boolean`
description:	By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use --quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
disabled:	!star.t_coordinates.quantmode
default:	`False`

star.t_coordinates.gene_counts

label:	Count reads
type:	`basic:boolean`
description:	With the --quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. STAR writes the counts to the ReadsPerGene.out.tab file with 4 columns which correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
disabled:	!star.t_coordinates.quantmode
default:	`False`

star.filtering.outFilterType

label:	Type of filtering
type:	`basic:string`
description:	Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
default:	`Normal`
choices:	Normal: `Normal` BySJout: `BySJout`

star.filtering.outFilterMultimapNmax

label:	--outFilterMultimapNmax
type:	`basic:integer`
description:	Read alignments will be output only if the read maps to fewer loci than this value, otherwise no alignments will be output (default: 10).
required:	False

star.filtering.outFilterMismatchNmax

label:	--outFilterMismatchNmax
type:	`basic:integer`
description:	Alignment will be output only if it has fewer mismatches than this value (default: 10).
required:	False

star.filtering.outFilterMismatchNoverLmax

label:	--outFilterMismatchNoverLmax
type:	`basic:decimal`
description:	Max number of mismatches per pair relative to read length: for 2x100b, max number of mismatches is 0.04*200=8 for the paired read.
required:	False

star.alignment.alignSJoverhangMin

label:	--alignSJoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:	False

star.alignment.alignSJDBoverhangMin

label:	--alignSJDBoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:	False

star.alignment.alignIntronMin

label:	--alignIntronMin
type:	`basic:integer`
description:	Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
required:	False

star.alignment.alignIntronMax

label:	--alignIntronMax
type:	`basic:integer`
description:	Maximum intron size; if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

star.alignment.alignMatesGapMax

label:	--alignMatesGapMax
type:	`basic:integer`
description:	Maximum gap between two mates; if 0, the max intron gap is determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

Cutadapt - STAR - StringTie (Corall, paired-end)¶

data:workflow:rnaseq:corallworkflow-corall-paired (data:reads:fastq:paired reads, data:genomeindex:star star_index, data:annotation annotation, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference, basic:boolean show_advanced, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.0.1]

RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the StringTie tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.

reads

label:	Select sample(s)
type:	`data:reads:fastq:paired`

star_index

label:	Genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

annotation

label:	Annotation
type:	`data:annotation`
description:	Genome annotation file (GTF).

rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

cutadapt.quality_cutoff

label:	Reads quality cutoff
type:	`basic:integer`
description:	Trim low-quality bases from 3’ end of each read before adapter removal. Use this option when processing the data generated by older Illumina machines. The use of this option will override the NextSeq/NovaSeq-specific trimming procedure which is enabled by default and is recommended for Illumina machines that utilize 2-color chemistry to encode the four bases.
required:	False

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

downsampling.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

downsampling.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`
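
How the downsampling inputs interact can be sketched as follows. This only illustrates the documented behaviour (a fixed seed gives a reproducible subset, and a fraction overrides the absolute number of reads); it is not the tool the workflow actually runs, and both function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def sample_size(total_reads: int, n_reads: int = 1_000_000,
                fraction: Optional[float] = None) -> int:
    """Number of reads to keep; a fraction, if set, overrides n_reads."""
    if fraction is not None:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("fraction must be in the range [0.0, 1.0]")
        return int(total_reads * fraction)
    # Never ask for more reads than the input contains.
    return min(n_reads, total_reads)


def downsample(reads: Sequence[str], n_reads: int = 1_000_000, seed: int = 11,
               fraction: Optional[float] = None) -> List[str]:
    """Pick a reproducible random subset of reads, keeping the input order."""
    k = sample_size(len(reads), n_reads, fraction)
    rng = random.Random(seed)  # same seed, same subset
    keep = sorted(rng.sample(range(len(reads)), k))
    return [reads[i] for i in keep]
```

Running `downsample` twice with the same seed returns the same reads, which is what makes QC results comparable between reruns.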

Cutadapt - STAR - StringTie (Corall, single-end)¶

data:workflow:rnaseq:corallworkflow-corall-single (data:reads:fastq:single reads, data:genomeindex:star star_index, data:annotation annotation, data:genomeindex:star rrna_reference, data:genomeindex:star globin_reference, basic:boolean show_advanced, basic:integer quality_cutoff, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.0.1]

RNA-seq pipeline optimized for the Lexogen Corall Total RNA-Seq Library Prep Kit. UMI sequences are extracted from the raw reads before the reads are trimmed and quality filtered using Cutadapt. Preprocessed reads are aligned by the STAR aligner and de-duplicated using UMI-tools. Gene abundance estimates are reported by the StringTie tool. QC operates on downsampled reads and includes alignment of input reads to the rRNA/globin reference sequences. The reported alignment rate is used to assess the rRNA/globin sequence depletion rate. The analysis results and QC reports are summarized by MultiQC.

reads

label:	Select sample(s)
type:	`data:reads:fastq:single`

star_index

label:	Genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

annotation

label:	Annotation
type:	`data:annotation`
description:	Genome annotation file (GTF).

rrna_reference

label:	Indexed rRNA reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

globin_reference

label:	Indexed Globin reference sequence
type:	`data:genomeindex:star`
description:	Reference sequence index prepared by STAR aligner indexing tool.

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

cutadapt.quality_cutoff

label:	Reads quality cutoff
type:	`basic:integer`
description:	Trim low-quality bases from 3’ end of each read before adapter removal. Use this option when processing the data generated by older Illumina machines. The use of this option will override the NextSeq/NovaSeq-specific trimming procedure which is enabled by default and is recommended for Illumina machines that utilize 2-color chemistry to encode the four bases.
required:	False

downsampling.n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

downsampling.seed

label:	Seed
type:	`basic:integer`
default:	`11`

downsampling.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads in range [0.0, 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

downsampling.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`

DESeq2¶

data:differentialexpression:deseq2differentialexpression-deseq2 (list:data:expression case, list:data:expression control, basic:boolean beta_prior, basic:boolean count, basic:integer min_count_sum, basic:boolean cook, basic:decimal cooks_cutoff, basic:boolean independent, basic:decimal alpha)[Source: v2.5.0]

The DESeq2 package estimates variance-mean dependence in count data from high-throughput sequencing assays and tests for differential expression based on a model using the negative binomial distribution. See [here](https://www.bioconductor.org/packages/release/bioc/manuals/DESeq2/man/DESeq2.pdf) and [here](http://bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) for more information.

case

label:	Case
type:	`list:data:expression`
description:	Case samples (replicates)

control

label:	Control
type:	`list:data:expression`
description:	Control samples (replicates)

options.beta_prior

label:	Beta prior
type:	`basic:boolean`
description:	Whether or not to put a zero-mean normal prior on the non-intercept coefficients.
default:	`False`

filter.count

label:	Filter genes based on expression count
type:	`basic:boolean`
default:	`True`

filter.min_count_sum

label:	Minimum raw gene expression count summed over all samples
type:	`basic:integer`
description:	Filter genes in the expression matrix input. Remove genes where the expression count sum over all samples is below the threshold.
hidden:	!filter.count
default:	`10`
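
The count filter amounts to a row-sum threshold on the expression matrix. A minimal sketch, assuming the matrix is held as a mapping from gene ID to per-sample raw counts (the function name and the data layout are illustrative):

```python
from typing import Dict, List


def filter_low_counts(count_matrix: Dict[str, List[int]],
                      min_count_sum: int = 10) -> Dict[str, List[int]]:
    """Drop genes whose raw counts summed over all samples are below the threshold."""
    return {
        gene: counts
        for gene, counts in count_matrix.items()
        if sum(counts) >= min_count_sum
    }


# Made-up counts for three genes across four samples.
matrix = {
    "geneA": [5, 3, 1, 2],   # sum 11, kept
    "geneB": [0, 1, 0, 2],   # sum 3, removed
    "geneC": [4, 3, 2, 1],   # sum 10, kept (not below the threshold)
}
print(sorted(filter_low_counts(matrix)))
```

Genes whose sum equals the threshold are kept, since only sums below it are removed.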

filter.cook

label:	Filter genes based on Cook’s distance
type:	`basic:boolean`
default:	`False`

filter.cooks_cutoff

label:	Threshold on Cook’s distance
type:	`basic:decimal`
description:	If one or more samples have Cook’s distance larger than the threshold set here, the p-value for the row is set to NA. If left empty, the default threshold of 0.99 quantile of the F(p, m-p) distribution is used, where p is the number of coefficients being fitted and m is the number of samples. This test excludes Cook’s distance of samples belonging to experimental groups with only two samples.
required:	False
hidden:	!filter.cook

filter.independent

label:	Apply independent gene filtering
type:	`basic:boolean`
default:	`False`

filter.alpha

label:	Significance cut-off used for optimizing independent gene filtering
type:	`basic:decimal`
description:	The value should be set to adjusted p-value cut-off (FDR).
hidden:	!filter.independent
default:	`0.1`

raw

label:	Differential expression
type:	`basic:file`

de_json

label:	Results table (JSON)
type:	`basic:json`

de_file

label:	Results table (file)
type:	`basic:file`

count_matrix

label:	Count matrix
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Detect library strandedness¶

data:strandednesslibrary-strandedness (data:reads:fastq reads, basic:integer read_number, data:index:salmon salmon_index)[Source: v0.2.0]

This process uses the Salmon transcript quantification tool to automatically infer the NGS library strandedness. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html)

reads

label:	Sequencing reads
type:	`data:reads:fastq`
description:	Sequencing reads in .fastq format. Both single and paired-end libraries are supported

read_number

label:	Number of input reads
type:	`basic:integer`
description:	Number of sequencing reads that are subsampled from each of the original .fastq files before library strand detection
default:	`50000`

salmon_index

label:	Transcriptome index file
type:	`data:index:salmon`
description:	Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results.

strandedness

label:	Library strandedness type
type:	`basic:string`
description:	The predicted library strandedness type. The codes U and IU indicate ‘strand non-specific’ library for single or paired-end reads, respectively. Codes SF and ISF correspond to the ‘strand-specific forward’ library, for single or paired-end reads, respectively. For ‘strand-specific reverse’ library, the corresponding codes are SR and ISR. For more details, please see the Salmon [documentation](https://salmon.readthedocs.io/en/latest/library_type.html).
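
The codes listed above can be translated into the `no` / `yes` / `reverse` values used by the `stranded` inputs of other processes. The lookup below is a sketch: the code-to-library-type pairs come from the description, while pairing them with the `stranded` values and the function names are assumptions made for illustration.

```python
# Salmon library type code -> value of a `stranded` input.
SALMON_TO_STRANDED = {
    "U": "no", "IU": "no",            # strand non-specific
    "SF": "yes", "ISF": "yes",        # strand-specific forward
    "SR": "reverse", "ISR": "reverse",  # strand-specific reverse
}


def to_stranded(code: str) -> str:
    """Translate a detected library type code into a `stranded` value."""
    try:
        return SALMON_TO_STRANDED[code]
    except KeyError:
        raise ValueError(f"unsupported library type code: {code}") from None


def is_paired_end(code: str) -> bool:
    """Among the six codes above, the paired-end ones start with 'I'."""
    return code in SALMON_TO_STRANDED and code.startswith("I")
```

For instance, `to_stranded("ISR")` gives `reverse`, and `is_paired_end("SF")` is false.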

fragment_ratio

label:	Compatible fragment ratio
type:	`basic:decimal`
description:	The ratio of fragments that support the predicted library strandedness type

log

label:	Log file
type:	`basic:file`
description:	Analysis log file.

Dictyostelium expressions¶

data:expression:polyaexpression-dicty (data:alignment:bam alignment, data:annotation:gff3 gff, data:mappability:bcm mappable)[Source: v1.3.1]

Dictyostelium-specific pipeline. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

alignment

label:	Aligned sequence
type:	`data:alignment:bam`

gff

label:	Features (GFF3)
type:	`data:annotation:gff3`

mappable

label:	Mappability
type:	`data:mappability:bcm`

exp

label:	Expression RPKUM (polyA)
type:	`basic:file`
description:	mRNA reads scaled by uniquely mappable part of exons.

rpkmpolya

label:	Expression RPKM (polyA)
type:	`basic:file`
description:	mRNA reads scaled by exon length.

rc

label:	Read counts (polyA)
type:	`basic:file`
description:	mRNA reads uniquely mapped to gene exons.

rpkum

label:	Expression RPKUM
type:	`basic:file`
description:	Reads scaled by uniquely mappable part of exons.

rpkm

label:	Expression RPKM
type:	`basic:file`
description:	Reads scaled by exon length.
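
The two scalings differ only in the length used as the denominator. The sketch below uses the standard RPKM definition (reads per kilobase per million mapped reads); applying the same formula with the uniquely mappable length for RPKUM is an assumption based on the output descriptions, and the function names are hypothetical.

```python
def rpkm(read_count: int, exon_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of exon per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def rpkum(read_count: int, mappable_length_bp: int, total_mapped_reads: int) -> float:
    """Same scaling, but by the uniquely mappable part of the exons."""
    return read_count * 1e9 / (mappable_length_bp * total_mapped_reads)


# A gene with 2,000 bp of exons, of which 1,000 bp are uniquely mappable,
# covered by 500 reads in a library of 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))   # 25.0
print(rpkum(500, 1000, 10_000_000))  # 50.0
```

Because the mappable length is never larger than the exon length, RPKUM is at least as large as RPKM for the same gene.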

rc_raw

label:	Read counts (raw)
type:	`basic:file`
description:	Reads uniquely mapped to gene exons.

exp_json

label:	Expression RPKUM (polyA) (json)
type:	`basic:json`

exp_type

label:	Expression Type (default output)
type:	`basic:string`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Differential Expression (table)¶

data:differentialexpression:uploadupload-diffexp (basic:file src, basic:string gene_id, basic:string logfc, basic:string fdr, basic:string logodds, basic:string fwer, basic:string pvalue, basic:string stat, basic:string source, basic:string species, basic:string build, basic:string feature_type, list:data:expression case, list:data:expression control)[Source: v1.3.0]

Upload Differential Expression table.

src

label:	Differential expression file
type:	`basic:file`
description:	Differential expression file. Supported file types: .xls, .xlsx, .tab (tab-delimited file), .diff. DE file must include columns with log2(fold change) and FDR or pval information. DE file must contain header row with column names. Accepts DESeq, DESeq2, edgeR and CuffDiff output files.
validate_regex:	`\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$`
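
The validation pattern can be tried against candidate file names before uploading. A minimal sketch using Python's `re` module, with the alternation written with plain `|` separators; the helper name is hypothetical.

```python
import re

# The validation pattern for differential expression files.
DE_FILE_PATTERN = re.compile(r"\.(xls|xlsx|tab|tab.gz|diff|diff.gz)$")


def is_supported_de_file(filename: str) -> bool:
    """Return True if the file name ends with a supported extension."""
    return DE_FILE_PATTERN.search(filename) is not None


for name in ("results.xlsx", "results.tab.gz", "results.diff", "results.csv"):
    print(name, is_supported_de_file(name))
```

Only the file name is checked here; the content still has to provide the header row and the required columns.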

gene_id

label:	Gene ID label
type:	`basic:string`

logfc

label:	LogFC label
type:	`basic:string`

fdr

label:	FDR label
type:	`basic:string`
required:	False

logodds

label:	LogOdds label
type:	`basic:string`
required:	False

fwer

label:	FWER label
type:	`basic:string`
required:	False

pvalue

label:	Pvalue label
type:	`basic:string`
required:	False

stat

label:	Statistics label
type:	`basic:string`
required:	False

source

label:	Gene ID database
type:	`basic:string`
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Build
type:	`basic:string`
description:	Genome build or annotation version.

feature_type

label:	Feature type
type:	`basic:string`
default:	`gene`
choices:	gene: `gene` transcript: `transcript` exon: `exon`

case

label:	Case
type:	`list:data:expression`
description:	Case samples (replicates)
required:	False

control

label:	Control
type:	`list:data:expression`
description:	Control samples (replicates)
required:	False

raw

label:	Differential expression
type:	`basic:file`

de_json

label:	Results table (JSON)
type:	`basic:json`

de_file

label:	Results table (file)
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Expression Time Course¶

data:etcetc-bcm (list:data:expression expressions, basic:boolean avg)[Source: v1.1.1]

Select gene expression data and form a time course.

expressions

label:	RPKM expression profile
type:	`list:data:expression`
required:	True

avg

label:	Average by time
type:	`basic:boolean`
default:	`True`
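
What "Average by time" means can be shown on a single gene. This is a sketch only: the process internals are not described here, so the input layout (pairs of time point and expression value) and the function name are assumptions.

```python
from collections import defaultdict
from statistics import mean
from typing import List, Tuple


def time_course(samples: List[Tuple[int, float]],
                avg: bool = True) -> List[Tuple[int, float]]:
    """Order one gene's expression values by time, optionally averaging replicates."""
    by_time = defaultdict(list)
    for time_point, value in samples:
        by_time[time_point].append(value)
    if avg:
        # One value per time point: the mean of its replicates.
        return [(t, mean(values)) for t, values in sorted(by_time.items())]
    # Keep every replicate, ordered by time point.
    return [(t, v) for t, values in sorted(by_time.items()) for v in values]


# Two replicates at 0 h and one sample at 4 h (made-up values).
samples = [(4, 30.0), (0, 10.0), (0, 14.0)]
print(time_course(samples))             # [(0, 12.0), (4, 30.0)]
print(time_course(samples, avg=False))  # [(0, 10.0), (0, 14.0), (4, 30.0)]
```

With averaging switched off, each replicate stays as its own point in the series.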

etcfile

label:	Expression time course file
type:	`basic:file`

etc

label:	Expression time course
type:	`basic:json`

Expression aggregator¶

data:aggregator:expressionexpression-aggregator (list:data:expression exps, basic:string group_by, data:aggregator:expression expr_aggregator)[Source: v0.3.0]

Collect expression data from samples grouped by sample descriptor field. The Expression aggregator process should not be run in Batch Mode, as this will create redundant outputs. Rather, select multiple samples below for which you wish to aggregate the expression matrix.

exps

label:	Expressions
type:	`list:data:expression`

group_by

label:	Sample descriptor field
type:	`basic:string`

expr_aggregator

label:	Expression aggregator
type:	`data:aggregator:expression`
required:	False

exp_matrix

label:	Expression matrix
type:	`basic:file`

box_plot

label:	Box plot
type:	`basic:json`

log_box_plot

label:	Log box plot
type:	`basic:json`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

exp_type

label:	Expression type
type:	`basic:string`

Expression data¶

data:expressionupload-expression (basic:file rc, basic:file exp, basic:string exp_name, basic:string exp_type, basic:string source, basic:string species, basic:string build, basic:string feature_type)[Source: v2.3.0]

Upload expression data by providing raw expression data (read counts) and/or normalized expression data together with the associated data normalization type.

rc

label:	Read counts (raw expression)
type:	`basic:file`
description:	Reads mapped to genomic features (raw count data). Supported extensions: .txt.gz (preferred), .tab.* or .txt.*
required:	False
validate_regex:	`\.(txt|tab|gz)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$`

exp

label:	Normalized expression
type:	`basic:file`
description:	Normalized expression data. Supported extensions: .tab.gz (preferred) or .tab.*
required:	False
validate_regex:	`\.(tab|gz)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$`

exp_name

label:	Expression name
type:	`basic:string`

exp_type

label:	Normalization type
type:	`basic:string`
description:	Normalization type
required:	False

source

label:	Gene ID source
type:	`basic:string`
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Build
type:	`basic:string`
description:	Genome build or annotation version.

feature_type

label:	Feature type
type:	`basic:string`
default:	`gene`
choices:	gene: `gene` transcript: `transcript` exon: `exon`

exp

label:	Normalized expression
type:	`basic:file`
description:	Normalized expression

rc

label:	Read counts
type:	`basic:file`
description:	Reads mapped to genomic features.
required:	False

exp_json

label:	Expression (json)
type:	`basic:json`

exp_type

label:	Expression type
type:	`basic:string`

exp_set

label:	Expressions
type:	`basic:file`

exp_set_json

label:	Expressions (json)
type:	`basic:json`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`
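
The `validate_regex` patterns listed for the `rc` and `exp` inputs of this process can be tried against candidate file names before uploading. The following is a minimal sketch in Python. It assumes that the backslash-escaped pipes (`\|`) in the listing are a rendering artifact of plain alternation (`|`) and that the pattern is searched against the file name. The helper `is_accepted` is hypothetical and not part of the platform.

```python
import re

# Patterns transcribed from the `rc` and `exp` inputs, with `\|` read as plain
# alternation (an assumption about how the listing was rendered).
SUFFIXES = r"(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
RC_PATTERN = re.compile(r"\.(txt|tab|gz)" + SUFFIXES)
EXP_PATTERN = re.compile(r"\.(tab|gz)" + SUFFIXES)


def is_accepted(pattern, file_name):
    """Return True if the file name matches the given validation pattern."""
    return pattern.search(file_name) is not None


if __name__ == "__main__":
    for name in ["counts.txt.gz", "counts.tab", "counts.csv"]:
        print(name, is_accepted(RC_PATTERN, name))
```

Because `gz` is itself one of the accepted base extensions, any file name ending in `.gz` passes both patterns under this reading.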

Expression data (Cuffnorm)¶

data:expressionupload-expression-cuffnorm (basic:file exp, data:cufflinks:cuffquant cxb, basic:string exp_type)[Source: v1.5.0]

Upload expression data by providing Cuffnorm results.

exp

label:	Normalized expression
type:	`basic:file`

cxb

label:	Cuffquant analysis
type:	`data:cufflinks:cuffquant`
description:	Cuffquant analysis.

exp_type

label:	Normalization type
type:	`basic:string`
default:	`Cuffnorm`

exp

label:	Normalized expression
type:	`basic:file`
description:	Normalized expression

rc

label:	Read counts
type:	`basic:file`
description:	Reads mapped to genomic features.
required:	False

exp_json

label:	Expression (json)
type:	`basic:json`

exp_type

label:	Expression type
type:	`basic:string`

exp_set

label:	Expressions
type:	`basic:file`

exp_set_json

label:	Expressions (json)
type:	`basic:json`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Expression data (STAR)¶

data:expression:starupload-expression-star (basic:file rc, basic:string stranded, basic:string source, basic:string species, basic:string build, basic:string feature_type)[Source: v1.4.0]

Upload expression data by providing STAR aligner results.

rc

label:	Read counts (raw expression)
type:	`basic:file`
description:	Reads mapped to genomic features (raw count data). Supported extensions: .txt.gz (preferred), .tab.* or .txt.*
validate_regex:	`\.(txt\|tab\|gz)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`

stranded

label:	Is data from a strand specific assay?
type:	`basic:string`
description:	For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.
default:	`yes`
choices:	yes: `yes` no: `no` reverse: `reverse`

source

label:	Gene ID source
type:	`basic:string`
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Build
type:	`basic:string`
description:	Genome build or annotation version.

feature_type

label:	Feature type
type:	`basic:string`
default:	`gene`
choices:	gene: `gene` transcript: `transcript` exon: `exon`

rc

label:	Read counts (raw data)
type:	`basic:file`
description:	Reads mapped to genomic features.

exp

label:	Expression data
type:	`basic:file`

exp_json

label:	Expression (json)
type:	`basic:json`

exp_type

label:	Expression type
type:	`basic:string`

exp_set

label:	Expressions
type:	`basic:file`

exp_set_json

label:	Expressions (json)
type:	`basic:json`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`
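
The strandedness rules described for the `stranded` input can be written down as a small predicate. This is only a sketch of the rule as stated in the description, not the code used by the process; the function name `strand_compatible` and its arguments are hypothetical.

```python
def strand_compatible(stranded, read_strand, feature_strand, mate=1):
    """Decide whether a read may be assigned to a feature, given its strand.

    stranded       -- "yes", "no" or "reverse" (the choices of the input)
    read_strand    -- "+" or "-", the strand the read is mapped to
    feature_strand -- "+" or "-", the strand of the annotated feature
    mate           -- 1 for single-end reads and first mates, 2 for second mates
    """
    if stranded == "no":
        return True
    if stranded not in ("yes", "reverse"):
        raise ValueError(f"unknown stranded value: {stranded!r}")
    # With stranded=yes, mate 1 must share the feature's strand and mate 2 must not.
    must_match = mate == 1
    # stranded=reverse flips both expectations.
    if stranded == "reverse":
        must_match = not must_match
    return (read_strand == feature_strand) == must_match
```

For example, `strand_compatible("yes", "-", "+", mate=2)` is true because the second mate of a pair is expected on the opposite strand.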

Expression matrix¶

data:expressionsetmergeexpressions (list:data:expression exps, list:basic:string genes)[Source: v1.2.0]

Merge expression data to create an expression matrix where each column represents all the gene expression levels from a single experiment, and each row represents the expression of a gene across all experiments.
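
The layout of the resulting matrix (one row per gene, one column per experiment) can be illustrated with plain Python dictionaries. This is a sketch under stated assumptions: the function `merge_expressions`, the in-memory input format, and the use of `None` for a gene missing from an experiment are all hypothetical and are not taken from the process itself.

```python
def merge_expressions(experiments, genes=None):
    """Merge per-experiment expression values into a gene-by-experiment matrix.

    experiments -- dict mapping an experiment name to a {gene_id: value} dict
    genes       -- optional list of gene IDs to keep (mirrors the `genes` input)
    Returns (header, rows); each row is [gene_id, value_1, value_2, ...].
    """
    names = list(experiments)
    all_genes = sorted({g for values in experiments.values() for g in values})
    if genes:
        wanted = set(genes)
        all_genes = [g for g in all_genes if g in wanted]
    header = ["Gene"] + names
    # A gene missing from an experiment is reported as None (an assumption).
    rows = [[g] + [experiments[n].get(g) for n in names] for g in all_genes]
    return header, rows


if __name__ == "__main__":
    header, rows = merge_expressions(
        {"sample_1": {"geneA": 1.0, "geneB": 2.0}, "sample_2": {"geneA": 3.0}}
    )
    print("\t".join(header))
    for row in rows:
        print("\t".join(str(value) for value in row))
```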

exps

label:	Gene expressions
type:	`list:data:expression`

genes

label:	Filter genes
type:	`list:basic:string`
required:	False

expset

label:	Expression set
type:	`basic:file`

expset_type

label:	Expression set type
type:	`basic:string`

Expression time course¶

data:etcupload-etc (basic:file src)[Source: v1.2.0]

Upload an expression time course.

src

label:	Expression time course file (xls or tab)
type:	`basic:file`
description:	Expression time course
required:	True
validate_regex:	`\.(xls\|xlsx\|tab)$`

etcfile

label:	Expression time course file
type:	`basic:file`

etc

label:	Expression time course
type:	`basic:json`

FASTA file¶

data:seq:nucleotideupload-fasta-nucl (basic:file src, basic:string species, basic:string build, basic:string source)[Source: v2.1.0]

Import a FASTA file, which is a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes.
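
As an illustration of the format, the sketch below reads FASTA-formatted lines and counts the sequences, which is the kind of value reported in the `number` output. It is not the parser used by the process; `read_fasta` is a hypothetical helper.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a {header: sequence} dict."""
    records = {}
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A '>' line starts a new record; the rest of the line is its header.
            header = line[1:]
            records[header] = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            # Sequence data may be wrapped over several lines.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


if __name__ == "__main__":
    example = [">seq1 demo", "ACGT", "ACGT", ">seq2", "MKV"]
    print(len(read_fasta(example)), "sequences")
```

Records with identical headers overwrite each other in this sketch, so it should not be used where duplicate headers are expected.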

src

label:	Sequence file (FASTA)
type:	`basic:file`
description:	Sequence file (containing single or multiple sequences) in FASTA format. Supported extensions: .fasta.gz (preferred), .fa.*, .fna.* or .fasta.*
validate_regex:	`\.(fasta\|fa\|fna)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
required:	False
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum`

build

label:	Genome build
type:	`basic:string`
required:	False

source

label:	Database source
type:	`basic:string`
required:	False

fastagz

label:	FASTA file (compressed)
type:	`basic:file`

fasta

label:	FASTA file
type:	`basic:file`

fai

label:	FASTA file index
type:	`basic:file`

number

label:	Number of sequences
type:	`basic:integer`

species

label:	Species
type:	`basic:string`
required:	False

source

label:	Database source
type:	`basic:string`
required:	False

build

label:	Build
type:	`basic:string`
required:	False

FASTQ file (paired-end)¶

data:reads:fastq:pairedupload-fastq-paired (list:basic:file src1, list:basic:file src2, basic:boolean merge_lanes)[Source: v2.3.0]

Import paired-end reads in FASTQ format, which is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.
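
To make the "sequence plus quality scores" structure concrete, here is a minimal sketch that splits FASTQ text into records and decodes the quality string. It assumes four lines per record and Phred+33 quality encoding, neither of which is stated in this document; `parse_fastq` is a hypothetical helper.

```python
def parse_fastq(lines):
    """Yield (read_id, sequence, qualities) tuples from FASTQ-formatted lines.

    Assumes four lines per record and Phred+33 quality encoding.
    """
    lines = [line.rstrip("\n") for line in lines]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for i in range(0, len(lines), 4):
        header, sequence, separator, quality = lines[i:i + 4]
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record at line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch at line {i + 1}")
        # Each quality character encodes one Phred score, offset by 33.
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


if __name__ == "__main__":
    for record in parse_fastq(["@read1", "ACGT", "+", "II5!"]):
        print(record)
```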

src1

label:	Mate1
type:	`list:basic:file`
description:	Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
validate_regex:	`(\.(fastq\|fq)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

src2

label:	Mate2
type:	`list:basic:file`
description:	Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
validate_regex:	`(\.(fastq\|fq)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

merge_lanes

label:	Merge lanes
type:	`basic:boolean`
description:	Merge paired-end sample data split into multiple sequencing lanes into a single pair of FASTQ files.
default:	`False`

fastq

label:	Reads file (mate 1)
type:	`list:basic:file`

fastq2

label:	Reads file (mate 2)
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC (Upstream)
type:	`list:basic:file:html`

fastqc_url2

label:	Quality control with FastQC (Downstream)
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive (Upstream)
type:	`list:basic:file`

fastqc_archive2

label:	Download FastQC archive (Downstream)
type:	`list:basic:file`

FASTQ file (single-end)¶

data:reads:fastq:singleupload-fastq-single (list:basic:file src, basic:boolean merge_lanes)[Source: v2.3.0]

Import single-end reads in FASTQ format, which is a text-based format for storing both a biological sequence (usually nucleotide sequence) and its corresponding quality scores.

src

label:	Reads
type:	`list:basic:file`
description:	Sequencing reads in FASTQ format. Supported extensions: .fastq.gz (preferred), .fq.* or .fastq.*
validate_regex:	`(\.(fastq\|fq)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

merge_lanes

label:	Merge lanes
type:	`basic:boolean`
description:	Merge sample data split into multiple sequencing lanes into a single FASTQ file.
default:	`False`

fastq

label:	Reads file
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive
type:	`list:basic:file`
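
The effect of `merge_lanes` can be pictured as concatenating the per-lane files in order. The sketch below does this byte-for-byte, which works for gzip-compressed input because concatenated gzip members form a valid gzip file. It is an assumption that the process merges lanes this way; `merge_lanes` here is a hypothetical helper and the file names are made up for the demonstration.

```python
import gzip
import os
import shutil
import tempfile


def merge_lanes(lane_paths, merged_path):
    """Concatenate lane files byte-for-byte, in the given order."""
    with open(merged_path, "wb") as merged:
        for path in lane_paths:
            with open(path, "rb") as lane:
                shutil.copyfileobj(lane, merged)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    lanes = []
    for number, record in enumerate(["@r1\nAC\n+\nII\n", "@r2\nGT\n+\nII\n"], start=1):
        path = os.path.join(workdir, f"sample_L00{number}.fastq.gz")
        with gzip.open(path, "wt") as handle:
            handle.write(record)
        lanes.append(path)
    merged = os.path.join(workdir, "sample.fastq.gz")
    merge_lanes(lanes, merged)
    with gzip.open(merged, "rt") as handle:
        print(handle.read(), end="")
```

For paired-end data the same lane order would have to be used for both mates so that read pairs stay aligned between the two merged files.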

GAF file¶

data:gaf:2:0upload-gaf (basic:file src, basic:string source, basic:string species)[Source: v1.2.0]

GO annotation file (GAF v2.0) relating gene ID and associated GO terms

src

label:	GO annotation file (GAF v2.0)
type:	`basic:file`
description:	Upload GO annotation file (GAF v2.0) relating gene ID and associated GO terms

source

label:	Gene ID database
type:	`basic:string`
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` MGI: `MGI` NCBI: `NCBI` UCSC: `UCSC` UniProtKB: `UniProtKB`

species

label:	Species
type:	`basic:string`

gaf

label:	GO annotation file (GAF v2.0)
type:	`basic:file`

gaf_obj

label:	GAF object
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

GATK3 (HaplotypeCaller)¶

data:variants:vcf:gatk:hcvc-gatk-hc (data:alignment:bam alignment, data:genome:fasta genome, data:masterfile:amplicon intervals, data:bed intervals_bed, data:variants:vcf dbsnp, basic:integer stand_call_conf, basic:integer stand_emit_conf, basic:integer mbq)[Source: v0.4.0]

GATK HaplotypeCaller Variant Calling

alignment

label:	Alignment file (BAM)
type:	`data:alignment:bam`

genome

label:	Genome
type:	`data:genome:fasta`

intervals

label:	Intervals (from master file)
type:	`data:masterfile:amplicon`
description:	Use this option to perform the analysis over only part of the genome. This option is not compatible with the ``intervals_bed`` option.
required:	False

intervals_bed

label:	Intervals (from BED file)
type:	`data:bed`
description:	Use this option to perform the analysis over only part of the genome. This option is not compatible with the ``intervals`` option.
required:	False

dbsnp

label:	dbSNP file
type:	`data:variants:vcf`

stand_call_conf

label:	Min call confidence threshold
type:	`basic:integer`
description:	The minimum phred-scaled confidence threshold at which variants should be called.
default:	`20`

stand_emit_conf

label:	Emission confidence threshold
type:	`basic:integer`
description:	The minimum confidence threshold (phred-scaled) at which the program should emit sites that appear to be possibly variant.
default:	`20`

mbq

label:	Min Base Quality
type:	`basic:integer`
description:	Minimum base quality required to consider a base for calling.
default:	`20`

vcf

label:	Variants
type:	`basic:file`

tbi

label:	Tabix index
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`
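
The thresholds `stand_call_conf` and `stand_emit_conf` are phred-scaled, as are base qualities such as `mbq`. A phred-scaled value Q corresponds to an error probability P through Q = -10 · log10(P), so the default of 20 corresponds to an error probability of 0.01. The short sketch below only illustrates this conversion; the function names are hypothetical.

```python
import math


def phred_to_error_probability(quality):
    """Convert a phred-scaled value Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-quality / 10)


def error_probability_to_phred(probability):
    """Convert an error probability P to a phred-scaled value Q = -10*log10(P)."""
    return -10 * math.log10(probability)


if __name__ == "__main__":
    for quality in (10, 20, 30):
        print(quality, phred_to_error_probability(quality))
```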

GATK4 (HaplotypeCaller)¶

data:variants:vcf:gatk:hcvc-gatk4-hc (data:alignment:bam alignment, data:genome:fasta genome, data:masterfile:amplicon intervals, data:bed intervals_bed, data:variants:vcf dbsnp, basic:integer stand_call_conf, basic:integer mbq, basic:integer max_reads)[Source: v0.2.0]

GATK HaplotypeCaller Variant Calling

alignment

label:	Alignment file (BAM)
type:	`data:alignment:bam`

genome

label:	Genome
type:	`data:genome:fasta`

intervals

label:	Intervals (from master file)
type:	`data:masterfile:amplicon`
description:	Use this option to perform the analysis over only part of the genome. This option is not compatible with the ``intervals_bed`` option.
required:	False

intervals_bed

label:	Intervals (from BED file)
type:	`data:bed`
description:	Use this option to perform the analysis over only part of the genome. This option is not compatible with the ``intervals`` option.
required:	False

dbsnp

label:	dbSNP file
type:	`data:variants:vcf`

stand_call_conf

label:	Min call confidence threshold
type:	`basic:integer`
description:	The minimum phred-scaled confidence threshold at which variants should be called.
default:	`20`

mbq

label:	Min Base Quality
type:	`basic:integer`
description:	Minimum base quality required to consider a base for calling.
default:	`20`

max_reads

label:	Max reads per alignment start site
type:	`basic:integer`
description:	Maximum number of reads to retain per alignment start position. Reads above this threshold will be downsampled. Set to 0 to disable.
default:	`50`

vcf

label:	Variants
type:	`basic:file`

tbi

label:	Tabix index
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`
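
The behaviour described for `max_reads` (keep at most a fixed number of reads per alignment start position, with 0 disabling the limit) can be sketched as follows. Uniform random sampling within each start position is an assumption made for illustration; the downsampling strategy that GATK actually applies may differ. The function `downsample_by_start` and the `(start, read_id)` input format are hypothetical.

```python
import random
from collections import defaultdict


def downsample_by_start(reads, max_reads=50, seed=0):
    """Keep at most `max_reads` reads per alignment start position.

    reads     -- iterable of (start_position, read_id) tuples
    max_reads -- per-position cap; 0 disables downsampling
    seed      -- seed for the random generator, for reproducible sampling
    """
    reads = list(reads)
    if max_reads == 0:
        return reads
    by_start = defaultdict(list)
    for start, read_id in reads:
        by_start[start].append((start, read_id))
    rng = random.Random(seed)
    kept = []
    for start in sorted(by_start):
        group = by_start[start]
        if len(group) > max_reads:
            # Too many reads share this start: keep a random subset.
            group = rng.sample(group, max_reads)
        kept.extend(group)
    return kept
```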

GFF3 file¶

data:annotation:gff3upload-gff3 (basic:file src, basic:string source, basic:string species, basic:string build)[Source: v3.3.0]

Import a General Feature Format (GFF) file, a file format used for describing genes and other features of DNA, RNA and protein sequences. See [here](https://useast.ensembl.org/info/website/upload/gff3.html) and [here](https://en.wikipedia.org/wiki/General_feature_format) for more information.
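
For orientation, a GFF3 feature line consists of nine tab-separated columns, the last of which holds `key=value` attributes separated by semicolons. The sketch below parses a single such line; it is illustrative only and is not the code the process uses. `parse_gff3_line` is a hypothetical helper, and the example record is made up.

```python
from urllib.parse import unquote

GFF3_COLUMNS = (
    "seqid", "source", "type", "start", "end",
    "score", "strand", "phase", "attributes",
)


def parse_gff3_line(line):
    """Parse one GFF3 feature line into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GFF3_COLUMNS):
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    record = dict(zip(GFF3_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    if record["attributes"] != ".":
        for item in record["attributes"].split(";"):
            if item:
                key, _, value = item.partition("=")
                # Reserved characters in values are percent-encoded in GFF3.
                attributes[key] = unquote(value)
    record["attributes"] = attributes
    return record


if __name__ == "__main__":
    example = "chr1\tENSEMBL\tgene\t1000\t2000\t.\t+\t.\tID=gene1;Name=ABC"
    print(parse_gff3_line(example))
```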

src

label:	Annotation (GFF3)
type:	`basic:file`
description:	Annotation in GFF3 format. Supported extensions are: .gff, .gff3 and .gtf
validate_regex:	`\.(gff\|gff3\|gtf)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`

source

label:	Gene ID database
type:	`basic:string`
choices:	DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum`

build

label:	Build
type:	`basic:string`

annot

label:	Uploaded GFF3 file
type:	`basic:file`

annot_sorted

label:	Sorted GFF3 file
type:	`basic:file`

annot_sorted_idx_igv

label:	IGV index for sorted GFF3
type:	`basic:file`

annot_sorted_track_jbrowse

label:	JBrowse track for sorted GFF3
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

GO Enrichment analysis¶

data:goeagoenrichment (data:ontology:obo ontology, data:gaf gaf, list:basic:string genes, basic:string source, basic:string species, basic:decimal pval_threshold, basic:integer min_genes)[Source: v3.2.1]

Identify significantly enriched Gene Ontology terms for given genes.

ontology

label:	Gene Ontology
type:	`data:ontology:obo`

gaf

label:	GO annotation file (GAF v2.0)
type:	`data:gaf`

genes

label:	List of genes
type:	`list:basic:string`
placeholder:	`new gene id`

source

label:	Source
type:	`basic:string`

species

label:	Species
type:	`basic:string`
description:	Species Latin name. This field is required if gene subset is set.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

pval_threshold

label:	P-value threshold
type:	`basic:decimal`
required:	False
default:	`0.1`

min_genes

label:	Minimum number of genes
type:	`basic:integer`
description:	Minimum number of genes on a GO term.
required:	False
default:	`1`

terms

label:	Enriched terms
type:	`basic:json`

source

label:	Source
type:	`basic:string`

species

label:	Species
type:	`basic:string`
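
This document does not state which statistic the process uses to decide that a term is enriched. A common choice for this kind of analysis is a one-sided hypergeometric test, and the sketch below shows how `pval_threshold` and `min_genes` could enter such a calculation. Everything here is an assumption made for illustration: the test itself, the reading of `min_genes` as the number of query genes annotated to a term, the `<=` comparison, and the names `hypergeom_pvalue` and `enriched_terms`.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, hits):
    """Return P(X >= hits) for a hypergeometric draw without replacement.

    population -- number of background genes
    annotated  -- background genes annotated to the term
    sample     -- number of query genes
    hits       -- query genes annotated to the term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


def enriched_terms(term_to_genes, query_genes, background_genes,
                   pval_threshold=0.1, min_genes=1):
    """Return {term: p_value} for terms passing both filters."""
    background = set(background_genes)
    query = set(query_genes) & background
    results = {}
    for term, genes in term_to_genes.items():
        annotated = set(genes) & background
        hits = len(annotated & query)
        if hits < min_genes:
            continue
        p_value = hypergeom_pvalue(len(background), len(annotated), len(query), hits)
        if p_value <= pval_threshold:
            results[term] = p_value
    return results
```

No multiple-testing correction is applied in this sketch.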

GTF file¶

data:annotation:gtfupload-gtf (basic:file src, basic:string source, basic:string species, basic:string build)[Source: v3.3.0]

Import a Gene Transfer Format (GTF) file. It is a file format used to hold information about gene structure. It is a tab-delimited text format based on the general feature format (GFF), but contains some additional conventions specific to gene information. See [here](https://en.wikipedia.org/wiki/General_feature_format) for differences between GFF and GTF files.

src

label:	Annotation (GTF)
type:	`basic:file`
description:	Annotation in GTF format.
validate_regex:	`\.(gtf\|gff)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`

source

label:	Gene ID database
type:	`basic:string`
choices:	DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum`

build

label:	Build
type:	`basic:string`

annot

label:	Uploaded GTF file
type:	`basic:file`

annot_sorted

label:	Sorted GTF file
type:	`basic:file`

annot_sorted_idx_igv

label:	IGV index for sorted GTF file
type:	`basic:file`
required:	False

annot_sorted_track_jbrowse

label:	JBrowse track for sorted GTF
type:	`basic:file`
required:	False

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Gene expression indices¶

data:index:expressionindex-fasta-nucl (data:seq:nucleotide nucl, basic:string nucl_genome, data:genome:fasta genome, data:annotation:gtf annotation, basic:string source, basic:string species, basic:string build)[Source: v0.4.0]

Generate gene expression indices.

nucl

label:	Nucleotide sequence
type:	`data:seq:nucleotide`
required:	False
hidden:	genome

nucl_genome

label:	Type of nucleotide sequence
type:	`basic:string`
hidden:	!nucl
default:	`gs`
choices:	Genome sequence: `gs` Transcript sequences: `ts`

genome

label:	Genome sequence
type:	`data:genome:fasta`
required:	False
hidden:	nucl

annotation

label:	Annotation
type:	`data:annotation:gtf`
required:	False
hidden:	nucl && nucl_genome == 'ts'

source

label:	Gene ID database
type:	`basic:string`
required:	False
hidden:	!(nucl && nucl_genome == 'ts')
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
required:	False
hidden:	!(nucl && nucl_genome == 'ts')
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Genome build
type:	`basic:string`
required:	False
hidden:	!(nucl && nucl_genome == 'ts')

rsem_index

label:	RSEM index
type:	`basic:dir`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Gene set¶

data:genesetupload-geneset (basic:file src, basic:string source, basic:string species)[Source: v1.1.2]

Import a set of genes. Provide one gene ID per line in a .tab, .tab.gz, or .txt file format.
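
The expected file layout (one gene ID per line) is simple enough to read with a few lines of Python. In this sketch, skipping blank lines and dropping repeated IDs are assumptions, and `read_gene_set` is a hypothetical helper rather than part of the process.

```python
def read_gene_set(lines):
    """Return the unique gene IDs, in order of first appearance."""
    seen = set()
    genes = []
    for line in lines:
        gene = line.strip()
        # Skip blank lines and IDs that were already collected.
        if gene and gene not in seen:
            seen.add(gene)
            genes.append(gene)
    return genes


if __name__ == "__main__":
    print(read_gene_set(["geneA\n", "\n", "geneB\n", "geneA\n"]))
```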

src

label:	Gene set
type:	`basic:file`
description:	List of genes (.tab/.txt), one gene ID per line. Supported extensions: .tab, .tab.gz (preferred) or .tab.*
validate_regex:	`(\.(tab\|txt)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

source

label:	Gene ID source
type:	`basic:string`
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

geneset

label:	Gene set
type:	`basic:file`

geneset_json

label:	Gene set (JSON)
type:	`basic:json`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

Gene set (create from Venn diagram)¶

data:geneset:venncreate-geneset-venn (list:basic:string genes, basic:string source, basic:string species, basic:file venn)[Source: v1.1.2]

Create a gene set from a Venn diagram.

genes

label:	Genes
type:	`list:basic:string`
description:	List of genes.

source

label:	Gene ID source
type:	`basic:string`
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

venn

label:	Venn diagram
type:	`basic:file`
description:	JSON file. Supported extensions: .json.gz
validate_regex:	`(\.json)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`

geneset

label:	Gene set
type:	`basic:file`

geneset_json

label:	Gene set (JSON)
type:	`basic:json`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

venn

label:	Venn diagram
type:	`basic:json`

Gene set (create)¶

data:genesetcreate-geneset (list:basic:string genes, basic:string source, basic:string species)[Source: v1.1.2]

Create a gene set from a list of genes.

genes

label:	Genes
type:	`list:basic:string`
description:	List of genes.

source

label:	Gene ID source
type:	`basic:string`
choices:	AFFY: `AFFY` DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

geneset

label:	Gene set
type:	`basic:file`

geneset_json

label:	Gene set (JSON)
type:	`basic:json`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

Genome¶

data:genome:fastaupload-genome (basic:file src, basic:string species, basic:string build, basic:file bowtie_index, basic:file bowtie2_index, basic:file bwa_index, basic:file hisat2_index, basic:file subread_index, basic:file walt_index)[Source: v3.4.0]

Import a genome sequence in FASTA format. Supported extensions: .fasta.gz (preferred), .fa.*, .fna.* or .fasta.*

src

label:	Genome sequence (FASTA)
type:	`basic:file`
description:	Genome sequence in FASTA format. Supported extensions: .fasta.gz (preferred), .fa.*, .fna.* or .fasta.*
validate_regex:	`\.(fasta\|fa\|fna\|fsa)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Genome build
type:	`basic:string`

advanced.bowtie_index

label:	Bowtie index files
type:	`basic:file`
description:	Bowtie index files. Supported extensions (*.tar.gz).
required:	False
validate_regex:	`(\.tar\.gz)$`

advanced.bowtie2_index

label:	Bowtie2 index files
type:	`basic:file`
description:	Bowtie2 index files. Supported extensions (*.tar.gz).
required:	False
validate_regex:	`(\.tar\.gz)$`

advanced.bwa_index

label:	BWA index files
type:	`basic:file`
description:	BWA index files. Supported extensions (*.tar.gz).
required:	False
validate_regex:	`(\.tar\.gz)$`

advanced.hisat2_index

label:	HISAT2 index files
type:	`basic:file`
description:	HISAT2 index files. Supported extensions (*.tar.gz).
required:	False
validate_regex:	`(\.tar\.gz)$`

advanced.subread_index

label:	Subread index files
type:	`basic:file`
description:	Subread index files. Supported extensions (*.tar.gz).
required:	False
validate_regex:	`(\.tar\.gz)$`

advanced.walt_index

label:	WALT index files
type:	`basic:file`
description:	WALT index files. Supported extensions (*.tar.gz).
required:	False
validate_regex:	`(\.tar\.gz)$`

fastagz

label:	Genome FASTA file (compressed)
type:	`basic:file`

fasta

label:	Genome FASTA file
type:	`basic:file`

index_bt

label:	Bowtie index
type:	`basic:dir`

index_bt2

label:	Bowtie2 index
type:	`basic:dir`

index_bwa

label:	BWA index
type:	`basic:dir`

index_hisat2

label:	HISAT2 index
type:	`basic:dir`

index_subread

label:	Subread index
type:	`basic:dir`

index_walt

label:	WALT index
type:	`basic:dir`

fai

label:	Fasta index
type:	`basic:file`

dict

label:	Fasta dict
type:	`basic:file`

fasta_track_jbrowse

label:	JBrowse track
type:	`basic:file`
hidden:	True

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

HISAT2¶

data:alignment:bam:hisat2alignment-hisat2 (data:genome:fasta genome, data:reads:fastq reads, basic:boolean softclip, basic:integer noncansplice, basic:boolean cufflinks)[Source: v1.7.0]

HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of genomes (as well as to a single reference genome). See [here](https://ccb.jhu.edu/software/hisat2/index.shtml) for more information.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

softclip

label:	Disallow soft clipping
type:	`basic:boolean`
default:	`False`

spliced_alignments.noncansplice

label:	Non-canonical splice sites penalty (optional)
type:	`basic:integer`
description:	Sets the penalty for each pair of non-canonical splice sites (e.g. non-GT/AG).
required:	False

spliced_alignments.cufflinks

label:	Report alignments tailored specifically for Cufflinks
type:	`basic:boolean`
description:	With this option, HISAT2 looks for novel splice sites with three signals (GT/AG, GC/AG, AT/AC), but all user-provided splice sites are used irrespective of their signals. HISAT2 produces an optional field, XS:A:[+-], for every spliced alignment.
default:	`False`

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

stats

label:	Statistics
type:	`basic:file`

splice_junctions

label:	Splice junctions
type:	`basic:file`

unmapped_f

label:	Unmapped reads (mate 1)
type:	`basic:file`
required:	False

unmapped_r

label:	Unmapped reads (mate 2)
type:	`basic:file`
required:	False

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

HMR¶

data:wgbs:hmrhmr (data:wgbs:methcounts methcounts)[Source: v1.1.0]

Identify hypo-methylated regions.

methcounts

label:	Methylation levels
type:	`data:wgbs:methcounts`
description:	Methylation levels data calculated using methcounts.

hmr

label:	Hypo-methylated regions
type:	`basic:file`

tbi_jbrowse

label:	BED file index for JBrowse
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

HTSeq-count (CPM)¶

data:expression:htseq:cpmhtseq-count-raw (data:alignment:bam alignments, data:annotation:gtf gtf, basic:string mode, basic:string stranded, basic:string feature_class, basic:string id_attribute, basic:string feature_type, basic:boolean name_ordered)[Source: v1.6.0]

HTSeq-count is useful for preprocessing RNA-Seq alignments for differential expression calling. It counts the number of reads that map to a genomic feature (e.g. gene). The raw counts produced by HTSeq are then normalized by computing CPM. See [the official website](https://htseq.readthedocs.io/en/release_0.9.1) and [the introductory paper](https://academic.oup.com/bioinformatics/article/31/2/166/2366196) for more information. For computationally efficient quantification consider using featureCounts instead of HTSeq-count.
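
CPM (counts per million) rescales each raw count by the total number of counted reads. A minimal sketch of that arithmetic is shown below. The function `counts_per_million` is hypothetical, and taking the total over exactly the counts passed in is an assumption; this document does not say which counters the process includes in the total.

```python
def counts_per_million(counts):
    """Scale raw read counts so that they sum to one million.

    counts -- dict mapping a feature ID to its raw read count
    """
    total = sum(counts.values())
    if total == 0:
        # No reads were counted at all; report zeros instead of dividing by zero.
        return {feature: 0.0 for feature in counts}
    return {feature: count * 1e6 / total for feature, count in counts.items()}


if __name__ == "__main__":
    print(counts_per_million({"geneA": 10, "geneB": 30}))
```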

alignments

label:	Aligned reads
type:	`data:alignment:bam`

gtf

label:	Annotation (GTF)
type:	`data:annotation:gtf`

mode

label:	Mode
type:	`basic:string`
description:	Mode to handle reads overlapping more than one feature. Possible values are union, intersection-strict and intersection-nonempty.
default:	`union`
choices:	union: `union` intersection-strict: `intersection-strict` intersection-nonempty: `intersection-nonempty`

stranded

label:	Is data from a strand specific assay?
type:	`basic:string`
description:	For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.
default:	`yes`
choices:	yes: `yes` no: `no` reverse: `reverse`

feature_class

label:	Feature class
type:	`basic:string`
description:	Feature class (3rd column in GTF file) to be used. All other features will be ignored.
default:	`exon`

id_attribute

label:	ID attribute
type:	`basic:string`
description:	GFF attribute to be used as feature ID. Several GTF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.
default:	`gene_id`

feature_type

label:	Feature type
type:	`basic:string`
description:	The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis).
default:	`gene`
choices:	gene: `gene` transcript: `transcript`

name_ordered

label:	Use name-ordered BAM file for counting reads
type:	`basic:boolean`
description:	Use name-sorted BAM file for reads quantification. Improves compatibility with larger BAM files, but requires more computational time. Setting this to false may cause the process to fail for large BAM files due to buffer overflow.
default:	`True`

htseq_output

label:	HTSeq-count output
type:	`basic:file`

rc

label:	Read count
type:	`basic:file`

exp

label:	CPM (Counts per million)
type:	`basic:file`

exp_json

label:	CPM (json)
type:	`basic:json`

exp_set

label:	Expressions
type:	`basic:file`

exp_set_json

label:	Expressions (json)
type:	`basic:json`

exp_type

label:	Expression Type (default output)
type:	`basic:string`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`
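
The three values of the `mode` input differ in how the sets of features overlapping each position of a read are combined. The sketch below follows the set-based description given in the HTSeq documentation, as far as it is recalled here, and should be read as an illustration rather than as HTSeq's implementation. The function `assign_read` and its input format (one set of feature IDs per read position) are hypothetical.

```python
def assign_read(position_feature_sets, mode="union"):
    """Resolve a read to a feature ID, "no_feature" or "ambiguous".

    position_feature_sets -- list with one set of feature IDs per read position
    mode -- "union", "intersection-strict" or "intersection-nonempty"
    """
    sets = [set(s) for s in position_feature_sets]
    if mode == "union":
        features = set().union(*sets)
    elif mode == "intersection-strict":
        features = sets[0].intersection(*sets[1:]) if sets else set()
    elif mode == "intersection-nonempty":
        # Positions that overlap no feature are ignored in this mode.
        nonempty = [s for s in sets if s]
        features = nonempty[0].intersection(*nonempty[1:]) if nonempty else set()
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if not features:
        return "no_feature"
    if len(features) > 1:
        return "ambiguous"
    return next(iter(features))
```

For a read whose first part overlaps only `geneA` and whose second part overlaps both `geneA` and `geneB`, this sketch reports `ambiguous` in `union` mode and `geneA` in both intersection modes.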

HTSeq-count (TPM)¶

data:expression:htseq:normalizedhtseq-count (data:alignment:bam alignments, data:annotation:gtf gff, basic:string mode, basic:string stranded, basic:string feature_class, basic:string id_attribute, basic:string feature_type, basic:boolean name_ordered)[Source: v1.5.0]

HTSeq-count is useful for preprocessing RNA-Seq alignments for differential expression calling. It counts the number of reads that map to a genomic feature (e.g. gene). The raw counts produced by HTSeq are then normalized by computing FPKM and TPM. For computationally efficient quantification consider using featureCounts instead of HTSeq-count.
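
FPKM and TPM both correct raw counts for feature length and sequencing depth. The sketch below uses the standard textbook definitions: FPKM = count · 10^9 / (length · total counts), and TPM = (count / length) divided by the sum of all such rates, times 10^6. How the process derives feature lengths from the annotation is not described here, so the `lengths` argument is an assumption, as are the function names.

```python
def fpkm(counts, lengths):
    """Fragments per kilobase of feature per million counted fragments.

    counts  -- dict mapping a feature ID to its raw count
    lengths -- dict mapping a feature ID to its length in bases
    """
    total = sum(counts.values())
    if total == 0:
        return {feature: 0.0 for feature in counts}
    return {
        feature: counts[feature] * 1e9 / (lengths[feature] * total)
        for feature in counts
    }


def tpm(counts, lengths):
    """Transcripts per million: length-normalized rates rescaled to sum to 1e6."""
    rates = {feature: counts[feature] / lengths[feature] for feature in counts}
    scale = sum(rates.values())
    if scale == 0:
        return {feature: 0.0 for feature in counts}
    return {feature: rate * 1e6 / scale for feature, rate in rates.items()}


if __name__ == "__main__":
    counts = {"geneA": 100, "geneB": 300}
    lengths = {"geneA": 1000, "geneB": 3000}
    print(fpkm(counts, lengths))
    print(tpm(counts, lengths))
```

Unlike FPKM, TPM values always sum to one million within a sample.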

alignments

label:	Aligned reads
type:	`data:alignment:bam`

gff

label:	Annotation (GFF)
type:	`data:annotation:gtf`

mode

label:	Mode
type:	`basic:string`
description:	Mode to handle reads overlapping more than one feature. Possible values are union, intersection-strict and intersection-nonempty.
default:	`union`
choices:	union: `union` intersection-strict: `intersection-strict` intersection-nonempty: `intersection-nonempty`

stranded

label:	Is data from a strand specific assay?
type:	`basic:string`
description:	For stranded=no, a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. For stranded=yes and single-end reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. For stranded=reverse, these rules are reversed.
default:	`yes`
choices:	yes: `yes` no: `no` reverse: `reverse`
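
The strand rules described for this field can be written as a small predicate. This is a sketch of the rules as stated above; the function and argument names are hypothetical.

```python
def flip(strand):
    """Return the opposite strand."""
    return "-" if strand == "+" else "+"


def strand_compatible(read_strand, feature_strand, stranded="yes", second_mate=False):
    """Decide whether a read may be counted for a feature on feature_strand."""
    if stranded == "no":
        # Strand is ignored entirely.
        return True
    if stranded not in ("yes", "reverse"):
        raise ValueError(f"unknown stranded value: {stranded}")
    # The second mate of a pair is expected on the opposite strand.
    effective = flip(read_strand) if second_mate else read_strand
    if stranded == "reverse":
        # The rules for stranded=yes are reversed.
        effective = flip(effective)
    return effective == feature_strand
```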

feature_class

label:	Feature class
type:	`basic:string`
description:	Feature class (3rd column in GFF file) to be used. All other features will be ignored.
default:	`exon`

id_attribute

label:	ID attribute
type:	`basic:string`
description:	GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.
default:	`gene_id`

feature_type

label:	Feature type
type:	`basic:string`
description:	The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis).
default:	`gene`
choices:	gene: `gene` transcript: `transcript`

name_ordered

label:	Use name-ordered BAM file for counting reads
type:	`basic:boolean`
description:	Use name-sorted BAM file for reads quantification. Improves compatibility with larger BAM files, but requires more computational time. Setting this to false may cause the process to fail for large BAM files due to buffer overflow.
default:	`True`

htseq_output

label:	HTseq-count output
type:	`basic:file`

rc

label:	Read counts
type:	`basic:file`

fpkm

label:	FPKM
type:	`basic:file`

exp

label:	TPM (Transcripts Per Million)
type:	`basic:file`

exp_json

label:	TPM (json)
type:	`basic:json`

exp_type

label:	Expression Type (default output)
type:	`basic:string`

exp_set

label:	Expressions
type:	`basic:file`

exp_set_json

label:	Expressions (json)
type:	`basic:json`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Hierarchical clustering of genes¶

data:clustering:hierarchical:geneclustering-hierarchical-genes (list:data:expression exps, basic:boolean advanced, list:basic:string genes, basic:string source, basic:string species, basic:boolean log2, basic:boolean z_score, basic:string distance_metric, basic:string linkage_method, basic:boolean order)[Source: v3.1.0]

Hierarchical clustering of genes.

exps

label:	Expressions
type:	`list:data:expression`
description:	Select at least two data objects.

advanced

label:	Show advanced options
type:	`basic:boolean`
default:	`False`

preprocessing.genes

label:	Gene subset
type:	`list:basic:string`
description:	Select at least two genes or leave this field empty.
required:	False
placeholder:	`new gene id`

preprocessing.source

label:	Gene ID database of selected genes
type:	`basic:string`
description:	This field is required if gene subset is set.
required:	False
hidden:	!preprocessing.genes

preprocessing.species

label:	Species
type:	`basic:string`
description:	Latin name of the species. This field is required if gene subset is set.
required:	False
hidden:	!preprocessing.genes
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

preprocessing.log2

label:	Log-transform expressions
type:	`basic:boolean`
description:	Transform expressions with log2(x + 1) before clustering.
default:	`True`

preprocessing.z_score

label:	Z-score normalization
type:	`basic:boolean`
description:	Use Z-score normalization of gene expressions before clustering.
default:	`True`
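
The two preprocessing steps, log2(x + 1) transformation and Z-score normalization, can be sketched for a single gene's expression profile across samples. The use of the population standard deviation and the handling of constant profiles are assumptions; the process may differ in both.

```python
import math
import statistics


def log2_transform(values):
    """Apply log2(x + 1) to each expression value."""
    return [math.log2(v + 1) for v in values]


def z_score(values):
    """Center a profile to mean 0 and scale it to unit standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # assumption: population standard deviation
    if std == 0:
        # Assumption: a constant profile maps to all zeros.
        return [0.0] * len(values)
    return [(v - mean) / std for v in values]


# Hypothetical expression values of one gene in three samples.
profile = [0, 1, 3]
normalized = z_score(log2_transform(profile))
```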

processing.distance_metric

label:	Distance metric
type:	`basic:string`
default:	`pearson`
choices:	Euclidean: `euclidean` Pearson: `pearson` Spearman: `spearman`
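
The three distance metric choices can be sketched in pure Python. Converting a correlation r into a distance as 1 - r is a common convention and an assumption here; the source does not state which conversion the process uses.

```python
import math


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def distance(x, y, metric="pearson"):
    if metric == "euclidean":
        return euclidean(x, y)
    if metric == "pearson":
        return 1 - pearson_r(x, y)  # assumption: distance = 1 - r
    if metric == "spearman":
        # Spearman correlation is the Pearson correlation of the ranks.
        return 1 - pearson_r(ranks(x), ranks(y))
    raise ValueError(f"unknown metric: {metric}")
```

Pearson and Spearman distances ignore the overall magnitude of a profile, so two genes with the same shape but different expression levels end up close together, whereas Euclidean distance keeps them apart.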

processing.linkage_method

label:	Linkage method
type:	`basic:string`
default:	`average`
choices:	single: `single` average: `average` complete: `complete`
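
The linkage method determines how the distance between two clusters is derived from the pairwise distances of their members. A naive agglomerative sketch, assuming a precomputed symmetric distance matrix (the function name and return format are illustrative, and a real implementation would be far more efficient):

```python
def agglomerate(dist, method="average"):
    """Merge the closest clusters until one remains.

    dist: symmetric distance matrix as a list of lists.
    Returns a list of (members_a, members_b, distance) merges.
    """
    combine = {
        "single": min,                          # closest pair of members
        "complete": max,                        # farthest pair of members
        "average": lambda d: sum(d) / len(d),   # mean over all pairs
    }[method]

    clusters = [[i] for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairwise = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = combine(pairwise)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges


# Three hypothetical items located at 0, 1 and 4 on a line.
dist = [
    [0, 1, 4],
    [1, 0, 3],
    [4, 3, 0],
]
```

All three methods first merge items 0 and 1; they differ only in the height at which item 2 joins: 3 for single, 3.5 for average, and 4 for complete linkage.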

postprocessing.order

label:	Order samples optimally
type:	`basic:boolean`
default:	`True`

cluster

label:	Hierarchical clustering
type:	`basic:json`
required:	False

Hierarchical clustering of samples¶

data:clustering:hierarchical:sampleclustering-hierarchical-samples (list:data:expression exps, basic:boolean advanced, list:basic:string genes, basic:string source, basic:string species, basic:boolean log2, basic:boolean z_score, basic:string distance_metric, basic:string linkage_method, basic:boolean order)[Source: v3.1.0]

Hierarchical clustering of samples.

exps

label:	Expressions
type:	`list:data:expression`
description:	Select at least two data objects.

advanced

label:	Show advanced options
type:	`basic:boolean`
default:	`False`

preprocessing.genes

label:	Gene subset
type:	`list:basic:string`
description:	Select at least two genes or leave this field empty.
required:	False
placeholder:	`new gene id`

preprocessing.source

label:	Gene ID database of selected genes
type:	`basic:string`
description:	This field is required if gene subset is set.
required:	False
hidden:	!preprocessing.genes

preprocessing.species

label:	Species
type:	`basic:string`
description:	Latin name of the species. This field is required if gene subset is set.
required:	False
hidden:	!preprocessing.genes
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

preprocessing.log2

label:	Log-transform expressions
type:	`basic:boolean`
description:	Transform expressions with log2(x + 1) before clustering.
default:	`True`

preprocessing.z_score

label:	Z-score normalization
type:	`basic:boolean`
description:	Use Z-score normalization of gene expressions before clustering.
default:	`True`

processing.distance_metric

label:	Distance metric
type:	`basic:string`
default:	`pearson`
choices:	Euclidean: `euclidean` Pearson: `pearson` Spearman: `spearman`

processing.linkage_method

label:	Linkage method
type:	`basic:string`
default:	`average`
choices:	single: `single` average: `average` complete: `complete`

postprocessing.order

label:	Order samples optimally
type:	`basic:boolean`
default:	`True`

cluster

label:	Hierarchical clustering
type:	`basic:json`
required:	False

Indel Realignment and Base Recalibration¶

data:alignment:bam:vcvc-realign-recalibrate (data:alignment:bam alignment, data:genome:fasta genome, list:data:variants:vcf known_vars, list:data:variants:vcf known_indels)[Source: v1.0.2]

Preprocess BAM file and prepare for Variant Calling.

alignment

label:	Alignment file (BAM)
type:	`data:alignment:bam`

genome

label:	Genome
type:	`data:genome:fasta`

known_vars

label:	Known sites (dbSNP)
type:	`list:data:variants:vcf`

known_indels

label:	Known indels
type:	`list:data:variants:vcf`

bam

label:	Alignment file
type:	`basic:file`

bai

label:	Index BAI
type:	`basic:file`

stats

label:	Stats
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

LoFreq (call)¶

data:variants:vcf:lofreqlofreq (data:alignment:bam alignment, data:genome:fasta genome, data:masterfile:amplicon intervals, basic:integer min_bq, basic:integer min_alt_bq)[Source: v0.4.1]

LoFreq (call) variant calling.

alignment

label:	Alignment file (BAM)
type:	`data:alignment:bam`

genome

label:	Genome
type:	`data:genome:fasta`

intervals

label:	Intervals
type:	`data:masterfile:amplicon`
description:	Use this option to perform the analysis over only part of the genome.

min_bq

label:	Min baseQ
type:	`basic:integer`
description:	Skip any base with baseQ smaller than this value.
default:	`6`

min_alt_bq

label:	Min alternate baseQ
type:	`basic:integer`
description:	Skip alternate bases with baseQ smaller than this value.
default:	`6`

vcf

label:	Variants
type:	`basic:file`

tbi

label:	Tabix index
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

MACS 1.4¶

data:chipseq:callpeak:macs14macs14 (data:alignment:bam treatment, data:alignment:bam control, basic:string pvalue)[Source: v3.2.1]

Model-based Analysis of ChIP-Seq (MACS 1.4) empirically models the length of the sequenced ChIP fragments, which tends to be shorter than sonication or library construction size estimates, and uses it to improve the spatial resolution of predicted binding sites. MACS also uses a dynamic Poisson distribution to effectively capture local biases in the genome sequence, allowing for more sensitive and robust prediction. See the [original paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2592715/) for more information.

treatment

label:	BAM File
type:	`data:alignment:bam`

control

label:	BAM Background File
type:	`data:alignment:bam`
required:	False

pvalue

label:	P-value
type:	`basic:string`
default:	`1e-9`
choices:	1e-9: `1e-9` 1e-6: `1e-6`

peaks_bed

label:	Peaks (BED)
type:	`basic:file`

summits_bed

label:	Summits (BED)
type:	`basic:file`

peaks_xls

label:	Peaks (XLS)
type:	`basic:file`

wiggle

label:	Wiggle
type:	`basic:file`

control_bigwig

label:	Control (bigWig)
type:	`basic:file`
required:	False

treat_bigwig

label:	Treat (bigWig)
type:	`basic:file`

peaks_bigbed_igv_ucsc

label:	Peaks (bigBed)
type:	`basic:file`
required:	False

summits_bigbed_igv_ucsc

label:	Summits (bigBed)
type:	`basic:file`
required:	False

peaks_tbi_jbrowse

label:	JBrowse track peaks file
type:	`basic:file`

summits_tbi_jbrowse

label:	JBrowse track summits file
type:	`basic:file`

model

label:	Model
type:	`basic:file`
required:	False

neg_peaks

label:	Negative peaks (XLS)
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

MACS 2.0¶

data:chipseq:callpeak:macs2macs2-callpeak (data:alignment:bam case, data:alignment:bam control, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff)[Source: v4.0.5]

Model-based Analysis of ChIP-Seq (MACS 2.0) is used to identify transcription factor binding sites. MACS 2.0 captures the influence of genome complexity to evaluate the significance of enriched ChIP regions, and improves the spatial resolution of binding sites by combining the information of both sequencing tag position and orientation. It also has an option to link nearby peaks together in order to call broad peaks. See [here](https://github.com/taoliu/MACS/) for more information. In addition to peak-calling, this process computes ChIP-Seq and ATAC-Seq QC metrics. The process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. The QC report contains ENCODE 3 proposed QC metrics: [NRF](https://www.encodeproject.org/data-standards/terms/), [PBC bottlenecking coefficients, NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).
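
The library complexity metrics named above (NRF and the PBC bottlenecking coefficients) follow the ENCODE definitions and can be sketched as below. The function name and the choice of `(chrom, start, strand)` as the key for a genomic location are assumptions; the process computes these metrics with its own tooling.

```python
from collections import Counter


def library_complexity(positions):
    """Compute NRF, PBC1 and PBC2 from uniquely mapped read positions.

    positions: iterable of hashable location keys, e.g. (chrom, start, strand).
    """
    per_location = Counter(positions)
    total = sum(per_location.values())
    if total == 0:
        raise ValueError("no reads given")
    distinct = len(per_location)
    m1 = sum(1 for n in per_location.values() if n == 1)  # locations with exactly one read
    m2 = sum(1 for n in per_location.values() if n == 2)  # locations with exactly two reads
    return {
        "NRF": distinct / total,   # non-redundant fraction
        "PBC1": m1 / distinct,
        "PBC2": m1 / m2 if m2 else float("inf"),
    }


# Six hypothetical reads over four locations, two of which are duplicated.
reads = [
    ("chr1", 100, "+"), ("chr1", 100, "+"),
    ("chr1", 250, "-"),
    ("chr2", 40, "+"),
    ("chr2", 900, "-"), ("chr2", 900, "-"),
]
metrics = library_complexity(reads)
```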

case

label:	Case (treatment)
type:	`data:alignment:bam`

control

label:	Control (background)
type:	`data:alignment:bam`
required:	False

promoter

label:	Promoter regions BED file
type:	`data:bed`
description:	BED file containing promoter regions (TSS+-1000bp for example). Needed to get the number of peaks and reads mapped to promoter regions.
required:	False

tagalign

label:	Use tagAlign files
type:	`basic:boolean`
description:	Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
default:	`False`

prepeakqc_settings.q_threshold

label:	Quality filtering threshold
type:	`basic:integer`
default:	`30`

prepeakqc_settings.n_sub

label:	Number of reads to subsample
type:	`basic:integer`
default:	`15000000`

prepeakqc_settings.tn5

label:	TN5 shifting
type:	`basic:boolean`
description:	Tn5 transposon shifting. Shift reads on “+” strand by 4bp and reads on “-” strand by 5bp.
default:	`False`
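
The Tn5 shift can be sketched on a single tagAlign-style record. Moving only the 5' end of each read (the start for "+" reads, the end for "-" reads) mirrors the convention of the ENCODE ATAC-seq pipeline and is an assumption about how this process applies the shift.

```python
def tn5_shift(chrom, start, end, strand):
    """Shift a read to account for the Tn5 insertion offset."""
    if strand == "+":
        return chrom, start + 4, end, strand   # "+" strand: 4 bp
    if strand == "-":
        return chrom, start, end - 5, strand   # "-" strand: 5 bp
    raise ValueError(f"unknown strand: {strand}")
```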

prepeakqc_settings.shift

label:	User-defined cross-correlation peak strandshift
type:	`basic:integer`
description:	If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
required:	False

settings.duplicates

label:	Number of duplicates
type:	`basic:string`
description:	Controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:	False
hidden:	tagalign
choices:	1: `1` auto: `auto` all: `all`
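
The effect of the integer and ‘all’ choices can be sketched as a simple filter over tags keyed by location and strand. This is an illustration only: the function name and tag layout are hypothetical, and the ‘auto’ choice is deliberately left out because its binomial calculation happens inside MACS.

```python
from collections import defaultdict


def filter_duplicates(tags, keep="1"):
    """Keep at most `keep` tags per (chrom, position, strand) location."""
    if keep == "all":
        return list(tags)
    if keep == "auto":
        raise NotImplementedError("'auto' is computed by MACS and not sketched here")
    limit = int(keep)
    seen = defaultdict(int)
    kept = []
    for tag in tags:
        seen[tag] += 1
        if seen[tag] <= limit:
            kept.append(tag)
    return kept


# Hypothetical tags: three at one location, one at another.
tags = [("chr1", 100, "+")] * 3 + [("chr1", 100, "-")]
```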

settings.duplicates_prepeak

label:	Number of duplicates
type:	`basic:string`
description:	Controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:	False
hidden:	!tagalign
default:	`all`
choices:	1: `1` auto: `auto` all: `all`

settings.qvalue

label:	Q-value cutoff
type:	`basic:decimal`
description:	The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
required:	False
disabled:	settings.pvalue && settings.pvalue_prepeak
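
The Benjamini-Hochberg adjustment mentioned in the description can be sketched generically. This shows the standard procedure on a plain list of p-values; MACS2 computes its q-values internally, so the function below is an illustration rather than its implementation.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values) in input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical p-values for four candidate regions.
qvalues = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

A region is called significant when its q-value is at or below the chosen cutoff.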

settings.pvalue

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
required:	False
disabled:	settings.qvalue
hidden:	tagalign

settings.pvalue_prepeak

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
disabled:	settings.qvalue
hidden:	!tagalign || settings.qvalue
default:	`1e-05`

settings.cap_num

label:	Cap number of peaks by taking top N peaks
type:	`basic:integer`
description:	To keep all peaks set value to 0.
disabled:	settings.broad
default:	`500000`
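
Capping can be sketched as keeping the N best-scoring peaks, with 0 meaning no cap. Which column the process ranks by is not stated here, so the `score` key below is a hypothetical stand-in.

```python
def cap_peaks(peaks, cap_num=500000, score=lambda peak: peak["score"]):
    """Keep the top cap_num peaks by score; cap_num == 0 keeps all peaks."""
    if cap_num == 0:
        return list(peaks)
    return sorted(peaks, key=score, reverse=True)[:cap_num]


# Hypothetical peaks with an illustrative score field.
peaks = [
    {"name": "a", "score": 5},
    {"name": "b", "score": 9},
    {"name": "c", "score": 1},
]
```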

settings.mfold_lower

label:	MFOLD range (lower limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of these regions must be higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:	False

settings.mfold_upper

label:	MFOLD range (upper limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of these regions must be higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:	False

settings.slocal

label:	Small local region
type:	`basic:integer`
description:	The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.llocal

label:	Large local region
type:	`basic:integer`
description:	The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.extsize

label:	extsize
type:	`basic:integer`
description:	While ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
required:	False

settings.shift

label:	Shift
type:	`basic:integer`
description:	Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
required:	False
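
How shift and extsize combine can be sketched for a single read, given the coordinate of its 5' end. The function name and default values are illustrative assumptions; the logic follows the description above, where "5' to 3'" means increasing coordinates on the "+" strand and decreasing coordinates on the "-" strand.

```python
def fragment_interval(five_prime, strand, shift=0, extsize=200):
    """Return (start, end) of the fragment built from a read's 5' end."""
    if strand == "+":
        start = five_prime + shift      # positive shift moves 5'->3' (rightwards)
        return start, start + extsize   # then extend towards 3'
    if strand == "-":
        end = five_prime - shift        # positive shift moves 5'->3' (leftwards)
        return end - extsize, end       # then extend towards 3'
    raise ValueError(f"unknown strand: {strand}")
```

With `shift` set to -1 * half of `extsize`, the fragment is centered on the cutting site regardless of strand: for a 5' end at position 1000, `shift=-100` and `extsize=200` give the interval (900, 1100) on both strands.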

settings.band_width

label:	Band width
type:	`basic:integer`
description:	The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
required:	False

settings.nolambda

label:	Use background lambda as local lambda
type:	`basic:boolean`
description:	With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:	`False`

settings.fix_bimodal

label:	Turn on the auto paired-peak model process
type:	`basic:boolean`
description:	Whether to turn on the auto paired-peak model process. If set, when MACS fails to build the paired-peak model, it will use the nomodel settings and the ‘--extsize’ parameter to extend each tag. If not set, MACS will terminate if building the paired-peak model fails.
default:	`False`

settings.nomodel

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	tagalign
default:	`False`

settings.nomodel_prepeak

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	!tagalign
default:	`True`

settings.down_sample

label:	Down-sample
type:	`basic:boolean`
description:	When set, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option will make the results unstable and irreproducible, since random reads are selected each time and the numbers (pileup, pvalue, qvalue) in particular would change. Consider using the ‘randsample’ script before running MACS2 instead.
default:	`False`

settings.bedgraph

label:	Save fragment pileup and control lambda
type:	`basic:boolean`
description:	If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in the current directory and named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from the Benjamini-Hochberg-Yekutieli procedure.
default:	`True`

settings.spmr

label:	Save signal per million reads for fragment pileup profiles
type:	`basic:boolean`
disabled:	settings.bedgraph === false
default:	`True`

settings.call_summits

label:	Call summits
type:	`basic:boolean`
description:	MACS will reanalyze the shape of the signal profile (p or q-score depending on the cutoff setting) to deconvolve subpeaks within each peak called by the general procedure. This is highly recommended for detecting adjacent binding events. When used, the output subpeaks of a big peak region will have the same peak boundaries, but different scores and peak summit positions.
default:	`False`

settings.broad

label:	Composite broad regions
type:	`basic:boolean`
description:	When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with a loose cutoff. The broad region is controlled by another cutoff through --broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
disabled:	settings.call_summits === true
default:	`False`

settings.broad_cutoff

label:	Broad cutoff
type:	`basic:decimal`
description:	Cutoff for broad region. This option is not available unless --broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1
required:	False
disabled:	settings.call_summits === true || settings.broad !== true

called_peaks

label:	Called peaks
type:	`basic:file`

narrow_peaks

label:	Narrow peaks
type:	`basic:file`
required:	False

chip_qc

label:	QC report
type:	`basic:file`
required:	False

case_prepeak_qc

label:	Pre-peak QC report (case)
type:	`basic:file`

case_tagalign

label:	Filtered tagAlign (case)
type:	`basic:file`

control_prepeak_qc

label:	Pre-peak QC report (control)
type:	`basic:file`
required:	False

control_tagalign

label:	Filtered tagAlign (control)
type:	`basic:file`
required:	False

narrow_peaks_bigbed_igv_ucsc

label:	Narrow peaks (BigBed)
type:	`basic:file`
required:	False

summits

label:	Peak summits
type:	`basic:file`
required:	False

summits_tbi_jbrowse

label:	Peak summits tbi index for JBrowse
type:	`basic:file`
required:	False

summits_bigbed_igv_ucsc

label:	Summits (bigBed)
type:	`basic:file`
required:	False

broad_peaks

label:	Broad peaks
type:	`basic:file`
required:	False

gappedPeak

label:	Broad peaks (bed12/gappedPeak)
type:	`basic:file`
required:	False

treat_pileup

label:	Treatment pileup (bedGraph)
type:	`basic:file`
required:	False

treat_pileup_bigwig

label:	Treatment pileup (bigWig)
type:	`basic:file`
required:	False

control_lambda

label:	Control lambda (bedGraph)
type:	`basic:file`
required:	False

control_lambda_bigwig

label:	Control lambda (bigWig)
type:	`basic:file`
required:	False

model

label:	Model
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

MACS2 - ROSE2¶

data:workflow:chipseq:macs2rose2workflow-macs-rose (data:alignment:bam case, data:alignment:bam control, data:bed promoter, basic:boolean tagalign, basic:integer q_threshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift, basic:string duplicates, basic:string duplicates_prepeak, basic:decimal qvalue, basic:decimal pvalue, basic:decimal pvalue_prepeak, basic:integer cap_num, basic:integer mfold_lower, basic:integer mfold_upper, basic:integer slocal, basic:integer llocal, basic:integer extsize, basic:integer shift, basic:integer band_width, basic:boolean nolambda, basic:boolean fix_bimodal, basic:boolean nomodel, basic:boolean nomodel_prepeak, basic:boolean down_sample, basic:boolean bedgraph, basic:boolean spmr, basic:boolean call_summits, basic:boolean broad, basic:decimal broad_cutoff, basic:integer tss, basic:integer stitch, data:bed mask)[Source: v1.0.1]

case

label:	Case (treatment)
type:	`data:alignment:bam`

control

label:	Control (background)
type:	`data:alignment:bam`
required:	False

promoter

label:	Promoter regions BED file
type:	`data:bed`
description:	BED file containing promoter regions (TSS+-1000bp for example). Needed to get the number of peaks and reads mapped to promoter regions.
required:	False

tagalign

label:	Use tagAlign files
type:	`basic:boolean`
description:	Use filtered tagAlign files as case (treatment) and control (background) samples. If extsize parameter is not set, run MACS using input’s estimated fragment length.
default:	`False`

prepeakqc_settings.q_threshold

label:	Quality filtering threshold
type:	`basic:integer`
default:	`30`

prepeakqc_settings.n_sub

label:	Number of reads to subsample
type:	`basic:integer`
default:	`15000000`

prepeakqc_settings.tn5

label:	TN5 shifting
type:	`basic:boolean`
description:	Tn5 transposon shifting. Shift reads on “+” strand by 4bp and reads on “-” strand by 5bp.
default:	`False`

prepeakqc_settings.shift

label:	User-defined cross-correlation peak strandshift
type:	`basic:integer`
description:	If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
required:	False

settings.duplicates

label:	Number of duplicates
type:	`basic:string`
description:	Controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:	False
hidden:	tagalign
choices:	1: `1` auto: `auto` all: `all`

settings.duplicates_prepeak

label:	Number of duplicates
type:	`basic:string`
description:	Controls the MACS behavior towards duplicate tags at the exact same location (the same coordinates and the same strand). The ‘auto’ option makes MACS calculate the maximum number of tags at the exact same location based on a binomial distribution using 1e-5 as the p-value cutoff, and the ‘all’ option keeps all the tags. If an integer is given, at most this number of tags will be kept at the same location. The default is to keep one tag at the same location.
required:	False
hidden:	!tagalign
default:	`all`
choices:	1: `1` auto: `auto` all: `all`

settings.qvalue

label:	Q-value cutoff
type:	`basic:decimal`
description:	The q-value (minimum FDR) cutoff to call significant regions. Q-values are calculated from p-values using Benjamini-Hochberg procedure.
required:	False
disabled:	settings.pvalue && settings.pvalue_prepeak

settings.pvalue

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
required:	False
disabled:	settings.qvalue
hidden:	tagalign

settings.pvalue_prepeak

label:	P-value cutoff
type:	`basic:decimal`
description:	The p-value cutoff. If specified, MACS2 will use p-value instead of q-value cutoff.
disabled:	settings.qvalue
hidden:	!tagalign || settings.qvalue
default:	`1e-05`

settings.cap_num

label:	Cap number of peaks by taking top N peaks
type:	`basic:integer`
description:	To keep all peaks set value to 0.
disabled:	settings.broad
default:	`500000`

settings.mfold_lower

label:	MFOLD range (lower limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of these regions must be higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:	False

settings.mfold_upper

label:	MFOLD range (upper limit)
type:	`basic:integer`
description:	This parameter is used to select the regions within the MFOLD range of high-confidence enrichment ratio against background to build the model. The fold enrichment of these regions must be higher than the lower limit and lower than the upper limit. DEFAULT:10,30 means using all regions not too low (>10) and not too high (<30) to build the paired-peaks model. If MACS cannot find more than 100 regions to build the model, it will use the --extsize parameter to continue the peak detection ONLY if --fix-bimodal is set.
required:	False

settings.slocal

label:	Small local region
type:	`basic:integer`
description:	The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.llocal

label:	Large local region
type:	`basic:integer`
description:	The slocal and llocal parameters control which two levels of regions will be checked around the peak regions to calculate the maximum lambda as local lambda. By default, MACS considers 1000 bp for the small local region (--slocal) and 10000 bp for the large local region (--llocal), which captures the bias from a long-range effect like an open chromatin domain. You can tweak these according to your project. Remember that if the region is set too small, a sharp spike in the input data may kill the significant peak.
required:	False

settings.extsize

label:	extsize
type:	`basic:integer`
description:	While ‘--nomodel’ is set, MACS uses this parameter to extend reads in the 5’->3’ direction to fixed-size fragments. For example, if the size of the binding region for your transcription factor is 200 bp, and you want to bypass the model building by MACS, this parameter can be set to 200. This option is only valid when --nomodel is set or when MACS fails to build the model and --fix-bimodal is on.
required:	False

settings.shift

label:	Shift
type:	`basic:integer`
description:	Note, this is NOT the legacy --shiftsize option, which is replaced by --extsize! You can set an arbitrary shift in bp here. Please use discretion when setting it to a value other than the default (0). When --nomodel is set, MACS will use this value to move cutting ends (5’) and then apply --extsize in the 5’ to 3’ direction to extend them to fragments. When this value is negative, ends will be moved in the 3’->5’ direction, otherwise in the 5’->3’ direction. It is recommended to keep the default of 0 for ChIP-Seq datasets, or to use -1 * half of EXTSIZE together with the --extsize option for detecting enriched cutting loci, such as in certain DNaseI-Seq datasets. Note that you can’t set values other than 0 if the format is BAMPE for paired-end data. Default is 0.
required:	False

settings.band_width

label:	Band width
type:	`basic:integer`
description:	The band width which is used to scan the genome ONLY for model building. You can set this parameter as the sonication fragment size expected from wet experiment. The previous side effect on the peak detection process has been removed. So this parameter only affects the model building.
required:	False

settings.nolambda

label:	Use background lambda as local lambda
type:	`basic:boolean`
description:	With this flag on, MACS will use the background lambda as local lambda. This means MACS will not consider the local bias at peak candidate regions.
default:	`False`

settings.fix_bimodal

label:	Turn on the auto paired-peak model process
type:	`basic:boolean`
description:	Whether to turn on the auto paired-peak model process. If set, when MACS fails to build the paired model, it will use the nomodel settings and the ‘–extsize’ parameter to extend each tag. If not set, MACS will terminate when building the paired-peak model fails.
default:	`False`

settings.nomodel

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	tagalign
default:	`False`

settings.nomodel_prepeak

label:	Bypass building the shifting model
type:	`basic:boolean`
description:	While on, MACS will bypass building the shifting model.
hidden:	!tagalign
default:	`True`

settings.down_sample

label:	Down-sample
type:	`basic:boolean`
description:	When set, a random sampling method will scale down the bigger sample. By default, MACS uses linear scaling. This option makes the results unstable and irreproducible because different reads are randomly selected on each run, so the numbers (pileup, pvalue, qvalue) in particular will change. Consider using the ‘randsample’ script before running MACS2 instead.
default:	`False`

settings.bedgraph

label:	Save fragment pileup and control lambda
type:	`basic:boolean`
description:	If this flag is on, MACS will store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files will be stored in current directory named NAME+’_treat_pileup.bdg’ for treatment data, NAME+’_control_lambda.bdg’ for local lambda values from control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue) form), and NAME+’_treat_qvalue.bdg’ for q-value scores from Benjamini-Hochberg-Yekutieli procedure.
default:	`True`
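
The bedGraph naming scheme described above can be written out as a small helper. The function name and dictionary keys are hypothetical; only the file name suffixes come from the description.

```python
def bedgraph_outputs(name):
    """Map each bedGraph track to its expected file name for a given NAME."""
    return {
        "treatment pileup": f"{name}_treat_pileup.bdg",
        "control lambda": f"{name}_control_lambda.bdg",
        "p-value scores": f"{name}_treat_pvalue.bdg",
        "q-value scores": f"{name}_treat_qvalue.bdg",
    }


for track, filename in bedgraph_outputs("sample1").items():
    print(f"{track}: {filename}")
```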

settings.spmr

label:	Save signal per million reads for fragment pileup profiles
type:	`basic:boolean`
disabled:	settings.bedgraph === false
default:	`True`

settings.call_summits

label:	Call summits
type:	`basic:boolean`
description:	MACS will now reanalyze the shape of signal profile (p or q-score depending on cutoff setting) to deconvolve subpeaks within each peak called from general procedure. It’s highly recommended to detect adjacent binding events. While used, the output subpeaks of a big peak region will have the same peak boundaries, and different scores and peak summit positions.
default:	`False`

settings.broad

label:	Composite broad regions
type:	`basic:boolean`
description:	When this flag is on, MACS will try to composite broad regions in BED12 (a gene-model-like format) by putting nearby highly enriched regions into a broad region with loose cutoff. The broad region is controlled by another cutoff through –broad-cutoff. The maximum length of a broad region is 4 times d from MACS.
disabled:	settings.call_summits === true
default:	`False`

settings.broad_cutoff

label:	Broad cutoff
type:	`basic:decimal`
description:	Cutoff for broad region. This option is not available unless –broad is set. If -p is set, this is a p-value cutoff, otherwise, it’s a q-value cutoff. DEFAULT = 0.1
required:	False
disabled:	settings.call_summits === true \|\| settings.broad !== true

rose_settings.tss

label:	TSS exclusion
type:	`basic:integer`
description:	Enter a distance from TSS to exclude. 0 = no TSS exclusion
default:	`0`

rose_settings.stitch

label:	Stitch
type:	`basic:integer`
description:	Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.
required:	False

rose_settings.mask

label:	Masking BED file
type:	`data:bed`
description:	Mask a set of regions from analysis. Provide a BED of masking regions.
required:	False

Mappability¶

data:mappability:bcmmappability-bcm (data:genome:fasta genome, data:annotation:gff3 gff, basic:integer length)[Source: v2.0.1]

Compute genome mappability. Developed by Bioinformatics Laboratory, Faculty of Computer and Information Science, University of Ljubljana, Slovenia and Shaulsky’s Lab, Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, USA.

genome

label:	Reference genome
type:	`data:genome:fasta`

gff

label:	General feature format
type:	`data:annotation:gff3`

length

label:	Read length
type:	`basic:integer`
default:	`50`

mappability

label:	Mappability
type:	`basic:file`

Mappability info¶

data:mappability:bcmupload-mappability (basic:file src)[Source: v1.1.1]

Upload mappability information.

src

label:	Mappability file
type:	`basic:file`
description:	Mappability file: two-column, tab-separated
validate_regex:	`\.(tab)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`
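
To see which file names such a `validate_regex` accepts, the pattern can be tried out directly. This sketch assumes that the `\|` sequences in the listing are table escaping for a plain `|`, and that the pattern is applied with search semantics (it is only anchored at the end); the helper name is hypothetical.

```python
import re

# Pattern from the listing, with "\|" read as a plain alternation "|" (assumption).
MAPPABILITY_RE = re.compile(
    r"\.(tab)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_mappability_name(filename):
    """Return True if the file name ends with an accepted extension."""
    return MAPPABILITY_RE.search(filename) is not None


for candidate in ["map.tab", "map.tab.gz", "map.tab.tar.bz2", "map.txt"]:
    print(candidate, is_valid_mappability_name(candidate))
```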

mappability

label:	Uploaded mappability
type:	`basic:file`

Merge Expressions (ETC)¶

data:expressionset:etcmergeetc (list:data:etc exps, list:basic:string genes)[Source: v1.1.1]

Merge Expression Time Course (ETC) data.

exps

label:	Expression Time Course (ETC)
type:	`list:data:etc`

genes

label:	Filter genes
type:	`list:basic:string`
required:	False

expset

label:	Expression set
type:	`basic:file`

expset_type

label:	Expression set type
type:	`basic:string`

Metabolic pathway file¶

data:metabolicpathwayupload-metabolic-pathway (basic:file src, basic:string source, basic:string species)[Source: v1.0.1]

Upload pathway json.

src

label:	Pathway file
type:	`basic:file`
description:	JSON file. Supported extensions: ‘.json’, ‘.json.gz’
validate_regex:	`(\.json)(\.gz)?$`

source

label:	Gene ID database
type:	`basic:string`
choices:	BIGG: `BIGG`

species

label:	Species
type:	`basic:string`
choices:	Homo Sapiens: `Homo Sapiens` Mus musculus: `Mus musculus`

pathway

label:	Pathway json
type:	`basic:json`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

MultiQC¶

data:multiqcmultiqc (list:data data, basic:boolean dirs, basic:integer dirs_depth, basic:boolean fullnames, basic:boolean config, basic:string cl_config)[Source: v1.4.0]

Aggregate results from bioinformatics analyses across many samples into a single report. [MultiQC](http://www.multiqc.info) searches a given directory for analysis logs and compiles an HTML report. It’s a general-use tool, well suited to summarising the output from numerous bioinformatics tools.

data

label:	Input data
type:	`list:data`
description:	Select multiple data objects for which the MultiQC report is to be generated.

advanced.dirs

label:	–dirs
type:	`basic:boolean`
description:	Prepend directory to sample names.
default:	`True`

advanced.dirs_depth

label:	–dirs-depth
type:	`basic:integer`
description:	Prepend a specified number of directories to sample names. Enter a negative number to take from start of path.
default:	`-1`

advanced.fullnames

label:	–fullnames
type:	`basic:boolean`
description:	Do not clean the sample names (leave as full file name).
default:	`False`

advanced.config

label:	Use configuration file
type:	`basic:boolean`
description:	Use Genialis configuration file for MultiQC report.
default:	`True`

advanced.cl_config

label:	–cl-config
type:	`basic:string`
description:	Enter text with command-line configuration options to override the defaults (e.g. custom_logo_url: https://www.genialis.com).
required:	False
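
How the advanced options might translate into a MultiQC command line is sketched below. The helper name, the analysis directory `.`, and the rule that `--dirs-depth` is only passed together with `--dirs` are assumptions; how the process actually invokes MultiQC is not documented here. The flags are written with a double hyphen, which the labels above render as a dash.

```python
def build_multiqc_args(dirs=True, dirs_depth=-1, fullnames=False, cl_config=None):
    """Assemble an argument list for MultiQC from the advanced settings."""
    args = ["multiqc", "."]  # "." as the directory to scan is an assumption
    if dirs:
        args += ["--dirs", "--dirs-depth", str(dirs_depth)]
    if fullnames:
        args.append("--fullnames")
    if cl_config:
        args += ["--cl-config", cl_config]
    return args


print(build_multiqc_args())
print(build_multiqc_args(cl_config="custom_logo_url: https://www.genialis.com"))
```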

report

label:	MultiQC report
type:	`basic:file:html`

report_data

label:	Report data
type:	`basic:dir`

OBO file¶

data:ontology:oboupload-obo (basic:file src)[Source: v1.2.0]

Upload gene ontology in OBO format.

src

label:	Gene ontology (OBO)
type:	`basic:file`
description:	Gene ontology in OBO format.
required:	True
validate_regex:	`\.obo(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`

obo

label:	Ontology file
type:	`basic:file`

obo_obj

label:	OBO object
type:	`basic:file`

PCA¶

data:pcapca (list:data:expression exps, list:basic:string genes, basic:string source, basic:string species)[Source: v2.2.0]

Principal component analysis (PCA)

exps

label:	Expressions
type:	`list:data:expression`

genes

label:	Gene subset
type:	`list:basic:string`
required:	False

source

label:	Gene ID database of selected genes
type:	`basic:string`
description:	This field is required if gene subset is set.
required:	False

species

label:	Species
type:	`basic:string`
description:	Species Latin name. This field is required if gene subset is set.
required:	False
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

pca

label:	PCA
type:	`basic:json`

Picard CollectTargetedPcrMetrics¶

data:picard:coveragepicard-pcrmetrics (data:alignment:bam alignment, data:masterfile:amplicon master_file, data:genome:fasta genome)[Source: v0.2.1]

Calculate PCR-related metrics from targeted sequencing data using the Picard CollectTargetedPcrMetrics tool

alignment

label:	Alignment file (BAM)
type:	`data:alignment:bam`

master_file

label:	Master file
type:	`data:masterfile:amplicon`

genome

label:	Genome
type:	`data:genome:fasta`

target_pcr_metrics

label:	Target PCR metrics
type:	`basic:file`

target_coverage

label:	Target coverage
type:	`basic:file`

Pre-peakcall QC¶

data:prepeakqcqc-prepeak (data:alignment:bam alignment, basic:integer q_treshold, basic:integer n_sub, basic:boolean tn5, basic:integer shift)[Source: v0.2.2]

ChIP-Seq and ATAC-Seq QC metrics. Process returns a QC metrics report, fragment length estimation, and a deduplicated tagAlign file. Both fragment length estimation and the tagAlign file can be used as inputs in MACS 2.0. QC report contains ENCODE 3 proposed QC metrics – [NRF, PBC bottlenecking coefficients](https://www.encodeproject.org/data-standards/terms/), [NSC, and RSC](https://genome.ucsc.edu/ENCODE/qualityMetrics.html#chipSeq).

alignment

label:	Aligned reads
type:	`data:alignment:bam`

q_treshold

label:	Quality filtering threshold
type:	`basic:integer`
default:	`30`

n_sub

label:	Number of reads to subsample
type:	`basic:integer`
default:	`15000000`

tn5

label:	TN5 shifting
type:	`basic:boolean`
description:	Tn5 transposon shifting. Shift reads on “+” strand by 4bp and reads on “-” strand by 5bp.
default:	`False`
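
A minimal sketch of the Tn5 adjustment on a read's 5' cut position follows. The direction of the shift (+4 on "+", -5 on "-") follows the common ATAC-seq convention and is an assumption here, as is the function name; the process may adjust the tagAlign coordinates differently.

```python
def tn5_shifted_cut_site(five_prime, strand):
    """Return the 5' cut position after Tn5 shifting.

    Reads on the "+" strand move 4 bp downstream, reads on the "-" strand
    move 5 bp toward smaller coordinates (assumed convention).
    """
    if strand == "+":
        return five_prime + 4
    if strand == "-":
        return five_prime - 5
    raise ValueError(f"unknown strand: {strand!r}")


print(tn5_shifted_cut_site(1000, "+"))
print(tn5_shifted_cut_site(1000, "-"))
```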

shift

label:	User-defined cross-correlation peak strandshift
type:	`basic:integer`
description:	If defined, SPP tool will not try to estimate fragment length but will use the given value as fragment length.
required:	False

chip_qc

label:	QC report
type:	`basic:file`

tagalign

label:	Filtered tagAlign
type:	`basic:file`

fraglen

label:	Fragment length
type:	`basic:integer`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Prepare GEO - ChIP-Seq¶

data:other:geo:chipseqprepare-geo-chipseq (list:data:reads:fastq reads, list:data:chipseq:callpeak macs, basic:string name)[Source: v2.0.2]

Prepare ChIP-seq data for GEO upload.

reads

label:	Reads
type:	`list:data:reads:fastq`
description:	List of reads objects. Fastq files will be used.

macs

label:	MACS
type:	`list:data:chipseq:callpeak`
description:	List of MACS2 or MACS14 objects. BedGraph (MACS2) or Wiggle (MACS14) files will be used.

name

label:	Collection name
type:	`basic:string`

tarball

label:	GEO folder
type:	`basic:file`

table

label:	Annotation table
type:	`basic:file`

Prepare GEO - RNA-Seq¶

data:other:geo:rnaseqprepare-geo-rnaseq (list:data:reads:fastq reads, list:data:expression expressions, basic:string name)[Source: v0.1.1]

Prepare RNA-Seq data for GEO upload.

reads

label:	Reads
type:	`list:data:reads:fastq`
description:	List of reads objects. Fastq files will be used.

expressions

label:	Expressions
type:	`list:data:expression`
description:	Cuffnorm data object. Expression table will be used.

name

label:	Collection name
type:	`basic:string`

tarball

label:	GEO folder
type:	`basic:file`

table

label:	Annotation table
type:	`basic:file`

Quantify shRNA species using bowtie2¶

data:expression:shrna2quantshrna-quant (data:alignment:bam alignment, basic:integer readlengths, basic:integer alignscores)[Source: v1.1.0]

Based on `bowtie2` output (.bam file), calculate the number of mapped species. Input is limited to results from `bowtie2` since the `YT:Z:` tag used to fetch aligned species is specific to this process. The result is a count matrix (successfully mapped reads) where species are in rows and columns contain read specifics (count, species name, sequence, `AS:i:` tag value).

alignment

label:	Alignment
type:	`data:alignment:bam`
required:	True

readlengths

label:	Species lengths threshold
type:	`basic:integer`
description:	Species with read lengths below specified threshold will be removed from final output. Default is no removal.

alignscores

label:	Align scores filter threshold
type:	`basic:integer`
description:	Species with align score below specified threshold will be removed from final output. Default is no removal.

exp

label:	Normalized expression
type:	`basic:file`

rc

label:	Read counts
type:	`basic:file`
required:	False

exp_json

label:	Expression (json)
type:	`basic:json`

exp_type

label:	Expression type
type:	`basic:string`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

mapped_species

label:	Mapped species
type:	`basic:file`

RNA-Seq (Cuffquant)¶

data:workflow:rnaseq:cuffquantworkflow-rnaseq-cuffquant (data:reads:fastq reads, data:genome:fasta genome, data:annotation annotation)[Source: v1.0.0]

reads

label:	Input reads
type:	`data:reads:fastq`

genome

label:	genome
type:	`data:genome:fasta`

annotation

label:	Annotation file
type:	`data:annotation`

ROSE2¶

data:chipseq:rose2rose2 (data:chipseq:callpeak input, data:bed input_upload, data:alignment:bam rankby, data:alignment:bam control, basic:integer tss, basic:integer stitch, data:bed mask)[Source: v4.3.1]

For identification of super enhancers R2 uses the Rank Ordering of Super-Enhancers algorithm (ROSE2). This takes the peaks called by RSEG for acetylation and calculates the distances in-between to judge whether they can be considered super-enhancers. The ranked values can be plotted and by locating the inflection point in the resulting graph, super-enhancers can be assigned. It can also be used with the MACS calculated data. See [here](http://younglab.wi.mit.edu/super_enhancer_code.html) for more information.

input

label:	BED/narrowPeak file (MACS results)
type:	`data:chipseq:callpeak`
required:	False

input_upload

label:	BED file (Upload)
type:	`data:bed`
required:	False

rankby

label:	BAM File
type:	`data:alignment:bam`
description:	BAM file to rank enhancers by.

control

label:	Control BAM File
type:	`data:alignment:bam`
description:	Control BAM file, used as background for the ranking BAM file.
required:	False

tss

label:	TSS exclusion
type:	`basic:integer`
description:	Enter a distance from TSS to exclude. 0 = no TSS exclusion
default:	`0`

stitch

label:	Stitch
type:	`basic:integer`
description:	Enter a max linking distance for stitching. If not given, optimal stitching parameter will be determined automatically.
required:	False

mask

label:	Masking BED file
type:	`data:bed`
description:	Mask a set of regions from analysis. Provide a BED of masking regions.
required:	False

all_enhancers

label:	All enhancers table
type:	`basic:file`

enhancers_with_super

label:	Super enhancers table
type:	`basic:file`

plot_points

label:	Plot points
type:	`basic:file`

plot_panel

label:	Plot panel
type:	`basic:file`

enhancer_gene

label:	Enhancer to gene
type:	`basic:file`

enhancer_top_gene

label:	Enhancer to top gene
type:	`basic:file`

gene_enhancer

label:	Gene to Enhancer
type:	`basic:file`

stitch_parameter

label:	Stitch parameter
type:	`basic:file`
required:	False

all_output

label:	All output
type:	`basic:file`

scatter_plot

label:	Super-Enhancer plot
type:	`basic:json`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

RSEM¶

data:expression:rsemrsem (data:alignment:bam alignments, basic:string read_type, data:index:expression expression_index, basic:string strandedness)[Source: v1.2.0]

RSEM is a software package for estimating gene and isoform expression levels from RNA-Seq data. The RSEM package supports threads for parallel computation of the EM algorithm, single-end and paired-end read data, quality scores, variable-length reads and RSPD estimation. See [here](https://deweylab.github.io/RSEM/README.html) and the [original paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-12-323) for more information.

alignments

label:	Aligned reads
type:	`data:alignment:bam`

read_type

label:	Type of reads
type:	`basic:string`
default:	`se`
choices:	Single-end: `se` Paired-end: `pe`

expression_index

label:	Gene expression indices
type:	`data:index:expression`

strandedness

label:	Strandedness
type:	`basic:string`
default:	`none`
choices:	None: `none` Forward: `forward` Reverse: `reverse`

rc

label:	Read counts
type:	`basic:file`

fpkm

label:	FPKM
type:	`basic:file`

exp

label:	TPM (Transcripts Per Million)
type:	`basic:file`

exp_json

label:	TPM (json)
type:	`basic:json`

exp_set

label:	Expressions
type:	`basic:file`

exp_set_json

label:	Expressions (json)
type:	`basic:json`

genes

label:	Results grouped by gene
type:	`basic:file`

transcripts

label:	Results grouped by transcript
type:	`basic:file`

log

label:	RSEM log
type:	`basic:file`

exp_type

label:	Type of expression
type:	`basic:string`

source

label:	Transcript ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

Reads (QSEQ multiplexed, paired)¶

data:multiplexed:qseq:pairedupload-multiplexed-paired (basic:file reads, basic:file reads2, basic:file barcodes, basic:file annotation)[Source: v1.2.0]

Upload multiplexed NGS reads in QSEQ format.

reads

label:	Multiplexed upstream reads
type:	`basic:file`
description:	NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:	True
validate_regex:	`((\.qseq\|\.qseq\.txt)(\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

reads2

label:	Multiplexed downstream reads
type:	`basic:file`
description:	NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:	True
validate_regex:	`((\.qseq\|\.qseq\.txt)(\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

barcodes

label:	NGS barcodes
type:	`basic:file`
description:	Barcodes in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:	True
validate_regex:	`((\.qseq\|\.qseq\.txt)(\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

annotation

label:	Barcode mapping
type:	`basic:file`
description:	A tsv file mapping barcodes to experiment name, e.g. “TCGCAGG\tHr00”.
required:	True
validate_regex:	`(\.csv\|\.tsv)$`
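
The barcode mapping file can be read with a few lines of code. This is a sketch assuming a header-less, two-column, tab-separated layout as in the example “TCGCAGG\tHr00”; the function name is hypothetical.

```python
import csv
import io


def read_barcode_mapping(handle):
    """Return a dict mapping barcode sequence to experiment name."""
    mapping = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        barcode, experiment = row[0].strip(), row[1].strip()
        mapping[barcode] = experiment
    return mapping


example = io.StringIO("TCGCAGG\tHr00\nACGTACG\tHr04\n")
print(read_barcode_mapping(example))
```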

qseq_reads

label:	Multiplexed upstream reads
type:	`basic:file`

qseq_reads2

label:	Multiplexed downstream reads
type:	`basic:file`

qseq_barcodes

label:	NGS barcodes
type:	`basic:file`

annotation

label:	Barcode mapping
type:	`basic:file`

matched

label:	Matched
type:	`basic:string`

notmatched

label:	Not matched
type:	`basic:string`

badquality

label:	Bad quality
type:	`basic:string`

skipped

label:	Skipped
type:	`basic:string`

Reads (QSEQ multiplexed, single)¶

data:multiplexed:qseq:singleupload-multiplexed-single (basic:file reads, basic:file barcodes, basic:file annotation)[Source: v1.2.0]

Upload multiplexed NGS reads in QSEQ format.

reads

label:	Multiplexed NGS reads
type:	`basic:file`
description:	NGS reads in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:	True
validate_regex:	`(\.(qseq)(\|\.txt)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

barcodes

label:	NGS barcodes
type:	`basic:file`
description:	Barcodes in QSeq format. Supported extensions: .qseq.txt.bz2 (preferred), .qseq.* or .qseq.txt.*.
required:	True
validate_regex:	`(\.(qseq)(\|\.txt)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z))\|(\.bz2)$`

annotation

label:	Barcode mapping
type:	`basic:file`
description:	A tsv file mapping barcodes to experiment name, e.g. “TCGCAGG\tHr00”.
required:	True
validate_regex:	`(\.csv\|\.tsv)$`

qseq_reads

label:	Multiplexed NGS reads
type:	`basic:file`

qseq_barcodes

label:	NGS barcodes
type:	`basic:file`

annotation

label:	Barcode mapping
type:	`basic:file`

matched

label:	Matched
type:	`basic:string`

notmatched

label:	Not matched
type:	`basic:string`

badquality

label:	Bad quality
type:	`basic:string`

skipped

label:	Skipped
type:	`basic:string`

SAM header¶

Upload a mapping file header in SAM format.

src

label:	Header (SAM)
type:	`basic:file`
description:	A mapping file header in SAM format.
validate_regex:	`\.(sam)$`

sam

label:	Uploaded file
type:	`basic:file`

SRA data¶

data:sraimport-sra (basic:string sra_accession, basic:boolean show_advanced, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v0.2.0]

Import single or paired-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

sra_accession

label:	SRA accession
type:	`basic:string`

show_advanced

label:	Show advanced options
type:	`basic:boolean`
default:	`False`

advanced.min_spot_id

label:	Minimum spot ID
type:	`basic:integer`
required:	False

advanced.max_spot_id

label:	Maximum spot ID
type:	`basic:integer`
required:	False

advanced.min_read_len

label:	Minimum read length
type:	`basic:integer`
required:	False

advanced.clip

label:	Clip adapter sequences
type:	`basic:boolean`
default:	`False`

advanced.aligned

label:	Dump only aligned sequences
type:	`basic:boolean`
default:	`False`

advanced.unaligned

label:	Dump only unaligned sequences
type:	`basic:boolean`
default:	`False`

SRA data (paired-end)¶

data:reads:fastq:pairedimport-sra-paired (basic:string sra_accession, basic:boolean show_advanced, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v0.2.0]

Import paired-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

sra_accession

label:	SRA accession
type:	`basic:string`

show_advanced

label:	Show advanced options
type:	`basic:boolean`
default:	`False`

advanced.min_spot_id

label:	Minimum spot ID
type:	`basic:integer`
required:	False

advanced.max_spot_id

label:	Maximum spot ID
type:	`basic:integer`
required:	False

advanced.min_read_len

label:	Minimum read length
type:	`basic:integer`
required:	False

advanced.clip

label:	Clip adapter sequences
type:	`basic:boolean`
default:	`False`

advanced.aligned

label:	Dump only aligned sequences
type:	`basic:boolean`
default:	`False`

advanced.unaligned

label:	Dump only unaligned sequences
type:	`basic:boolean`
default:	`False`

fastq

label:	Reads file (mate 1)
type:	`list:basic:file`

fastq2

label:	Reads file (mate 2)
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC (Upstream)
type:	`list:basic:file:html`

fastqc_url2

label:	Quality control with FastQC (Downstream)
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive (Upstream)
type:	`list:basic:file`

fastqc_archive2

label:	Download FastQC archive (Downstream)
type:	`list:basic:file`

SRA data (single-end)¶

data:reads:fastq:singleimport-sra-single (basic:string sra_accession, basic:boolean show_advanced, basic:integer min_spot_id, basic:integer max_spot_id, basic:integer min_read_len, basic:boolean clip, basic:boolean aligned, basic:boolean unaligned)[Source: v0.2.0]

Import single-end reads from Sequence Read Archive (SRA) via an SRA accession number. SRA stores raw sequencing data and alignment information from high-throughput sequencing platforms.

sra_accession

label:	SRA accession
type:	`basic:string`

show_advanced

label:	Show advanced options
type:	`basic:boolean`
default:	`False`

advanced.min_spot_id

label:	Minimum spot ID
type:	`basic:integer`
required:	False

advanced.max_spot_id

label:	Maximum spot ID
type:	`basic:integer`
required:	False

advanced.min_read_len

label:	Minimum read length
type:	`basic:integer`
required:	False

advanced.clip

label:	Clip adapter sequences
type:	`basic:boolean`
default:	`False`

advanced.aligned

label:	Dump only aligned sequences
type:	`basic:boolean`
default:	`False`

advanced.unaligned

label:	Dump only unaligned sequences
type:	`basic:boolean`
default:	`False`

fastq

label:	Reads file
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive
type:	`list:basic:file`

STAR¶

data:alignment:bam:staralignment-star (data:reads:fastq reads, data:genomeindex:star genome, data:annotation annotation, basic:string exon_name, basic:integer sjdbOverhang, basic:boolean unstranded, basic:boolean noncannonical, basic:boolean chimeric, basic:integer chimSegmentMin, basic:boolean quantmode, basic:boolean singleend, basic:boolean gene_counts, basic:string outFilterType, basic:integer outFilterMultimapNmax, basic:integer outFilterMismatchNmax, basic:decimal outFilterMismatchNoverLmax, basic:integer outFilterScoreMin, basic:decimal outFilterMismatchNoverReadLmax, basic:integer alignSJoverhangMin, basic:integer alignSJDBoverhangMin, basic:integer alignIntronMin, basic:integer alignIntronMax, basic:integer alignMatesGapMax, basic:string alignEndsType, basic:boolean two_pass_mode, basic:string outSAMunmapped, basic:string outSAMattributes, basic:string outSAMattrRGline, basic:string tool_bigwig, basic:integer bin_size_bigwig, basic:boolean star_sort)[Source: v1.10.0]

Spliced Transcripts Alignment to a Reference (STAR) software is based on an alignment algorithm that uses sequential maximum mappable seed search in uncompressed suffix arrays followed by seed clustering and stitching procedure. In addition to unbiased de novo detection of canonical junctions, STAR can discover non-canonical splices and chimeric (fusion) transcripts, and is also capable of mapping full-length RNA sequences. More information can be found in the [STAR manual](http://labshare.cshl.edu/shares/gingeraslab/www-data/dobin/STAR/STAR.posix/doc/STARmanual.pdf) and in the [original paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3530905/).

reads

label:	Reads
type:	`data:reads:fastq`

genome

label:	Indexed reference genome
type:	`data:genomeindex:star`
description:	Genome index prepared by STAR aligner indexing tool.

annotation

label:	Annotation file (GTF/GFF3)
type:	`data:annotation`
description:	Insert known annotations into genome indices at the mapping stage.
required:	False

annotation_options.exon_name

label:	–sjdbGTFfeatureExon
type:	`basic:string`
description:	Feature type in GTF file to be used as exons for building transcripts
default:	`exon`

annotation_options.sjdbOverhang

label:	Junction length (sjdbOverhang)
type:	`basic:integer`
description:	This parameter specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junction database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.
default:	`100`
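
The rule for the ideal sjdbOverhang value is simple enough to state as code. The function name is hypothetical.

```python
def ideal_sjdb_overhang(read_lengths):
    """Return max(ReadLength) - 1 for the given read lengths."""
    return max(read_lengths) - 1


print(ideal_sjdb_overhang([100, 100]))     # 2x100b paired-end reads
print(ideal_sjdb_overhang([50, 75, 101]))  # reads of varying length
```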

unstranded

label:	The data is unstranded
type:	`basic:boolean`
description:	For unstranded RNA-seq data, Cufflinks/Cuffdiff require spliced alignments with the XS strand attribute, which STAR will generate with the –outSAMstrandField intronMotif option. As required, the XS strand attribute will be generated for all alignments that contain splice junctions. The spliced alignments that have undefined strand (i.e. containing only non-canonical unannotated junctions) will be suppressed. If you have stranded RNA-seq data, you do not need to use any specific STAR options. Instead, you need to run Cufflinks with the –library-type option. For example, cufflinks –library-type fr-firststrand should be used for the standard dUTP protocol, including Illumina’s stranded Tru-Seq. This option has to be used only for Cufflinks runs and not for STAR runs.
default:	`False`

noncannonical

label:	Remove non-canonical junctions (Cufflinks compatibility)
type:	`basic:boolean`
description:	It is recommended to remove the non-canonical junctions for Cufflinks runs using –outFilterIntronMotifs RemoveNoncanonical.
default:	`False`

detect_chimeric.chimeric

label:	Detect chimeric and circular alignments
type:	`basic:boolean`
description:	To switch on detection of chimeric (fusion) alignments (in addition to normal mapping), –chimSegmentMin should be set to a positive value. Each chimeric alignment consists of two “segments”. Each segment is non-chimeric on its own, but the segments are chimeric to each other (i.e. the segments belong to different chromosomes, or different strands, or are far from each other). Both segments may contain splice junctions, and one of the segments may contain portions of both mates. –chimSegmentMin parameter controls the minimum mapped length of the two segments that is allowed. For example, if you have 2x75 reads and used –chimSegmentMin 20, a chimeric alignment with 130b on one chromosome and 20b on the other will be output, while 135 + 15 won’t be.
default:	`False`

detect_chimeric.chimSegmentMin

label:	–chimSegmentMin
type:	`basic:integer`
disabled:	detect_chimeric.chimeric != true
default:	`20`
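
The 2x75 example from the chimeric detection description can be checked with a small predicate. This only illustrates the minimum-segment-length rule; the function name is hypothetical and STAR applies further criteria that are not modelled here.

```python
def passes_chim_segment_min(segment_lengths, chim_segment_min=20):
    """Return True if every chimeric segment reaches the minimum mapped length.

    A non-positive chim_segment_min means chimeric detection is switched off.
    """
    if chim_segment_min <= 0:
        return False
    return min(segment_lengths) >= chim_segment_min


print(passes_chim_segment_min((130, 20)))  # 130b + 20b of a 2x75 read pair
print(passes_chim_segment_min((135, 15)))  # 15b segment is too short
```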

t_coordinates.quantmode

label:	Output in transcript coordinates
type:	`basic:boolean`
description:	With –quantMode TranscriptomeSAM option STAR will output alignments translated into transcript coordinates in the Aligned.toTranscriptome.out.bam file (in addition to alignments in genomic coordinates in Aligned.*.sam/bam files). These transcriptomic alignments can be used with various transcript quantification software that require reads to be mapped to transcriptome, such as RSEM or eXpress.
default:	`False`

t_coordinates.singleend

label:	Allow soft-clipping and indels
type:	`basic:boolean`
description:	By default, the output satisfies RSEM requirements: soft-clipping or indels are not allowed. Use –quantTranscriptomeBan Singleend to allow insertions, deletions and soft-clips in the transcriptomic alignments, which can be used by some expression quantification software (e.g. eXpress).
disabled:	t_coordinates.quantmode != true
default:	`False`

t_coordinates.gene_counts

label:	Count reads
type:	`basic:boolean`
description:	With the –quantMode GeneCounts option, STAR will count the number of reads per gene while mapping. A read is counted if it overlaps (1nt or more) one and only one gene. Both ends of the paired-end read are checked for overlaps. The counts coincide with those produced by htseq-count with default parameters. The counts are written to the ReadsPerGene.out.tab file, whose 4 columns correspond to different strandedness options: column 1: gene ID; column 2: counts for unstranded RNA-seq; column 3: counts for the 1st read strand aligned with RNA (htseq-count option -s yes); column 4: counts for the 2nd read strand aligned with RNA (htseq-count option -s reverse).
disabled:	t_coordinates.quantmode != true
default:	`False`
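
A minimal sketch of reading the four-column ReadsPerGene.out.tab layout described above. The strandedness keys (`unstranded`, `forward`, `reverse`) and the function name are hypothetical labels chosen for this example. Skipping rows whose first field starts with `N_` assumes the summary rows (such as `N_unmapped`) that STAR writes at the top of the file.

```python
import csv
import io

# Zero-based column index for each strandedness option (hypothetical keys).
COLUMNS = {"unstranded": 1, "forward": 2, "reverse": 3}


def read_gene_counts(handle, strandedness="unstranded"):
    """Return {gene_id: count} for the chosen strandedness column."""
    column = COLUMNS[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip empty lines and summary rows such as N_unmapped.
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[column])
    return counts


# Made-up example content, not real data.
example = io.StringIO(
    "N_unmapped\t10\t10\t10\n"
    "geneA\t7\t1\t6\n"
    "geneB\t3\t3\t0\n"
)
print(read_gene_counts(example, "reverse"))  # {'geneA': 6, 'geneB': 0}
```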

filtering.outFilterType

label:	Type of filtering
type:	`basic:string`
description:	Normal: standard filtering using only current alignment; BySJout: keep only those reads that contain junctions that passed filtering into SJ.out.tab
default:	`Normal`
choices:	Normal: `Normal` BySJout: `BySJout`

filtering.outFilterMultimapNmax

label:	--outFilterMultimapNmax
type:	`basic:integer`
description:	Read alignments will be output only if the read maps to no more loci than this value; otherwise no alignments will be output (default: 10).
required:	False

filtering.outFilterMismatchNmax

label:	--outFilterMismatchNmax
type:	`basic:integer`
description:	Alignment will be output only if it has no more mismatches than this value (default: 10).
required:	False

filtering.outFilterMismatchNoverLmax

label:	--outFilterMismatchNoverLmax
type:	`basic:decimal`
description:	Maximum number of mismatches per pair relative to read length: for 2x100b and a value of 0.04, the maximum number of mismatches is 0.04*200=8 for the paired read.
required:	False
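
The limit scales with the combined length of both mates. A minimal sketch of that arithmetic, assuming the product is truncated to a whole number of mismatches (the helper `max_mismatches` is hypothetical and not part of STAR):

```python
import math


def max_mismatches(ratio, read_length, paired=True):
    """Mismatch limit for a read (or read pair) given a ratio of its length."""
    total_length = read_length * (2 if paired else 1)
    # The small epsilon guards against floating-point results such as 7.999999.
    return math.floor(ratio * total_length + 1e-9)


print(max_mismatches(0.04, 100))                # 8 for 2x100b reads
print(max_mismatches(0.04, 100, paired=False))  # 4 for single-end 100b reads
```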

filtering.outFilterScoreMin

label:	--outFilterScoreMin
type:	`basic:integer`
description:	Alignment will be output only if its score is higher than or equal to this value (default: 0).
required:	False

filtering.outFilterMismatchNoverReadLmax

label:	--outFilterMismatchNoverReadLmax
type:	`basic:decimal`
description:	Alignment will be output only if its ratio of mismatches to read length is less than or equal to this value (default: 1.0).
required:	False

alignment.alignSJoverhangMin

label:	--alignSJoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for spliced alignments (default: 5).
required:	False

alignment.alignSJDBoverhangMin

label:	--alignSJDBoverhangMin
type:	`basic:integer`
description:	Minimum overhang (i.e. block size) for annotated (sjdb) spliced alignments (default: 3).
required:	False

alignment.alignIntronMin

label:	--alignIntronMin
type:	`basic:integer`
description:	Minimum intron size: genomic gap is considered intron if its length >= alignIntronMin, otherwise it is considered Deletion (default: 21).
required:	False

alignment.alignIntronMax

label:	--alignIntronMax
type:	`basic:integer`
description:	Maximum intron size. If 0, the maximum intron size will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False
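
A worked sketch of the fallback used when the value is 0. The defaults of 16 for winBinNbits and 9 for winAnchorDistNbins are assumptions taken from the STAR manual and should be checked against the STAR version in use; the function name is hypothetical.

```python
def effective_max_intron(align_intron_max, win_bin_nbits=16, win_anchor_dist_nbins=9):
    """Return the intron size limit implied by --alignIntronMax.

    A positive value is used as given; 0 falls back to
    (2^winBinNbits) * winAnchorDistNbins.
    """
    if align_intron_max > 0:
        return align_intron_max
    return (2 ** win_bin_nbits) * win_anchor_dist_nbins


print(effective_max_intron(0))        # 589824 with the assumed defaults
print(effective_max_intron(1000000))  # 1000000
```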

alignment.alignMatesGapMax

label:	--alignMatesGapMax
type:	`basic:integer`
description:	Maximum gap between two mates. If 0, the maximum gap will be determined by (2^winBinNbits)*winAnchorDistNbins (default: 0).
required:	False

alignment.alignEndsType

label:	--alignEndsType
type:	`basic:string`
description:	Type of read ends alignment (default: Local).
required:	False
default:	`Local`
choices:	Local: `Local` EndToEnd: `EndToEnd` Extend5pOfRead1: `Extend5pOfRead1` Extend5pOfReads12: `Extend5pOfReads12`

two_pass_mapping.two_pass_mode

label:	--twopassMode
type:	`basic:boolean`
description:	Perform first-pass mapping, extract junctions, insert them into genome index, and re-map all reads in the second mapping pass.
default:	`False`

output_sam_bam.outSAMunmapped

label:	--outSAMunmapped
type:	`basic:string`
description:	Output of unmapped reads in the SAM format.
required:	False
default:	`None`
choices:	None: `None` Within: `Within`

output_sam_bam.outSAMattributes

label:	--outSAMattributes
type:	`basic:string`
description:	A string of desired SAM attributes, in the order desired for the output SAM.
required:	False
default:	`Standard`
choices:	Standard: `Standard` All: `All` NH HI NM MD: `NH HI NM MD` None: `None`

output_sam_bam.outSAMattrRGline

label:	--outSAMattrRGline
type:	`basic:string`
description:	SAM/BAM read group line. The first word contains the read group identifier and must start with “ID:”, e.g. --outSAMattrRGline ID:xxx CN:yy “DS:z z z”
required:	False

output_sam_bam.tool_bigwig

label:	Tool to calculate BigWig
type:	`basic:string`
description:	Tool to calculate BigWig.
default:	`deeptools`
choices:	deepTools: `deeptools` UCSC BedGraphToBigWig: `bedgraphtobigwig`

output_sam_bam.bin_size_bigwig

label:	Bin Size for the output of BigWig
type:	`basic:integer`
description:	Size of the bins, in bases, for the output of the bigwig. Only possible if ‘Tool to calculate BigWig’ is deepTools. If BigWig is calculated by UCSC BedGraphToBigWig then bin size is 1.
default:	`50`

star_sort

label:	Sorting with STAR
type:	`basic:boolean`
description:	Set to false for sorting with samtools or to true for sorting with STAR which may be time and memory intensive.
default:	`False`

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

unmapped_f

label:	Unmapped reads (mate 1)
type:	`basic:file`
required:	False

unmapped_r

label:	Unmapped reads (mate 2)
type:	`basic:file`
required:	False

sj

label:	Splice junctions
type:	`basic:file`

chimeric

label:	Chimeric alignments
type:	`basic:file`
required:	False

alignment_transcriptome

label:	Alignment (transcriptome coordinates)
type:	`basic:file`
required:	False

gene_counts

label:	Gene counts
type:	`basic:file`
required:	False

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

STAR genome index¶

data:genomeindex:staralignment-star-index (data:genome:fasta genome, data:seq:nucleotide genome2, data:annotation annotation, basic:string exon_name, basic:integer sjdbOverhang, basic:integer genomeSAindexNbases, basic:integer genomeChrBinNbits, basic:integer genomeSAsparseD)[Source: v1.6.0]

Generate genome index files from the supplied reference genome sequence and GTF files.

genome

label:	Reference genome (indexed)
type:	`data:genome:fasta`
required:	False

genome2

label:	Reference genome (nucleotide sequence)
type:	`data:seq:nucleotide`
required:	False

annotation

label:	Annotation file (GTF/GFF3)
type:	`data:annotation`
required:	False

annotation_options.exon_name

label:	--sjdbGTFfeatureExon
type:	`basic:string`
description:	Feature type in GTF file to be used as exons for building transcripts.
default:	`exon`

annotation_options.sjdbOverhang

label:	Junction length (sjdbOverhang)
type:	`basic:integer`
description:	This parameter specifies the length of the genomic sequence around the annotated junction to be used in constructing the splice junction database. Ideally, this length should be equal to the ReadLength-1, where ReadLength is the length of the reads. For instance, for Illumina 2x100b paired-end reads, the ideal value is 100-1=99. In case of reads of varying length, the ideal value is max(ReadLength)-1. In most cases, the default value of 100 will work as well as the ideal value.
default:	`100`
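
The rule for the ideal value can be written as a one-line helper. This is only an illustration of the arithmetic in the description; `ideal_sjdb_overhang` is a hypothetical name.

```python
def ideal_sjdb_overhang(read_lengths):
    """Ideal sjdbOverhang: the longest read length minus one."""
    return max(read_lengths) - 1


print(ideal_sjdb_overhang([100, 100]))      # 99 for Illumina 2x100b reads
print(ideal_sjdb_overhang([75, 101, 151]))  # 150 for reads of varying length
```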

advanced.genomeSAindexNbases

label:	Small genome adjustment
type:	`basic:integer`
description:	For small genomes, the parameter --genomeSAindexNbases needs to be scaled down, with a typical value of min(14, log2(GenomeLength)/2 - 1). For example, for a 1 megaBase genome, this is equal to 9; for a 100 kiloBase genome, this is equal to 7.
required:	False
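
The scaling rule can be checked with a short calculation. Rounding to the nearest integer is an assumption made here so that the result reproduces the two examples in the description; the function name is hypothetical.

```python
import math


def genome_sa_index_nbases(genome_length):
    """Typical --genomeSAindexNbases: min(14, log2(GenomeLength)/2 - 1)."""
    return min(14, round(math.log2(genome_length) / 2 - 1))


print(genome_sa_index_nbases(1_000_000))      # 9  (1 megaBase genome)
print(genome_sa_index_nbases(100_000))        # 7  (100 kiloBase genome)
print(genome_sa_index_nbases(3_000_000_000))  # 14 (capped for large genomes)
```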

advanced.genomeChrBinNbits

label:	Large number of references adjustment
type:	`basic:integer`
description:	If you are using a genome with a large (>5,000) number of references (chromosomes/scaffolds), you may need to reduce --genomeChrBinNbits to reduce RAM consumption. The following scaling is recommended: --genomeChrBinNbits = min(18, log2(GenomeLength / NumberOfReferences)). For example, for a 3 gigaBase genome with 100,000 chromosomes/scaffolds, this is equal to 15.
required:	False
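
A short calculation of the recommended scaling. Rounding to the nearest integer is an assumption made here so that the result reproduces the example in the description; the function name is hypothetical.

```python
import math


def genome_chr_bin_nbits(genome_length, number_of_references):
    """Recommended --genomeChrBinNbits: min(18, log2(length / references))."""
    return min(18, round(math.log2(genome_length / number_of_references)))


# 3 gigaBase genome split into 100,000 chromosomes/scaffolds.
print(genome_chr_bin_nbits(3_000_000_000, 100_000))  # 15
# The same genome with only 25 references hits the cap of 18.
print(genome_chr_bin_nbits(3_000_000_000, 25))       # 18
```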

advanced.genomeSAsparseD

label:	Suffix array sparsity
type:	`basic:integer`
description:	Suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction (integer > 0, default = 1).
required:	False

index

label:	Indexed genome
type:	`basic:dir`

source

label:	Gene ID source
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Salmon Index¶

data:index:salmonsalmon-index (data:seq:nucleotide nucl, data:file decoys, basic:boolean gencode, basic:boolean keep_duplicates, basic:boolean perfect_hash, basic:string source, basic:string species, basic:string build, basic:integer kmerlen)[Source: v1.1.0]

Generate index files for Salmon transcript quantification tool.

nucl

label:	Nucleotide sequence
type:	`data:seq:nucleotide`
description:	A CDS sequence file in .FASTA format.

decoys

label:	Decoys
type:	`data:file`
description:	Treat these sequences as decoys that may have sequence homologous to some known transcript.
required:	False

gencode

label:	Gencode
type:	`basic:boolean`
description:	This flag will expect the input transcript FASTA to be in GENCODE format, and will split the transcript name at the first ‘|’ character. These reduced names will be used in the output and when looking for these transcripts in a gene to transcript GTF.
default:	`False`
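
The name reduction described for GENCODE-formatted input can be illustrated as follows. The header below is an illustrative example of a pipe-delimited transcript name, and `reduce_gencode_name` is a hypothetical helper, not Salmon's own code.

```python
def reduce_gencode_name(header):
    """Keep only the part of a FASTA header before the first '|'."""
    return header.lstrip(">").split("|", 1)[0]


# Illustrative header in a GENCODE-like, pipe-delimited layout.
header = ">ENST00000456328.2|ENSG00000223972.5|DDX11L1-202|DDX11L1"
print(reduce_gencode_name(header))  # ENST00000456328.2
```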

keep_duplicates

label:	Keep duplicates
type:	`basic:boolean`
description:	This flag will disable the default indexing behavior of discarding sequence-identical duplicate transcripts. If this flag is passed, then duplicate transcripts that appear in the input will be retained and quantified separately.
default:	`False`

perfect_hash

label:	Perfect hash
type:	`basic:boolean`
description:	Build the index using a perfect hash rather than a dense hash. This will require less memory (especially during quantification), but will take longer to construct.
default:	`False`

source

label:	Source of attribute ID
type:	`basic:string`
choices:	DICTYBASE: `DICTYBASE` ENSEMBL: `ENSEMBL` NCBI: `NCBI` UCSC: `UCSC`

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum`

build

label:	Genome build
type:	`basic:string`

kmerlen

label:	Size of k-mers
type:	`basic:integer`
description:	The size of k-mers that should be used for the quasi index. We find that a k of 31 seems to work well for reads of 75bp or longer, but you might consider a smaller k if you plan to deal with shorter reads.
default:	`31`

index

label:	Salmon index
type:	`basic:dir`

source

label:	Source of attribute ID
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Secondary hybrid BAM file¶

data:alignment:bam:secondaryupload-bam-secondary (data:alignment:bam bam, basic:file src, basic:string species, basic:string build)[Source: v0.6.0]

Upload a secondary mapping file in BAM format.

bam

label:	Hybrid bam
type:	`data:alignment:bam`
description:	The secondary BAM will be added to the same sample as the hybrid BAM.
required:	False

src

label:	Mapping (BAM)
type:	`basic:file`
description:	A mapping file in BAM format. The file will be indexed on upload, so additional BAI files are not required.
validate_regex:	`\.(bam)$`
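
A quick way to see which file names the `validate_regex` pattern accepts. This sketch assumes the pattern is applied as a search against the file name; how the platform applies it internally is not stated here, and `is_valid_upload` is a hypothetical helper.

```python
import re

# Pattern copied from the validate_regex field above.
BAM_PATTERN = re.compile(r"\.(bam)$")


def is_valid_upload(filename):
    """Return True if the file name ends with a lowercase .bam extension."""
    return BAM_PATTERN.search(filename) is not None


print(is_valid_upload("sample.bam"))      # True
print(is_valid_upload("sample.bam.bai"))  # False
print(is_valid_upload("sample.BAM"))      # False (the pattern is case-sensitive)
```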

species

label:	Species
type:	`basic:string`
description:	Species Latin name.
choices:	Drosophila melanogaster: `Drosophila melanogaster` Mus musculus: `Mus musculus`

build

label:	Build
type:	`basic:string`

bam

label:	Uploaded file
type:	`basic:file`

bai

label:	Index BAI
type:	`basic:file`

stats

label:	Alignment statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Slamdunk analysis (paired-end)¶

data:workflow:slamdunkworkflow-slamdunk-paired (data:reads:fastq:paired reads, data:seq:nucleotide ref_seq, data:bed regions, basic:boolean show_advanced, basic:string source, basic:boolean filter_multimappers, basic:integer max_alignments, basic:integer read_length)[Source: v1.1.0]

Slamdunk-based pipeline for the analysis of the SLAM-Seq data. Thiol-linked alkylation for the metabolic sequencing of RNA enables the detection of RNA transcription, processing and decay dynamics in the context of total RNA.

reads

label:	Reads
type:	`data:reads:fastq:paired`
description:	Paired-end sequencing reads in FASTQ format.

ref_seq

label:	Reference sequence (FASTA)
type:	`data:seq:nucleotide`

regions

label:	Regions of interest (BED)
type:	`data:bed`

show_advanced

label:	Show advanced parameters
type:	`basic:boolean`
default:	`False`

options.source

label:	Gene ID database source
type:	`basic:string`
default:	`ENSEMBL`
choices:	ENSEMBL: `ENSEMBL` UCSC: `UCSC`

options.filter_multimappers

label:	Filter multimappers
type:	`basic:boolean`
description:	If true, filter and reassign multimappers based on the provided BED file with regions of interest.
default:	`True`

options.max_alignments

label:	Maximum number of multimapper alignments
type:	`basic:integer`
description:	The maximum number of alignments that will be reported for a multi-mapping read (i.e. reads with multiple alignments of equal best scores).
default:	`1`

options.read_length

label:	Maximum read length
type:	`basic:integer`
description:	Maximum length of reads in the input FASTQ file.
default:	`150`

Spike-ins quality control¶

data:spikeinsspikein-qc (list:data:expression samples, basic:string mix)[Source: v1.1.0]

Plot measured spike-in abundances for sample quality control. The process will output graphs showing the correlation between the known concentration of ERCC spike-ins and each sample’s measured abundance.

samples

label:	Expressions with spike-ins
type:	`list:data:expression`

mix

label:	Spike-ins mix
type:	`basic:string`
description:	Select spike-ins mix.
choices:	ERCC Mix 1: `ercc_mix1` ERCC Mix 2: `ercc_mix2` SIRV-Set 3: `sirv_set3`

plots

label:	Plot figures
type:	`list:basic:file`
required:	False

report

label:	HTML report with results
type:	`basic:file:html`
required:	False
hidden:	True

report_zip

label:	ZIP file containing HTML report with results
type:	`basic:file`
required:	False

Subread¶

data:alignment:bam:subreadalignment-subread (data:genome:fasta genome, data:reads:fastq reads, basic:integer indel, basic:integer consensus, basic:integer mis_matched_bp, basic:integer cpu_number, basic:boolean multi_mapping, basic:string reads_orientation, basic:integer consensus_subreads)[Source: v2.2.0]

Subread is an accurate and efficient general-purpose read aligner which can align both genomic DNA-seq and RNA-seq reads. It can also be used to discover genomic mutations including short indels and structural variants. See [here](http://subread.sourceforge.net/) and a paper by [Liao and colleagues](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3664803/) (2013) for more information.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

options.indel

label:	Number of INDEL bases
type:	`basic:integer`
description:	Specify the number of INDEL bases allowed in the mapping.
required:	False
default:	`5`

options.consensus

label:	Consensus threshold
type:	`basic:integer`
description:	Specify the consensus threshold, which is the minimal number of consensus subreads required for reporting a hit.
required:	False
default:	`3`

options.mis_matched_bp

label:	Max number of mis-matched bases
type:	`basic:integer`
description:	Specify the maximum number of mis-matched bases allowed in the alignment.
required:	False
default:	`3`

options.cpu_number

label:	Number of threads/CPUs
type:	`basic:integer`
description:	Specify the number of threads/CPUs used for mapping.
required:	False
default:	`1`

options.multi_mapping

label:	Report multi-mapping reads in addition to uniquely mapped reads.
type:	`basic:boolean`
description:	Reads that were found to have more than one best mapping location are going to be reported.
required:	False

PE_options.reads_orientation

label:	Reads orientation
type:	`basic:string`
description:	Specify the orientation of the two reads from the same pair.
required:	False
default:	`fr`
choices:	ff: `ff` fr: `fr` rf: `rf`

PE_options.consensus_subreads

label:	Minimum number of consensus subreads
type:	`basic:integer`
description:	Specify the minimum number of consensus subreads both reads from the same pair must have.
required:	False
default:	`1`

bam

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

bai

label:	Index BAI
type:	`basic:file`

unmapped

label:	Unmapped reads
type:	`basic:file`
required:	False

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	BigWig file
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Subsample FASTQ (paired-end)¶

data:reads:fastq:paired:seqtkseqtk-sample-paired (data:reads:fastq:paired reads, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.1.0]

[Seqtk](https://github.com/lh3/seqtk) is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The Seqtk “sample” command enables subsampling of large FASTQ file(s).

reads

label:	Reads
type:	`data:reads:fastq:paired`

n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

advanced.seed

label:	Seed
type:	`basic:integer`
default:	`11`

advanced.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False
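
The interaction between “Number of reads”, “Fraction”, seed and 2-pass mode can be sketched as a command builder. This is an assumption about how the inputs map onto `seqtk sample` (its `-s` seed and `-2` two-pass options), not the process's actual implementation; `seqtk_sample_command` is a hypothetical helper.

```python
def seqtk_sample_command(fastq, n_reads=1000000, seed=11, fraction=None, two_pass=False):
    """Build a `seqtk sample` argument list from the process inputs."""
    # A fraction, when given, overrides the absolute number of reads.
    amount = fraction if fraction is not None else n_reads
    command = ["seqtk", "sample", "-s", str(seed)]
    if two_pass:
        command.append("-2")  # slower, but uses less memory
    command += [fastq, str(amount)]
    return command


print(seqtk_sample_command("mate1.fastq.gz", fraction=0.1))
# ['seqtk', 'sample', '-s', '11', 'mate1.fastq.gz', '0.1']
```

For paired-end data, running the command on each mate file with the same seed keeps the mates paired.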

advanced.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`

fastq

label:	Remaining mate 1 reads
type:	`list:basic:file`

fastq2

label:	Remaining mate 2 reads
type:	`list:basic:file`

fastqc_url

label:	Mate 1 quality control with FastQC
type:	`list:basic:file:html`

fastqc_url2

label:	Mate 2 quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download mate 1 FastQC archive
type:	`list:basic:file`

fastqc_archive2

label:	Download mate 2 FastQC archive
type:	`list:basic:file`

Subsample FASTQ (single-end)¶

data:reads:fastq:single:seqtkseqtk-sample-single (data:reads:fastq:single reads, basic:integer n_reads, basic:integer seed, basic:decimal fraction, basic:boolean two_pass)[Source: v1.1.0]

[Seqtk](https://github.com/lh3/seqtk) is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. The Seqtk “sample” command enables subsampling of large FASTQ file(s).

reads

label:	Reads
type:	`data:reads:fastq:single`

n_reads

label:	Number of reads
type:	`basic:integer`
default:	`1000000`

advanced.seed

label:	Seed
type:	`basic:integer`
default:	`11`

advanced.fraction

label:	Fraction
type:	`basic:decimal`
description:	Use the fraction of reads [0 - 1.0] from the original input file instead of the absolute number of reads. If set, this will override the “Number of reads” input parameter.
required:	False

advanced.two_pass

label:	2-pass mode
type:	`basic:boolean`
description:	Enable two-pass mode when down-sampling. Two-pass mode is twice as slow but uses much less memory.
default:	`False`

fastq

label:	Remaining reads
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive
type:	`list:basic:file`

Test basic fields¶

data:test:fieldstest-basic-fields (basic:boolean boolean, basic:date date, basic:datetime datetime, basic:decimal decimal, basic:integer integer, basic:string string, basic:text text, basic:url:download url_download, basic:url:view url_view, basic:string string2, basic:string string3, basic:string string4, basic:string string5, basic:string string6, basic:string string7, basic:string tricky2)[Source: v1.1.1]

Test with all basic input fields whose values are printed by the processor and returned unmodified as output fields.

boolean

label:	Boolean
type:	`basic:boolean`
default:	`True`

date

label:	Date
type:	`basic:date`
default:	`2013-12-31`

datetime

label:	Date and time
type:	`basic:datetime`
default:	`2013-12-31 23:59:59`

decimal

label:	Decimal
type:	`basic:decimal`
default:	`-123.456`

integer

label:	Integer
type:	`basic:integer`
default:	`-123`

string

label:	String
type:	`basic:string`
default:	`Foo b-a-r.gz 1.23`

text

label:	Text
type:	`basic:text`
default:	`Foo bar in 3 lines.`

url_download

label:	URL download
type:	`basic:url:download`
default:	`{'url': 'http://www.w3.org/TR/1998/REC-html40-19980424/html40.pdf'}`

url_view

label:	URL view
type:	`basic:url:view`
default:	`{'name': 'Something', 'url': 'http://www.something.com/'}`

group.string2

label:	String 2 required
type:	`basic:string`
description:	String 2 description.
required:	True
disabled:	false
hidden:	false
placeholder:	`Enter string`

group.string3

label:	String 3 disabled
type:	`basic:string`
description:	String 3 description.
disabled:	true
default:	`disabled`

group.string4

label:	String 4 hidden
type:	`basic:string`
description:	String 4 description.
hidden:	True
default:	`hidden`

group.string5

label:	String 5 choices
type:	`basic:string`
description:	String 5 description.
hidden:	False
default:	`choice_2`
choices:	Choice 1: `choice_1` Choice 2: `choice_2` Choice 3: `choice_3`

group.string6

label:	String 6 regex only “Aa”
type:	`basic:string`
default:	`AAaAaaa`
validate_regex:	`^[aA]*$`

group.string7

label:	String 7 optional choices
type:	`basic:string`
description:	String 7 description.
required:	False
hidden:	False
default:	`choice_2`
choices:	Choice 1: `choice_1` Choice 2: `choice_2` Choice 3: `choice_3`

tricky.tricky1.tricky2

label:	Tricky 2
type:	`basic:string`
default:	`true`

output

label:	Result
type:	`basic:url:view`

out_boolean

label:	Boolean
type:	`basic:boolean`

out_date

label:	Date
type:	`basic:date`

out_datetime

label:	Date and time
type:	`basic:datetime`

out_decimal

label:	Decimal
type:	`basic:decimal`

out_integer

label:	Integer
type:	`basic:integer`

out_string

label:	String
type:	`basic:string`

out_text

label:	Text
type:	`basic:text`

out_url_download

label:	URL download
type:	`basic:url:download`

out_url_view

label:	URL view
type:	`basic:url:view`

out_group.string2

label:	String 2 required
type:	`basic:string`
description:	String 2 description.

out_group.string3

label:	String 3 disabled
type:	`basic:string`
description:	String 3 description.

out_group.string4

label:	String 4 hidden
type:	`basic:string`
description:	String 4 description.

out_group.string5

label:	String 5 choices
type:	`basic:string`
description:	String 5 description.

out_group.string6

label:	String 6 regex only “Aa”
type:	`basic:string`

out_group.string7

label:	String 7 optional choices
type:	`basic:string`

out_tricky.tricky1.tricky2

label:	Tricky 2
type:	`basic:string`

Test disabled inputs¶

data:test:disabledtest-disabled (basic:boolean broad, basic:integer broad_width, basic:string width_label, basic:integer if_and_condition)[Source: v1.1.1]

Test disabled input fields.

broad

label:	Broad peaks
type:	`basic:boolean`
default:	`False`

broad_width

label:	Width of peaks
type:	`basic:integer`
disabled:	broad === false
default:	`5`

width_label

label:	Width label
type:	`basic:string`
disabled:	broad === false
default:	`FD`

if_and_condition

label:	If width is 5 and label FDR
type:	`basic:integer`
disabled:	broad_width == 5 && width_label == 'FDR'
default:	`5`
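
The three `disabled` expressions in this process can be mirrored in Python to see which fields end up disabled for a given set of values. This is only a re-statement of the expressions for illustration; the real expressions are evaluated by the platform's front end, and `disabled_fields` is a hypothetical helper.

```python
def disabled_fields(broad, broad_width, width_label):
    """Evaluate the `disabled` conditions of the test-disabled process."""
    return {
        "broad_width": broad is False,
        "width_label": broad is False,
        "if_and_condition": broad_width == 5 and width_label == "FDR",
    }


# With the documented defaults (broad=False, broad_width=5, width_label='FD').
print(disabled_fields(False, 5, "FD"))
# {'broad_width': True, 'width_label': True, 'if_and_condition': False}
```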

output

label:	Result
type:	`basic:string`

Test hidden inputs¶

data:test:hiddentest-hidden (basic:boolean broad, basic:integer broad_width, basic:integer parameter1, basic:integer parameter2, basic:integer broad_width2)[Source: v1.1.1]

Test hidden input fields.

broad

label:	Broad peaks
type:	`basic:boolean`
default:	`False`

broad_width

label:	Width of peaks
type:	`basic:integer`
hidden:	broad === false
default:	`5`

parameters_broad_f.parameter1

label:	parameter1
type:	`basic:integer`
default:	`10`

parameters_broad_f.parameter2

label:	parameter2
type:	`basic:integer`
default:	`10`

parameters_broad_t.broad_width2

label:	Width of peaks2
type:	`basic:integer`
default:	`5`

output

label:	Result
type:	`basic:string`

Test select controller¶

data:test:resulttest-list (data:test:result single, list:data:test:result multiple)[Source: v1.1.1]

Test with all basic input fields whose values are printed by the processor and returned unmodified as output fields.

single

label:	Single
type:	`data:test:result`

multiple

label:	Multiple
type:	`list:data:test:result`

output

label:	Result
type:	`basic:string`

Test sleep progress¶

data:test:resulttest-sleep-progress (basic:integer t)[Source: v1.1.1]

Test for the progress bar by sleeping 5 times for the specified amount of time.

t

label:	Sleep time
type:	`basic:integer`
default:	`5`
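
A minimal sketch of the behaviour described above: sleep five times and report progress after each step. The `report` callback stands in for whatever mechanism the platform uses to update the progress bar, which is not specified here; all names are hypothetical.

```python
import time


def sleep_with_progress(t, steps=5, report=print):
    """Sleep `steps` times for `t` seconds, reporting progress in (0, 1]."""
    for step in range(1, steps + 1):
        time.sleep(t)
        report(step / steps)


# Use t=0 so the example finishes immediately.
sleep_with_progress(0)
```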

output

label:	Result
type:	`basic:string`

Trimmomatic (paired-end)¶

data:reads:fastq:paired:trimmomatictrimmomatic-paired (data:reads:fastq:paired reads, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer palindrome_clip_threshold, basic:integer min_adapter_length, basic:boolean keep_both_reads, basic:integer window_size, basic:integer required_quality, basic:integer target_length, basic:decimal strictness, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer average_quality)[Source: v2.2.0]

Trimmomatic performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end data. FastQC is performed for quality control checks on trimmed raw sequence data, which are the output of Trimmomatic. See [Trimmomatic official website](http://www.usadellab.org/cms/?page=trimmomatic), the [introductory paper](https://www.ncbi.nlm.nih.gov/pubmed/24695404), and the [FastQC official website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more information.

reads

label:	Reads
type:	`data:reads:fastq:paired`

illuminaclip.adapters

label:	Adapter sequences
type:	`data:seq:nucleotide`
description:	Adapter sequence in FASTA format that will be removed from the read. This field as well as ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
required:	False

illuminaclip.seed_mismatches

label:	Seed mismatches
type:	`basic:integer`
description:	Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip.
required:	False
disabled:	!illuminaclip.adapters

illuminaclip.simple_clip_threshold

label:	Simple clip threshold
type:	`basic:integer`
description:	Specifies how accurate the match between any adapter (or similar) sequence and a read must be. This field as well as ‘Adapter sequences’, ‘Seed mismatches’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclip.
required:	False
disabled:	!illuminaclip.adapters

illuminaclip.palindrome_clip_threshold

label:	Palindrome clip threshold
type:	`basic:integer`
description:	Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field as well as ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclip.
required:	False
disabled:	!illuminaclip.adapters

illuminaclip.min_adapter_length

label:	Minimum adapter length
type:	`basic:integer`
description:	In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. This field is optional for performing Illuminaclip. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ are also needed in order to use this parameter.
disabled:	!illuminaclip.seed_mismatches && !illuminaclip.simple_clip_threshold && !illuminaclip.palindrome_clip_threshold
default:	`8`

illuminaclip.keep_both_reads

label:	Keep both reads
type:	`basic:boolean`
description:	After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads. This field is optional for performing Illuminaclip. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’, ‘Palindrome clip threshold’ and also ‘Minimum adapter length’ are needed in order to use this parameter.
required:	False
disabled:	!illuminaclip.seed_mismatches && !illuminaclip.simple_clip_threshold && !illuminaclip.palindrome_clip_threshold && !illuminaclip.min_adapter_length

slidingwindow.window_size

label:	Window size
type:	`basic:integer`
description:	Specifies the number of bases to average across. This field as well as ‘Required quality’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
required:	False

slidingwindow.required_quality

label:	Required quality
type:	`basic:integer`
description:	Specifies the average quality required. This field as well as ‘Window size’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
required:	False
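
The sliding-window idea (cut once the average quality within the window falls below a threshold) can be sketched as below. This is a simplified model rather than Trimmomatic's exact algorithm: it cuts at the start of the first failing window and, as an assumption, leaves reads shorter than the window untouched. The function name is hypothetical.

```python
def sliding_window_keep_length(qualities, window_size, required_quality):
    """Return how many leading bases survive sliding-window trimming."""
    n = len(qualities)
    if n < window_size:
        return n  # assumption: reads shorter than the window are kept as-is
    for start in range(n - window_size + 1):
        window = qualities[start:start + window_size]
        if sum(window) / window_size < required_quality:
            return start  # cut where the first low-quality window begins
    return n


qualities = [30, 30, 30, 30, 10, 10, 10, 10]
# Window averages: 30, 25, 20, 15, 10 -> the window starting at index 3 fails.
print(sliding_window_keep_length(qualities, window_size=4, required_quality=20))  # 3
```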

maxinfo.target_length

label:	Target length
type:	`basic:integer`
description:	This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. This field as well as ‘Strictness’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
required:	False

maxinfo.strictness

label:	Strictness
type:	`basic:decimal`
description:	This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. This field as well as ‘Target length’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
required:	False

trim_bases.leading

label:	Leading quality
type:	`basic:integer`
description:	Remove low quality bases from the beginning. Specifies the minimum quality required to keep a base.
required:	False

trim_bases.trailing

label:	Trailing quality
type:	`basic:integer`
description:	Remove low quality bases from the end. Specifies the minimum quality required to keep a base.
required:	False

trim_bases.crop

label:	Crop
type:	`basic:integer`
description:	Cut the read to a specified length by removing bases from the end.
required:	False

trim_bases.headcrop

label:	Headcrop
type:	`basic:integer`
description:	Cut the specified number of bases from the start of the read.
required:	False
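
The four base-trimming options can be sketched on a single read as follows. The fixed order used here (leading, trailing, crop, headcrop) is an assumption made for the example, and `trim_bases` is a hypothetical helper rather than Trimmomatic code.

```python
def trim_bases(seq, quals, leading=None, trailing=None, crop=None, headcrop=None):
    """Apply leading/trailing quality trimming, then crop and headcrop."""
    start, end = 0, len(seq)
    if leading is not None:
        # Drop low-quality bases from the beginning of the read.
        while start < end and quals[start] < leading:
            start += 1
    if trailing is not None:
        # Drop low-quality bases from the end of the read.
        while end > start and quals[end - 1] < trailing:
            end -= 1
    seq, quals = seq[start:end], quals[start:end]
    if crop is not None:
        # Keep at most `crop` bases, removing the rest from the end.
        seq, quals = seq[:crop], quals[:crop]
    if headcrop is not None:
        # Remove `headcrop` bases from the start.
        seq, quals = seq[headcrop:], quals[headcrop:]
    return seq, quals


quals = [2, 2, 30, 30, 30, 30, 2, 2]
print(trim_bases("ACGTACGT", quals, leading=3, trailing=3))
# ('GTAC', [30, 30, 30, 30])
print(trim_bases("ACGTACGT", quals, leading=3, trailing=3, crop=3, headcrop=1))
# ('TA', [30, 30])
```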

reads_filtering.minlen

label:	Minimum length
type:	`basic:integer`
description:	Drop the read if it is below a specified length.
required:	False

reads_filtering.average_quality

label:	Average quality
type:	`basic:integer`
description:	Drop the read if the average quality is below the specified level.
required:	False
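
The two read filters can be expressed as a single predicate. This is a sketch of the rules as described above, using a plain arithmetic mean of the quality scores; `keep_read` is a hypothetical helper.

```python
def keep_read(quals, minlen=None, average_quality=None):
    """Return False if the read fails the length or average-quality filter."""
    if minlen is not None and len(quals) < minlen:
        return False
    if average_quality is not None and quals:
        if sum(quals) / len(quals) < average_quality:
            return False
    return True


print(keep_read([30, 30, 30], minlen=5))                   # False: too short
print(keep_read([30, 10, 10, 10, 10], average_quality=20))  # False: mean is 14
print(keep_read([30] * 36, minlen=36, average_quality=20))  # True
```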

fastq

label:	Reads file (mate 1)
type:	`list:basic:file`

fastq_unpaired

label:	Reads file
type:	`basic:file`
required:	False

fastq2

label:	Reads file (mate 2)
type:	`list:basic:file`

fastq2_unpaired

label:	Reads file
type:	`basic:file`
required:	False

fastqc_url

label:	Quality control with FastQC (Upstream)
type:	`list:basic:file:html`

fastqc_url2

label:	Quality control with FastQC (Downstream)
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive (Upstream)
type:	`list:basic:file`

fastqc_archive2

label:	Download FastQC archive (Downstream)
type:	`list:basic:file`

Trimmomatic (single-end)¶

data:reads:fastq:single:trimmomatictrimmomatic-single (data:reads:fastq:single reads, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer window_size, basic:integer required_quality, basic:integer target_length, basic:decimal strictness, basic:integer leading, basic:integer trailing, basic:integer crop, basic:integer headcrop, basic:integer minlen, basic:integer average_quality)[Source: v2.2.0]

Trimmomatic performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end data. FastQC is then run on the trimmed reads output by Trimmomatic for quality control. See the [Trimmomatic official website](http://www.usadellab.org/cms/?page=trimmomatic), the [introductory paper](https://www.ncbi.nlm.nih.gov/pubmed/24695404), and the [FastQC official website](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) for more information.

reads

label:	Reads
type:	`data:reads:fastq:single`

illuminaclip.adapters

label:	Adapter sequences
type:	`data:seq:nucleotide`
description:	Adapter sequence in FASTA format that will be removed from the read. This field as well as the ‘Seed mismatches’ and ‘Simple clip threshold’ parameters are needed to perform Illuminaclipping.
required:	False

illuminaclip.seed_mismatches

label:	Seed mismatches
type:	`basic:integer`
description:	Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as the ‘Adapter sequences’ and ‘Simple clip threshold’ parameters are needed to perform Illuminaclipping.
required:	False
disabled:	!illuminaclip.adapters

illuminaclip.simple_clip_threshold

label:	Simple clip threshold
type:	`basic:integer`
description:	Specifies how accurate the match between any adapter etc. sequence must be against a read. This field as well as the ‘Adapter sequences’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclipping.
required:	False
disabled:	!illuminaclip.adapters

slidingwindow.window_size

label:	Window size
type:	`basic:integer`
description:	Specifies the number of bases to average across. This field as well as ‘Required quality’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
required:	False

slidingwindow.required_quality

label:	Required quality
type:	`basic:integer`
description:	Specifies the average quality required in window size. This field as well as ‘Window size’ are needed to perform a ‘Sliding window’ trimming (cutting once the average quality within the window falls below a threshold).
required:	False
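
The ‘Sliding window’ trimming described above can be sketched as follows. This is a simplified illustration, not Trimmomatic's actual implementation: the function name is hypothetical, qualities are assumed to be already decoded Phred integers, the read is cut at the start of the first window whose average falls below the threshold, and reads shorter than the window are assumed to be left unchanged.

```python
def sliding_window_keep_length(quals, window_size, required_quality):
    """Return how many leading bases to keep (simplified sketch)."""
    if len(quals) < window_size:
        # Assumption: reads shorter than the window are left untouched.
        return len(quals)
    for start in range(len(quals) - window_size + 1):
        window = quals[start:start + window_size]
        # Cut once the average quality within the window drops below the threshold.
        if sum(window) / window_size < required_quality:
            return start
    return len(quals)


quals = [30, 30, 30, 30, 10, 10, 10, 10]
keep = sliding_window_keep_length(quals, window_size=4, required_quality=20)
print(keep)
```

A wider window smooths over isolated low-quality bases, while a narrower one reacts to them more quickly.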

maxinfo.target_length

label:	Target length
type:	`basic:integer`
description:	This specifies the read length which is likely to allow the location of the read within the target sequence to be determined. This field as well as ‘Strictness’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
required:	False

maxinfo.strictness

label:	Strictness
type:	`basic:decimal`
description:	This value, which should be set between 0 and 1, specifies the balance between preserving as much read length as possible vs. removal of incorrect bases. A low value of this parameter (<0.2) favours longer reads, while a high value (>0.8) favours read correctness. This field as well as ‘Target length’ are needed to perform ‘Maxinfo’ feature (an adaptive quality trimmer which balances read length and error rate to maximise the value of each read).
required:	False

trim_bases.leading

label:	Leading quality
type:	`basic:integer`
description:	Remove low quality bases from the beginning, if below a threshold quality.
required:	False

trim_bases.trailing

label:	Trailing quality
type:	`basic:integer`
description:	Remove low quality bases from the end, if below a threshold quality.
required:	False

trim_bases.crop

label:	Crop
type:	`basic:integer`
description:	Cut the read to a specified length by removing bases from the end.
required:	False

trim_bases.headcrop

label:	Headcrop
type:	`basic:integer`
description:	Cut the specified number of bases from the start of the read.
required:	False
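
A minimal sketch of how the four `trim_bases` options act on a single read, assuming decoded Phred integers. The function name and the fixed order (leading, trailing, crop, headcrop) are assumptions made for illustration; the order in which the actual process applies these steps is not stated here.

```python
def trim_read(seq, quals, leading=0, trailing=0, crop=None, headcrop=0):
    """Apply leading/trailing quality trimming, then crop and headcrop."""
    # Leading: drop bases from the start while their quality is below the threshold.
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    # Trailing: drop bases from the end while their quality is below the threshold.
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    seq, quals = seq[start:end], quals[start:end]
    # Crop: cut the read to a fixed length by removing bases from the end.
    if crop is not None:
        seq, quals = seq[:crop], quals[:crop]
    # Headcrop: remove a fixed number of bases from the start.
    return seq[headcrop:], quals[headcrop:]


print(trim_read("ACGTACGT", [2, 2, 30, 30, 30, 30, 5, 5], leading=10, trailing=10))
```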

reads_filtering.minlen

label:	Minimum length
type:	`basic:integer`
description:	Drop the read if it is below a specified length.
required:	False

reads_filtering.average_quality

label:	Average quality
type:	`basic:integer`
description:	Drop the read if the average quality is below the specified level.
required:	False
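
The two `reads_filtering` options can be read as a keep/drop decision per read. The sketch below is illustrative only: `keep_read` is a hypothetical helper, qualities are assumed to be decoded Phred integers, and empty reads are assumed to be dropped.

```python
def keep_read(quals, minlen=None, average_quality=None):
    """Return True if the read passes the length and average-quality filters."""
    if not quals:
        return False  # assumption: an empty read is always dropped
    if minlen is not None and len(quals) < minlen:
        return False
    if average_quality is not None and sum(quals) / len(quals) < average_quality:
        return False
    return True


print(keep_read([30] * 30, minlen=25, average_quality=20))
```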

fastq

label:	Reads file
type:	`list:basic:file`

fastqc_url

label:	Quality control with FastQC
type:	`list:basic:file:html`

fastqc_archive

label:	Download FastQC archive
type:	`list:basic:file`

Trimmomatic - HISAT2 - HTSeq-count (paired-end)¶

data:workflow:rnaseq:htseqworkflow-rnaseq-paired (data:reads:fastq:paired reads, data:genome:fasta genome, data:annotation:gtf annotation, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer palindrome_clip_threshold, basic:integer simple_clip_threshold, basic:integer minlen, basic:integer trailing, basic:string stranded, basic:string id_attribute)[Source: v1.0.1]

This RNA-seq pipeline is comprised of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __Trimmomatic__, which performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end high-throughput sequencing reads. Next, preprocessed reads are aligned by the __HISAT2__ aligner. HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.

reads

label:	Input reads
type:	`data:reads:fastq:paired`

genome

label:	Genome
type:	`data:genome:fasta`

annotation

label:	Annotation (GTF)
type:	`data:annotation:gtf`

adapters

label:	Adapter sequences (FASTA)
type:	`data:seq:nucleotide`
required:	False

illuminaclip.seed_mismatches

label:	Seed mismatches
type:	`basic:integer`
description:	Specifies the maximum mismatch count which will still allow a full match to be performed.
default:	`2`

illuminaclip.palindrome_clip_threshold

label:	Palindrome clip threshold
type:	`basic:integer`
description:	Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment.
default:	`30`

illuminaclip.simple_clip_threshold

label:	Simple clip threshold
type:	`basic:integer`
description:	Specifies how accurate the match between any adapter etc. sequence must be against a read.
default:	`10`

minlen

label:	Min length
type:	`basic:integer`
description:	Drop the read if it is below a specified length.
default:	`10`

trailing

label:	Trailing quality
type:	`basic:integer`
description:	Remove low quality bases from the end. Specifies the minimum quality required to keep a base.
default:	`28`

stranded

label:	Is data from a strand specific assay?
type:	`basic:string`
description:	In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
default:	`no`
choices:	Strand non-specific: `no` Strand-specific forward: `yes` Strand-specific reverse: `reverse`
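
The strand rules in the description above can be expressed as a small decision function. This is a sketch under stated assumptions: the function name and the `"+"`/`"-"` strand encoding are hypothetical, and the read is assumed to already overlap the feature.

```python
def read_counts_for_feature(read_strand, feature_strand, stranded="no",
                            is_second_mate=False):
    """Decide whether an overlapping read is counted for a feature."""
    if stranded == "no":
        # Strand non-specific: the strand is ignored.
        return True
    # The second mate of a pair is expected on the opposite strand,
    # so flip it before comparing with the feature.
    effective = read_strand
    if is_second_mate:
        effective = "-" if read_strand == "+" else "+"
    if stranded == "yes":
        return effective == feature_strand
    if stranded == "reverse":
        return effective != feature_strand
    raise ValueError(f"unknown stranded value: {stranded!r}")


print(read_counts_for_feature("-", "+", stranded="yes", is_second_mate=True))
```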

id_attribute

label:	ID attribute
type:	`basic:string`
description:	GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.
default:	`gene_id`

Trimmomatic - HISAT2 - HTSeq-count (single-end)¶

data:workflow:rnaseq:htseqworkflow-rnaseq-single (data:reads:fastq:single reads, data:genome:fasta genome, data:annotation:gtf annotation, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer minlen, basic:integer trailing, basic:string stranded, basic:string id_attribute)[Source: v1.0.1]

This RNA-seq pipeline is comprised of three steps: preprocessing, alignment, and quantification. First, reads are preprocessed by __Trimmomatic__, which performs a variety of useful trimming tasks including removing adapters for Illumina paired-end and single-end high-throughput sequencing reads. Next, preprocessed reads are aligned by the __HISAT2__ aligner. HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads. For more information see [this comparison of RNA-seq aligners](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5792058/). Finally, aligned reads are summarized to genes by __HTSeq-count__. Compared to featureCounts, HTSeq-count is not as computationally efficient. All three tools in this workflow support parallelization to accelerate the analysis.

reads

label:	Input reads
type:	`data:reads:fastq:single`

genome

label:	Genome
type:	`data:genome:fasta`

annotation

label:	Annotation (GTF)
type:	`data:annotation:gtf`

adapters

label:	Adapter sequences (FASTA)
type:	`data:seq:nucleotide`
required:	False

illuminaclip.seed_mismatches

label:	Seed mismatches
type:	`basic:integer`
description:	Specifies the maximum mismatch count which will still allow a full match to be performed.
default:	`2`

illuminaclip.simple_clip_threshold

label:	Simple clip threshold
type:	`basic:integer`
description:	Specifies how accurate the match between any adapter etc. sequence must be against a read.
default:	`10`

minlen

label:	Minimum length
type:	`basic:integer`
description:	Drop the read if it is below a specified length.
default:	`10`

trailing

label:	Trailing quality
type:	`basic:integer`
description:	Remove low quality bases from the end. Specifies the minimum quality required to keep a base.
default:	`28`

stranded

label:	Is data from a strand specific assay?
type:	`basic:string`
description:	In strand non-specific assay a read is considered overlapping with a feature regardless of whether it is mapped to the same or the opposite strand as the feature. In strand-specific forward assay and single reads, the read has to be mapped to the same strand as the feature. For paired-end reads, the first read has to be on the same strand and the second read on the opposite strand. In strand-specific reverse assay these rules are reversed.
default:	`no`
choices:	Strand non-specific: `no` Strand-specific forward: `yes` Strand-specific reverse: `reverse`

id_attribute

label:	ID attribute
type:	`basic:string`
description:	GFF attribute to be used as feature ID. Several GFF lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table.
default:	`gene_id`

Upload Picard CollectTargetedPcrMetrics¶

data:picard:coverage:uploadupload-picard-pcrmetrics (basic:file target_pcr_metrics, basic:file target_coverage)[Source: v1.1.1]

Upload Picard CollectTargetedPcrMetrics result files.

target_pcr_metrics

label:	Target PCR metrics
type:	`basic:file`

target_coverage

label:	Target coverage
type:	`basic:file`

target_pcr_metrics

label:	Target PCR metrics
type:	`basic:file`

target_coverage

label:	Target coverage
type:	`basic:file`

VCF file¶

data:variants:vcfupload-variants-vcf (basic:file src, basic:string species, basic:string build)[Source: v2.1.1]

Upload variants in VCF format.

src

label:	Variants (VCF)
type:	`basic:file`
description:	Variants in VCF format.
required:	True
validate_regex:	`\.(vcf)(\|\.gz\|\.bz2\|\.tgz\|\.tar\.gz\|\.tar\.bz2\|\.zip\|\.rar\|\.7z)$`
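
To see which file names the `validate_regex` pattern accepts, it can be tried out in Python. This sketch assumes that the `\|` sequences in the rendered pattern above are escapes introduced by the documentation rendering and stand for plain alternation (`|`); the helper name is hypothetical.

```python
import re

# Pattern from `validate_regex`, with `\|` read as plain alternation (assumption).
VCF_NAME = re.compile(
    r"\.(vcf)(|\.gz|\.bz2|\.tgz|\.tar\.gz|\.tar\.bz2|\.zip|\.rar|\.7z)$"
)


def is_valid_vcf_name(name):
    """Return True if the file name ends in .vcf, optionally compressed."""
    return VCF_NAME.search(name) is not None


for name in ["sample.vcf", "sample.vcf.gz", "sample.bcf", "sample.vcf.txt"]:
    print(name, is_valid_vcf_name(name))
```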

species

label:	Species
type:	`basic:string`
description:	Latin name of the species.
choices:	Homo sapiens: `Homo sapiens` Mus musculus: `Mus musculus` Rattus norvegicus: `Rattus norvegicus` Dictyostelium discoideum: `Dictyostelium discoideum` Odocoileus virginianus texanus: `Odocoileus virginianus texanus` Solanum tuberosum: `Solanum tuberosum`

build

label:	Genome build
type:	`basic:string`

vcf

label:	Uploaded file
type:	`basic:file`

tbi

label:	Tabix index
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Variant calling (CheMut)¶

data:variants:vcf:chemutvc-chemut (data:genome:fasta genome, list:data:alignment:bam parental_strains, list:data:alignment:bam mutant_strains, basic:boolean br_and_ind_ra, basic:boolean dbsnp, data:variants:vcf known_sites, list:data:variants:vcf known_indels, basic:string PL, basic:string LB, basic:string PU, basic:string CN, basic:date DT, basic:integer stand_emit_conf, basic:integer stand_call_conf, basic:integer ploidy, basic:string glm, list:basic:string intervals, basic:boolean rf)[Source: v1.2.2]

CheMut variant calling using multiple BAM input files. Note: Usage of Genome Analysis Toolkit requires a licence.

genome

label:	Reference genome
type:	`data:genome:fasta`

parental_strains

label:	Parental strains
type:	`list:data:alignment:bam`

mutant_strains

label:	Mutant strains
type:	`list:data:alignment:bam`

br_and_ind_ra

label:	Do variant base recalibration and indel realignment
type:	`basic:boolean`
default:	`False`

dbsnp

label:	Use dbSNP file
type:	`basic:boolean`
description:	rsIDs from this file are used to populate the ID column of the output. Also, the DB INFO flag will be set when appropriate. dbSNP is not used in any way for the calculations themselves.
default:	`False`

known_sites

label:	Known sites (dbSNP)
type:	`data:variants:vcf`
required:	False
hidden:	br_and_ind_ra === false && dbsnp === false

known_indels

label:	Known indels
type:	`list:data:variants:vcf`
required:	False
hidden:	br_and_ind_ra === false

reads_info.PL

label:	Platform/technology
type:	`basic:string`
description:	Platform/technology used to produce the reads.
default:	`Illumina`
choices:	Capillary: `Capillary` Ls454: `Ls454` Illumina: `Illumina` SOLiD: `SOLiD` Helicos: `Helicos` IonTorrent: `IonTorrent` Pacbio: `Pacbio`

reads_info.LB

label:	Library
type:	`basic:string`
default:	`x`

reads_info.PU

label:	Platform unit
type:	`basic:string`
description:	Platform unit (e.g. flowcell-barcode.lane for Illumina or slide for SOLiD). Unique identifier.
default:	`x`

reads_info.CN

label:	Sequencing center
type:	`basic:string`
description:	Name of sequencing center producing the read.
default:	`x`

reads_info.DT

label:	Date
type:	`basic:date`
description:	Date the run was produced.
default:	`2017-01-01`

Varc_param.stand_emit_conf

label:	Emission confidence threshold
type:	`basic:integer`
description:	The minimum confidence threshold (phred-scaled) at which the program should emit sites that appear to be possibly variant.
default:	`10`

Varc_param.stand_call_conf

label:	Calling confidence threshold
type:	`basic:integer`
description:	The minimum confidence threshold (phred-scaled) at which the program should emit variant sites as called. If a site’s associated genotype has a confidence score lower than the calling threshold, the program will emit the site as filtered and will annotate it as LowQual. This threshold separates high confidence calls from low confidence calls.
default:	`30`
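
Both thresholds are Phred-scaled, i.e. a confidence Q corresponds to an error probability of 10^(-Q/10). The sketch below converts a score and shows how the two thresholds partition sites, following the descriptions above. The function names and the returned labels are hypothetical; this is not the variant caller's own code.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled confidence to an error probability."""
    return 10 ** (-q / 10)


def classify_site(qual, stand_emit_conf=10, stand_call_conf=30):
    """Classify a site by its Phred-scaled confidence (illustrative labels)."""
    if qual < stand_emit_conf:
        return "not emitted"
    if qual < stand_call_conf:
        return "LowQual"  # emitted, but marked as filtered
    return "called"


print(phred_to_error_prob(30))
print(classify_site(20))
```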

Varc_param.ploidy

label:	Sample ploidy
type:	`basic:integer`
description:	Ploidy (number of chromosomes) per sample. For pooled data, set to (Number of samples in each pool * Sample Ploidy).
default:	`2`

Varc_param.glm

label:	Genotype likelihoods model
type:	`basic:string`
description:	Genotype likelihoods calculation model to employ – SNP is the default option, while INDEL is also available for calling indels and BOTH is available for calling both together.
default:	`SNP`
choices:	SNP: `SNP` INDEL: `INDEL` BOTH: `BOTH`

Varc_param.intervals

label:	Intervals
type:	`list:basic:string`
description:	Use this option to perform the analysis over only part of the genome. This argument can be specified multiple times. You can use samtools-style intervals (e.g. -L chr1 or -L chr1:100-200).
required:	False
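
For illustration, a samtools-style interval string such as `chr1` or `chr1:100-200` can be split into its parts as below. The parser is a hypothetical helper and assumes contig names without colons; it is not how the process itself handles the values.

```python
def parse_interval(text):
    """Split 'contig' or 'contig:start-end' into (contig, start, end)."""
    contig, sep, span = text.partition(":")
    if not sep:
        # Whole-contig interval: no coordinates given.
        return contig, None, None
    start, end = span.split("-")
    return contig, int(start), int(end)


print(parse_interval("chr1:100-200"))
```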

Varc_param.rf

label:	ReassignOneMappingQuality Filter
type:	`basic:boolean`
description:	This read transformer will change a certain read mapping quality to a different value without affecting reads that have other mapping qualities. This is intended primarily for users of RNA-Seq data handling programs such as TopHat, which use MAPQ = 255 to designate uniquely aligned reads. According to convention, 255 normally designates “unknown” quality, and most GATK tools automatically ignore such reads. By reassigning a different mapping quality to those specific reads, users of TopHat and other tools can circumvent this problem without affecting the rest of their dataset.
default:	`False`
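
The effect of this read transformer can be sketched as a simple remapping of one mapping-quality value. The function is hypothetical, and the target value of 60 is an assumption chosen for illustration; only reads with MAPQ equal to the source value are changed.

```python
def reassign_mapq(mapqs, from_value=255, to_value=60):
    """Replace one specific MAPQ value, leaving all others untouched."""
    return [to_value if q == from_value else q for q in mapqs]


print(reassign_mapq([255, 30, 0, 255]))
```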

vcf

label:	Called variants file
type:	`basic:file`

tbi

label:	Tabix index
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

Variant filtering (CheMut)¶

data:variants:vcf:filteringfiltering-chemut (data:variants:vcf variants, basic:string analysis_type, basic:string parental_strain, basic:string mutant_strain, basic:integer read_depth)[Source: v1.4.0]

Filtering and annotation of Variant Calling data - Chemical mutagenesis in _Dictyostelium discoideum_.

variants

label:	Variants file (VCF)
type:	`data:variants:vcf`

analysis_type

label:	Analysis type
type:	`basic:string`
description:	Choice of the analysis type. Use “SNV” or “INDEL” options for the analysis of haploid VCF files prepared by using GATK UnifiedGenotyper -glm option “SNP” or “INDEL”, respectively. Choose options SNV_CHR2 or INDEL_CHR2 to run the GATK analysis only on the diploid portion of CHR2 (-ploidy 2 -L chr2:2263132-3015703).
default:	`snv`
choices:	SNV: `snv` INDEL: `indel` SNV_CHR2: `snv_chr2` INDEL_CHR2: `indel_chr2`

parental_strain

label:	Parental Strain Prefix
type:	`basic:string`
default:	`parental`

mutant_strain

label:	Mutant Strain Prefix
type:	`basic:string`
default:	`mut`

read_depth

label:	Read Depth Cutoff
type:	`basic:integer`
default:	`5`

summary

label:	Summary
type:	`basic:file`
description:	Summarize the input parameters and results.

vcf

label:	Variants
type:	`basic:file`
description:	A genome VCF file of variants that passed the filters.

tbi

label:	Tabix index
type:	`basic:file`

variants_filtered

label:	Variants filtered
type:	`basic:file`
description:	A data frame of variants that passed the filters.
required:	False

variants_filtered_alt

label:	Variants filtered (multiple alt. alleles)
type:	`basic:file`
description:	A data frame of variants that contain more than two alternative alleles. These variants are likely to be false positives.
required:	False

gene_list_all

label:	Gene list (all)
type:	`basic:file`
description:	Genes that are mutated at least once.
required:	False

gene_list_top

label:	Gene list (top)
type:	`basic:file`
description:	Genes that are mutated at least twice.
required:	False

mut_chr

label:	Mutations (by chr)
type:	`basic:file`
description:	List mutations in individual chromosomes.
required:	False

mut_strain

label:	Mutations (by strain)
type:	`basic:file`
description:	List mutations in individual strains.
required:	False

strain_by_gene

label:	Strain (by gene)
type:	`basic:file`
description:	List mutants that carry mutations in individual genes.
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

WALT¶

data:alignment:mr:waltwalt (data:genome:fasta genome, data:reads:fastq reads, basic:boolean rm_dup, basic:integer mismatch, basic:integer number)[Source: v1.0.2]

WALT (Wildcard ALignment Tool) is a read mapping program for bisulfite sequencing in DNA methylation studies.

genome

label:	Reference genome
type:	`data:genome:fasta`

reads

label:	Reads
type:	`data:reads:fastq`

rm_dup

label:	Remove duplicates
type:	`basic:boolean`
default:	`True`

mismatch

label:	Maximum allowed mismatches
type:	`basic:integer`
required:	False

number

label:	Number of reads to map in one loop
type:	`basic:integer`
description:	Sets the number of reads to map in each loop. A larger number makes the program use more memory. This is especially evident in paired-end mapping.
required:	False

mr

label:	Alignment file
type:	`basic:file`
description:	Position sorted alignment

stats

label:	Statistics
type:	`basic:file`

unmapped_f

label:	Unmapped reads (mate 1)
type:	`basic:file`
required:	False

unmapped_r

label:	Unmapped reads (mate 2)
type:	`basic:file`
required:	False

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

WGBS¶

data:workflow:wgbsworkflow-wgbs (data:reads:fastq reads, data:genome:fasta genome, basic:boolean rm_dup, basic:integer mismatch, basic:integer number, basic:boolean cpgs, basic:boolean symmetric_cpgs)[Source: v1.0.2]

This WGBS pipeline is comprised of three steps: alignment, computation of methylation levels, and identification of hypo-methylated regions (HMRs). First, reads are aligned by the __WALT__ aligner. [WALT (Wildcard ALignment Tool)](https://github.com/smithlabcode/walt) is a fast and accurate read mapper for bisulfite sequencing. Then, the methylation level at each genomic cytosine is calculated using __methcounts__. Finally, hypo-methylated regions are identified using __hmr__. Both methcounts and hmr are part of the [MethPipe](http://smithlabresearch.org/software/methpipe/) package.

reads

label:	Select sample(s)
type:	`data:reads:fastq`

genome

label:	Genome
type:	`data:genome:fasta`

alignment.rm_dup

label:	Remove duplicates
type:	`basic:boolean`
default:	`True`

alignment.mismatch

label:	Maximum allowed mismatches
type:	`basic:integer`
default:	`6`

alignment.number

label:	Number of reads to map in one loop
type:	`basic:integer`
description:	Sets the number of reads to map in each loop. A larger number makes the program use more memory. This is especially evident in paired-end mapping.
required:	False

methcounts.cpgs

label:	Only CpG context sites
type:	`basic:boolean`
description:	Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.
disabled:	methcounts.symmetric_cpgs
default:	`False`

methcounts.symmetric_cpgs

label:	Merge CpG pairs
type:	`basic:boolean`
description:	Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites were methylated on both DNA strands). Choosing this option will only keep the CpG sites data.
disabled:	methcounts.cpgs
default:	`True`
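
One way to picture merging a CpG pair into a single symmetric level is to pool the read counts from the cytosines on both strands. The sketch below assumes exactly that (pooled counts); the function name and the `(methylated_reads, total_reads)` representation are hypothetical and may differ from what MethPipe does internally.

```python
def merge_cpg_pair(plus_site, minus_site):
    """Merge the two cytosines of a CpG into one (level, coverage) pair.

    Each site is given as (methylated_reads, total_reads).
    """
    methylated = plus_site[0] + minus_site[0]
    total = plus_site[1] + minus_site[1]
    # Guard against sites without any coverage.
    level = methylated / total if total else 0.0
    return level, total


print(merge_cpg_pair((8, 10), (6, 10)))
```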

Whole exome sequencing (WES) analysis¶

data:workflow:wesworkflow-wes (data:reads:fastq:paired reads, data:genome:fasta genome, list:data:variants:vcf known_sites, data:bed intervals, data:variants:vcf hc_dbsnp, basic:string validation_stringency, data:seq:nucleotide adapters, basic:integer seed_mismatches, basic:integer simple_clip_threshold, basic:integer min_adapter_length, basic:integer palindrome_clip_threshold, basic:integer leading, basic:integer trailing, basic:integer minlen, basic:integer seed_l, basic:integer band_w, basic:boolean m, basic:decimal re_seeding, basic:integer match, basic:integer mismatch, basic:integer gap_o, basic:integer gap_e, basic:integer clipping, basic:integer unpaired_p, basic:integer report_tr, data:bedpe bedpe, basic:boolean skip, basic:boolean md_skip, basic:boolean md_remove_duplicates, basic:string md_assume_sort_order, basic:string read_group, basic:integer stand_call_conf, basic:integer mbq)[Source: v2.1.0]

The whole exome sequencing pipeline analyzes Illumina panel data. It consists of trimming, alignment, soft clipping, (optional) marking of duplicates, recalibration of base quality scores, and finally variant calling. Trimming is performed by Trimmomatic and alignment by BWA (mem). Soft clipping of Illumina primer sequences is done using the bamclipper tool. Marking of duplicates (MarkDuplicates), recalibration of base quality scores (ApplyBQSR), and calling of variants (HaplotypeCaller) are done using the GATK4 bundle of bioinformatics tools. To run this pipeline successfully, you will need a genome (FASTA), paired-end reads (FASTQ), a BEDPE file for bamclipper, known sites of variation (dbSNP) (VCF), a dbSNP database of variations (can be the same as the known sites of variation), the intervals on which target capture was done (BED), and Illumina adapter sequences (FASTA). Make sure that the specified resources match the genome used in the alignment step. The result is a file of called variants (VCF).
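
The order of the steps, and the effect of the two skip options documented further below (`advanced.bamclipper.skip` and `advanced.markduplicates.md_skip`), can be summarised in a short sketch. The function is purely illustrative and does not run any tool; the step names are labels, not commands.

```python
def wes_steps(skip_bamclipper=False, md_skip=False):
    """Return the ordered list of pipeline steps for the given skip flags."""
    steps = ["Trimmomatic", "BWA mem"]
    if not skip_bamclipper:
        steps.append("bamclipper")
    if not md_skip:
        steps.append("MarkDuplicates")
    # Base quality score recalibration and variant calling always run.
    steps += ["ApplyBQSR", "HaplotypeCaller"]
    return steps


print(wes_steps(md_skip=True))
```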

reads

label:	Raw untrimmed reads
type:	`data:reads:fastq:paired`
description:	Raw paired-end reads.
required:	True

genome

label:	Reference genome
type:	`data:genome:fasta`
description:	Against which genome to align. Further processes depend on this genome (e.g. BQSR step).
required:	True

known_sites

label:	Known sites of variation used in BQSR
type:	`list:data:variants:vcf`
description:	Known sites of variation as a VCF file.
required:	True

intervals

label:	Intervals
type:	`data:bed`
description:	Use intervals to narrow the analysis to defined regions. This usually helps cut down on processing time.
required:	True

hc_dbsnp

label:	dbSNP for GATK4’s HaplotypeCaller
type:	`data:variants:vcf`
description:	dbSNP database of variants for variant calling.
required:	True

validation_stringency

label:	Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded. Default is STRICT. This setting is used in BaseRecalibrator and ApplyBQSR processes.
type:	`basic:string`
default:	`STRICT`
choices:	STRICT: `STRICT` SILENT: `SILENT` LENIENT: `LENIENT`

advanced.trimming.adapters

label:	Adapter sequences
type:	`data:seq:nucleotide`
description:	Adapter sequence in FASTA format that will be removed from the read. This field as well as the ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclipping. ‘Minimum adapter length’ and ‘Keep both reads’ are optional parameters.
required:	False

advanced.trimming.seed_mismatches

label:	Seed mismatches
type:	`basic:integer`
description:	Specifies the maximum mismatch count which will still allow a full match to be performed. This field as well as the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ parameters are needed to perform Illuminaclipping.
required:	False
disabled:	!advanced.trimming.adapters

advanced.trimming.simple_clip_threshold

label:	Simple clip threshold
type:	`basic:integer`
description:	Specifies how accurate the match between any adapter etc. sequence must be against a read. This field as well as the ‘Adapter sequences’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclipping.
required:	False
disabled:	!advanced.trimming.adapters

advanced.trimming.min_adapter_length

label:	Minimum adapter length
type:	`basic:integer`
description:	In addition to the alignment score, palindrome mode can verify that a minimum length of adapter has been detected. If unspecified, this defaults to 8 bases, for historical reasons. However, since palindrome mode has a very low false positive rate, this can be safely reduced, even down to 1, to allow shorter adapter fragments to be removed. This field is optional for performing Illuminaclipping. ‘Adapter sequences’, ‘Seed mismatches’, ‘Simple clip threshold’ and ‘Palindrome clip threshold’ are also needed in order to use this parameter.
disabled:	!advanced.trimming.seed_mismatches && !advanced.trimming.simple_clip_threshold && !advanced.trimming.palindrome_clip_threshold
default:	`8`

advanced.trimming.palindrome_clip_threshold

label:	Palindrome clip threshold
type:	`basic:integer`
description:	Specifies how accurate the match between the two ‘adapter ligated’ reads must be for PE palindrome read alignment. This field as well as the ‘Adapter sequences’, ‘Simple clip threshold’ and ‘Seed mismatches’ parameters are needed to perform Illuminaclipping.
required:	False
disabled:	!advanced.trimming.adapters

advanced.trimming.leading

label:	Leading quality
type:	`basic:integer`
description:	Remove low quality bases from the beginning, if below a threshold quality.
required:	False

advanced.trimming.trailing

label:	Trailing quality
type:	`basic:integer`
description:	Remove low quality bases from the end, if below a threshold quality.
required:	False

advanced.trimming.minlen

label:	Minimum length
type:	`basic:integer`
description:	Drop the read if it is below a specified length.
required:	False

advanced.align.seed_l

label:	Minimum seed length
type:	`basic:integer`
description:	Minimum seed length. Matches shorter than the minimum seed length will be missed. The alignment speed is usually insensitive to this value unless it deviates significantly from 20.
default:	`19`

advanced.align.band_w

label:	Band width
type:	`basic:integer`
description:	Gaps longer than this will not be found.
default:	`100`

advanced.align.m

label:	Mark shorter split hits as secondary
type:	`basic:boolean`
description:	Mark shorter split hits as secondary (for Picard compatibility)
default:	`False`

advanced.align.re_seeding

label:	Re-seeding factor
type:	`basic:decimal`
description:	Trigger re-seeding for a MEM longer than minSeedLen*FACTOR. This is a key heuristic parameter for tuning the performance. Larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.
default:	`1.5`

advanced.align.scoring.match

label:	Score of a match
type:	`basic:integer`
default:	`1`

advanced.align.scoring.mismatch

label:	Mismatch penalty
type:	`basic:integer`
default:	`4`

advanced.align.scoring.gap_o

label:	Gap open penalty
type:	`basic:integer`
default:	`6`

advanced.align.scoring.gap_e

label:	Gap extension penalty
type:	`basic:integer`
default:	`1`

advanced.align.scoring.clipping

label:	Clipping penalty
type:	`basic:integer`
description:	Clipping is applied if final alignment score is smaller than (best score reaching the end of query) - (Clipping penalty)
default:	`5`

advanced.align.scoring.unpaired_p

label:	Penalty for an unpaired read pair
type:	`basic:integer`
description:	Affinity to force pair. Score: scoreRead1+scoreRead2-Penalty
default:	`9`
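
As a worked example of how the scoring parameters combine, the sketch below computes a score from counts of matches, mismatches and gaps using the defaults listed above, and applies the unpaired-pair formula from the description (scoreRead1 + scoreRead2 - Penalty). It assumes the affine gap convention in which a gap of length k costs `gap_o + k * gap_e`; the function names are hypothetical and the sketch ignores clipping.

```python
def alignment_score(matches, mismatches, gap_lengths=(),
                    match=1, mismatch=4, gap_o=6, gap_e=1):
    """Score an alignment from its match, mismatch and gap counts."""
    # Assumption: a gap of length k costs gap_o + k * gap_e.
    gap_cost = sum(gap_o + k * gap_e for k in gap_lengths)
    return matches * match - mismatches * mismatch - gap_cost


def unpaired_pair_score(score_read1, score_read2, unpaired_p=9):
    """Score of a read pair treated as unpaired."""
    return score_read1 + score_read2 - unpaired_p


score = alignment_score(matches=95, mismatches=3, gap_lengths=(2,))
print(score, unpaired_pair_score(score, 60))
```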

advanced.align.report_tr

label:	Report threshold score
type:	`basic:integer`
description:	Don’t output alignment with score lower than defined number. This option only affects output.
default:	`30`

advanced.bamclipper.bedpe

label:	BEDPE file used for clipping using Bamclipper
type:	`data:bedpe`
description:	BEDPE file used for clipping using Bamclipper tool.
required:	False

advanced.bamclipper.skip

label:	Skip Bamclipper step
type:	`basic:boolean`
description:	Use this option to skip Bamclipper step.
default:	`False`

advanced.markduplicates.md_skip

label:	Skip GATK’s MarkDuplicates step
type:	`basic:boolean`
default:	`False`

advanced.markduplicates.md_remove_duplicates

label:	Remove found duplicates
type:	`basic:boolean`
default:	`False`

advanced.markduplicates.md_assume_sort_order

label:	Assume sort order
type:	`basic:string`
default:
choices:	as in BAM header (default): unsorted: `unsorted` queryname: `queryname` coordinate: `coordinate` duplicate: `duplicate` unknown: `unknown`

advanced.bqsr.read_group

label:	Read group (@RG)
type:	`basic:string`
description:	If BAM file has not been prepared using a @RG tag, you can add it here. This argument enables the user to replace all read groups in the INPUT file with a single new read group and assign all reads to this read group in the OUTPUT BAM file. Addition or replacement is performed using Picard’s AddOrReplaceReadGroups tool. Input should take the form of -name=value delimited by a \t, e.g. “-ID=1\t-PL=Illumina\t-SM=sample_1”. See AddOrReplaceReadGroups documentation for more information on tag names. Note that PL, LB, PU and SM are required fields. See caveats of rewriting read groups in the documentation linked above.
required:	False
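
The read group format described above can be sketched in Python. This is a minimal illustration, not part of the process itself: the helper name `build_read_group` is hypothetical, and the sketch assumes the field expects the two-character sequence `\t` exactly as written in the example (the delimiter is a parameter in case a real tab character is expected instead).

```python
REQUIRED_TAGS = ("PL", "LB", "PU", "SM")  # required fields named in the description


def build_read_group(tags, delimiter="\\t"):
    """Join read-group tags into the '-NAME=value' form described above.

    `delimiter` defaults to a literal backslash followed by 't' (an assumption).
    """
    missing = [name for name in REQUIRED_TAGS if name not in tags]
    if missing:
        raise ValueError("Missing required read-group fields: " + ", ".join(missing))
    return delimiter.join(f"-{name}={value}" for name, value in tags.items())


read_group = build_read_group(
    {"ID": "1", "PL": "Illumina", "LB": "lib1", "PU": "unit1", "SM": "sample_1"}
)
print(read_group)
```

The values `lib1` and `unit1` are placeholders chosen for the example.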

advanced.hc.stand_call_conf

label:	Min call confidence threshold
type:	`basic:integer`
description:	The minimum phred-scaled confidence threshold at which variants should be called.
default:	`20`

advanced.hc.mbq

label:	Min Base Quality
type:	`basic:integer`
description:	Minimum base quality required to consider a base for calling.
default:	`20`

coverageBed¶

data:coveragecoveragebed (data:alignment:bam alignment, data:masterfile:amplicon master_file)[Source: v4.1.1]

Bedtools coverage (coveragebed)

alignment

label:	Alignment (BAM)
type:	`data:alignment:bam`

master_file

label:	Master file
type:	`data:masterfile:amplicon`

cov_metrics

label:	Coverage metrics
type:	`basic:file`

mean_cov

label:	Mean amplicon coverage
type:	`basic:file`

amplicon_cov

label:	Amplicon coverage file (nomergebed)
type:	`basic:file`

covplot_html

label:	HTML coverage plot
type:	`basic:file:html`

edgeR¶

data:differentialexpression:edgerdifferentialexpression-edger (list:data:expression case, list:data:expression control, basic:integer filter)[Source: v1.2.0]

Empirical Analysis of Digital Gene Expression Data in R (edgeR). Differential expression analysis of RNA-seq expression profiles with biological replication. Implements a range of statistical methodology based on the negative binomial distribution, including empirical Bayes estimation, exact tests, generalized linear models and quasi-likelihood tests. As well as RNA-seq, it can be applied to differential signal analysis of other types of genomic data that produce counts, including ChIP-seq, Bisulfite-seq, SAGE and CAGE. See [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/edgeR/inst/doc/edgeRUsersGuide.pdf) for more information.

case

label:	Case
type:	`list:data:expression`
description:	Case samples (replicates)

control

label:	Control
type:	`list:data:expression`
description:	Control samples (replicates)

filter

label:	Raw counts filtering threshold
type:	`basic:integer`
description:	Filter genes in the expression matrix input. Remove genes where the number of counts in all samples is below the threshold.
default:	`10`
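
A minimal sketch of the raw counts filter, assuming that "the number of counts in all samples" means the total count summed across all case and control samples. The process may instead apply the threshold per sample, so treat this reading as an assumption. The function name `filter_genes` and the input layout (gene ID mapped to a list of per-sample counts) are hypothetical.

```python
def filter_genes(counts, threshold=10):
    """Keep genes whose raw counts, summed over all samples, reach the threshold."""
    return {
        gene: values
        for gene, values in counts.items()
        if sum(values) >= threshold
    }


raw_counts = {
    "geneA": [0, 1, 2, 0],    # total 3: removed with the default threshold
    "geneB": [5, 3, 4, 6],    # total 18: kept
    "geneC": [10, 0, 0, 0],   # total 10: kept (threshold is inclusive here)
}
kept = filter_genes(raw_counts)
print(sorted(kept))
```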

raw

label:	Differential expression
type:	`basic:file`

de_json

label:	Results table (JSON)
type:	`basic:json`

de_file

label:	Results table (file)
type:	`basic:file`

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

featureCounts¶

data:expression:featurecountsfeature_counts (data:alignment:bam aligned_reads, basic:string assay_type, data:index:salmon cdna_index, basic:integer n_reads, data:annotation annotation, basic:string feature_class, basic:string feature_type, basic:string id_attribute, basic:string normalization_type, data:mappability:bcm mappability, basic:boolean show_advanced, basic:boolean count_features, basic:boolean allow_multi_overlap, basic:integer min_overlap, basic:decimal frac_overlap, basic:decimal frac_overlap_feature, basic:boolean largest_overlap, basic:integer read_extension_5, basic:integer read_extension_3, basic:integer read_to_pos, basic:boolean count_multi_mapping_reads, basic:boolean fraction, basic:integer min_mqs, basic:boolean split_only, basic:boolean non_split_only, basic:boolean primary, basic:boolean ignore_dup, basic:boolean junc_counts, data:genome genome, basic:boolean is_paired_end, basic:boolean require_both_ends_mapped, basic:boolean check_frag_length, basic:integer min_frag_length, basic:integer max_frag_length, basic:boolean do_not_count_chimeric_fragments, basic:boolean do_not_sort, basic:boolean by_read_group, basic:boolean count_long_reads, basic:boolean report_reads, basic:integer max_mop, basic:boolean verbose)[Source: v2.6.0]

featureCounts is a highly efficient general-purpose read summarization program that counts mapped reads for genomic features such as genes, exons, promoters, gene bodies, genomic bins and chromosomal locations. It can be used to count both RNA-seq and genomic DNA-seq reads. See the [official website](http://bioinf.wehi.edu.au/featureCounts/) and the [introductory paper](https://academic.oup.com/bioinformatics/article/30/7/923/232889) for more information.

alignment.aligned_reads

label:	Aligned reads
type:	`data:alignment:bam`

alignment.assay_type

label:	Assay type
type:	`basic:string`
description:	Indicate if strand-specific read counting should be performed. For paired-end reads, strand of the first read is taken as the strand of the whole fragment. FLAG field is used to tell if a read is first or second read in a pair. Automated strand detection is enabled using the [Salmon](https://salmon.readthedocs.io/en/latest/library_type.html) tool’s built-in functionality. To use this option, a cDNA (transcriptome) index file created using the Salmon indexing tool must be provided.
default:	`non_specific`
choices:	Strand non-specific: `non_specific` Strand-specific forward: `forward` Strand-specific reverse: `reverse` Detect automatically: `auto`

alignment.cdna_index

label:	cDNA index file
type:	`data:index:salmon`
description:	Transcriptome index file created using the Salmon indexing tool. cDNA (transcriptome) sequences used for index file creation must be derived from the same species as the input sequencing reads to obtain reliable analysis results.
required:	False
hidden:	alignment.assay_type != ‘auto’

alignment.n_reads

label:	Number of reads in subsampled alignment file
type:	`basic:integer`
description:	Alignment (.bam) file subsample size. Increase the number of reads to make automatic detection more reliable. Decrease the number of reads to make automatic detection run faster.
hidden:	alignment.assay_type != ‘auto’
default:	`5000000`

annotation.annotation

label:	Annotation
type:	`data:annotation`
description:	GTF and GFF3 annotation formats are supported.

annotation.feature_class

label:	Feature class
type:	`basic:string`
description:	Feature class (3rd column in GTF/GFF3 file) to be used. All other features will be ignored.
default:	`exon`

annotation.feature_type

label:	Feature type
type:	`basic:string`
description:	The type of feature the quantification program summarizes over (e.g. gene or transcript-level analysis). The value of this parameter needs to be chosen in line with ‘ID attribute’ below.
default:	`gene`
choices:	gene: `gene` transcript: `transcript`

annotation.id_attribute

label:	ID attribute
type:	`basic:string`
description:	GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
default:	`gene_id`
choices:	gene_id: `gene_id` transcript_id: `transcript_id` ID: `ID` geneid: `geneid`
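
To illustrate how the 'Feature class' and 'ID attribute' options relate to an annotation file, the following sketch reads one GTF line, checks its 3rd column and extracts the chosen attribute. It is an illustration only and covers the GTF `key "value";` attribute syntax, not the GFF3 `key=value` syntax. The function name `feature_id` and the sample line are hypothetical.

```python
import re


def feature_id(gtf_line, feature_class="exon", id_attribute="gene_id"):
    """Return the ID attribute of a GTF line, or None if the line is ignored."""
    fields = gtf_line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError("A GTF line must have 9 tab-separated columns")
    if fields[2] != feature_class:
        return None  # features of other classes are ignored
    pattern = rf'(?:^|;)\s*{re.escape(id_attribute)}\s+"([^"]*)"'
    match = re.search(pattern, fields[8])
    return match.group(1) if match else None


line = 'chr1\texample\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";'
print(feature_id(line))                                # gene-level ID
print(feature_id(line, id_attribute="transcript_id"))  # transcript-level ID
```

Lines that share the same returned ID would be treated as parts of the same feature.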

normalization_type

label:	Normalization type
type:	`basic:string`
description:	The default expression normalization type.
default:	`TPM`
choices:	TPM: `TPM` CPM: `CPM` FPKM: `FPKM` RPKUM: `RPKUM`
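
For orientation, the commonly used definitions of CPM, TPM and FPKM can be sketched as follows. This is not the code used by this process: how feature lengths are derived here is not stated, so `lengths` (in bases) is an assumed input, and RPKUM, which additionally requires mappability data, is not covered.

```python
def cpm(counts):
    """Counts per million mapped (assigned) reads."""
    total = sum(counts.values())
    return {gene: count / total * 1e6 for gene, count in counts.items()}


def tpm(counts, lengths):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    rates = {gene: counts[gene] / lengths[gene] for gene in counts}
    scale = sum(rates.values())
    return {gene: rate / scale * 1e6 for gene, rate in rates.items()}


def fpkm(counts, lengths):
    """Fragments per kilobase of feature per million assigned fragments."""
    total = sum(counts.values())
    return {gene: counts[gene] * 1e9 / (total * lengths[gene]) for gene in counts}


example_counts = {"geneA": 10, "geneB": 30}
example_lengths = {"geneA": 1000, "geneB": 3000}
print(cpm(example_counts))
print(tpm(example_counts, example_lengths))
print(fpkm(example_counts, example_lengths))
```

The sketch assumes at least one non-zero count. TPM values always sum to one million, which is why TPM is often preferred when comparing samples.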

mappability

label:	Mappability
type:	`data:mappability:bcm`
description:	Genome mappability information
required:	False
hidden:	normalization_type != ‘RPKUM’

show_advanced

label:	Show advanced options
type:	`basic:boolean`
description:	Inspect and modify parameters
default:	`False`

advanced.summarization_level.count_features

label:	Perform read counting at feature level
type:	`basic:boolean`
description:	Count reads for exons rather than genes.
default:	`False`

advanced.overlap.allow_multi_overlap

label:	Assign reads to all their overlapping features or meta-features
type:	`basic:boolean`
default:	`False`

advanced.overlap.min_overlap

label:	Minimum number of overlapping bases in a read that is required for read assignment
type:	`basic:integer`
description:	Number of overlapping bases is counted from both reads if paired-end. If a negative value is provided, then a gap of up to specified size will be allowed between read and the feature that the read is assigned to.
default:	`1`

advanced.overlap.frac_overlap

label:	Minimum fraction of overlapping bases in a read that is required for read assignment
type:	`basic:decimal`
description:	Value should be within range [0, 1]. Number of overlapping bases is counted from both reads if paired end. Both this option and ‘Minimum number of overlapping bases in a read that is required for read assignment’ need to be satisfied for read assignment.
default:	`0.0`

advanced.overlap.frac_overlap_feature

label:	Minimum fraction of overlapping bases included in a feature that is required for overlapping with a read or a read pair
type:	`basic:decimal`
description:	Value should be within range [0, 1].
default:	`0.0`

advanced.overlap.largest_overlap

label:	Assign reads to a feature or meta-feature that has the largest number of overlapping bases
type:	`basic:boolean`
default:	`False`

advanced.overlap.read_extension_5

label:	Number of bases to extend reads upstream by from their 5’ end
type:	`basic:integer`
default:	`0`

advanced.overlap.read_extension_3

label:	Number of bases to extend reads downstream by from their 3’ end
type:	`basic:integer`
default:	`0`

advanced.overlap.read_to_pos

label:	Reduce reads to their 5’-most or 3’-most base
type:	`basic:integer`
description:	Read counting is performed based on the single base the read is reduced to.
required:	False

advanced.multi_mapping_reads.count_multi_mapping_reads

label:	Count multi-mapping reads
type:	`basic:boolean`
description:	For a multi-mapping read, all its reported alignments will be counted. The ‘NH’ tag in BAM input is used to detect multi-mapping reads.
default:	`False`

advanced.fractional_counting.fraction

label:	Assign fractional counts to features
type:	`basic:boolean`
description:	This option must be used together with ‘Count multi-mapping reads’ or ‘Assign reads to all their overlapping features or meta-features’ or both. When ‘Count multi-mapping reads’ is checked, each reported alignment from a multi-mapping read (identified via ‘NH’ tag) will carry a count of 1 / x, instead of 1 (one), where x is the total number of alignments reported for the same read. When ‘Assign reads to all their overlapping features or meta-features’ is checked, each overlapping feature will receive a count of 1 / y, where y is the total number of features overlapping with the read. When both ‘Count multi-mapping reads’ and ‘Assign reads to all their overlapping features or meta-features’ are specified, each alignment will carry a count of 1 / (x * y).
required:	False
disabled:	!advanced.multi_mapping_reads.count_multi_mapping_reads && !advanced.overlap.allow_multi_overlap
default:	`False`
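
The fractional counting rule above can be written out as a small calculation. The function and argument names are hypothetical. `x` is the number of alignments reported for the read and `y` is the number of features overlapping the alignment.

```python
from fractions import Fraction


def fractional_count(n_alignments=1, n_features=1,
                     count_multi_mapping=False, allow_multi_overlap=False):
    """Count carried by one alignment for one overlapping feature."""
    if n_alignments < 1 or n_features < 1:
        raise ValueError("n_alignments and n_features must be at least 1")
    x = n_alignments if count_multi_mapping else 1
    y = n_features if allow_multi_overlap else 1
    return Fraction(1, x * y)


# A read with 2 reported alignments, each overlapping 3 features.
print(fractional_count(2, 3, count_multi_mapping=True))                            # 1/x
print(fractional_count(2, 3, allow_multi_overlap=True))                            # 1/y
print(fractional_count(2, 3, count_multi_mapping=True, allow_multi_overlap=True))  # 1/(x*y)
```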

advanced.read_filtering.min_mqs

label:	Minimum mapping quality score
type:	`basic:integer`
description:	The minimum mapping quality score a read must satisfy in order to be counted. For paired-end reads, at least one end should satisfy this criterion.
default:	`0`

advanced.read_filtering.split_only

label:	Count only split alignments
type:	`basic:boolean`
default:	`False`

advanced.read_filtering.non_split_only

label:	Count only non-split alignments
type:	`basic:boolean`
default:	`False`

advanced.read_filtering.primary

label:	Count only primary alignments
type:	`basic:boolean`
description:	Primary alignments are identified using bit 0x100 in the BAM FLAG field (the bit is unset for primary alignments).
default:	`False`

advanced.read_filtering.ignore_dup

label:	Ignore duplicate reads in read counting
type:	`basic:boolean`
description:	Duplicate reads are identified using bit 0x400 in BAM FLAG field. The whole read pair is ignored if one of the reads is a duplicate read for paired-end data.
default:	`False`
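
The two FLAG-based filters above come down to simple bit tests. The sketch below uses the bit values from the SAM/BAM specification. The function names are hypothetical and the code is an illustration, not the featureCounts implementation.

```python
SECONDARY = 0x100   # set on secondary (not primary) alignments
DUPLICATE = 0x400   # set on PCR or optical duplicates


def passes_primary_filter(flag):
    """True if bit 0x100 is unset, as used by 'Count only primary alignments'."""
    return not flag & SECONDARY


def is_duplicate(flag):
    """True if bit 0x400 is set."""
    return bool(flag & DUPLICATE)


def pair_is_ignored(flag_read1, flag_read2):
    """With 'Ignore duplicate reads', a pair is dropped if either read is a duplicate."""
    return is_duplicate(flag_read1) or is_duplicate(flag_read2)


print(passes_primary_filter(0), passes_primary_filter(256))
print(pair_is_ignored(99, 1171))  # 1171 = 1024 + 147, so the mate is a duplicate
```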

advanced.exon_exon_junctions.junc_counts

label:	Count number of reads supporting each exon-exon junction
type:	`basic:boolean`
description:	Junctions are identified from those exon-spanning reads in input (containing ‘N’ in CIGAR string).
default:	`False`

advanced.exon_exon_junctions.genome

label:	Genome
type:	`data:genome`
description:	Reference sequences used in read mapping that produced the provided BAM files. This optional argument can be used to improve read counting for junctions.
required:	False
disabled:	!advanced.exon_exon_junctions.junc_counts

advanced.paired_end.is_paired_end

label:	Count fragments (or templates) instead of reads
type:	`basic:boolean`
default:	`True`

advanced.paired_end.require_both_ends_mapped

label:	Count only read pairs that have both ends aligned
type:	`basic:boolean`
default:	`False`

advanced.paired_end.check_frag_length

label:	Check fragment length when assigning fragments to meta-features or features
type:	`basic:boolean`
description:	Use minimum and maximum fragment/template length to set thresholds.
default:	`False`

advanced.paired_end.min_frag_length

label:	Minimum fragment/template length
type:	`basic:integer`
required:	False
disabled:	!advanced.paired_end.check_frag_length
default:	`50`

advanced.paired_end.max_frag_length

label:	Maximum fragment/template length
type:	`basic:integer`
required:	False
disabled:	!advanced.paired_end.check_frag_length
default:	`600`

advanced.paired_end.do_not_count_chimeric_fragments

label:	Do not count chimeric fragments
type:	`basic:boolean`
description:	Do not count read pairs that have their two ends mapped to different chromosomes or mapped to same chromosome but on different strands.
default:	`False`

advanced.paired_end.do_not_sort

label:	Do not sort reads in BAM input
type:	`basic:boolean`
default:	`False`

advanced.read_groups.by_read_group

label:	Assign reads by read group
type:	`basic:boolean`
description:	RG tag is required to be present in the input BAM files.
default:	`False`

advanced.long_reads.count_long_reads

label:	Count long reads such as Nanopore and PacBio reads
type:	`basic:boolean`
default:	`False`

advanced.miscellaneous.report_reads

label:	Output detailed assignment results for each read or read pair
type:	`basic:boolean`
default:	`False`

advanced.miscellaneous.max_mop

label:	Maximum number of ‘M’ operations allowed in a CIGAR string
type:	`basic:integer`
description:	Both ‘X’ and ‘=’ are treated as ‘M’ and adjacent ‘M’ operations are merged in the CIGAR string.
default:	`10`
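
The way ‘M’ operations are counted for this limit can be sketched as follows: ‘X’ and ‘=’ are first rewritten to ‘M’, adjacent ‘M’ operations are merged, and the remaining ‘M’ runs are counted. This is an illustration of the rule in the description, not the featureCounts source. The function name is hypothetical.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_m_operations(cigar):
    """Count 'M' operations after treating X/= as M and merging adjacent M runs."""
    merged = []
    for _length, op in CIGAR_OP.findall(cigar):
        if op in ("=", "X"):
            op = "M"
        if op == "M" and merged and merged[-1] == "M":
            continue  # adjacent M operations collapse into one
        merged.append(op)
    return merged.count("M")


print(count_m_operations("10M2I5=3X4M"))
print(count_m_operations("50M1000N50M"))
```

An alignment would exceed the default limit only if this count were greater than 10.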

advanced.miscellaneous.verbose

label:	Output verbose information
type:	`basic:boolean`
description:	Output verbose information for debugging, such as unmatched chromosome / contig names.
default:	`False`

rc

label:	Read counts
type:	`basic:file`

fpkm

label:	FPKM
type:	`basic:file`

tpm

label:	TPM
type:	`basic:file`

cpm

label:	CPM
type:	`basic:file`

exp

label:	Default expression output
type:	`basic:file`

exp_json

label:	Default expression output (json)
type:	`basic:json`

exp_type

label:	Expression normalization type (on default output)
type:	`basic:string`

exp_set

label:	Expressions
type:	`basic:file`

exp_set_json

label:	Expressions (json)
type:	`basic:json`

feature_counts_output

label:	featureCounts output
type:	`basic:file`

counts_summary

label:	Counts summary
type:	`basic:file`

read_assignments

label:	Read assignments
type:	`basic:file`
description:	Read assignment results for each read (or fragment if paired end).
required:	False

strandedness_report

label:	Strandedness report file
type:	`basic:file`
required:	False

source

label:	Gene ID database
type:	`basic:string`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

feature_type

label:	Feature type
type:	`basic:string`

methcounts¶

data:wgbs:methcountsmethcounts (data:genome:fasta genome, data:alignment:mr alignment, basic:boolean cpgs, basic:boolean symmetric_cpgs)[Source: v1.0.1]

The methcounts program takes the mapped reads and produces the methylation level at each genomic cytosine, with the option to produce only levels for CpG-context cytosines.

genome

label:	Reference genome
type:	`data:genome:fasta`

alignment

label:	Mapped reads
type:	`data:alignment:mr`
description:	WGBS alignment file in Mapped Read (.mr) format.

cpgs

label:	Only CpG context sites
type:	`basic:boolean`
description:	Output file will contain methylation data for CpG context sites only. Choosing this option will result in CpG content report only.
disabled:	symmetric_cpgs
default:	`False`

symmetric_cpgs

label:	Merge CpG pairs
type:	`basic:boolean`
description:	Merging CpG pairs results in symmetric methylation levels. Methylation is usually symmetric (cytosines at CpG sites are methylated on both DNA strands). Choosing this option will only keep the CpG sites data.
disabled:	cpgs
default:	`True`

meth

label:	Methylation levels
type:	`basic:file`

stats

label:	Statistics
type:	`basic:file`

bigwig

label:	Methylation levels BigWig file
type:	`basic:file`

species

label:	Species
type:	`basic:string`

build

label:	Build
type:	`basic:string`

miRNA pipeline¶

data:workflow:mirnaworkflow-mirna (data:reads:fastq reads, data:genome:fasta genome, data:annotation annotation, basic:string id_attribute, basic:string feature_class)[Source: v0.0.5]

reads

label:	Input miRNA reads.
type:	`data:reads:fastq`
description:	Note that these reads should already be free of adapters.

genome

label:	Genome
type:	`data:genome:fasta`

annotation

label:	Annotation (GTF/GFF3)
type:	`data:annotation`

id_attribute

label:	ID attribute
type:	`basic:string`
description:	GTF/GFF3 attribute to be used as feature ID. Several GTF/GFF3 lines with the same feature ID will be considered as parts of the same feature. The feature ID is used to identify the counts in the output table. In GTF files this is usually ‘gene_id’, in GFF3 files this is often ‘ID’, and ‘transcript_id’ is frequently a valid choice for both annotation formats.
default:	`gene_id`
choices:	gene_id: `gene_id` transcript_id: `transcript_id` ID: `ID` geneid: `geneid`

feature_class

label:	Feature class
type:	`basic:string`
description:	Feature class (3rd column in GFF file) to be used, all features of other types are ignored.
default:	`miRNA`

shRNA quantification¶

data:workflow:trimalquantworkflow-trim-align-quant (data:reads:fastq:single reads, list:basic:string up_primers_seq, list:basic:string down_primers_seq, basic:decimal error_rate_5end, basic:decimal error_rate_3end, data:genome:fasta genome, basic:string mode, basic:integer N, basic:integer L, basic:integer gbar, basic:string mp, basic:string rdg, basic:string rfg, basic:string score_min, basic:integer readlengths, basic:integer alignscores)[Source: v0.0.3]

reads

label:	Untrimmed reads.
type:	`data:reads:fastq:single`
description:	First stage of shRNA pipeline. Trims 5’ adapters, then 3’ adapters using the same error rate setting, aligns reads to a reference library and quantifies species.

trimming_options.up_primers_seq

label:	5’ adapter sequence
type:	`list:basic:string`
description:	A string of 5’ adapter sequence.
required:	True

trimming_options.down_primers_seq

label:	3’ adapter sequence
type:	`list:basic:string`
description:	A string of 3’ adapter sequence.
required:	True

trimming_options.error_rate_5end

label:	Error rate for 5’
type:	`basic:decimal`
description:	Maximum allowed error rate (no. of errors divided by the length of the matching region) for 5’ trimming.
required:	False
default:	`0.1`

trimming_options.error_rate_3end

label:	Error rate for 3’
type:	`basic:decimal`
description:	Maximum allowed error rate (no. of errors divided by the length of the matching region) for 3’ trimming.
required:	False
default:	`0.1`

alignment_options.genome

label:	Reference library
type:	`data:genome:fasta`
description:	Choose the reference library against which to align reads.

alignment_options.mode

label:	Alignment mode
type:	`basic:string`
description:	End to end: Bowtie 2 requires that the entire read align from one end to the other, without any trimming (or “soft clipping”) of characters from either end. Local: Bowtie 2 does not require that the entire read align from one end to the other. Rather, some characters may be omitted (“soft clipped”) from the ends in order to achieve the greatest possible alignment score.
default:	`--end-to-end`
choices:	end to end mode: `--end-to-end` local: `--local`

alignment_options.N

label:	Number of mismatches allowed in seed alignment (N)
type:	`basic:integer`
description:	Sets the number of mismatches allowed in a seed alignment during multiseed alignment. Can be set to 0 or 1. Setting this higher makes alignment slower (often much slower) but increases sensitivity. Default: 0.
required:	False

alignment_options.L

label:	Length of seed substrings (L)
type:	`basic:integer`
description:	Sets the length of the seed substrings to align during multiseed alignment. Smaller values make alignment slower but more sensitive. By default, the --sensitive preset is used for end-to-end alignment and --sensitive-local for local alignment. See documentation for details.
required:	False

alignment_options.gbar

label:	Disallow gaps within positions (gbar)
type:	`basic:integer`
description:	Disallow gaps within <int> positions of the beginning or end of the read. Default: 4.
required:	False

alignment_options.mp

label:	Maximal and minimal mismatch penalty (mp)
type:	`basic:string`
description:	Sets the maximum (MX) and minimum (MN) mismatch penalties, both integers. A number less than or equal to MX and greater than or equal to MN is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N. If --ignore-quals is specified, the number subtracted equals MX. Otherwise, the number subtracted is MN + floor((MX-MN)(MIN(Q, 40.0)/40.0)) where Q is the Phred quality value. Default for MX, MN: 6,2.
required:	False

alignment_options.rdg

label:	Set read gap open and extend penalties (rdg)
type:	`basic:string`
description:	Sets the read gap open (<int1>) and extend (<int2>) penalties. A read gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
required:	False

alignment_options.rfg

label:	Set reference gap open and extend penalties (rfg)
type:	`basic:string`
description:	Sets the reference gap open (<int1>) and extend (<int2>) penalties. A reference gap of length N gets a penalty of <int1> + N * <int2>. Default: 5,3.
required:	False
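
Both the read gap (rdg) and reference gap (rfg) options use the same penalty form, `<int1> + N * <int2>`. A small worked sketch follows. The function name is hypothetical.

```python
def gap_penalty(gap_length, penalties="5,3"):
    """Penalty for a gap of `gap_length`, given '<open>,<extend>' as a string."""
    gap_open, gap_extend = (int(value) for value in penalties.split(","))
    return gap_open + gap_length * gap_extend


print(gap_penalty(1))          # 5 + 1 * 3
print(gap_penalty(3))          # 5 + 3 * 3
print(gap_penalty(2, "4,2"))   # 4 + 2 * 2
```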

alignment_options.score_min

label:	Minimum alignment score needed for “valid” alignment (score-min)
type:	`basic:string`
description:	Sets a function governing the minimum alignment score needed for an alignment to be considered “valid” (i.e. good enough to report). This is a function of read length. For instance, specifying L,0,-0.6 sets the minimum-score function to f(x) = 0 + -0.6 * x, where x is the read length. The default in --end-to-end mode is L,-0.6,-0.6 and the default in --local mode is G,20,8.
required:	False
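
To make the score-min specification concrete, the sketch below evaluates the two function types mentioned above: `L` (linear, f(x) = A + B * x) and `G` (natural log, f(x) = A + B * ln(x)). Other function types accepted by Bowtie 2 are deliberately left out. The function name `min_score` is hypothetical.

```python
import math


def min_score(spec, read_length):
    """Evaluate a score-min specification such as 'L,0,-0.6' for a read length."""
    kind, constant, coefficient = spec.split(",")
    a, b = float(constant), float(coefficient)
    if kind == "L":
        return a + b * read_length
    if kind == "G":
        return a + b * math.log(read_length)
    raise ValueError(f"Function type {kind!r} is not covered by this sketch")


print(min_score("L,0,-0.6", 50))       # example from the description
print(min_score("L,-0.6,-0.6", 100))   # end-to-end default
print(min_score("G,20,8", 100))        # local default
```

For a 100-base read in end-to-end mode, an alignment therefore needs a score of about -60.6 or higher to be reported.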

quant_options.readlengths

label:	Species lengths threshold
type:	`basic:integer`
description:	Species with read lengths below specified threshold will be removed from final output. Default is no removal.

quant_options.alignscores

label:	Align scores filter threshold
type:	`basic:integer`
description:	Species with align score below specified threshold will be removed from final output. Default is no removal.

snpEff¶

data:snpeff:uploadupload-snpeff (basic:file annotation, basic:file summary, basic:file snpeff_genes)[Source: v1.1.1]

Upload snpEff result files.

annotation

label:	Annotation file
type:	`basic:file`

summary

label:	Summary
type:	`basic:file`

snpeff_genes

label:	SnpEff genes
type:	`basic:file`

annotation

label:	Annotation file
type:	`basic:file`

summary

label:	Summary
type:	`basic:file:html`

snpeff_genes

label:	SnpEff genes
type:	`basic:file`

snpEff¶

data:snpeffsnpeff (data:variants:vcf variants, basic:string var_source, basic:string database, list:data:variants:vcf known_vars_annot)[Source: v0.2.1]

Variant annotation using snpEff package.

variants

label:	Variants (VCF)
type:	`data:variants:vcf`

var_source

label:	Input VCF source
type:	`basic:string`
choices:	GATK HC: `gatk_hc` loFreq: `lofreq`

database

label:	snpEff database
type:	`basic:string`
default:	`GRCh37.75`
choices:	GRCh37.75: `GRCh37.75`

known_vars_annot

label:	Known variants
type:	`list:data:variants:vcf`

annotation

label:	Annotation file
type:	`basic:file`

summary

label:	Summary
type:	`basic:file:html`

snpeff_genes

label:	SnpEff genes
type:	`basic:file`