{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://raw.githubusercontent.com/nf-core/ampliseq/master/nextflow_schema.json",
"title": "nf-core/ampliseq pipeline parameters",
"description": "Amplicon sequencing analysis workflow using DADA2 and QIIME2",
"type": "object",
"$defs": {
"input_output_options": {
"title": "Main arguments",
"type": "object",
"description": "",
"default": "",
"properties": {
"input": {
"type": "string",
"format": "file-path",
"mimetype": "text/tsv",
"pattern": "^\\S+\\.(tsv|csv|yml|yaml|txt)$",
"fa_icon": "fas fa-dna",
"description": "Path to tab-separated sample sheet",
"help_text": "Path to sample sheet, either tab-separated (.tsv), comma-separated (.csv), or in YAML format (.yml/.yaml), that points to compressed fastq files.\n\nThe sample sheet must have at least two entries and supports two header layouts:\n- Legacy layout (required): `sampleID`, `forwardReads`; optional: `reverseReads`, `run`, `control`, `quant_reading`\n- Standardized layout (required): `sample`, `fastq_1`; optional: `fastq_2`, `run`, `control`, `quant_reading`\n\nSample IDs must start with a letter and can only contain letters, numbers or underscores.\n\nRelated parameters are:\n- `--pacbio` and `--iontorrent` if the sequencing data is PacBio data or IonTorrent data (default expected: paired-end Illumina data)\n- `--single_end` if the sequencing data is single-ended Illumina data (default expected: paired-end Illumina data)\n- Choose an appropriate reference taxonomy for the type of amplicon (16S/18S/ITS/CO1) (default: DADA2 assignTaxonomy and 16S rRNA sequence database)",
"schema": "assets/schema_input.json"
},
"input_fasta": {
"type": "string",
"format": "file-path",
"mimetype": "text/tsv",
"pattern": "^\\S+\\.(fasta|fas|fna|fa|ffn)$",
"fa_icon": "fas fa-dna",
"description": "Path to ASV/OTU fasta file",
"help_text": "Path to fasta format file with sequences that will be taxonomically classified. The fasta file input option can be used to taxonomically classify previously produced ASV/OTU sequences.\n\nThe fasta sequence header line may contain a description, that will be kept as part of the sequence name. However, tabs will be changed into spaces.\n\nRelated parameters are:\n- Choose an appropriate reference taxonomy for the type of amplicon (16S/18S/ITS/CO1) (default: DADA2 assignTaxonomy and 16S rRNA sequence database)"
},
"input_folder": {
"type": "string",
"format": "directory-path",
"fa_icon": "fas fa-dna",
"description": "Path to folder containing zipped FastQ files",
"help_text": "Path to folder containing compressed fastq files. Sample identifiers are extracted from file names, i.e. the string before the first underscore `_`, these must be unique. Examples and requirements are in the usage documentation.\n\nRelated parameters: `--extension`, `--multiple_sequencing_runs`, `--pacbio`, `--iontorrent`, `--single_end`."
},
"FW_primer": {
"type": "string",
"description": "Forward primer sequence",
"help_text": "In amplicon sequencing methods, PCR with specific primers produces the amplicon of interest. These primer sequences need to be trimmed from the reads before further processing and are also required for producing an appropriate classifier. Do not use here any technical sequence such as adapter sequences but only the primer sequence that matches the biological amplicon.\n\nFor example:\n\n```bash\n--FW_primer \"GTGYCAGCMGCCGCGGTAA\" --RV_primer \"GGACTACNVGGGTWTCTAAT\"\n```",
"fa_icon": "fas fa-arrow-circle-right"
},
"RV_primer": {
"type": "string",
"description": "Reverse primer sequence",
"help_text": "In amplicon sequencing methods, PCR with specific primers produces the amplicon of interest. These primer sequences need to be trimmed from the reads before further processing and are also required for producing an appropriate classifier. Do not use here any technical sequence such as adapter sequences but only the primer sequence that matches the biological amplicon.\n\nFor example:\n\n```bash\n--FW_primer GTGYCAGCMGCCGCGGTAA --RV_primer GGACTACNVGGGTWTCTAAT\n```",
"fa_icon": "fas fa-arrow-alt-circle-left"
},
"metadata": {
"type": "string",
"format": "file-path",
"description": "Path to metadata sheet, when missing most downstream analysis are skipped (barplots, PCoA plots, ...).",
"help_text": "This is optional, but for performing downstream analysis such as barplots, diversity indices or differential abundance testing, a metadata file is essential.\n\nRelated parameter:\n- `--metadata_category` (optional) to choose columns that are used for testing significance\n\nFor example:\n\n```bash\n--metadata \"path/to/metadata.tsv\"\n```\n\nThe first column in the tab-separated metadata file is the sample identifier column (required header: `ID`) and defines the sample or feature IDs associated with your study. More details are in the usage documentation.",
"fa_icon": "fas fa-file-csv"
},
"multiregion": {
"type": "string",
"format": "file-path",
"mimetype": "text/tsv",
"pattern": "^\\S+\\.(tsv|csv|yml|yaml|txt)$",
"fa_icon": "fas fa-dna",
"description": "Path to multi-region definition sheet, for multi-region analysis with Sidle",
"help_text": "Path to file with information about sequenced regions, either tab-separated (.tsv), comma-separated (.csv), or in YAML format (.yml/.yaml). This initiates scaffolding multiple regions along a reference.\n\nThe file must have four headers: \n- `region`: Unique region identifier\n- `region_length`: Minimal length of region\n- `FW_primer`: Forward primer sequence\n- `RV_primer`: Reverse primer sequence\n\nFor more details check the usage documentation.\n\nRelated parameters are:\n- `--sidle_ref_taxonomy` to select the reference taxonomic database\n- `--sidle_ref_tax_custom`, `--sidle_ref_seq_custom` and related parameters for custom reference taxonomic database files",
"schema": "assets/schema_multiregion.json"
},
"outdir": {
"type": "string",
"format": "directory-path",
"description": "The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.",
"fa_icon": "fas fa-folder-open"
},
"ref_taxonomy_storage": {
"type": "string",
"format": "directory-path",
"description": "The directory where the reference taxonomy databases will be saved for re-use. Absolute paths to storage on Cloud infrastructure.",
"fa_icon": "fas fa-folder-open"
},
"save_intermediates": {
"type": "boolean",
"description": "Save intermediate results such as QIIME2's qza and qzv files",
"fa_icon": "fas fa-folder-open"
},
"email": {
"type": "string",
"description": "Email address for completion summary.",
"fa_icon": "fas fa-envelope",
"help_text": "Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. If set in your user config file (`~/.nextflow/config`) then you don't need to specify this on the command line for every run.",
"pattern": "^([a-zA-Z0-9_\\-\\.]+)@([a-zA-Z0-9_\\-\\.]+)\\.([a-zA-Z]{2,5})$"
}
},
"required": ["outdir"],
"fa_icon": "fas fa-terminal"
},
"sequencing_input": {
"title": "Sequencing input",
"type": "object",
"description": "",
"default": "",
"properties": {
"binned_quality": {
"type": "string",
"description": "Comma separated quality bins. If data has binned quality scores.",
"help_text": "Bins can change over time and vary by sequencer type and software version. For example, at the time of writing, for 'MiSeq i100 Control Software version 1.0' quality bins are documented as '2,12,24,38', for 'MiSeq i100 Control Software version 1.1 and later' as '2,9,23,38', for 'NovaSeq X/X Plus Control Software v1.3' as '2,9,24,40'.",
"fa_icon": "fas fa-align-justify",
"pattern": "^[0-9]+(,[0-9]+)*$"
},
"ignore_binned_quality": {
"type": "boolean",
"description": "Ignore warnings about binned quality scores.",
"fa_icon": "fas fa-arrow-right"
},
"pacbio": {
"type": "boolean",
"description": "If data is single-ended PacBio reads instead of Illumina",
"fa_icon": "fas fa-align-justify"
},
"iontorrent": {
"type": "boolean",
"description": "If data is single-ended IonTorrent reads instead of Illumina",
"fa_icon": "fas fa-align-justify"
},
"single_end": {
"type": "boolean",
"description": "If data is single-ended Illumina reads instead of paired-end",
"help_text": "When using a sample sheet with `--input` containing forward and reverse reads, specifying `--single_end` will only extract forward reads and treat the data as single ended instead of extracting forward and reverse reads.",
"fa_icon": "fas fa-align-left"
},
"illumina_pe_its": {
"type": "boolean",
"description": "If analysing ITS amplicons or any other region with large length variability with Illumina paired end reads",
"help_text": "This will cause the pipeline to\n- not truncate input reads if not `--trunclenf` and `--trunclenr` are overwriting defaults\n- remove reverse complement primers from the end of reads in case the read length exceeds the amplicon length",
"fa_icon": "fas fa-align-justify"
},
"quality_type": {
"type": "string",
"default": "Auto",
"description": "Type of quality scores in raw read data",
"help_text": "From R package 'ShortRead' function 'readFastq': Representation to be used for quality scores, must be one of `Auto` (infer automatically), `FastqQuality` (Phred-like base 33 encoding), `SFastqQuality` (Illumina base 64 encoding).",
"enum": ["Auto", "FastqQuality", "SFastqQuality"],
"fa_icon": "fab fa-amilia"
},
"multiple_sequencing_runs": {
"type": "boolean",
"description": "If using `--input_folder`: samples were sequenced in multiple sequencing runs",
"help_text": "Expects one sub-folder per sequencing run in the folder specified by `--input_folder` containing sequencing data of the specific run.\nSample identifiers are taken from sequencing files, specifically the string before the first underscore will be the sample ID. Sample IDs across all sequencing runs (all sequencing files) have to be unique. If this is not the case, please use a sample sheet as input instead.\n\nExample for input data organization:\n\n```bash\ndata\n |-run1\n | |-sample1_1_L001_R1_001.fastq.gz\n | |-sample1_1_L001_R2_001.fastq.gz\n | |-sample2_1_L001_R1_001.fastq.gz\n | |-sample2_1_L001_R2_001.fastq.gz\n |\n |-run2\n |-sample3_1_L001_R1_001.fastq.gz\n |-sample3_1_L001_R2_001.fastq.gz\n |-sample4_1_L001_R1_001.fastq.gz\n |-sample4_1_L001_R2_001.fastq.gz\n```\n\nExample command to analyze this data in one pipeline run:\n\n```bash\nnextflow run nf-core/ampliseq \\\n -profile singularity \\\n --input_folder \"data\" \\\n --FW_primer \"GTGYCAGCMGCCGCGGTAA\" \\\n --RV_primer \"GGACTACNVGGGTWTCTAAT\" \\\n --metadata \"data/Metadata.tsv\" \\\n --multiple_sequencing_runs\n```",
"fa_icon": "fas fa-running"
},
"extension": {
"type": "string",
"default": "/*_R{1,2}_001.fastq.gz",
"description": "If using `--input_folder`: naming of sequencing files",
"help_text": "Indicates the naming of sequencing files (default: `\"/*_R{1,2}_001.fastq.gz\"`).\n\nPlease note:\n\n1. The prepended slash (`/`) is required\n2. The star (`*`) is the required wildcard for sample names\n3. The curly brackets (`{}`) enclose the orientation for paired end reads, separated by a comma (`,`).\n4. The pattern must be enclosed in quotes\n\nFor example for one sample (name: `1`) with forward (file: `1_a.fastq.gz`) and reverse (file: `1_b.fastq.gz`) reads in folder `data`:\n\n```bash\n--input_folder \"data\" --extension \"/*_{a,b}.fastq.gz\"\n```",
"fa_icon": "fab fa-amilia"
},
"min_read_counts": {
"type": "integer",
"default": 1,
"description": "Set read count threshold for failed samples.",
"help_text": "Samples with less reads than this threshold at input or after trimming stop the pipeline. Using `--ignore_empty_input_files` or `--ignore_failed_trimming ignores` samples with read numbers below the threshold and lets the pipeline continue with less samples.",
"fa_icon": "fas fa-greater-than-equal"
},
"ignore_empty_input_files": {
"type": "boolean",
"description": "Ignore input files with too few reads.",
"help_text": "Ignore input files with less reads than specified by `--min_read_counts` and continue the pipeline without those samples.",
"fa_icon": "fas fa-arrow-right"
}
},
"fa_icon": "fas fa-align-justify"
},
"primer_removal": {
"title": "Primer removal",
"type": "object",
"description": "Spurious sequences sometimes lack primer sequences and primers introduce errors that can be removed in that step",
"default": "",
"properties": {
"retain_untrimmed": {
"type": "boolean",
"description": "Cutadapt will retain untrimmed reads, choose only if input reads are not expected to contain primer sequences.",
"help_text": "When read sequences are trimmed, untrimmed read pairs are discarded routinely. Use this option to retain untrimmed read pairs. This is usually not recommended and is only of advantage for specific protocols that prevent sequencing PCR primers. ",
"fa_icon": "far fa-plus-square"
},
"cutadapt_min_overlap": {
"type": "integer",
"default": 3,
"description": "Sets the minimum overlap for valid matches of primer sequences with reads for cutadapt (-O).",
"fa_icon": "fas fa-align-left"
},
"cutadapt_max_error_rate": {
"type": "number",
"default": 0.1,
"description": "Sets the maximum error rate for valid matches of primer sequences with reads for cutadapt (-e).",
"fa_icon": "fas fa-exclamation-circle"
},
"double_primer": {
"type": "boolean",
"description": "Cutadapt will be run twice to ensure removal of potential double primers",
"help_text": "Cutdapt will be run twice, first to remove reads without primers (default), then a second time to remove reads that erroneously contain a second set of primers, not to be used with `--retain_untrimmed`.",
"fa_icon": "fas fa-project-diagram"
},
"ignore_failed_trimming": {
"type": "boolean",
"description": "Ignore files with too few reads after trimming.",
"help_text": "Ignore files with less reads than specified by `--min_read_counts` after trimming and continue the pipeline without those samples.",
"fa_icon": "fas fa-arrow-right"
}
},
"fa_icon": "fas fa-align-left"
},
"read_trimming_and_quality_filtering": {
"title": "Read trimming and quality filtering",
"type": "object",
"description": "Read trimming and quality filtering is supposed to reduce spurious results and aid error correction",
"default": "",
"properties": {
"truncq": {
"type": "integer",
"default": 2,
"fa_icon": "fas fa-greater-than-equal",
"minimum": 0,
"description": "Truncate each read at the first instance of a quality score less than or equal to `--truncq`.",
"help_text": "Applied before read length truncation, but in the same step as read length truncation. The parameter is equivalent to `truncQ` in DADA2's filterAndTrim method. `--truncq` defaults to 2, which was already the default previously. If `--trunc_qmin` and `--trunc_rmin` are used to automatically calculate `--trunclenf` and `--trunclenr`, these values are determined using the read metrics calculated on the read input before DADA2 filterAndTrim is called, and so before `--truncq` is applied."
},
"trunclenf": {
"type": "integer",
"description": "DADA2 read truncation value for forward strand, set this to 0 for no truncation",
"help_text": "Read denoising by DADA2 creates an error profile specific to a sequencing run and uses this to correct sequencing errors. This method prefers when all reads to have the same length and as high quality as possible while maintaining at least 20 bp overlap for merging. One cutoff for the forward read `--trunclenf` and one for the reverse read `--trunclenr` truncate all longer reads at that position and drop all shorter reads.\nIf not set, these cutoffs will be determined automatically for the position before the mean quality score drops below `--trunc_qmin`.\n\nFor example:\n\n```bash\n--trunclenf 180 --trunclenr 120\n```\n\nPlease note:\n\n1. Overly aggressive truncation might lead to insufficient overlap for read merging\n2. Too little truncation might reduce denoised reads\n3. The code choosing these values automatically cannot take the points above into account, therefore checking read numbers is essential",
"fa_icon": "fas fa-ban"
},
"trunclenr": {
"type": "integer",
"description": "DADA2 read truncation value for reverse strand, set this to 0 for no truncation",
"help_text": "Read denoising by DADA2 creates an error profile specific to a sequencing run and uses this to correct sequencing errors. This method prefers when all reads to have the same length and as high quality as possible while maintaining at least 20 bp overlap for merging. One cutoff for the forward read `--trunclenf` and one for the reverse read `--trunclenr` truncate all longer reads at that position and drop all shorter reads.\nIf not set, these cutoffs will be determined automatically for the position before the mean quality score drops below `--trunc_qmin`.\n\nFor example:\n\n```bash\n--trunclenf 180 --trunclenr 120\n```\n\nPlease note:\n\n1. Overly aggressive truncation might lead to insufficient overlap for read merging\n2. Too little truncation might reduce denoised reads\n3. The code choosing these values automatically cannot take the points above into account, therefore checking read numbers is essential",
"fa_icon": "fas fa-ban"
},
"trunc_qmin": {
"type": "integer",
"default": 25,
"description": "If --trunclenf and --trunclenr are not set, these values will be automatically determined using this median quality score",
"help_text": "Automatically determine `--trunclenf` and `--trunclenr` before the median quality score drops below `--trunc_qmin`. The fraction of reads retained is defined by `--trunc_rmin`, which might override the quality cutoff.\n\nFor example:\n\n```bash\n--trunc_qmin 35\n```\n\nPlease note:\n\n1. The code choosing `--trunclenf` and `--trunclenr` using `--trunc_qmin` automatically cannot take amplicon length or overlap requirements for merging into account, therefore use with caution.\n2. A minimum value of 25 is recommended. However, high quality data with a large paired sequence overlap might justify a higher value (e.g. 35). Also, very low quality data might require a lower value.\n3. If the quality cutoff is too low to include a certain fraction of reads that is specified by `--trunc_rmin` (e.g. 0.75 means at least 75% percent of reads are retained), a lower cutoff according to `--trunc_rmin` superseeds the quality cutoff.\n4. The calculations for the values of `--trunclenf` and `--trunclenr` are made before any quality score-based read truncation (using `--truncq`) is performed.",
"fa_icon": "fas fa-greater-than-equal"
},
"trunc_rmin": {
"type": "number",
"default": 0.75,
"description": "Assures that values chosen with --trunc_qmin will retain a fraction of reads.",
"help_text": "Value can range from 0 to 1. 0 means no reads need to be retained and 1 means all reads need to be retained. The minimum lengths of --trunc_qmin and --trunc_rmin are chosen as DADA2 cutoffs.",
"minimum": 0,
"maximum": 1,
"fa_icon": "fas fa-greater-than-equal"
},
"max_ee": {
"type": "integer",
"default": 2,
"description": "DADA2 read filtering option",
"help_text": "After truncation, reads with higher than `max_ee` \"expected errors\" will be discarded. In case of very long reads, you might want to increase this value. We recommend (to start with) a value corresponding to approximately 1 expected error per 100-200 bp (default: 2)",
"fa_icon": "fas fa-equals"
},
"min_len": {
"type": "integer",
"default": 50,
"description": "DADA2 read filtering option",
"fa_icon": "fas fa-greater-than-equal",
"help_text": "Remove reads with length less than `min_len` after trimming and truncation."
},
"max_len": {
"type": "integer",
"description": "DADA2 read filtering option",
"fa_icon": "fas fa-less-than-equal",
"help_text": "Remove reads with length greater than `max_len` after trimming and truncation. Must be a positive integer."
},
"ignore_failed_filtering": {
"type": "boolean",
"description": "Ignore files with too few reads after quality filtering.",
"help_text": "Ignore files with fewer reads than specified by `--min_read_counts` after trimming and continue the pipeline without those samples. Please review all quality trimming and filtering options before using this parameter. For example, one sample with shorter sequences than other samples might loose all sequences due to minimum length requirements by read truncation (see --trunclenf).",
"fa_icon": "fas fa-arrow-right"
}
},
"fa_icon": "fas fa-ban"
},
"amplicon_sequence_variants_asv_calculation": {
"title": "Amplicon Sequence Variants (ASV) calculation",
"type": "object",
"default": "",
"properties": {
"sample_inference": {
"type": "string",
"default": "independent",
"help_text": "If samples are treated independent (lowest sensitivity and lowest resources), pooled (highest sensitivity and resources) or pseudo-pooled (balance between required resources and sensitivity).",
"description": "Mode of sample inference: \"independent\", \"pooled\" or \"pseudo\"",
"enum": ["independent", "pooled", "pseudo"]
},
"mergepairs_strategy": {
"type": "string",
"default": "merge",
"description": "Strategy to merge paired end reads. When paired end reads are not sufficiently overlapping for merging, you can use \"concatenate\" (not recommended). When you have a mix of overlapping and non overlapping reads use \"consensus\"",
"help_text": "This parameters specifies how paired-end reads are merged after denoising. By default, read pairs will be merged by overlap. Concatenating read pairs (separated by 10 N's) is an alternative to only analyzing the forward or reverse read in case of non-overlapping paired-end sequencing data, this is not recommended, only if all other options fail. The consensus strategy is merging read pairs by overlap if possible and concatenates non-overlapping read pairs, based on `--mergepairs_consensus_*` parameters.",
"enum": ["merge", "concatenate", "consensus"]
},
"mergepairs_consensus_match": {
"type": "integer",
"default": 1,
"description": "The score assigned for each matching base pair during sequence alignment.",
"help_text": "This parameter specifies the numerical value added to the alignment score for every pair of bases that match between the forward and reverse reads. A higher value increases the preference for alignments with more matching bases."
},
"mergepairs_consensus_mismatch": {
"type": "integer",
"default": -2,
"description": "The penalty score assigned for each mismatched base pair during sequence alignment.",
"help_text": "This parameter defines the numerical penalty subtracted from the alignment score for each base pair mismatch between the forward and reverse reads. A higher penalty reduces the likelihood of accepting alignments with mismatches."
},
"mergepairs_consensus_gap": {
"type": "integer",
"default": -4,
"description": "The penalty score assigned for each gap introduced during sequence alignment.",
"help_text": "This parameter sets the numerical penalty subtracted from the alignment score for each gap (insertion or deletion) introduced to align the forward and reverse reads. A higher penalty discourages alignments that require gaps."
},
"mergepairs_consensus_minoverlap": {
"type": "integer",
"default": 12,
"description": "The minimum number of overlapping base pairs required to merge forward and reverse reads.",
"help_text": "This parameter specifies the smallest number of consecutive base pairs that must overlap between the forward and reverse reads for them to be merged. Ensuring sufficient overlap is crucial for accurate merging."
},
"mergepairs_consensus_maxmismatch": {
"type": "integer",
"default": 0,
"description": "The maximum number of mismatches allowed within the overlapping region for merging reads.",
"help_text": "This parameter defines the highest number of mismatched base pairs permitted in the overlap region between forward and reverse reads for a successful merge. Setting this value helps control the stringency of read merging, balancing between sensitivity and accuracy."
},
"mergepairs_consensus_percentile_cutoff": {
"type": "number",
"default": 0.001,
"description": "The percentile used to determine a stringent cutoff which will correspond to the minimum observed overlap in the dataset. This ensures that only read pairs with high overlap are merged into consensus sequences. Those with insufficient overlap are concatenated."
}
},
"fa_icon": "fas fa-braille"
},
"asv_post_processing": {
"title": "ASV post processing",
"type": "object",
"description": "ASV post-processing takes place after ASV computation but before taxonomic assignment, it will affect all downstream processes",
"default": "",
"properties": {
"vsearch_cluster": {
"type": "boolean",
"description": "Post-cluster ASVs with VSEARCH",
"help_text": "ASVs will be clustered with VSEARCH using the id value found in `--vsearch_cluster_id`."
},
"vsearch_cluster_id": {
"type": "number",
"default": 0.97,
"minimum": 0,
"maximum": 1,
"description": "Pairwise Identity value used when post-clustering ASVs if `--vsearch_cluster` option is used (default: 0.97).",
"help_text": "Lowering or increasing this value can change the number ASVs left over after clustering."
},
"raise_filter_stacksize": {
"type": "boolean",
"default": true,
"fa_icon": "fas fa-angle-double-up",
"description": "Raise stack size when filtering VSEARCH clusters",
"help_text": "Setting to true adds 'ulimit -s unlimited' to the beginning of the filt_clusters.py command."
},
"decontam": {
"type": "string",
"description": "Choose whether decontamination with `decontam` is applied to features.",
"help_text": "`decontaminate` assumes that features are **not** contaminants and requires sufficient positive proof a feature is a contaminant before calling it so. Only works with at least one of `control` and/or `quant_reading` columns in the sample sheet. `notcontaminant` assumes that features **are** contaminants and requires sufficient proof a feature is not a contaminant before calling it so. Only works with `control` column in the sample sheet.",
"default": "none",
"enum": ["none", "decontaminate", "notcontaminant"]
},
"decontam_decontaminate_method": {
"type": "string",
"description": "Choose the decontamination method for `--decontam decontaminate`.",
"help_text": "A description of the options can be found at https://rdrr.io/bioc/decontam/man/isContaminant.html, using 'auto' is using 'frequency', 'prevalence', or 'combined' where appropriate. Be aware that using 'frequency' while also having control samples can violate Decontam's assumption of similar bacterial biomass across samples.",
"default": "auto",
"enum": ["auto", "frequency", "prevalence", "combined", "minimum", "either", "both"]
},
"decontam_decontaminate_threshold": {
"type": "number",
"default": 0.1,
"minimum": 0,
"maximum": 1,
"description": "Choose the contamination likelihood threshold for `--decontam decontaminate`."
},
"decontam_notcontaminant_threshold": {
"type": "number",
"default": 0.5,
"minimum": 0,
"maximum": 1,
"description": "Choose the non-contaminant likelihood threshold for `--decontam notcontaminant`."
},
"filter_ssu": {
"type": "string",
"description": "Enable SSU filtering. Comma separated list of kingdoms (domains) in Barrnap, a combination (or one) of \"bac\", \"arc\", \"mito\", and \"euk\". ASVs that have their lowest evalue in that kingdoms are kept.",
"enum": [
"bac,arc,mito,euk",
"bac",
"arc",
"mito",
"euk",
"bac,arc",
"bac,mito",
"bac,euk",
"arc,mito",
"arc,euk",
"mito,euk",
"bac,arc,mito",
"bac,mito,euk",
"arc,mito,euk"
]
},
"min_len_asv": {
"type": "integer",
"description": "Minimal ASV length",
"help_text": "Remove ASV that are below the minimum length threshold (default: filter is disabled, otherwise 1). Increasing the threshold might reduce false positive ASVs (e.g. PCR off-targets)."
},
"max_len_asv": {
"type": "integer",
"description": "Maximum ASV length",
"help_text": "Remove ASV that are above the maximum length threshold (default: filter is disabled, otherwise 1000000). Lowering the threshold might reduce false positive ASVs (e.g. PCR off-targets)."
},
"filter_codons": {
"type": "boolean",
"description": "Filter ASVs based on codon usage",
"help_text": "ASVs will be filtered to contain no stop codon in their coding sequence and that their length is a multiple of 3."
},
"orf_start": {
"type": "integer",
"default": 1,
"description": "Starting position of codon tripletts",
"help_text": "By default, when `--filter_codons` is set, the codons start from the first position of the ASV sequences. The start of the codons can be changed to any position."
},
"orf_end": {
"type": "integer",
"description": "Ending position of codon tripletts",
"help_text": "By default, when `--filter_codons` is set, the codons are checked until the end of the ASV sequences. If you would like to change this setting, you can specify until which position of the ASV sequences the codon triplets are checked.\n\nPlease note that the length of the ASV from the beginning or from the `--orf_start` until this position must be a multiple of 3."
},
"stop_codons": {
"type": "string",
"default": "TAA,TAG",
"description": "Define stop codons",
"help_text": "By default, when `--filter_codons` is set, the codons `TAA,TAG` are set as stop codons. Here you can specify any comma-separated list of codons to be used as stop codons, e.g. `--stop_codons \"TAA,TAG,TGA\"`"
}
},
"fa_icon": "fas fa-filter"
},
"taxonomic_assignment": {
"title": "Taxonomic assignment",
"type": "object",
"fa_icon": "fas fa-database",
"description": "Choose a method and database for taxonomic assignments to single-region amplicons",
"properties": {
"dada_ref_taxonomy": {
"type": "string",
"help_text": "Choose any of the supported databases, and optionally also specify the version. Database and version are separated by an equal sign (`=`, e.g. `silva=138`) . This will download the desired database, format it to produce a file that is compatible with DADA2's assignTaxonomy and another file that is compatible with DADA2's addSpecies.\n\nThe following databases are supported:\n- GTDB - Genome Taxonomy Database - 16S rRNA\n- SBDI-GTDB, a Sativa-vetted version of the GTDB 16S rRNA\n- PR2 - Protist Reference Ribosomal Database - 18S rRNA\n- RDP - Ribosomal Database Project - 16S rRNA\n- SILVA ribosomal RNA gene database project - 16S rRNA\n- UNITE - eukaryotic nuclear ribosomal ITS region - ITS\n- COIDB - eukaryotic Cytochrome Oxidase I (COI) from The Barcode of Life Data System (BOLD) - COI\n\nGenerally, using `gtdb`, `pr2`, `rdp`, `sbdi-gtdb`, `silva`, `coidb`, `unite-fungi`, or `unite-alleuk` will select the most recent supported version.\n\nPlease note that commercial/non-academic entities [require licensing](https://www.arb-silva.de/silva-license-information) for SILVA v132 database (non-default) but not from v138 on (default).",
"description": "Name of supported database, and optionally also version number",
"default": "sbdi-gtdb=R11-RS232-1",
"enum": [
"coidb",
"coidb=221216",
"greengenes2",
"greengenes2=2024.09",
"gtdb",
"gtdb=R05-RS95",
"gtdb=R06-RS202",
"gtdb=R07-RS207",
"gtdb=R08-RS214",
"gtdb=R09-RS220",
"gtdb=R10-RS226",
"gtdb=R11-RS232",
"midori2-co1",
"midori2-co1=gb250",
"pr2",
"pr2=4.13.0",
"pr2=4.14.0",
"pr2=5.0.0",
"pr2=5.1.0",
"rdp",
"rdp=18",
"sbdi-gtdb",
"sbdi-gtdb=R11-RS232-1",
"sbdi-gtdb=R10-RS226-2",
"sbdi-gtdb=R09-RS220-2",
"sbdi-gtdb=R09-RS220-1",
"sbdi-gtdb=R08-RS214-1",
"sbdi-gtdb=R07-RS207-1",
"sbdi-gtdb=R06-RS202-3",
"sbdi-gtdb=R06-RS202-1",
"silva",
"silva=138.2",
"silva=138",
"silva=132",
"unite-alleuk",
"unite-alleuk=10.0",
"unite-alleuk=9.0",
"unite-alleuk=8.3",
"unite-alleuk=8.2",
"unite-fungi",
"unite-fungi=10.0",
"unite-fungi=9.0",
"unite-fungi=8.3",
"unite-fungi=8.2",
"zehr-nifh",
"zehr-nifh=2.5.0"
]
},
"dada_ref_tax_custom": {
"type": "string",
"help_text": "Overwrites `--dada_ref_taxonomy`. Either `--skip_dada_addspecies` (no species annotation) or `--dada_ref_tax_custom_sp` (species annotation) is additionally required. Consider also setting `--dada_assign_taxlevels`.\n\nMust be compatible to DADA2's assignTaxonomy function: 'Can be compressed. This reference fasta file should be formatted so that the id lines correspond to the taxonomy (or classification) of the associated sequence, and each taxonomic level is separated by a semicolon.' See also https://rdrr.io/bioc/dada2/man/assignTaxonomy.html",
"description": "Path to a custom DADA2 reference taxonomy database"
},
"dada_ref_tax_custom_sp": {
"type": "string",
"help_text": "Requires `--dada_ref_tax_custom`. Must be compatible to DADA2's addSpecies function: 'Can be compressed. This reference fasta file should be formatted so that the id lines correspond to the genus-species binomial of the associated sequence.' See also https://rdrr.io/bioc/dada2/man/addSpecies.html",
"description": "Path to a custom DADA2 reference taxonomy database for species assignment"
},
"dada_min_boot": {
"type": "integer",
"default": 50,
"fa_icon": "fas fa-greater-than-equal",
"description": "The minimum bootstrap confidence (out of 100 trials) for assigning a taxonomic level with DADA2. Matches `minBoot` in DADA2's assignTaxonomy method.",
"minimum": 0,
"maximum": 100
},
"dada_assign_taxlevels": {
"type": "string",
"help_text": "Typically useful when providing a custom DADA2 reference taxonomy database with `--dada_ref_tax_custom`. If DADA2's addSpecies is used (default), the last element(s) of the comma separated string must be 'Genus' or 'Genus,Species'.",
"description": "Comma separated list of taxonomic levels used in DADA2's assignTaxonomy function"
},
"cut_dada_ref_taxonomy": {
"type": "boolean",
"help_text": "Expected amplified sequences are extracted from the DADA2 reference taxonomy using the primer sequences, that might improve classification. This is not applied to species classification (assignSpecies) but only for lower taxonomic levels (assignTaxonomy).",
"description": "If the expected amplified sequences are extracted from the DADA2 reference taxonomy database"
},
"dada_addspecies_allowmultiple": {
"type": "boolean",
"help_text": "Defines the behavior when multiple exact matches against different species are returned. By default only unambiguous identifications are returned. If TRUE, a concatenated string of all exactly matched species is returned.",
"description": "If multiple exact matches against different species are returned"
},
"dada_taxonomy_rc": {
"type": "boolean",
"help_text": "Reverse-complement of each sequences will be used for classification if it is a better match to the reference sequences than the forward sequence.",
"description": "If reverse-complement of each sequences will be also tested for classification"
},
"dada_assign_chunksize": {
"type": "integer",
"help_text": "Chunks for DADA2's assignTaxonomy and addSpecies can speed up the process and lower required memory.",
"description": "ASV fasta will be subset into chunks of this size for classification",
"default": 10000
},
"pplace_sheet": {
"type": "string",
"format": "file-path",
"mimetype": "text/tsv",
"pattern": "^\\S+\\.(tsv|csv|yml|yaml|txt)$",
"fa_icon": "fas fa-dna",
"description": "Spreadsheet with phylogenetic placement information. Possible columns: target, alignmethod, hmm, extract_hmm, align_hmm, align_extract_hmm, refseqfile, refphylogeny, model, taxonomy.",
"help_text": "This specifies parameters for phylogenetic placement of sequences onto a reference phylogeny after first using HMMER to search through ASV sequences in contrast to the other `--pplace_*` parameters with which all ASVs are placed in the same reference phylogeny. The file needs to have at least two columns that specifies search HMM profiles ('target' and 'hmm') and at least three columns specifying the reference phylogenies ('refseqfile', 'refphylogeny' and 'model') for placement of search results. (Decoy profiles, i.e. profiles only used for searches, not placements, are allowed. Specify only 'target' and 'hmm' for these.) See [usage docs](https://nf-co.re/ampliseq/usage#multiple-reference-phylogenetic-placement).",
"schema": "assets/schema_pplace_sheet.json"
},
"pplace_tree": {
"type": "string",
"description": "Newick file with reference phylogenetic tree. Requires also `--pplace_aln` and `--pplace_model`."
},
"pplace_aln": {
"type": "string",
"description": "File with reference sequences. Requires also `--pplace_tree` and `--pplace_model`."
},
"pplace_model": {
"type": "string",
"description": "Phylogenetic model to use in placement, e.g. 'LG+F' or 'GTR+I+F'. Requires also `--pplace_tree` and `--pplace_aln`."
},
"pplace_alnmethod": {
"type": "string",
"description": "Method used for alignment, \"clustalo\", \"hmmer\" or \"mafft\"",
"default": "clustalo",
"enum": ["clustalo", "hmmer", "mafft"]
},
"pplace_taxonomy": {
"type": "string",
"help_text": "Headerless, tab-separated, first column with tree leaves, second column with taxonomy ranks separated by semicolon `;`. The results take precedence over DADA2 and QIIME2 classifications.",
"description": "Tab-separated file with taxonomy assignments of reference sequences."
},
"pplace_name": {
"type": "string",
"description": "A name for the run",
"hidden": true
},
"qiime_ref_taxonomy": {
"type": "string",
"help_text": "Choose any of the supported databases, and optionally also specify the version. Database and version are separated by an equal sign (`=`, e.g. `silva=138`) . This will download the desired database and initiate taxonomic classification with QIIME2 and the chosen database.\n\nIf both, `--dada_ref_taxonomy` and `--qiime_ref_taxonomy` are used, DADA2 classification will be used for downstream analysis.\n\nThe following databases are supported:\n- SILVA ribosomal RNA gene database project - 16S rRNA\n- UNITE - eukaryotic nuclear ribosomal ITS region - ITS\n- Greengenes (only testing!)\n\nGenerally, using `silva`, `unite-fungi`, or `unite-alleuk` will select the most recent supported version. For testing purposes, the tiny database `greengenes85` (dereplicated at 85% sequence similarity) is available. For details on what values are valid, please either use an invalid value such as `x` (causing the pipeline to send an error message with all valid values) or see `conf/ref_databases.config`.",
"description": "Name of supported database, and optionally also version number",
"enum": [
"silva=138",
"silva",
"greengenes85",
"greengenes2",
"greengenes2=2024.09",
"greengenes2=2022.10"
]
},
"qiime_ref_tax_custom": {
"type": "string",
"help_text": "Overwrites `--qiime_ref_taxonomy`. Either path to tarball (`*.tar.gz` or `*.tgz`) that contains sequence (`*.fna`) and taxonomy (`*.tax`) data, or alternatively a comma separated pair of filepaths to sequence (`*.fna`) and taxonomy (`*.tax`) data (possibly gzipped `*.gz`).",
"description": "Path to files of a custom QIIME2 reference taxonomy database (tarball, or two comma-separated files)"
},
"classifier": {
"type": "string",
"description": "Path to QIIME2 trained classifier file (typically *-classifier.qza)",
"help_text": "If you have trained a compatible classifier before, from sources such as SILVA (https://www.arb-silva.de/), Greengenes (http://greengenes.secondgenome.com/downloads) or RDP (https://rdp.cme.msu.edu/). \n\nFor example:\n\n```bash\n--classifier \"FW_primer-RV_primer-classifier.qza\"\n```\n\nPlease note the following requirements:\n\n1. The path must be enclosed in quotes\n2. The classifier is a Naive Bayes classifier produced by `qiime feature-classifier fit-classifier-naive-bayes` (e.g. by this pipeline)\n3. The primer pair for the amplicon PCR and the computing of the classifier are exactly the same (or full-length, potentially lower performance)\n4. The classifier has to be trained by the same version of scikit-learn as this version of the pipeline uses"
},
"kraken2_ref_taxonomy": {
"type": "string",
"help_text": "Choose any of the supported databases, and optionally also specify the version. Database and version are separated by an equal sign (`=`, e.g. `silve=138`) . This will download the desired database and initiate taxonomic classification with Kraken2 and the chosen database.\n\nConsider using `--kraken2_confidence` to set a confidence score threshold.\n\nThe following databases are supported:\n- RDP - Ribosomal Database Project - 16S rRNA\n- SILVA ribosomal RNA gene database project - 16S rRNA\n- Greengenes - 16S rRNA\n- Standard Kraken2 database (RefSeq archaea, bacteria, viral, plasmid, human, UniVec_Core) - any amplicon\n\nGenerally, using `rdp`, `silva`, `greengenes`, `standard` will select the most recent supported version.\n\nPlease note that commercial/non-academic entities [require licensing](https://www.arb-silva.de/silva-license-information) for SILVA v132 database (non-default) but not from v138 on.",
"description": "Name of supported database, and optionally also version number",
"enum": [
"silva",
"silva=138",
"silva=132",
"rdp",
"rdp=18",
"greengenes",
"greengenes=13.5",
"standard",
"standard=20240904",
"standard=20230605"
]
},
"kraken2_ref_tax_custom": {
"type": "string",
"help_text": "Overwrites `--kraken2_ref_taxonomy`. Consider also setting `--kraken2_assign_taxlevels`. Can be compressed tar archive (.tar.gz|.tgz) or folder containing the database. See also https://benlangmead.github.io/aws-indexes/k2.",
"description": "Path to a custom Kraken2 reference taxonomy database (*.tar.gz|*.tgz archive or folder)"
},
"kraken2_assign_taxlevels": {
"type": "string",
"help_text": "Typically useful when providing a custom Kraken2 reference taxonomy database with `--kraken2_ref_tax_custom`. In case a database is given with `--kraken2_ref_taxonomy`, the default taxonomic levels will be overwritten with `--kraken2_assign_taxlevels`.",
"description": "Comma separated list of taxonomic levels used in Kraken2. Will overwrite default values."
},
"kraken2_confidence": {
"type": "number",
"default": 0.0,
"help_text": "Increasing the threshold will require more k-mers to match at a taxonomic levels and reduce the taxonomic levels shown until the threshold is met.",
"description": "Confidence score threshold for taxonomic classification.",
"minimum": 0,
"maximum": 1
},
"sintax_ref_taxonomy": {
"type": "string",
"help_text": "Choose any of the supported databases, and optionally also specify the version. Database and version are separated by an equal sign (`=`, e.g. `coidb=221216`) . This will download the desired database and initiate taxonomic classification with VSEARCH sintax and the chosen database, which if needed is formatted to produce a file that is compatible with VSEARCH sintax.\n\nThe following databases are supported:\n- COIDB - eukaryotic Cytochrome Oxidase I (COI) from The Barcode of Life Data System (BOLD) - COI\n- UNITE - eukaryotic nuclear ribosomal ITS region - ITS\n\nGenerally, using `coidb`, `unite-fungi`, or `unite-alleuk` will select the most recent supported version.\n\nCannot be used together with `--sintax_ref_tax_custom`.",
"description": "Name of supported database, and optionally also version number",
"enum": [
"coidb",
"coidb=221216",
"unite-fungi",
"unite-fungi=10.0",
"unite-fungi=9.0",
"unite-fungi=8.3",
"unite-fungi=8.2",
"unite-alleuk",
"unite-alleuk=10.0",
"unite-alleuk=9.0",
"unite-alleuk=8.3",
"unite-alleuk=8.2"
]
},
"sintax_ref_tax_custom": {
"type": "string",
"help_text": "Overwrites `--sintax_ref_taxonomy`. Requires `--sintax_assign_taxlevels`. Plain fasta or gzip-compressed (`.gz`) is accepted. Cannot be used with `--sbdiexport`.",
"description": "Path to a custom SINTAX reference database (fasta)"
},
"sintax_assign_taxlevels": {
"type": "string",
"help_text": "Comma separated list of taxonomic ranks matching the labels in the custom database. Required with `--sintax_ref_tax_custom`. For example: `Phylum,Class,Order,Family,Genus,Species`.",
"description": "Comma separated list of taxonomic levels used in SINTAX with a custom reference database"
},
"vsearch_lca_ref_tax_custom": {
"type": "string",
"help_text": "Path to a custom VSEARCH LCA reference database. Requires `--vsearch_lca_assign_taxlevels`. Expected format: SINTAX-compatible FASTA with semicolon-separated taxonomy labels in sequence headers (same as `--sintax_ref_tax_custom`); plain FASTA or gzip-compressed (`.gz`) is accepted. Cannot be used together with `--vsearch_lca_ref_taxonomy`.",
"description": "Path to a custom VSEARCH LCA reference database (SINTAX-compatible FASTA)"
},
"vsearch_lca_ref_taxonomy": {
"type": "string",
"help_text": "Choose one of the supported built-in databases for VSEARCH LCA. Database and version are separated by an equal sign (`=`, e.g. `midori2-co1=gb270`). This downloads the selected reference and runs VSEARCH usearch_global with `--lcaout`.\n\nThe following built-in databases are currently supported:\n- COIDB - eukaryotic Cytochrome Oxidase I (COI) from BOLD\n- MIDORI2-CO1 - metazoan COI reference sequences in SINTAX format (default: GenBank270 release on reference-midori.info)\n- UNITE (`unite-fungi`, `unite-alleuk`, and pinned versions) - eukaryotic nuclear ribosomal ITS in USEARCH/UTAX (SINTAX-compatible) format\n\nUsing `unite-fungi` or `unite-alleuk` selects the most recent supported release. Cannot be used together with `--vsearch_lca_ref_tax_custom`.",
"description": "Name of built-in VSEARCH LCA reference database, and optionally also version number",
"enum": [
"coidb",
"coidb=221216",
"midori2-co1",
"midori2-co1=gb270",
"unite-fungi",
"unite-fungi=10.0",
"unite-fungi=9.0",
"unite-fungi=8.3",
"unite-fungi=8.2",
"unite-alleuk",
"unite-alleuk=10.0",
"unite-alleuk=9.0",
"unite-alleuk=8.3",
"unite-alleuk=8.2"
]
},
"vsearch_lca_assign_taxlevels": {
"type": "string",
"help_text": "Comma separated list of taxonomic ranks matching the labels in the custom database. Required with `--vsearch_lca_ref_tax_custom`. For example: `Phylum,Class,Order,Family,Genus,Species`.",
"description": "Comma separated list of taxonomic levels for VSEARCH LCA with a custom reference database"
},
"vsearch_lca_id": {
"type": "number",
"default": 0.9,
"minimum": 0,
"maximum": 1,
"description": "VSEARCH usearch_global identity cutoff (`--id`) used for VSEARCH LCA."
},
"vsearch_lca_maxaccepts": {
"type": "integer",
"default": 0,
"minimum": 0,
"description": "Maximum number of target sequences to accept per query for VSEARCH LCA (`--maxaccepts`)."
},
"vsearch_lca_maxrejects": {
"type": "integer",
"default": 0,
"minimum": 0,
"description": "Maximum number of non-matching target sequences to reject per query for VSEARCH LCA (`--maxrejects`)."
},
"vsearch_lca_lca_cutoff": {
"type": "number",
"default": 0.9,
"minimum": 0.5,
"maximum": 1,
"description": "LCA support threshold for VSEARCH LCA (`--lca_cutoff`)."
},
"vsearch_lca_query_cov": {
"type": "number",
"default": 1.0,
"minimum": 0,
"maximum": 1,
"description": "Minimum fraction of the query that must align to a target for VSEARCH LCA (`--query_cov`)."
},
"addsh": {
"type": "boolean",
"description": "If ASVs should be assigned to UNITE species hypotheses (SHs). Only relevant for ITS data."
},
"cut_its": {
"type": "string",
"help_text": "If data is long read ITS sequences, that need to be cut to ITS region (full ITS, only ITS1, or only ITS2) for taxonomy assignment.",
"description": "Part of ITS region to use for taxonomy assignment: \"full\", \"its1\", or \"its2\"",
"default": "none",
"enum": ["none", "full", "its1", "its2"]
},
"its_partial": {
"type": "integer",
"help_text": "If using cut_its, this option allows partial ITS sequences, longer than the specified cutoff.",
"description": "Cutoff for partial ITS sequences. Only full sequences by default.",
"default": 0
},
"its_extractor": {
"type": "string",
"description": "Tool for ITS region extraction: \"itsx\" or \"itsxrust\".",
"help_text": "Choose the tool used for ITS extraction when --cut_its is set. ITSxRust is a Rust-based alternative that is faster and optimized for long-read (ONT, PacBio HiFi) data. Both tools use the same HMM profiles. Default: itsx for backward compatibility.",
"default": "itsx",
"enum": ["itsx", "itsxrust"]
}
}
},
"multiregion_taxonomic_database": {
"title": "Multi-region taxonomic database",
"type": "object",
"fa_icon": "fas fa-database",
"description": "Choose database for taxonomic assignments with multi-region amplicons using SIDLE",
"properties": {
"sidle_ref_taxonomy": {
"type": "string",
"help_text": "",
"description": "Name of supported database, and optionally also version number",
"enum": ["silva", "silva=128", "greengenes", "greengenes=13_8", "greengenes88"]
},
"sidle_ref_tax_custom": {
"type": "string",
"help_text": "Use with `--sidle_ref_seq_custom`. Consider also setting `--sidle_ref_aln_custom` and `--sidle_ref_tree_custom`. The taxonomy file must be headerless and follow the format of `qiime tools import`'s `HeaderlessTSVTaxonomyFormat`. Example usage: `--sidle_ref_tax_custom 'taxonomy_99_taxonomy.txt'`",
"description": "Path to reference taxonomy strings (headerless, *.txt)",
"pattern": "^.*\\.txt$"
},
"sidle_ref_seq_custom": {
"type": "string",
"help_text": "Use with `--sidle_ref_tax_custom`. Example usage: `--sidle_ref_seq_custom 'rep_set_99.fasta'`",
"description": "Path to reference taxonomy sequences in fasta format",
"pattern": "^.*\\.(fasta|fas|fna|fa|ffn)$"
},
"sidle_ref_aln_custom": {
"type": "string",
"help_text": "May be used with `--sidle_ref_tax_custom`. Allows sequence reconstruction. Example usage: `--sidle_ref_aln_custom 'rep_set_aligned_99.fasta'`",
"description": "Path to multiple sequence alignment of reference taxonomy sequences in fasta format",
"pattern": "^.*\\.(fasta|fas|fna|fa|ffn)$"
},
"sidle_ref_tree_custom": {
"type": "string",
"help_text": "Use with `--sidle_ref_aln_custom`. Recommended with `--sidle_ref_tax_custom`. Allows phylogenetic tree reconstruction and therefore diversity analysis. Overwrites tree chosen by `--sidle_ref_taxonomy`",
"description": "Path to SIDLE reference taxonomy tree (*.qza)",
"pattern": "^.*\\.qza$"
},
"sidle_ref_degenerates": {
"type": "integer",
"default": 5,
"min": 0,
"help_text": "Only effective with `--sidle_ref_tax_custom`. Recommended with `--sidle_ref_tax_custom`, default 5 was recommended with SILVA 128. Sets `--p-num-degenerates` for `qiime rescript cull-seqs`.",
"description": "Exclude reference sequences with more than this much degenerates"
},
"sidle_ref_cleaning": {
"type": "string",
"help_text": "Recommended with `--sidle_ref_tax_custom`, default '--p-database silva' was recommended with SILVA 128. Therefore, ad-hoc database cleaning will be performed automatically, specifically with regard to the `define-missing` and `ambiguity-handling` parameters. Overwrites recommended settings for database chosen by `--sidle_ref_taxonomy`",
"description": "Arguments for `qiime sidle reconstruct-taxonomy` regarding ad-hoc cleaning"
}
}
},
"asv_filtering": {
"title": "ASV filtering",
"type": "object",
"default": "",
"description": "Filtering by taxonomy or abundance will affect all downstream analysis",
"fa_icon": "fas fa-filter",
"properties": {
"exclude_taxa": {
"type": "string",
"default": "mitochondria,chloroplast",
"description": "Comma separated list of unwanted taxa, to skip taxa filtering use \"none\"",
"help_text": "Depending on the primers used, PCR might amplify unwanted or off-target DNA. By default sequences originating from mitochondria or chloroplasts are removed. The taxa specified are excluded from further analysis.\nFor example to exclude any taxa that contain mitochondria, chloroplast, or archaea:\n\n```bash\n--exclude_taxa \"mitochondria,chloroplast,archaea\"\n```\n\nIf you prefer not filtering the data, specify:\n\n```bash\n--exclude_taxa \"none\"\n```\n\nPlease note the following requirements:\n\n1. Comma separated list enclosed in quotes\n2. May not contain whitespace characters\n3. Features that contain one or several of these terms in their taxonomical classification are excluded from further analysis\n4. The taxonomy level is not taken into consideration\n5. Taxa names should be as in Taxonomic database (Default: Silva138), example: 'Bacteria', 'Armatimonadia', 'unidentified', 'p__'\n6. Taxon names are case-insensitive and partial match is possible."
},
"min_frequency": {
"type": "integer",
"default": 1,
"description": "Abundance filtering",
"help_text": "Remove entries from the feature table below an absolute abundance threshold (default: 1, meaning filter is disabled). Singletons are often regarded as artifacts, choosing a value of 2 removes sequences with less than 2 total counts from the feature table.\n\nFor example to remove singletons choose:\n\n```bash\n--min_frequency 2\n```"
},
"min_samples": {
"type": "integer",
"default": 1,
"description": "Prevalence filtering",
"help_text": "Filtering low prevalent features from the feature table, e.g. keeping only features that are present in at least two samples can be achived by choosing a value of 2 (default: 1, meaning filter is disabled). Typically only used when having replicates for all samples.\n\nFor example to retain features that are present in at least two sample:\n\n```bash\n--min_samples 2\n```\n\nPlease note this is independent of abundance."
}
}
},
"downstream_analysis": {
"title": "Downstream analysis",
"type": "object",
"description": "Metadata is used here to visualize data either for quality control or publication ready figures",
"default": "",
"fa_icon": "fas fa-bacteria",
"properties": {
"metadata_category": {
"type": "string",
"description": "Comma separated list of metadata column headers for statistics.",
"help_text": "Here columns in the metadata sheet can be chosen with groupings that are used for diversity indices and differential abundance analysis. By default, all suitable columns in the metadata sheet will be used if this option is not specified. Suitable are columns which are categorical (not numerical) and have multiple different values which are not all unique. For example:\n\n```bash\n--metadata_category \"treatment1,treatment2\"\n```\n\nPlease note the following requirements:\n\n1. Comma separated list enclosed in quotes\n2. May not contain whitespace characters\n3. Each comma separated term has to match exactly one column name in the metadata sheet"
},
"metadata_category_barplot": {
"type": "string",
"description": "Comma separated list of metadata column headers for plotting average relative abundance barplots.",
"help_text": "Here columns in the metadata sheet can be chosen with groupings that are used for average relative abundance barplots. Samples that have empty fields for that column are discarded. For example:\n\n```bash\n--metadata_category_barplot \"treatment1,treatment2\"\n```\n\nPlease note the following requirements:\n\n1. Comma separated list enclosed in quotes\n2. May not contain whitespace characters\n3. Each comma separated term has to match exactly one column name in the metadata sheet"
},
"qiime_adonis_formula": {
"type": "string",
"description": "Formula for QIIME2 ADONIS metadata feature importance test for beta diversity distances",
"help_text": "Comma separated list of model formula(s), e.g. \"treatment1,treatment2\". Model formula should contain only independent terms in the sample metadata. These can be continuous variables or factors, and they can have interactions as in a typical R formula. Essentially, columns in the metadata sheet can be chosen that have no empty values, not only unique values, or not only identical values.\nFor example, \"treatment1+treatment2\" tests whether the data partitions based on \"treatment1\" and \"treatment2\" sample metadata. \"treatment1*treatment2\" test both of those effects as well as their interaction.\nMore examples can be found in the R documentation, https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Formulae-for-statistical-models"
},
"picrust": {
"type": "boolean",
"description": "If the functional potential of the bacterial community is predicted."
},
"sbdiexport": {
"type": "boolean",
"description": "If data should be exported in SBDI (Swedish biodiversity infrastructure) Excel format."
},
"diversity_rarefaction_depth": {
"type": "integer",
"default": 500,
"description": "Minimum rarefaction depth for diversity analysis. Any sample below that threshold will be removed.",
"fa_icon": "fas fa-greater-than-equal"
},
"tax_agglom_min": {
"type": "integer",
"default": 2,
"description": "Minimum taxonomy agglomeration level for taxonomic classifications",
"fa_icon": "fas fa-greater-than-equal",
"help_text": "Depends on the reference taxonomy database used."
},
"tax_agglom_max": {
"type": "integer",
"default": 6,
"description": "Maximum taxonomy agglomeration level for taxonomic classifications",
"fa_icon": "fas fa-greater-than-equal",
"help_text": "Depends on the reference taxonomy database used. Most default databases have genus level at 6."
}
}
},
"differential_abundance_analysis": {
"title": "Differential abundance analysis",
"type": "object",
"description": "Differential abundance analysis relies on provided metadata",
"default": "",
"fa_icon": "fas fa-bacteria",
"properties": {
"ancom_sample_min_count": {
"type": "integer",
"default": 1,
"description": "Minimum sample counts to retain a sample for ANCOM analysis. Any sample below that threshold will be removed.",
"fa_icon": "fas fa-greater-than-equal"
},
"ancom": {
"type": "boolean",
"description": "Perform differential abundance analysis with ANCOM",
"fa_icon": "fas fa-greater-than-equal"
},
"ancombc": {
"type": "boolean",
"description": "Perform differential abundance analysis with ANCOMBC",
"help_text": "ANCOMBC will be performed on all suitable columns in the metadata sheet. Empty values will be removed, therefore it is possible to perform tests on subsets. The reference level will default to highest alphanumeric group (e.g. in alphabetical or numeric order, as applicable) within each metadata column. Formula for specific tests can be supplied with `--ancombc_formula`.",
"fa_icon": "fas fa-greater-than-equal"
},
"ancombc_formula": {
"type": "string",
"description": "Formula to perform differential abundance analysis with ANCOMBC",
"help_text": "Comma separated list of model formula(s), e.g. \"treatment1,treatment2\". The reference level will default to highest alphanumeric group (e.g. in alphabetical or numeric order, as applicable) within each formula term. The reference level can be overwritten by `--ancombc_formula_reflvl`. Model formula should contain only independent terms in the sample metadata. These can be continuous variables or factors, and they can have interactions as in a typical R formula. Essentially, columns in the metadata sheet can be chosen that have no empty values, not only unique values, or not only identical values.\nFor example, \"treatment1+treatment2\" tests whether the data partitions based on \"treatment1\" and \"treatment2\" sample metadata. \"treatment1*treatment2\" test both of those effects as well as their interaction.\nMore examples can be found in the R documentation, https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Formulae-for-statistical-models",
"fa_icon": "fas fa-greater-than-equal"
},
"ancombc_formula_reflvl": {
"type": "string",
"description": "Reference level for `--ancombc_formula`",
"help_text": "This will only affect ANCOM-BC started by `--ancombc_formula`, but for all provided model formula, therefore it might be best to restrict `--ancombc_formula` to one formula. The syntax is as follows: 'column_name::column_value' or for multiple 'column_name1::column_value1 column_name2::column_value2'",
"fa_icon": "fas fa-greater-than-equal"
},
"ancombc_effect_size": {
"type": "number",
"default": 1.0,
"minimum": 0,
"description": "Effect size threshold for differential abundance barplot for `--ancombc` and `--ancombc_formula`",
"fa_icon": "fas fa-greater-than-equal"
},
"ancombc_significance": {
"type": "number",
"default": 0.05,
"minimum": 0,
"maximum": 1,
"description": "Significance threshold for differential abundance barplot for `--ancombc` and `--ancombc_formula`",
"fa_icon": "fas fa-greater-than-equal"
},
"ancombc2": {
"type": "boolean",
"description": "Perform differential abundance analysis with ANCOMBC2",
"help_text": "ANCOMBC2 will be performed on all suitable columns in the metadata sheet. Empty values will be removed, so it is possible to perform tests on subsets. The reference level defaults to the highest alphanumeric group (e.g. in alphabetical or numeric order, as applicable) within each metadata column. Formulas for specific tests can be supplied with `--ancombc2_formula`.",
"fa_icon": "fas fa-greater-than-equal"
},
"ancombc2_formula": {
"type": "string",
"description": "Formula to perform differential abundance analysis with ANCOMBC2",
"help_text": "Comma-separated list of model formulas, e.g. \"treatment1,treatment2\". The reference level defaults to the highest alphanumeric group (e.g. in alphabetical or numeric order, as applicable) within each formula term. The reference level can be overwritten with `--ancombc2_formula_reflvl`. Model formulas should contain only independent terms from the sample metadata. These can be continuous variables or factors, and they can have interactions as in a typical R formula. Essentially, any column in the metadata sheet can be chosen that has no empty values and whose values are neither all unique nor all identical.\nFor example, \"treatment1+treatment2\" tests whether the data partitions based on the \"treatment1\" and \"treatment2\" sample metadata. \"treatment1*treatment2\" tests both of those effects as well as their interaction.\nMore examples can be found in the R documentation, https://cran.r-project.org/doc/manuals/r-release/R-intro.html#Formulae-for-statistical-models",
"fa_icon": "fas fa-greater-than-equal"
},
"ancombc2_formula_reflvl": {
"type": "string",
"description": "Reference level for `--ancombc2_formula`",
"help_text": "This only affects ANCOMBC2 runs started by `--ancombc2_formula`, but it applies to all provided model formulas, so it might be best to restrict `--ancombc2_formula` to one formula. The syntax is 'column_name::column_value', or for multiple reference levels 'column_name1::column_value1 column_name2::column_value2'",
"fa_icon": "fas fa-greater-than-equal"
}
}
},
"pipeline_report": {
"title": "Pipeline summary report",
"type": "object",
"description": "Customization of the pipeline report",
"default": "",
"properties": {
"report_template": {
"type": "string",
"default": "${projectDir}/assets/report_template.Rmd",
"description": "Path to R Markdown file (Rmd)"
},
"report_css": {
"type": "string",
"default": "${projectDir}/assets/nf-core_style.css",
"description": "Path to style file (css)"
},
"report_logo": {
"type": "string",
"default": "${projectDir}/assets/nf-core-ampliseq_logo_light_long.png",
"description": "Path to logo file (png)"
},
"report_title": {
"type": "string",
"default": "Summary of analysis results",
"description": "String used as report title"
},
"report_abstract": {
"type": "string",
"description": "Path to Markdown file (md) that replaces the 'Abstract' section"
}
},
"fa_icon": "fas fa-book-open"