26 changes: 13 additions & 13 deletions .github/CONTRIBUTING.md
@@ -49,26 +49,26 @@ cd XAVIER

- Install the python dependencies with pip

```sh
pip install .
```
```sh
pip install .
```

If you're developing on biowulf, you can use our shared conda environment which already has these dependencies installed
If you're developing on biowulf, you can use our shared conda environment which already has these dependencies installed

```sh
. "/data/CCBR_Pipeliner/db/PipeDB/Conda/etc/profile.d/conda.sh"
conda activate py311
```
```sh
. "/data/CCBR_Pipeliner/db/PipeDB/Conda/etc/profile.d/conda.sh"
conda activate py311
```

- Install [`pre-commit`](https://pre-commit.com/#install) if you don't already
have it. Then from the repo's root directory, run

```sh
pre-commit install
```
```sh
pre-commit install
```

This will install the repo's pre-commit hooks.
You'll only need to do this step the first time you clone the repo.
This will install the repo's pre-commit hooks.
You'll only need to do this step the first time you clone the repo.

### Create a branch

24 changes: 12 additions & 12 deletions .github/copilot-instructions.md
@@ -148,17 +148,17 @@ Example:

```json
{
"Insert CRAFT prompt": {
"prefix": "craft",
"body": [
"/* C: Context: Repo=${workspaceFolderBasename}; bioinformatics pipelines; NIH HPC (Biowulf/Helix); containers: quay.io/ccbr */",
"/* R: Rules: no PHI, no secrets, containerize, pin versions, follow style */",
"/* F: Flow: inputs/ -> results/, conf/, tests/ */",
"/* T: Tests: provide a one-line TEST_CMD and expected output */",
"",
"A: $1"
],
"description": "Insert CRAFT prompt and place cursor at Actions"
}
"Insert CRAFT prompt": {
"prefix": "craft",
"body": [
"/* C: Context: Repo=${workspaceFolderBasename}; bioinformatics pipelines; NIH HPC (Biowulf/Helix); containers: quay.io/ccbr */",
"/* R: Rules: no PHI, no secrets, containerize, pin versions, follow style */",
"/* F: Flow: inputs/ -> results/, conf/, tests/ */",
"/* T: Tests: provide a one-line TEST_CMD and expected output */",
"",
"A: $1"
],
"description": "Insert CRAFT prompt and place cursor at Actions"
}
}
```
14 changes: 7 additions & 7 deletions .pre-commit-config.yaml
@@ -8,15 +8,15 @@ exclude: |
)
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v1.2.3
rev: v6.0.0
hooks:
- id: check-added-large-files
- id: end-of-file-fixer
- id: trailing-whitespace
- id: check-json
# spell check
- repo: https://github.com/codespell-project/codespell
rev: v2.2.4
rev: v2.4.2
hooks:
- id: codespell
# https://github.com/codespell-project/codespell/issues/1498
@@ -26,23 +26,23 @@ repos:
)$
args: [-I, .github/WORDLIST]
# Python formatting
- repo: https://github.com/psf/black
rev: 23.7.0
- repo: https://github.com/psf/black-pre-commit-mirror
rev: 26.5.1
hooks:
- id: black
# R formatting
- repo: https://github.com/lorenzwalthert/precommit
rev: v0.1.2
rev: v0.4.3.9025
hooks:
- id: style-files
# general linting
- repo: https://github.com/pre-commit/mirrors-prettier
rev: v2.7.1
rev: v4.0.0-alpha.8
hooks:
- id: prettier
# enforce commit format
- repo: https://github.com/compilerla/conventional-pre-commit
rev: v2.3.0
rev: v4.4.0
hooks:
- id: conventional-pre-commit
stages: [commit-msg]
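
The `conventional-pre-commit` hook above checks commit messages at the `commit-msg` stage. As a rough illustration of the kind of header it accepts, here is a minimal Python sketch. It is a simplified approximation and not the hook's own implementation: the `is_conventional` helper, the type list, and the regular expression are assumptions based on the common Conventional Commits types.

```python
import re

# Assumption: a simplified pattern for a Conventional Commits header,
# "<type>(<optional scope>)<optional !>: <description>".
# The real hook may accept more (or different) types and options.
TYPES = ("build", "chore", "ci", "docs", "feat", "fix",
         "perf", "refactor", "revert", "style", "test")
HEADER = re.compile(
    r"^(?:" + "|".join(TYPES) + r")(?:\([\w\-./]+\))?!?: \S.*$"
)


def is_conventional(message: str) -> bool:
    """Check only the first line of a commit message against the pattern."""
    lines = message.splitlines()
    return bool(lines) and HEADER.match(lines[0]) is not None


for msg in ("feat: add mm10 support", "fix(cli)!: handle empty input", "updated stuff"):
    print(f"{msg!r}: {is_conventional(msg)}")
```

Running the hook itself (`pre-commit install` followed by a normal `git commit`) remains the authoritative check; the sketch is only meant to show the expected message shape.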
2 changes: 1 addition & 1 deletion README.md
@@ -125,5 +125,5 @@ Want to **contribute** to this project? Check out the [contributing guidelines](

## References

<sup>**1.** Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.</sup>
<sup>**1.** Kurtzer GM, Sochat V, Bauer MW (2017). Singularity: Scientific containers for mobility of compute. PLoS ONE 12(5): e0177459.</sup>
<sup>**2.** Koster, J. and S. Rahmann (2018). "Snakemake-a scalable bioinformatics workflow engine." Bioinformatics 34(20): 3600.</sup>
19 changes: 19 additions & 0 deletions docs/pipeline-details/methods.md
@@ -39,22 +39,41 @@ Job execution and management is done using Snakemake (v. 6.8.2)[^20] using custo
## References

[^1]: Bolger, A.M., M. Lohse, and B. Usadel, Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014. 30(15): p. 2114-20.

[^2]: Li, H. and R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics, 2009. 25(14): p. 1754-60.

[^3]: Faust, G.G. and I.M. Hall, SAMBLASTER: fast duplicate marking and structural variant read extraction. Bioinformatics, 2014. 30(17): p. 2503-5.

[^4]: Van der Auwera, G.A. and B.D. O'Connor, Genomics in the cloud : using Docker, GATK, and WDL in Terra. First edition. ed. 2020, Sebastopol, CA: O'Reilly Media.

[^5]: Poplin, R., et al., Scaling accurate genetic variant discovery to tens of thousands of samples. bioRxiv, 2018: p. 201178.

[^6]: Cibulskis, K., et al., Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol, 2013. 31(3): p. 213-9.

[^7]: Benjamin, D., et al., Calling Somatic SNVs and Indels with Mutect2. bioRxiv, 2019: p. 861054.

[^8]: Kim, S., et al., Strelka2: fast and accurate calling of germline and somatic variants. Nat Methods, 2018. 15(8): p. 591-594.

[^9]: Lai, Z., et al., VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research. Nucleic Acids Res, 2016. 44(11): p. e108.

[^10]: McLaren, W., et al., The Ensembl Variant Effect Predictor. Genome Biol, 2016. 17(1): p. 122.

[^11]: Memorial Sloan Kettering Cancer Center. vcf2maf. 2013; Available from: https://github.com/mskcc/vcf2maf.

[^12]: Boeva, V., et al., Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics, 2012. 28(3): p. 423-5.

[^13]: Favero, F., et al., Sequenza: allele-specific copy number and mutation profiles from tumor sequencing data. Ann Oncol, 2015. 26(1): p. 64-70.

[^14]: Pedersen, B. somalier: extract informative sites, evaluate relatedness, and perform quality-control on BAM/CRAM/BCF/VCF/GVCF. 2018; Available from: https://github.com/brentp/somalier.

[^15]: Wood, D.E., J. Lu, and B. Langmead, Improved metagenomic analysis with Kraken 2. Genome Biol, 2019. 20(1): p. 257.

[^16]: Wingett, S.W. and S. Andrews, FastQ Screen: A tool for multi-genome mapping and quality control. F1000Res, 2018. 7: p. 1338.

[^17]: Okonechnikov, K., A. Conesa, and F. Garcia-Alcalde, Qualimap 2: advanced multi-sample quality control for high-throughput sequencing data. Bioinformatics, 2016. 32(2): p. 292-4.

[^18]: Cingolani, P., et al., A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin), 2012. 6(2): p. 80-92.

[^19]: Ewels, P., et al., MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 2016. 32(19): p. 3047-8.

[^20]: Koster, J. and S. Rahmann, Snakemake-a scalable bioinformatics workflow engine. Bioinformatics, 2018. 34(20): p. 3600.
20 changes: 10 additions & 10 deletions docs/release-guide.md
@@ -8,19 +8,19 @@ Only approve or merge PRs that either update the changelog or have no user-facin

1. Determine the new version number according to [semantic versioning guidelines](https://semver.org/).
1. Update `CHANGELOG.md`:
- Edit the heading for the development version to match the new version.
- If needed, clean up the changelog -- fix any typos, optionally create subheadings for 'New features' and 'Bug fixes' if there are lots of changes, etc.
- Edit the heading for the development version to match the new version.
- If needed, clean up the changelog -- fix any typos, optionally create subheadings for 'New features' and 'Bug fixes' if there are lots of changes, etc.
1. Update the version in [`src/__init__.py`](https://github.com/CCBR/XAVIER/blob/main/src/__init__.py).
1. On GitHub, go to "Releases" and click "Draft a new release". <https://github.com/CCBR/XAVIER/releases/new>
- Choose a tag: same as the version number.
- Choose the target: most likely this should be the main branch, or a specific commit hash.
- Set the title as the new version number, e.g. **v3.0.2**
- Copy and paste the release notes from the CHANGELOG into the description box.
- Check the box "Set as the latest release".
- Click "Publish release".
- Choose a tag: same as the version number.
- Choose the target: most likely this should be the main branch, or a specific commit hash.
- Set the title as the new version number, e.g. **v3.0.2**
- Copy and paste the release notes from the CHANGELOG into the description box.
- Check the box "Set as the latest release".
- Click "Publish release".
1. Post release chores:
- Add a new "development version" heading to the top of `CHANGELOG.md`.
- Bump the version number in `src/__init__.py` to include `-dev`, e.g. `v3.0.2-dev` if you just released `v3.0.2`.
- Add a new "development version" heading to the top of `CHANGELOG.md`.
- Bump the version number in `src/__init__.py` to include `-dev`, e.g. `v3.0.2-dev` if you just released `v3.0.2`.
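
The post-release version bump described above can be sketched in Python. This is only an illustration: `dev_version` is a hypothetical helper that is not part of XAVIER, and it assumes plain version strings of the form `v3.0.2`, as in the example.

```python
import re

# Hypothetical helper (not part of XAVIER): derive the development
# version string that follows a release, e.g. v3.0.2 -> v3.0.2-dev.
_RELEASE = re.compile(r"^v?\d+\.\d+\.\d+$")


def dev_version(released: str) -> str:
    """Return the '-dev' version string for a just-released version."""
    if released.endswith("-dev"):
        return released  # already a development version, nothing to do
    if not _RELEASE.match(released):
        raise ValueError(f"not a plain semantic version: {released!r}")
    return f"{released}-dev"


print(dev_version("v3.0.2"))
```

The resulting string is what would be written to `src/__init__.py` after the release is published.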

## How to install a release on biowulf

54 changes: 27 additions & 27 deletions docs/usage/run.md
@@ -41,7 +41,7 @@ Each of the following arguments are required. Failure to provide a required argu

`--input INPUT [INPUT ...]`

> **Input FastQ or BAM file(s) to process.**
> **Input FastQ or BAM file(s) to process.**
> _type: file(s)_
>
> One or more FastQ files can be provided. The pipeline does NOT support single-end WES data. Please provide either a set of FastQ files or a set of BAM files. The pipeline does NOT support processing a mixture of FastQ files and BAM files. From the command-line, each input file should be separated by a space. Globbing is supported! This makes selecting FastQ files easy. Input FastQ files should be gzipped.
@@ -54,7 +54,7 @@ Each of the following arguments are required. Failure to provide a required argu

`--output OUTPUT`

> **Path to an output directory.**
> **Path to an output directory.**
> _type: path_
>
> This location is where the pipeline will create all of its output files, also known as the pipeline's working directory. If the provided output directory does not exist, it will be initialized automatically.
@@ -65,11 +65,11 @@ Each of the following arguments are required. Failure to provide a required argu

`--runmode {init,dryrun,run}`

> **Execution Process.**
> **Execution Process.**
> _type: string_
>
> User should initialize the pipeline folder by first running `--runmode init`
> User should then perform a dry-run to list all steps the pipeline will take `--runmode dryrun`
> User should initialize the pipeline folder by first running `--runmode init`
> User should then perform a dry-run to list all steps the pipeline will take `--runmode dryrun`
> User should then perform the full run `--runmode run`
>
> **_Example:_** `--runmode init` _THEN_ `--runmode dryrun` _THEN_ `--runmode run`
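
The three-step sequence above (init, then dryrun, then run) can be scripted. The sketch below only builds the command lines and does not execute them. It assumes the pipeline is invoked as `xavier run` and uses the required flags documented in this section; `build_commands` and all file names and paths are hypothetical placeholders.

```python
import shlex

# Order in which the run modes should be executed, per the text above.
RUNMODES = ("init", "dryrun", "run")


def build_commands(inputs, output, genome, targets):
    """Return one argv list per run mode, in the order they should be run.

    Assumption: the entry point is `xavier run`; adjust if your install differs.
    """
    base = [
        "xavier", "run",
        "--input", *inputs,
        "--output", output,
        "--genome", genome,
        "--targets", targets,
    ]
    return [base + ["--runmode", mode] for mode in RUNMODES]


commands = build_commands(
    ["sample1.R1.fastq.gz", "sample1.R2.fastq.gz"],  # hypothetical inputs
    "/data/user/WES_hg38",                           # hypothetical output dir
    "hg38",
    "targets.bed",                                   # hypothetical BED file
)
for argv in commands:
    print(shlex.join(argv))
```

Reviewing the dry-run output before starting the full run is the point of the middle step, so a wrapper script would normally pause there rather than chain all three blindly.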
@@ -78,15 +78,15 @@ Each of the following arguments are required. Failure to provide a required argu

`--genome {hg38, hg38_noalt, mm10, custom.json}`

> **Reference genome.**
> **Reference genome.**
> _type: string/file_
>
> This option defines the reference genome for your set of samples. On Biowulf, xavier comes bundled with pre-built reference files for human samples; however, it is worth noting that the pipeline does accept a pre-built resource bundle pulled with the cache sub command (coming soon). Currently, the pipeline only supports the human reference hg38 (both primary and no-alternate contig versions) and mouse (mm10).
>
> **_Pre built Option_**
> **_Pre built Option_**
> Here is a list of available pre built genomes on Biowulf: hg38, hg38_noalt, mm10.
>
> **_Custom Option_**
> **_Custom Option_**
> For users running the pipeline outside of Biowulf, a pre-built resource bundle can be pulled with the cache sub command (coming soon). Please supply the custom reference JSON file that was generated by the cache sub command.
>
> **_Example:_** `--genome hg38` _OR_ `--genome /data/${USER}/hg38/hg38.json`
@@ -95,7 +95,7 @@ Each of the following arguments are required. Failure to provide a required argu

`--targets TARGETS`

> **Exome targets BED file.**
> **Exome targets BED file.**
> _type: file_
>
> This file can be obtained from the manufacturer of the target capture kit that was used.
@@ -110,7 +110,7 @@ Each of the following arguments are optional and do not need to be provided.

`-h, --help`

> **Display Help.**
> **Display Help.**
> _type: boolean flag_
>
> Shows command's synopsis, help message, and an example command
@@ -121,7 +121,7 @@ Each of the following arguments are optional and do not need to be provided.

`--silent`

> **Silence standard output.**
> **Silence standard output.**
> _type: boolean flag_
>
> Reduces the amount of information directed to standard output when submitting the master job to the job scheduler. Only the job id of the master job is returned.
@@ -132,16 +132,16 @@ Each of the following arguments are optional and do not need to be provided.

`--mode {local,slurm}`

> **Execution Method.**
> _type: string_
> **Execution Method.**
> _type: string_
> _default: slurm_
>
> Execution Method. Defines the mode or method of execution. Valid mode options include: local or slurm.
>
> **_local_**
> **_local_**
> Local executions will run serially on the compute instance. This is useful for testing, debugging, or when a user does not have access to a high performance computing environment. If this option is not provided, it will default to a local execution mode.
>
> **_slurm_**
> **_slurm_**
> The slurm execution method will submit jobs to a cluster using a singularity backend. We recommend running xavier in this mode, as execution will be significantly faster in a distributed environment.
>
> **_Example:_** `--mode slurm`
@@ -150,7 +150,7 @@ Each of the following arguments are optional and do not need to be provided.

`--job-name JOB_NAME`

> **Set the name of the pipeline's master job.**
> **Set the name of the pipeline's master job.**
> _type: string_
> _default: pl:xavier_
>
> When submitting the pipeline to a job scheduler, like SLURM, this option allows you to set the name of the pipeline's master job. By default, the name of the pipeline's master job is set to "pl:xavier".
@@ -161,7 +161,7 @@ Each of the following arguments are optional and do not need to be provided.

`--callers CALLERS [CALLERS ...]`

> **Variant Callers.**
> **Variant Callers.**
> _type: string(s)_
> _default: mutect2, mutect, strelka, vardict, varscan_
>
> List of variant callers to detect mutations. Please select from one or more of the following options: [mutect2, mutect, strelka, vardict, varscan]. Defaults to using all variant callers.
@@ -172,11 +172,11 @@ Each of the following arguments are optional and do not need to be provided.

`--pairs PAIRS`

> **Tumor normal pairs file.**
> **Tumor normal pairs file.**
> _type: file_
>
> This tab delimited file contains two columns with the names of tumor and normal pairs, one per line. The header of the file needs to be `Tumor` for the tumor column and `Normal` for the normal column. The base name of each sample should be listed in the pairs file. The base name of a given sample can be determined by removing the following extension from the sample's R1 FastQ file: `.R1.fastq.gz`.
> **Contents of example pairs file:**
> This tab delimited file contains two columns with the names of tumor and normal pairs, one per line. The header of the file needs to be `Tumor` for the tumor column and `Normal` for the normal column. The base name of each sample should be listed in the pairs file. The base name of a given sample can be determined by removing the following extension from the sample's R1 FastQ file: `.R1.fastq.gz`.
> **Contents of example pairs file:**
>
> ```
> Normal Tumor
@@ -190,7 +190,7 @@ Each of the following arguments are optional and do not need to be provided.

`--ffpe`

> **Apply FFPE correction.**
> **Apply FFPE correction.**
> _type: boolean flag_
>
> Runs an additional step to correct strand orientation bias in Formalin-Fixed Paraffin-Embedded (FFPE) samples. Do NOT use this option with non-FFPE samples.
@@ -201,7 +201,7 @@ Each of the following arguments are optional and do not need to be provided.

`--cnv`

> **Call copy number variations (CNVs).**
> **Call copy number variations (CNVs).**
> _type: boolean flag_
>
> CNVs will only be called from tumor-normal pairs. If this option is provided without providing a --pairs file, CNVs will NOT be called.
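
Because CNV calling depends on the `--pairs` file, it can be worth validating that file before launching a run. The following Python sketch assumes the format described for `--pairs` (tab-delimited, with `Tumor` and `Normal` header columns) and the `.R1.fastq.gz` base-name rule. The helper names `read_pairs` and `sample_base_name`, and the sample names, are hypothetical and not part of xavier.

```python
import csv
import io
import os

R1_SUFFIX = ".R1.fastq.gz"  # extension named in the --pairs description


def sample_base_name(r1_fastq: str) -> str:
    """Strip the directory and the R1 suffix from an R1 FastQ path."""
    name = os.path.basename(r1_fastq)
    if not name.endswith(R1_SUFFIX):
        raise ValueError(f"expected a file ending in {R1_SUFFIX}: {r1_fastq!r}")
    return name[: -len(R1_SUFFIX)]


def read_pairs(handle):
    """Parse a tab-delimited pairs file into (tumor, normal) tuples.

    Columns are looked up by header name, so their order does not matter.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"Tumor", "Normal"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"pairs file is missing column(s): {sorted(missing)}")
    return [(row["Tumor"], row["Normal"]) for row in reader]


# Hypothetical sample names, for illustration only.
example = io.StringIO("Normal\tTumor\nsample1_normal\tsample1_tumor\n")
print(read_pairs(example))
print(sample_base_name("/data/user/fastq/sample1_tumor.R1.fastq.gz"))
```

Comparing the parsed names against the base names of the supplied FastQ files would catch a mismatched pairs file before any jobs are submitted.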
@@ -212,8 +212,8 @@ Each of the following arguments are optional and do not need to be provided.

`--singularity-cache SINGULARITY_CACHE`

> **Overrides the $SINGULARITY_CACHEDIR environment variable.**
> _type: path_
> **Overrides the $SINGULARITY_CACHEDIR environment variable.**
> _type: path_
> _default: `--output OUTPUT/.singularity`_
>
> Singularity will cache image layers pulled from remote registries. This ultimately speeds up the process of pulling an image from DockerHub if an image layer already exists in the singularity cache directory. By default, the cache is placed in a `.singularity` directory under the path provided to the `--output` argument. Please note that this cache cannot be shared across users. Singularity strictly enforces you own the cache directory and will return a non-zero exit code if you do not own the cache directory! See the `--sif-cache` option to create a shareable resource.
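
Since Singularity requires that you own the cache directory, a quick ownership check before a run can save a failed submission. This is a minimal sketch for POSIX systems only (it relies on `os.getuid`); `owns_directory` and `default_singularity_cache` are hypothetical helpers, and the default location follows the `OUTPUT/.singularity` default stated above.

```python
import os
import tempfile


def default_singularity_cache(output_dir: str) -> str:
    """Documented default cache location: <output>/.singularity."""
    return os.path.join(output_dir, ".singularity")


def owns_directory(path: str) -> bool:
    """True if path is an existing directory owned by the current user.

    POSIX only: os.getuid() is not available on Windows.
    """
    return os.path.isdir(path) and os.stat(path).st_uid == os.getuid()


# Demonstrate with a temporary directory standing in for --output.
with tempfile.TemporaryDirectory() as output_dir:
    cache = default_singularity_cache(output_dir)
    os.makedirs(cache)
    print(cache, "owned by current user:", owns_directory(cache))
```

If the check fails for a shared location, the `--sif-cache` option described in this section is the intended route for a shareable resource.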
@@ -224,7 +224,7 @@ Each of the following arguments are optional and do not need to be provided.

`--sif-cache SIF_CACHE`

> **Path where a local cache of SIFs are stored.**
> **Path where a local cache of SIFs are stored.**
> _type: path_
>
> Uses a local cache of SIFs on the filesystem. This SIF cache can be shared across users if permissions are set correctly. If a SIF does not exist in the SIF cache, the image will be pulled from DockerHub and a warning message will be displayed. The `xavier cache` subcommand can be used to create a local SIF cache. Please see `xavier cache` for more information. This command is extremely useful for avoiding DockerHub pull rate limits. It also avoids potential errors that could occur due to network issues or DockerHub being temporarily unavailable. We recommend running xavier with this option whenever possible.
@@ -235,8 +235,8 @@ Each of the following arguments are optional and do not need to be provided.

`--threads THREADS`

> **Max number of threads for each process.**
> _type: int_
> **Max number of threads for each process.**
> _type: int_
> _default: 2_
>
> Max number of threads for each process. This option is more applicable when running the pipeline with `--mode local`. We recommend setting this value to the maximum number of CPUs available on the host machine.
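
For local runs, the CPU count of the host can be looked up rather than guessed. A minimal Python sketch follows; `recommended_threads` is a hypothetical helper, and the fallback of 2 mirrors the documented default. Note that `os.cpu_count()` reports the CPUs on the machine, which may be more than your job has actually been allocated on a shared node.

```python
import os


def recommended_threads(fallback: int = 2) -> int:
    """CPU count of the host, or the documented default if it is unknown."""
    return os.cpu_count() or fallback


print(f"--threads {recommended_threads()}")
```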
6 changes: 3 additions & 3 deletions docs/usage/unlock.md
@@ -28,10 +28,10 @@ Use you can always use the `-h` option for information on a specific command.

`--output OUTPUT`

> **Output directory to unlock.**
> **Output directory to unlock.**
> _type: path_
>
> Path to a previous run's output directory. This will remove a lock on the working directory. Please verify that the pipeline is not running before running this command.
> Path to a previous run's output directory. This will remove a lock on the working directory. Please verify that the pipeline is not running before running this command.
> **_Example:_** `--output /data/$USER/WES_hg38`

### 2.2 Options
@@ -40,7 +40,7 @@ Each of the following arguments are optional and do not need to be provided.

`-h, --help`

> **Display Help.**
> **Display Help.**
> _type: boolean_
>
> Shows command's synopsis, help message, and an example command