T.Y. Blog: January 2023

Saturday, January 28, 2023

Learning nextflow 1: get started

The other day, I installed the workflow manager Nextflow on my computer. I used the conda package manager with the bioconda channel, which was quite an easy way to install it.

Installation of nextflow - T.Y. Blog

There is a nice tutorial in the Nextflow documentation (https://www.nextflow.io/docs/latest/index.html). The Get started page contains a simple sample workflow script that converts a string to uppercase.

The Nextflow scripting page (https://www.nextflow.io/docs/latest/script.html) introduces the language used for Nextflow scripts. It is an extension of the Groovy programming language.

Back on the Get started page, let's take a look at the tutorial script:

tutorial.nf

params.str = 'Hello world!'

process splitLetters {
    output:
    path 'chunk_*'

    """
    printf '${params.str}' | split -b 6 - chunk_
    """
}

process convertToUpper {
    input:
    path x

    output:
    stdout

    """
    cat $x | tr '[a-z]' '[A-Z]'
    """
}

workflow {
    splitLetters | flatten | convertToUpper | view { it.trim() }
}

The script begins by defining one string parameter.

params.str = 'Hello world!'

This string is processed by the workflow, and the processed string is printed at the end.

After params.str, the script defines two processes.

The first process is named splitLetters.

process splitLetters {
    output:
    path 'chunk_*'

    """
    printf '${params.str}' | split -b 6 - chunk_
    """
}

In this process, the output: block comes first. The processes page (https://www.nextflow.io/docs/latest/process.html#outputs) explains this block as follows:

The output block allows you to define the output channels of a process, similar to function outputs. A process may have at most one output block, and it must contain at least one output.

I am an absolute beginner in this language, but it seems that the text between the triple quotes (""" ... """) is the script executed by the splitLetters process. It prints the parameter with printf, and the result is piped to split, which cuts it into files of at most 6 bytes each, named chunk_aa, chunk_ab, and so on.
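As a rough illustration of what the split -b 6 step does, here is a minimal Python sketch that imitates it outside Nextflow. The function name split_bytes is a hypothetical helper made up for this example; it is not part of Nextflow or coreutils, and it returns the chunks as a list instead of writing chunk_* files.

```python
from typing import List


def split_bytes(text: str, size: int = 6) -> List[bytes]:
    """Cut text into consecutive pieces of at most `size` bytes,
    imitating what `split -b size` does to its input (hypothetical helper)."""
    data = text.encode("utf-8")
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    # 'Hello world!' is 12 bytes long, so two 6-byte chunks are produced.
    for chunk in split_bytes("Hello world!"):
        print(chunk)
```

Note that the first chunk keeps its trailing space ("Hello "), which is why the workflow later trims the result before printing it.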

The second process is named convertToUpper.

process convertToUpper {
    input:
    path x

    output:
    stdout

    """
    cat $x | tr '[a-z]' '[A-Z]'
    """
}

This process has an input: block because it receives the output of splitLetters. It converts the content of each received file to uppercase with tr '[a-z]' '[A-Z]'.

Finally, the following output was shown on the console:

(base) tk$ nextflow run tutorial.nf
N E X T F L O W ~ version 22.10.5
Launching `tutorial.nf` [mad_mcclintock] DSL2 - revision: 5af7f346f0
executor > local (3)
[f5/563a73] process > splitLetters [100%] 1 of 1 ✔
[63/7f8425] process > convertToUpper (2) [100%] 2 of 2 ✔
HELLO
WORLD!

The first process, splitLetters, is executed once, as shown in the output:

[f5/563a73] process > splitLetters [100%] 1 of 1 ✔

The next process, convertToUpper, is executed twice because there are two chunks, "Hello " and "world!".

[63/7f8425] process > convertToUpper (2) [100%] 2 of 2 ✔

If params.str is 6 bytes or shorter, split in splitLetters makes only one chunk, and convertToUpper is executed once:

N E X T F L O W ~ version 22.10.5
Launching `tutorial.nf` [curious_einstein] DSL2 - revision: 72afad773e
executor > local (2)
[f5/b5207d] process > splitLetters [100%] 1 of 1 ✔
[ab/b1650f] process > convertToUpper (1) [100%] 1 of 1 ✔
HELLO

If --str is specified on the command line, for example,

(base) tk$ nextflow run tutorial.nf --str 'Bonjour le monde'

then the default value of params.str is overridden by 'Bonjour le monde'.

N E X T F L O W ~ version 22.10.5
Launching `tutorial.nf` [special_wright] DSL2 - revision: 3f2cae5687
executor > local (4)
[55/9358d8] process > splitLetters [100%] 1 of 1 ✔
[55/6d93c7] process > convertToUpper (2) [100%] 3 of 3 ✔
BONJOU
ONDE
R LE M

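In this case, the second process was executed three times, as "3 of 3" on the convertToUpper line shows. The 16-byte string is split into "Bonjou", "r le m", and "onde". The results are printed in a different order because the tasks run in parallel, so the order in which they finish is not fixed.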
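The number of convertToUpper tasks can be predicted with a small Python sketch of the whole workflow. This is only an imitation under stated assumptions: run_workflow is a hypothetical function name, the input is assumed to be plain ASCII, and the results come back in input order, whereas Nextflow may print them in any order.

```python
from typing import List


def run_workflow(text: str, size: int = 6) -> List[str]:
    """Imitate tutorial.nf in plain Python (hypothetical helper, ASCII input assumed)."""
    data = text.encode("ascii")
    # splitLetters: cut the input into pieces of at most `size` bytes.
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # convertToUpper runs once per chunk; bytes.upper() only changes ASCII
    # letters, much like tr '[a-z]' '[A-Z]'. strip() plays the role of
    # view { it.trim() } in the workflow block.
    return [chunk.upper().decode("ascii").strip() for chunk in chunks]


if __name__ == "__main__":
    results = run_workflow("Bonjour le monde")
    print(len(results), "tasks")
    for line in results:
        print(line)
```

The length of the returned list corresponds to the "N of N" count shown on the convertToUpper line of the console output.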

The Get started page also has a tip about the dot character as the delimiter of a nested scope:

As of version 20.11.0-edge, any . (dot) character in a parameter name is interpreted as the delimiter of a nested scope. For example, --foo.bar Hello will be interpreted as params.foo.bar. If you want to have a parameter name that contains a . (dot) character, escape it using the back-slash character, e.g. --foo\.bar Hello.
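The behaviour described in this tip can be pictured with a short Python sketch. It is not Nextflow's actual implementation; set_param is a hypothetical function that only mirrors the rule as quoted above: an unescaped dot opens a nested scope, and a dot escaped with a backslash stays part of the name.

```python
import re
from typing import Any, Dict


def set_param(params: Dict[str, Any], name: str, value: str) -> Dict[str, Any]:
    """Store value under name, treating unescaped dots as scope delimiters
    (hypothetical helper, not Nextflow code)."""
    # Split only on dots that are NOT preceded by a backslash,
    # then turn each escaped "\." back into a plain dot.
    keys = [key.replace("\\.", ".") for key in re.split(r"(?<!\\)\.", name)]
    scope = params
    for key in keys[:-1]:
        scope = scope.setdefault(key, {})
    scope[keys[-1]] = value
    return params


if __name__ == "__main__":
    print(set_param({}, "foo.bar", "Hello"))     # nested: foo -> bar
    print(set_param({}, r"foo\.bar", "Hello"))   # one flat key named foo.bar
```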

I modified tutorial.nf to check this tip. First, I renamed the parameter from params.str to params.str1:

tutorial2.nf

params.str1 = 'Hello world!'

process splitLetters {
    output:
    path 'chunk_*'

    """
    printf '${params.str1}' | split -b 6 - chunk_
    """
}

For this script, I can override the default value with --str1 instead of --str:

(base) tk$ nextflow run tutorial2.nf --str1 'Bonjour le monde'

N E X T F L O W ~ version 22.10.5
Launching `tutorial.nf` [grave_gilbert] DSL2 - revision: 421400f9e3
executor > local (4)
[2d/b9e426] process > splitLetters [100%] 1 of 1 ✔
[d7/1053d8] process > convertToUpper (2) [100%] 3 of 3 ✔
BONJOU
ONDE
R LE M

With this script, --str no longer overrides the default value:

(base) tk$ nextflow run tutorial2.nf --str 'Bonjour le monde'

N E X T F L O W ~ version 22.10.5
Launching `tutorial.nf` [gloomy_lamport] DSL2 - revision: 421400f9e3
executor > local (3)
[25/dbde42] process > splitLetters [100%] 1 of 1 ✔
[00/93720f] process > convertToUpper (2) [100%] 2 of 2 ✔
HELLO1

If a number is used as the parameter name, an error is returned:

tutorial3.nf

params.1 = 'Hello world!'

process splitLetters {
    output:
    path 'chunk_*'

    """
    printf '${params.1}' | split -b 6 - chunk_
    """
}

The output was,

(base) tk$ nextflow run tutorial3.nf

N E X T F L O W ~ version 22.10.5
Launching `tutorial3.nf` [tiny_joliot] DSL2 - revision: 7f88a4f3a1
Script compilation error
- file : /XXX/YYY/ZZZ/tutorial3.nf
- cause: The LHS of an assignment should be a variable or a field accessing expression @ line 1, column 7.
params.1 = 'Hello12'
1 error

The tutorial can be found at:

https://www.nextflow.io/docs/latest/getstarted.html#your-first-script

----------------

Revised: 2023.1.29

Sunday, January 22, 2023

Installation of Nextflow

What is Nextflow?

Recently I have been setting up and learning Nextflow, a workflow management system that combines various processing and computation steps.

Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting language.
https://www.nextflow.io

It is important to conduct reproducible analyses of omics data, but for various reasons reproducibility is often not maintained through the entire research process. There are many discussions on how to improve and guarantee the reproducibility of computational analyses, especially in biology and the life sciences (for example, here). Nextflow has potential as an analysis management system for describing bioinformatics pipelines with sufficient reproducibility.

Nextflow is already used in bioinformatics workflows such as the EPI2ME Labs long-read data analysis tools (https://labs.epi2me.io/wfindex). I got to know this workflow management system when I was searching for nanopore analysis tools to analyze transcriptome data. I found wf-transcriptomes from epi2me-labs (https://github.com/epi2me-labs/wf-transcriptomes), which uses Nextflow. I think it would not only be a convenient way to install analysis tools for long-read sequencing, but could also be used to describe the workflows in my daily work. That's why I started learning this tool.

Quick check and getting started

Nextflow can be used on Linux, macOS, and other POSIX-compatible systems. The web page shows three steps to install Nextflow:

1. Make sure Java 11 or later is installed on your system.
2. In a terminal, run: curl -s https://get.nextflow.io | bash
3. Run Nextflow, for example: ./nextflow run hello

I usually use conda-forge on my personal computer. Nextflow can also be installed with the conda package manager from the bioconda channel. If you use conda, this may be the easiest way to quickly try the tool.

In the console, I ran the following command:

(base) tk$ conda install -c bioconda nextflow

In the package plan, nextflow and openjdk were listed.

The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    coreutils-8.25             |                1         1.7 MB  bioconda
    nextflow-22.10.4           |       h4a94de4_0        24.8 MB  bioconda
    openjdk-17.0.3             |       hbc0c0cd_5       157.4 MB  conda-forge
    ------------------------------------------------------------
                                           Total:       183.9 MB

I proceeded with this plan. It was quite easy and completed quickly. Then I checked the Java version:

(base) tk$ java -version
openjdk version "17.0.3" 2022-04-19 LTS
OpenJDK Runtime Environment Zulu17.34+19-CA (build 17.0.3+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.34+19-CA (build 17.0.3+7-LTS, mixed mode, sharing)

The help message is shown by running nextflow -h:

Usage: nextflow [options] COMMAND [arg...]

Options:
  -C
     Use the specified configuration file(s) overriding any defaults
  -D
     Set JVM properties
  -bg
     Execute nextflow in background
  -c, -config
     Add the specified file to configuration set
  -config-ignore-includes
     Disable the parsing of config includes
  -d, -dockerize
     Launch nextflow via Docker (experimental)
  -h
     Print this help
  -log
     Set nextflow log file path
  -q, -quiet
     Do not print information messages
  -syslog
     Send logs to syslog server (eg. localhost:514)
  -v, -version
     Print the program version

Commands:
  clean         Clean up project cache and work directories
  clone         Clone a project into a folder
  config        Print a project configuration
  console       Launch Nextflow interactive console
  drop          Delete the local copy of a project
  help          Print the usage help for a command
  info          Print project and system runtime information
  kuberun       Execute a workflow in a Kubernetes cluster (experimental)
  list          List all downloaded projects
  log           Print executions log and runtime info
  pull          Download or update a project
  run           Execute a pipeline project
  secrets       Manage pipeline secrets (preview)
  self-update   Update nextflow runtime to the latest available version
  view          View project script file(s)

The installed version can be shown by typing nextflow -version:

N E X T F L O W
version 22.10.4 build 5836
created 09-12-2022 09:58 UTC (18:58 JDT)
cite doi:10.1038/nbt.3820
http://nextflow.io

I ran the classic "Hello world" pipeline provided as a demo:

(base) tk$ nextflow run hello
N E X T F L O W ~ version 22.10.4
Pulling nextflow-io/hello ...
downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [kickass_brenner] DSL2 - revision: 4eab81bd42 [master]
executor > local (4)
[3b/ff8648] process > sayHello (2) [100%] 4 of 4 ✔
Hola world!
Hello world!
Bonjour world!
Ciao world!

It seems that Nextflow was successfully installed on my system.

The installation was quite easy with bioconda, but installing without conda is probably not so difficult either.

What's next?

The documentation is available here: https://www.nextflow.io/docs/latest/index.html

I would like to study how to use this system and incorporate it into my research activities at home.
