Saturday, January 28, 2023

Learning nextflow 1: get started

The other day, I installed the workflow manager Nextflow on my computer. I used the conda package manager with the bioconda channel, which was quite an easy way to install it.

Installation of nextflow - T.Y. Blog

There is a nice tutorial in the Nextflow documentation (https://www.nextflow.io/docs/latest/index.html). The Get started page contains a simple sample workflow script that splits a string and converts it to uppercase.

The Nextflow scripting page (https://www.nextflow.io/docs/latest/script.html) introduces the language used for Nextflow scripts. It is an extension of the Groovy programming language.


Back on the Get started page, let's take a look at the tutorial script:

 tutorial.nf 


params.str = 'Hello world!'

process splitLetters {
  output:
    path 'chunk_*'

  """
  printf '${params.str}' | split -b 6 - chunk_
  """
}

process convertToUpper {
  input:
    path x
  output:
    stdout

  """
  cat $x | tr '[a-z]' '[A-Z]'
  """
}

workflow {
  splitLetters | flatten | convertToUpper | view { it.trim() }
}




First, a string parameter is defined.


params.str = 'Hello world!'



This string is processed, and the processed string is printed at the end.
After params.str, the script defines two processes.

The first is a process named splitLetters.


process splitLetters {
  output:
    path 'chunk_*'

  """
  printf '${params.str}' | split -b 6 - chunk_
  """
}


This process starts with an output: block. The Processes page (https://www.nextflow.io/docs/latest/process.html#outputs) explains it as follows:

The  output  block allows you to define the output channels of a process, similar to function outputs. A process may have at most one output block, and it must contain at least one output.

I am an absolute beginner in this language, but it seems that the text between the triple quotes (""" """) is the processing carried out by the splitLetters process. It starts by printing the parameter, which is then piped to split. split -b 6 cuts its input into 6-byte files named chunk_aa, chunk_ab, and so on, which match the output pattern 'chunk_*'.

Next comes a process named convertToUpper.
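To see what the command inside splitLetters does, it can be tried by hand outside Nextflow. The following is only a sketch: split_string is a helper name I made up, the scratch directory comes from mktemp, and I pass the string through printf '%s' instead of using it directly as the format string as the tutorial does.

```shell
#!/bin/sh
# split_string STR DIR: write STR into 6-byte files DIR/chunk_aa, DIR/chunk_ab, ...
# (split_string is a hypothetical helper name, not part of Nextflow)
split_string() {
  # printf '%s' emits the string without a trailing newline;
  # "-" tells split to read from standard input
  printf '%s' "$1" | split -b 6 - "$2/chunk_"
}

workdir=$(mktemp -d)   # scratch directory, so no files land in the current folder
split_string 'Hello world!' "$workdir"

# Print each chunk in brackets so that a trailing space is visible
for f in "$workdir"/chunk_*; do
  printf '%s: [%s]\n' "$(basename "$f")" "$(cat "$f")"
done
```

For 'Hello world!' this should list two files, chunk_aa and chunk_ab, and the first chunk keeps the space that follows "Hello".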


process convertToUpper {
  input:
    path x
  output:
    stdout

  """
  cat $x | tr '[a-z]' '[A-Z]'
  """
}


This process has an input block because it receives the output of splitLetters. It converts the contents of the file received through the input block to uppercase with tr '[a-z]' '[A-Z]', and its output is the standard output (stdout).

Finally, the following output was shown on the console:
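The uppercase conversion can also be checked on its own. This is a minimal sketch under my own assumptions: to_upper is a hypothetical function name, and the temporary file stands in for the chunk file that Nextflow would pass in as x.

```shell
#!/bin/sh
# to_upper FILE: print the contents of FILE in uppercase,
# using the same cat | tr pipeline as the convertToUpper process
# (to_upper is a hypothetical helper name)
to_upper() {
  cat "$1" | tr '[a-z]' '[A-Z]'
}

chunk=$(mktemp)              # stands in for one chunk_* file
printf 'world!' > "$chunk"
to_upper "$chunk"            # letters become uppercase, "!" is left unchanged
echo                         # final newline, since the chunk has none
rm -f "$chunk"
```

Only the letters a to z are mapped; digits, spaces, and punctuation pass through unchanged.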


(base) tk$ nextflow run tutorial.nf
N E X T F L O W  ~  version 22.10.5
Launching `tutorial.nf` [mad_mcclintock] DSL2 - revision: 5af7f346f0
executor >  local (3)
[f5/563a73] process > splitLetters       [100%] 1 of 1 ✔
[63/7f8425] process > convertToUpper (2) [100%] 2 of 2 ✔
HELLO
WORLD!



The first process, splitLetters, is executed once, as shown in the output:


[f5/563a73] process > splitLetters       [100%] 1 of 1 ✔



The next process, convertToUpper, is executed twice because there are two chunks, "Hello " (with a trailing space) and "world!". The trailing space is removed by it.trim() in the workflow block.


[63/7f8425] process > convertToUpper (2) [100%] 2 of 2 ✔



If params.str is 6 bytes or shorter, split in splitLetters makes only one chunk, and convertToUpper is executed once.


N E X T F L O W  ~  version 22.10.5
Launching `tutorial.nf` [curious_einstein] DSL2 - revision: 72afad773e
executor >  local (2)
[f5/b5207d] process > splitLetters       [100%] 1 of 1 ✔
[ab/b1650f] process > convertToUpper (1) [100%] 1 of 1 ✔
HELLO



If --str is specified on the command line, for example,


(base) tk$ nextflow run tutorial.nf --str 'Bonjour le monde'


then the default value of params.str is overridden by "Bonjour le monde".


N E X T F L O W  ~  version 22.10.5
Launching `tutorial.nf` [special_wright] DSL2 - revision: 3f2cae5687
executor >  local (4)
[55/9358d8] process > splitLetters       [100%] 1 of 1 ✔
[55/6d93c7] process > convertToUpper (2) [100%] 3 of 3 ✔
BONJOU
ONDE
R LE M


In this case, the second process was executed three times; you can see "3 of 3" on the convertToUpper line. The chunks are not printed in their original order because the convertToUpper tasks run in parallel.
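The number of convertToUpper tasks follows from the byte length of the string: split -b 6 produces one file per 6 bytes, rounded up. Here is a small sketch of that arithmetic, where chunk_count is a name I made up for illustration.

```shell
#!/bin/sh
# chunk_count STR: number of 6-byte chunks that split -b 6 would produce for STR
# (chunk_count is a hypothetical helper name)
chunk_count() {
  bytes=$(printf '%s' "$1" | wc -c)
  # integer ceiling of bytes / 6
  echo $(( ($bytes + 5) / 6 ))
}

# The three strings used in this post
for s in 'Hello' 'Hello world!' 'Bonjour le monde'; do
  printf '%s -> %s task(s)\n' "$s" "$(chunk_count "$s")"
done
```

The three strings are 5, 12, and 16 bytes long, so the counts are 1, 2, and 3, which matches the "1 of 1", "2 of 2", and "3 of 3" shown in the console outputs.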


On the Get started page, there is a tip about the delimiter of a nested scope:

As of version 20.11.0-edge, any . (dot) character in a parameter name is interpreted as the delimiter of a nested scope. For example, --foo.bar Hello will be interpreted as params.foo.bar. If you want to have a parameter name that contains a . (dot) character, escape it using the back-slash character, e.g. --foo\.bar Hello.

I modified tutorial.nf to check this tip. For example, I changed the parameter name from params.str to params.str1:

 tutorial2.nf 

params.str1 = 'Hello world!'

process splitLetters {
  output:
    path 'chunk_*'

  """
  printf '${params.str1}' | split -b 6 - chunk_
  """
}



For this script, I can override the default value with --str1 instead of --str:


(base) tk$ nextflow run tutorial2.nf --str1 'Bonjour le monde'
N E X T F L O W  ~  version 22.10.5
Launching `tutorial.nf` [grave_gilbert] DSL2 - revision: 421400f9e3
executor >  local (4)
[2d/b9e426] process > splitLetters       [100%] 1 of 1 ✔
[d7/1053d8] process > convertToUpper (2) [100%] 3 of 3 ✔
BONJOU
ONDE
R LE M




I cannot override the default value with --str. Because the script reads params.str1, the value passed with --str is not used:


(base) tk$ nextflow run tutorial2.nf --str 'Bonjour le monde'
N E X T F L O W  ~  version 22.10.5
Launching `tutorial.nf` [gloomy_lamport] DSL2 - revision: 421400f9e3
executor >  local (3)
[25/dbde42] process > splitLetters       [100%] 1 of 1 ✔
[00/93720f] process > convertToUpper (2) [100%] 2 of 2 ✔
HELLO1
2




If a number is used as the parameter name, an error is returned:

 tutorial3.nf 


params.1 = 'Hello world!'

process splitLetters {
  output:
    path 'chunk_*'

  """
  printf '${params.1}' | split -b 6 - chunk_
  """
}



The output was:


(base) tk$ nextflow run tutorial3.nf
N E X T F L O W  ~  version 22.10.5
Launching `tutorial3.nf` [tiny_joliot] DSL2 - revision: 7f88a4f3a1
Script compilation error
- file : /XXX/YYY/ZZZ/tutorial3.nf
- cause: The LHS of an assignment should be a variable or a field accessing expression @ line 1, column 7.
   params.1 = 'Hello12'
         ^

1 error







These tutorials can be found at:


----------------
Revised: 2023.1.29












Sunday, January 22, 2023

Installation of Nextflow

What is Nextflow?


I have recently been setting up and learning to use Nextflow, a workflow management system that combines various processing and calculation steps.

Nextflow enables scalable and reproducible scientific workflows using software containers. It allows the adaptation of pipelines written in the most common scripting language.

 https://www.nextflow.io

It is important to conduct reproducible analysis of omics data, but for various reasons reproducibility is often not maintained through the entire research process. There are many discussions on how to improve and guarantee the reproducibility of computational analyses, especially in biology and the life sciences (for example, here). Nextflow has the potential to serve as an analysis management system for describing bioinformatics pipelines with sufficient reproducibility.

Nextflow is already used in bioinformatics workflows such as the EPI2ME Labs long-read data analysis tools (https://labs.epi2me.io/wfindex). I came across this workflow management system when I was searching for nanopore analysis applications to analyze transcriptome data. I found wf-transcriptomes from epi2me-labs (https://github.com/epi2me-labs/wf-transcriptomes), and this tool uses Nextflow. I think that it would not only be a convenient way to install analysis tools for long-read sequencing, but could also be used in my daily work to describe workflows. That's why I started to learn how to use this tool.


Quick check and getting started


Nextflow can be used on Linux, macOS, and other POSIX-compatible systems. The website shows three steps to install Nextflow:

  1. Make sure Java 11 or later is installed on your system
  2. In a terminal, run curl -s https://get.nextflow.io | bash
  3. Run Nextflow, for example, ./nextflow run hello
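Step 1 can be checked from the terminal before running the installer. The snippet below is only a rough sketch: java_major is a hypothetical helper of mine, and it assumes the first line of java -version contains the version in double quotes, as in typical OpenJDK output.

```shell
#!/bin/sh
# java_major VERSION: print the major version for strings such as
# "17.0.3" or "1.8.0_292" (java_major is a hypothetical helper name)
java_major() {
  case "$1" in
    1.*) echo "$1" | cut -d. -f2 ;;   # legacy scheme: 1.8.x means Java 8
    *)   echo "$1" | cut -d. -f1 ;;   # modern scheme: 17.0.3 means Java 17
  esac
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; take the quoted string on the first line
  version=$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)
  major=$(java_major "$version")
  case "$major" in
    ''|*[!0-9]*) echo "Could not parse the Java version: $version" ;;
    *) if [ "$major" -ge 11 ]; then
         echo "Java $version is new enough (11 or later)"
       else
         echo "Java $version is older than 11"
       fi ;;
  esac
else
  echo "java was not found on PATH"
fi
```

Version strings that do not follow either numbering scheme fall through to the "could not parse" message instead of being guessed.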

I usually use conda-forge on my personal computer. Nextflow can also be installed with the conda package manager from the bioconda channel. If you use conda, this may be the easiest way to quickly try the tool.


In the console, I executed the following command:


(base) tk$ conda install -c bioconda nextflow


The package plan listed nextflow and openjdk, among others.


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    coreutils-8.25             |                1         1.7 MB  bioconda
    nextflow-22.10.4           |       h4a94de4_0        24.8 MB  bioconda
    openjdk-17.0.3             |       hbc0c0cd_5       157.4 MB  conda-forge
    ------------------------------------------------------------
                                           Total:       183.9 MB



I proceeded with this plan. It was quite easy and completed quickly. I checked the Java version:


(base) tk$ java -version
openjdk version "17.0.3" 2022-04-19 LTS
OpenJDK Runtime Environment Zulu17.34+19-CA (build 17.0.3+7-LTS)
OpenJDK 64-Bit Server VM Zulu17.34+19-CA (build 17.0.3+7-LTS, mixed mode, sharing)



The help message is shown by executing nextflow -h:


Usage: nextflow [options] COMMAND [arg...]

Options:
  -C
     Use the specified configuration file(s) overriding any defaults
  -D
     Set JVM properties
  -bg
     Execute nextflow in background
  -c, -config
     Add the specified file to configuration set
  -config-ignore-includes
     Disable the parsing of config includes
  -d, -dockerize
     Launch nextflow via Docker (experimental)
  -h
     Print this help
  -log
     Set nextflow log file path
  -q, -quiet
     Do not print information messages
  -syslog
     Send logs to syslog server (eg. localhost:514)
  -v, -version
     Print the program version

Commands:
  clean         Clean up project cache and work directories
  clone         Clone a project into a folder
  config        Print a project configuration
  console       Launch Nextflow interactive console
  drop          Delete the local copy of a project
  help          Print the usage help for a command
  info          Print project and system runtime information
  kuberun       Execute a workflow in a Kubernetes cluster (experimental)
  list          List all downloaded projects
  log           Print executions log and runtime info
  pull          Download or update a project
  run           Execute a pipeline project
  secrets       Manage pipeline secrets (preview)
  self-update   Update nextflow runtime to the latest available version
  view          View project script file(s)




The installed version can be shown by typing nextflow -version:


      N E X T F L O W
      version 22.10.4 build 5836
      created 09-12-2022 09:58 UTC (18:58 JDT)
      cite doi:10.1038/nbt.3820
      http://nextflow.io



I ran the classic "Hello world" demo:


(base) tk$ nextflow run hello
N E X T F L O W  ~  version 22.10.4
Pulling nextflow-io/hello ...
 downloaded from https://github.com/nextflow-io/hello.git
Launching `https://github.com/nextflow-io/hello` [kickass_brenner] DSL2 - revision: 4eab81bd42 [master]
executor >  local (4)
[3b/ff8648] process > sayHello (2) [100%] 4 of 4 ✔
Hola world!

Hello world!

Bonjour world!

Ciao world!



 

It seems that Nextflow was successfully installed on my system.
The installation was quite easy with bioconda, but installing without conda might not be so difficult either.

What's next?


The documentation is available here: https://www.nextflow.io/docs/latest/index.html
I would like to study how to use this system and incorporate it into my research activities at home.