Available Software

System software

Nodes on the Extreme cluster use the following system software by default:

  • CentOS 6.8
  • GCC 4.7.7
  • OpenMPI 1.6.2
  • Java 1.6.0_35
  • Python 2.6.6
  • R 3.2.0

Modules repository

We use the Environment Modules system to load and unload software dynamically and cleanly, and to maintain different versions of the same software. Use the “module avail” command to list the available software modules. To use a particular module, run “module load apps/{module_name}”; this command adds the specified module to your path. Please include this load command in the jobs you submit to the cluster.
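As a minimal sketch of how the load command might be wrapped inside a job script, the bash helper below builds the “apps/{module_name}” path described above and stops early if the module command is not available in the current shell. The function name load_app and the module name used in the usage note are hypothetical, not confirmed names on the cluster; check “module avail” for the real ones.

```shell
#!/bin/bash
# Sketch only: load_app is a hypothetical helper, not part of the
# cluster's tooling. It assumes modules live under the apps/ prefix,
# as in "module load apps/{module_name}".

load_app() {
    local name="$1"
    # Fail early if the module command is missing from this shell.
    if ! type module >/dev/null 2>&1; then
        echo "module command not available in this shell" >&2
        return 1
    fi
    # Put the requested application module on the path.
    module load "apps/${name}"
}
```

In a job script this could be called as “load_app samtools || exit 1”, where samtools stands in for whatever name “module avail” reports.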

We continuously update the following list of software installed on the Extreme cluster.

Applications (95 packages) –

  • Abaqus (6.13) – An Abaqus environment that provides a simple, consistent interface for creating, submitting, monitoring, and evaluating results from Abaqus simulations.
  • Abinit (7.10.5) – ABINIT is a package whose main program allows one to find the total energy, charge density and electronic structure of systems made of electrons and nuclei (molecules and periodic solids) within Density Functional Theory (DFT), using pseudopotentials and a planewave or wavelet basis.
  • Abyss (1.9.0) – ABySS is a de novo, parallel, paired-end sequence assembler that is designed for short reads.
  • Amber (12) – Amber is the collective name for a suite of programs that allow users to carry out molecular dynamics simulations, particularly on biomolecules.
  • AMOS (3.1.0) – A Modular Open Source Assembler.
  • ANNOVAR – An efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes.
  • Apt (1.15.2)
  • ASPECT (1.3) – An extensible code written in C++ to support research in simulating convection in the Earth mantle and elsewhere.
  • Bamm (2.3.0) – Program for modeling complex dynamics of speciation, extinction, and trait evolution on phylogenetic trees.
  • Bedtools2 (2.19.1) – Bedtools utilities are tools for a wide range of genomics analysis tasks.
  • BLAST+ (2.5.0) – BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. (Note: Users must download the required database set and change the environment variables accordingly.)
  • Bowtie2 (2.1.0)/(2.2.2)/(2.2.5) – Ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences.
  • Bwa (0.7.5a) – BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome.
  • CEAS (1.0.2) – A tool designed to characterize genome-wide protein-DNA interaction patterns from ChIP-chip and ChIP-Seq of both sharp and broad binding factors.
  • CLOVER
  • COMSOL (4.3b) – A finite element analysis (FEA), solver, and simulation software package for various physics and engineering applications, especially coupled phenomena, or multiphysics.
  • CONN (v.15) – A Matlab-based cross-platform software for the computation, display, and analysis of functional connectivity in fMRI (fcMRI).
  • Converge (2.1.0) – A multipurpose computational fluid dynamics (CFD) code with innovative features including a fully coupled automated mesh created at runtime and Adaptive Mesh Refinement (AMR).
  • Cp2k (2.5.1-ssmp) – CP2K is a freely available (GPL) program, written in Fortran 95, to perform atomistic and molecular simulations of solid state, liquid, molecular and biological systems.
  • Cufflinks (2.2.0) – Transcriptome assembly and differential expression analysis for RNA-Seq.
  • DANPOS (2.2.2) – A toolkit for Dynamic Analysis of Nucleosome and Protein Occupancy by Sequencing, version 2.
  • Deal.II (8.2.1) – A C++ software library supporting the creation of finite element codes and an open community of users and developers.
  • Espresso (Quantum-Espresso-5.3) – An integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling at the nanoscale. It is based on density-functional theory, plane waves, and pseudopotentials.
  • FALCON – A set of tools for fast aligning long reads for consensus and assembly.
  • FastQC (0.10.1) – FastQC is a quality control application for high throughput sequence data. It reads in sequence data in a variety of formats and can either provide an interactive application to review the results of several different QC checks, or create an HTML based report which can be integrated into a pipeline.
  • Fastx – The FASTX-Toolkit is a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing.
  • Gaussian (09) – It provides state-of-the-art capabilities for electronic structure modeling.
  • Gtk (3.1.1)
  • Hmmer (2.3.2)(3.1b2) – HMMER is used for searching sequence databases for sequence homologs, and for making sequence alignments. It implements methods using probabilistic models called profile hidden Markov models (profile HMMs).
  • HPL (2.1) – Benchmark for High performance clusters.
  • HTSeq (0.6.1) – A Python package that provides infrastructure to process data from high-throughput sequencing assays.
  • JAGS (3.4.0) – JAGS is Just Another Gibbs Sampler.  It is a program for analysis of Bayesian hierarchical models using Markov Chain Monte Carlo (MCMC) simulation.
  • Java-genomics-toolkit – This is a collection of applications for genomics data processing, primarily high-throughput next-generation sequencing.
  • Lammps (11Nov13-mpich3) – Molecular Dynamics Simulator.
  • MACS (1.4.2) – Model-based Analysis of ChIP-Seq.
  • Mafft (7) – MAFFT is a multiple sequence alignment program for unix-like operating systems.  It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of <∼200 sequences), FFT-NS-2 (fast; for alignment of <∼30,000 sequences), etc.
  • mapDamage – Tracking and quantifying damage patterns in ancient DNA sequences.
  • Marc (MSC) – Simulate products more accurately with the industry’s leading nonlinear FEA solver technology.
  • Mathematica-10.3 – A mathematical tool for analysis.
  • MATLAB (R2010b) – A multi-paradigm numerical computing environment and fourth-generation programming language.
  • Meme (4.9.1) – Motif-based sequence analysis tools.
  • MetaVelvet (1.2.02) – An extension of Velvet assembler to de novo metagenomic assembly.
  • Moose Framework (PETSc) – The Multiphysics Object-Oriented Simulation Environment (MOOSE) is a finite-element, multiphysics framework.
  • Mrbayes (3.2.2)/(3.2.3)(3.2.5) – A program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models.
  • MUSCLE – It is one of the best-performing multiple alignment programs according to published benchmark tests, with accuracy and speed that are consistently better than CLUSTALW.
  • NAMD (2.9)(2.10) – A parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems.
  • Ncbi_cxx (12.0.0) – The NCBI C++ Toolkit is a public-domain collection of portable libraries, consisting of a cross-platform application framework and a set of utilities and supporting classes to work with biological data.
  • Nclncargs (6.2.0) – An interpreted language designed by the National Center for Atmospheric Research for scientific visualization and data processing.
  • Oases (0.2.08) – De novo transcriptome assembler for very short reads.
  • OpenFOAM (2.4.0) – OpenFOAM is free, open source software for computational fluid dynamics (CFD).
  • P4est (1.1) – Parallel adaptive mesh refinement library. (FAST and DEBUG modes).
  • Perl (5.20.2) – Perl is a family of high-level, general-purpose, interpreted, dynamic programming languages.
  • Picard-tools (1.107) – A set of tools (in Java) for working with next generation sequencing data in the BAM format.
  • PICRUSt (1.0.0) – A bioinformatics software package designed to predict metagenome functional content from marker gene surveys and full genomes.
  • Pplacer (1.1) – The pplacer binary actually does phylogenetic placement and produces place files, guppy does all of the downstream analysis of placements, and rppr does useful things having to do with reference packages.
  • Prodigal (2.6.3) – Fast, reliable protein-coding gene prediction for prokaryotic genomes.
  • Pyicoteo (2.0.7) – Pyicos is a command line utility for the conversion and manipulation of genomic coordinates files.
  • Qualimap (0.8.1) – Evaluating next generation sequencing alignment data.
  • R (3.0.2)/(3.2.0) – R is a programming language and software environment for statistical computing and graphics.
  • Scope
  • SICER (1.1) – A clustering approach for identification of enriched domains from histone modification ChIP-Seq data.
  • Simpson (4.1.1) – A general-purpose software package for simulating virtually all kinds of solid-state NMR experiments.
  • SOAPdenovo2 – A novel short-read assembly method that can build a de novo draft assembly for the human-sized genomes.
  • SPM12 (MATLAB) – Statistical Parametric Mapping refers to the construction and assessment of spatially extended statistical processes used to test hypotheses about functional imaging data.
  • STAR (2.3.0e) – STAR is an ultrafast universal RNA-seq aligner.
  • STORM-Cread (0.84) – Comprehensive Regulatory Element Analysis and Discovery.
  • Subread (1.4.6) – A tool kit for processing next-gen sequencing data.
  • Tophat (2.0.11)(2.1.0) – A fast splice junction mapper for RNA-Seq reads.
  • Trimmomatic (0.32) – Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.
  • Trinityrnaseq (2.0.2) – Package for the efficient and robust de novo reconstruction of transcriptomes from RNA-seq data.
  • Usearch (8.1.1861) – USEARCH offers search and clustering algorithms that are often orders of magnitude faster than BLAST.
  • Vcftools (0.1.11) – A program package designed for working with VCF files, such as those generated by the 1000 Genomes Project.
  • VASP (5.3.5) – The Vienna Ab initio Simulation Package (VASP) is a computer program for atomic scale materials modelling.
  • Velvet (1.2.10) – Velvet is a de novo genomic assembler specially designed for short read sequencing technologies.
  • Vicuna (1.3) – A de novo assembly program targeting populations with high mutation rates.
  • Visit (2.9.2) – VisIt is an Open Source, interactive, scalable, visualization, animation and analysis tool.
  • Vmd (1.9.2) – VMD is a molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3-D graphics and built-in scripting.
  • Zlib (1.2.8) – A massively spiffy yet delicately unobtrusive compression library.

Tools and libraries (62 packages) –

  • Autodock Vina (1.1.2) – AutoDock Vina is an open-source program for doing molecular docking.
  • Basespacepy – A Python based SDK to be used in the development of Apps and scripts for working with Illumina’s BaseSpace cloud-computing solution for next-gen sequencing data analysis.
  • BEAGLE – BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages.
  • Biom-format (2.1.5) – Python tool designed to be a general-use format for representing biological sample by observation contingency tables.
  • BioPython – The Biopython Project is an international association of developers of freely available Python tools for computational molecular biology.
  • Boost (1.58.0)/(1.55.0)  – Boost provides free peer-reviewed portable C++ source libraries.
  • CheckM – CheckM provides a set of tools for assessing the quality of genomes recovered from isolates, single cells, or metagenomes.
  • Cmake (2.8.11.2) (3.2.3) – A cross-platform, open-source build system.
  • CPLEX Studio (12.6.1) – Analytical decision support toolkit for rapid development and deployment of optimization models using mathematical and constraint programming.
  • Crispresso – Analysis of CRISPR-Cas9 genome editing outcomes from deep sequencing data.
  • Cython (0.23.4) – Cython is an optimising static compiler for both the Python programming language and the extended Cython programming language.
  • Dendropy – A Python library for phylogenetic computing.
  • ELK (3.1.12) – An all-electron full-potential linearised augmented-plane wave (FP-LAPW) code with many advanced features.
  • FFTW (3.3.3) – FFTW is a C subroutine library for computing the discrete Fourier transform (DFT).
  • GATK (3.2-0) (3.6) – A software package developed at the Broad Institute to analyze high-throughput sequencing data.
  • GNUPlot (5.0.3) – A portable command-line driven graphing utility.
  • GSL (1.16) – A numerical library for C and C++ programmers.
  • H5py – The h5py package is a Pythonic interface to the HDF5 binary data format.
  • Hdf5 – HDF5 is a data model, library, and file format for storing and managing data.
  • Hmmlearn (0.2.0) – hmmlearn is a set of algorithms for unsupervised learning and inference of Hidden Markov Models.
  • MACS2 (2.1) – A Python module to provide a powerful ChIP-Seq analysis method.
  • Matplotlib (1.3.1) (1.4.3) – Matplotlib is a Python 2D plotting library.
  • Moose Framework – The Multiphysics Object-Oriented Simulation Environment (MOOSE) is a finite-element, multiphysics framework.
  • Mpi4py (2.0.0) – Python bindings for the Message Passing Interface (MPI) standard.
  • MPICH (2-1.5)/(3.0.4) – A high performance and widely portable implementation of the Message Passing Interface (MPI) standard.
  • Numpy (1.8.1)(1.11.1) – NumPy is the fundamental package for scientific computing with Python.
  • Octave (4.0.0) – A high-level interpreted language, primarily intended for numerical computations.
  • Omics Pipe – An open-source, modular computational platform that automates best practice multi-omics data analysis pipelines.
  • OpenBabel (2.3.2) – Open Babel is a chemical toolbox designed to speak the many languages of chemical data.
  • OpenBLAS (0.2.18) – OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
  • Pbh5tools – Tools for manipulating HDF5 files produced by Pacific Biosciences.
  • PETSc (3.6.3) – A suite of data structures and routines for the scalable (parallel) solution of scientific applications modeled by partial differential equations.
  • PyGTK (2.16.0) – PyGTK lets you easily create programs with a graphical user interface using Python.
  • PySam (0.9.1.4) – Pysam is a Python module for reading and manipulating files in the SAM/BAM format.
  • PyYAML (3.11) – A YAML parser and emitter for Python. YAML is a data serialization format designed for human readability and interaction with scripting languages.
  • Qiime (1.9.1) – An open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data.
  • Samtools (1.3) – Samtools is a suite of programs for interacting with high-throughput sequencing data.
  • Scikit-Learn (0.17.1) – Simple and efficient tools for data mining, machine learning and data analysis.
  • SciPy (0.13.3)(0.18.0) – A Python-based ecosystem of open-source software for mathematics, science, and engineering.
  • ScreamingBackPack – A utility for handling syncing of remote and local data resources. Developed for use in CheckM but hopefully generic enough to be used elsewhere.
  • SymPy (0.7.5) – A Python library for symbolic mathematics.
  • Qt4 (4.8.7) (qmake) – Application development environment.
  • Ruffus – Ruffus is a computation pipeline library for Python. It is open-sourced, powerful and user-friendly, and widely used in science and bioinformatics.
  • TCL (8.6.4) – Tcl (Tool Command Language) is a very powerful but easy-to-learn dynamic programming language.
  • Thor and Odin – THOR & ODIN is an HMM-based approach to detect and analyze differential peaks in two sets of ChIP-seq data from distinct biological conditions with replicates.
  • UCSC – Genome Browser and Blat application binaries.
  • VarScan (2.3.6) – A tool for detecting variants in massively parallel sequencing data.

Compilers (15 packages) –