
Quick Start

This guide walks you through the basic workflow for importing genome data into the Arx folder structure.

Prerequisites

  • Arx Tools installed (see Installation)
  • Genome data files (assembly, annotations, etc.)

Environment Variables

Set these environment variables before starting:

# Set the path to your Arx folder structure
export FOLDER_STRUCTURE=/path/to/folder_structure

# Set the path to your import configuration file (optional)
export ARX_IMPORT_SETTINGS=/path/to/import_config.json
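
Since the later commands depend on these variables, it can help to check them before starting. The following is a minimal sketch: the function name `check_arx_env` is hypothetical and not part of Arx Tools. It assumes that `FOLDER_STRUCTURE` is required and that `ARX_IMPORT_SETTINGS` is optional, as the comments above suggest.

```shell
# Hypothetical helper (not part of Arx Tools): sanity-check the variables above.
check_arx_env() {
  if [ -z "${FOLDER_STRUCTURE:-}" ]; then
    echo "FOLDER_STRUCTURE is not set" >&2
    return 1
  fi
  # ARX_IMPORT_SETTINGS is optional, so only check it when it is set.
  if [ -n "${ARX_IMPORT_SETTINGS:-}" ] && [ ! -f "$ARX_IMPORT_SETTINGS" ]; then
    echo "ARX_IMPORT_SETTINGS points to a missing file: $ARX_IMPORT_SETTINGS" >&2
    return 1
  fi
  echo "Arx environment looks OK: $FOLDER_STRUCTURE"
}
```

Calling `check_arx_env` after the `export` lines prints a confirmation, or returns a non-zero status with a message on standard error.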

Basic Workflow

1. Initialize Folder Structure

First, create the basic Arx folder structure:

init_folder_structure

This creates the following structure:

folder_structure
├── organisms
├── annotations.json
├── annotation-descriptions
│   ├── SL.tsv
│   ├── KO.tsv
│   ├── KR.tsv
│   ├── EC.tsv
│   └── GO.tsv
├── orthologs
└── pathway-maps
    ├── type_dictionary.json
    └── svg
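
To confirm that initialization produced the layout shown above, a small check along these lines may be useful. It is only a sketch: `check_arx_layout` is a hypothetical name, not an Arx Tools command, and the list of entries is copied from the tree above.

```shell
# Hypothetical helper (not part of Arx Tools): list entries from the tree
# above that are missing under the given folder structure root.
check_arx_layout() {
  root=$1
  missing=0
  for entry in \
    organisms \
    annotations.json \
    annotation-descriptions/SL.tsv \
    annotation-descriptions/KO.tsv \
    annotation-descriptions/KR.tsv \
    annotation-descriptions/EC.tsv \
    annotation-descriptions/GO.tsv \
    orthologs \
    pathway-maps/type_dictionary.json \
    pathway-maps/svg
  do
    if [ ! -e "$root/$entry" ]; then
      echo "missing: $entry"
      missing=1
    fi
  done
  # Return 0 when everything exists, 1 when at least one entry is missing.
  return "$missing"
}
```

For example, `check_arx_layout "$FOLDER_STRUCTURE"` prints nothing and returns 0 when all entries are present.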

2. Prepare a Genome

Before importing, you may want to run annotation pipelines to generate the necessary files. (See File Formats)

a) Prokka Annotation

To get correct locus tags with Prokka:

prokka \
  --strain STRAIN \
  --locustag STRAIN.1 \
  --prefix STRAIN.1 \
  --genus Mycoplasma --species genitalium \
  --out /prokka/out/dir \
  assembly.fasta
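
After Prokka finishes, the locus tags can be spot-checked in the protein FASTA file it writes (here that would be /prokka/out/dir/STRAIN.1.faa). The sketch below assumes that each header line starts with the locus tag in the form `PREFIX_NNNNN`; `count_bad_locus_tags` is a hypothetical helper, not part of Prokka or Arx Tools.

```shell
# Hypothetical helper: count FASTA headers that do not start with ">PREFIX_".
# Usage: count_bad_locus_tags PREFIX FILE.faa
count_bad_locus_tags() {
  prefix=$1
  faa=$2
  # index() compares literally, so the "." in a prefix such as STRAIN.1
  # is not treated as a regular-expression wildcard.
  awk -v p=">${prefix}_" '/^>/ && index($0, p) != 1 { n++ } END { print n + 0 }' "$faa"
}
```

A result of 0 from `count_bad_locus_tags STRAIN.1 /prokka/out/dir/STRAIN.1.faa` would indicate that every protein header carries the expected prefix.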

b) PGAP Annotation

For PGAP, configure at least these lines in submol.yaml:

organism:
  genus_species: 'Mycoplasma genitalium'
  strain: 'STRAIN'
locus_tag_prefix: 'STRAIN.1'

c) Download from NCBI

If you need to download genomes from NCBI:

download_ncbi_genome \
  --assembly_name GCF_005864195.1 \
  --out_dir /path/to/outdir \
  --new_locus_tag_prefix STRAIN.1_

Additional Annotations Recommended

For best results, we recommend enriching your annotations with additional databases such as eggNOG, KEGG (for use with the Arx Pathways tool), CAZy, antimicrobial resistance annotations, or phage identification.

For further details or assistance, contact Abrinca at info@abrinca.com.

3. Import Genome Data

Import your genome data into the folder structure using the import_genome command.

import_genome --import_dir=/path/to/genome/data --organism STRAIN --genome STRAIN.1

This will:

  • Copy genome files to the appropriate locations
  • Generate metadata files (genome.json, organism.json)
  • Organize files according to the default structure
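
When several genomes need importing, the command can be generated per directory. The following is a dry-run sketch that only prints the commands. It assumes each subdirectory is named after its genome identifier in the form ORGANISM.N (as in STRAIN.1 above); `print_import_commands` is a hypothetical helper, not part of Arx Tools.

```shell
# Hypothetical helper (not part of Arx Tools): print one import_genome command
# per subdirectory of the given base directory, without running anything.
print_import_commands() {
  base=$1
  for dir in "$base"/*/; do
    [ -d "$dir" ] || continue      # skip when the glob matches nothing
    genome=$(basename "$dir")      # e.g. STRAIN.1
    organism=${genome%.*}          # strip the trailing ".N", e.g. STRAIN
    echo "import_genome --import_dir=${dir%/} --organism $organism --genome $genome"
  done
}
```

Review the printed commands first; if they look right, they can be run by piping the output to `sh`.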

4. Orthology Analysis (Optional)

For orthology analysis with OrthoFinder:

# Prepare OrthoFinder analysis
init_orthofinder --representatives_only

# Run OrthoFinder (command will be printed)
# ...

# Import results
import_orthofinder --which hog

Import Configuration

The import configuration file controls how files are organized during import. See the import_genome documentation for detailed examples.

Next Steps