Quick Start
This guide will walk you through the basic workflow for importing genome data into the Arx folder structure.
Prerequisites
- Arx Tools installed (see Installation)
- Genome data files (assembly, annotations, etc.)
Environment Variables
Set these environment variables before starting:
# Set the path to your Arx folder structure
export FOLDER_STRUCTURE=/path/to/folder_structure
# Set the path to your import configuration file (optional)
export ARX_IMPORT_SETTINGS=/path/to/import_config.json
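The later commands depend on these variables, so a short check can catch a missing or mistyped value early. The sketch below is a hypothetical helper, not part of Arx Tools. It only confirms that FOLDER_STRUCTURE is set (the directory itself may not exist until init_folder_structure has run) and that ARX_IMPORT_SETTINGS, if set, points to a readable file.

```shell
# Hypothetical helper (not part of Arx Tools): sanity-check the Arx
# environment variables before running any import step.
check_arx_env() {
    # FOLDER_STRUCTURE is required
    if [ -z "${FOLDER_STRUCTURE:-}" ]; then
        echo "FOLDER_STRUCTURE is not set" >&2
        return 1
    fi
    # ARX_IMPORT_SETTINGS is optional, but if set it must be a readable file
    if [ -n "${ARX_IMPORT_SETTINGS:-}" ] && [ ! -r "$ARX_IMPORT_SETTINGS" ]; then
        echo "ARX_IMPORT_SETTINGS is not readable: $ARX_IMPORT_SETTINGS" >&2
        return 1
    fi
    return 0
}
```

Call check_arx_env at the top of an import script and stop if it returns a non-zero status.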
Basic Workflow
1. Initialize Folder Structure
First, create the basic Arx folder structure:
init_folder_structure
This creates the following structure:
folder_structure
├── organisms
├── annotations.json
├── annotation-descriptions
│   ├── SL.tsv
│   ├── KO.tsv
│   ├── KR.tsv
│   ├── EC.tsv
│   └── GO.tsv
├── orthologs
└── pathway-maps
    ├── type_dictionary.json
    └── svg
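To confirm that initialization produced the layout listed above, each entry can be tested for existence. This is a minimal sketch with a hypothetical helper name; it does not validate file contents.

```shell
# Hypothetical helper (not part of Arx Tools): report every entry of the
# documented layout that is missing under the given root directory.
check_folder_structure() {
    local root="${1:-${FOLDER_STRUCTURE:-}}"
    local missing=0
    local entry
    if [ -z "$root" ]; then
        echo "no folder structure path given" >&2
        return 1
    fi
    for entry in \
        organisms \
        annotations.json \
        annotation-descriptions/SL.tsv \
        annotation-descriptions/KO.tsv \
        annotation-descriptions/KR.tsv \
        annotation-descriptions/EC.tsv \
        annotation-descriptions/GO.tsv \
        orthologs \
        pathway-maps/type_dictionary.json \
        pathway-maps/svg
    do
        # -e is true for both files and directories
        if [ ! -e "$root/$entry" ]; then
            echo "missing: $root/$entry" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_folder_structure "$FOLDER_STRUCTURE" && echo "layout OK".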
2. Prepare a Genome
Before importing, you may want to run an annotation pipeline to generate the necessary files (see File Formats).
a) Prokka Annotation
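To get correct locus tags with Prokka, use the genome identifier (here STRAIN.1, the value later passed to import_genome as --genome) as both the locus tag and the output prefix: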
prokka \
--strain STRAIN \
--locustag STRAIN.1 \
--prefix STRAIN.1 \
--genus Mycoplasma --species genitalium \
--outdir /prokka/out/dir \
assembly.fasta
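Prokka names its output files after --prefix (so the GFF here would be STRAIN.1.gff in the output directory) and by default builds each locus tag from the --locustag value, an underscore, and a running number. Assuming that default format, the hypothetical helper below counts GFF features whose locus_tag attribute lacks the expected prefix. It stops at the ##FASTA section that Prokka appends to its GFF.

```shell
# Hypothetical helper (not part of Arx Tools): verify that every locus_tag
# attribute in a GFF3 file starts with "<prefix>_".
check_locus_tags() {
    local gff="$1" prefix="$2"
    awk -F '\t' -v want="${prefix}_" '
        /^##FASTA/ { exit }               # sequences follow, stop scanning
        /^#/ || NF < 9 { next }           # skip comments and malformed lines
        {
            n = split($9, attrs, ";")     # column 9 holds key=value attributes
            for (i = 1; i <= n; i++) {
                if (index(attrs[i], "locus_tag=") == 1) {
                    tag = substr(attrs[i], 11)
                    total++
                    if (index(tag, want) != 1) {
                        bad++
                        print "unexpected locus tag: " tag > "/dev/stderr"
                    }
                }
            }
        }
        END {
            printf "checked %d locus tags, %d without prefix %s\n", total, bad, want
            if (total > 0 && bad == 0) exit 0
            exit 1
        }
    ' "$gff"
}
```

For example: check_locus_tags /prokka/out/dir/STRAIN.1.gff STRAIN.1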
b) PGAP Annotation
For PGAP, configure at least these lines in submol.yaml:
organism:
  genus_species: 'Mycoplasma genitalium'
  strain: 'STRAIN'
locus_tag_prefix: 'STRAIN.1'
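To keep these values in step with the names you later pass to import_genome, you could generate the file from shell variables. This is a minimal sketch; write_submol is a hypothetical helper that writes only the fields listed above, so merge in any other fields your PGAP run needs.

```shell
# Hypothetical helper (not part of Arx Tools or PGAP): write a minimal
# submol.yaml containing only the organism and locus tag fields.
write_submol() {
    local out="$1" genus_species="$2" strain="$3" locus_tag_prefix="$4"
    cat > "$out" <<EOF
organism:
  genus_species: '${genus_species}'
  strain: '${strain}'
locus_tag_prefix: '${locus_tag_prefix}'
EOF
}
```

For example: write_submol submol.yaml 'Mycoplasma genitalium' STRAIN STRAIN.1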
c) Download from NCBI
If you need to download genomes from NCBI:
download_ncbi_genome \
--assembly_name GCF_005864195.1 \
--out_dir /path/to/outdir \
--new_locus_tag_prefix STRAIN.1_
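To fetch several assemblies, the same command can be run in a loop. This sketch uses only the flags shown above; the wrapper name, the ACCESSION:PREFIX argument format, and the one-subdirectory-per-accession output layout are illustrative assumptions.

```shell
# Hypothetical wrapper (not part of Arx Tools): download several assemblies,
# each given as ACCESSION:LOCUS_TAG_PREFIX, into <out_root>/<accession>.
download_many() {
    local out_root="$1"
    shift
    local pair accession prefix
    for pair in "$@"; do
        accession="${pair%%:*}"   # text before the first colon
        prefix="${pair#*:}"       # text after the first colon
        download_ncbi_genome \
            --assembly_name "$accession" \
            --out_dir "$out_root/$accession" \
            --new_locus_tag_prefix "$prefix" || return 1
    done
}
```

For example: download_many /path/to/outdir GCF_005864195.1:STRAIN.1_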
Additional Annotations Recommended
For best results, we recommend enriching your annotations with additional databases such as eggNOG, KEGG (for use with the Arx Pathways tool), CAZy, antimicrobial resistance annotations, or phage identification.
For further details or assistance, feel free to reach out to Abrinca at info@abrinca.com.
3. Import Genome Data
Import your genome data into the folder structure using the import_genome command.
import_genome --import_dir=/path/to/genome/data --organism STRAIN --genome STRAIN.1
This will:
- Copy genome files to the appropriate locations
- Generate metadata files (genome.json, organism.json)
- Organize files according to the default structure
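When an organism has more than one genome, the command can be repeated once per genome. This sketch assumes one input directory per genome under a common parent, with each subdirectory named after its genome identifier (e.g. STRAIN.1, STRAIN.2). That layout and the wrapper name are hypothetical.

```shell
# Hypothetical wrapper (not part of Arx Tools): import every genome directory
# found under <parent_dir> for one organism, using each subdirectory name
# as the genome identifier.
import_all_genomes() {
    local parent_dir="$1" organism="$2"
    local dir genome
    for dir in "$parent_dir"/*/; do
        [ -d "$dir" ] || continue   # the glob matched nothing
        dir="${dir%/}"              # drop the trailing slash
        genome="$(basename "$dir")"
        echo "importing $genome for $organism" >&2
        import_genome --import_dir="$dir" --organism "$organism" --genome "$genome" || return 1
    done
}
```

For example: import_all_genomes /path/to/genomes STRAIN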
4. Orthology Analysis (Optional)
For orthology analysis with OrthoFinder:
# Prepare OrthoFinder analysis
init_orthofinder --representatives_only
# Run OrthoFinder using the command printed by init_orthofinder
# ...
# Import results
import_orthofinder --which hog
Import Configuration
The import configuration file (the file referenced by ARX_IMPORT_SETTINGS) controls how files are organized during import. See the import_genome documentation for detailed examples.
Next Steps
- Explore Core Tools for detailed usage
- Learn about Helper Scripts for renaming operations