Lapidary: Identifying and reporting amino acid sequences in metagenomes using sequence reads and Diamond
Genome and metagenome comparisons rely on identifying genetic elements that differ or are in common between samples. These genetic elements can be identified by assembling sequenced reads and identifying the genetic element in the assembly, or by aligning nucleotide sequences in the reads to the nucleotide sequences of a reference genetic element. The first approach relies on the complete assembly of the genetic element of interest, and the second relies on a reference sequence represented in nucleotides. Complete assembly is particularly challenging with metagenome data, where genetic elements, including genes, are often fragmented: sequences shared between different species in the metagenomic data result in contig breaks in or around genetic elements, which makes such elements difficult to identify through the first approach. A common approach with metagenomes is to map reads against reference nucleotide sequences and extract the depth and coverage from those reference sequ