Improved Python Package for DNA Sequence Encoding using Frequency Chaos Game Representation
Frequency Chaos Game Representation (FCGR), an extended version of Chaos Game Representation (CGR), emerges as a robust strategy for DNA sequence encoding. The core principle of the CGR algorithm involves mapping a one-dimensional sequence representation into a higher-dimensional space, typically in the two-dimensional spatial domain. This paper introduces a use case wherein FCGR serves as a kmer frequency-based encoding method for motif classification using a publicly available dataset. Availability and implementation: The FCGR python package, use case, along with additional functionalities, is available in the GitHub. Our FCGR package demonstrates superior accuracy and computational efficiency compared to a leading R-based FCGR library {lochel2020deep}, which is designed for versatile tasks, including proteins, letters, and amino acids with user-defined resolution. Nevertheless, it is important to note that our Python package is specifically designed for DNA sequence encoding, where