Isolation, biochemical characterization, and genome sequencing of two high-quality genomes of a novel chitinolytic Jeongeupia species
Graphical Abstract
In this study, a novel chitinolytic Jeongeupia species, “wiesaeckerbachi,” was isolated from soil samples, characterized biochemically, and sequenced on the long-read platform PacBio Sequel IIe. In silico analysis revealed genomic differences from the closest related type strain, Jeongeupia naejangsanensis TAS4-2, in addition to an unusually extensive chitinolytic machinery.
Abstract
Chitin is the second most abundant polysaccharide worldwide as part of arthropods' exoskeletons and fungal cell walls. Low concentrations in soils and sediments indicate rapid decomposition by chitinolytic organisms in terrestrial and aquatic ecosystems. The responsible enzymes, so-called chitinases, and their products, chitooligosaccharides, exhibit promising characteristics with applications ranging from crop protection to the cosmetics, medical, textile, and wastewater industries. Exploring novel chitinolytic organisms is crucial to expand the enzymatic toolkit for biotechnological chitin utilization and to deepen our understanding of diverse catalytic mechanisms. In this study, we present two long-read sequencing-based genomes of highly similar Jeongeupia species, which were screened, isolated, and biochemically characterized from chitin-amended soil samples. Through metabolic characterization, whole-genome alignments, and phylogenetic analysis, we demonstrate how the investigated strains differ from the taxonomically closest strain Jeongeupia naejangsanensis BIO-TAS4-2T (DSM 24253). In silico analysis and sequence alignment revealed a multitude of highly conserved chitinolytic enzymes in the investigated Jeongeupia genomes. Based on these results, we suggest that the two strains represent a novel species within the genus Jeongeupia, which may be useful for environmentally friendly N-acetylglucosamine production from crustacean shell or fungal biomass waste or as a crop protection agent.
1 INTRODUCTION
Chitin, the second most abundant naturally occurring polysaccharide on Earth, consists of β-(1,4)-linked N-acetyl-d-glucosamine and, to a smaller extent, d-glucosamine monomers. Analogous to cellulose, which differs structurally by lacking the acetamido group and imparts stability and structure to higher plants, chitin is the principal structural component of fungal cell walls, insect cuticles, and crustacean exoskeletons, as well as of algal cell walls and mollusk endoskeletons in aquatic organisms (Rinaudo, 2006; Younes & Rinaudo, 2015).
Despite its ubiquity, no significant long-term accumulation has been quantified in soils or sediments, implying high turnover rates by chitinolytic organisms in nature (Gooday, 1990). While glucosamine-specific importers appear to be widespread among bacteria (Riemann & Azam, 2002), the distribution of chitinolytic enzymes is, according to current reports, limited to several groups within the phyla Proteobacteria, Bacteroidetes, Actinobacteria, and Firmicutes (Cottrell et al., 2000; Hunt et al., 2008). While bacteria compete with fungi for chitinous resources on land, bacteria of the orders Vibrionales, Enterobacterales, and Neisseriales prevail in the carbon and nitrogen cycling of polysaccharides in aquatic environments (Aumen, 1980; Beier & Bertilsson, 2013; de Boer et al., 2005; Hunt et al., 2008; Swiontek Brzezinska et al., 2014; Yu et al., 1991). Most bacterial chitinases are classified as glycoside hydrolases of family 18 (GH18) and, to a far lesser extent, of family 19 (GH19) (Cantarel et al., 2009). Differences in chitin composition between terrestrial and aquatic environments are reflected in the formation of distinct chitinolytic systems (Bai et al., 2016).
Aquatic chitinolytic bacteria might operate with a smaller toolkit of enzymes on average (Bai et al., 2016), are not enriched on the substrate (Brzezinska et al., 2008), and exhibit generally weaker catalytic activities (Swiontek Brzezinska et al., 2014). By contrast, terrestrial bacteria are more chitinolytically active in comparison, with Streptomyces as the predominant genus in the early stages of chitin decomposition, whereas other Actinomycetes take over the reins in later stages (de Boer et al., 1999; Swiontek Brzezinska et al., 2014). Furthermore, a correlation between the abundance of bacteria and chitin decomposition rates could be observed in soil systems (Kielak et al., 2013), both of which could be promoted through the addition of substrate (Jacquiod et al., 2013; Mitchell & Alexander, 1962).
Applications of chitinase products, the chitooligosaccharides (COS) and their deacetylated derivatives, span the food, cosmetics, wastewater treatment, and medical industries (Aam et al., 2010; Abu Hassan et al., 2009; Hamed et al., 2016; Rinaudo, 2006). On account of the high energy costs and hazardous by-products of chemical processes, biotechnological COS production is more sustainable and the preferred method in the long term (Beaney et al., 2005; Kaur & Dhillon, 2013; Oyeleye & Normi, 2018).
By virtue of their industrial potential and the increased relevance of sustainable (bio)technologies, extensive research on chitinases has been conducted (Binod et al., 2005; Juarez-Jimenez et al., 2008; Lan et al., 2004; Songsiriritthigul et al., 2010; Sun et al., 2019; Vaikuntapu et al., 2016). With 10% of the global crop loss arising from plant pathogens (Strange & Scott, 2005), chitinases could gain importance as environmentally friendly crop protection agents, in particular, due to their fungal cell wall-directed hydrolase activities (Adrangi & Faramarzi, 2013; Gomaa, 2012; Neeraja et al., 2010; Veliz et al., 2017). However, turnover rates of recalcitrant chitin represent the biggest obstacle that hinders chitinases from becoming economically feasible contenders for industrial valorization. Thus, the exploration of novel chitinolytic organisms is important to further deepen our understanding regarding catalytic mechanisms and inferred optimization of enzymes. In this respect, recent improvements regarding the costs and accessibility of next-generation sequencing technologies enable the continuous democratization of whole-genome sequencing. Long-read sequencing platforms are well suited for de novo genome assembly applications, while high-accuracy short-read sequencing is apt for clinical variant discovery (Koboldt et al., 2013; Goodwin et al., 2016).
In this study, soil samples amended with colloidal chitin were screened for chitinolytic organisms, which were isolated on chitin agar plates and identified by 16S ribosomal RNA (rRNA) gene analysis (Jacquiod et al., 2013; Mitchell & Alexander, 1962). High-fidelity genomes were generated with Pacific Biosciences' long-read sequencer Sequel IIe and annotated with the Prokaryotic Genome Annotation Pipeline (PGAP) of the National Center for Biotechnology Information (NCBI). A bioinformatic comparison with a highly similar Illumina NextSeq 500-based draft genome of Jeongeupia naejangsanensis BIO-TAS4-2T (Turrini et al., 2021) served as the basis for taxonomic discussion. In addition, sugar metabolism capabilities were investigated with API 20NE and 50CH strips, revealing differences between the two strains investigated in this study and the type strain BIO-TAS4-2T (Yoon et al., 2010). Finally, in silico analysis of the chitinolytic systems demonstrated their highly conserved nature within the genus Jeongeupia and shed light on their enzymatic composition.
2 MATERIALS AND METHODS
2.1 Chemicals and consumables
All chemicals were supplied from Sigma-Aldrich, and general consumables were obtained from VWR. All necessary buffers and enzymes for next-generation genome sequencing were shipped from Pacific Biosciences. High molecular weight DNA was extracted with the Quick-DNA™ High Molecular Weight (HMW) MagBead Kit from Zymo Research and HMW genomic DNA (gDNA) shearing was conducted with g-TUBEs (Covaris) according to the manufacturer's manual.
2.2 Colloidal chitin and media preparation
Colloidal chitin (CC) was prepared according to Murthy and Bleakley (2012) with slight modifications. Twenty grams of crab shell chitin powder (Sigma-Aldrich) were added incrementally to 150 mL 37% HCl under moderate stirring, which increased the viscosity of the solution. When the viscosity had decreased sufficiently, more chitin was carefully added. The slurry was then incubated for 2–3 h at room temperature under moderate stirring, avoiding the formation of bubbles. Afterward, the nonviscous, fully dissolved chitin of intense brown color was slowly poured into 2 L of ice-cold diH2O in a 5 L glass beaker and vigorously stirred, whereupon it rapidly swelled to white colloidal chitin. The suspension was incubated overnight at 4°C without stirring and neutralized the following day by repeatedly adding excess deionized water and centrifuging in a Beckman JLA8.1000 rotor for 15 min at 10,000g until the supernatant reached pH 5. CC was harvested, autoclaved, and kept in the fridge until use in liquid chitinase screening media (CSM) or agar plates. The CSM recipe was adapted and modified from Lee et al. (1997) and Singh et al. (1998): 20 g/L (2% wt/vol) CC, 0.7 g/L K2HPO4, 0.3 g/L KH2PO4, 0.5 g/L MgSO4 × 5 H2O, 10 mg/L FeSO4 × 7 H2O, 20 g/L agar (optional), adjusted to pH 6.5 for plates or pH 7 for liquid medium. After autoclaving, 0.001 g/L ZnSO4 and MnCl2 were added from sterile-filtered stock solutions before pouring agar plates or inoculating liquid media.
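The per-liter CSM recipe above can be scaled to an arbitrary batch volume. The following is a minimal sketch, not part of the original protocol; the function name `scale_recipe` is hypothetical, and it assumes that the stated 0.001 g/L applies to ZnSO4 and MnCl2 individually.

```python
# Per-liter amounts (g/L) of the autoclaved CSM components, as listed in the text.
CSM_G_PER_L = {
    "colloidal chitin": 20.0,
    "K2HPO4": 0.7,
    "KH2PO4": 0.3,
    "MgSO4 x 5 H2O": 0.5,
    "FeSO4 x 7 H2O": 0.010,
}
AGAR_G_PER_L = 20.0  # optional, for plates only

# Added after autoclaving from sterile-filtered stocks.
# Assumption: 0.001 g/L applies to each salt separately.
POST_AUTOCLAVE_G_PER_L = {"ZnSO4": 0.001, "MnCl2": 0.001}


def scale_recipe(volume_l, with_agar=False):
    """Return grams of each autoclaved component for a batch of volume_l liters."""
    if volume_l <= 0:
        raise ValueError("volume_l must be positive")
    amounts = {name: round(g * volume_l, 6) for name, g in CSM_G_PER_L.items()}
    if with_agar:
        amounts["agar"] = round(AGAR_G_PER_L * volume_l, 6)
    return amounts


if __name__ == "__main__":
    for name, grams in scale_recipe(0.5, with_agar=True).items():
        print(f"{name}: {grams} g")
```

The pH adjustment (6.5 for plates, 7 for liquid medium) is left as a manual step because it depends on the batch.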
2.3 Soil screening and cultivation of chitinolytic organisms
Soil samples were collected in sterile 50 mL falcon tubes and normalized to 60 g before transfer into 250 mL glass beakers. Tap water was added if the collected soil was completely dry. Afterward, samples were amended with either 1% or 10% wt/wt colloidal chitin or crab shell chitin powder (Sigma-Aldrich) and covered with tin foil. After incubation at room temperature for 2 weeks, portions of the amended soil samples were transferred to sterile 50 mL falcon tubes and filled to 50 mL with sterile 1X PBS. Soil samples were incubated in a thermal shaker at 30°C and 600 rpm for 30 min. Supernatants were streaked out on CSM agar plates with different pH (5.5, 6, 6.5, 7, 8) using inoculation loops and incubated at 28°C for 2–3 days. Colony-forming units (CFU) surrounded by halos were streaked onto separate CSM agar plates of the respective pH until axenic strains were obtained (Figure 1).
2.4 Bacterial strains
Through the method described above, the two chitinolytic bacteria J. n. and J. sp. were isolated from environmental soil samples. Species identification was realized using 16S rRNA gene analysis with the polymerase chain reaction (PCR) primer pair F27-5′-AGA GTT TGA TCC TGG CTC AG-3′ and 1492R-5′-AAG GAG GTG ATC CAA GCC-3′ at 55°C annealing temperature. After deploying the DreamTaq DNA polymerase (Thermo Fisher Scientific), the length and quantity of PCR products were validated via agarose gel electrophoresis, gel bands were excised, and gene fragments extracted with the Monarch DNA Gel Extraction Kit (New England BioLabs GmbH) according to the protocol. Eurofins Genomics Europe Sequencing GmbH conducted Sanger sequencing with the provided primer pair F27 and 1492R. Finally, the Geneious Prime software (v.2022.0.1) was used for read quality control, alignment, and assembly to obtain near full-length rRNA sequences, which were compared to NCBI's 16S rRNA gene database through their BLASTn suite (Sayers et al., 2022).
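As a plausibility check on the 55°C annealing temperature, primer melting temperatures can be estimated with the Wallace rule, Tm = 2(A + T) + 4(G + C). This rule of thumb is an illustrative assumption here and is not stated as the authors' method; the function name `wallace_tm` is hypothetical.

```python
def wallace_tm(primer):
    """Estimate the melting temperature (deg C) of a short primer via the Wallace rule."""
    seq = "".join(primer.split()).upper()  # drop the triplet spacing used in the text
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("primer contains non-ACGT characters")
    return 2 * at + 4 * gc


# Primer sequences as given in the text.
PRIMERS = {
    "F27": "AGA GTT TGA TCC TGG CTC AG",
    "1492R": "AAG GAG GTG ATC CAA GCC",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, wallace_tm(seq))
```

Both estimates lie at or slightly above the annealing temperature used, which is consistent with the common practice of annealing a few degrees below the lower primer Tm.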
2.5 Whole-genome sequencing
2.5.1 HMW gDNA extraction and DNA library preparation
Singular bacteria colonies were picked from CSM agar (pH 6.5) and incubated in 20 mL Tryptic Soy Broth medium in 150 mL baffled shaking flasks at 120 rpm and 28°C overnight. HMW gDNA was extracted according to the instructions of the Quick-DNA HMW MagBead Kit (Zymo Research).
To assess the quantity and purity of the obtained DNA, 260/280 nm absorption ratios and concentrations were measured with a photometer (Nano Photometer NP80; IMPLEN) and a Qubit 4 fluorometer with the Qubit 1X dsDNA HS Assay-Kit (Thermo Fisher Scientific). To confirm the high molecular weight of the gDNA, fragment sizes were analyzed with a Femto Pulse capillary electrophoresis instrument (Agilent Technologies).
When samples passed quality control, 8 µg gDNA in 150 µL Elution Buffer was sheared with g-TUBEs (Covaris) at 1700g in a tabletop centrifuge. This yielded DNA fragments with a size of ca. 12 kbp, as confirmed with Femto Pulse. Subsequently, HiFi libraries were prepared according to the SMRTbell prep kit 3.0 manual (Pacific Biosciences), ligating barcoded adapters to the samples. Libraries were stored at −20°C until the day of sequencing, when primers and polymerase were bound to the samples with the Sequel II Binding Kit 3.2 (Pacific Biosciences), closely following the manufacturer's recommendations.
2.5.2 Sequencing
Whole genome sequencing was performed on a Sequel IIe platform (Pacific Biosciences) on a single SMRT cell (lot number 418096) with the following parameters: 2 h of pre-extension, 2 h of adaptive loading (target p1 + p2 = 0.95) for a final on-plate concentration of 85 pM and a 30-h long movie window for signal detection (Ritz et al., 2023).
2.5.3 Assembly and annotation
After demultiplexing with the SMRT link software (v.11.0.0.144466) to separate the barcoded reads, obtained FASTQ raw read files were assembled utilizing the Canu assembler 2.0 (Koren et al., 2017). An estimated genome size of 3.8 Mb was provided (genomeSize = 3.8 mb) and the -pacbio parameter was deployed; otherwise, standard settings were utilized. Log files can be found in Supporting Information: Data. Annotation was performed utilizing NCBI's PGAP (Ciufo et al., 2018; Haft et al., 2018; Li et al., 2021; Tatusova et al., 2016), which employs GeneMarkS-2+ for gene prediction (Lomsadze et al., 2018) and TIGRFAMs for functional identification of proteins (Haft, 2001, 2003; Haft et al., 2012; Selengut et al., 2007).
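The assembly call described above can be sketched as a command line built in Python. The file name, prefix, and output directory are hypothetical placeholders, and the genome size is written as `3.8m` on the assumption that this is the suffix form Canu expects for the value reported as 3.8 mb; the command is only assembled here, not executed.

```python
import shlex


def canu_command(reads_fastq, prefix, out_dir, genome_size="3.8m"):
    """Build (but do not run) a Canu call with the settings named in the text."""
    return [
        "canu",
        "-p", prefix,                 # assembly prefix (hypothetical name)
        "-d", out_dir,                # assembly directory (hypothetical name)
        f"genomeSize={genome_size}",  # estimated genome size
        "-pacbio", reads_fastq,       # read type flag named in the text
    ]


cmd = canu_command("demultiplexed_reads.fastq", "jeongeupia", "canu_out")

if __name__ == "__main__":
    print(shlex.join(cmd))
```

Keeping the arguments in a list rather than a single string avoids shell quoting problems if the command is later passed to `subprocess.run`.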
2.5.4 Bioinformatic analysis
1. Functional genome characterization via eggNOG Mapper (Huerta-Cepas et al., 2017; Huerta-Cepas, Szklarczyk, et al., 2016) to retrieve Cluster of Orthologous Groups (COG) Proteins and Gene Ontology terms.
2. Genome quality assessment with CheckM (Parks et al., 2015) and BUSCO v.5.3.2 (Manni et al., 2021), based on near-universal single-copy orthologs. CheckM was run through Protologger, part of the Galaxy network (Hitch et al., 2021).
The circular genome plot was visualized with CIRCOS (Krzywinski et al., 2009), while RStudio with the ggplot2 package served as the main tool for creating all other plots, unless stated otherwise (Posit Team, 2022; Wickham, 2008). Mobile genetic elements and phage regions were detected with the browser-based tool PHASTER (Arndt et al., 2016; Zhou et al., 2011). The origin of replication (ORI) was identified with DoriC 12.0 (Dong et al., 2023).
Carbohydrate active enzymes (CAZymes) were retrieved with dbCAN 3.0 (Cantarel et al., 2009; Zheng et al., 2023). Alignment and phylogenetic reconstructions of the chitin-enacting enzymes were performed using the function “build” of ETE3 3.1.2 as implemented on the GenomeNet (Huerta-Cepas, Serra, et al., 2016; Kyoto University Bioinformatics Center, 2023). The tree was constructed using FastTree v2.1.8 with default parameters. Values at nodes are Shimodaira–Hasegawa-like local support (Thompson et al., 1994; Kyoto University Bioinformatics Center, 2023).
Whole genome alignment was realized with the progressiveMauve plugin within the Geneious Prime software v.2022.0.1, which is suitable for genomes containing rearranged segments due to recombination (Darling et al., 2010). Several locally collinear block (LCB) sizes were tested, whereby a compromise of conserved region count and sequence identity was selected, see Supporting Information: Data (Figure A6).
2.5.5 Phylogenetic trees with Type (Strain) Genome Server (TYGS)
1. Determination of closest type strain genomes: This was done in two complementary ways. First, all user genomes were compared against all type strain genomes available in the TYGS database via the MASH algorithm, a fast approximation of intergenomic relatedness (Ondov et al., 2016), and the 10 type strains with the smallest MASH distances were chosen per user genome. Second, an additional set of 10 closely related type strains was determined via the 16S rRNA gene sequences. These were extracted from the user genomes using RNAmmer (Lagesen et al., 2007). Each sequence was subsequently BLASTed (Camacho et al., 2009) against the 16S rRNA gene sequences of all 18,977 type strains currently available in the TYGS database. This was used as a proxy to find the best 50 matching type strains (according to the bitscore) for each user genome and to subsequently calculate precise distances using the Genome BLAST Distance Phylogeny approach (GBDP) under the algorithm “coverage” and distance formula d5 (Camacho et al., 2009; Meier-Kolthoff et al., 2013). These distances were finally used to determine the 10 closest type strain genomes for each of the user genomes.
2. Pairwise comparison of genome sequences: For the phylogenomic inference, all pairwise comparisons among the set of genomes were conducted using GBDP, and accurate intergenomic distances were inferred under the algorithm “trimming” and distance formula d5 (Meier-Kolthoff et al., 2013). A total of 100 distance replicates were calculated each. Digital DNA–DNA hybridization (dDDH) values and confidence intervals were calculated using the recommended settings of the GGDC 3.0 (Meier-Kolthoff et al., 2013, 2022).
3. Phylogenetic inference: The resulting intergenomic distances were used to infer a balanced minimum evolution tree with branch support via FASTME 2.1.6.1, including subtree-prune-regraft moves as postprocessing (Lefort et al., 2015). Branch support was inferred from 100 pseudobootstrap replicates each. The trees were rooted at the midpoint (Farris, 1972) and visualized with PhyD3 (Kreft et al., 2017).
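To illustrate the first step, the Mash distance is derived from the Jaccard index j of two k-mer sets as D = −(1/k) ln(2j / (1 + j)). The sketch below is a simplification under stated assumptions: it uses the exact Jaccard index of complete k-mer sets instead of the MinHash sketches that Mash actually compares, and all function names are hypothetical.

```python
import math


def kmer_set(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard index of two sets (Mash estimates this from MinHash sketches)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j, k=21):
    """Convert a Jaccard index into a Mash distance, bounded to [0, 1]."""
    if j <= 0:
        return 1.0  # no shared k-mers: maximal distance
    d = -math.log(2 * j / (1 + j)) / k
    return min(1.0, max(0.0, d))


if __name__ == "__main__":
    # Toy sequences, purely illustrative.
    a = kmer_set("ACGTACGTTGCAACGT", 4)
    b = kmer_set("ACGTACGATGCAACGT", 4)
    print(mash_distance(jaccard(a, b), k=4))
```

Ranking type strain genomes by this distance and keeping the 10 smallest values per user genome corresponds to the selection described in step 1.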
3 RESULTS AND DISCUSSION
3.1 Screening, isolation, and 16S rRNA gene-based identification of chitinolytic bacteria
By amending environmental soil samples with chitin in colloidal or powder form at dosages of 0.6% or 6% (wt/wt), respectively, chitinolytic microorganisms could putatively be enriched, as previously reported (Jacquiod et al., 2013). Streaking onto minimal media agar plates with 2% (wt/vol) colloidal chitin as the sole carbon and nitrogen source produced CFUs whose chitin-hydrolyzing ability became visible through halos of varying diameters, indicating degradation of the paste-like, white colloidal chitin (Figure A1). The two most promising candidates were then subjected to 16S rRNA gene PCR (Figure 1) and identified based on BLASTn comparison with the type strain database of NCBI (Sayers et al., 2022).

Both candidates were identified as J. naejangsanensis strain BIO-TAS4-2T with identical percent identities of 98.72%, query coverages of 99%, and E values of 0. With identities of 98.48% (J. n.) and 98.01% (J. sp.), respectively, the 16S rRNA gene of Jeongeupia chitinilytica showed the second-highest sequence identity. The provisional names were assigned based on these results: because the first strain appeared to be J. naejangsanensis, it was named “J. n.” Due to visible morphological differences on the screening plates earlier in the study, possibly originating from the presence of a contaminant, the second candidate strain was thought to be a deviant Jeongeupia. This hypothesis was later reinforced by an average nucleotide identity-based taxonomic analysis, leading to the name “J. sp.” When the 16S rRNA gene sequences extracted from the novel high-quality genomes (gene IDs pgaptmp_00343 [J. n.] and pgaptmp_1503 [J. sp.]) were submitted to a BLASTn query, the colony 16S rRNA gene PCR-based results could be confirmed with percent identities of 99.06%, E values of 0.0, and query coverages of 97%.
3.1.1 Sugar metabolism
Carbon source utilization capabilities of the investigated strains were assessed using API 50CH and 20NE strips (bioMérieux) and compared to the taxonomically closest strain J. naejangsanensis BIO-TAS4-2T (Table 1). As expected for closely related species, most examined characteristics were congruent, among these positive results for motility, nitrate reduction, and the assimilation of N-acetylglucosamine, d-glucose, d-fructose, d-mannose, and d-ribose. Please refer to the Supporting Information: Data for a detailed list of all results and depictions of the API strips.
Characteristic | J. naejangsanensis BIO-TAS4-2T | J. n. | J. sp. |
---|---|---|---|
Nitrate reduction | + | + | + |
Assimilation of | | | |
d-glucose | + | + | + |
d-fructose | + | + | + |
d-mannose | + | + | + |
d-ribose | + | + | + |
N-acetylglucosamine | + | + | + |
Potassium gluconate | + | + | + |
Citrate | + | + | + |
Malate | + | + | + |
Capric acid | + | − | − |
Xylitol | + | − | − |
d-lyxose | + | − | − |
l-arabitol | + | − | − |
Hydrolysis of | | | |
Esculin | − | + | w* |
Gelatin | − | + | + |
Interestingly, differences were observed in the assimilation of xylitol, d-lyxose, l-arabitol, and capric acid, all of which only the type strain can utilize as a carbon source (Yoon et al., 2010). Hydrolysis of the substrates esculin and gelatin, on the other hand, was exclusive to J. n. and J. sp. In this regard, a minor metabolic distinction between the two strains described in this study could be made, with J. n. exhibiting a more potent esculin hydrolysis capability than J. sp., as detected with the API 20NE strip (Figure A4). This distinction was put into perspective by the API 50CH test results, however, which demonstrated very similar β-glucosidase activity levels based on the substrate's shading (Figures A2 and A3).
3.2 Genome sequencing, assembly, and quality control
Genome libraries with ligated barcode adapters were sequenced alongside other biosamples on PacBio's long-read platform Sequel IIe (Pacific Biosciences). Subsequently, reads were demultiplexed (binned according to the barcode) computationally. The overall HiFi reads from the circular consensus sequencing mode were satisfactory in quantity and quality, with a Q36 score, which corresponds to a base-calling accuracy of approximately 99.97%. Owing to the delicate balancing act regarding library concentrations during multiplexing, the J. sp. library was overrepresented, as indicated by the higher zero-mode waveguide values and polymerase read counts compared to the J. n. library (Table 2). During genome assembly with Canu 2.0 (Koren et al., 2017), reads were trimmed to 50- or 40-fold remaining coverage, respectively. Full-length circular bacterial genomes with a length of 3.79 Mbp could be constructed, while additional contigs added up to full genome sizes of 3.82/3.87 Mbp, respectively. The genomes are accessible at NCBI via the BioProject ID PRJNA978547. The obtained results are in concordance with other available genome sizes of the genus Jeongeupia, which range from 3.4 to 3.9 Mbp. High G + C contents of approximately 63% are also in line with the 62%–65% of the other four currently known Jeongeupia members J. chitinilytica (KCTC 23701, RefSeq GCF_014652315.1), J. naejangsanensis (DSM 24253, RefSeq GCF_016865585.1), J. sp. HS-3 (RefSeq GCF_015140455.1), and J. sp. USM3 (RefSeq GCF_001808185.1).
Parameter | J. n. | J. sp. |
---|---|---|
Mean barcode quality (%) | 97 | 98 |
Number of ZMWs | 10,385 | 18,144 |
Polymerase reads | 604,197 | 1,049,345 |
Bases | 7,823,401,458 | 13,704,381,117 |
Mean read length | 12,969 | 13,059 |
Coverage before trimming (fold) | 50 | 200 |
Coverage after trimming (fold) | 50 | 40.14 |
Genome size (Mbp) | 3.82 | 3.87 |
Circular contig size (Mbp) | 3.79 | 3.79 |
Contigs | 2 | 3 |
GC-content (%) | 63.23 | 63.25 |
- Abbreviations: GC, guanine–cytosine; ZMW, zero-mode waveguide.
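The relationship between a Phred-scaled quality score Q and base-calling accuracy follows from the standard definition Q = −10 log10(p), where p is the error probability. The following minimal sketch converts between the two; the function names are hypothetical.

```python
def phred_to_error(q):
    """Error probability p for a Phred quality score Q, from Q = -10 * log10(p)."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q):
    """Base-calling accuracy (fraction of correct calls) for a Phred score."""
    return 1 - phred_to_error(q)


if __name__ == "__main__":
    for q in (20, 30, 36):
        print(f"Q{q}: accuracy {phred_to_accuracy(q):.4%}")
```

Under this definition, Q20 corresponds to 99% accuracy and Q30 to 99.9%, so a Q36 score lies above 99.97%.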
To further evaluate the quality of the assembled genomes, the completeness of the orthologous gene set was assessed with BUSCO 5.3.2, utilizing the database of the order Neisseriales (neisseriales_odb10) as the closest available reference (Manni et al., 2021). Calculated genome completeness was high with 99.7% and 99.8% for J. n. and J. sp., respectively (Figure A5), on par with the J. naejangsanensis type strain BIO-TAS4-2T genome (99.7%). The lack of fragmented BUSCOs, as opposed to the reference strain, could originate from the gapless assembly enabled by the long-read sequencing platform. Duplicated orthologues, as prevalently seen in the J. sp. genome, stem exclusively from its minor secondary and tertiary contigs, which were not omitted as contaminations by the Canu 2.0 assembler.
3.2.1 Genome-based identification
Average nucleotide identity (ANI)-based identification: The ANI values, widely accepted as a computational tool to define species boundaries and confirm identities of Bacteria and Archaea, were routinely assessed by the annotation pipeline PGAP (Ciufo et al., 2018). When J. naejangsanensis was chosen as the reference organism (user provided), the highest respective ANI values were 87.61% for both J. n. and J. sp. compared to J. naejangsanensis BIO-TAS4-2T (GenBank GCA_016865585.1, ASM1686558v1). When the genus Jeongeupia was submitted as the reference instead, the best hit for J. sp. changed to the species J. chitinilytica (GenBank GCA_014652315.1, ASM1465231v1) with an average nucleotide identity of 85.9% (Ciufo et al., 2018). However, it remained unchanged for J. n., which was still identified as J. naejangsanensis BIO-TAS4-2T. All ANI results were deemed inconclusive, with values below 95%. When consulting the external fastANI and orthoANIu tools (Jain et al., 2018; Yoon et al., 2017), J. sp. had higher respective ANI values of 88.37% and 87.56% with J. naejangsanensis BIO-TAS4-2T, as opposed to 86.93% and 85.5% with J. chitinilytica.
Possible reasons for ANI values below 95% include genome contamination and incompleteness, the genomes belonging to a novel species, the database lacking high-quality type strain genomes, or biogeographical effects between the strains investigated in this study (country of origin Germany) and the type strain BIO-TAS4-2T (country of origin Korea). Furthermore, several genera, for example, Variovorax or Stenotrophomonas, are reported to be defined more loosely by ANI, with lower cutoff values of 88% and 88.5%, respectively (Ciufo et al., 2018). This might also apply to the genus Jeongeupia. Genome contamination and incompleteness can be ruled out based on analysis with the CheckM tool (Parks et al., 2015), which reports a completeness of 99.57% and contamination of 1.07%/1.28% for J. n./J. sp., respectively. Assessment of near-universal single-copy orthologues with BUSCO v.5.3.2 (Manni et al., 2021) supports the notion that genome completeness and assembly quality were high (Figure A5).
Horizontal gene transfer and adaptation to local habitats, driven by interactions between local bacterial communities (Polz et al., 2013), should be accounted for when discussing genomic rearrangement or gene flux. Additionally, soil pH and salinity largely affect the composition of bacterial communities (Fierer & Jackson, 2006; Lauber et al., 2009; Lozupone & Knight, 2007). Given the vast distance between the two sample collection sites, Naejang mountain in Korea and a billabong of the river Isar near Munich, Germany, the soil composition most likely differed. Furthermore, the distributed genome hypothesis, which states that the gene pool of a bacterial taxon is more complex than that of an individual strain, might serve to explain differences between genomes even within the species level, leading to genetic differences possibly reflected in ANI values (Baumdicker et al., 2012).
Whole-genome sequence-based identification: Due to the ambiguous nature of the ANI-based identification results, whole-genome sequence-based taxonomic identification was performed utilizing the free bioinformatics platform TYGS (Meier-Kolthoff et al., 2022; Meier-Kolthoff & Göker, 2019). Since the phylogenetic tree construction incorporates a 16S rRNA gene sequence BLAST database search (Camacho et al., 2009), it is no coincidence that our earlier results were reproduced and the investigated strains were assigned closest to J. naejangsanensis BIO-TAS4-2T (DSM 24253) with a confidence value of 97% and a delta statistic of 0.26 (Holland et al., 2002) (Figure 2a). Intriguingly, when the phylogenetic tree was inferred from whole-genome sequence comparisons, the strains investigated in this study were located in their own branch next to J. naejangsanensis and J. chitinilytica with a confidence value of 100% and a delta statistic of 0.258 (Figure 2b). The results described above suggest that the bacteria J. n. and J. sp. of this study might represent identical or closely related strains of a novel species within the genus Jeongeupia. This hypothesis aligns with the ANI values of below 95% to their closest hit, which indicate diverging species in most cases, with the few taxonomic exceptions mentioned above (Ciufo et al., 2018).
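The ANI comparison can be summarized programmatically by checking each reported value against a species cutoff. This is an illustrative sketch only: the record layout and function name are hypothetical, the values are those quoted in the text, and no genus-specific cutoff for Jeongeupia is known from the source.

```python
SPECIES_ANI_CUTOFF = 95.0  # general species boundary used in the text (%)

# (query strain, reference, tool, ANI in %), as reported in the text.
REPORTED_ANI = [
    ("J. n.", "J. naejangsanensis BIO-TAS4-2T", "PGAP", 87.61),
    ("J. sp.", "J. naejangsanensis BIO-TAS4-2T", "PGAP", 87.61),
    ("J. sp.", "J. chitinilytica", "PGAP", 85.9),
    ("J. sp.", "J. naejangsanensis BIO-TAS4-2T", "fastANI", 88.37),
    ("J. sp.", "J. naejangsanensis BIO-TAS4-2T", "orthoANIu", 87.56),
    ("J. sp.", "J. chitinilytica", "fastANI", 86.93),
    ("J. sp.", "J. chitinilytica", "orthoANIu", 85.5),
]


def hits_above_cutoff(records, cutoff=SPECIES_ANI_CUTOFF):
    """Return all comparisons whose ANI reaches the given species cutoff."""
    return [rec for rec in records if rec[3] >= cutoff]


if __name__ == "__main__":
    print(hits_above_cutoff(REPORTED_ANI))               # general 95% boundary
    print(hits_above_cutoff(REPORTED_ANI, cutoff=88.0))  # looser, Variovorax-like cutoff
```

No comparison reaches the 95% boundary, whereas a looser cutoff of 88%, as reported for Variovorax, would be met only by the fastANI value against the type strain.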

3.2.2 Functional annotation and chitinolytic potential
In addition to the TIGRFAM database-directed annotation automatically performed by PGAP (Haft et al., 2012; Li et al., 2021), genomes were functionally categorized based on the COGs of proteins with the eggNOG-mapper (Huerta-Cepas et al., 2017) (Figure 3). In this way, 79% of all genes could be annotated for both strains, while 21% are of unknown function, which is a typical distribution, even for genomes of the well-studied model organism Escherichia coli (Cummins et al., 2022). While the two investigated genomes generally exhibit extremely similar characteristics, the results suggest that J. n. possesses a higher fraction of cell wall biogenesis genes, whereas J. sp. has more genes related to amino acid transport and metabolism. With the chitin-hydrolyzing ability in mind, demonstrated both on colloidal chitin agar plates and in shaking flasks, the genomes were analyzed for CAZymes with dbCAN 3.0 (Zheng et al., 2023) and manually. Specifically, the chitinase-containing (EC 3.2.1.14) families GH18 and GH19, the β-N-acetylhexosaminidase-comprising (EC 3.2.1.52) family GH20, and the lytic polysaccharide monooxygenases (LPMOs) of auxiliary activity family 10 (AA10) were of interest (Drula et al., 2021; Hemsworth et al., 2014, 2015; Henrissat et al., 2023; Mekasha et al., 2017; Slámová et al., 2010). LPMOs or, more specifically, lytic chitin monooxygenases (EC 1.14.99.53) are copper-dependent oxidoreductases that can cleave recalcitrant chitin biomass (Vaaje-Kolstad et al., 2010; Walton & Davies, 2016). Oxidation of the C1 carbon atom at the glycosidic bond, fueled by O2 or H2O2 and a reducing agent, ultimately releases oligosaccharide aldonic acids (Kuusk et al., 2018; Westereng et al., 2017).
Although the majority of research has focused on fungal LPMOs (AA9 and AA11), their bacterial equivalents (AA10) are reported to boost the conventional hydrolytic activity of GH18 on chitin as well (Forsberg et al., 2016; Vaaje-Kolstad et al., 2013). In this way, 21 enzymes possibly involved in chitin degradation were identified in each of the J. n. and J. sp. genomes, comprising 13 GH18 (3 of which possess carbohydrate-binding modules of family 5), 3 GH19, 3 GH20, a single β-N-acetylhexosaminidase, and a single LPMO. Based on its published annotated draft genome (GCA_016865585.1), type strain J. naejangsanensis BIO-TAS4-2T exhibited 21 potentially chitinolytic enzymes, with 13 GH18, 3 GH19, 2 GH20, a single LPMO, and one putative protein annotated as a partial chitinase (Turrini et al., 2021).
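The enzyme inventory reported for the two strains of this study can be tallied as a simple consistency check. This sketch only restates the counts given in the text; the dictionary layout and function name are hypothetical.

```python
# Putative chitin-active enzymes per genome (J. n. and J. sp.), as counted in the text.
STUDY_STRAIN_INVENTORY = {
    "GH18": 13,  # 3 of these carry a family 5 carbohydrate-binding module
    "GH19": 3,
    "GH20": 3,
    "beta-N-acetylhexosaminidase": 1,
    "LPMO (AA10)": 1,
}
GH18_WITH_CBM5 = 3


def family_shares(inventory):
    """Return each family's fraction of the total enzyme count."""
    total = sum(inventory.values())
    return {family: count / total for family, count in inventory.items()}


if __name__ == "__main__":
    print("total enzymes:", sum(STUDY_STRAIN_INVENTORY.values()))
    for family, share in family_shares(STUDY_STRAIN_INVENTORY).items():
        print(f"{family}: {share:.0%}")
```

GH18 enzymes make up well over half of the inventory, in line with their predominance among bacterial chitinases noted in the introduction.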

According to a study from 2016, which compared the chitinolytic systems of aquatic and terrestrial bacteria based on the genomes available at that time, Jeongeupia exhibits an exceptionally rich enzyme toolkit (Bai et al., 2016) that is reminiscent of fungal Trichoderma species (Seidl et al., 2005). To our knowledge, few bacteria, among them Streptomyces coelicolor A3(2) (Saito et al., 2000) and Andreprevotia ripae (Lorentzen et al., 2021), are described with comparable chitinase gene copy numbers.
To compare the chitinolytic systems taxonomically and reveal orthologous enzymes, a CLUSTALW alignment of the translated amino acid sequences was performed, followed by phylogenetic tree generation (Huerta-Cepas, Serra, et al., 2016; Kyoto University Bioinformatics Center, 2023; Thompson et al., 1994) (Figure 4). The results suggest that the chitinolytic enzymes of the three compared Jeongeupia strains are highly conserved, except for one orthologous GH20 unique to the two strains of this study and one single chitinase exclusive to the type strain reference genome. The GH18 enzymes, which comprise the majority of bacterial chitinases, were separated into three distinct clades, one of which could be functionally annotated as chitinase C by the eggNOG-mapper (Huerta-Cepas et al., 2017; Huerta-Cepas, Szklarczyk, et al., 2016). The latter might represent the endo-chitinases, responsible for randomized cleavage along the chitin polysaccharide chain. Despite belonging to the same family, GH18 enzymes differ in sequence and catalytic mechanisms (Hoell et al., 2010), which is reflected by the two separate chitinase A-like branches, identified with the SWISS-MODEL sequence homology database (Studer et al., 2020; Waterhouse et al., 2018). The auxiliary oxidoreductase enzyme LPMO was assigned to its own, distant branch based on sequence homology, in line with its oxygen-driven mechanism, which deviates drastically from that of conventional hydrolase-based chitin-active enzymes (Bissaro et al., 2017; Kuusk et al., 2018). Curiously, GH19 was represented in two separate clades, one seemingly homologous to a GH18 clade with carbohydrate-binding module 5, while the other clade shared more sequence identity with the vastly different GH20 and β-N-acetyl-hexosaminidases.

All enzymes annotated as GH20 or β-N-acetyl-hexosaminidases, responsible for processive exo-chitinase activities, were assigned as descendants of a branch with three distinct clades. Since CLUSTALW is based purely on amino acid sequence alignment, the taxonomic allocation does not necessarily elucidate the function of the individual clades but rather illuminates phylogenetic relationships and evolutionary processes.
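To illustrate why a purely sequence-based tree reflects similarity rather than function, the following minimal Python sketch computes pairwise percent identity and a simple distance (1 − identity) from already aligned sequences, which is the kind of input that distance-based tree building works on. The three aligned sequences and their labels are hypothetical toy data, not enzymes from this study, and the calculation is a simplified stand-in rather than the exact procedure implemented in CLUSTALW.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns in which either sequence has a gap
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment (labels and residues are illustrative only).
aligned = {
    "GH18_a": "MKT-LLAGS",
    "GH18_b": "MKTALLAGS",
    "LPMO_x": "MHG-YVESP",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(aligned.items(), 2):
    identity = percent_identity(seq_a, seq_b)
    distance = 1 - identity / 100  # small distance = closely related sequences
    print(f"{name_a} vs {name_b}: {identity:.1f}% identity, distance {distance:.3f}")
```

In this toy case, the two GH18-like sequences end up at distance 0 and the LPMO-like sequence far from both, mirroring how an enzyme with a different mechanism is placed on a distant branch solely because its sequence diverges.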
3.2.3 Comparison to the J. naejangsanensis BIO-TAS4-2T genome
A whole-genome sequence alignment was conducted with the computational tool progressiveMauve (Darling et al., 2010) (Figure 5). The software workflow includes selecting a reference sequence, followed by gapless multiple alignments of the input sequences, which serve as anchor regions. Subsequently, a phylogenetic guide tree is inferred, which is utilized to progressively apply an algorithm at every internal node, removing small matches that cause rearrangements and negatively affect the anchoring scores. Through an iterative process, progressiveMauve tries to align the sequences to maximize the conserved regions shared among the input sequences (Armstrong et al., 2019).

Figure 5 depicts how the genomes of J. n. and J. sp. are highly conserved but entirely inverted. Orange-shaded segments illustrate loosely conserved regions. Closer inspection of the sequence homologies reveals several regions of low sequence identity within individual LCBs. These regions indicate horizontal gene transfer, where externally acquired genes interrupt otherwise conserved blocks. Furthermore, the whole-genome sequence alignment revealed that the investigated strains share a high sequence homology with J. naejangsanensis BIO-TAS4-2T. However, more genetic rearrangements or inversions of individual LCBs are apparent. One prominent region of approximately 105,000 bp, which is partially nonconserved and partially weakly conserved, distinguishes the Jeongeupia genomes and is indicated with a blue frame.
A circular plot is a helpful tool to visualize large amounts of data clearly and further highlight gaps between the presented genomes (Figure 6). Besides illustrating the obvious advantage of a long-read sequencer, which allows for a gapless assembly of reads into circular bacterial genomes consisting of one contig, the figure conveys all results presented above. In addition, an analysis with PHASTER (Arndt et al., 2016; Zhou et al., 2011) revealed active and inactive phage regions, which are important driving forces of gene flux and microbial evolution (Canchaya et al., 2004; Mavrich & Hatfull, 2017). J. n. and J. sp. exhibit an identical phage region pattern, which might indicate that both strains are the same organism. In contrast, the close relative J. naejangsanensis BIO-TAS4-2T has fewer phage regions overall, with more inactive regions, depicted as gray rather than black stripes.

Circular plotting of the chitin-active gene loci elucidated different arrangements within the respective genomes. All chitin hydrolysis-related genes are clustered tightly in contig 1 of the reference genome, whereas the corresponding genes are distributed more evenly in the J. n./J. sp. genomes, with one GH18 in a particularly remote locus. Nevertheless, both chitinase C-like hydrolases in the genome reside in close proximity as well as two out of three GH19 and GH20 enzymes, respectively, forming small pseudo clusters.
Although the existence of distinct chitin hydrolase clusters might tempt one to assume varying enzymes, the alignment with CLUSTALW and the inferred phylogenetic tree (Kyoto University Bioinformatics Center, 2023; Thompson et al., 1994) (Figure 4) showed that the chitinolytic enzymes are highly conserved among the Jeongeupia genomes but drastically rearranged, most likely through gene flux events, as suggested by the progressiveMauve alignment. Overall, the three genomes differ in merely two genes: J. naejangsanensis BIO-TAS4-2T has an additional chitinase, WP_239000134.1, which the strains of this study lack, whereas J. n. and J. sp. have access to one additional GH20 hexosaminidase, pgaptmp_000306/002118.
Lastly, GC-skew calculation highlighted the over- and underabundance of the nucleotides guanine and cytosine. As a result, the two candidate ORI loci per genome could be identified, which are typically placed at the transition points of nucleotide overrepresentation (Lobry, 1996). Owing to the presence of the replication initiation gene dnaA at one of those two transition regions, the ORI could be located exactly with the DoriC 12.0 tool (Dong et al., 2023; Kosmidis et al., 2020; Trojanowski et al., 2018). Interestingly, the J. naejangsanensis BIO-TAS4-2T genome shows inconsistent regions of GC overabundance in contigs 11 and 16, which could hint at either misassembled regions or gene flux. The corresponding LCBs of J. n. and J. sp., indicated by a color code, are arranged differently and in accordance with the general GC-skew.
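The principle behind the GC-skew analysis can be sketched in a few lines of Python. The example below computes the skew (G − C)/(G + C) in consecutive windows and reports where the cumulative skew reaches its minimum and maximum, which are commonly taken as candidate origin and terminus positions. It is a minimal illustration under stated assumptions: the function names, the window size, and the toy sequence are hypothetical, the cumulative form is one common variant of the calculation, and none of this is claimed to reproduce the tool or settings used for Figure 6.

```python
from itertools import accumulate


def gc_skew(window: str) -> float:
    """Return (G - C) / (G + C) for one window; 0.0 if it contains no G or C."""
    g = window.count("G")
    c = window.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def windowed_skew(seq: str, window: int, step: int) -> list[float]:
    """GC skew for consecutive windows along a linearized sequence."""
    seq = seq.upper()
    return [gc_skew(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, step)]


def skew_extrema(seq: str, window: int, step: int) -> tuple[int, int]:
    """End positions of the windows with minimal and maximal cumulative skew."""
    cumulative = list(accumulate(windowed_skew(seq, window, step)))
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return i_min * step + window, i_max * step + window


# Hypothetical toy sequence: C-rich first half, G-rich second half.
toy = "ACCT" * 500 + "AGGT" * 500
print(skew_extrema(toy, window=500, step=500))  # (2000, 4000)
```

In the toy sequence, the skew switches from negative to positive at position 2000, so the cumulative minimum falls exactly there. For a real circular genome, the two extrema only narrow down the candidate loci, and additional evidence such as the position of dnaA is needed to decide which one is the ORI.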
4 CONCLUSIONS
In this study, soil sample-derived chitinolytic organisms could be enriched through chitin amendment, as demonstrated before (Jacquiod et al., 2013). Sequential screening and isolation on chitin agar plates were followed by 16S rRNA gene PCR-guided identification, according to which the two most promising candidates were identified as J. naejangsanensis BIO-TAS4-2T (DSM 24253) (Turrini et al., 2021; Yoon et al., 2010). Long-read sequencing with Pacific Biosciences' Sequel IIe platform and annotation with NCBI's PGAP provided high-quality genomes of the investigated strains, as confirmed with CheckM and BUSCO (Li et al., 2021; Manni et al., 2021; Parks et al., 2015; Tatusova et al., 2016). Whole-genome alignment revealed horizontal gene transfer and inversions of LCBs in comparison to the type strain (Darling et al., 2010; Arndt et al., 2016). Taxonomic evaluation based on average nucleotide identity (ANI) values yielded inconclusive results (Ciufo et al., 2018; Jain et al., 2018; Yoon et al., 2017). In contrast, the whole-genome alignment-based taxonomic assessment suggested that the two strains investigated in this study represent a novel Jeongeupia species closely related to J. naejangsanensis and J. chitinilytica (Meier-Kolthoff & Göker, 2019). This hypothesis is further supported by the results of the biochemical characterization, which demonstrated distinct differences between the type strain BIO-TAS4-2T and J. n./J. sp. of this study. We, therefore, propose the species name Jeongeupia wiesaeckerbachi, based on the name of the billabong in proximity to the organism's finding site. A thorough in silico query for enzymes involved in the chitinolytic machinery and phylogenetic analysis thereof revealed an extraordinary number of enzymes with a high degree of conservation among the investigated Jeongeupia species (Thompson et al., 1994; Bai et al., 2016; Huerta-Cepas, Serra, et al., 2016).
The novel Jeongeupia species presented in this study might provide a cost-effective and environmentally friendly process to convert crustacean shell and fungal biomass waste into N-acetylglucosamine based on its large set of chitin-active enzymes. Further research must be conducted to demonstrate its suitability as an antimycotic crop protection agent, in a similar fashion to other studies (Neeraja et al., 2010; Swiontek Brzezinska et al., 2014). In addition, chitinases and other chitinoplastic enzymes such as chitin-deacetylases could play significant roles in future circular bioeconomic approaches, where insects, crustacean exoskeletons, or fungal residues are to be valorized in chemoenzymatic processes for applications in the food, chemical, cosmetic, and pharmaceutical industries (Intasian et al., 2021; Triunfo et al., 2022).
AUTHOR CONTRIBUTIONS
Nathanael D. Arnold: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); software (equal); validation (equal); visualization (equal); writing—original draft (equal). Daniel Garbe: Conceptualization (equal); supervision (lead); writing—review and editing (equal). Thomas B. Brück: Conceptualization (equal); funding acquisition (equal); project administration (equal); resources (equal); supervision (equal); writing—review and editing (equal).
ACKNOWLEDGMENTS
The authors would like to thank Zora Rerop and Manfred Ritz for their valuable bioinformatics advice as well as Nadim Ahmad and Kevin Heieck for proofreading. This research was funded by the German Ministry for Education and Research with grant number 031B0838B in the framework of the Canadian/German BMBF bioeconomy international project “ChitoMat.” The Galaxy server that was used for some calculations is in part funded by Collaborative Research Centre 992 Medical Epigenetics (DFG Grant SFB 992/1 2012) and German Federal Ministry of Education and Research (BMBF Grants 031 A538A/A538C RBC, 031L0101B/031L0101C de.NBI-epi, 031L0106 de.STAIR [de.NBI]). Open Access funding enabled and organized by Projekt DEAL.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.
ETHICS STATEMENT
None required.
APPENDIX
See Figures A1-A6 and Table A1.






Tube | Characteristic | 1 | 2 | 3 |
---|---|---|---|---|
1 | Glycerol | − | − | − |
2 | Erythritol | − | − | − |
3 | d-arabinose | − | − | − |
4 | l-arabinose | − | − | − |
5 | d-ribose | + | + | + |
6 | d-xylose | − | − | − |
7 | l-xylose | − | − | − |
8 | d-adonitol | − | − | − |
9 | Methyl-beta-d-xylopyranoside | − | − | − |
10 | d-galactose | − | − | − |
11 | d-glucose | + | + | + |
12 | d-fructose | + | + | + |
13 | d-mannose | + | + | + |
14 | l-sorbose | − | − | − |
15 | l-rhamnose | − | − | − |
16 | Dulcitol | − | − | − |
17 | Inositol | − | − | − |
18 | d-mannitol | − | − | − |
19 | d-sorbitol | − | − | − |
20 | Methyl-alpha-d-mannopyranoside | − | − | − |
21 | Methyl-alpha-d-glucopyranoside | − | − | − |
22 | N-acetylglucosamine | + | + | + |
23 | Amygdalin | − | − | − |
24 | Arbutin | − | − | − |
25 | Esculin ferric citrate | − | (+) | (+) |
26 | Salicin | − | − | − |
27 | d-cellobiose | − | − | − |
28 | d-maltose | − | − | − |
29 | d-lactose (bovine origin) | − | − | − |
30 | d-melibiose | − | − | − |
31 | d-saccharose (sucrose) | − | − | − |
32 | d-trehalose | − | − | − |
33 | Inulin | − | − | − |
34 | d-melezitose | − | − | − |
35 | d-raffinose | − | − | − |
36 | Amidon (starch) | − | − | − |
37 | Glycogen | − | − | − |
38 | Xylitol | + | − | − |
39 | Gentiobiose | − | − | − |
40 | d-turanose | − | − | − |
41 | d-lyxose | + | − | − |
42 | d-tagatose | − | − | − |
43 | d-fucose | − | − | − |
44 | l-fucose | − | − | − |
45 | d-arabitol | − | − | − |
46 | l-arabitol | + | − | − |
47 | Potassium gluconate | + | + | + |
48 | Potassium 2-ketogluconate | − | − | − |
49 | Potassium 5-ketogluconate | − | − | − |
- Note: Differences are highlighted with bold type.
Open Research
DATA AVAILABILITY STATEMENT
All data are provided in full in the results section and the appendix of this paper. All raw datasets generated and/or analyzed during the current study are available in the Zenodo online repository: https://doi.org/10.5281/zenodo.8032359. The J. n. and J. sp. genomes can be accessed at NCBI with the BioSample IDs SAMN35557021 and SAMN35557022, respectively.