Volume 97, Issue 7 e70467
RESEARCH ARTICLE
Open Access

Kaposi Sarcoma-Associated Herpesvirus Sequencing in People Living With HIV in the Southern United States Reveals Subtype Diversity and Multiple Infections

Vickie A. Marshall

Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Sheena M. Knights

Department of Internal Medicine, Division of Infectious Diseases and Geographic Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, USA

Parkland Health, Dallas, Texas, USA

Elena M. Cornejo Castro

Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Nazzarena Labo

Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Isabella Liu

Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Wendell J. Miley

Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Kyle N. Moore

Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Charles A. Goodman

Retroviral Evolution Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Christine M. Fennessey

Retroviral Evolution Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Brandon F. Keele

Retroviral Evolution Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Susana M. Lazarte

Department of Internal Medicine, Division of Infectious Diseases and Geographic Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, USA

Parkland Health, Dallas, Texas, USA

Elizabeth Y. Chiao

Department of General Oncology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA

Ank E. Nijhawan

Department of Internal Medicine, Division of Infectious Diseases and Geographic Medicine, University of Texas Southwestern Medical Center, Dallas, Texas, USA

Parkland Health, Dallas, Texas, USA

Denise Whitby

Corresponding Author

Viral Oncology Section, AIDS and Cancer Virus Program, Frederick National Laboratory for Cancer Research, Frederick, Maryland, USA

Correspondence: Denise Whitby ([email protected])

First published: 05 July 2025

*Institute at which the work was performed.

ABSTRACT

Kaposi sarcoma-associated herpesvirus (KSHV) is the causative agent of Kaposi's sarcoma (KS) and lymphoproliferative diseases collectively identified as KSHV-associated diseases (KAD). While KAD incidence has decreased across the United States, regional and population-based variability exists, with higher rates in southern states. To understand the molecular epidemiology of KSHV in this region, samples were collected from people living with HIV (PWH) with or without a history of KAD. PWH, mainly men who have sex with men (MSM), were recruited from a large, urban hospital system in Dallas, Texas, in two separate studies. The studies included 220 individuals without KAD and 59 patients with KAD. Whole blood and/or oral fluids were collected and tested by qPCR. KSHV subtypes were determined for 66 of 85 individuals with detectable KSHV loads by a combination of next-generation and targeted Sanger sequencing. All major KSHV subtypes except D were observed, including subtypes E and F. Multiple KSHV genome variants were identified in each of three individuals. This study highlights KSHV subtype diversity in the southern United States, an area with a high KS incidence. Genome diversity and multiple infections merit epidemiological consideration, including for the future development of vaccines.

1 Introduction

The incidence of Kaposi sarcoma-associated herpesvirus (KSHV)-associated diseases (KAD), especially Kaposi's sarcoma (KS), has decreased significantly in people living with HIV (PWH) in the United States (US) since the introduction of potent combination anti-retroviral therapies (ART). The decline, however, has not been uniform in all regions of the country. Notably, recent cancer registry data show that the incidence of KS in PWH increased among Black men in the southern US, which may be due to geographical, age, and racial disparities that affect access to health care [1-4]. A nationwide survey using 2000–2013 data from the Surveillance, Epidemiology, and End Results (SEER) database showed an increase in KS diagnoses, with an associated trend of higher mortality due to KS, in Black men when compared to other ethnicities [5]. More recently, SEER data through 2021 indicate a slight decline in KS incidence in Black and Hispanic men; however, the rates remain twice those observed in non-Hispanic white men. In fact, despite observed declines across the United States, the overall trend of KS incidence across all ethnicities in the United States South remains stable and above the levels seen before the AIDS epidemic (https://seer.cancer.gov/statistics-network/explorer/) (evaluated 6/28/24).

Few molecular epidemiology studies of KSHV have been conducted in the United States since the early 2000s, and the distribution of viral subtypes is essentially unknown for most communities [6-10]. It is also possible that KSHV subtypes in circulation today differ from those reported in the 1990s, as recently speculated [11]. The worldwide distribution of KSHV subtypes as defined by the KSHV K1 gene sequence is regional, with some subtypes predominating within certain geographic areas or within ethnic groups [6, 12]. KSHV K1 subtypes A and C are commonly reported in Europe and Asia as well as in regions that were colonized by peoples from Europe, including North America, South America, and Australia [13, 14]. Subtypes B and A5 are more common in populations living in Africa or of African origin [7]. The less common subtypes D, E, and F were first observed in indigenous populations in the Pacific, South America, and South Africa, respectively. The KSHV K15 gene is also used for subtype analysis and is considered allelic, with three recognized subtypes: P, M, and N. The K15 P and M subtypes have been reported worldwide, while the N subtype has only been found in people born in Africa based upon currently available data [15, 16].

Within the population of PWH, particularly men who have sex with men (MSM), KSHV coinfection is very prevalent. The KSHV seroprevalence in a population in Dallas, Texas was recently estimated to be 68%. It was also determined that behavioral risk factors, but not race or ethnicity, were associated with KSHV seropositivity [17]. KSHV shedding in oral fluids, which is a driver of transmission, was quite high in the same study population. Taking advantage of the high KSHV loads observed in oral fluids, as well as detectable KSHV DNA in whole blood, this study was initiated to identify the KSHV K1 subtypes in this population.

2 Materials and Methods

2.1 Characteristics of Study Population

Samples from 280 individuals from two separate studies conducted at Parkland Health's outpatient HIV clinic in Dallas, Texas were sequenced. The first study was a cross-sectional cohort investigating KSHV seroprevalence which included 206 individuals with no history of KAD and 14 participants with KS, as previously reported [17]. Participants were recruited between January 2020 and September 2021. Sera were collected for serological assays while whole blood and oral fluids were collected for molecular studies.

The second study enrolled 60 PWH with a history of KS from September 2022 until April 2023. Participants provided oral fluid samples for KSHV viral load (VL) quantitation and sequencing to determine viral subtypes. Both studies were approved by the University of Texas Southwestern Institutional Review Board (STU 2019-1204 and STU 2022-0355) and all participants provided informed consent at enrollment according to the Declaration of Helsinki. The study design is summarized in Figure 1.

Figure 1. Study design. Flowchart summarizing the study design. PWH, people living with HIV; KSHV, Kaposi sarcoma-associated herpesvirus; OF, oral fluid; WB, whole blood; NGS, next-generation sequencing. *KSHV subtype total includes three mixed infections.

2.2 DNA Extraction and KSHV qPCR Assays

Whole blood was collected from study participants utilizing Qiagen PAXgene vacutainer tubes (Qiagen, Hilden, Germany), while oral fluids were collected in mouthwash. Samples were processed and DNA extracted as previously reported [17]. KSHV VL was measured in the extracted DNA using qPCR assays targeting the human endogenous retrovirus 3 gene (ERV-3), used as a cell quantitation marker, and the KSHV K6 gene region [18, 19].
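
As a hedged illustration of the normalization described above, the sketch below converts raw qPCR copy numbers into KSHV copies per million cell equivalents. It assumes ERV-3 is present at two copies per diploid cell, so that cell equivalents equal ERV-3 copies divided by two. The function name and the example numbers are hypothetical and are not taken from the study.

```python
def kshv_copies_per_million_cells(kshv_copies: float, erv3_copies: float,
                                  erv3_per_cell: float = 2.0) -> float:
    """Normalize a KSHV K6 copy number to one million cell equivalents.

    Assumption: ERV-3 is present at `erv3_per_cell` copies per diploid cell,
    so cell equivalents = ERV-3 copies / erv3_per_cell.
    """
    if erv3_copies <= 0:
        raise ValueError("ERV-3 copies must be positive to estimate cell number")
    cell_equivalents = erv3_copies / erv3_per_cell
    return kshv_copies * 1_000_000 / cell_equivalents


# Hypothetical reaction: 150 KSHV copies and 20,000 ERV-3 copies (10,000 cells).
load = kshv_copies_per_million_cells(150, 20_000)
print(f"{load:.0f} KSHV copies per million cell equivalents")
```

Reporting the load per million cell equivalents, rather than per reaction, makes samples with different amounts of input DNA comparable.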

2.3 Library Preparation and NGS Sequencing

Genomic DNA was fragmented using a Covaris focused-ultrasonicator (Covaris, Woburn, MA, USA). Sequence libraries were generated using two different target enrichment kits: Agilent SureSelect XT (Agilent, Santa Clara, CA) [10, 20, 21] and KAPA HyperCap (KAPA Biosystems Inc, Wilmington, MA). The genomic DNA input ranged from 200 ng to 3 µg for SureSelect and was 100 ng for HyperCap. Both library preparation kits contain KSHV-specific bait sets based on representative KSHV genomes of all K1 and K15 subtypes.

Samples were sequenced on Illumina MiSeq and NextSeq 2000 instruments (Illumina, Hayward, CA), with either library preparation method generating 250 bp or 150 bp paired-end reads. Some samples were sequenced with both methods: four (UTSW101, UTSW113, UTSW139, and UTSW141) to resolve poor KSHV coverage due to interference from a high Epstein-Barr virus (EBV) load, and one (UTSW107) to confirm the multiple KSHV infections observed with SureSelect XT (Supporting Information Table S1).

2.4 Viral Genome Assemblies

Near full-length KSHV genomes were generated by combining reference-guided alignment against the KSHV reference genome NC_009333.1 (GK18) with a de novo assembly approach [9, 10]. The internal repetitive regions (NC_009333.1:g.24230-25045, 29927-30055, 118229-113914, 124784-126456, and 137169-137969) and the large terminal repeats between the KSHV K1 and K15 genes were masked in the final alignments as they cannot be computationally resolved. Samples with read depth coverage less than 30X or with high percentages of unresolved base calls were not assembled (Supporting Information Table S1). All assembled genomes were manually curated to confirm sequence variants and to resolve areas of high variability, including the K1, vIRF-2, and K15 gene regions. Variable gene regions were manually examined for evidence of multiple infections, observed as overlapping reads with distinctive polymorphisms. Additionally, de novo K1 and K15 gene-specific subassemblies were used to distinguish KSHV subtypes by mapping overlapping reads in these highly variable regions [10]. The curated sequence alignments were exported from Geneious (Geneious Prime 2022.0.2) as consensus FASTA files for submission to GenBank and further downstream analyses [22].
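
Two of the rules above, the 30X read-depth cutoff and the masking of repeat regions, can be sketched in a few lines. This is a minimal illustration only and not the authors' pipeline, which relied on Geneious and manual curation. The sample names, depths, and toy coordinates are hypothetical.

```python
from typing import Dict, List, Tuple

MIN_DEPTH = 30  # read-depth threshold stated in the text


def samples_to_assemble(mean_depth: Dict[str, float]) -> List[str]:
    """Return sample IDs whose mean read depth meets the threshold."""
    return sorted(s for s, d in mean_depth.items() if d >= MIN_DEPTH)


def mask_regions(seq: str, regions: List[Tuple[int, int]]) -> str:
    """Replace 1-based inclusive intervals with 'N' so they are ignored downstream."""
    bases = list(seq)
    for start, end in regions:
        if start > end:
            raise ValueError(f"interval {start}-{end} is reversed")
        for i in range(start - 1, min(end, len(bases))):
            bases[i] = "N"
    return "".join(bases)


# Hypothetical depths and a toy sequence; not data from the study.
print(samples_to_assemble({"S1": 38.0, "S2": 12.5, "S3": 5622.0}))
print(mask_regions("ACGTACGTAC", [(3, 5), (9, 10)]))
```

Masking with `N` keeps the coordinate system of the reference intact, so variant positions reported elsewhere remain comparable.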

2.5 Sanger Sequencing of the K1 Gene

For samples with KSHV VL estimates determined to be suboptimal for NGS, KSHV K1 subtype was identified by Sanger sequencing of nested-PCR products as previously reported. Briefly, the outer nested primers used were ATGTTCCTGTATGTTGTCTGC (outer forward) and AGTACCAATCCACTGGTTGCG (outer reverse) followed by inner nested primers GTCTGCAGTCTGGCGGTTTGC (inner forward) and CTGGTTGCGTATAGTCTTCCG (inner reverse). The PCR cycling conditions for both rounds of PCR were similar, consisting of 1 min 45 s at 95°C and 35 cycles of 1 min at 96°C, 45 s at 51°C (outer nest) and 58°C (inner nest), and 1 min at 72°C. The inner nested procedure used 5 µl of first round product and both rounds ended with a 5-min hold at 72°C. The final K1 sequence product size was 840 base pairs [23, 24]. All Sanger sequencing was performed using an Applied Biosystems 3130XL genetic sequencer (Thermo Fisher Scientific). The sequencing of each sample was performed multiple times and a minimum of 4 overlapping reads were used to assemble the K1 gene sequences [9].
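
As a quick sanity check on the protocol above, the following sketch reports the length and GC content of the four primers and adds up the programmed hold times for one round of PCR. The primer sequences and cycling parameters are copied from the text. The helper names are hypothetical, and ramp times between steps are ignored, so the real run time would be longer.

```python
PRIMERS = {
    "outer_forward": "ATGTTCCTGTATGTTGTCTGC",
    "outer_reverse": "AGTACCAATCCACTGGTTGCG",
    "inner_forward": "GTCTGCAGTCTGGCGGTTTGC",
    "inner_reverse": "CTGGTTGCGTATAGTCTTCCG",
}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a primer."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def program_seconds(anneal_s: int = 45, cycles: int = 35) -> int:
    """Sum of programmed hold times for one PCR round, in seconds."""
    initial = 105                   # 1 min 45 s at 95°C
    per_cycle = 60 + anneal_s + 60  # denaturation, annealing, extension
    final_hold = 300                # 5 min at 72°C
    return initial + cycles * per_cycle + final_hold


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_fraction(seq):.0%}")
print(f"Hold time per round: {program_seconds() / 60:.0f} min (ramp times ignored)")
```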

2.6 Phylogenetic Analyses

Three samples, UTSW107, UTSW595, and UTSW601, had unresolvable genomes indicating infections with multiple KSHV genomes and were consequently excluded from phylogenetic analysis. The remaining 22 new near full-length consensus sequences were aligned with 35 published KSHV genomes featuring all available K1 and K15 gene subtypes, using MAFFT v7.511 with default settings before importation into Geneious (Geneious Prime 2022.0.2) [22]. The internal repeat regions were masked in the alignment, and the remaining nucleotide bases were used for phylogenetic analysis. A SplitsTree v4.15.1 analysis was performed using the masked alignment [25]. The Neighbor-Net method with default settings of 1000 bootstrap replicates was used to construct a phylogenetic tree [26].

KSHV K1 gene subtypes were identified using a combination of BLAST similarity searches [27] followed by confirmation with phylogenetic tree analysis. The K1 gene sequences were translated to amino acid sequences and an alignment was made in Geneious using the MAFFT module (v1.5.0) with default settings (Geneious Prime 2022.0.2) [22]. Individual K1 sequences in four samples with multiple KSHV infections were obtained using the K1 de novo subassembly for inclusion in the final alignment. The resulting amino acid alignments, including 63 publicly available KSHV subtype-specific references, were used to infer phylogenetic trees via the neighbor-joining method using IQTree version 2.2.0.5 [28] with default settings. The IQTree output file was visualized in FigTree version v1.4.4 (https://tree.bio.ed.ac.uk/software/figtree).
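
Distance-based tree methods such as neighbor joining start from pairwise distances between aligned sequences. The sketch below computes a simple p-distance (the proportion of differing sites, skipping gapped columns) on a toy amino acid alignment. It illustrates the general idea only and is not the authors' workflow, which used MAFFT and IQTree. The sequence names and residues are hypothetical.

```python
from itertools import combinations
from typing import Dict


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gaps."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


# Toy alignment; names and residues are made up for illustration.
alignment: Dict[str, str] = {
    "ref_A": "MFLYVVCSLA",
    "ref_C": "MFLYVVCSLT",
    "sample": "MFLY-VCSLT",
}
for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
    print(f"{n1} vs {n2}: {p_distance(s1, s2):.3f}")
```

In this toy case the sample is closest to `ref_C`, which mirrors how a query K1 sequence would be assigned to the subtype of its nearest references.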

2.7 Analysis of Co-Infections by Multiple KSHV Genomes

Dot plots mapping the variant nucleotide positions across the KSHV genomes for the three multiple infection samples, UTSW595, UTSW601, and UTSW107, were generated using previously published methods [10]. Visual examples of reads constituting mixed infections were illustrated for each genome using the K1, ORF25, and ORF47 gene regions, respectively, generated by the reference-guided assembly pipeline and exported as screenshots from Geneious (Geneious Prime 2022.0.2) [22].
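
The idea behind such dot plots can be sketched as follows: for each genome position, compute the fraction of reads that do not carry the majority base and keep positions where that fraction is high enough to suggest a second genome. This is a minimal sketch and not the published method. The per-position base counts, the 5% frequency cutoff used as a default, and the minimum depth are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def minor_variant_frequency(counts: Dict[str, int]) -> float:
    """Fraction of reads at one position that do not carry the majority base."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return (total - max(counts.values())) / total


def mixed_positions(pileup: Dict[int, Dict[str, int]],
                    min_freq: float = 0.05,
                    min_depth: int = 30) -> List[Tuple[int, float]]:
    """Return (position, minor frequency) pairs that pass both thresholds."""
    hits = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        freq = minor_variant_frequency(counts)
        if depth >= min_depth and freq >= min_freq:
            hits.append((pos, round(freq, 3)))
    return hits


# Hypothetical base counts per position; not data from the study.
toy_pileup = {
    100: {"A": 90, "G": 10},
    250: {"C": 60, "T": 40},
    400: {"G": 200},
    500: {"A": 9, "T": 1},  # too shallow to call
}
print(mixed_positions(toy_pileup))
```

Plotting the returned frequencies against genome position gives the kind of dot plot described above, with near-equal mixtures clustering around 0.5 and minority genomes appearing close to the cutoff.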

3 Results

3.1 KSHV DNA Load Measurement

The VLs of samples collected during the 2020–2021 recruitment period were previously reported for the 206 individuals without history of KAD [17]. In the current study, 14 individuals from the earlier study and an additional 59 people with history of KAD recruited more recently were tested by qPCR. KSHV DNA was detected by qPCR in 30% (85 of 280) of study participants in either whole blood or oral fluids (Table 1).

Table 1. Summary of characteristics for participants used for KSHV subtype analysis. KSHV subtype determinations were attempted with all 85 samples with detectable KSHV DNA by qPCR as listed. KSHV K1 subtypes were successfully determined for 66 DNA samples from study participants using a combination of next-generation (blue) and Sanger (black) sequencing methods. Samples with evidence of multiple KSHV infections are noted with the subtype with the highest read depth coverage listed first. Input KSHV copy estimates for samples sequenced with both SureSelect XT and KAPA HyperCap are included. KSHV VL estimated as < 3 copies are designated qualitative positive (QP) as these values are below the quantitation cutoff of the qPCR assays. Qualitative values are not calculated as KSHV copies per million cells.
| Individual ID | Country born | Race/ethnicity | KAD | Age | Material | HIV status | KSHV copies per million cell equivalents | KSHV subtype (K1/K15 genes) |
|---|---|---|---|---|---|---|---|---|
| UTSW500 | USA | Black | | 23 | OF | Pos | 60 000 | A1/P |
| UTSW523 | USA | Black | | 41 | OF | Pos | QP | ND |
| UTSW535 | USA | Black | | 47 | OF | Pos | 19 000 | A5/P |
| UTSW540 | USA | Black | | 35 | OF | Pos | 56 250 | ND |
| UTSW546 | USA | White | | 51 | OF | Pos | 2600 | A4 |
| UTSW547 | USA | Black | | 32 | OF | Pos | 280 850 | A4/M |
| UTSW548 | Colombia | Hispanic | | 38 | OF | Pos | 3000 | A4 |
| UTSW556 | USA | Black | KS | 39 | OF | Pos | 4750 | C7 |
| UTSW557 | El Salvador | Hispanic | | 33 | OF | Pos | 126 300 | A4/P |
| UTSW559 | USA | White | | 40 | OF | Pos | 1 354 150 | A4 |
| UTSW560 | Mexico | Hispanic | | 57 | OF | Pos | 57 895 | E2/M |
| UTSW562 | USA | Black | | 57 | OF | Pos | 8000 | C3 |
| UTSW564 | USA | Black | KS | 32 | OF | Pos | 1100 | C3 |
| UTSW567 | USA | Other | | 37 | WB | Pos | QP | ND |
| UTSW568 | Mexico | Hispanic | KS | 44 | WB | Pos | QP | C3 |
| UTSW569 | USA | Hispanic | | 49 | WB | Pos | < 3 | A3 |
| UTSW571 | USA | White | | 44 | OF | Pos | 600 000 | B1/M |
| UTSW575 | USA | Hispanic | | 47 | OF | Pos | 1 314 250 | A4/P |
| UTSW578 | Mexico | Hispanic | | 54 | WB | Pos | QP | ND |
| UTSW583 | USA | Black | | 58 | OF | Pos | 215 385 | A4 |
| UTSW586 | Mexico | Hispanic | KS | 38 | WB | Pos | 255 | C7 |
| UTSW592 | USA | Hispanic | | 32 | OF | Pos | 4440 | A4 |
| UTSW595 | USA | White | KS | 62 | OF | Pos | 1 000 000 | F2, C2/M |
| UTSW598 | USA | Black | | 27 | WB | Pos | QP | ND |
| UTSW601 | USA | Black | | 36 | OF | Pos | 380 000 | C1, C7/P |
| UTSW603 | USA | Hispanic | | 31 | OF | Pos | QP | C3 |
| UTSW604 | USA | Black | KS | 34 | OF | Pos | QP | A4 |
| UTSW605 | USA | Black | KS | 40 | OF | Pos | 91 650 | C7/P |
| UTSW607 | USA | Black | | 56 | OF | Pos | 33 735 | A2/M |
| UTSW613 | Colombia | Hispanic | | 30 | WB | Pos | QP | A3 |
| UTSW615 | USA | Hispanic | | 24 | OF | Pos | 14 615 | F2 |
| UTSW617 | USA | Black | | 34 | OF | Pos | 375 | A4 |
| UTSW620 | USA | Black | | 55 | WB | Pos | 415 | ND |
| UTSW621 | USA | Hispanic | | 53 | OF | Pos | 560 000 | A2 |
| UTSW622 | USA | Black | | 51 | OF | Pos | 2 800 000 | C1 |
| UTSW623 | USA | Asian | | 57 | OF | Pos | 17 330 | C3 |
| UTSW624 | USA | White | | 70 | OF | Pos | 68 085 | F2/M |
| UTSW627 | USA | White | | 45 | OF | Pos | 1635 | C3 |
| UTSW628 | USA | Hispanic | | 37 | OF | Pos | 2025 | C3 |
| UTSW631 | USA | Black | | 31 | WB | Pos | 435 | A1 |
| UTSW637 | USA | Black | | 19 | WB | Pos | QP | ND |
| UTSW645 | USA | Black | | 32 | OF | Pos | 13 000 | A5 |
| UTSW646 | USA | White | | 59 | OF | Pos | 916 665 | C1/M |
| UTSW651 | USA | White | | 52 | WB | Pos | 680 | ND |
| UTSW652 | Mexico | Hispanic | | 61 | WB | Pos | QP | ND |
| UTSW657 | Mexico | Hispanic | | 27 | OF | Pos | 4 558 140 | E2/M |
| UTSW662 | USA | White | | 53 | OF | Pos | 46 510 | A4 |
| UTSW666 | USA | Black | | 28 | OF | Pos | 139 620 | ND |
| UTSW668 | USA | White | | 51 | OF | Pos | QP | ND |
| UTSW670 | Honduras | Hispanic | | 26 | WB | Pos | QP | ND |
| UTSW675 | USA | Black | | 23 | OF | Pos | QP | A4 |
| UTSW676 | USA | Hispanic | | 51 | OF | Pos | 6665 | C3 |
| UTSW677 | Mexico | Hispanic | | 60 | WB | Pos | 180 | A4 |
| UTSW679 | USA | White | | 55 | OF | Pos | 638 300 | C1 |
| UTSW683 | El Salvador | Hispanic | | 37 | OF | Pos | 57 140 | A5 |
| UTSW689 | USA | White | | 47 | OF | Pos | 14 440 | A1 |
| UTSW692 | USA | Black | | 35 | WB | Pos | QP | ND |
| UTSW702 | USA | Black | | 40 | OF | Pos | 23 635 | C1 |
| UTSW704 | USA | White | | 44 | OF | Pos | 7855 | A3 |
| UTSW706 | Mexico | Hispanic | | 52 | WB | Pos | QP | ND |
| UTSW707 | USA | Hispanic | | 27 | OF | Pos | 1 259 260 | A4 |
| UTSW708 | Honduras | Hispanic | | 48 | WB | Pos | QP | ND |
| UTSW709 | USA | Black | | 32 | OF | Pos | QP | A4 |
| UTSW712 | USA | Black | | 35 | OF | Pos | 475 | C3 |
| UTSW714 | Mexico | Hispanic | | 56 | WB | Pos | QP | ND |
| UTSW717 | USA | Black | | 49 | WB | Pos | QP | ND |
| UTSW718 | Mexico | Hispanic | | 38 | WB | Pos | QP | ND |
| UTSW101 | USA | Hispanic | KS | 28 | OF | Pos | 10 910 | A1/P |
| UTSW105 | Mexico | Hispanic | KS | 43 | OF | Pos | 705 880 | A4/P |
| UTSW107 | USA | Black | KS | 45 | OF | Pos | 7825 | C3, B1/M, P |
| UTSW109 | Mexico | Hispanic | KS, MCD | 41 | OF | Pos | QP | A4 |
| UTSW111 | USA | Hispanic | KS | 32 | OF | Pos | 250 | A4 |
| UTSW113 | USA | Hispanic | KS | 43 | OF | Pos | 8380 | F2/M |
| UTSW118 | USA | White | KS | 62 | OF | Pos | 21 110 | F2/M |
| UTSW119 | USA | White | KS | 55 | OF | Pos | 947 365 | A4/P |
| UTSW124 | Democratic Republic of Congo | Black | KS | 54 | OF | Pos | 700 | B1 |
| UTSW125 | USA | Black | KS | 41 | OF | Pos | 4365 | C3 |
| UTSW130 | USA | White | KS | 64 | OF | Pos | 31 665 | F2/M |
| UTSW132 | USA | Hispanic | KS & MCD | 32 | OF | Pos | 123 635 | C3 |
| UTSW136 | Cameroon | Black | KS | 69 | OF | Neg | 4910 | B1, A4/P |
| UTSW137 | USA | Hispanic | KS | 56 | OF | Pos | 40 870 | A5/P |
| UTSW138 | Mexico | Hispanic | KS | 53 | OF | Pos | 330 | A4 |
| UTSW139 | USA | Black | KS | 42 | OF | Pos | 3220 | C7/P |
| UTSW141 | USA | Hispanic | KS | 43 | OF | Pos | 105 880 | A4/B1 |
| UTSW144 | Mexico | Hispanic | KS | 47 | OF | Pos | 49 230 | A1/M |
Abbreviations: KS, Kaposi's sarcoma; MCD, multicentric Castleman disease; OF, oral fluids; WB, whole blood; ND, not determined; QP, qualitative positive.

3.2 Patient Characteristics

The characteristics of individuals with detectable KSHV DNA assessed for sequencing are shown in Table 1. The participants were men living with HIV with a median age of 43 (range 19–70 years), and 38% had histories of KAD. The median HIV load in the entire cohort was 24 copies/ml with an interquartile range of 0 to 66. All participants were prescribed ART; however, adherence was not assessed in this study. A total of 22 individuals from both studies had active KS at the time of enrollment. The population was predominantly Hispanic (41%) and Black (36%) with 22% White participants. KSHV K1 A and C subtypes predominate in the study (82%).

3.3 Newly Sequenced Near Full-Length KSHV Genomes and K1 Gene Subtyping

Fifty-nine of the 85 samples with detectable KSHV DNA by qPCR were Sanger sequenced to determine the K1 gene subtype, which was successful in 41 participants, as summarized in Table 1 and shown in Figure 2. A wide variety of KSHV K1 subtypes was observed, including A1, A2, A3, A4, A5, B1, C1, and C3, as well as the less frequently observed subtypes C7, E2, and F2. The sequences are available in GenBank under accession numbers PP789823-PP789863 and PP952419-PP952426.

Figure 2. KSHV K1 gene subtype determinations. A neighbor-joining phylogenetic tree analysis of the KSHV K1 gene amino acid sequence using IQTree. The analysis was performed using 37 K1 sequences obtained by Sanger (highlighted in blue) and an additional 6 K1 sequences resolved from mixed infections with de novo assembly obtained in the current study (highlighted in red). KSHV K1 subtype determinations were made by comparison to publicly available sequences shown in gray which included: GK18 (NC_009333), BC-1 (U75698.1), BCBL-1 (HQ404500.1), JSC-1 (MK143395.1), BCBL-1 (MT936340.1), K1-27/Ban (AF178796), K1-34/E40 (AF178801), K1-20/Gon (AF178789), K1-21/Gbo (AF178790), Ug374 (AF130289.1), Bot3 (AY329023.1), IER5 (AF130279.1), BCBL-B (AF133039.1), Ife10 (AF130283.1), Ug111 (AF130288.1), BCP-1 (AY787132.1), HKS22 (MH632216.1), Gab135NY (MT900822.1), C64 (AY850983.1), Cub-9/07 (FJ986135.1), Cub-127/07 (FJ986137.1), WAGU128 (AY940426.1), K1-12 Mou (AF178783.1), Ug81 (AF130291.1), P030 (MK876732.1), P075 (MK876735.1), P072 (MK876734.1), SPEL (AP017458), FNL0045 (OR829365), K1-52/Ali (AF178818), BR66 (KT215124.1), Ukma8 (AF130304.1), IER8 (AF130281.1), Ic9 (AF130273.1), Tupi 1 (AF220292.1), Tupi 2 (AF220293.1), Hua1 (AY329026.1), Hua3 (AY329026.1), Sio1 (AY329025.1), Sio2 (AY329024.1), AU1 (AF151687.1), ZKS3 (AF133044.1), TKS10 (MK176598.1), J25 (AF278845.1), J26 (AF278846.1), BR33 (KT215106.1), UKma24 (AF130301.1), Iap3 (AF130271.1), BCBL-R (AF133038.1), BCP-1 (AY787132.1), Ug52 (AF130290.1), P044 (MK876733.1), BC-2 (AF133042.1), BC-3 (MK876731.1), FNL0068 (OR829383), FNL0071 (OR829386), FNL0089 (OR829401), FNL0032 (OR829354), FNL0062 (OR829378), and FNL0051 (OR829371).

Based on the KSHV load measured by qPCR, oral fluid DNA samples from twenty-six individuals were selected for NGS. Near full-length genomes were successfully obtained from 22 samples at greater than 30X (range 38X-5622X) read depth coverage (Supporting Information Table S1). One sample, UTSW124, with an estimated input of 17 KSHV copies/100 ng DNA, was sequenced by KAPA HyperPrep during optimization of the protocol to inform the lower limit of KSHV load required for full genome coverage. UTSW124 was not included in the full genome analysis, but its K1 gene sequence was included in the KSHV subtype analysis. The twenty-two new near full-length KSHV genomes were analyzed using SplitsTree v4.15.1 (Figure 3) and are available in GenBank as accessions PP768312-PP768333 [26].

Figure 3. Near full-length SplitsTree analysis of KSHV genome. Splits network of 22 current (highlighted in blue) and 35 publicly available KSHV full and partial genomes (shown in gray) created using SplitsTree v4.15.1. K1 gene subtypes are indicated by black circles and colorized individual branches. A total of 139,207 nucleotide positions were used in the analysis with all repetitive regions excluded. Reference KSHV genomes used in the analysis include: GK18 (NC_009333.1), BC-1 (U75698.1), Japan1 (LC200589.1), Miyako 1 (LC200586.1), BCBL-1 (HQ404500.1), BrK.219 (KF588566.1), P044 (MK876733.1), P100 (MK876737.1), P133 (MK876738.1), P030 (MK876732.1), P075 (MK876735.1), P076 (MK876736.1), P072 (MK876734.1), UNC_KICS009 (MK733606.1), ZM091 (KT271455.1), ZM095 (KT271456.1), ZM128 (KT271467.1), ZM121 (KT271464), UG145 (SAMEA103926607), UG151 (SAMEA103926625), UG160 (SAMEA103926554), FNL0060_EAF (OR829377), FNL0064_MAF (OR829380), FNL0062_SA (OR829378), FNL0072_NA (OR829387), FNL003 (MN419220), FNL0021_NA (OR829344), FNL0039_EAF (OR829361), FNL0026_WAF (OR829348), FNL0038_CAR (OR829360), FNL0053_NA (OR829373), FNL0081_NA (OR829395), FNL0075_CM (OL829860.1), FNL0284_CM (OL829863.1) and FNL0175_CM (OL829891.1).

The individual KSHV genomes for samples UTSW595, UTSW601, and UTSW107 were unresolvable and had clear evidence of more than one KSHV genome. This is shown by subtype specific variations within variable gene regions spanning across the viral genome (Figure 4). Infection with multiple lineages was confirmed for UTSW107 using two independent library preparations. To visualize the mixed infections across the KSHV genome, positions of nucleotide polymorphisms identifying the mixed infections are plotted based on frequency of occurrence (Figure 4A). Minority genomes constituting less than 5% of reads are more easily detectable in highly variable gene regions like K1, as demonstrated for UTSW595, while multiple genomes with coequal frequencies are distinguishable across the viral genome, as shown in UTSW601 (Figure 4A).

Figure 4. Evidence of multiple KSHV infections. (A) Dot plots indicating single nucleotide variable positions are mapped across the KSHV genomes for three samples, UTSW595, UTSW601, and UTSW107, generated by mapping reads to a sample-specific consensus sequence [10]. The KSHV reference genome, GK18 (NC_009333.1), which is a K15 P subtype, is used to illustrate the polymorphic nucleotide positions in study sequences; therefore, any genome that does not have a K15 P subtype is not plotted beyond the ORF75 gene, preventing comparisons in that region. (B) Reference-guided alignments visualized in Geneious Prime 2022.0.2 showing variable nucleotide positions. Examples are provided for the K1 (UTSW595), ORF25 (UTSW601), and ORF47 (UTSW107) genes from each sample to demonstrate that subtype-specific reads can be visualized across the viral genome in cases of multiple infections if both genomes are present at > 5% of total read depth.

A wide range of KSHV K1 subtypes was identified in the new near full-length KSHV genomes, including E2 and F2, for which limited information is available (Figure 3). K15 gene subtypes P and M were observed, while the N subtype, more commonly seen in people born in Africa, was not present in the data set. Both individuals with KSHV K1 E2 subtypes were born in Mexico. Both E genomes were K15 M subtypes; in contrast, the only other E genome currently available is from South America and has a K15 P allele (Figure 3). All participants with K1 F2 subtypes were born in the United States.

3.4 KSHV Gene Variations

Inspection of the genomes confirmed KSHV K1 subtype-specific indels and structural features recently described [10], including a sequence inversion between ORF8 and ORF9 observed in sequences with an F2 K1 subtype, namely UTSW113, UTSW118, UTSW130, and UTSW624. The F2 subtype samples also have a two amino acid deletion in the ORF64 CDS shared with K1 A and C subtype samples UTSW535 and UTSW646. A K3 amino acid insertion was previously reported in samples of persons not born in Africa. This same K3 insertion was observed in K1 A subtype samples UTSW101, UTSW105, UTSW144, UTSW500, UTSW557, UTSW575, and UTSW607. A previously reported subtype-specific sequence inversion between ORF9-10 was observed in K1 B1 subtype sample UTSW547. Interestingly, the two K1 E2 genomes UTSW560 and UTSW657 share a distinctive 303 bp deletion between miR-K12-6 and miR-K12-5 with the only other published E2 sequence available, FNL0062 [10]. Additional observed structural features are summarized in Table 2.

Table 2. Summary of indels associated with KSHV K1 subtypes. All structural features listed were confirmed with reads spanning each indel. The average genome coverage is listed in supporting information Table S1 and ranges from 38 to 5422X. A minimum of 25 reads spanning the indels were required for validation.
| Gene/Function | Indel description | Reference position | UTSW sequences with indel |
|---|---|---|---|
| ORF8-9 intron | 26 bp inversion | NC_009333.1:g.11277_11302inv | 113, 118, 130, 595, 624 |
| ORF9-10 intron | 80 bp inversion | NC_009333.1:g.14374_14465inv | 547 |
| ORF9-10 intron | 4 bp deletion | NC_009333.1:g.14393_14396del | 113, 118, 130, 137, 141, 547, 560, 571, 624, 657 |
| K3/E3 ubiquitin ligase; downregulation of MHC-I | 11 amino acid insertion QDGPAAGAPGN | NC_009333.1:g.18933_18934insGGAGCTGCCCCCGCGGGGCCATTTTGGTCGCCT | 101, 105, 144, 500, 557, 575, 607 |
| ORF45/virion phosphoprotein; inhibition of IRF-7 | Series of polymorphisms: A67D, D69E, P71L, 105insD, H127N | NC_009333.1:g.68297 G > T (H127N); NC_009333.1:g.68395_68396insGTC (D insertion); NC_009333.1:g.68464 G > A (P71L); NC_009333.1:g.68469 G > T (D69E); NC_009333.1:g.68476 G > T (A67D) | 113, 118, 130, 141, 547, 624 |
| ORF46/viral uracil DNA glycosylase | Amino acid changes: I16N, K67R, K92R, G219A, H221Y * | NC_009333.1:g.69457 T > A (I16N); NC_009333.1:g.69336 A > G (K67R); NC_009333.1:g.69304 A > G (K92R); NC_009333.1:g.68848 G > C (G219A) | 560, 657 |
| ORF47/envelope glycoprotein gL | Series of polymorphisms: T66K, G67D, D68I, W94G, T111A, T114A, A119E, D123N, 125delSIHNV, N130S, I132L | NC_009333.1:g.69621 T > G (I132L); NC_009333.1:g.69625 G > A (N130S); NC_009333.1:g.69626 T > C (N130S); NC_009333.1:g.69628_69642del (SIHNV deletion); NC_009333.1:g.69646 A > G (D123N); NC_009333.1:g.69648 C > T (D123N); NC_009333.1:g.69659 G > T (A119E); NC_009333.1:g.69675 T > C (T114A); NC_009333.1:g.69679 G > T; NC_009333.1:g.69682 C > T (T111A); NC_009333.1:g.69684 T > C (T111A); NC_009333.1:g.69688 A > G; NC_009333.1:g.69691 T > C; NC_009333.1:g.69735 A > C (W94G); NC_009333.1:g.69736 G > T; NC_009333.1:g.69745 T > C; NC_009333.1:g.69754 G > C; NC_009333.1:g.69774 T > G; NC_009333.1:g.69775 C > G; NC_009333.1:g.69778 A > G; NC_009333.1:g.69784 C > T; NC_009333.1:g.69790 C > A; NC_009333.1:g.69793 G > A; NC_009333.1:g.69796 T > G; NC_009333.1:g.69812 T > A (D68I); NC_009333.1:g.69813 C > T (D68I); NC_009333.1:g.69814 G > A (G67D); NC_009333.1:g.69815 C > T (G67D); NC_009333.1:g.69818 G > T (T66K); NC_009333.1:g.69820 T > C; NC_009333.1:g.69829 C > A | 113, 118, 130, 139, 605, 624, 646 |
| ORF64/large tegument protein | Two amino acid deletion: 2265delGQ | NC_009333.1:g.110899_110904del (GQ deletion) | 113, 118, 130, 535, 624, 646 |
| ORF74/tegument protein | Amino acid deletion: 12delD | NC_009333.1:g.129553_129555del (D deletion) | 137 and 547 |
| microRNA cluster region | 303 bp deletion in microRNA coding region | NC_009333.1:g.121190_121492del | 560 and 657 |
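
The reference positions in Table 2 are written as genomic ranges on NC_009333.1, so the size of each inversion or deletion can be recomputed from its coordinates as a consistency check. The sketch below parses range-style descriptions of the form `accession:g.start_enddel` or `accession:g.start_endinv` and returns the inclusive span length. The regular expression and function name are hypothetical helpers that cover only these two patterns, not the full HGVS nomenclature.

```python
import re

# Matches only simple range deletions and inversions, e.g. NC_009333.1:g.11277_11302inv
RANGE_PATTERN = re.compile(
    r"^(?P<acc>[A-Z]{2}_\d+\.\d+):g\.(?P<start>\d+)_(?P<end>\d+)(?P<kind>inv|del)$"
)


def span_length(description: str) -> int:
    """Return the inclusive length in bp of a range-style deletion or inversion."""
    match = RANGE_PATTERN.match(description.strip())
    if match is None:
        raise ValueError(f"unsupported description: {description!r}")
    start, end = int(match["start"]), int(match["end"])
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    return end - start + 1


for desc in (
    "NC_009333.1:g.11277_11302inv",    # ORF8-9 intron inversion
    "NC_009333.1:g.14393_14396del",    # ORF9-10 intron deletion
    "NC_009333.1:g.110899_110904del",  # ORF64 two amino acid deletion
    "NC_009333.1:g.121190_121492del",  # microRNA cluster deletion
):
    print(desc, "->", span_length(desc), "bp")
```

A coding-region deletion whose length is a multiple of three, such as the 6 bp ORF64 deletion, removes whole codons and so matches the two amino acid change reported in the table.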

4 Discussion

The southern US, including Texas, continues to experience a higher incidence of KS than other US regions, particularly in Black men, despite widespread availability of anti-retroviral medications [1-3]. Our previous studies showed a high KSHV seroprevalence of 68% in MSM with HIV in Dallas, Texas [29], as well as high levels of KSHV DNA in oral fluids. Behavioral factors associated with KSHV seropositivity included self-reported use of methamphetamines and oral-anal and/or oral-penile sex [17]. Additionally, persistent racial health care disparities limit access to treatment, which affects outcomes [30]. In the current study, the distribution of KSHV subtypes was investigated in samples selected based upon the KSHV load results in whole blood and oral fluids.

While A and C subtypes were the most common, a striking variety of KSHV K1 sequences was found, including all major subtypes except D, which likely reflects the diverse underlying population in the Dallas, Texas area. The E2 subtype was observed in two individuals born in Mexico. We recently reported a similar diversity of KSHV K1 subtypes among patients enrolled in clinical studies at the HIV and AIDS Malignancy Branch (HAMB) of the National Cancer Institute in Bethesda, Maryland [10]. The populations differed between the studies, however: people referred to the NCI for KAD were born in 22 different countries, whereas participants in this study were recruited in a community-based setting and were mostly born in the U.S. (63 of 85, 74%) or neighboring Mexico (14 of 85, 16%).

Studies of KSHV subtypes in the United States have been sparse for the last twenty years, and the current study suggests that the distribution of viral subtypes in the context of high KSHV seroprevalence may be more diverse than appreciated. The area that is now the state of Texas has been settled by people of diverse ancestries over the course of the last several hundred years. A 2001 study of KSHV genotypes conducted in a population of PWH and individuals with classical KS in the San Antonio area, primarily with Hispanic ancestry, found a similar predominance of C and A K1 subtypes [31]. Notably, many of the sequences obtained from Hispanic participants in the 2001 study were of the K15 M subtype [31], which was also observed in the current study. High KSHV K1 subtype diversity reported in recent studies from Spain and Ireland suggests that KSHV genomes may not be as localized to specific regions as previously assumed [32, 33]. Interestingly, a study from Brazil of 550 KSHV K1 sequences published in GenBank between 1997 and 2020 reported a similarly wide subtype distribution [14]. In contrast, in a KS case-control study conducted in Cameroon, we observed only A5 and B K1 subtypes [21]. The current picture of the worldwide distribution of KSHV genomes relies upon available sequencing data, which are disproportionately drawn from samples with higher KSHV viral copy numbers. Such samples, sometimes referred to as convenience samples, have done much to inform the field but do not necessarily reflect the true viral genome distribution, as not all members of any specific population are equally represented. Recent improvements in sequencing technologies, including whole genome next-generation platforms, allow the sequencing of samples with lower estimated KSHV VL. It is possible that with increased KSHV genome sequencing efforts, a more informative distribution of viral subtypes within regions of high associated disease incidence can be obtained.

In three individuals, one of whom (UTSW601) did not have KS disease, multiple KSHV infections were identified by NGS. This is the latest of several recent reports of individuals with more than one detectable KSHV genome, adding strength to the hypothesis that multiple infections may not be uncommon [9, 10, 21]. Accumulating evidence of infections by multiple genomes suggests that pre-existing KSHV infection may not protect against infection during subsequent exposures. Whether multiple infections occur simultaneously or are acquired sequentially during the lifetime of an individual cannot be determined from these cross-sectional data. The observation of multiple KSHV genomes in oral fluids has implications for the understanding of viral transmission and the development of a KSHV vaccine, for which efforts are intensifying [34-37]. The new KSHV genomes were also examined for structural variations and compared to publicly available sequences, as summarized in Table 2. Many previously reported indels were also observed in this study, including sequence inversions and large deletions [10]. Polymorphisms in the ORF46 gene encoding viral uracil DNA glycosylase, which have previously been shown to affect the function of the protein, were present in the current E2 study genomes, except for H221Y, the second variation noted within the extended leucine loop. The consequences of the specific series of variations observed in the new E2 sequences (I16N, K67R, K92R, and G219A) for the function of the viral uracil DNA glycosylase are currently unknown [10, 38].

In summary, the results of this study of KSHV K1 subtype distribution at a single institution in the Dallas, Texas area show an unexpectedly high diversity of viral genomes. The observed wide variety of subtypes likely reflects the rich and long-standing ethnic diversity in the region and the mobility of its population. It could also reflect high KSHV transmission in highly populated cities in general. Other communities with diverse populations may harbor KSHV K1 subtypes not commonly reported in North America, which has implications for viral dynamics and evolution. The near-full-length genomes obtained in this study inform global KSHV genetics and facilitate efforts to characterize viral genomes outside of the variable K1 and K15 gene regions. Importantly, analysis of the twenty-two new genomes confirms previous observations of variations possibly associated with KSHV K1 subtypes, which may ultimately allow a new definition of KSHV-specific genotypes using whole viral genomes. As additional KSHV genomes become available, variations can be more precisely analyzed to determine any contribution of sequence variations to KSHV disease, transmission, and outcomes. Future efforts to more precisely define circulating KSHV subtypes within populations can benefit from advances in sequencing techniques that allow samples of lower estimated viral load to be evaluated, thereby expanding surveys to include more of the general population without disease.

Author Contributions

Authors Vickie A. Marshall, Sheena M. Knights, Nazzarena Labo, Susana M. Lazarte, Elizabeth Y. Chiao, Denise Whitby, and Ank E. Nijhawan designed the study. Sheena M. Knights, Susana M. Lazarte, and Ank E. Nijhawan oversaw recruitment, provided participant care, and supervised protocols. Vickie A. Marshall, Nazzarena Labo, Wendell J. Miley, Isabella Liu, Charles A. Goodman, and Elizabeth Y. Chiao curated data, including bioinformatics and statistical analyses. Vickie A. Marshall and Kyle N. Moore performed qPCR testing, and Wendell J. Miley provided serological assay support. Vickie A. Marshall, Isabella Liu, and Elizabeth Y. Chiao conducted all sequencing protocols. Brandon F. Keele and Christine M. Fennessey provided technical support for whole genome sequencing applications. All authors contributed to manuscript writing, review, and editing.

Acknowledgments

We would like to thank the participants and their families without whom this study would not have been possible. Special thanks also to Leslie Lipkey and Agatha Macairan from the Retroviral Evolution Section, ACVP for technical support and Graphics designer Joseph Meyer at Scientific Publications, Graphics, and Media, FNLCR for his expertise in figure conceptualization and design. This study was supported by the National Center for Advancing Translational Science (NCATS) [grant number 1UL1TR003163-02] (SK, AN), as well as a Translational Pilot Program Award from the Simmons Comprehensive Cancer Center at the University of Texas Southwestern Medical Center (SK, AN). Additionally, this project has been funded in whole or in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. 75N91019D00024/HHSN261200800001E (VAM, NL, IL, WJM, EMC, CAG, CMF, KNM, BFK, and DW). The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.

    Ethics Statement

    Studies were approved by the University of Texas Southwestern Institutional Review Board (STU 2019-1204 and STU 2022-0355) and all participants provided informed consent at enrollment according to the Declaration of Helsinki.

    Conflicts of Interest

    The authors declare no conflicts of interest.

    Data Availability Statement

    The data that support the findings of this study are available from the corresponding author upon reasonable request. KSHV sequence information is available in GenBank under accession numbers PP768312-PP768333, PP789823-PP789863, and PP952419-PP952426.
