Volume 74B, Issue 2 pp. 79-90
Original Article
Free Access

Flow cytometric lymphocyte subset enumeration: 10 years of external quality assessment in the Benelux countries

Wilfried H. B. M. Levering

Corresponding Author

Wilfried H. B. M. Levering

Laboratory for Histocompatibility and Immunogenetics, Sanquin Blood Bank South West Region, Rotterdam, The Netherlands

Laboratory for Histocompatibility and Immunogenetics, Sanquin Blood Bank South West Region, P.O. Box 23370, 3001 KJ Rotterdam, The NetherlandsSearch for more papers by this author
Wessel N. van Wieringen

Wessel N. van Wieringen

Department of Statistics, Erasmus Medical Center, Rotterdam, The Netherlands

Search for more papers by this author
Jaco Kraan

Jaco Kraan

Laboratory for Clinical and Tumor Immunology, Department of Internal Oncology, Erasmus Medical Center, Rotterdam, The Netherlands

Search for more papers by this author
Wil A. M. van Beers

Wil A. M. van Beers

Diagnostic Center RDGG, Delft, The Netherlands

Search for more papers by this author
Kees Sintnicolaas

Kees Sintnicolaas

Laboratory for Histocompatibility and Immunogenetics, Sanquin Blood Bank South West Region, Rotterdam, The Netherlands

Search for more papers by this author
Dick J. van Rhenen

Dick J. van Rhenen

Laboratory for Histocompatibility and Immunogenetics, Sanquin Blood Bank South West Region, Rotterdam, The Netherlands

Search for more papers by this author
Jan W. Gratama

Jan W. Gratama

Laboratory for Clinical and Tumor Immunology, Department of Internal Oncology, Erasmus Medical Center, Rotterdam, The Netherlands

Search for more papers by this author
First published: 11 September 2007
Citations: 27

How to cite this article: Levering WHBM, van Wieringen WN, Kraan J, van Beers WAM, Sintnicolaas K, van Rhenen DJ, Gratama JW. Flow cytometric lymphocyte subset enumeration: 10 years of external quality assessment in the benelux countries. Cytometry Part B 2008; 74B: 79–90.

Abstract

A biannual external quality assessment (EQA) scheme for flow cytometric lymphocyte immunophenotyping is operational in the Benelux countries since 1996. We studied the effects of the methods used on assay outcome, and whether or not this EQA exercise was effective in reducing between-laboratory variation. Eighty test samples were distributed in 20 biannual send-outs. Per send-out, 50–71 participants were requested to enumerate CD3+, CD4+, and CD8+ T cells, B cells, and NK cells, and to provide methodological details. Participants received written debriefings with personalized recommendations after each send-out. For this report, data were analyzed using robust multivariate regression. Five variables were associated with significant positive or negative bias of absolute lymphocyte subset counts: (i) platform methodology (i.e., single-platform assays yielded lower CD4+ and CD8+ T-cell counts than did dual-platform assays); (ii) sample preparation technique (i.e., assays based on mononuclear cells isolation yielded lower T-cell counts than those based on red cell lysis); (iii) gating strategies based on CD45 and sideward scatter gating of lymphocytes yielded higher CD4+ T-cell counts than those based on “backgating” of lymphocytes guided by CD45 and CD14); (iv) stabilized samples were generally associated with higher lymphocyte subset counts than nonstabilized samples; and (v) laboratory. Platform methodology, sample stabilization, and laboratory also affected assay variability. With time, assay variability tended to decline; this trend was significant for B-cell counts only. In addition, significant bias and variability of results, independent of the variables tested for in this analysis, were also associated with individual laboratories. In spite of our recommendations, participants tended to standardize their techniques mainly with respect to sample preparation and gating strategies, but less with absolute counting techniques. 
Failure to fully standardize protocols may have led to only modest reductions in variability of results between laboratories. © 2007 Clinical Cytometry Society.

Enumeration of the major lymphocyte subsets yields important information for diagnosis and monitoring of a variety of conditions affecting the immune system. The discovery of human immunodeficiency virus (HIV) in the 1980s as the causative agent of the destruction of CD4+ T cells leading to the acquired immunodeficiency syndrome (AIDS) was the major drive behind the evolution of flow cytometry from research tool to routine diagnostic technique (reviewed in Ref.1). Progressive depletion of CD4+ T cells is associated with an increased likelihood of severe HIV disease and an unfavorable prognosis (2, 3). CD4+ T-lymphopenia is also associated with opportunistic infections in recipients of allogeneic hematopoietic stem cell transplants (4) and is also a risk factor for skin cancer in renal transplant recipients (5).

Accurate and reliable measures of CD4+ T cells are important as a quantitative tool for immune status assessment and for health care management of individuals infected with HIV (6-10). In addition to CD4+ T-cell counts, CD8+ T-cell counts are relevant. The CD8+ T cells can be activated and increase in counts—inversely to CD4+ T cells—in patients with progressive HIV infection (11). Furthermore, CD8+ T-lymphocytosis is a hallmark of primary immune responses to cytomegalovirus (CMV) and Epstein-Barr virus in otherwise healthy carriers (12), and is associated with recovery from reactivated CMV infection in renal and SCT recipients (13, 14). Also, monitoring of CD4+ and CD8+ T-cell counts may provide predictive and prognostic information in patients with metastatic melanoma receiving chemoimmunotherapy (15). Furthermore, lymphocyte subset enumeration is an important part of the diagnostic workup in patients with acute leukemia and chronic lymphoproliferative disorders (16). In addition, serial monitoring of lymphocyte subsets allows the evaluation of treatment efficacy in patients with posttransplant lymphoproliferative disorders (17). Another application is to monitor the effectiveness of nutrition supplements in patients receiving peritoneal dialysis (18).

Since the 1990s, several protocols for flow cytometric lymphocyte subset enumeration have been developed (reviewed in Ref.19). In response to the increased dependence of clinical decision-making on this assay, various (inter)national guidelines for lymphocyte subset enumeration, as well as external quality assessment (EQA) schemes, have been set up. Until now, a large variability between results of individual centers has been observed in multicenter studies (20-27). In 1995, we organized a pilot multicenter study in Belgium, The Netherlands, and Luxemburg (“Benelux” countries) as an introduction to a biannual EQA scheme. This scheme was aimed to cover the major lymphocyte subsets (i.e., CD3+, CD4+, CD8+ T cells, CD19+ B cells, and CD3, CD56+ NK cells). Here, we review our experience with this scheme during its first 10 years. We addressed the impact of various methodological aspects on assay outcomes with the aim to identify the main sources of between-laboratory variation, and to document the efficacy of our EQA scheme to reduce this variation.

MATERIALS AND METHODS

Study Design

We evaluated 20 send-outs comprising 80 peripheral blood samples distributed to laboratories that participated in the biannual external quality assessment (EQA) scheme for flow cytometric immunophenotyping organized within the Benelux. This scheme was run under the auspices of the Foundation for Immunophenotyping in Hemato-Oncology (SIHON), the Foundation for Quality Control in Medical Laboratories (SKML; both in The Netherlands), and the Belgian Association for Analytical Cytometry (BVAC/ABCA). After informed consent, 73 patients and three apparently healthy donors each donated 100 ml of EDTA-anticoagulated venous blood. Seventy-five samples were distributed without stabilization; one sample from a healthy donor underwent short-term stabilization using StabilCyte™ (BioErgonomics, St Paul, MN). Three of the four remaining samples were either discarded units of blood for transfusion that had been long-term stabilized by UK NEQAS for Leucocyte Immunophenotyping (Sheffield, UK) (28) or a commercial stabilized blood preparation (Ortho AbsoluteControl™; Ortho-Clinical Diagnostics [Raritan, NJ]). The clinical diagnoses associated with the 73 patient samples were: status after allogeneic hematopoietic stem cell transplantation (n = 18); B-chronic lymphocytic leukemia (B-CLL; n = 13); severe fatigue (n = 11); B-cell non-Hodgkin's lymphoma (n = 7); monoclonal B-cell population with undetermined significance (n = 4); T-large granular lymphocytic leukemia (T-LGL; n = 4); leukocytosis (n = 3); multiple myeloma (n = 2); and acute Epstein-Barr virus infection, acute myeloid leukemia, anemia, angioimmunoblastic lymphadenopathy, bladder carcinoma, Burkitt's lymphoma, eosinophilia, polyclonal B-cell lymphocytosis, rheumatoid arthritis, severe aplastic anemia, and T-cell non-Hodgkin's lymphoma (n = 1 each).

Except for the patients with B-CLL or T-LGL, none had expansions of phenotypically abnormal CD3+, CD4+, or CD8+ T cells, B cells, or NK cells. In the latter two groups of patients, the abnormal lymphocytes did not preclude the enumeration of CD3+, CD4+, CD8+ T cells, B cells, and NK cells using standard protocols. The samples were selected so as to include all possible abnormalities in leukocyte counts and/or proportions of lymphocytes; in only 23 samples (29%), both parameters were normal (Table 1). The samples were divided in ∼1.5 ml aliquots and shipped by overnight courier to the participants. Each participant was requested to perform lymphocyte subset enumeration according to its routine protocol and to answer a questionnaire on methodological details. Data processing and analysis for anonymous debriefing of the EQA results were centrally performed by the SKML datacenter. For each send-out, an overall debriefing report was issued to all participants, and discussed at biannual participant meetings. In addition, each participant received an individual report of its data with specific comments and recommendations in case of outlying results. In spring 2001, all participants were invited to participate in a workshop in which single-platform enumeration methodology combined with dual-anchor T-cell gating strategy was addressed as the “state-of-the-art” method. Fifty-five laboratories participated to this workshop (29); in case of poor performance, dedicated hands-on training was offered. The remaining laboratories participated in the regular spring 2001 EQA programme (send-out 11; see latter).

Table 1. Overview of 80 Distributed Samples by Leukocyte Count and Lymphocyte Proportion
Absolute numbers of leukocytes Proportion of lymphocytes Number of samples
Low Low 7
Low Normal 5
Low High 3
Normal Low 11
Normal Normal 23
Normal High 11
High Low 5
High Normal 3
High High 12
  • Normal range of absolute number of leukocytes, 4.0 – 10.0 × 109/l, normal range of proportion of lymphocytes (i.e., percentage of leukocytes), 15–40%.

Data Processing and Parameter Classification

Prior to data processing, ambiguous data entries were corrected after review with the submitting participant. Each laboratory was assigned a unique number (ULN) for referral purposes. The absolute numbers of CD3+ T cells, CD4+ T cells, CD8+ T cells, CD19+ B cells, and NK cells were assigned as response variables. The influence of the following six categorical variables, next to ULN, on the outcomes of flow cytometric immunophenotyping assays (i.e., systematic differences [“bias”] and between-laboratory variability [“variation”]) were investigated, and are summarized in Table 2 and discussed in detail latter.

Table 2. Overview of Categorical Variables
Variable Categories
EQA send-out 1–20
Unique laboratory number (ULN) n.a.
Workshop 2001 participation Yes
No
Sample stabilization No stabilization
Stabilization
Gating strategy FSC-SSC
CD45-CD14
CD45-SSC
SSC-CD45-CD3
T-gating
Platform methodology Single
Dual
Sample preparation Lyse and wash
Lyse no wash
No lyse no wash
Mononuclear cells
  • n.a., Not applicable.
  • a Send outs were numbered sequentially as a function of time.

EQA send-out.

Each send-out, from spring 1996 to autumn 2005, was chronologically assigned with a unique number (i.e., 1–20). In this way, we analyzed any effect of the EQA program on the variation of results as a function of time.

Participation to “workshop 2001”.

Participation to this workshop was offered to all participants to this EQA scheme (see earlier) (29). We analyzed the effect of participation to this educational activity on the variation of results.

Sample stabilization.

Long-term stabilization of whole blood has been shown to reduce sample deterioration in EQA exercises for lymphocyte subset enumeration and CD34+ cell counting (22, 23, 28-32). We distinguished two categories: (i) no stabilization (n = 75) and (ii) stabilization (i.e., short-term and long-term stabilization, n = 5).

Gating strategies.

We distinguished five gating strategies: (i) FSC-SSC, (ii) CD45-CD14, (iii) CD45-SSC, (iv) SSC-CD45-CD3, and (v) The FACSCount™ System [Becton Dickinson Biosciences (San Jose, CA) (BD Biosciences)], further referred as T-gating.
  • i

    FSC-SSC gating. In the late 1970s, implementation of single color analysis combined with dual light scatter for flow cytometric immunophenotyping was introduced. The combined FSC and SSC characteristics of leukocytes allowed the distinct clustering of lymphocyte, monocyte, and granulocyte populations in a bivariate histogram (1). Once cell lineage specific markers were identified and multicolor (i.e., three or more) flow cytometry became available, FSC-SSC gating of lymphocytes became outdated.

  • ii

    CD45-CD14 gating. In the early 1990s, the combined analysis of immunophenotype and light scatter characteristics was introduced for gating on lymphocytes (33). By identifying the cell population of interest based on immunofluorescence, a light scatter window can then be drawn to include all (greater than or equal to 98%) of the lymphocytes. With this procedure, also known as “backgating,” recovery of the lymphocytes within the lymphocyte gate can be optimized. This information can also be used to identify cells other than lymphocytes within the light scatter gate. In this way, it is possible to correct subsequent analyses since the reactivity of monoclonal antibodies on monocytes and granulocytes can be accounted for once cells other than lymphocytes have been identified as being within the acquisition gate (33).

  • iii

    CD45-SSC gating. Here, lymphocytes are identified by their CD45 and SSC characteristics (i.e., CD45bright, SSClow). CD45-SSC gates placed on lymphocytes should contain >95% lymphocytes (9, 34). A possible disadvantage of this approach is the risk for exclusion of CD19+ B cells and NK cells from the lymphocyte gate; CD19+ B cells express slightly less CD45 than do T cells, while NK cells have bright CD45 fluorescence but slightly higher SSC signals than the majority of lymphocytes) (34).

  • iv

    SSC-CD45-CD3 gating. With the introduction of fluorochromes such as peridinin chlorophyllin (PerCP), and allophycocyanin (APC), and tandem fluorochromes such as PE-Cy5 and PE-Texas Red, three- and four-color flow cytometric analyses became feasible. This development extended the possibilities for lymphocyte gating. With the “dual-anchor” approach (35), lymphocytes are selected first on the basis of bright CD45 expression and low SSC, followed by the selection of T cells on the basis of their CD3 positivity. Counterstaining of the T cells for CD4 and/or CD8 allows their further characterization (34, 35). These gating strategies were adopted subsequently in the CDC (36), NIAID-DAIDS (37, 38), and British Committee for Standards in Haematology (BCSH) (39, 40) guidelines.

  • v

    T-gating. The FACSCount™ single-platform kit by BD Biosciences is based on counting beads (TruCOUNT™ tubes) and a no-lyse, no-wash (NLNW) sample preparation procedure. The kit utilizes two panels, that is, CD4 PE/CD3 PE-Cy7 and CD8 PE/CD3 PE-Cy7 mAb mixtures. Using a CD3-FSC gate, most of the erythrocytes, platelets, monocytes, and granulocytes are excluded. A known number of reference beads included in each reagent tube functions as a fluorescence and quantitation standard for calculation of absolute CD3+, CD4+, and CD8+ T-cell counts. In addition, a few laboratories used nonstandard gating strategies or did not provide information on this point (see legend to Fig. 1). These laboratories have been grouped as “Remainder” in Figure 1 (Panel A).

Details are in the caption following the image

Change of usage patterns of methods over time. Panel A, gating strategies: □, FSC-SSC; ▴, CD45-CD14; ○, CD45-SSC; •, SSC-CD45-CD3; *, T-gating, ♦, remainder (i.e., SSC-CD45-CD33-CD14; T/B lineage or no information provided). Panel B, platform methodology: ⋄, single platform; □, dual platform. Panel C, sample preparation: □, lyse and wash; ♦, lyse no wash; *, no lyse no wash; ▴, MNC; ⋄, remainder (i.e., no information provided). The vertical lines indicate the timing of the 2001 educational workshop (see Materials and Methods).

Platform methodology.

Absolute cell counts are traditionally assessed using a “dual-platform” technique that are as follows: (i) the flow cytometer provides the cell percentages as fractions of a denominator, that is, WBC or lymphocytes, and (ii) the hematology analyzer provides the absolute WBC count together with a differential count, which must include the denominator. In the late 1990s, “single platform” techniques were introduced: the absolute cell counts are directly assessed on the flow cytometer in a precisely determined volume of blood sample. Single platform techniques can either be volumetric (41) or based on counting beads (42). The use of single platform techniques reportedly reduces between-laboratory variation in lymphocyte subset enumeration by eliminating the lymphocyte proportion from the hematology analyzer as a source of variation (22, 43, 44).

Sample preparation.

We distinguished four categories: (i) lyse and wash (LW); (ii) lyse no wash (LNW); (iii) no lyse no wash (NLNW); and (iv) gradient separation of mononuclear cells (MNC). Variable cell losses, due to the use of different sample preparation techniques, would increase between-laboratory variation. Small numbers of laboratories did not provide information on this point (see legend to Fig. 1). These laboratories have been grouped as “Remainder” in Figure 1 (Panel C).

Statistical Analysis

For data processing and statistical analyses, the “open source” program “R” (http://www.r-project.org/) was used (Lucent Technologies, Murray Hill, NJ). First, a descriptive analysis of raw data of flow cytometric immunophenotyping was performed. This analysis revealed that the measurements had a low variability at low cell counts and a high variability at high cell counts, which is often observed with cell enumeration data. As standard statistical techniques require an approximately constant variability over the whole cell count range, the data were logarithmically transformed. The logarithm is the transformation of choice for count data (45). To assess the effect of multiple variables on the log-transformed flow cytometric immunophenotyping data we used robust multivariate regression (46). This approach is less sensitive to outliers than standard multivariate regression analysis. We then addressed two aspects of the quality of the log-transformed lymphocyte subset counts: bias (i.e., systematic differences) and variability (i.e., random differences) in separate analyses. Analysis of the mean of the log-transformed data revealed which variables caused systematic differences (“bias”) in the mean lymphocyte subset counts. Subsequently, the bias was removed; the residuals of that analysis were used to investigate the variability of the lymphocyte subset counts. To this end the absolute values of these residuals (termed absolute error, which is related to the standard deviation) was used. As the distribution of the absolute errors was highly skewed, we have applied a Box-Cox transformation (47) to reduce this problem. A robust multivariate regression analysis of the Box-Cox-transformed absolute errors was then performed to assess which variables affected the variability of lymphocyte subset counts. For each variable, the category with the most observations was chosen as benchmark. 
After log-transformation, the transformed data are shown in a linear scale with benchmark = 1, while after Box-Cox transformation, the transformed data are shown in a linear scale with benchmark = 0.

RESULTS

Methods Used and Change of Usage Patterns Over Time

From 1996 to 2005, 104 laboratories participated to 20 send-outs in our EQA scheme (50–71 participants per send-out). Data from 14 laboratories that submitted results to only one or two send-outs were excluded to avoid imbalance in the data. For analysis of list-mode data, various gating strategies have been used (Fig. 1, panel A). The use of the “older” gating strategies declined over time: FSC-SSC (from 27% in 1996 to 8% of participants in 2005) and CD45-CD14 (from 68% in 1996 to 11% in 2005). In contrast, usage of methods recommended by guidelines (CDC, NIAID-DAIDS, and BCSH) increased: CD45-SSC (from 3% in 1996 to 69% in 2005) and dual-anchor gating (i.e., SSC-CD45-CD3) (from 0% in 1996 to 8% in 2005). After the 2001 workshop in which 55 laboratories participated, the use of CD45-SSC gating clearly increased, while that of lymphocyte “backgating” (CD45-CD14) decreased. To establish absolute counts (Fig. 1, panel B), only 5% of the laboratories had adopted the single platform technique in 1996 versus 47% in 2005. The 2001 workshop did not lead to an accelerated implementation of single platform techniques. For sample preparation (Fig. 1, panel C), the use of “lyse no wash” methods increased with time at the expense of that of “lyse and wash” methods, especially after the 2001 workshop. Eventually, all single platform users and a small proportion of dual platform users had adopted “lyse no wash” methods.

Factors Affecting the Outcomes of Lymphocyte Subset Enumeration

We studied which of the seven categorical variables (Table 2) significantly influenced the outcomes (i.e., bias) of lymphocyte subset enumeration. The effects of five variables were significant and are shown in Figures 2 and 3. The effects of the two remaining factors (i.e., participation to “workshop 2001” and gating strategy) were not significant.

Details are in the caption following the image

Factors significantly affecting the outcome of lymphocyte immunophenotyping. A: platform methodology. B: sample preparation. C: sample stabilization. D: gating strategy. The category with most observations was used as benchmark and assigned a factor value of 1.00 (marked with closed symbols). The graphs show the relative difference of the other categories related to the benchmark. Horizontal bars indicate 95% confidence intervals of the estimates of each factor value. P-values <0.05 are shown. Lymphocyte subsets are represented by the following symbols: ⋄, CD3+ T cells; □, CD4+ T cells; Δ, CD8 T cells; ○, B cells; ▿, NK cells.

Details are in the caption following the image

Outcomes of CD4+ T-cell (panel A) and NK-cell (panel B) enumerations by laboratory. Laboratory 27 had no missing data and was therefore chosen as “benchmark” with factor value 1.00 (marked with a large, closed circle). The factor values reflect the relative differences between each individual laboratory and the benchmark laboratory. Laboratories with a significant bias relative to the benchmark laboratory are marked with arrows (i.e., P-values <0.05).

Platform methodology (Fig. 2, panel A).

Most of the results were obtained using dual platform techniques. Therefore, this strategy was used as benchmark and assigned a factor value of 1 (see Materials and Methods). In comparison, significantly lower results were obtained for CD4+ and CD8+ T cells using single platform as compared with dual platform techniques, while no significant bias was observed for CD3+ T cell, B-cell, and NK-cell counts. The relatively few observations on B cells using single-platform techniques contributed to a wide confidence interval for this parameter.

Sample preparation (Fig. 2, panel B).

Most results had been obtained using “lyse and wash” methods. In comparison, lower counts were obtained for CD3+, CD4+ and CD8+ T cells after mononuclear cell isolation (significant for CD3+ and CD8+ T cells only). The results obtained with “lyse no wash” and the small number of observations using “no lyse no wash” methods were not significantly different from those obtained using “lyse and wash” methods.

Sample stabilization (Fig. 2, panel C).

Most results had been obtained using nonstabilized samples. Significantly higher counts were obtained for all subsets except B cells using stabilized samples.

Gating strategy (Fig. 2, panel D).

Most results had been obtained using the “CD45-CD14” gating strategy, and relatively few using the “T-gating” strategy. The “CD45-SSC” gating strategy yielded higher outcomes for CD3+, CD4+, and CD8+ T-cell counts (significant for CD3+ and CD4+ T cells only). Lower results were obtained for NK-cell counts using the “SSC-CD45-CD3” gating strategy. The outcomes of the remaining two strategies, that is, “FSC-SSC” and “T-gating,” did not differ significantly from those of the “CD45-CD14” gating strategy.

Laboratory (as defined by ULN; Fig. 3).

Laboratory 27 was chosen as benchmark as it had no missing observations (Fig. 3). The results of CD4+ T cells by 90 laboratories are shown in panel A. Six laboratories (i.e., 18, 20, 21, 34, 41, and 63) had factor values ∼1.00, that is, generated similar results as laboratory 27. The bias of CD4+ T-cell counts by 42 laboratories was positive in comparison with the benchmark laboratory (i.e., factor value >1.00); this bias was significant in eight (indicated with arrows). The bias of 41 laboratories was negative, which reached significance in two (indicated with arrows). The patterns of deviation of CD3+ T-cell, CD8+ T-cell, and B-cell counts were similar to those of CD4+ T cells (data not shown). The NK-cell counts, reported by 84 laboratories, stood out by having a pattern of bias that clearly differed from that of CD4+ T cells (panel B). Here, five laboratories (i.e., 11, 33, 47, 50, and 81) had factor values ∼1.00, while the NK-cell counts of 17 were positively—but not significantly—biased, and those of 61 were negatively biased (significantly in 19 of them). Thus, benchmark laboratory 27 reported relatively high NK-cell counts in comparison with most of the other 83 laboratories.

Factors Affecting the Variability of Lymphocyte Subset Counts

We studied which of the seven categorical variables (Table 2) significantly influenced the variability of lymphocyte subset count measurements. The effects of four variables were significant, as shown in Figures 4-6 and discussed latter. The effects of the three remaining factors (i.e., participation to “workshop 2001,” gating strategy and sample preparation) were not significant.

Details are in the caption following the image

Factors significantly affecting the variability of lymphocyte immunophenotyping. Panel A: platform methodology. Panel B: sample stabilization. The category with most observations was used as benchmark and assigned factor value 0.00 (marked with closed symbols). The graphs show the relative differences between the other categories and the benchmark. Horizontal bars indicate 95% confidence intervals of the estimates of each factor value. P-values <0.05 are shown. Lymphocyte subsets are represented by the following symbols: ⋄, CD3+ T cells; □, CD4+ T cells; Δ, CD8 T cells; ○, B cells, ▿, NK cells.

Details are in the caption following the image

Variation of CD4+ T-cell (Panel A) and B-cell (Panel B) enumeration by laboratory. Laboratory 27 had no missing data and was therefore chosen as “benchmark” with factor value 0.00 (marked with a larger, closed circle). The factor values reflect the relative difference between each laboratory and the benchmark laboratory. Laboratories with a significantly larger (factor value >0) or smaller (factor value <0) variation than the benchmark laboratory are marked with arrows (i.e., P-values <0.05).

Details are in the caption following the image

Analysis of the variability of lymphocyte subset enumeration by send-out. For each subset, the first send-out is shown at the bottom of each series, followed by the other 19 ranked in ascending order as a function of time. Lymphocyte subsets are represented by the following symbols: ⋄, CD3+ T cells; □, CD4+ T cells; Δ, CD8 T cells; ○, B cells; ▿, NK cells. For each lymphocyte subset, the first send-out was taken as “benchmark” and assigned factor value 0.00 (marked with closed symbols). The horizontal lines show the 95% confidence intervals of the estimation of each factor value.

Platform methodology (Fig. 4, panel A).

Most of the results were obtained using dual platform methodologies. Therefore, this strategy was used as benchmark and assigned a factor value of 0 (see Materials and Methods). The use of single-platform methods yielded significantly lower variability for CD4+ and CD8+ T-cell counts, but significantly higher variability for B-cell counts. The variability of CD3+ T cells and NK cells was similar for single and dual-platform methods.

Sample stabilization (Fig. 4, panel B).

Most of the results were obtained using nonstabilized samples, which served as benchmark. Stabilized samples yielded significantly higher variability for CD3+ T-cell, CD4+ T-cell, and NK-cell counts, but significantly lower variability for B-cell counts than nonstabilized samples. The variability observed for CD8+ T cells was similar for the two groups of samples.

Laboratory (as defined by ULN; Fig. 5).

Laboratory 27 was chosen as reference as it had no missing observations. The results of CD4+ T cells by 90 laboratories are shown in Figure 5 (panel A). Four laboratories (i.e., 9, 16, 46, and 62) had factor values ∼0.00, that is, had similar variability in CD4+ T-cell counts as laboratory 27. The variability in CD4+ T-cell counts by 60 laboratories was larger in comparison with the benchmark laboratory (i.e., factor value >0); this variation was significantly larger in 10 (indicated with arrows). This variability was smaller in 25 laboratories, which reached significance in 1 (indicated with arrow). The magnitudes of variation of CD3+ T-cell, CD8+ T-cell, and NK-cell counts were similar to those of CD4+ T cell counts. The B-cell counts, reported by 86 laboratories, stood out by having a different pattern of variation (panel B). Here, four laboratories (i.e., 22, 26, 57, and 66) had variations similar to that of laboratory 27, while the variation was larger (i.e., factor value >0) in 31 laboratories (significantly in 10) and smaller in 50 (significantly in 25). Thus, benchmark laboratory 27 had a relatively large variation in B-cell counts in comparison with the other 85 laboratories.

EQA send-out number (Fig. 6).

The test variations of the five lymphocyte subset counts as a function of send-out number (1-20) is set out in Figure 6. The first send-out was taken as benchmark. In general, the variation for all five subsets was significantly smaller in the subsequent 19 send-outs with a few exceptions, that is, send outs five and nine for CD4+ T-cell counts, and send-outs 13 and 14 for NK-cell counts. The standardization workshop (55 participating laboratories) was held concurrently with send-out 11, to which the remaining laboratories participated (indicated with arrows in Fig. 6). The variability of B-cell counts in the nine “post-workshop” send-outs was significantly lower than in the 11 send-outs that were held prior to or concurrent with the workshop (P = 0.01 using the Kruskal-Wallis test). For the other lymphocyte subsets, differences between test variations before and after the workshop were not significant.

DISCUSSION

Here, we review our 10-year experience with the biannual EQA scheme for lymphocyte subset enumeration in the Benelux countries. We analyzed results from nearly 23,000 assays reported by more than 100 laboratories in the context of methodological information provided in questionnaires with each send-out. The main characteristics of this EQA program were: (i) use of abnormal samples in 69% of cases; (ii) use of fresh samples in 94% of cases; (iii) debriefing with personalized recommendations after each send-out, plus annual plenary meetings in which the results were reviewed and discussed with the participants. The purpose of providing fresh, "pathological" samples was to mimic real-time situations in the daily practice of the clinical laboratory. Halfway through this program, an educational workshop was organized in which 55 participants were trained in the use of what was considered the "state-of-the-art" method (single-platform method with dual-anchor gating strategy) (29). In the current study, we addressed three aspects: (i) change in the usage of methods over time; (ii) the influence of methodological variables on the outcome of lymphocyte subset enumeration (i.e., bias); and (iii) the influence of methodological variables on the variation of lymphocyte subset enumeration. We also analyzed whether or not this variation declined with time.

Our recommendations provided with the debriefings and the plenary meetings reflected international recommendations (9, 36-40, 48). As a result, the use of "CD45-SSC" gating increased at the expense of "CD45-14 backgating," and "lyse-no-wash" techniques were implemented by the majority of participants instead of "lyse and wash" techniques. In spite of our recommendations, "dual-platform" counting techniques remained the preferred approach of the majority of participants. Discussions at the plenary meetings revealed that single-platform techniques were considered to be costly (increased expenses associated with counting beads) and cumbersome (more complicated requirements associated with "reverse pipetting") in comparison with dual-platform techniques.

A negative bias for CD4+ and CD8+ T-cell counts, but not for CD3+ T-cell, B-cell, or NK-cell counts, was associated with the use of single-platform counting techniques. Similar observations with respect to CD4+ and CD8+ T-cell counts have been made by Reimann et al. (44). Other studies have identified the lymphocyte count derived from the hematology analyzer as the main source of bias in CD4+ T-cell counts (49). It should be noted that conditions in our EQA scheme were suboptimal for dual-platform techniques. Most samples were nonstabilized and ∼24 h old when tested by the participants, whereas guidelines require that absolute lymphocyte counts by hematology analyzer be performed within 6 h after venipuncture (36). Our review of the methodologies used showed that most laboratories used the leukocyte differential to calculate absolute lymphocyte counts, followed by multiplication with the % CD4+ cells (expressed as a fraction of lymphocytes). However, for suboptimal samples it is recommended to use only the absolute leukocyte count from the hematology analyzer, multiplied by the % CD4+ cells (expressed as a fraction of leukocytes) (50). This usage pattern did not change in spite of our repeated recommendations (not shown). We suggest that overestimation of the % lymphocytes using dual-platform techniques may have contributed to the observed bias in CD4+ and CD8+ T-cell counts. However, the lack of such bias for CD3+ T-cell, B-cell, and NK-cell counts remains unexplained.
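The two dual-platform calculations contrasted above can be sketched as follows. This is a minimal illustration with made-up numbers; the function names are ours, not the study's:

```python
def cd4_from_lymphocytes(wbc_per_ul, pct_lymphs_of_wbc, pct_cd4_of_lymphs):
    """Common approach: absolute lymphocyte count from the hematology
    analyzer's differential (WBC x %lymphocytes), then multiplied by the
    flow cytometric %CD4+ expressed as a fraction of lymphocytes."""
    abs_lymphs = wbc_per_ul * pct_lymphs_of_wbc / 100.0
    return abs_lymphs * pct_cd4_of_lymphs / 100.0


def cd4_from_leukocytes(wbc_per_ul, pct_cd4_of_leukocytes):
    """Recommended approach for suboptimal (aged) samples: only the absolute
    leukocyte count from the analyzer, multiplied by the flow cytometric
    %CD4+ expressed as a fraction of leukocytes (50)."""
    return wbc_per_ul * pct_cd4_of_leukocytes / 100.0
```

With illustrative values of 6,000 leukocytes/µL, 30% lymphocytes in the differential, and CD4+ cells at 40% of lymphocytes (equivalently 12% of leukocytes), both routes give 720 CD4+ T cells/µL; the two diverge in practice when the analyzer's % lymphocytes drifts in an aged sample, which is the bias mechanism suggested above.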

The use of MNC separation has been discouraged because of the risk of selective cell loss (51), and was generally abandoned by our participants after the first send-outs. It seems safe to suggest that the bias toward lower CD3+, CD4+, and CD8+ T-cell counts associated with MNC isolation is due to such losses.

Strikingly, stabilization showed a bias toward high CD3+, CD4+, and CD8+ T-cell counts as well as high NK-cell counts. This result may be explained by some decay of the fresh blood samples during shipment and storage before testing. The absence of this bias for B-cell counts may be explained by the observation that CD19 staining intensity is lost on stabilization (30, 31).

As for gating, the "CD45-SSC" strategy stood out by a positive bias for CD3+ and CD4+ T-cell counts relative to the "CD45-14 backgating" strategy. Currently, the view is widely held that "CD45-SSC" gating of lymphocytes is more robust than "CD45-14 backgating" (34, 37, 39); the risk of the latter method is that an overly restricted FSC-SSC gate is set on the lymphocytes to avoid contamination by monocytes, which would result in an underestimation of the % CD3+ or CD4+ T cells (52). Furthermore, the "dual-anchor" gating strategy was associated with a negative bias for NK-cell counts. This method defines a three-dimensional gate using SSC, CD45, and CD3, and has been optimized for single-platform T-cell subset enumeration based on the expression of CD3 by T cells (31). However, a disadvantage of this approach may be the selective loss of CD3− lymphocytes because of differences between CD3+ and CD3− cells in CD45 expression and SSC characteristics. This situation may have contributed to the underestimation of NK-cell counts.

Some parameters also influenced the variability of lymphocyte subset enumeration. The variability of CD4+ and CD8+ T-cell counts was smaller with the use of single-platform assays than with dual-platform assays. This observation confirms previous studies (22, 29, 38, 43, 44). Single-platform assays bypass the "denominator issue," that is, the need to express the percentage of lymphocytes as a proportion of either leukocytes, total nucleated cells, or total events exceeding the FSC threshold. Single-platform assays also avoid the variability arising from the hematology analyzers used to enumerate total nucleated cells or leukocytes (49). We do not have an explanation for the larger variation of B-cell enumeration in single-platform assays in our study, but our data are concordant with those of various other studies (27, 53).

The variability of CD3+ and CD4+ T-cell counts as well as NK-cell counts was larger when stabilized samples were tested, while the variability of B-cell counts was smaller in comparison with nonstabilized samples. In contrast, when using the between-laboratory CV as parameter, the variability of CD3+, CD4+, and CD8+ T-cell counts was smaller when stabilized samples were tested as compared with fresh samples (28). In line with the results of a similar comparative study (23), a possible explanation of the larger variability with stabilized samples in our study is that these samples constituted only a minority of test samples (i.e., five of 80) and that the participants were not familiar with the (slightly) different light scatter characteristics and fluorescence patterns of stabilized specimens. That said, stabilized samples make excellent challenge material to assess the ability of an operator to cope with "difficult" specimens or unfamiliar patterns.

Several individual laboratories stood out by significant positive or negative bias and/or higher or lower variability relative to the benchmark laboratory. As this result was obtained in a multivariate analysis, such bias and variability must have been due to factors other than the categorical variables listed in Table 2. Because of the limitations imposed by the study design—for example, no central review of list mode data was performed to find explanations for outlying results—it was not possible to trace the causes of large bias or variability for individual laboratories.

Last but not least, we analyzed whether or not the implementation of this EQA scheme was effective in reducing test variation as a function of time. The first send-out was exceptional in its high variability of all subset counts, but especially of the CD3+, CD4+, and CD8+ T cells. Thereafter, test variation declined with occasional exceptions. Test variation of B-cell counts declined after the 2001 workshop, although this exercise was specifically aimed at standardizing T-cell subset enumeration (29). Nevertheless, inspection of the variation of CD3+, CD4+, and CD8+ T-cell counts as a function of time reveals a trend toward lower variation (Fig. 6). Therefore, this EQA exercise may indeed have resulted in some reduction of test variability, although it should be realized that the participants standardized their techniques only partially over time. They did so with respect to sample preparation and gating strategies, but not with respect to absolute counting techniques. Only when participants are fully committed to standardizing their techniques, such as in the Canadian Clinical Trials Network for HIV/AIDS Therapies (35), can significant reductions in between-laboratory variation of lymphocyte subset enumeration be achieved. In this respect, the relatively low frequency of our EQA surveys (i.e., 6 months between test cycles) should be mentioned. This situation may affect the rate at which changes are implemented and most likely the level of reduction of test variability. On the other hand, a 10-year period has been evaluated, in which standardization of techniques should have become evident.

While participation in EQA of lymphocyte immunophenotyping has been on a voluntary basis in the Benelux countries, results of EQA exercises are increasingly being used for laboratory accreditation. When results of EQA exercises have financial consequences (e.g., the need for accreditation status to claim reimbursement from health care insurers), it becomes imperative that such programs themselves meet rigorous quality demands. Such demands include documentation of the quality of distributed samples and the use of validated procedures for evaluating results. Meeting these demands will be a challenge for EQA of lymphocyte subset enumeration in the immediate future.

Acknowledgements

We thank Dr. David Barnett and colleagues (UK NEQAS for Leucocyte Immunophenotyping, Sheffield, UK) for providing long-term stabilized test specimens. This study was performed under the auspices of the Dutch Foundation for Quality Control of Medical Laboratory Diagnosis (SKML) and the Belgian Association for Analytical Cytometry (BVCA/ABCA) with the participation of (in alphabetical order): F.A.T.J.M. van den Bergh (Hospital “Medisch Spectrum Twente,” Enschede); M.H. Bernier [Institute Bordet, Brussel (B)]; A.C. Bloem (UMC “Eijkman-Winkler instituut,” Utrecht); B.M.E. van Blomberg (VUMC, Amsterdam); M. Boland-Favre [Hospital “La Citadelle,” Liege (B)]; A. Borst (Hospital “Laurentius,” Roermond); X. Bossuyt [Hospital “Sint-Rafaël,” Leuven (B)]; X. Bossuyt [Universital Hospital KU Leuven Gasthuisberg, Leuven (B)]; E. Braakman (Erasmus MC, Rotterdam); T. Braeckeveld [Hospital “Zusters van Barmhartigheid,” Ronse (B)]; T. Bril [Hospital “Algemeen Stedelijk Ziekenhuis,” Aalst (B)]; X. Brohee [Hospital “Andre Vesale,” Montigny le Tilleul (B)]; T. van Buul (Hospital “Slingeland,” Doetinchem); L. van Campen [CRI, Zwijnaarde (B)]; B. Cantinieux [Hospital “St. Pieters,” Brussel (B)]; J.W. Cohen Tervaert (AZM, Maastricht); G. Couwenberg (Hospital “Elisabeth,” Tilburg); S. Darwood [Beckman Coulter, Bedfordshire (UK)]; L. Dewulf [Clinical laboratory, Oostende (B)]; X. Dicato [Center Hospital, Luxemburg (L)]; H. van Dijk (Meander Medical Center-Lichtenberg, Amersfoort); R.B. Dinkelaar (Hospital “Albert Schweitzer,” Dordrecht); R. Drent (Hospital “Maasland,” Sittard); A. Dromelet [University Hospital de Mont Godinne, Yvoir (B)]; B.C.G. Dujardin (Hospital “De Gelderse Vallei,” Ede); A.A.M. Ermens (Hospital “Amphia,” Breda); O. Fagnart [Hospital “Sint Jan,” Brussel (B)]; W. Fibbe (LUMC, Leiden); X. De St.Georges [Hospital “Moliere,” Brussel (B)]; B.A.J. Giesendorf, (Hospital “Koningin Beatrix,” Winterswijk); J.W. Gratama (Erasmus MC-Daniel den Hoed, Rotterdam); A. 
ten Haaft (AZM, Maastricht); J.L. d'Hautcourt [Hospital “Warquignies,” Boussu (B)]; M. Heckman (AMC, Amsterdam); P. Herbrink (SSDZ, Delft); R.M.J. Hoedemakers (Hospital “Jeroen Bosch,” 's Hertogenbosch); N. Hougardy [Hospital “Sud Luxembourg,” Arlon (L)]; A.J. van Houte (Hospital “Diakonessenhuis,” Utrecht); A.J. van Houten (Hospital “St. Antonius,” Nieuwegein); W. Huisman (MC Haaglanden, Den Haag); P.J. Kabel (Laboratory “Streeklaboratorium voor de volksgezondheid,” Tilburg); KAM (Hospital “Catharina,” Eindhoven); M.F.J. Karthaus (Medial, Haarlem); S. van Keer [Becton Dickinson Biosciences, Erembodegem (B)]; J.C. Kluin-Nelemans (LUMC, Leiden); P.A. Kuiper-Kramer (Hospital “Isala Klinieken,” Zwolle); P.C. Limburg (UMCG, Groningen); B.E.M. v.d. Linden-Schrever (SKION, Den Haag); E. van Lochem (Erasmus MC, Rotterdam); C.L. Lodowika (Alysis Zorggroep, Arnhem); B. Lunter (Twente University, Enschede); R. Malfait [Hospital “Middelheim,” Antwerpen (B)]; A. Martens (Hospital “Twente,” Almelo); A.M. Mazzon [UCL Ecole de Sante Publique, Brussel (B)]; P. Meeus [Hospital “Onze Lieve Vrouwe Kliniek,” Aalst (B)]; M. de Metz (Hospital “Canisius Wilhelmina,” Nijmegen); X. Meyer-Stillemans [Laboratory “St. Elisabeth,” Namur (B)]; E. Moreau [Hospital “Heilig Hart,” Roeselare (B)]; E. Mul (Sanquin Diagnostics, Amsterdam); A.B. Mulder (Hospital “Jeroen Bosch,” Den Bosch); W.J. Nooijen (Hospital “Antonie van Leeuwenhoek,” Amsterdam); Th. Out (AMC, Amsterdam); D.S. Park (Hospital “Slotervaart,” Amsterdam); J. Pattinama (Hospital “St. Franciscus Gasthuis,” Rotterdam); J. Phillipé [UZG, Gent (B)]; O. Pradier [Hospital “Erasme,” Brussel (B)]; F.W.M.B. Preijers (UMC St. Radbout, Nijmegen); H.J. Puts (Gelre Hospitals, Apeldoorn); S. Rensink Matter (Hospital St. Lucas Andreas, Amsterdam); F. Reymer (LUMC, Leiden); G.T. Rijkers (Hospital “Wilhelmina kinderziekenhuis,” Utrecht); L.G. Rijks (Hospital “Vlietland,” Vlaardingen); M. Rijpert-van Son (Hospital “Tweesteden,” Tilburg); M. 
Roos (Sanquin Diagnostics, Amsterdam); K.J. Roozendaal (Hospital “Onze Lieve Vrouwegasthuis,” Amsterdam); J.L. Rummens [Hospital “Virga Jesse,” Hasselt (B)]; L.J.M. Sabbe (Laboratory “Zeeland,” Goes); P. de Schouwer [Hospital “Stuivenberg,” Antwerpen (B)]; W. Slieker (MC Alkmaar, Alkmaar); R.J. Slingerland (Hospital “Isala Klinieken,” Zwolle); J.W. Smit (UMCG, Groningen); M. van Tol (UMCG, Groningen); T.A.M. Trienekens (VieCuri, Venlo); H.L. Vader (Maxima Medical Center-Veldhoven, Veldhoven); L.M.B. Vaessen (Erasmus MC, Rotterdam); A. Vanhouteghem [Hospital “Middelares,” Gent (B)]; W. Veenendaal (Hospital “Leyenburg,” Den Haag); J. Verschaeren [Clinical laboratory, Antwerpen (B)]; M. de Waele (AZ-VUB, Brussel); G. Wallef [Hospital “de Jolimont,” Haine-Saint-Paul (B)]; J.W.J. van Wersch (Hospital “Atrium Heerlen,” Heerlen); F.L.A. Willekens (Alysis Zorggroep, Arnhem); A. Wolthuis (Clinical Laboratory, Leeuwarden).
