Determining clinical course of diffuse large B-cell lymphoma using targeted transcriptome and machine learning algorithms

New paper in Nature Blood Cancer


Multiple studies have demonstrated that diffuse large B-cell lymphoma (DLBCL) can be divided into subgroups based on its biology; however, these biological subgroups overlap clinically. Using machine learning, we developed an approach to stratify patients with DLBCL into four subgroups based on survival characteristics. This approach uses targeted transcriptome data to predict these survival subgroups. Using the expression levels of 180 genes, our model reliably predicted the four survival subgroups and was validated in independent groups of patients. Multivariate analysis showed that this patient stratification strategy encompasses various biological characteristics of DLBCL, and only TP53 mutations remained an independent prognostic biomarker. This novel approach for stratifying patients with DLBCL, based on the clinical outcome of rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone therapy, can be used to identify patients who may not respond well to this therapy and who may instead benefit from alternative therapies or clinical trials.
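In survival analysis, the survival characteristics of a patient subgroup are conventionally summarized with a Kaplan-Meier curve. The paper's own stratification method is not described in this excerpt, so the sketch below is only a generic illustration of the standard Kaplan-Meier (product-limit) estimator for one subgroup. The function names, the month-based time unit, and the toy data are assumptions made for illustration.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) for one patient subgroup.

    times:  follow-up time per patient (assumed unit: months)
    events: 1 if death was observed at that time, 0 if censored
    Returns a list of (time, survival) pairs at each distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths[t]
        if d:
            # Product-limit step: multiply by the conditional survival at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censorings at t both leave the risk set after t
        at_risk -= exits[t]
    return curve


def median_survival(curve):
    """First time at which estimated survival drops to 0.5 or below, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Toy subgroup: five hypothetical patients, two censored (event == 0)
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(curve)
print(median_survival(curve))
```

Computing one curve per subgroup and comparing them (for example, with a log-rank test) is the usual way to check that subgroups actually differ in overall survival.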


Diffuse large B-cell lymphoma (DLBCL) is the most common subtype of lymphoma. However, the disease is heterogeneous [1,2,3,4]; that is, its outcome and clinical course vary significantly between patients [1]. More than 60% of patients with DLBCL can be cured with rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) [1]. Multiple new therapeutic combinations, including cell therapy, are being tested to improve survival, especially in patients who may not respond to standard R-CHOP therapy [5]. Given the known heterogeneity of DLBCL, a single therapeutic approach is unlikely to work for all patients with DLBCL [1]. Therefore, multiple approaches have been used to subclassify DLBCL into subgroups based on biological characteristics. The earliest subclassification was based on expression profiling using microarrays [6,7,8,9]. This classification divides DLBCL into two major groups based on the cell of origin (COO): germinal center B-cell-like (GCB) and activated B-cell-like (ABC) DLBCL; about 15% of cases could not be assigned to either group and remained unclassified. Subsequent refinement led to the GenClass algorithm, which assigns cases to four genetic subtypes: MYD88 and CD79B mutations (MCD), BCL6 fusions and NOTCH2 mutations (BN2), NOTCH1 mutations (N1), and EZH2 mutations and BCL2 translocations (EZB). However, this algorithm can classify only 54% of DLBCL cases. To cover more cases, it was later extended into the LymphGen algorithm, which defines seven genetic subtypes: MCD, N1, and BN2, as in GenClass; MYC-negative and MYC-positive EZB; TP53 abnormalities (A53); and mutations in TET2, P2RY8, or SGK1 (ST2) [6].

Using mutation profiling and chromosomal structural abnormalities (chromosomal gains and losses), Chapuy et al. classified DLBCL into five subgroups [9]. Recent fluorescence in situ hybridization (FISH) studies of double- and triple-hit lymphomas have shown that rearrangement of MYC (avian myelocytomatosis viral oncogene homolog) co-occurring with rearrangement of BCL2, BCL6, or both results in a significantly more aggressive DLBCL, making R-CHOP ineffective [10,11].

While existing strategies for subclassifying DLBCL can distinguish biologically distinct subgroups, they cannot effectively predict overall survival or progression-free survival, and their discriminative performance is unsatisfactory [1]. Furthermore, implementing these classifications in routine clinical laboratory testing is complicated by the need to perform whole-exome sequencing.

We reasoned that chromosomal structural abnormalities and mutations ultimately lead to changes in RNA expression, and that relative RNA changes reflect the activation or suppression of various pathways; thus, an RNA-based classification of DLBCL might be more practical. RNA quantification by next-generation sequencing (NGS) has numerous advantages over microarray- and hybridization-based quantification: it is more specific and reproducible, and it can be performed reliably on formalin-fixed paraffin-embedded (FFPE) tissue. Furthermore, targeted RNA sequencing is well suited to clinical testing because it is easier to manage and more cost-effective as a routine clinical test than traditional methods.

In this study, we developed a DLBCL classification strategy that predicts clinical outcomes using targeted RNA sequencing combined with machine learning algorithms. The strategy classifies patients with DLBCL into subgroups based on the clinical course of their disease. To focus on survival, we first used machine learning to divide patients into subgroups based on their overall survival. We then used modified Bayesian statistics to select genes that predict these survival subgroups and validated these biomarkers in an independent set of cases.
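The "modified Bayesian statistics" used for gene selection are not specified in this text. As a generic stand-in, and explicitly not the authors' method, a Gaussian naive Bayes classifier shows how per-gene expression distributions within each survival subgroup can be combined to assign a new sample to a subgroup. The function names, the variance floor, the subgroup labels, and the toy data are hypothetical, and expression values are assumed to be already normalized (for example, on a log scale).

```python
import math
from collections import defaultdict


def fit_gaussian_nb(samples, labels, var_floor=1e-6):
    """Estimate a log prior plus per-gene mean and variance for each subgroup.

    samples: list of expression vectors (one value per gene)
    labels:  subgroup label per sample
    var_floor (hypothetical): guards against zero variance for genes that
    are constant within a subgroup.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)
    n = len(samples)
    model = {}
    for y, rows in by_class.items():
        k = len(rows)
        n_genes = len(rows[0])
        means = [sum(r[g] for r in rows) / k for g in range(n_genes)]
        variances = [
            max(sum((r[g] - means[g]) ** 2 for r in rows) / k, var_floor)
            for g in range(n_genes)
        ]
        model[y] = (math.log(k / n), means, variances)
    return model


def predict(model, x):
    """Return the subgroup with the highest log posterior (up to a constant)."""
    def log_posterior(params):
        log_prior, means, variances = params
        # Naive Bayes assumption: genes are independent given the subgroup
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )
    return max(model, key=lambda y: log_posterior(model[y]))


# Toy data: two genes, two hypothetical survival subgroups
samples = [[1.0, 5.0], [1.2, 5.2], [0.8, 4.8], [4.0, 1.0], [4.2, 1.1], [3.8, 0.9]]
labels = ["SG1", "SG1", "SG1", "SG2", "SG2", "SG2"]
model = fit_gaussian_nb(samples, labels)
print(predict(model, [1.1, 5.0]))  # SG1
print(predict(model, [4.1, 1.0]))  # SG2
```

The same code handles any number of subgroups and genes; with four subgroups and a 180-gene panel, each sample would simply be a 180-element vector.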


