Background: In oncology, Patterns of Care (PoC) provide a detailed overview of all cancer-treatment interventions and recapitulate the patient’s entire journey. While manual chart review of PoC is time-consuming, Electronic Health Records (EHRs) enable data-mining solutions that automatically reconstruct the whole cancer trajectory at different levels of granularity.
Methods: We tested the ability of the i2b2 (Informatics for Integrating Biology and the Bedside) solution to support the automatic reconstruction of the PoC for consecutive and unselected HER2+ and TNBC breast cancer patients through a retrospective EHR analysis over a decade of observations.
Results: From 2008 to 2017, 561 HER2+ and 412 TNBC patients were retrospectively identified by the i2b2 platform at the Papa Giovanni XXIII Hospital in Bergamo. Most patients, 74.3% in the HER2 group and 71.8% in the TNBC group, received (neo) adjuvant chemotherapy, with anti-HER2 drugs whenever indicated. In the HER2 cohort, the 5-year Time to Treatment Change (TTC) and Overall Survival (OS) were 69.4% and 77.4% respectively, with 25% of patients receiving up to 3 lines of treatment in the metastatic setting. In the TNBC cohort, the 5-year TTC and OS were 59.3% and 69.4% respectively, with only 2% of patients receiving active treatment as third line of therapy. The automated PoC reconstruction required manual review in about 1/3 of cases to resolve conflicting data.
Conclusion: The i2b2 solution has the potential to provide a retrospective, automated reconstruction of the different PoC, with limited manual-chart review refinement, and might support investigations in the field of real-world data.
In the contemporary era of precision oncology, national health systems face the challenge of restructuring cancer care delivery to ensure universal access to high-quality care while maintaining cost sustainability. Addressing this challenge necessitates unprecedented levels of coordination and alignment across all cancer diagnosis and treatment interventions to establish appropriate Patterns of Care (PoC) throughout the entire disease trajectory, thereby positively impacting the quality and costs of care [1-3].
The fragmentation of the health information systems poses challenges in obtaining a valuable and comprehensive patient journey, particularly in complex diseases like Breast Cancer (BC). Manual reconstruction of PoC is labor-intensive and time-consuming, often preventing valuable data collection. With the widespread adoption of health Information Technology (IT), such as Electronic Health Records (EHRs), substantial opportunities arise for developing data-mining models to facilitate automated PoC surveys at varying granularity [4-6].
Information derived from diverse sources reflects the multitude of day-to-day clinical decisions made by many physicians, across many patients, over a significant duration of time. The full integration of this information represents a paradigm shift in data management, providing an unprecedented opportunity to accurately recapitulate the entire patient treatment journey [7].
We aim to capitalize on this opportunity using the i2b2 (Informatics for Integrating Biology and the Bedside) platform for the automated reconstruction of patients' entire cancer journeys through a retrospective analysis of all available patient data in the archives [8].
The i2b2 tool serves as a scalable informatics framework that organizes and transforms patient-oriented clinical data from different sources in an optimized way for clinical research [9,10]. Essentially, i2b2 functions as a comprehensive data warehouse, enabling free queries and supporting a multidimensional and longitudinal electronic phenotype description of patients [11].
Accordingly, we evaluated the efficacy of the i2b2 solution in providing an automated retrospective reconstruction of PoC for patients affected by Human Epidermal Growth Factor Receptor 2 (HER2)-positive BC and Triple-Negative Breast Cancer (TNBC) over a decade of observation in a single, large, community hospital. This period captures significant advancements in breast cancer treatment, particularly the introduction and establishment of targeted therapies for HER2-positive breast cancer and evolving strategies for TNBC. Moreover, this period aligns with the hospital's transition to an all-electronic information system, which began in 2007, ensuring comprehensive and consistent data collection throughout the study.
HER2-positive BC is a complex disease characterized by the overexpression of HER2 or by ERBB2/neu gene amplification, with systemic targeted therapies involving different HER2 blockade drugs along with chemotherapy or endocrine therapy to achieve the optimal treatment effect. Robust data from randomized clinical trials (RCTs) in the last decade facilitated the definition of treatment algorithms, supporting optimal strategies in diverse clinical conditions [12-15].
TNBC is an aggressive BC subtype characterized by the lack of expression of Estrogen Receptor (ER), Progesterone Receptor (PgR) and HER2. In the absence of these three target biomarkers, cytotoxic chemotherapy remains the standard treatment approach. Considerable efforts have been made in the last decade to better understand the biological behaviour of TNBC, providing valuable information for more effective treatments, including immunotherapy and novel Antibody-Drug Conjugates (ADC) [12,15-17].
The i2b2 platform, established by Harvard University in 2004, is an open-source framework widely adopted in numerous hospitals and universities worldwide, consolidating data from over 250 million patients [18]. Since its inception, the platform has fostered a growing open-source community, attracting users and developers alike, and has emerged as a standard tool for tasks such as patient cohort identification, clinical trial design, patient recruitment and PoC description.
The primary objective of i2b2 is to construct patient cohorts based on EHR phenotypes that users design dynamically. Users can employ a taxonomy of terms conforming to international standards and introduce personalized concepts tailored to specific needs. The platform integrates various data sources, including relational databases, spreadsheets and unstructured text, mapping them onto taxonomy concepts for unrestricted queries. Furthermore, a Natural Language Processing (NLP) pipeline has been incorporated into the i2b2 platform to extract information directly from non-structured medical reports. A comprehensive description of the i2b2 platform was previously reported [19].
Since 2018, the i2b2 data warehouse has been operational at Papa Giovanni XXIII Hospital (HPG23) in Bergamo, Italy. This implementation includes comprehensive cancer patient data gathered from diverse hospital sources, including demographics, oncology-specific EHRs, hospitalization records detailing diagnoses and procedures, pharmacological therapies, laboratory information and pathological anatomy reports. To ensure open access to the data warehouse, the project received full approval from the local independent Institutional Review Board (IRB) and Ethical Committee.
Using the i2b2 platform, patients with a BC diagnosis were identified by the presence of at least one diagnosis of malignant neoplasm of the female breast (ICD-9 code 174.xx) during the period from 1 September 2007 to 31 December 2018. Within this group, HER2-positive patients were identified by the presence of at least one of the following criteria: (i) “HER2-positive result”, “HER2 score 2+ (HER2 FISH-positive)” or “HER2 score 3+” in the text of the medical report, or (ii) Trastuzumab administration during the period of interest. TNBC patients were identified by negative expression of ER (< 1%), PgR (< 1%) and HER2 (score 0-1, or score 2+ if FISH-negative). Once identified, the HER2-positive and TNBC cohorts were further categorized by cancer stage at diagnosis (early or metastatic) and monitored from diagnosis until death or the last known contact at HPG23.
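The selection criteria above can be read as a small decision rule. The following Python sketch is only an illustration of that rule and not the i2b2 query actually used: the `PatientRecord` structure, its field names and the plain substring matching are assumptions introduced here (in the study, report text is handled by the i2b2 NLP pipeline), and the restriction to the period of interest is omitted for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatientRecord:
    # Hypothetical, simplified view of one patient's extracted data.
    icd9_codes: List[str]
    report_text: str                      # concatenated medical report text
    received_trastuzumab: bool
    er_percent: Optional[float]           # % ER-positive cells, None if unknown
    pgr_percent: Optional[float]          # % PgR-positive cells, None if unknown
    her2_ihc_score: Optional[int]         # 0, 1, 2 or 3, None if unknown
    her2_fish_positive: Optional[bool]    # None if FISH was not performed


# Report phrases quoted in the text, lower-cased for matching.
HER2_POSITIVE_PHRASES = (
    "her2-positive result",
    "her2 score 2+ (her2 fish-positive)",
    "her2 score 3+",
)


def has_breast_cancer(p: PatientRecord) -> bool:
    # ICD-9 174.xx: malignant neoplasm of the female breast.
    return any(code.startswith("174") for code in p.icd9_codes)


def is_her2_positive(p: PatientRecord) -> bool:
    # Criterion (i): report phrase; criterion (ii): trastuzumab given.
    text = p.report_text.lower()
    return any(ph in text for ph in HER2_POSITIVE_PHRASES) or p.received_trastuzumab


def is_tnbc(p: PatientRecord) -> bool:
    # All three markers must be known and negative.
    if p.er_percent is None or p.pgr_percent is None or p.her2_ihc_score is None:
        return False
    her2_negative = p.her2_ihc_score in (0, 1) or (
        p.her2_ihc_score == 2 and p.her2_fish_positive is False
    )
    return p.er_percent < 1 and p.pgr_percent < 1 and her2_negative


def classify(p: PatientRecord) -> str:
    if not has_breast_cancer(p):
        return "not BC"
    if is_her2_positive(p):
        return "HER2-positive"
    if is_tnbc(p):
        return "TNBC"
    return "other BC"
```

Checking HER2 positivity before TNBC is a design choice of this sketch, so that a patient who received trastuzumab is never labelled TNBC even when receptor values are missing.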
The i2b2 platform was used to identify all administrations of anti-HER2 treatments and/or cytotoxic therapies received by HER2-positive and TNBC patients from diagnosis until death or the last follow-up contact. This enabled a comprehensive reconstruction of each patient's treatment history throughout the follow-up period. A transition to an alternative therapy was recorded as a treatment change, and the Time to Treatment Change (TTC) served as a marker for disease progression or toxicity.
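As a rough illustration of how a treatment history can be turned into treatment lines, the sketch below groups consecutive administrations of the same regimen and reports the start of the second line as the first treatment change. The input format (a list of `(date, regimen)` pairs) and the function names are assumptions made for this example; the actual temporal mining in i2b2 has to deal with combination regimens, interruptions and conflicting records, which are not modelled here.

```python
from datetime import date
from itertools import groupby
from typing import Dict, List, Optional, Tuple

Administration = Tuple[date, str]  # (administration date, regimen label)


def treatment_lines(administrations: List[Administration]) -> List[Dict]:
    """Collapse consecutive administrations of the same regimen into lines."""
    ordered = sorted(administrations, key=lambda a: a[0])
    lines = []
    for regimen, group in groupby(ordered, key=lambda a: a[1]):
        dates = [d for d, _ in group]
        lines.append({"regimen": regimen, "start": dates[0], "end": dates[-1]})
    return lines


def first_treatment_change(administrations: List[Administration]) -> Optional[date]:
    """Date on which the second line starts, or None if therapy never changed."""
    lines = treatment_lines(administrations)
    return lines[1]["start"] if len(lines) > 1 else None


history = [
    (date(2012, 2, 1), "trastuzumab"),
    (date(2012, 1, 10), "trastuzumab"),
    (date(2013, 5, 3), "lapatinib"),
]
print(first_treatment_change(history))
```

In this toy history the switch from the trastuzumab-based line to lapatinib is the treatment change.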
The two outcomes of interest were Time to Treatment Change (TTC) and Overall Survival (OS). Time to 1st line treatment change (in years) was calculated from the diagnosis of HER2-positive/TNBC and expressed as mean and Standard Deviation (SD), and median and Interquartile Range (IQR: 1st quartile, 3rd quartile). Patients who did not experience relapse/progression (with the above definition) were censored at their last follow-up visit. Death from any cause was investigated for all HER2-positive/TNBC patients. The composite outcome “death or treatment change” was defined to estimate TTC. Time to death and time to first event (death/treatment failure, in years) were calculated from HER2-positive/TNBC diagnosis and expressed as mean and SD, and median and IQR (1st quartile, 3rd quartile). Patients alive and who had never experienced a treatment failure were censored at their last follow-up visit.
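The outcome definitions above translate into a (time, event) pair per patient. The following sketch shows one possible encoding, assuming each patient has a diagnosis date, an optional first treatment-change date, an optional death date and a last-contact date; the function names and the 365.25-day year are choices made for this illustration and are not taken from the study.

```python
from datetime import date
from typing import Optional, Tuple

DAYS_PER_YEAR = 365.25  # assumption: average year length for the conversion


def years_between(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_YEAR


def os_outcome(
    diagnosis: date, death: Optional[date], last_contact: date
) -> Tuple[float, bool]:
    """Overall survival: event = death from any cause, else censored."""
    if death is not None:
        return years_between(diagnosis, death), True
    return years_between(diagnosis, last_contact), False


def ttc_outcome(
    diagnosis: date,
    treatment_change: Optional[date],
    death: Optional[date],
    last_contact: date,
) -> Tuple[float, bool]:
    """Composite outcome: first of treatment change or death, else censored."""
    events = [d for d in (treatment_change, death) if d is not None]
    if events:
        return years_between(diagnosis, min(events)), True
    return years_between(diagnosis, last_contact), False
```

A patient who is alive and never changed treatment contributes a censored observation at the last follow-up visit for both outcomes.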
Descriptive statistics were used to summarize the baseline characteristics of HER2-positive/TNBC identified using the i2b2 platform. Continuous variables were expressed as mean and SD, or as median and IQR, depending on their normal or non-normal distribution. Categorical variables were expressed as absolute counts and percentages. Sankey diagrams were used to represent the pattern of treatments received across (neo) adjuvant therapy and 1st, 2nd and 3rd metastatic lines.
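A Sankey diagram of treatment patterns is driven by counts of patients moving from one regimen to the next at each step. The sketch below shows how such link counts could be derived from per-patient regimen sequences; the sequence format and the regimen labels are hypothetical, and the plotting itself is left out.

```python
from collections import Counter
from typing import Dict, List, Tuple

Link = Tuple[int, str, str]  # (step index, source regimen, target regimen)


def sankey_links(patient_sequences: List[List[str]]) -> Dict[Link, int]:
    """Count transitions between consecutive treatment steps across patients."""
    links: Counter = Counter()
    for seq in patient_sequences:
        for step, (src, dst) in enumerate(zip(seq, seq[1:])):
            links[(step, src, dst)] += 1
    return dict(links)


# Hypothetical sequences: (neo)adjuvant regimen, then metastatic lines.
sequences = [
    ["anthracycline/taxane + trastuzumab", "T-DM1"],
    ["anthracycline/taxane + trastuzumab", "T-DM1", "lapatinib"],
    ["anthracycline/taxane + trastuzumab"],
]
for link, count in sorted(sankey_links(sequences).items()):
    print(link, count)
```

Keeping the step index in the key separates, for example, a regimen used as 1st line from the same regimen used as 2nd line.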
Kaplan-Meier survival curves were reported for the outcomes of interest and the log-rank test was used to test between-group differences. For all tested hypotheses, two-tailed p-values < 0.05 were considered significant. Statistical analysis was performed using Stata Software, release 16 (StataCorp LP, College Station TX, USA).
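For readers unfamiliar with the Kaplan-Meier estimator, the product-limit calculation can be sketched in a few lines of Python. This is an independent illustration and not the Stata procedure used in the analysis; the function name and the toy data are assumptions. At each distinct event time t with d events among n patients still at risk, the survival estimate is multiplied by (1 - d/n); censored patients leave the risk set without changing the estimate.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(
    times: Sequence[float], events: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return [(t, S(t))] at each distinct time where an event occurred."""
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exit_counts = Counter(times)  # events and censorings leaving the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exit_counts):
        d = event_counts.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exit_counts[t]
    return curve


# Toy data: follow-up in years, True = event observed, False = censored.
toy_times = [1, 2, 2, 3, 4]
toy_events = [True, True, False, True, False]
for t, s in kaplan_meier(toy_times, toy_events):
    print(t, round(s, 3))
```

With the toy data the estimate steps down to 0.8, 0.6 and 0.3 at years 1, 2 and 3; the patient censored at year 2 reduces the risk set but does not produce a step.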
Over ten years of clinical observations, the i2b2 solution identified 4,763 consecutive and unselected patients diagnosed with malignant BC (Figure 1). In particular, from September 2007 to December 2018, 561 HER2-positive BC patients (13.2%) and 412 TNBC (8.9%) were reported. Patients’ multidimensional and longitudinal clinical histories are represented as one-dimensional sequences of visits, procedures, treatments, diagnoses and outcomes.
The characteristics of HER2-positive BC women are detailed in Table 1.
Table 1: Demographic and clinical characteristics of HER2-positive BC and TNBC patients.
Characteristic | HER2-positive BC (N = 561), n (%) | TNBC (N = 412), n (%) |
Year of diagnosis | ||
2007-2010 | 192 (34.2%) | 124 (30.1%) |
2011-2014 | 241 (43.0%) | 139 (33.7%) |
2015-2018 | 128 (22.8%) | 149 (36.2%) |
Age at BC diagnosis, years | ||
Mean ± SD | 57.9 ± 14.1 | 61.7 ± 14.3 |
≤ 35 | 25 (4.5%) | 11 (2.7%) |
(35, 45] | 94 (16.8%) | 51 (12.4%) |
(45, 55] | 144 (25.7%) | 74 (18.0%) |
(55, 65] | 122 (21.7%) | 106 (25.7%) |
(65, 75] | 102 (18.2%) | 93 (22.6%) |
> 75 | 74 (13.2%) | 77 (18.7%) |
Provenance | ||
Bergamo and province | 471 (84.0%) | 350 (85.0%) |
Other Lombardy provinces | 37 (6.6%) | 15 (3.6%) |
Out of Lombardy Region/ Undetected by the system | 53 (9.4%) | 47 (11.4%) |
Type of surgery | ||
Quadrantectomy | 238 (42.4%) | 218 (52.9%) |
Mastectomy | 269 (48.0%) | 190 (46.1%) |
Undetected by the system | 54 (9.6%) | 4 (1.0%) |
Tumor size (T) | ||
T0 | 0 (0.0%) | 9 (2.2%) |
T1 | 253 (45.1%) | 148 (35.9%) |
T2 | 177 (31.6%) | 87 (21.1%) |
T3 | 60 (10.7%) | 50 (12.1%) |
T4 | 8 (1.4%) | 8 (1.9%) |
Tis | 0 (0.0%) | 16 (3.9%) |
Undetected by the system | 63 (11.2%) | 94 (22.8%) |
Lymph Nodes (N) | ||
N0 | 322 (57.4%) | 189 (45.9%) |
N1 | 79 (14.1%) | 57 (13.8%) |
N2 | 51 (9.1%) | 43 (10.4%) |
N3 | 68 (12.1%) | 70 (17.0%) |
Nx* | 13 (2.3%) | 12 (2.9%) |
Undetected by the system | 28 (5.0%) | 41 (10.0%) |
Grade (G) | ||
G1 | 7 (1.2%) | 6 (1.5%) |
G2 | 107 (19.1%) | 34 (8.3%) |
G3 | 355 (63.3%) | 303 (73.5%) |
Undetected by the system | 92 (16.4%) | 69 (16.7%) |
Estrogen Receptor (ER) | ||
Negative (= 0%) | 201 (35.8%) | - |
Positive (> 0%) | 356 (63.5%) | - |
Undetected by the system | 4 (0.7%) | - |
Cellular proliferation (Ki-67) | ||
< 20% | 116 (20.7%) | 47 (11.4%) |
≥ 20% | 440 (78.4%) | 342 (83.0%) |
Undetected by the system | 5 (0.9%) | 23 (5.6%) |
*Nx: not evaluable by histopathological analysis. Abbreviations: BC: Breast Cancer; HER2: Human Epidermal Growth Factor Receptor 2; TNBC: Triple-Negative Breast Cancer. |
Among the 561 HER2-positive BC patients, 531 (94.7%) had a diagnosis of early-stage BC (eBC) while 30 (5.3%) were diagnosed with de novo metastatic BC (mBC). After diagnosis, these patients were monitored through the i2b2 solution, with a median follow-up of 4 years (IQR 2-6). Among the HER2-positive BC patients, 144 (25.7%) did not receive any anti-HER2 treatment following diagnosis, mainly because of limited disease extension (pT1a/b), clinical judgment (elderly patients > 75 years) and/or patient’s preferences. Based on data integrated into the i2b2 model, the entire cancer treatment journey was automatically reconstructed with a full description of the prevalent and alternative PoC, as reported in Figure 2 (box A) and in Figure S1 (boxes A-B). Among the HER2-positive eBC patients receiving (neo) adjuvant chemotherapy, the majority (97.7%) were treated with an anthracycline/taxane-based regimen and trastuzumab, while lapatinib was administered in 2.3% of cases. Overall, 162 (29.9%) events (treatment failure or death) occurred during the course of the disease in the entire cohort of HER2-positive BC patients. The 5- and 10-year TTC for HER2-positive BC were 69.4% (65.0%-73.4%) and 62.6% (57.0%-67.7%), respectively, while the 5- and 10-year OS were 77.4% (73.4%-81.1%) and 65.4% (59.6%-70.6%), respectively. Stratifying the population into 2 cohorts according to the first and second 5-year period of treatment, we observed a statistically significant 3-year TTC advantage for patients treated in the more recent years compared with the earlier ones, namely 86.2% vs. 71.9% (Figures 3,4).
The characteristics of TNBC women are detailed in Table 1.
Among the 412 TNBC patients, 395 (95.8%) had a diagnosis of eBC while 17 (4.2%) were diagnosed with de novo mBC. Through the i2b2 platform, we were able to follow these patients, with a median follow-up of 3.8 years (IQR 1.5-7.1). Among the 412 TNBC patients, 116 (28.2%) did not receive any chemotherapy during follow-up, mainly because of limited disease extension (pT1a/b), clinical judgment (elderly patients > 75 years) and/or patient’s preferences. Based on data integrated into the i2b2-driven model, the entire cancer treatment journey was automatically reconstructed with a full description of the prevalent and alternative PoC, as reported in Figure 2 (box B) and in Figure S1 (boxes C-D). Among the TNBC patients exposed to (neo) adjuvant chemotherapy, the majority (83.9%) received an anthracycline/taxane-based regimen, while Cyclophosphamide, Methotrexate and Fluorouracil (CMF) was given in 16.1% of cases. During the course of the disease in the 395 early TNBC patients, 150 events (treatment failure or death) occurred, with 111 (28.1%) deaths. Among the 17 de novo metastatic TNBC patients, 15 (88.2%) failed treatment and 14 (82.4%) eventually died. The 5- and 10-year TTC-free survival for TNBC was 59.3% (53.9%-64.3%) and 46.2% (39.0%-53.2%), respectively, while the 5- and 10-year OS was 69.4% (64.1%-74.1%) and 58.2% (51.2%-64.6%), respectively.
Stratifying the population into 2 cohorts according to the first and second 5-year period of treatment, we observed a statistically significant 3-year TTC advantage for patients treated in the more recent years compared with the earlier ones, namely 72.6% vs. 64.3% (Figures 3,4).
To increase the overall accuracy of the automatic PoC reconstruction, we performed a manual chart review of cases with conflicting data, which were recognized in 283 out of 973 (29%) cases. This review added valuable information in about 1/3 of them (n = 92), in some instances with a recalibration of the individual PoC description.
In this study, we assessed the capability of the i2b2 platform solution to automatically survey the PoC of HER2-positive BC and TNBC through a decade-long retrospective clinical history analysis.
Although the primary objective was technological, the study also yielded noteworthy clinical insights.
In the majority of patients, the reconstruction of the whole cancer-treatment history revealed good adherence to current clinical practice guidelines, even though deviations were observed among specific subgroups, including elderly patients, those with limited disease and those with comorbidities. The median OS observed in the HER2-positive and TNBC cohorts aligns with expectations from existing literature at the time of diagnosis. This is evident for both HER2-positive BC and TNBC, in which patients’ outcomes improved over time (first vs. second 5-year period) due to the advent of innovative and more effective treatments (Figure 4) [20-22]. In our experience, in the first 5-year period, the prevalent PoC of HER2-positive mBC patients consisted of a sequence of HER2 single-blockade regimens (mainly trastuzumab-based), while in the second 5-year period HER2 dual blockade (trastuzumab and pertuzumab) and/or T-DM1 were implemented (Figure S1, boxes A-B) with some clinical advantages (Figure 4). Similarly, the evidence derived from the introduction of platinum derivatives for the management of TNBC [19] explains the changes observed in the PoC over time (Figure S1, boxes C-D), namely the benefit reported in the second 5-year treatment period, when the use of platinum increased (Figure 4). As regards the proportion of mBC patients progressing to subsequent lines of therapy, we observed a significant drop in second- and third-line treatment, even more pronounced in the TNBC cohort than in the HER2-positive cohort (second-line attrition rate: 75.3% vs. 38.1%, respectively). This evidence, related to the specific attitude of the treating physicians, is in line with previous reports [23] and confirms the well-established good clinical practice of anticipating the optimal treatment option whenever possible.
The data obtained from the PoC survey informs about the optimal treatment sequencing and the overall value of different therapeutic interventions, something that is not otherwise captured in RCTs. Moreover, clinicians can utilize the PoC survey to compare local treatment attitudes and the adherence to the national/international practice guidelines, identifying unintended care diversions in specific subgroups of patients and the evolution of the health care management over time.
Since manual-chart PoC reconstruction is too often a frustrating, time-consuming effort with relevant obstacles [24], the automated PoC survey, integrating all available patient data, represents an unprecedented opportunity [25,26]. Herein we demonstrate that BC trajectories can be properly reconstructed with an i2b2 solution, using available archival EHRs with a longitudinal temporal mining algorithm and appropriate levels of granularity.
While the automated reconstruction of PoC using the i2b2 solution brings significant benefits, there are acknowledged limitations to this approach. In particular, the accuracy and completeness of archival data are crucial for developing insightful practice-based PoC. As patients’ information is automatically captured from EHRs, dedicated time and effort are required for a manual chart review to clean conflicting data, mainly from unstructured reports. Furthermore, some relevant information, including toxicity and disease progression, is usually reported in plain natural language, which may lead to gaps or inaccuracies in the data collection. Enhancing the NLP capabilities is essential for refining the data mining process and enriching the dataset. In addition, to derive more robust information from different data sources, selected proxies have been preferred over classical parameters, as in the case of TTC instead of event-free survival/progression-free survival, with some potential for misleading interpretation. Tracking chronic comorbidities like diabetes or hypertension poses another challenge, as these are typically recorded only at initial visits or at significant health changes, making consistent documentation difficult. The same applies to non-oncologic medications, which may not be consistently updated in EHRs unless there is a change in prescription, so that interim adjustments are overlooked. Finally, identifying prevalent patterns within patients' treatment data often leads to the oversight of rare occurrences, particularly those recorded in an unstructured manner. Therefore, enhancing the model’s capacity to systematically include more unstructured data, along with improving the scalability of the platform to handle a broader spectrum of relevant information (i.e., causes and complications), are crucial areas for future research and development.
Future studies might also benefit from incorporating advanced data processing techniques, such as Blind Source Separation (BSS) used in range ambiguity suppression and echo separation in space-time waveform-encoding Synthetic Aperture Radar (SAR), to further enhance the precision of data integration and analysis [27-29].
In this study, we demonstrated that the i2b2 solution is able to provide an automated reconstruction of PoC in unselected and consecutive HER2-positive and TNBC patients, over 10 years of observations, with a manual-chart refinement in about 1/3 of cases. The consistency of the automated PoC survey is primarily related to the extent and accuracy of the patient information collected in the hospital information system and secondly to the performance of the modern IT solution in multidimensional, longitudinal data-mining processing. Indeed, the exportable i2b2 solution has the potential to support investigations in the field of real-world data and outcomes research.
AZ: conceptualization, methodology, investigation, data collection and analysis, writing-original draft, review and editing, visualization, supervision; AG: participation in writing the original draft, methodology, data-collection and analysis, review, editing and visualization; LC: methodology, data-collection and analysis, review and editing; FJ: methodology, data-collection and analysis, review and editing; MB: methodology, data-collection and analysis, review and editing; NB: methodology, data-collection and analysis, review and editing; AM: methodology, data-collection and analysis, review and editing; SDA: methodology, data-collection and analysis, review and editing; VF: data-collection and analysis, review and editing; RB: methodology, data-collection and analysis, review and editing; CT: methodology, data-collection and analysis, review, editing and supervision.
All authors have read and agreed to the published version of the manuscript.
This work was in part funded by Roche S.p.A with a dedicated research grant.
The authors have no known competing financial interests or personal relationships that could have appeared to influence the work reported in the manuscript.
Aggregated data available by request. Patient-level data will not be shared.