Patient-oriented unsupervised learning to uncover the patterns of multimorbidity associated with stroke using primary care electronic health records
BMC Primary Care volume 25, Article number: 419 (2024)
Abstract
Background
We aimed to identify and characterise the longitudinal patterns of multimorbidity associated with stroke.
Methods
We used an unsupervised patient-oriented clustering approach to analyse primary care electronic health records (EHR) of 30 common long-term conditions (LTC) in patients with stroke aged over 18, registered in 41 general practices in south London between 2005 and 2021.
Results
Of 849,968 registered patients, 9,847 (1.16%) had a record of stroke, of whom 46.5% were female. The median age at record of stroke was 65.0 years (IQR: 51.5-77.0) and the median number of LTCs in addition to stroke was 3 (IQR: 2-5). We identified eight clusters of multimorbidity with contrasting socio-demographic characteristics (age, gender, and ethnicity) and risk factors. Besides a core of 3 clusters associated with conventional stroke risk factors, minor clusters exhibited less common combinations of LTCs including mental health conditions, asthma, osteoarthritis and sickle cell anaemia. Importantly, complex profiles combining mental health conditions, infectious diseases and substance dependency emerged.
Conclusion
This novel longitudinal and patient-oriented perspective on multimorbidity addresses existing gaps in mapping the patterns of stroke-associated multimorbidity not only in terms of LTCs, but also socio-demographic characteristics, and suggests potential for more efficient and patient-oriented healthcare models.
Introduction
Despite recent attention to the topic, current knowledge on multimorbidity in stroke remains limited [1,2,3,4]. Multimorbidity is commonly defined as the co-occurrence of two or more long-term conditions (LTCs) [5, 6]. In the context of the rising prevalence of LTCs within an ageing population [7], most patients with stroke present with at least one additional LTC [1, 3]. These factors challenge primary healthcare services, as multimorbidity increases patient complexity, which may result in poorer short- and long-term outcomes after stroke [2, 8] and increased costs of care [3, 6, 9].
To date, the analysis of stroke-related multimorbidity has been based on classical epidemiological approaches such as case-control studies, focusing on the association of potential stroke risk factors with stroke status [1, 3, 10, 11], or cross-sectional studies assessing the impact on stroke outcomes of different indices of multimorbidity, such as the Charlson Comorbidity Index [8] or the simple count of LTCs [1, 5, 12].
These approaches, however, remain non-specific, and beyond the necessary characterisation of risk factors and long-term conditions associated with stroke [1, 2, 8], the need arises for a better understanding of multimorbidity associated with stroke. This includes patient stratification according to the main patterns of multimorbidity and the longitudinal description of the defined clusters, including their relative size, associated socio-demographic characteristics and risk factors [3, 13].
Other methods used in the general setting to analyse trajectories of multimorbidity rely on two main approaches [14]. The first involves the count of long-term conditions at regular intervals over time. This often includes creating composite scores or indices that reflect the accumulation of multiple long-term conditions in patients. This approach typically involves advanced statistical techniques such as multilevel modelling and growth curve analysis. The second approach focuses on modelling transitions between a necessarily limited number of long-term conditions.
The life course setting represents an alternative framework to meet this objective [15] using primary care electronic health records (EHRs) and unsupervised learning methods [16]. Unlike clustering approaches used in cross-sectional studies [17], which result in non-specific descriptions of recurrent LTCs observed in analysed cohorts, or non-specific longitudinal methods based on indices, these approaches use longitudinal EHRs to define individual patient health trajectories from all available patient health history [18]. The resulting individual health trajectories are then used as input to an unsupervised clustering procedure to identify sub-populations of patients (clusters) characterised by distinct life course health trajectories. In turn, the longitudinal description of the main trajectories in terms of patterns of multimorbidity allows a better understanding of patients’ experience and complexity and addresses evidence gaps towards patient-oriented approaches to care delivery [3, 4, 9, 13, 19].
In this study, we analysed longitudinal primary care EHRs related to up to 30 LTCs, including stroke and conventional stroke risk factors such as hypertension and diabetes, in 9,847 patients with stroke using a patient-oriented unsupervised learning approach. Our objective was to identify and characterise the main patterns of multimorbidity associated with stroke.
Patients and method
Primary care data
We utilised EHRs of 30 LTCs in adult patients aged over 18 and registered in 41 general practices in south London between April 2005 and April 2021. EHRs consisted of anonymized, patient-level data from routinely collected ’real-world’ records, encoded using Read codes and SNOMED Clinical Terms (CT), and managed within the Egton Medical Information Systems (EMIS). This dataset included the dates of diagnosis for selected LTCs, coded according to the Quality and Outcomes Framework (QOF), based on QOF38 definitions. The anonymized, preprocessed EHR data was provided by Lambeth Datanet following project approval by the Lambeth Clinical Commissioning Group.
LTCs were selected as per the consensus definition of multimorbidity of the Academy of Medical Sciences [4] and a definition of multimorbidity proposed by Hafezparast et al. [20], which aimed to define multimorbidity for urban, multi-ethnic, deprived and young communities. Accordingly, selected LTCs refer to either physical non-communicable diseases of long duration such as a cardiovascular disease or cancer, mental health conditions such as mood disorder or dementia, or infectious diseases such as HIV/AIDS or viral hepatitis.
Resulting LTCs are listed in Table 1. This list includes conventional stroke risk factors such as coronary heart disease, hypertension, and diabetes, but also LTCs a priori less related or not directly related to stroke such as cancers, chronic kidney disease (CKD), asthma, and chronic obstructive pulmonary disease (COPD). Patients’ EHRs included the date at which any of the considered LTCs was first recorded. All patients with a record of stroke were deemed eligible for this study, regardless of their number of recorded LTCs.
Associated socio-demographic variables were gender (self-ascribed), age at record of stroke, ethnicity (Asian ethnicity, Black ethnicity, mixed ethnicity, other/unknown ethnicity, and White ethnicity), quintile of the locally calculated index of multiple deprivation (IMD) 2019, and end of follow-up status (censoring or death). Censoring referred to administrative censoring for patients who were still under follow-up at the censoring point. It also applied to patients who had left the network of Lambeth general practices alive before the censoring point.
Other variables/risk factors were smoking ever status, alcohol consumption (over 14 units of alcohol per week), substance use/dependency, chronic pain, hypercholesterolaemia (total cholesterol over 5.0 mmol/L) and morbid obesity.
Clustering analysis
Individual patients’ records were analysed using a patient-oriented clustering approach based on Ward’s minimum variance criterion. This involves the following three steps [16, 21, 22]: i) computation of pairwise distances between patients, ii) computation of the hierarchical clustering, and iii) determination of the size of the typology (Fig. 1).
In order to account for the multiplicity of time-to-event censored endpoints in patients with multimorbidity, individual patient records were converted into state matrices prior to pairwise distance computation. These matrices stack patients’ state indicators across all analyzed LTCs. For instance, if a patient is diagnosed with hypertension at age 60, their state indicator for hypertension would be 0 from age 0 to 59, change to 1 from age 60 until the patient’s death or censoring, and would remain undefined thereafter. Therefore, a patient state matrix has as many rows as the number of analyzed LTCs, and ranges from age 0 to the most advanced age at end of follow-up in the cohort. Simplified examples of state matrices are displayed in supplementary figure S2. Given patients’ state matrices, patients’ pairwise dissimilarities were computed elementwise using the Jaccard distance [23]. Also referred to as the binary metric, the Jaccard distance is a measure of dissimilarity between binary outcomes, such as state indicators. It is defined as one minus the Jaccard index, which is conversely a measure of similarity between sets. The Jaccard index between two indicators is computed as the number of units of time during which both indicators are positive over the number of units of time during which either indicator is positive, and therefore takes values between 0 and 1. The Jaccard distance offers a meaningful epidemiological interpretation: for two patients and a given long-term condition, it quantifies the share of time these patients spent in differing health states over the period when either patient had a recorded history of the disease. Consequently, the longer two patients remained in the same state of health, the lower the Jaccard distance, increasing the likelihood of their co-clustering, a result that aligns with expectations.
To provide a clearer understanding of how patients’ pairwise distances are computed using state matrices and the Jaccard distance, detailed examples are available in the supplementary materials.
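As an illustration of the state matrices and the Jaccard distance described above, the following minimal Python sketch builds the state matrix of two hypothetical patients and computes their distance. It is not the authors' implementation (the study was carried out in R). The function names, the one-year time unit, the treatment of the age at end of follow-up as still observed, the pooling of all cells across LTCs, and the exclusion of cells that are undefined for either patient are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence

# 0 = condition not yet recorded, 1 = condition recorded,
# None = undefined (after death or censoring).
State = Optional[int]


def state_row(onset_age: Optional[int], end_age: int, max_age: int) -> List[State]:
    """State indicator of one LTC, one entry per year of age from 0 to max_age."""
    row: List[State] = []
    for age in range(max_age + 1):
        if age > end_age:
            row.append(None)  # undefined after end of follow-up
        elif onset_age is not None and age >= onset_age:
            row.append(1)
        else:
            row.append(0)
    return row


def state_matrix(onsets: Dict[str, int], end_age: int,
                 ltcs: Sequence[str], max_age: int) -> List[List[State]]:
    """Stack one state row per LTC (rows follow the order of `ltcs`)."""
    return [state_row(onsets.get(ltc), end_age, max_age) for ltc in ltcs]


def jaccard_distance(a: List[List[State]], b: List[List[State]]) -> float:
    """One minus (time both positive / time either positive), pooled over LTCs."""
    both = either = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            if x is None or y is None:
                continue  # assumption: skip cells undefined for either patient
            if x == 1 or y == 1:
                either += 1
                if x == 1 and y == 1:
                    both += 1
    if either == 0:
        return 0.0  # assumption: no positive state at all means no dissimilarity
    return 1.0 - both / either


# Hypothetical two-condition example (ages in years).
LTCS = ["hypertension", "stroke"]
MAX_AGE = 80
patient_a = state_matrix({"hypertension": 60, "stroke": 70}, end_age=80,
                         ltcs=LTCS, max_age=MAX_AGE)
patient_b = state_matrix({"hypertension": 65, "stroke": 70}, end_age=75,
                         ltcs=LTCS, max_age=MAX_AGE)
distance_ab = jaccard_distance(patient_a, patient_b)
print(round(distance_ab, 3))  # prints 0.227
```

In this example the two patients share 17 of the 22 patient-years during which either had a recorded condition, so the distance is 5/22. The only disagreement comes from the five years during which patient A alone had a record of hypertension.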
After computation of Ward’s hierarchical clustering, the point-biserial correlation [21] was used to determine the optimal size of the typology within a convenient and workable range (from two to 20 clusters). For a given partition size, this indicator reflects the correlation between the original distance matrix (the distance matrix actually used to compute the hierarchical clustering) and a twin binary matrix in which entries are set to zero for any pair of patients belonging to the same cluster (given the tested partition), and one otherwise. Therefore, the higher the point-biserial correlation, the closer the actual distance matrix is to its binary twin matrix, and the better the tested partition reflects the underlying structure of the data. A local maximum of this coefficient across the range of tested partition sizes indicates an optimal partition size according to this criterion. Bootstrap was used to evaluate the variability of the point-biserial correlation for a range of typology sizes and for different subset sizes.
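The point-biserial criterion can be sketched as follows for a single candidate partition. This is a minimal illustration rather than the authors' code: the function name `point_biserial`, the toy distance matrix and the two candidate labelings are hypothetical, and the hierarchical clustering step that would normally produce the labels is omitted.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def point_biserial(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Pearson correlation between pairwise distances and the binary twin matrix.

    The binary entry of a pair is 0 when both patients share a cluster
    and 1 otherwise; only the upper triangle (i < j) is used.
    """
    d, b = [], []
    for i, j in combinations(range(len(labels)), 2):
        d.append(dist[i][j])
        b.append(0.0 if labels[i] == labels[j] else 1.0)
    n = len(d)
    mean_d, mean_b = sum(d) / n, sum(b) / n
    cov = sum((x - mean_d) * (y - mean_b) for x, y in zip(d, b))
    var_d = sum((x - mean_d) ** 2 for x in d)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_d == 0 or var_b == 0:
        return float("nan")  # undefined when either vector is constant
    return cov / sqrt(var_d * var_b)


# Hypothetical distances: patients 0-1 and 2-3 are close, other pairs are far.
toy_dist = [
    [0.0, 0.1, 0.9, 0.9],
    [0.1, 0.0, 0.9, 0.9],
    [0.9, 0.9, 0.0, 0.2],
    [0.9, 0.9, 0.2, 0.0],
]
good_partition = point_biserial(toy_dist, [1, 1, 2, 2])  # matches the structure
poor_partition = point_biserial(toy_dist, [1, 2, 1, 2])  # splits the close pairs
print(good_partition > poor_partition)  # prints True
```

In practice the coefficient would be evaluated for every partition size in the tested range (here two to 20 clusters, each obtained by cutting the dendrogram), and a local maximum would be retained.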
Statistical analysis
LTCs, socio-demographic variables and other variables were displayed as frequencies and percentages or as medians and interquartile ranges, as appropriate. Associations between variables and clusters were tested using the Fisher exact test for categorical variables or the Kruskal-Wallis test for numeric variables (Tables 2 and 3). Multivariate associations between clusters and socio-demographic variables, stroke risk factors, and LTCs were estimated using logistic regressions where cluster indicators were explained by tested variables (Figs. 2 and 3).
Fig. 2 Heatmap of log-odds ratios of socio-demographic variables and conventional stroke risk factors (upper panel) and long-term conditions (lower panel) associated with defined clusters. In both panels, values are derived from multivariate logistic regressions where cluster indicators are explained by displayed variables. Positive log-odds ratios (red versus blue) indicate over-representation of the corresponding population traits in a given cluster as compared to its conditional expectation. For better readability, coefficients associated with a significance level above 5% were not displayed.
Fig. 3 Longitudinal representation of the main sequences characterising clusters of the typology. The main sequences between long-term conditions within clusters are represented by directed acyclic graphs (DAG) where nodes and edges represent long-term conditions and transitions between them, respectively. To give a better insight into the temporality of sequences, nodes are displayed according to the median age at onset of the corresponding long-term condition, on the x-axis.
All computations were carried out using the R language and environment for statistical computing (version 4.3.0 (2023-04-21))[24].
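To make the sign convention of the log-odds ratios concrete, the sketch below computes an unadjusted log-odds ratio from a two-by-two table. This is a deliberate simplification: the study reports multivariate estimates from logistic regressions, whereas this example considers a single trait without adjustment. The function name and all counts are hypothetical.

```python
from math import log


def log_odds_ratio(with_in: int, without_in: int,
                   with_out: int, without_out: int) -> float:
    """Unadjusted log-odds ratio of a trait, cluster members vs. all other patients.

    with_in / without_in: cluster members with / without the trait.
    with_out / without_out: other patients with / without the trait.
    """
    odds_in = with_in / without_in
    odds_out = with_out / without_out
    return log(odds_in / odds_out)


# Hypothetical counts: 30 of 100 cluster members have the trait,
# against 10 of 100 patients outside the cluster.
over = log_odds_ratio(30, 70, 10, 90)   # positive: trait over-represented
under = log_odds_ratio(10, 90, 30, 70)  # negative: trait under-represented
print(over > 0, under < 0)  # prints True True
```

A positive value corresponds to over-representation of the trait in the cluster and a negative value to under-representation. Swapping the two groups only changes the sign.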
Results
Patients characteristics and Long-term conditions
Of 849,968 registered patients, 9,847 (1.16%) had a record of stroke. Table 2 displays patients’ socio-demographic characteristics across clusters. Briefly, the median age at record of stroke was 65.0 years, 46.5% of patients were female, 50.8% were of White ethnicity and 40.9% had an index of multiple deprivation (IMD) in quintiles 1-2 (most deprived), as opposed to quintiles 3-5 (less deprived). Likewise, the distribution of LTCs across clusters is displayed in Table 3, hypertension (64.4%) and diabetes (30.5%) being the most frequently recorded LTCs. Other frequent LTCs were: osteoarthritis (26.2%), depression (23.8%), chronic kidney disease (23.3%), anxiety (20.2%), atrial fibrillation (17.0%), coronary heart disease (16.8%), transient ischemic attack (15.8%) and cancer (15.6%) (Table 2). Supplementary figure S3 displays the scaled densities of patients’ age at record of analysed LTCs. In addition to stroke, the median number of LTCs was 3 (interquartile range: from 2 to 5) (Table 3 and supplementary figure S4-A).
Clustering
After computation of patients’ pairwise dissimilarities, hierarchical clustering, and the point-biserial correlation, a local maximum of this coefficient indicated an optimal partition size of eight. Bootstrap (n=1000), performed on various sample sizes of the patient population (n=2500, 5000, 7500 and 9847), confirmed a partition of 8 clusters, although partitions of 7 or 11 clusters also presented high scores overall (supplementary figure S3). Accordingly, clusters were ordered by decreasing frequency and numbered from 1 to 8 (supplementary figure S4-B, and Tables 2 and 3).
Clusters characteristics and graphical representations
Table 2 displays the distribution of socio-demographic characteristics, risk factors, and the median number of medications across clusters, and Table 3 displays the distribution of LTCs across clusters. In analogy with supplementary figure S2, supplementary figure S3 displays the scaled densities of patients’ age at record of analysed LTCs across clusters.
Figure 2 displays the multivariate log-odds ratios of socio-demographic characteristics and risk factors (upper panel) and LTCs (lower panel) across clusters, relative to all other patients in the cohort. Finally, Fig. 3 illustrates the backbone of the clusters by longitudinally mapping the main sequences of long-term conditions associated with stroke across different clusters. Associated sequence characteristics are displayed in Table 4.
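The kind of summary underlying such a longitudinal mapping can be sketched as follows: transitions between consecutively recorded LTCs are counted within a cluster, and each LTC is positioned at its median age at onset. This is only an illustrative sketch. How the main sequences were actually selected is not detailed here, so the consecutive-transition rule, the function names and the three patient records are hypothetical.

```python
from collections import Counter
from statistics import median
from typing import Dict, List, Tuple

# One record per patient: LTC name -> age at first record (years).
Record = Dict[str, float]


def transitions(patients: List[Record]) -> Counter:
    """Count transitions between LTCs recorded consecutively in each patient."""
    edges: Counter = Counter()
    for onsets in patients:
        # Order LTCs by age at first record (ties keep dictionary order).
        ordered = sorted(onsets, key=onsets.get)
        for earlier, later in zip(ordered, ordered[1:]):
            edges[(earlier, later)] += 1
    return edges


def median_onset(patients: List[Record]) -> Dict[str, float]:
    """Median age at first record of each LTC, usable as the node position."""
    ages: Dict[str, List[float]] = {}
    for onsets in patients:
        for ltc, age in onsets.items():
            ages.setdefault(ltc, []).append(age)
    return {ltc: median(values) for ltc, values in ages.items()}


# Hypothetical cluster of three patients.
cluster = [
    {"hypertension": 50, "stroke": 60},
    {"hypertension": 55, "diabetes": 58, "stroke": 66},
    {"osteoarthritis": 70, "stroke": 75},
]
edges = transitions(cluster)
medians = median_onset(cluster)
print(edges.most_common())
print(medians["hypertension"], medians["stroke"])
```

The most frequent edges would form the backbone of the cluster, with nodes placed along the x-axis at the corresponding median ages.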
Clusters examination
Using Tables 2 and 3, and Figs. 2 and 3, defined clusters can be characterised as follows:
- Cluster 1 (n=2392, 24.3%) is the major cluster of the typology, accounting for almost a quarter of patients. This cluster displays average socio-demographic characteristics and a high prevalence of substance dependency (including smoking and alcohol), hypercholesterolaemia and a higher number of LTCs in addition to stroke (median: 4), including cancer, infectious and inflammatory diseases, mental health conditions, and diseases of the respiratory system.
- Cluster 2 (n=1814, 18.4%) is characterised by the oldest median age at record of stroke (81.2 years) and under-representation of Asian and Black ethnicity. In contrast to cluster 1, this cluster exhibits a reduced prevalence of alcohol and substance dependency, and hypercholesterolaemia, while displaying LTCs characteristic of advanced age, such as cancer, atrial fibrillation, dementia, Parkinson’s, and osteoporosis. Additionally, COPD and CKD 3-5 are also prevalent in this cluster, whereas infectious and inflammatory diseases, mental health conditions, liver disease, and osteoarthritis are less represented.
- Cluster 3 (n=1782, 18.1%) is defined by an older median age at record of stroke (70.7 years), non-White ethnicity, and a higher proportion of male patients. This cluster also presents a high prevalence of hypercholesterolaemia. LTCs associated with this cluster include cancer, cardiovascular conditions, diabetes, dementia, and CKD 3-5. Similarly to cluster 2, cluster 3 has a lower prevalence of infectious and inflammatory diseases, mental health conditions, liver disease, and osteoarthritis.
- Clusters 4 (n=1409, 14.3%) and 3 share similar socio-demographic characteristics. However, records of stroke occur earlier in cluster 4, at the median age of 55 years, compared to 70.7 years in cluster 3. Cluster 4 also presents the highest prevalence of hypertension (92.3%) and diabetes (59.0%) and a higher proportion of patients from Asian, Black and mixed ethnicity.
- Cluster 5 (n=845, 8.6%) is characterized by a younger median age at record of stroke (47 years), with an over-representation of female patients and patients from White ethnicity. This cluster also displays a higher prevalence of substance dependency (including smoking and alcohol) and chronic pain. Principal LTCs present in this cluster include depression, anxiety disorders, asthma, and liver disease.
- Cluster 6 (n=685, 7.0%) displays the second oldest median age at record of stroke (78.1 years) and a slight over-representation of female patients. This cluster is also characterized by the highest prevalence of chronic pain and LTCs associated with older age, such as atrial fibrillation and dementia. Nearly all patients in this cluster were diagnosed with osteoarthritis.
- Cluster 7 (n=563, 5.7%) displays the youngest median age at record of stroke (26.8 years), average socio-demographic characteristics, and the lowest median number of LTCs. The majority of patients diagnosed with sickle cell anaemia belong to this cluster.
- Cluster 8 (n=357, 3.6%) presents similar socio-demographic characteristics to cluster 7, but with a later median age at record of stroke (44.4 years) and a high prevalence of hypercholesterolaemia.
Clusters 1 and 6 present the highest level of multimorbidity with a median of 4 recorded LTCs, whereas clusters 2 to 5 present intermediate levels with a median of 3 recorded LTCs. On the other hand, clusters 7 and 8 present low profiles of multimorbidity, their median number of recorded LTCs being zero (i.e., apart from stroke, no other LTCs were recorded for at least half of patients in these clusters) (Table 2).
Discussion
These results can be interpreted in various ways, such as levels of patient complexity (clusters 1, 5 and 6), levels of conventional stroke risk factors (clusters 2-4) or levels of multimorbidity (clusters 1 and 6 vs. 7 and 8).
Multimorbidity and patient complexity (clusters 1, 5 and 6)
Patient complexity is defined as a self-reinforced dynamic state in which the personal, social and clinical aspects of a patient’s experience interact [19, 25]. Multimorbidity impacts patient complexity [3, 9] in various ways, such as the direct impact of LTCs on patient functional capacity (e.g., the limitation in usual activities due to osteoarthritis symptoms [26] or mental health disorders [27]) and the burden of treatment [28]. While multimorbidity directly increases patient workload (burden of treatment), it may also differentially affect and erode patient capacity, according to specific patterns of multimorbidity (burden of illness).
Clusters 1, 5 and 6, which display LTCs associated with a high burden of illness, such as mental health disorders (clusters 1 and 5) and osteoarthritis (cluster 6), can be regarded as complex clusters. In particular, cluster 1, which displays a combination of mental health conditions, substance dependency and infectious diseases, can be seen as even more complex than cluster 5, which lacks the infectious diseases pattern. Similarly, while cluster 6 displays the highest level of multimorbidity and level of medication, it can be seen as less complex than clusters 1 and 5, as it lacks the complexity signature formed by the combination of substance dependency and mental health conditions.
In addition, the socio-demographic and risk-factor characteristics of these clusters allow the identification of other potential factors of complexity, such as the higher prevalence of substance dependency observed in clusters 1 and 5. These behavioural/lifestyle characteristics can be viewed not only as direct stroke risk factors [29, 30], but also as additional factors of complexity, as they also represent major risk factors for various other LTCs [31] including traumatic injuries [32, 33], HIV/AIDS [34], hepatitis C [35], mental health conditions [36] and cardiovascular diseases [37].
In turn, these behavioural/lifestyle aspects shed further light on the specific patterns of multimorbidity observed in associated clusters, notably the higher prevalence of HIV/AIDS, viral hepatitis and serious mental illness in cluster 1, and depression and anxiety disorder in cluster 5. Likewise, the higher prevalence of chronic pain observed in clusters 5 and 6 can be understood as a factor of complexity associated with depression, anxiety disorders and morbid obesity [38] in cluster 5 and osteoarthritis [39] in cluster 6.
Of note, dementia, prevalent in clusters 2, 3 and 6, is a condition marked by significant cognitive and functional impairments [40] (i.e. a high burden of illness). This condition is also recognized as a major cause of dependency post-stroke [41]. Dependency, however, refers implicitly to a patient’s relational network and its ability to interact with the healthcare system in order to cope with the patient’s workload [25], which prevents further patient complexity. This ability to engage one’s social network, which also depends on the patient’s socioeconomic environment, should be borne in mind when interpreting the higher prevalence of LTCs associated with both high levels of illness and treatment, such as dementia or cancer (clusters 1-3).
Conventional stroke risk factors, age and ethnicity (clusters 2 to 4)
Clusters 2 to 4 form the core clusters within the typology, accounting for over half of analysed patients. Notably, these clusters are characterized by a higher prevalence of conventional stroke risk factors. The opposite trends observed from cluster 2 to cluster 4 in the median age at record of stroke and in the prevalence of hypertension, diabetes, and hypercholesterolaemia, in parallel with a higher representation of patients from Asian, Black, and mixed ethnicity, are a distinctive trait of the typology (Tables 2 and 3, and Figs. 2, 3 and S4). These trends highlight the importance of these conventional stroke risk factors across the lifespan, earlier records of stroke being observed in clusters characterised by earlier records and higher prevalence of these conventional stroke risk factors. Of note, the parallel increase in the proportion of patients from non-White ethnicity in these clusters suggests not only an ethnic susceptibility to these risk factors, but also an ethnic vulnerability to stroke given these risk factors [42]. Importantly, this also highlights the opportunity for age- and culturally tailored lifestyle interventions for the prevention and management of these conventional stroke risk factors [43].
Low levels of multimorbidity (clusters 7 and 8)
Sickle cell anaemia, characterising cluster 7, is the most common and severe form of sickle cell disease and is particularly prevalent in people of African, Middle Eastern and Indian descent [44]. Despite available strategies to prevent (and manage) stroke in sickle cell anaemia patients [45], its appearance as a marking trait of cluster 7 highlights its significance as a stroke risk factor in younger patients [46]. A low level of multimorbidity is also a characteristic of clusters 7 and 8.
Conclusion
This study has adopted an unsupervised patient-oriented clustering approach to analyse longitudinal primary care EHRs of 30 LTCs to identify patterns of multimorbidity associated with stroke in a relatively large cohort of 9,847 patients. A typology of eight clusters of health trajectories characterised by distinct patterns of multimorbidity, levels of medication, socio-demographic profiles, and stroke risk factors was derived. Identified clusters reflect marked trends in age, ethnicity and prevalence of conventional stroke risk factors, and highlight the importance of mental health conditions in conjunction with behavioural/lifestyle-related factors in complex profiles of multimorbidity displayed in a significant proportion of patients. The typology also reveals specific combinations, such as osteoarthritis and chronic pain in older patients, and sickle cell anaemia or low levels of multimorbidity and risk factors in younger patients.
These results align with previous findings on stroke-associated multimorbidity [1,2,3] and more generally with the epidemiology of stroke, which is more likely to occur in older male patients with a background of hypertension and/or diabetes [47], these risk factors being more prevalent in patients from Asian and Black ethnicity [48, 49]. Other key points include the association between mental health conditions and infectious diseases [34], the correlation between the number of long-term conditions and the number of medications [50], and specific long-term conditions such as asthma and mental health conditions, osteoarthritis or sickle-cell anaemia as landmarks of the different identified patterns.
While mental health conditions are considered non-traditional stroke risk factors [10, 51, 52], the association between stroke and osteoarthritis or asthma is more controversial. Proposed explanations include the shared risk factors hypothesis (older age (cluster 6) or obesity and smoking habits (cluster 5) [53, 54]), the activation of inflammatory pathophysiological pathways that contribute to the vascular inflammation and atherosclerosis observed in stroke [55], and the exposure to non-steroidal anti-inflammatory drugs used for managing osteoarthritis symptoms [56] or \(\beta_2\)-agonists and systemic corticosteroids used in asthma treatment [57,58,59]. Stroke is also considered a catastrophic complication of sickle-cell anaemia [46], occurring in 5% to 10% of patients before adulthood, and up to 24% by the age of 45 [60]. Finally, the low level of multimorbidity and the absence of combination of major risk factors in cluster 8 are also remarkable results of the proposed typology. This trend reflects the lower prevalence of multimorbidity observed in younger patients [61], or possibly cryptogenic strokes [62] and insufficient screening strategies for cardiovascular risk factors in this younger population [63].
These results exemplify the ability of the proposed method to identify specific patterns of multimorbidity or even repeated motifs displayed in various subpopulations. Examples include the sequence osteoarthritis to stroke, which involves more than 90% of patients in cluster 6 (Table 4), the sequence sickle-cell anaemia to stroke, observed in 87.8% of patients diagnosed with sickle-cell anaemia in cluster 7, and the sequence hypertension to stroke, observed at different ages and in different socio-demographic contexts across clusters 2 to 4, with a shift of about 25 years between clusters 2 and 4 and 15 years between clusters 2 and 3.
Multimorbidity is known to have profound implications for both patients and primary care [64]. However, beyond the necessity of measuring the prevalence of multimorbidity, the identification of recurrent patterns of long-term conditions is needed to effectively promote patient-oriented prevention and care strategies tailored to the needs of the main patient groups [3, 13]. Accordingly, our results suggest that initiatives such as age- and culturally tailored lifestyle interventions to prevent and manage conventional stroke risk factors [43], as well as evaluations of patient complexity and associated needs [1], could be beneficial to both patients and primary care professionals in the considered areas.
Although our study aims to address previously identified methodological and evidence gaps in the longitudinal mapping of long-term conditions into distinct patterns of multimorbidity associated with stroke [3, 13, 28], our results are subject to limitations. These limitations arise from factors including the specific environment and population characteristics (urban, young, multi-ethnic, and deprived) as well as methodological choices, such as the use of a specific metric to compute patient distances and the subsequent selection of clustering methods. Further research should be conducted in collaboration with patients, primary care professionals, and higher education institutions, including primary care teachers and trainees. This collaborative approach will enhance study designs to better align with patients’ needs and develop strategies to more effectively translate epidemiological evidence into practice.
Data availability
Restrictions apply to the availability of the data supporting the findings of this study. The data were used under license, following project approval by the Lambeth Clinical Commissioning Group, and therefore are not publicly available.
Abbreviations
- EHR: Electronic health records
- LTC: Long-term conditions
- CKD: Chronic kidney disease
- COPD: Chronic obstructive pulmonary disease
- IMD: Index of multiple deprivation
- NHI: National health institute
References
Gallacher KI, Batty GD, McLean G, Mercer SW, Guthrie B, May CR, Langhorne P, Mair FS. Stroke, multimorbidity and polypharmacy in a nationally representative sample of 1,424,378 patients in Scotland: implications for treatment burden. BMC Med. 2014;12(1):1–9.
Gallacher KI, McQueenie R, Nicholl B, Jani BD, Lee D, Mair FS. Risk factors and mortality associated with multimorbidity in people with stroke or transient ischaemic attack: a study of 8,751 UK Biobank participants. J Comorbidity. 2018;8(1):1–8.
Gallacher KI, Jani BD, Hanlon P, Nicholl BI, Mair FS. Multimorbidity in stroke. Stroke. 2019;50(7):1919–26.
The Academy of Medical Sciences. Multiple long-term conditions (multimorbidity): a priority for global health research. 2023. https://acmedsci.ac.uk/file-download/82222577. Accessed 28 Jul 2023.
Johnston MC, Crilly M, Black C, Prescott GJ, Mercer SW. Defining and measuring multimorbidity: a systematic review of systematic reviews. Eur J Public Health. 2019;29(1):182–9.
Skou ST, Mair FS, Fortin M, Guthrie B, Nunes BP, Miranda JJ, Boyd CM, Pati S, Mtenga S, Smith SM. Multimorbidity. Nat Rev Dis Prim. 2022;8(1):48.
Fortin M, Stewart M, Poitras M-E, Almirall J, Maddocks H. A systematic review of prevalence studies on multimorbidity: toward a more uniform methodology. Ann Fam Med. 2012;10(2):142–51.
Schmidt M, Jacobsen JB, Johnsen SP, Bøtker HE, Sørensen HT. Eighteen-year trends in stroke mortality and the prognostic influence of comorbidity. Neurology. 2014;82(4):340–50.
Safford MM, Allison JJ, Kiefe CI. Patient complexity: more than comorbidity. The vector model of complexity. J Gen Intern Med. 2007;22:382–90.
O’Donnell MJ, Xavier D, Liu L, Zhang H, Chin SL, Rao-Melacini P, Rangarajan S, Islam S, Pais P, McQueen MJ, et al. Risk factors for ischaemic and intracerebral haemorrhagic stroke in 22 countries (the interstroke study): a case-control study. Lancet. 2010;376(9735):112–23.
Bang OY, Ovbiagele B, Kim JS. Nontraditional risk factors for ischemic stroke: an update. Stroke. 2015;46(12):3571–8.
Gruneir A, Griffith LE, Fisher K, Panjwani D, Gandhi S, Sheng L, Patterson C, Gafni A, Ploeg J, Markle-Reid M. Increasing comorbidity and health services utilization in older adults with prior stroke. Neurology. 2016;87(20):2091–8.
Aquino MRJRV, Turner GM, Mant J. Does characterising patterns of multimorbidity in stroke matter for developing collaborative care approaches in primary care? Primary Health Care Res Dev. 2019;20:e110.
Cezard G, McHale CT, Sullivan F, Bowles JKF, Keenan K. Studying trajectories of multimorbidity: a systematic scoping review of longitudinal approaches and evidence. BMJ Open. 2021;11(11):e048485.
Elder GH Jr. The life course as developmental theory. Child Dev. 1998;69(1):1–12.
Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. 1st ed. Hoboken: Wiley; 2009.
Ng SK, Tawiah R, Sawyer M, Scuffham P. Patterns of multimorbid health conditions: a systematic review of analytical methods and comparison analysis. Int J Epidemiol. 2018;47(5):1687–704.
Pollock G. Holistic trajectories: a study of combined employment, housing and family careers by using multiple-sequence analysis. J R Stat Soc Ser A (Stat Soc). 2007;170(1):167–83.
Shippee ND, Shah ND, May CR, Mair FS, Montori VM. Cumulative complexity: a functional, patient-centered model of patient complexity can improve research and practice. J Clin Epidemiol. 2012;65(10):1041–51.
Hafezparast N, Turner EB, Dunbar-Rees R, Vodden A, Dodhia H, Reynolds B, Reichwein B, Ashworth M. Adapting the definition of multimorbidity-development of a locality-based consensus for selecting included long term conditions. BMC Fam Pract. 2021;22(1):1–11.
Milligan GW, Cooper MC. An examination of procedures for determining the number of clusters in a data set. Psychometrika. 1985;50:159–79.
Delord M, Douiri A. Unsupervised clustering approach to multiple time-to-event electronic health records applied to multimorbidity associated with myocardial infarction. 2023 Jul 12 [Preprint]. Research Square. Available from: https://doi.org/10.21203/rs.3.rs-3127943/v1.
Jaccard P. Étude comparative de la distribution florale dans une portion des Alpes et des Jura. Bull Soc Vaudoise Sci Nat. 1901;37:547–79.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2023.
May CR, Eton DT, Boehmer K, Gallacher K, Hunt K, MacDonald S, Mair FS, May CM, Montori VM, Richardson A, et al. Rethinking the patient: using burden of treatment theory to understand the changing dynamics of illness. BMC Health Serv Res. 2014;14:1–11.
Centers for Disease Control and Prevention (CDC). Prevalence of doctor-diagnosed arthritis and arthritis-attributable activity limitation–United States, 2010-2012. MMWR Morb Mortal Wkly Rep. 2013;62(44):869–73.
Moussavi S, Chatterji S, Verdes E, Tandon A, Patel V, Ustun B. Depression, chronic diseases, and decrements in health: results from the World Health Surveys. Lancet. 2007;370(9590):851–8.
Gallacher KI, May CR, Langhorne P, Mair FS. A conceptual model of treatment burden and patient capacity in stroke. BMC Fam Pract. 2018;19:1–15.
De Los Ríos F, Kleindorfer DO, Khoury J, Broderick JP, Moomaw CJ, Adeoye O, Flaherty ML, Khatri P, Woo D, Alwell K, et al. Trends in substance abuse preceding stroke among young adults: a population-based study. Stroke. 2012;43(12):3179–83.
O’Donnell MJ, Chin SL, Rangarajan S, Xavier D, Liu L, Zhang H, Rao-Melacini P, Zhang X, Pais P, Agapay S, et al. Global and regional effects of potentially modifiable risk factors associated with acute stroke in 32 countries (INTERSTROKE): a case-control study. Lancet. 2016;388(10046):761–75.
Mertens JR, Weisner C, Ray GT, Fireman B, Walsh K. Hazardous drinkers and drug users in HMO primary care: prevalence, medical conditions, and costs. Alcohol Clin Exp Res. 2005;29(6):989–98.
Cherpitel CJ. Alcohol and injuries: a review of international emergency room studies since 1995. Drug Alcohol Rev. 2007;26(2):201–14.
Cheng V, Inaba K, Johnson M, Byerly S, Jiang Y, Matsushima K, Haltmeier T, Benjamin E, Lam L, Demetriades D. The impact of pre-injury controlled substance use on clinical outcomes following trauma. J Trauma Acute Care Surg. 2016;81(5):913.
Anand P, Springer SA, Copenhaver MM, Altice FL. Neurocognitive impairment and HIV risk factors: a reciprocal relationship. AIDS Behav. 2010;14:1213–26.
Loftis JM, Matthews AM, Hauser P. Psychiatric and substance use disorders in individuals with hepatitis C: epidemiology and management. Drugs. 2006;66:155–74.
Kelly TM, Daley DC. Integrated treatment of substance use and psychiatric disorders. Soc Work Public Health. 2013;28(3–4):388–406.
Gan WQ, Buxton JA, Scheuermeyer FX, Palis H, Zhao B, Desai R, Janjua NZ, Slaunwhite AK. Risk of cardiovascular diseases in relation to substance use disorders. Drug Alcohol Depend. 2021;229:109132.
Peppin JF, Cheatle MD, Kirsh KL, McCarberg BH. The complexity model: a novel approach to improve chronic pain care. Pain Med. 2015;16(4):653–66.
Schaible H-G. Mechanisms of chronic pain in osteoarthritis. Curr Rheumatol Rep. 2012;14:549–56.
Kalaria RN, Akinyemi R, Ihara M. Stroke injury, cognitive impairment and vascular dementia. Biochim Biophys Acta (BBA) - Mol Basis Dis. 2016;1862(5):915–25.
Leys D, Hénon H, Mackowiak-Cordoliani M-A, Pasquier F. Poststroke dementia. Lancet Neurol. 2005;4(11):752–9.
Howard G, Cushman M, Kissela BM, Kleindorfer DO, McClure LA, Safford MM, Rhodes JD, Soliman EZ, Moy CS, Judd SE, et al. Traditional risk factors as the underlying cause of racial disparities in stroke: lessons from the half-full (empty?) glass. Stroke. 2011;42(12):3369–75.
Wadi NM, Asantewa-Ampaduh S, Rivas C, Goff LM. Culturally tailored lifestyle interventions for the prevention and management of type 2 diabetes in adults of Black African ancestry: a systematic review of tailoring methods and their effectiveness. Public Health Nutr. 2022;25(2):422–36.
Rees DC, Williams TN, Gladwin MT. Sickle-cell disease. Lancet. 2010;376(9757):2018–31.
Platt OS. Prevention and management of stroke in sickle cell anemia. ASH Educ Program Book. 2006;2006(1):54–7.
Ohene-Frempong K, Weiner SJ, Sleeper LA, Miller ST, Embury S, Moohr JW, Wethers DL, Pegelow CH, Gill FM, and the Cooperative Study of Sickle Cell Disease. Cerebrovascular accidents in sickle cell disease: rates and risk factors. Blood J Am Soc Hematol. 1998;91(1):288–94.
Feigin VL, Norrving B, Mensah GA. Global burden of stroke. Circ Res. 2017;120(3):439–48.
Kittner SJ, White LR, Losonczy KG, Wolf PA, Hebel JR. Black-white differences in stroke incidence in a national sample: the contribution of hypertension and diabetes mellitus. JAMA. 1990;264(10):1267–70.
Delord M, Ashworth M, Douiri A. Does ethnicity alter the risk of stroke in patients with modifiable cardiometabolic risk factors? medRxiv. 2024;2024.04.08.24305017 [Preprint]. https://doi.org/10.1101/2024.04.08.24305017.
Glynn LG, Valderas JM, Healy P, Burke E, Newell J, Gillespie P, Murphy AW. The prevalence of multimorbidity in primary care and its effect on health care utilization and cost. Fam Pract. 2011;28(5):516–23.
Dong J-Y, Zhang Y-H, Tong J, Qin L-Q. Depression and risk of stroke: a meta-analysis of prospective studies. Stroke. 2012;43(1):32–7.
Everson-Rose SA, Roetker NS, Lutsey PL, Kershaw KN, Longstreth WT Jr, Sacco RL, Diez Roux AV, Alonso A. Chronic stress, depressive symptoms, anger, hostility, and risk of stroke and transient ischemic attack in the Multi-Ethnic Study of Atherosclerosis. Stroke. 2014;45(8):2318–23.
Georgiev T, Angelov AK. Modifiable risk factors in knee osteoarthritis: treatment implications. Rheumatol Int. 2019;39(7):1145–57.
Corlateanu A, Stratan I, Covantev S, Botnaru V, Corlateanu O, Siafakas N. Asthma and stroke: a narrative review. Asthma Res Pract. 2021;7(1):1–17.
van Rooy M-J, Pretorius E. Obesity, hypertension and hypercholesterolemia as risk factors for atherosclerosis leading to ischemic events. Curr Med Chem. 2014;21(19):2121–9.
Atiquzzaman M, Karim ME, Kopec J, Wong H, Anis AH. Role of nonsteroidal antiinflammatory drugs in the association between osteoarthritis and cardiovascular diseases: a longitudinal study. Arthritis Rheumatol. 2019;71(11):1835–43.
Appleton SL, Ruffin RE, Wilson DH, Taylor AW, Adams RJ, North West Adelaide Cohort Health Study Team, et al. Cardiovascular disease risk associated with asthma and respiratory morbidity might be mediated by short-acting β2-agonists. J Allergy Clin Immunol. 2009;123(1):124–30.
Zeiger R, Sullivan P, Chung Y, Kreindler JL, Zimmerman NM, Tkacz J. Systemic corticosteroid-related complications and costs in adults with persistent asthma. J Allergy Clin Immunol Pract. 2020;8(10):3455–65.
Cazzola M, Rogliani P, Calzetta L, Matera MG. Bronchodilators in subjects with asthma-related comorbidities. Respir Med. 2019;151:43–8.
Menaa F. Stroke in sickle cell anemia patients: a need for multidisciplinary approaches. Atherosclerosis. 2013;229(2):496–503.
Barnett K, Mercer SW, Norbury M, Watt G, Wyke S, Guthrie B. Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study. Lancet. 2012;380:37–43.
Hart RG, Diener H-C, Coutts SB, Easton JD, Granger CB, O’Donnell MJ, Sacco RL, Connolly SJ. Embolic strokes of undetermined source: the case for a new clinical construct. Lancet Neurol. 2014;13(4):429–38.
Lang S-J, Abel GA, Mant J, Mullis R. Impact of socioeconomic deprivation on screening for cardiovascular disease risk in a primary prevention population: a cross-sectional study. BMJ Open. 2016;6(3):e009984.
Wallace E, Salisbury C, Guthrie B, Lewis C, Fahey T, Smith SM. Managing patients with multimorbidity in primary care. BMJ. 2015;350.
Acknowledgements
We are grateful to Professor Anne Stephenson, Dr. Niki Jakeways, and Dr. Rini Paul from King’s Undergraduate Medical Education in the Community for their support with this project.
Funding
This project is funded by King’s Health Partners / Guy’s and St Thomas Charity ‘MLTC Challenge Fund’ (grant number EIC180702) and supported by the National Institute for Health and Care Research (NIHR) under its Programme Grants for Applied Research (NIHR202339).
Author information
Contributions
AD and MD designed the study (conceptualization and methodology), AD, MA, XS and MD conducted investigations, XS, AL and MD carried out data curation, MD provided computer code and performed formal analysis including data visualisation and validation, MD drafted the manuscript, and AD supervised the research. MA, AD, VC, XS, and CW reviewed and edited the manuscript.
Ethics declarations
Ethics approval and consent to participate
This study was conducted in accordance with the Declaration of Helsinki. Data were provided by the Lambeth DataNet upon approval for the analysis of fully anonymised data by the Lambeth Clinical Commissioning Group. All patients were informed through ‘Fair Processing Notices’ of the potential use of collected electronic health records in ‘secondary data analysis’ and were given the option to opt out of the data-sharing scheme. Accordingly, this study was exempt from ethical committee approval and individual consent requirements as per the National Health Institute (NHI) Health Research Authority (HRA) guidelines for research using anonymised primary care data in the United Kingdom.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Delord, M., Sun, X., Learoyd, A. et al. Patient-oriented unsupervised learning to uncover the patterns of multimorbidity associated with stroke using primary care electronic health records. BMC Prim. Care 25, 419 (2024). https://doi.org/10.1186/s12875-024-02636-6
DOI: https://doi.org/10.1186/s12875-024-02636-6