Predicting heart failure onset in the general population using a novel … – Nature.com

Study design and participants

This was a retrospective observational study conducted in Japan. Informed consent was waived in accordance with the Ethical Guidelines for Medical and Biological Research Involving Human Subjects issued by the Ministry of Education, Culture, Sports, Science and Technology, the Ministry of Health, Labour and Welfare, and the Ministry of Economy, Trade and Industry in Japan. We analyzed healthcare insurance claims data obtained from the Japan Medical Data Center (JMDC) in Tokyo. The database contains standardized eligibility and claims data provided by health insurance societies for approximately 4.5 million insured individuals, including employees of general corporations and their family members. It covers all medical treatments received by insured individuals at all treatment facilities and therefore provides a comprehensive record of the treatments administered to a given patient. We removed the decoding indexes and analyzed the personal data under unlinkable anonymization.

The study protocol was approved by the ethics committee of the National Cerebral and Cardiovascular Center (M22-49, M24-51). On the basis of the Japanese Clinical Research Guidelines, the committee decided that patient informed consent was not essential for inclusion in this study because of its retrospective observational nature. Instead, JMDC made a public announcement in accordance with the ethics committee's request and the Japanese Clinical Research Guidelines. The study was performed following the principles of the Declaration of Helsinki and the Japanese Ethical Guidelines for Clinical Research.

From the database of approximately 4.5 million people, we applied the following entry criteria: (1) a complete dataset for the consecutive 4-year period from 2010, (2) no diagnosis of HF in the first year, and (3) evidence of the presence or absence of a diagnosis of HF during the 4 years. These criteria yielded a complete dataset of 308,205 people. Using a random numbers table, we randomly allocated 32,547 people to the analysis cohort (Protocol 1) and the remaining 275,658 people to the validation cohort (Protocol 2). In our experience, about 30,000 people with an incidence of about 1% are enough to obtain significant combinations of factors; indeed, several thousand people with an incidence of about 10% were needed to obtain sufficient combinations of factors to identify the worsening of HF [10]. We therefore selected about 30,000 people for Protocol 1 to find meaningful and sufficient combinations of factors for detecting the onset of HF.
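The allocation step can be illustrated with a short sketch. The study used a random numbers table; the version below swaps in a seeded pseudorandom shuffle. The function name `allocate`, the seed, and the integer person IDs are assumptions made for illustration only.

```python
import random

def allocate(person_ids, n_analysis, seed=0):
    """Randomly split IDs into an analysis cohort and a validation cohort."""
    rng = random.Random(seed)      # seeded so the split is reproducible
    shuffled = list(person_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_analysis], shuffled[n_analysis:]

# Cohort sizes taken from the text; the IDs themselves are placeholders.
analysis, validation = allocate(range(308_205), 32_547)
```

Every person lands in exactly one cohort, so the two lists together always cover the full eligible dataset.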

In the cohort of 32,547 Japanese people, we obtained 288 clinical, medical, habitual, and physical variables for 2010, including sex; age; urinary sugar levels (borderline, 1+, 2+, 3+, and 4+); urinary protein levels (borderline, 1+, 2+, 3+, and 4+); plasma LDL and HDL cholesterol and triglyceride levels (mg/L); plasma HbA1c levels (%); body mass index; systolic and diastolic blood pressure (mmHg); plasma uric acid levels (mg/L); fasting plasma glucose levels (mg/dL); plasma ALT, AST, and γ-GTP levels (IU/L); abdominal circumference (cm); red blood cell count (10^4/μL) and blood Hb levels (g/dL); chest XP findings (A=normal, B=slight changes but no need for observation, C=need for observation, and H=need for treatment); ECG findings (A=normal, B=slight changes but no need for observation, C=need for observation, and H=need for treatment); work using visual display terminals (VDT) (A=need for observation, B=slight changes but no need for observation, and C=normal); interview regarding life habits (smoking at present: yes or no; more than 30 min of exercise per day: yes or no; changes in body weight of more than 2 kg over 1 year: yes or no; drinking alcohol at present: every day, not every day but sometimes, or none); and prescription details. We carefully performed data cleaning for all data. People were monitored for HF until 2014. HF was diagnosed by cardiologists and general practitioners using the Framingham Criteria of Congestive Heart Failure [11], plasma BNP levels [12], and echocardiogram [13], an approach that appears reliable for precisely and accurately diagnosing the several types of HF.

We separated people with and without the occurrence of HF over 4 years (Table 1).

We employed the novel LAMP method for our data-mining analysis to identify rules consisting of single factors or combinations of factors that significantly affected the occurrence of cardiovascular events [8]. Each person was represented by individual clinical factors and by a class label indicating whether HF occurred, and the whole population formed a data table in which each row represented a person. This data table D consisted of N rows, each of which consisted of M factors and a positive or negative class label for each object. LAMP uses Fisher's exact tests to draw conclusions from a complete set of statistically significant hypotheses regarding a class label. Here, a hypothesis is a combination of a class label and a condition defined as a subset of the M factors in D. As the condition of an uncovered significant hypothesis may include any number of factors from 1 to M, the term limitless-arity has been used to describe this method. Accordingly, LAMP applies a highly efficient search algorithm to quickly and completely derive significant hypotheses from the 2^M candidates.

Let k_D(σ) be the number of all hypotheses whose conditions are satisfied by σ or more objects in D. For a hypothesis whose condition is satisfied by σ objects, the P value of Fisher's exact test cannot fall below

$$ f(\sigma) = \binom{n_p}{\sigma} \Big/ \binom{N}{\sigma}. $$

Here, n_p is the number of objects with positive class labels in D, and α/k_D(σ) is the significance level α after correction for the k_D(σ) candidate hypotheses. Because f(σ) is antimonotonic for σ and α/k_D(σ) is monotonic, LAMP selects σ* to balance f(σ*) and α/k_D(σ*). The selected value of σ* yields the smallest number of candidate hypotheses without applying the tests or missing any significant hypotheses.
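The balance can be made concrete with a brute-force sketch. It reads f(σ) as C(n_p, σ)/C(N, σ), counts the candidate combinations at each support threshold σ, and keeps the largest σ for which f(σ − 1) still exceeds α divided by that count, so that no excluded combination could reach significance. This is one reading of the balance described above, under stated assumptions: the published LAMP algorithm uses an efficient search rather than full enumeration and may differ in detail, and the toy rows and factor names are invented.

```python
from itertools import combinations
from math import comb

def f(sigma, n_pos, n_total):
    """Lower bound on the P value at support sigma: C(n_p, s) / C(N, s)."""
    return comb(n_pos, sigma) / comb(n_total, sigma)

def count_candidates(rows, factors, sigma):
    """k_D(sigma): factor combinations matched by at least sigma rows."""
    count = 0
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            support = sum(1 for row in rows if set(combo) <= row)
            if support >= sigma:
                count += 1
    return count

def select_sigma(rows, factors, n_pos, alpha=0.05):
    """Largest sigma such that f(sigma - 1) > alpha / k_D(sigma)."""
    n_total = len(rows)
    best = 1
    for sigma in range(1, n_pos + 1):
        k = count_candidates(rows, factors, sigma)
        if k == 0 or f(sigma - 1, n_pos, n_total) <= alpha / k:
            break  # the condition fails from here on, since both sides are monotone
        best = sigma
    return best

# Toy data: 12 rows over three hypothetical factors, 6 positives assumed.
rows = [{"a", "b", "c"}] * 5 + [{"a", "b"}] * 2 + [{"a"}] + [set()] * 4
factors = ["a", "b", "c"]
sigma_star = select_sigma(rows, factors, n_pos=6)
```

On this toy table all seven combinations have support of at least 5, while only three reach support 6, and the loop stops at σ = 6 because f(5) = 6/792 no longer exceeds 0.05/3.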

For practical reasons, we were interested in hypotheses that held true for at least 10 people. As all hypotheses involving more than four factors failed to meet this criterion, we limited our LAMP-based search to a maximum of four factors. This limitation further reduced the number k_D(σ*) of candidate hypotheses and increased the level α/k_D(σ*) in LAMP. After all significant hypotheses regarding single clinical factors or combinations of factors were obtained, we excluded each hypothesis whose condition was a superset of the condition of another, simpler hypothesis, because the significance of the former would be trivial in comparison with the significance of the latter.
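The two filters described above (at most four factors with at least 10 matching people, then removal of supersets) can be sketched as follows. The function names and the example conditions are hypothetical; the thresholds come from the text.

```python
def within_limits(condition, support, max_factors=4, min_support=10):
    """True if a condition has at most 4 factors and matches at least 10 people."""
    return len(condition) <= max_factors and support >= min_support

def prune_supersets(significant):
    """Drop every condition that strictly contains another significant condition."""
    conditions = [frozenset(c) for c in significant]
    kept = []
    for cond in conditions:
        # `other < cond` is the proper-subset test on frozensets.
        if not any(other < cond for other in conditions):
            kept.append(cond)
    return kept

# Hypothetical significant conditions, for illustration only.
significant = [
    {"a"},
    {"a", "b"},       # contains {"a"}, so it is dropped
    {"b", "c"},
    {"b", "c", "d"},  # contains {"b", "c"}, so it is dropped
    {"c", "d"},
]
minimal = prune_supersets(significant)
```

Comparing every pair is quadratic in the number of hypotheses, which is acceptable for a sketch but would need an index for very large result sets.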

For the larger cohort of 275,658 people from the general population, the clinical characteristics in 2010 were investigated (Table 1) and the occurrence of HF until 2014 was determined. For each of the 275,658 people, we determined how many of the predictive combinations of factors for the onset of HF obtained in Protocol 1 matched that person's data. To test the idea that the onset of HF is predictable using clinical variables, we tested the hypothesis that the number of predictive combinations of factors matched in 2010 is linked to the actual occurrence of HF over 4 years. In detail, we counted how many of the predictive combinations of factors discovered in Protocol 1 were present in each individual in Protocol 2, and we classified the 275,658 people of Protocol 2 into six groups with 0, 1–50, 51–100, 101–150, 151–200, or 201–250 of these predictive combinations.
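A minimal sketch of this scoring step, assuming each person and each predictive combination is represented as a set of factor names (the names and the example person below are invented):

```python
def count_matches(person_factors, predictive_combinations):
    """Number of predictive combinations fully contained in one person's factors."""
    return sum(1 for combo in predictive_combinations if combo <= person_factors)

def group_label(n_matches):
    """Map a match count to one of the six groups: 0, 1-50, ..., 201-250."""
    if n_matches == 0:
        return "0"
    for low in range(1, 250, 50):  # lower bounds 1, 51, 101, 151, 201
        high = low + 49
        if low <= n_matches <= high:
            return f"{low}-{high}"
    raise ValueError(f"match count outside 0-250: {n_matches}")

# Hypothetical predictive combinations and one hypothetical person.
combos = [
    frozenset({"abnormal_ecg"}),
    frozenset({"smoker", "high_bp"}),
    frozenset({"proteinuria", "high_bp"}),
]
person = {"smoker", "high_bp", "abnormal_ecg"}
n_matches = count_matches(person, combos)
label = group_label(n_matches)
```

This person matches the first two combinations but not the third, so the count of 2 falls in the 1-50 group.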

Death from cardiovascular or non-cardiovascular causes is a potential source of informative censoring in this study; however, we did not analyze these outcomes because the present dataset contains no information on cardiovascular or non-cardiovascular death.

Descriptive statistics of continuous variables are presented as means with standard deviations. We tested the significance levels of all combinations of no more than four clinical mutable variables among the 288 variables observed in the present study, and LAMP automatically created multiple comparisons of 470,700 t-tests. In Protocol 1, we corrected the P values using the Bonferroni correction: we multiplied the P values by 470,700 and used the multiplied P values for statistical analysis. In Protocol 2, since LAMP tests a significant combination on the basis of data binarized to 0 or 1, we used Cochran's Q test, a nonparametric method for multiple comparisons, to test whether the probability of the onset of HF increased as the number of matched predictive combinations of factors increased. We also performed Kaplan–Meier analysis to test the time dependency of the results in Protocol 2. All P values were two-sided, and a P value of <0.05 was considered statistically significant.
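The Bonferroni step in Protocol 1 amounts to simple arithmetic, sketched below. The text only states that P values were multiplied by 470,700; capping the result at 1 is a common convention and an assumption of this sketch, as are the function and constant names.

```python
N_TESTS = 470_700  # number of comparisons reported in the text
ALPHA = 0.05

def bonferroni(p_value, n_tests=N_TESTS):
    """Multiply a raw P value by the number of tests, capped at 1 (assumed)."""
    return min(1.0, p_value * n_tests)

# Equivalent view: the raw P value a combination must beat to stay significant.
raw_threshold = ALPHA / N_TESTS  # about 1.06e-7

adjusted = bonferroni(1e-8)  # 1e-8 * 470,700 = 0.004707, below 0.05
```

Under this correction, a combination remains significant only if its raw P value is below roughly 1.06 × 10^-7.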
