DEVELOPMENT OF DATA MINING METHODOLOGIES AND MACHINE LEARNING MODELS TO UNDERSTAND CARDIOVASCULAR DISEASE MECHANISMS
Abstract
The World Health Organization (WHO) reported that in 2016, 31% (17.9 million) of
all deaths worldwide were caused by Cardiovascular Disease (CVD), and it is
estimated that around 23.6 million people will die from CVD in 2030. In the
coming years, this disease will cause millions more deaths, and its diagnosis
and treatment will cost billions of dollars. Coronary Artery Disease (CAD), a
sub-category of CVD, is the inability to supply the heart with blood as a
result of the accumulation of fatty matter, called atheroma, on the walls of the
arteries. With the development of machine learning and data mining techniques,
it has become possible to diagnose CVD, especially CAD, at a lower cost by
checking a set of physical and biochemical values. To this end,
in this thesis, different computational feature selection (FS) methods,
dimension reduction techniques, and classification algorithms are evaluated for
the CVD diagnosis problem, and a domain knowledge-based FS method, an ensemble
FS method, and a probabilistic FS method are proposed. Through experiments on
two publicly available data sets, UCI Cleveland and Z-Alizadeh Sani, this
thesis aims to build a robust model for diagnosing CVD at a lower cost. In our
experiments, the proposed solution achieved 91.78% accuracy and 93.50%
sensitivity on the diagnostic tests.
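
For reference, the two reported metrics can be computed from a binary
confusion matrix as in the minimal sketch below. The function names and the toy
labels are illustrative assumptions; they are not taken from the thesis or from
either data set. Label 1 is assumed to denote the presence of disease.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Fraction of all cases that are classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of diseased cases that are detected (true positive rate)."""
    return tp / (tp + fn)


# Hypothetical labels for ten patients (1 = disease, 0 = healthy).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"accuracy={accuracy(tp, tn, fp, fn):.2f}, sensitivity={sensitivity(tp, fn):.2f}")
```

Sensitivity matters in a diagnostic setting because a false negative means a
diseased patient goes undetected, which accuracy alone does not reveal.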