APPLICATION OF k-NEAREST NEIGHBOUR CLASSIFICATION IN MEDICAL DATA MINING IN THE CONTEXT OF KENYA

H S Khamis

Abstract


Medical data, in the form of patient records, is an ever-growing source of information from hospitals. When mined, the information hidden in these records becomes a valuable resource for medical research. These data contain hidden patterns and relationships that can lead to better diagnosis. Unfortunately, these patterns and relationships often remain undiscovered and unexploited. Previous studies in medical diagnosis have predicted heart disease, lung disease and various tumours from data collected from past patients. However, they are mostly limited to domain-specific systems that can only predict diseases within their own area of application. In addition, the performance of the k-nearest neighbour (k-NN) classifier depends heavily on the distance metric used to identify the k nearest neighbours of a query point; the standard Euclidean distance is commonly used in practice. This study draws on a large store of historical patient data so that diagnoses can be based on past cases. It focuses on computing the probability of occurrence of a particular ailment using a k-NN-based algorithm, which increases the accuracy of such diagnoses. The algorithm can be used to improve automated diagnosis, including the diagnosis of multiple diseases that present similar symptoms. To validate the experimental results, a hypothesis was tested for the following variables: accidents, age, allergies, blood pressure, smoking habit, total cholesterol, diabetes and hypertension, family history of heart disease, obesity, and lack of physical activity. The results showed a strong relationship between these variables and the causes of common chronic diseases such as heart ailments, diabetes and cancer.
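To illustrate the general k-NN idea described above, here is a minimal Python sketch, assuming numeric patient features already rescaled to the range [0, 1]. It uses the standard Euclidean distance and estimates the probability of each ailment as the fraction of the k nearest past cases that carry that diagnosis. The feature layout, the labels, the function names (`euclidean`, `knn_probabilities`, `knn_classify`) and the sample records are all hypothetical; this is not the study's own algorithm or data.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: each patient is a vector of numeric risk factors
# already scaled to [0, 1], e.g. (age, blood_pressure, smoker_flag).
Record = Sequence[float]


def euclidean(a: Record, b: Record) -> float:
    """Standard Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_probabilities(train: List[Tuple[Record, str]], query: Record, k: int) -> Dict[str, float]:
    """Estimate P(label | query) as the share of the k nearest records with that label."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of training records")
    # sorted() is stable, so records at equal distance keep their original order.
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    counts = Counter(label for _, label in neighbours)
    return {label: n / k for label, n in counts.items()}


def knn_classify(train: List[Tuple[Record, str]], query: Record, k: int) -> str:
    """Return the most probable label; ties go to the label whose nearest case is closest."""
    probs = knn_probabilities(train, query, k)
    return max(probs, key=probs.get)


# Invented illustrative records (not data from the study).
train = [
    ((0.9, 0.8, 1.0), "heart"),
    ((0.8, 0.9, 1.0), "heart"),
    ((0.2, 0.1, 0.0), "healthy"),
    ((0.1, 0.2, 0.0), "healthy"),
    ((0.3, 0.2, 1.0), "healthy"),
]
query = (0.85, 0.75, 1.0)

probs = knn_probabilities(train, query, k=3)
label = knn_classify(train, query, k=3)
print(probs, label)
```

Because Euclidean distance sums squared differences, features on larger numeric scales (for example raw cholesterol values next to a 0/1 smoking flag) would dominate the neighbour search unless they are rescaled first. This is one concrete reason why the choice of distance metric and preprocessing affects k-NN performance.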

Key words: k-NN, classification, algorithm

