APPLICATION OF k-NEAREST NEIGHBOUR CLASSIFICATION IN MEDICAL DATA MINING IN THE CONTEXT OF KENYA
Abstract
Medical data, in the form of patient records, is an ever-growing source of information from hospitals. When mined, the information hidden in these records is a valuable resource for medical research. The data contains hidden patterns and relationships that can lead to better diagnosis, but these patterns and relationships often go undiscovered and unexploited. Studies in medical diagnosis have used past patient data to predict heart diseases, lung diseases, and various tumors. However, they are mostly limited to domain-specific systems that predict only the diseases within their own area of operation. The performance of the k-nearest neighbour (k-NN) classifier depends heavily on the distance metric used to identify the k nearest neighbours of a query point; the standard Euclidean distance is commonly used in practice. This study draws on a large store of historical patient information so that diagnoses can be based on past data. It focuses on computing the probability of occurrence of a particular ailment using a k-NN algorithm, which increases the accuracy of such diagnosis. The algorithm can be used to enhance automated diagnosis, including the diagnosis of multiple diseases that show similar symptoms. To validate the experimental results, a hypothesis was tested for the following variables: accidents, age, allergies, blood pressure, smoking habit, total cholesterol, diabetes and hypertension, family history of heart disease, obesity, and lack of physical activity. The results showed a strong relationship between these variables and the causes of common chronic diseases such as heart ailments, diabetes, and cancer.
Key words: k-NN, classification, algorithm
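
The classification step described in the abstract can be illustrated with a minimal sketch. It is not the study's own implementation: the function names (euclidean, knn_probabilities, knn_predict), the three scaled features, the class labels, and all data values below are hypothetical and chosen only for illustration. The sketch finds the k nearest records under the standard Euclidean distance and estimates the probability of each ailment as the share of those neighbours carrying that label.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Standard Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_probabilities(train, query, k=3):
    """Estimate class probabilities as label shares among the k nearest records.

    train is a list of (features, label) pairs; query is a feature vector.
    """
    neighbours = sorted(train, key=lambda row: euclidean(row[0], query))[:k]
    counts = Counter(label for _, label in neighbours)
    return {label: n / k for label, n in counts.items()}


def knn_predict(train, query, k=3):
    """Return the label with the highest estimated probability."""
    probs = knn_probabilities(train, query, k)
    return max(probs, key=probs.get)


# Hypothetical records: (age, blood pressure, total cholesterol), each already
# scaled to [0, 1] so that no single variable dominates the distance.
train = [
    ((0.20, 0.30, 0.20), "no_disease"),
    ((0.30, 0.20, 0.30), "no_disease"),
    ((0.25, 0.35, 0.25), "no_disease"),
    ((0.80, 0.90, 0.80), "heart_ailment"),
    ((0.70, 0.80, 0.90), "heart_ailment"),
    ((0.90, 0.70, 0.70), "heart_ailment"),
]
query = (0.75, 0.85, 0.80)

print(knn_predict(train, query, k=3))
print(knn_probabilities(train, query, k=5))
```

Because Euclidean distance sums squared differences across all variables, features on larger numeric scales would otherwise dominate the neighbour search, which is why the sketch assumes the inputs have been scaled beforehand.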