Classifying Allergic Rhinitis Subjects and Identifying Single Nucleotide Polymorphisms Using a Support Vector Machine Approach
by Jason Chan
Abstract – Allergic rhinitis is a common respiratory disease that affects a large proportion of the population and is associated with lost work productivity and economic costs. Allergic rhinitis is known to have a genetic component, so we aimed to identify associated single nucleotide polymorphisms (SNPs) using a novel support vector machine (SVM) method. We obtained genetic data from a publicly available database and one-hot encoded the SNP data. We then stored the encoded data as sparse matrices to reduce random access memory (RAM) usage and trained an SVM to classify individuals by allergic rhinitis status and to identify key SNPs. Our model achieved moderately high accuracy and macro F1 score and identified 736 genome-wide significant SNPs. Further analysis of these SNPs revealed a single gene shared by many of the identified allergic rhinitis-associated SNPs. This study advances understanding of the genetic basis of allergic rhinitis and introduces the use of SVMs for analyzing its genetic determinants.
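To make the pipeline summarized above more concrete, the following is a minimal sketch of one way to one-hot encode genotype calls into a sparse representation, train a linear SVM, and rank SNP features. It is not the method used in this study: the abstract does not specify the encoding, kernel, solver, or how significance was assessed. The sketch assumes genotype calls are strings such as "AA", "AG", "GG", a linear kernel, and the Pegasos stochastic subgradient solver (a swapped-in technique chosen so the example needs only the Python standard library). Ranking features by absolute weight is a common heuristic used here only as a hypothetical stand-in for SNP selection. All function names and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

# A sparse row maps feature index -> value; absent indices are implicitly 0.
SparseRow = Dict[int, float]
FeatureName = Tuple[int, str]  # hypothetical naming: (SNP position, genotype call)


def one_hot_encode(
    genotypes: Sequence[Sequence[str]],
) -> Tuple[List[SparseRow], List[FeatureName]]:
    """One-hot encode genotype calls: each (SNP, call) pair seen becomes one
    binary feature. Only nonzero entries are stored, so each sample keeps one
    entry per SNP instead of one entry per possible genotype."""
    feature_index: Dict[FeatureName, int] = {}
    rows: List[SparseRow] = []
    for sample in genotypes:
        row: SparseRow = {}
        for snp, call in enumerate(sample):
            key = (snp, call)
            if key not in feature_index:
                feature_index[key] = len(feature_index)
            row[feature_index[key]] = 1.0
        rows.append(row)
    # Dicts preserve insertion order, so position j in this list is feature j.
    return rows, list(feature_index)


def train_linear_svm(
    rows: List[SparseRow], labels: List[int], lam: float = 0.01,
    epochs: int = 50, seed: int = 0,
) -> Dict[int, float]:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style
    stochastic subgradient descent. Labels must be -1 (control) or +1 (case).
    No separate bias term: each SNP contributes exactly one active feature
    per sample, so a constant offset can be absorbed into the weights."""
    rng = random.Random(seed)
    w: Dict[int, float] = {}
    order = list(range(len(rows)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x, y = rows[i], labels[i]
            margin = y * sum(w.get(j, 0.0) * v for j, v in x.items())
            for j in w:  # shrink weights: gradient step on the L2 penalty
                w[j] *= 1.0 - eta * lam
            if margin < 1.0:  # hinge loss active: step toward y * x
                for j, v in x.items():
                    w[j] = w.get(j, 0.0) + eta * y * v
    return w


def predict(w: Dict[int, float], row: SparseRow) -> int:
    """Return +1 (case) or -1 (control) from the sign of the linear score."""
    score = sum(w.get(j, 0.0) * v for j, v in row.items())
    return 1 if score >= 0 else -1


def top_features(
    w: Dict[int, float], names: List[FeatureName], k: int,
) -> List[Tuple[FeatureName, float]]:
    """Rank features by absolute weight (a heuristic, not a significance test)."""
    ranked = sorted(w, key=lambda j: abs(w[j]), reverse=True)[:k]
    return [(names[j], w[j]) for j in ranked]


if __name__ == "__main__":
    # Synthetic toy data: SNP 1 alone determines case status; SNPs 0 and 2 are noise.
    rng = random.Random(7)
    calls = ["AA", "AG", "GG"]
    genotypes, labels = [], []
    for _ in range(40):
        causal = rng.choice(["AA", "GG"])
        genotypes.append([rng.choice(calls), causal, rng.choice(calls)])
        labels.append(1 if causal == "GG" else -1)
    rows, names = one_hot_encode(genotypes)
    w = train_linear_svm(rows, labels)
    acc = sum(predict(w, r) == y for r, y in zip(rows, labels)) / len(rows)
    print(f"training accuracy: {acc:.2f}")
    print(top_features(w, names, k=3))
```

For example, `one_hot_encode([["AA", "AG"], ["GG", "AG"]])` returns the rows `[{0: 1.0, 1: 1.0}, {2: 1.0, 1: 1.0}]` with feature names `[(0, "AA"), (1, "AG"), (0, "GG")]`: the shared "AG" call at SNP 1 maps to the same feature in both samples. At genome scale, a compiled solver such as scikit-learn's `LinearSVC` applied to a SciPy CSR matrix would be the more practical choice; the dictionary-based version above only illustrates the idea.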