The IYRC
  • Home
  • 2023 Conference
    • Submission & Presentation Guidelines
    • Paper Submission
    • FAQ
  • 2023 Summer Program
    • About
    • Application
    • Pre-orientation Program
  • Past Conferences
    • IYRC Fall 2022
    • IYRC Fall 2021 >
      • Authors
      • IYRC Fall 2021 Proceedings
      • Schedule
      • Keynote Speakers
    • IYRC Spring 2021 >
      • Authors
      • Scholarship
      • Schedule
      • Keynote Speakers
    • IYRC 2020 >
      • IYRC 2020 Proceedings
      • Guest Speakers
      • Events >
        • HCJI Panel
        • Guest Speaker - Molly Edwards
        • Guest Speaker - Paul Lewis
    • IYRC 2019 Proceedings
    • IYRC 2018 Proceedings
  • Past Summer Programs
    • IYRC Summer 2022
    • IYRC Summer 2021
  • About
    • Programs
    • Partnerships
    • Who we are
    • Gallery
    • Testimonials
  • Contact Us

Escalating the Quantity of Medical Data Using CTGAN: Diabetes Dataset

by Jihyung Kim
Category: STEM
Abstract – The number of diabetes diagnoses is increasing sharply in the United States. Diabetes is a lifelong disease that can cause serious symptoms such as blurred vision. Collecting medical data requires consent forms and involves complicated procedures, which makes large datasets hard to obtain. The Conditional Tabular Generative Adversarial Network (CTGAN) can help solve this problem. A GAN is a deep learning model that generates synthetic data; CTGAN follows a very similar procedure but is designed for tabular data. We measured how closely the synthetic data matched the real data using various machine learning and deep learning models. Logistic Regression (LR), Decision Tree (DT), KNN, Gradient Boosting (GB), Light Gradient Boosting Machine (LGBM), Support Vector Classifier (SVC), Gaussian, and Deep Neural Network (DNN) scored 40.55%, 38.1%, 44.5%, 39%, 35.35%, 44.05%, 53.65%, 39.25%, and 34.2%, respectively. We then applied GridSearch to two models: Random Forest (RF) and LGBM. RF was slightly more accurate at 77.85%, while LGBM reached 76.65%. Next, we created a new dataset combining the synthetic data with a small amount of real data. When we compared the new dataset with the pure real data, the accuracy scores of all models almost doubled, to 100%, 77.3%, 88.5%, 93.9%, 93.05%, 88%, 93.9%, 75.05%, and 65.8%, respectively. Although we had to modify the model to reach a satisfactory result, CTGAN can become a very useful tool for researchers who need a large amount of data.
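The evaluation idea in the abstract (train a classifier on synthetic rows, score it on real rows, then repeat after mixing in a small amount of real data) can be illustrated with a minimal sketch. This is not the author's code or pipeline: the data is a toy one-feature dataset, `make_rows` is a hypothetical stand-in for both the diabetes table and the CTGAN sampler, the noisy labels merely imitate imperfect synthetic data, and the KNN classifier is a simplified pure-Python version. The row counts and the 50-row real sample are assumptions chosen for illustration.

```python
import random

def make_rows(n, label_noise, rng):
    """Hypothetical stand-in generator: n (feature, label) rows.
    label_noise is the probability that a row's label is flipped."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        feature = rng.gauss(2.0 * label, 1.0)  # class 1 is shifted to the right
        if rng.random() < label_noise:
            label = 1 - label
        rows.append((feature, label))
    return rows

def knn_predict(train, x, k=5):
    """Majority vote among the k training rows nearest to x (k is odd)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, test, k=5):
    """Fraction of test rows whose label the KNN vote predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)

rng = random.Random(0)

# "Real" data: clean labels, split into a training pool and a held-out test set.
real = make_rows(300, 0.0, rng)
real_train, real_test = real[:200], real[200:]

# "Synthetic" data: assumed stand-in for CTGAN samples, with noisier labels.
synthetic = make_rows(500, 0.3, rng)

# Mixed dataset: all synthetic rows plus a small sample of real training rows.
mixed = synthetic + rng.sample(real_train, 50)

acc_synthetic = accuracy(synthetic, real_test)
acc_mixed = accuracy(mixed, real_test)
print(f"trained on synthetic only: {acc_synthetic:.2%}")
print(f"trained on synthetic + some real: {acc_mixed:.2%}")
```

In a real pipeline the synthetic rows would come from a trained CTGAN model rather than from `make_rows`, and the numbers printed here say nothing about the accuracies reported in the abstract.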
Contact us: info@the-iyrc.org