

Showing 3 results for Machine Learning

Mahsa Saadati, Arezoo Bagheri
Volume 7, Issue 3 (9-2019)
Abstract

Background and objectives: The application of statistical machine learning methods, such as ensemble-based approaches, to time-to-event data in survival analysis has received considerable interest over the past decades. One such practical method is the survival forest, which has been developed in a variety of contexts owing to its high precision and its non-parametric, non-linear nature. This article evaluates the performance of survival forests by comparing them with the Cox proportional hazards (CPH) model in a study of the first birth interval (FBI).
Methods: A cross-sectional study was conducted in 2017, using stratified random sampling and a structured questionnaire to gather information from 610 married women aged 15-49 years in Tehran. Considering several covariates that influence FBI, a random survival forest (RSF) and a conditional inference forest (CIF) were constructed with bootstrap sampling (1000 trees) using R packages. The best model was then used to identify important predictors of FBI through variable importance (VIMP) and minimal depth measures.
Results: According to prediction accuracy measured by the out-of-bag (OOB) C-index and the integrated Brier score (IBS), RSF outperformed CPH and CIF in analyzing FBI (C-index: 0.754 for RSF vs 0.688 for CIF and 0.524 for CPH; IBS: 0.076 for RSF vs 0.086 for CIF and 0.107 for CPH). Woman’s age was the most important predictor of FBI.
Conclusions: Applying a suitable method to the analysis of FBI yields reliable results that can be used in making policies to counter the decline in the total fertility rate.
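
The C-index reported in the abstract above measures how often a model ranks pairs of subjects in the correct order of their event times, with 0.5 corresponding to chance and 1.0 to perfect ranking. The following is a minimal sketch of Harrell's C-index for right-censored data, not the authors' R implementation; the function name, the toy data, and the choice to skip pairs with tied times are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times       -- observed time for each subject
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk (earlier event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that subject `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped. A pair is also
        # unusable when the shorter time is censored, because the true
        # ordering of the two event times is then unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # higher risk went with the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.5, 0.1]
c_index = concordance_index(toy_times, toy_events, toy_risk)
```

In this toy example the predicted risks decrease as the observed times increase, so every comparable pair is concordant.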

Hediye Shariaty, Fatemeh Bagheri
Volume 13, Issue 1 (9-2025)
Abstract

Background: Diabetes is a prevalent condition with no definitive cure, often referred to as a “silent killer.” Diabetes is primarily categorized into three types: Type I, Type II, and gestational diabetes. In Type I diabetes, the body's immune system attacks and damages the insulin-producing cells. Conversely, Type II diabetes, which is more common than Type I, occurs when the body does not respond adequately to the insulin being produced, resulting in elevated blood sugar levels. Effectively treating pre-diabetes can prevent its progression to full-blown diabetes.
Methods: In the present research, a semi-supervised approach is proposed to predict diabetes. Improved missing value imputation (MVI) is achieved by utilizing Gaussian mixture model (GMM) clustering. The proposed classifier integrates GMM with a machine learning algorithm, specifically random forest (RF), thereby inducing a more robust predictive model via the fusion of clustering and classification techniques.
Results: The proposed method achieves an accuracy of 84%, a precision of 82.03%, a recall of 69.75%, and an F1-score of 75.12%, based on experiments conducted on the PIMA Indian population.
Conclusion: Employing GMM to fill in missing values provides the advantage of replacing invalid data with the most similar records, thereby enhancing the quality of the dataset. The proposed classifier also exhibits strong predictive capabilities in identifying diabetes. By integrating this combined approach, this study offers an effective method for predicting diabetes, making a significant contribution to healthcare analytics as a whole.
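
For readers less familiar with the metrics reported in the abstract above, the sketch below shows how accuracy, precision, recall, and F1-score follow from the four cells of a binary confusion matrix. The function name and the counts are hypothetical and are not taken from the study; the sketch also assumes that every denominator is non-zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Binary classification metrics from confusion-matrix counts.

    tp, fp, fn, tn -- true positives, false positives,
                      false negatives, true negatives
    Assumes at least one predicted positive and one actual positive,
    so that precision and recall are defined.
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)   # share of predicted positives that are correct
    recall = tp / (tp + fn)      # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the study's data).
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=130)
```

Because F1 is the harmonic mean of precision and recall, it is pulled toward the lower of the two, which is why a model with high precision but modest recall reports an F1-score between them.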

Mina Rahmati, Masoud Arabfard
Volume 13, Issue 1 (9-2025)
Abstract

Background: Stroke is a leading cause of disability and mortality worldwide, with ischemic strokes comprising the majority of cases. Despite advances in neuroimaging, there is a pressing need for supplementary diagnostic tools to enhance accuracy. This study explores the application of machine learning (ML) techniques to predict ischemic stroke using RNA-seq data from the GEO database (GSE22255).
Methods: We developed and evaluated various machine learning models, including Random Forest, K-Nearest Neighbors (KNN), and CHAID (Chi-squared Automatic Interaction Detection), based on their accuracy, precision, specificity, and sensitivity. The analysis utilized a dataset comprising 54,676 genes across 40 samples (20 cases and 20 controls). All modeling was conducted using IBM SPSS Modeler version 18.
Results: The models were assessed based on their classification accuracy, performance evaluation scores, and AUC/Gini AUC metrics. The Random Forest model achieved the highest accuracy (96.67% in training, 80% in testing), while the CHAID algorithm provided interpretable results with key variables (TP53, CYP1A1, and CYP2D6) identified. The KNN model exhibited strong performance with notable confidence in its predictions.
Conclusion: This study demonstrates the potential of ML techniques, particularly Random Forest, to enhance stroke diagnosis and provide insights into stroke pathology, offering a novel approach to improving clinical decision-making. However, the study is limited by the small sample size, and future work should focus on validation with larger datasets and integration with other omics data for clinical application.
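
The abstract above mentions AUC and Gini metrics. As a rough illustration, the AUC can be computed as the fraction of case-control pairs in which the case receives the higher score, and the Gini coefficient is commonly defined as 2 × AUC − 1. The sketch below is not the output of IBM SPSS Modeler; the function names and the toy scores are assumptions.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative.

    labels -- 1 for cases, 0 for controls
    scores -- model scores, higher meaning more likely a case
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # case ranked above control
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def gini_from_auc(auc):
    """Gini coefficient under the common definition 2 * AUC - 1."""
    return 2 * auc - 1


# Hypothetical scores for two cases and two controls.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(toy_labels, toy_scores)
gini = gini_from_auc(auc)
```

With only a handful of test samples, as in a 40-sample dataset, each misranked pair shifts the AUC noticeably, which is consistent with the abstract's caution about sample size.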



© 2025 CC BY-NC 4.0 | Jorjani Biomedicine Journal
