Data Mining Approaches to Suicide and Suicidal Behaviors

Rumi Kato Price, Ph.D., M.P.E.

Project Overview:

What methods can be used to improve prediction of suicide and suicidal behavior and develop efficient methods for detecting very high-risk individuals?

The study seeks to select the most predictive measures from domains of risk and protective factors; to maximize the predictive power of the selected measures from each domain and across domains; to assess the generality of the findings and examine the structure of the associations among the most predictive measures; and to assess the degree of improved performance relative to results obtained from more commonly used statistical models.

This study uses three large public-domain data sets: the National Comorbidity Survey, the National Longitudinal Alcohol Epidemiologic Survey, and the National Mortality Followback Survey. These data sets will be analyzed with genetic algorithms, neural nets, and tree-based regressions. The analysis will select the most predictive measures, maximize the predictive power of the input measures, and interpret the constructed neural nets. The results will be cross-validated to test generality and compared with those of other, more standard methods of analysis.
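The selection step described above can be illustrated with a minimal sketch. It is not the study's implementation. Everything in it is an assumption made for illustration: the data are randomly generated toy values (not drawn from any of the three surveys), the function names `make_toy_data`, `centroid_accuracy`, and `select_features` are hypothetical, and a simple nearest-centroid classifier stands in for the neural net when scoring each candidate set of measures. The sketch shows only the general idea of a genetic algorithm searching over on/off masks of input measures, with cross-validated accuracy as the fitness.

```python
import random


def make_toy_data(rng, n_rows=200, n_features=8, informative=(0, 3)):
    """Synthetic binary-outcome data; only the `informative` columns carry signal."""
    rows, labels = [], []
    for _ in range(n_rows):
        y = rng.randint(0, 1)
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            x[j] += 1.5 * y  # shift informative measures for positive cases
        rows.append(x)
        labels.append(y)
    return rows, labels


def centroid_accuracy(rows, labels, mask, k=5):
    """k-fold cross-validated accuracy of a nearest-centroid classifier
    restricted to the measures switched on in `mask`."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        centroids = {}
        for c in (0, 1):
            members = [rows[i] for i in train if labels[i] == c]
            centroids[c] = [sum(r[j] for r in members) / len(members) for j in cols]
        for i in test:
            dist = {}
            for c in (0, 1):
                dist[c] = sum((rows[i][j] - m) ** 2 for j, m in zip(cols, centroids[c]))
            correct += int(min(dist, key=dist.get) == labels[i])
    return correct / len(rows)


def select_features(rows, labels, rng, pop_size=20, generations=15,
                    mutation_rate=0.1, penalty=0.01):
    """Genetic algorithm over bit masks; fitness is cross-validated accuracy
    minus a small penalty per selected measure."""
    n = len(rows[0])

    def fitness(mask):
        return centroid_accuracy(rows, labels, mask) - penalty * sum(mask)

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [ranked[0]]             # elitism: keep the current best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)    # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


rng = random.Random(0)  # fixed seed so the toy run is repeatable
rows, labels = make_toy_data(rng)
mask, score = select_features(rows, labels, rng)
print("selected measures:", [j for j, bit in enumerate(mask) if bit])
print("penalized cross-validated accuracy:", round(score, 3))
```

The per-measure penalty is one simple way to favor small sets of measures; a real analysis would need to choose the classifier, the fitness, and the validation scheme to suit the survey data.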

Final Report:

We proposed to improve prediction of suicide and suicidal behavior by employing computer-intensive “data-mining” techniques, focusing on genetic algorithms (GA), artificial neural networks (ANN), and tree-based regression (TBR). Although suicide is now considered preventable, predicting who will commit or attempt suicide, and when such an act will occur, remains in the realm of intuition among clinicians or family members, which makes targeted and economical prevention difficult. So-called data-mining techniques can overcome limitations inherent in more traditional parametric approaches. The three techniques proposed in this application are complementary, and their combined use maximizes the strengths of each. We proposed to use them to: a) select the most predictive measures of suicide and suicidal behavior using GA; b) examine the patterns of interaction among the most predictive measures chosen by GA; c) maximize the predictive power of the selected measures using ANN and compare the results with those obtained by other methods; and d) examine the structure of associations among the most predictive measures using the information stored in the trained ANNs.
