Cyril Grouin is the author of 'A Combined Approach to Emotion Detection in Suicide Notes', recently published in the Biomedical Informatics Insights Supplement. We asked Dr Grouin to tell us about the background of this paper.
To start please tell us about the challenge this year. Why did you decide to become involved, and what goals did you and/or your team expect to accomplish?
In our team, we consider evaluation, in particular the evaluation paradigm as instantiated in open collaborative evaluation campaigns working on common data, to be a powerful means of supporting research. Evaluation campaigns speed up the exchange of information, help establish a common ground for comparing results, and provide a powerful incentive for researchers to reach beyond the accepted average performance level or to address new areas. They also help build bridges between different research communities, thus fostering interdisciplinary research.
In addition to organizing evaluation campaigns, our team also participates regularly in evaluation campaigns in natural language processing and bioinformatics. Our goal for this participation was to test our algorithms on real data with human-validated gold standards, trying to achieve the best possible results. The exploratory nature of the task (suicide note analysis) and its potential development into a tool to help diagnose the risk of repeated suicide attempts were extra incentives for us. Working on a parser is much more rewarding when you know that one day it may contribute to saving a life.
In writing this paper what were the particular challenges you faced? How did you overcome these challenges?
The hardest challenge we faced was the noisiness of the training data and the lack of information about the annotation model that had been used to build the gold standard. The participants in the campaign had only the training data available: no annotation guidelines, and no kappa statistics about the annotations. Furthermore, the different types of annotation varied widely in size in the training data; for the less represented annotations, we did not have enough information to build a reliable model. We decided to turn this disadvantage to our advantage by precisely describing and mapping the noisiness of the data, thus establishing clear scientific grounds on which to assess the merits of our approach.
What has been the major benefit for you in the work discussed in your article? How has it contributed to our knowledge of the area?
The major benefits are several, among them:
Access to gold-standard annotations in opinion mining and sentiment analysis. Annotated data is very scarce and difficult to obtain, so participating in this evaluation was an opportunity to gain access to rare reference data.
Joining the group of people working on suicide notes at its creation, as it was the very first campaign on the topic.
Testing our approach on real data with a real application in mind.
As many of the articles appearing in the supplement are quick to acknowledge, suicide is a distressingly common cause of death, particularly among younger people. Has this work changed your view of suicide: do you find yourself more or less understanding of, or sympathetic toward, people who commit suicide and those they leave behind?
Particularly at the beginning of the campaign, we were struck by the reality of the tragedies reported in the notes when we started opening the files and looking at the material we were supposed to analyse. What was most striking was the everyday-life detail present in the notes. This made us realize that suicide can happen to anybody, and that it can be an act committed by people who give every sign of complete recovery and well-being; it is not, as people generally think, restricted to those who show signs of extreme distress.