Media Reporting and Suicide

Natural language processing evaluation and monitoring

Dr. Ray, Yu Cheng Hsu

Overview

  • Project Task
  • Project Background
    • Suicide
    • Suicide reporting and its impact
  • Key Technical Skills
    • Overview of natural language processing
    • Large language model (LLM)
    • Instructing LLM
    • Chain of thought

Project task


Does a machine learning model have the potential to evaluate suicide news reports?

  • Specific Tasks
  1. Coding Hong Kong local suicide news reports
  2. Developing an LLM prompt to examine whether an LLM can evaluate suicide news reporting style

Suicide


  • Global Statistics

    • ~800,000 deaths annually (WHO, 2023)
    • Suicide is the 4th leading cause of death among 15-29-year-olds
  • Ideation-to-Action Framework

    Klonsky and May (2015) proposed a framework that explains the progression from ideation (suicidal thoughts) to action (suicidal behavior).

  • Key Risk Factors

    • Psychological factors: Depression, Schizophrenia
    • Socioeconomic factors: Relationships, access to lethal means
    • Biological factors: physical illness (e.g. cancer)

Biological perspective of suicide

  • Control of Impulsive Behavior
    • Associated with the prefrontal cortex (Mann 2002).
  • Heritability: twin studies suggest genetic influences on suicidal behavior (Pedersen and Fiske 2010)

Implications for suicide prevention

From a population-level perspective, interventions targeting the whole community usually focus on the following aspects:

  1. Means control
  2. Health promotion

Discussion

Have you seen any suicide prevention efforts in your daily life?

Overview of suicide in Hong Kong

Suicide rate in Hong Kong

  • Hong Kong’s age-standardized suicide rate: 9.5 per 100,000 people
  • The historical peak was in 2003 during the SARS outbreak and financial crisis
  • Heterogeneity within subgroups

Hong Kong suicide rate


Media Reporting on Suicide

Media Impact on suicide

  • Arguments on Media Effects

Media reporting can influence suicidal behavior through imitation, protection, or neutrality.

There are arguments about different effects:

  • Papageno effect (Positive effect)
  • No effect
  • Werther effect (Negative effect)

Papageno effect

  • Definition Media stories of coping, hope, and recovery can reduce suicidal ideation and encourage help-seeking.

  • Evidence

    • Niederkrotenthaler et al. (2010)
      • Short-term decrease in suicides
      • Reporting with coping and encouraging content
    • Carmichael and Whitley (2019)
      • Reporting with treatment information
  • Counterarguments A systematic review (Sisask and Värnik 2012) does not confirm the effect.

Werther effect

  • Definition Increased suicides following detailed, sensationalist media reports, especially of celebrities. This promotes imitation via social learning theory.

  • Evidence

    • Promoting suicide methods: Chan et al. (2005), P. Yip et al. (2022)
    • Imitating celebrities: Hegerl et al. (2013), P. S. Yip et al. (2006)

Debating the Effects

  • What is the Effect?
    • Can we establish a causal relationship between media reporting and suicide?
    • Individual studies show positive or negative effects, but what can we conclude overall?

Discussion

  • Have you read any local suicide news?
  • What did you feel after reading it?
  • What have you learned from the news?

Responsible reporting style

WHO guidelines on ethical reporting

The news media’s impact on suicide depends on how it is reported. A responsible reporting style is needed.

  • WHO (2023) developed guidelines for good reporting styles.

Dos

  • Provide
    • helpful information
    • facts on suicide and prevention efforts
  • Acknowledge that reporters may be affected
  • Report how to cope and seek help
  • Exercise caution with
    • Celebrity suicides
    • Interviewing bereaved family and friends

WHO responsible reporting guideline

Don’ts

  • Content
    • Methods
    • Details
  • Style
    • Over-simplifying reasons
  • Language use
    • Sensational language
    • Romanticizing
  • Additional content
    • Visual and audio content
    • Social media links

WHO responsible reporting guideline

Assessing the Quality of News Reporting

  • John et al. (2014) developed a scale (PRINTQUAL) to quantitatively evaluate the quality of reporting suicide news
  • Jarvis et al. (2025) proposed a web version for online news reporting
  • Challenges in the online media era
    • Different modalities
    • Social media
    • Comment threads

Natural language processing


Definition A field using computers to understand and analyze human languages.

You might also hear the term computational linguistics, which uses quantitative approaches to study human language; most of the time the two terms are interchangeable.

Discussion

If a natural language processing task falls within the scope of traditional machine learning (or even deep learning), why is it special enough that it is usually discussed separately?

Scope of NLP: We can still use the machine learning taxonomy to categorize tasks:

  • Unsupervised learning
    • Encoding language
    • Ranking/Searching documents
  • Supervised learning
    • Classification of parts of speech, sentiment, etc.
  • Generative models
    • Machine translation
    • Text-to-speech
    • Chatbots

Why modeling text is so different

  • Modeling perspective
    • Dynamic input shape
    • Context awareness
    • Training time
    • Evaluation
  • Data perspective
    • Ambiguity
    • Dirty data

For the rest of the slides, we will discuss NLP in these three aspects

  1. Data processing
  2. Modeling
  3. Evaluation

Our whole universe was in a regression

We start from a regression setting, and we define the input and outcome pair \((x_i,y_i)\) with total \(n\) observations, where

  • \(x_i = [x_{i1}\dots x_{ip}]\) : feature vector of size \(p\), and
  • \(y_i\) : the outcome:
    • \(y_i \in \{0,1\} \rightarrow\) logistic regression, other classification methods
      • Sentiment analysis
      • Spam detection
      • Language identification
    • \(y_i \in \mathbb{R} \rightarrow\) other GLMs
      • Product rating

Fantastic Features and where to find them

An intuitive way is to set the \(p\)-th feature of document \(i\) as

\[ x_{ip} = \begin{cases} 1 & \text{if document } i \text{ contains the word "XXXX"} \\ 0 & \text{if document } i \text{ does not contain the word "XXXX"} \end{cases} \]

There are also other alternatives, e.g. counts:

\[ x_{ip} \in \mathbb{Z}_{\ge 0}: \text{ the number of times "XXXX" occurs in document } i \]
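To make this concrete, here is a minimal sketch of both feature designs using scikit-learn (assumed installed); CountVectorizer(binary=True) yields the 0/1 indicators, the default yields counts.

```python
# Bag-of-words features: binary indicators vs. raw counts.
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "suicide prevention hotline offers help",
    "hotline campaign offers help and hope",
]

binary_vec = CountVectorizer(binary=True)  # x_ip in {0, 1}
count_vec = CountVectorizer()              # x_ip in {0, 1, 2, ...}

X_binary = binary_vec.fit_transform(docs)  # sparse n-by-p matrix
X_count = count_vec.fit_transform(docs)

print(binary_vec.get_feature_names_out())
print(X_binary.toarray())
```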

Discussion

  • What is the implied distribution of \(x_{ip}\)?
  • What are the limitations of the regression approach?

Then neural network expansion started. Wait..

The key questions at this moment lie in two aspects:

  • Understanding the context
    • Recurrent neural network (RNN)
    • Long-short term memory (LSTM)
    • Transformer
    • LLM
  • Better feature engineering
    • Autoencoder
    • Multi-layer perceptron (MLP)
    • Convolutional neural network (CNN)

Multi-layer perceptron (MLP)

Adjustment of input

  • Treat text as a time series \[ x_{ijp} = 1 \text{ if the } j\text{-th word of document } i \text{ is the dictionary word at index } p \]

  • We need to create a dictionary index \(\{p:"XXX"\}\)

stateDiagram 
i : Input text

state i{
T0 : Text A1
T1 : Text A2
}
r : Hidden layers
state r{
RNN1 : Node1
RNN2 : Node2
RNN3 : Node3
RNN4 : Node4

}
o : Output layers

state o{
o1 : Output layer 1
o2 : Output layer 2
}
 T0 --> RNN1
 T1 --> RNN1
 T0 --> RNN2
 T1 --> RNN2
 T0 --> RNN3
 T1 --> RNN3
 T0 --> RNN4
 T1 --> RNN4
 RNN1 --> o1
 RNN2 --> o1
 RNN3 --> o1
 RNN4 --> o1
 RNN1 --> o2
 RNN2 --> o2
 RNN3 --> o2
 RNN4 --> o2

RNN and LSTM

  • What MLP and CNN did not achieve
    • Context awareness
    • A manageable amount of parameters
  • Solution
    • Ask nodes to memorize context
    • Use this memory as an additional input

stateDiagram 
i : Input text

state i{
T0 : Text A1
T1 : Text A2
T2 : Text A3
}
r : Hidden layers
state r{
RNN1 : Node1
RNN2 : Node2
RNN3 : Node3
}
o : Output layers

state o{
o1 : Output layer 1
o2 : Output layer 2
}

 T0 --> RNN1
 T1 --> RNN2
 T2 --> RNN3

 RNN1 --> o1
 RNN2 --> o1
 RNN3 --> o1

 RNN1 --> o2
 RNN2 --> o2
 RNN3 --> o2

 RNN1 --> RNN2
 RNN2 --> RNN3


Attention, transformer and LLM

  • The attention mechanism was popularized by the Transformer, introduced in Vaswani et al. (2017), Attention Is All You Need, which uses self-attention to process input data efficiently.

Attention mechanism

  • Instead of memorizing the previous state, the attention mechanism dynamically calculates the importance of each input based on the context.

stateDiagram 
i : Input text

state i{
T0 : Text A1
T1 : Text A2
T2 : Text A3
}

e : Embedding
state e {

Query1 : Query A1
Key1 : Key A1 
Key2 : Key A2
Key3 : Key A3
Value1 : Value A1
Value2 : Value A2
Value3 : Value A3
}

w : Attention Weight
state w {

weight1 : Weight A1
weight2 : Weight A2
weight3 : Weight A3

}


 T0 --> Query1
 T0 --> Key1
 T0 --> Value1
 T1 --> Key2
 T1 --> Value2
 T2 --> Key3
 T2 --> Value3

  Query1 --> weight1
  Key1 --> weight1

  Query1 --> weight2
  Key2 --> weight2

  Query1 --> weight3
  Key3 --> weight3

  weight1 --> output
  Value1 --> output

  weight2 --> output
  Value2 --> output

  weight3 --> output
  Value3 --> output
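The diagram above compresses to a few lines of linear algebra. Below is a minimal NumPy sketch of single-head scaled dot-product attention: the weights come from query-key similarity, and the output is a weighted sum of the values.

```python
# Scaled dot-product attention (single head), as in Vaswani et al. (2017).
import numpy as np

def attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)             # query-key similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)       # softmax over the keys
    return w @ V                                # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))   # 3 tokens, embedding dimension 4
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
print(attention(Q, K, V).shape)   # (3, 4): one output vector per token
```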

All I need is data

Training an NLP model requires a huge amount of data. For example, OpenAI revealed (Brown et al. 2020) that the majority of the training data is English text from:

  • Web pages
  • Social media and forum posts
  • Books
  • Wikipedia
  • User feedback (for GPT-4 and onwards)

Discussion

What are the implications of these data sources in terms of

  • Performance
  • Privacy

Pre-processing data

As the first step of data analysis, pre-processing is always key to a quality model and better performance.

  • Cleaning data
    • Pre-processing issues
  • Encoding text into vector space
    • Tokenization
    • Embedding
  • Labeling data

Cleaning data

  • Manual typos, non-text elements
  • Stopwords
    • Function words (e.g. articles, pronouns, conjunctions, …)
    • Overly common verbs
  • Inflection (tense, gender, plural)

Discussion

  • Why can LLMs tolerate our typos?
  • If LLMs can comprehend typos, do we still need to clean the data?

Tokenizing text

Tokenization means breaking sentences down into small pieces (tokens).

A straightforward way is word tokenization:

New public housing with enlarged toilets and kitchens for the convenience of wheelchair users.

新公屋廁廚加大便利輪椅戶

  • Sentence segmentation (see the sketch after this list)
    • English: 14 tokens
      • Rule-based methods work nearly perfectly
    • Chinese: ?? tokens
      • Dynamic programming approaches
      • Deep learning approaches
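A minimal sketch of the contrast: whitespace splitting suffices for the English sentence, while Chinese has no spaces and needs a segmenter. The jieba package used here is an assumption (a common third-party Chinese segmenter), not part of the course toolkit.

```python
# English: rule-based (whitespace) tokenization is nearly perfect.
# Chinese: no spaces between words, so a trained segmenter is needed.
import jieba  # assumed installed: pip install jieba

english = ("New public housing with enlarged toilets and kitchens "
           "for the convenience of wheelchair users.")
chinese = "新公屋廁廚加大便利輪椅戶"

print(english.rstrip(".").split())  # 14 word tokens
print(jieba.lcut(chinese))          # dictionary + dynamic-programming based
```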

Subword Tokenization

Word-level encodings are potentially prone to

  • Misspellings
  • Difficulty catering to grammatical variations

Subword Tokenization Breaking words into smaller subwords, which, in addition to addressing the issues above, gives

  • A higher chance of guessing out-of-dictionary terms

Subword tokenization from Tiktokenizer
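A minimal sketch using OpenAI's tiktoken package (assumed installed), the tokenizer family behind the Tiktokenizer demo: a misspelled word falls back to smaller subword pieces instead of becoming an out-of-vocabulary token.

```python
# Subword tokenization: correct vs. misspelled word.
import tiktoken  # assumed installed: pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
for word in ["convenience", "convinience"]:   # the second is misspelled
    ids = enc.encode(word)
    print(word, "->", [enc.decode([i]) for i in ids])
```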

Embedding

Count-based embedding

  • Sentence level: we create a vocabulary (actually token) corpus and build a \(|V|\times |V|\) adjacency matrix where
  • Each element \(v_{ij}\) counts how many times words \(i\) and \(j\) occur next to each other (a 1-word window)
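A minimal sketch of this count-based embedding: build the \(|V|\times|V|\) matrix from adjacent token pairs in a toy corpus.

```python
# Count-based embedding: symmetric co-occurrence counts of adjacent tokens.
import numpy as np

tokens = "the cat sat on the mat".split()
vocab = sorted(set(tokens))
index = {w: i for i, w in enumerate(vocab)}

V = np.zeros((len(vocab), len(vocab)), dtype=int)
for a, b in zip(tokens, tokens[1:]):   # all adjacent pairs
    V[index[a], index[b]] += 1
    V[index[b], index[a]] += 1         # keep the matrix symmetric

print(vocab)
print(V)   # row i is a crude embedding for word i
```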

Embedding from Jurafsky and Martin (2025)

Word2vec

Another brute-force way is to let the machine learn the embeddings itself through self-supervised learning and an autoencoder.

  • Input: sentence + word
  • Output: the words next to it within a 2-word window
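A minimal sketch with the gensim package (an assumption, not prescribed by the course): a skip-gram word2vec model that learns dense vectors by predicting words within a 2-word window.

```python
# word2vec (skip-gram) on a toy corpus.
from gensim.models import Word2Vec  # assumed installed: pip install gensim

sentences = [
    ["media", "reporting", "can", "influence", "suicidal", "behavior"],
    ["responsible", "reporting", "may", "reduce", "imitation"],
]
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

print(model.wv["reporting"][:5])                  # first 5 dims of a dense vector
print(model.wv.most_similar("reporting", topn=2))
```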

Discussion

These tokens are clearly represented in a high-dimensional, sparse format. What are the potential implications if we use these embeddings directly for analysis?

Large language model

Why or Why not LLM

Discussion

  • If doing data science on language is so troublesome, why not just throw away those old-fashioned methods and embrace LLMs?

Techniques of customizing LLM


  1. Prompting
    • Instructing the LLM on the task
  2. Retrieval-Augmented Generation (RAG)
    • Giving external reference
  3. Fine-tuning
    • Retraining model

Choice of RAG and fine-tuning from Towards Data Science

Considerations for different customizations

  • Budget: GPT-5 quoted prices as of 2025-09-01

  • Scalability: does the model need to be very generalizable?

    • Small, precise model v.s. large generalizable model

Prompt engineering

  • Structured Prompts
  • Chain-of-Thought (CoT)
  • Few-Shot
    • Provide a few examples of compliant/non-compliant news.
  • Divide and Conquer

Structured prompt

Overall, structured instructions provide steadier and more reliable outcomes.

  • Role

  • Task

  • Context

  • Format

Few-shot learning

  • Examples
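A hypothetical sketch of a structured prompt for this project's task, combining the Role / Task / Context / Format components with one few-shot example; the wording is illustrative, not taken from WHO or the course materials.

```python
# Structured prompt template with a single few-shot example.
prompt_template = """Role: You are a public health researcher auditing news reports.
Task: Rate whether the article below follows responsible suicide reporting
guidelines. Answer: compliant or non-compliant.
Context: WHO (2023) advises against reporting methods, sensational language,
and romanticizing; coping and help-seeking content is encouraged.
Format: <label>: <one-sentence justification>

Example article: "Hotline use rose after the campaign; experts explain how
to seek help." -> compliant: focuses on coping and help-seeking.

Article: {article}
"""

print(prompt_template.format(article="..."))  # fill in the article to audit
```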

Xu et al. (2024) proposed a structured prompt to address prediction tasks in mental health-related topics using LLMs:

\[ \text{Prompt} = \text{Analyzed text} + \text{Outcome and context} + \text{Question} + \text{Constraints} \]

Nori et al. (2024) argue that few-shot learning does not always yield better results.

Discussion

  • Why do
    • structured prompts
    • few-shot learning
    usually produce better outcomes?

Chain of thought (Wei et al. 2022)

Idea Provide an illustration of how you solve the problem in the prompt, so the LLM understands how to tackle it.

Prompt without CoT

Jane was born on the last day of February in 2020. Today is her 15-year-old birthday in the common law. What is the date tomorrow in MM/DD/YYYY?

Prompt with CoT

Example question: Jane was born on the last day of February in 2020. Today is her 15-year-old birthday in the common law. What is the date tomorrow in MM/DD/YYYY?
Example answer: In common law practice, for leap-year babies the legal birthday in common years is March 1st, and 2020 + 15 = 2035. So today is 03/01/2035 and tomorrow is 03/02/2035.
Please answer the following question:
Question: Ray was born on the last day of February in 2016. Today is his 15-year-old birthday in the common law. What is the date tomorrow in MM/DD/YYYY?

| Method  | gpt-4o-mini (version: 2024-07-18) | Grok 3     | Grok 4     | Standard answer |
|---------|-----------------------------------|------------|------------|-----------------|
| W/o CoT | 03/01/2035                        | 09/03/2025 | 03/02/2035 | 03/02/2035      |
| W/ CoT  | 03/02/2031                        | 03/02/2031 | 03/02/2031 | 03/02/2031      |

How text is generated by an LLM

  • Greedy decoding: a locally optimal strategy

Choose the highest-probability token \(\hat{w}_t\) given the input \(\textbf{w}_{<t}\) (the prompt and your previous output):

\[ \hat{w_t} = \arg\max_{w \in V} P(w | \textbf{w}_{<t}) \]

  • Random sampling: a probabilistic approach

Instead of returning the single most likely token, we allow flexible generation by sampling from the distribution:

\[ w_t \sim P(w_t | \textbf{w}_{<t}) \]
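A minimal NumPy sketch of both rules on a toy next-token distribution (no real LLM involved): greedy decoding always returns the same token, random sampling varies across calls.

```python
# Greedy decoding vs. random sampling over a toy vocabulary.
import numpy as np

rng = np.random.default_rng(0)
vocab = np.array(["help", "hope", "support", "loss"])
p = np.array([0.5, 0.3, 0.1, 0.1])     # toy P(w | w_<t)

greedy = vocab[np.argmax(p)]           # deterministic, locally optimal
sampled = rng.choice(vocab, p=p)       # stochastic
print(greedy, sampled)
```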

Discussion

Neither pure greedy decoding nor pure random sampling is typically used in LLMs. What are the disadvantages of these algorithms?

Temperature sampling

  • Adjusting each token's chance of being selected during random sampling.

Recall that \(w \sim \text{Mult}(1,p)\).

Therefore, we can use a Bayesian prior to influence this distribution.

  • Some thoughts
    • Uniform Dirichlet prior => it is simply normalized at the end
    • Informative prior => how do we generate an informative prior?

Solution: exponentially scale the logits by a factor of \(\frac{1}{\tau}\) before normalizing.
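A minimal sketch of the scaling: divide the logits by \(\tau\) before the softmax. \(\tau < 1\) sharpens the distribution toward greedy decoding; \(\tau > 1\) flattens it toward uniform sampling.

```python
# Temperature-scaled softmax over toy logits.
import numpy as np

def softmax_with_temperature(logits, tau):
    z = logits / tau
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])
for tau in (0.5, 1.0, 2.0):
    print(tau, softmax_with_temperature(logits, tau).round(3))
```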

Temperature factor from Jurafsky and Martin (2025)

Temperature factor from Jurafsky and Martin (2025)

Top-p and Top-K sampling

Top-K

Random sampling restricted to the \(K\) highest-probability tokens

  1. Identify the \(K\) highest-probability tokens
  2. Draw samples from these \(K\) tokens proportional to their probability

Top-p

Random sampling based on the cumulative distribution

  1. Identify the \(n\) highest-probability tokens whose cumulative probability stays within \(p\)
  2. Draw samples from these \(n\) tokens proportional to their probability
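A minimal NumPy sketch of both truncation rules on a toy distribution; conventions differ slightly on whether the token that crosses the cumulative threshold is kept, so treat the boundary handling here as one reasonable choice.

```python
# Top-K and Top-p (nucleus) sampling over a toy next-token distribution.
import numpy as np

def top_k_sample(p, k, rng):
    idx = np.argsort(p)[::-1][:k]          # K highest-probability tokens
    q = p[idx] / p[idx].sum()              # renormalize within the set
    return rng.choice(idx, p=q)

def top_p_sample(p, top_p, rng):
    order = np.argsort(p)[::-1]            # sort tokens by probability
    cum = np.cumsum(p[order])
    n = int(np.searchsorted(cum, top_p)) + 1   # smallest set reaching top_p
    keep = order[:n]
    q = p[keep] / p[keep].sum()
    return rng.choice(keep, p=q)

rng = np.random.default_rng(0)
p = np.array([0.4, 0.3, 0.2, 0.05, 0.05])
print(top_k_sample(p, k=3, rng=rng), top_p_sample(p, top_p=0.9, rng=rng))
```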

Self-consistency CoT and Tree of Thought (ToT)

Self-consistency CoT (Wang et al. 2022): a random forest of LLM outputs

  1. Generate multiple possible outcomes
  2. Take a majority vote over the results

Tree of Thought (Yao et al. 2023): a decision tree of LLM reasoning

  1. The LLM decomposes the task into several candidate strategies and develops them
  2. Evaluate whether each output answers the original question
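A minimal sketch of the self-consistency idea: sample several answers and keep the majority. The generate_answer function is a hypothetical stand-in for an LLM call with sampling enabled, not a real API.

```python
# Self-consistency: majority vote over sampled reasoning paths.
import random
from collections import Counter

def generate_answer(question: str) -> str:
    # Hypothetical noisy LLM: right answer most of the time, wrong otherwise.
    return random.choice(["03/02/2031"] * 4 + ["03/01/2031"])

samples = [generate_answer("Ray's birthday question") for _ in range(9)]
answer, votes = Counter(samples).most_common(1)[0]
print(answer, f"({votes}/9 votes)")
```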

CoT family from Yao et al. (2023)

Evaluation and implementation

Evaluating NLP model performance

Evaluation of an NLP model largely depends on the task itself. The overall idea is to map the problem back to an MLE- or classification-style evaluation.

Commonly seen indicators

  • Classification and regression task
    • Traditional metrics
  • Machine translation
    • Bilingual evaluation understudy (BLEU) score
  • Generation
    • Perplexity
  • Summarization
    • ROUGE-N

Perplexity

Idea: Model \(\theta\) should assign the highest probability to the observed text (i.e. \(\arg\max_{\theta} p_{\theta}(w_{1:n})\))

Issue: the MLE approach is subject to the length of a text and favors short texts

Modification: the perplexity function (the geometric mean of the inverse probabilities)

\[ \text{Perplexity}_{\theta}(w_{1:n})=\Biggl(\prod_{i=1}^n\frac{1}{P_{\theta}(w_i|w_{<i})}\Biggr)^{(1/n)} \]
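A minimal sketch of this formula: the geometric mean of the inverse token probabilities, computed in log space for numerical stability.

```python
# Perplexity from per-token conditional probabilities P(w_i | w_<i).
import numpy as np

def perplexity(token_probs):
    token_probs = np.asarray(token_probs)
    return np.exp(-np.mean(np.log(token_probs)))  # (prod 1/p_i)^(1/n)

print(perplexity([0.25, 0.5, 0.2]))  # less likely text -> higher perplexity
print(perplexity([0.9, 0.9, 0.9]))   # likely text -> perplexity near 1
```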

Implementing an NLP task

Discussion

What are the considerations when selecting a model for your NLP task?

  • Output complexity
    • Classification, regression problem
    • Generation tasks
  • Input complexity
    • Scope of the input text
    • Domain knowledge involved
  • Data availability
    • Amount of available data
  • Cost
    • Financial feasibility
    • Time feasibility

Ethical and Safety Issues

Bias

The data source is the major contributor to model bias. The model draws inferences from your input text, which can carry social and cultural biases.

Hallucination

LLMs prominently generate text in a probabilistic way rather than from facts and references, so they can fabricate things that do not exist at all.

Safety use

Sensitive content (violence, sex, death) can appear if you intentionally instruct these tools to produce it. Using them safely is critical.

ChatGPT allegedly encouraged him to take his own life (Yousif 2025)

Bibliography

Brown, Tom, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, et al. 2020. “Language Models Are Few-Shot Learners.” Advances in Neural Information Processing Systems 33: 1877–1901.
Carmichael, Victoria, and Rob Whitley. 2019. “Media Coverage of Robin Williams’ Suicide in the United States: A Contributor to Contagion?” PloS One 14 (5): e0216543.
Chan, Kathy PM, Paul SF Yip, Jade Au, and Dominic TS Lee. 2005. “Charcoal-Burning Suicide in Post-Transition Hong Kong.” The British Journal of Psychiatry 186 (1): 67–73.
Hegerl, Ulrich, Nicole Koburger, Christine Rummel-Kluge, Christian Gravert, Martin Walden, and Roland Mergl. 2013. “One Followed by Many?—Long-Term Effects of a Celebrity Suicide on the Number of Suicidal Acts on the German Railway Net.” Journal of Affective Disorders 146 (1): 39–44.
Jarvis, Ruby Rose, Agatha Anet Alves, Kangning Zheng, Monica Hawley, Amanada Marchant, Keith Hawton, Ann John, and Alexandra Pitman. 2025. “Development of the PRINTQUAL-Web Tool for Assessing the Quality of Online News Reporting of Suicide: Adaptation of the PRINTQUAL Tool for Newspaper Reporting.” Crisis: The Journal of Crisis Intervention and Suicide Prevention.
John, Ann, Keith Hawton, Keith Lloyd, Ann Luce, Stephen Platt, Jonathan Scourfield, Amanda L Marchant, Phil A Jones, and Mick S Dennis. 2014. “PRINTQUAL–a Measure for Assessing the Quality of Newspaper Reporting of Suicide.” Crisis.
Jurafsky, Daniel, and James H. Martin. 2025. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, with Language Models. 3rd ed. https://web.stanford.edu/~jurafsky/slp3/.
Klonsky, E David, and Alexis M May. 2015. “The Three-Step Theory (3ST): A New Theory of Suicide Rooted in the ‘Ideation-to-Action’ Framework.” International Journal of Cognitive Therapy 8 (2): 114–29.
Mann, J John. 2002. “A Current Perspective of Suicide and Attempted Suicide.” Annals of Internal Medicine 136 (4): 302–11.
Niederkrotenthaler, Thomas, Martin Voracek, Arno Herberth, Benedikt Till, Markus Strauss, Elmar Etzersdorfer, Brigitte Eisenwort, and Gernot Sonneck. 2010. “Role of Media Reports in Completed and Prevented Suicide: Werther v. Papageno Effects.” The British Journal of Psychiatry 197 (3): 234–43.
Nori, Harsha, Naoto Usuyama, Nicholas King, Scott Mayer McKinney, Xavier Fernandes, Sheng Zhang, and Eric Horvitz. 2024. “From Medprompt to O1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond.” arXiv Preprint arXiv:2411.03590.
Pedersen, Nancy L, and Amy Fiske. 2010. “Genetic Influences on Suicide and Nonfatal Suicidal Behavior: Twin Study Findings.” European Psychiatry 25 (5): 264–67.
Sisask, Merike, and Airi Värnik. 2012. “Media Roles in Suicide Prevention: A Systematic Review.” International Journal of Environmental Research and Public Health 9 (1): 123–38.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” Advances in Neural Information Processing Systems 30.
Wang, Xuezhi, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2022. “Self-Consistency Improves Chain of Thought Reasoning in Language Models.” arXiv Preprint arXiv:2203.11171.
Wei, Jason, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. 2022. “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” Advances in Neural Information Processing Systems 35: 24824–37.
WHO. 2023. Preventing Suicide: A Resource for Media Professionals, 2023 Update. World Health Organization.
Xu, Xuhai, Bingsheng Yao, Yuanzhe Dong, Saadia Gabriel, Hong Yu, James Hendler, Marzyeh Ghassemi, Anind K Dey, and Dakuo Wang. 2024. “Mental-Llm: Leveraging Large Language Models for Mental Health Prediction via Online Text Data.” Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8 (1): 1–32.
Yao, Shunyu, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. “Tree of Thoughts: Deliberate Problem Solving with Large Language Models.” Advances in Neural Information Processing Systems 36: 11809–22.
Yip, Paul SF, King-Wa Fu, Kris CT Yang, Brian YT Ip, Cecilia LW Chan, Eric YH Chen, Dominic TS Lee, Frances YW Law, and Keith Hawton. 2006. “The Effects of a Celebrity Suicide on Suicide Rates in Hong Kong.” Journal of Affective Disorders 93 (1-3): 245–52.
Yip, Paul, Yunyu Xiao, Yucan Xu, Evangeline Chan, Florence Cheung, Christian S Chan, and Jane Pirkis. 2022. “Social Media Sentiments on Suicides at the New York City Landmark, Vessel: A Twitter Study.” International Journal of Environmental Research and Public Health 19 (18): 11694.
Yousif, Nadine. 2025. “Parents of Teenager Who Took His Own Life Sue OpenAI.” BBC News. https://www.bbc.com/news/articles/cgerwp7rdlvo.