“I enjoyed writing. Perhaps it was because I hardly heard the sound of my own voice. My written words were my voice, speaking, singing, … I was there on the page” – Jenny Moss
Many of us feel a natural desire to write expressively in difficult times, to hear the voice deep inside ourselves. Although previous studies have found therapeutic effects of expressive writing, most of them studied the activity in controlled lab settings where the writing was guided by a researcher. We think that therapeutic expressive writing happens spontaneously in the real world as well. We focus on spontaneous expressive writing on CaringBridge, an online platform where people write about and share their own or their loved ones’ health journeys and get support. Our goal was to develop a computational model that infers whether a post contains expressive writing, in order to help people get more benefit from online health communities like CaringBridge.
One major challenge we encountered in pursuing this goal is that there is no existing data on therapeutic expressive writing in the wild. To address this challenge, we considered how we could adapt expressive writing data that was collected in the lab. We looked at 47 past lab studies and what they could tell us about expressive writing. It turns out that the writing counted as “expressive” in these studies shared some common characteristics: it used emotion words and cognitive words much more often than the writing that was not “expressive”. We used a statistical model (more details in the paper) to score how closely each CaringBridge post matched those characteristics. The research team also manually labeled 200 posts to see how often the model would reach the same conclusion as we did about whether a post constituted “expressive writing”. The model and the team agreed about 67% of the time, so there is clearly a lot of room for improvement (we assume that the human labels are generally right and that the algorithm needs to get better at recognizing these posts).
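To make the idea concrete, here is a minimal Python sketch of scoring a post by how often it uses emotion and cognitive words, and then measuring agreement with human labels. This is not the statistical model from the paper: the word lists, the thresholds, and the function names (`category_rates`, `is_expressive`, `agreement_rate`) are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
# A real analysis would rely on a validated lexicon.
EMOTION_WORDS = {"happy", "sad", "afraid", "grateful", "angry", "hope", "love", "scared"}
COGNITIVE_WORDS = {"because", "think", "realize", "understand", "know", "reason", "why"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_rates(text):
    """Return (emotion_rate, cognitive_rate) as fractions of all tokens."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    n = len(tokens)
    emotion_rate = sum(t in EMOTION_WORDS for t in tokens) / n
    cognitive_rate = sum(t in COGNITIVE_WORDS for t in tokens) / n
    return emotion_rate, cognitive_rate


def is_expressive(text, emotion_threshold=0.05, cognitive_threshold=0.05):
    """Label a post expressive if both rates reach the (assumed) thresholds."""
    emotion_rate, cognitive_rate = category_rates(text)
    return emotion_rate >= emotion_threshold and cognitive_rate >= cognitive_threshold


def agreement_rate(predictions, human_labels):
    """Fraction of posts where the model label matches the human label."""
    if len(predictions) != len(human_labels) or not predictions:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(p == h for p, h in zip(predictions, human_labels))
    return matches / len(predictions)


if __name__ == "__main__":
    posts = [
        "I am scared because I think the treatment may fail",
        "Surgery is scheduled for Tuesday at nine",
    ]
    predictions = [is_expressive(p) for p in posts]
    print(predictions)
```

A simple threshold rule like this is only a stand-in for the model described in the paper. The `agreement_rate` helper mirrors the kind of comparison made against the 200 hand-labeled posts: the share of posts on which the model and the human labels coincide.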
Despite its limitations, the model provides the first-ever opportunity to estimate how often expressive writing may occur in the wild. We applied our model to a dataset of 13 million CaringBridge journal posts and inferred that 22% were expressive and 78% were not. This provides evidence for spontaneous expressive writing in the wild.
To sum up, our paper makes three contributions. First, it demonstrates a way to use aggregated empirical data. In cases where no data are available, researchers can use common characteristics reported in past studies to study the group they are interested in, as we did in the paper. Second, it provides a baseline model for inferring expressive writing that future work can improve upon. Future research could use more sophisticated features and models by constructing a gold standard dataset or by transferring knowledge from a related task that has already been learned. Third, it identifies expressive writing as a potential measure for online health communities. How much an individual engages in spontaneous expressive writing not only reveals their current writing practices, but may also signal the difficult times they are going through. Online health communities could then target their messaging by sending emotional support to those in difficult times and providing writing tips to those who are less expressive, so that people can gain the most benefit from their writing.