Reflecting on Consent at Scale


In the era of internet research, everyone is a participant.

A PhD student stood at the front of a crowded conference hall. They’d just presented their paper on social capital in distributed online communities. As the applause settled, an audience member scuttled to the microphone, eager to ask the first question.

A professor from University College. Thank you for the great talk. It was refreshing to attend a talk with such rigorous methods. You scraped data from so many different subreddits and made such a compelling argument for how these results will generalize to other online spaces. My question is less about the research and more about your experiences with data contributors. How did the various subreddit community members react when you talked to them about this exciting work?

What kind of question is this? the PhD student thinks to themself. It’s not feasible to get consent from every user. We got an IRB exemption, got approval from subreddit moderators, and followed all the API terms of use and regulations for researcher access. Do other researchers really ask for consent at scale? Did I get consent…?

You may be in a similar situation now! Using social media data for research is a common method that has massive potential for large-scale analyses in both quantitative and qualitative research. However, it can be frustrating to simultaneously hold individual, affirmative consent as the gold standard and recognize its limitations as a viable option for many researchers. To that end, we’ve made a reading list about getting individual consent at scale, particularly in research settings. We hope this reading list serves as a provocation for discussion rather than a list of solutions to this problem.

Normative Papers

1. The “Ought-Is” Problem: An Implementation Science Framework for Translating Ethical Norms into Practice. Our resident ethicist (Leah Ajmani) loves this paper so much! It basically uses informed consent as a case to describe the larger translational effort needed to move from normative prescriptions to actual implementation.

2. Yes: Affirmative Consent as a Theoretical Framework for Understanding and Imagining Social Platforms. A contemporary classic in CHI, this paper does a really good job of describing affirmative consent as the ideal situation and then using that ideal for explanatory and generative purposes. There is merit to having an ideal, even if it is not perfectly attainable!

HCML Papers

We’re obviously biased because she’s a GroupLenser, but Stevie Chancellor does a great job at describing consent at scale as an ethical tension rather than a “must-have.” It is something researchers need to navigate with justified reasoning.

1. A Taxonomy of Ethical Tensions in Inferring Mental Health States from Social Media

2. Toward Practices for Human-Centered Machine Learning

Design Papers

These papers are both critical of current consent design and do a great job of discussing alternatives, even if it is outside of a research context.

1. (Un)informed Consent: Studying GDPR Consent Notices in the Field

2. Limits of Individual Consent and Models of Distributed Consent in Online Social Networks

From grappling with moral nuance to designing better consent procedures, these readings can take our discussions of individual consent at scale from a theoretical ideal to an operationalizable goal. So, let’s embrace difficult discourse about how to move forward and continue to traverse the space between the idyllic and the feasible. Comment or tweet which papers you would add to this list!

Wordy Writer Survival Guide: How to Make Academic Writing More Accessible


As GroupLensers received CHI reviews back, many of us were told our papers were “long,” “inaccessible,” and even “bloated.” These critiques are fair. Human-Computer Interaction (HCI) research should be written for a broad and interdisciplinary audience. However, inaccessible writing can be hard to fix, especially if it is your natural writing style. Here’s some advice from GroupLens’s very own Stevie Chancellor (computer science professor, PhD advisor, and blogger about everything writing-related).

Sentence Structure

  • Sentence Length: How long are your sentences, and how many comma-dependent clauses are going on per paragraph? Long sentences are more complicated to read and, therefore, harder to parse. Some people say to split any sentence with more than 25 words. Eh. 30-35 should be fine for academic writing, but longer is worse.
  • Commas, Commas, Commas: Comma-separated clauses are painful to follow. A comma is a half-stop in writing and momentarily pauses trains of thought. While some commas are grammatically necessary (see the one that follows this parenthesis), too many commas chop your sentences into pieces. Therefore, too many commas interrupt your reader’s comprehension of your idea.
  • Sentence Cadence: How are you varying the cadence of your writing? Do you use short sentences, then longer sentences, and vary the structure and placement of comma clauses? Using ONLY long sentences gets repetitive and, therefore, more challenging to read.
  • Topic Sentence and Transition Clarity: Topic and “transition” sentences should be crystal clear in their simplicity. Interior sentences can be more elaborate/have more “meat.”

Word Choice

  • Simple Words are Better: Are you using the simplest words possible to describe what you mean? For example: do not write “utilize” as a synonym for “use”. Just say “use”.
  • Active vs. Passive Voice: Are you overusing the passive voice instead of the active? Passive voice is occasionally correct, especially when needed to soften a claim (e.g., “Research has suggested that….”). But too much passive voice is hard to read.
  • Filler Words: Look for words that contribute nothing to the idea but make your sentence longer. Adverbs and fluffy adjectives are common culprits of this. Adverbs like “very”, “fairly”, and “clearly” provide almost NO substance to writing but lengthen the sentence.
  • Weasel Words: Inspired by Matt Might, check your writing for “weasel words” that undermine the clarity of your sentence. Do you need to say an experiment was “mostly successful, but had limitations?” Or can you say, “The experiment was successful in X and Y with less success in Z”?
  • Citations vs. Names: Be judicious with \citet{} in your writing. Invoking someone’s name is equivalent to inviting that person to a dinner party and forces the reader to pay attention to the “who’s who” of your writing. Who do you want to invite to your home? Remember, you’re in charge of maintaining conversation during the party and providing food for everyone, so be careful who you invite.
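As a quick illustration of the dinner-party distinction (natbib syntax; the citation key here is made up):

```latex
% \citet{} puts the authors in the sentence -- you've invited them to the party.
% Renders roughly as: "Chancellor et al. [12] describe consent as a tension."
\citet{chancellor2019taxonomy} describe consent as a tension.

% \citep{} cites without naming -- the claim stays in the foreground.
% Renders roughly as: "Consent at scale is an open ethical tension [12]."
Consent at scale is an open ethical tension \citep{chancellor2019taxonomy}.
```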

Pragmatic Decisions/Actions

  • Read Aloud: Read “dense” or “inaccessible” sections out loud. Say them with your mouth. Long, poorly structured paragraphs become obvious when read out loud.
  • Use a Friend or Colleague To Kill Your Darlings: Friends and colleagues with no emotional connection to the paper are great for removing self-indulgent yet non-essential writing. Ask a friend to read a section to go in and “kill your darlings.”
  • Use AI Tools Judiciously: Tools such as Grammarly Pro, Writefull, or ChatGPT/Bard/LLM du jour can do first passes for wordiness and phrasing. For example, Grammarly Premium provides swaps for too-long phrases (and is free if you have a SIGCHI membership). LLMs can trim your writing by 10%. Just be cautious about the accuracy of the edits and make sure they preserve your tone and argumentation.
  • Ctrl + F Is Your Friend: Recognize your writing “quirks” and ctrl + f to search for and cut them. Stevie’s writing quirks include using adverbs in initial drafts, meaning that searching for “very” and “ly” returns many words to cut.
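For whole-word quirks, a small script can do the counting across a draft. This is a minimal sketch (the quirk list and the `find_quirks` name are ours, not from any particular tool):

```python
import re

# Hypothetical quirk list -- substitute your own writing tics.
QUIRKS = ["very", "fairly", "clearly", "basically", "utilize"]

def find_quirks(text, quirks=QUIRKS):
    """Return a {word: count} dict for each quirk word found in text."""
    counts = {}
    for word in quirks:
        # \b keeps "very" from matching inside "every"
        hits = re.findall(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE)
        if hits:
            counts[word] = len(hits)
    return counts

draft = "This is very clearly a draft. We utilize very long sentences."
print(find_quirks(draft))  # {'very': 2, 'clearly': 1, 'utilize': 1}
```

Catching “-ly” adverbs in general takes a looser pattern (e.g., `r"\w+ly\b"`), at the cost of false positives like “only.”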

From managing sentence structure to choosing simple words, these tips can take your writing from “in the clouds” to a reader-friendly and enjoyable experience. Remember, the goal is not just brevity but clarity, ensuring that our work resonates with a broad and interdisciplinary audience. So, let’s embrace these tips, Ctrl + F our way through, and invite our readers to a well-organized and engaging intellectual dinner party. Cheers to more accessible and impactful HCI research!

Page Protection: The Blunt Instrument of Wikipedia


Wikipedia is a 22-year-old, wonky online encyclopedia that we’ve all used at some point. Currently (2023), Wikipedia has a dizzying amount of information in numerous languages. The English-language Wikipedia alone has over 6 million articles and 40,000 active editors. The allure of Wikipedia articles is that they are highly formatted and community-governed; while anyone can contribute to a Wikipedia article, there’s a vast infrastructure of admins, experienced editors, and bots who maintain the platform’s integrity. Wikipedia’s About page reads:

“Anyone can edit Wikipedia’s text, references, and images. What is written is more important than who writes it. The content must conform with Wikipedia’s policies, including being verifiable by published sources […] experienced editors watch and patrol bad edits.”

Our research aims to understand the tension between open participation and information quality that underlies Wikipedia’s moderation strategy. In other words, how does maintaining Wikipedia as a factual encyclopedia conflict with the value of free and open knowledge? Specifically, we look at page protection: an intervention where administrators can “lock” articles to prevent unregistered or inexperienced editors from contributing.

We used quasi-causal methods to explore the effects of page protection. Specifically, we created two datasets: (1) a “treatment set” of page-protected articles and (2) a “control set” of unprotected articles that were similar to a treated article in terms of article activity, visibility, and topic. We then ask: does page protection affect editor engagement consistently?
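The paper’s actual matching procedure is more involved, but the idea of pairing each protected article with its most similar unprotected article can be sketched as nearest-neighbor matching. This is a toy illustration; the covariate names, scale factors, and data below are ours, not the paper’s:

```python
import math

# Hypothetical article records: edits/day, daily views, and topic stand in
# for the paper's activity, visibility, and topic covariates.
treated = {"activity": 12.0, "views": 3400, "topic": "politics"}

candidates = [
    {"id": "A", "activity": 11.5, "views": 3600, "topic": "politics"},
    {"id": "B", "activity": 50.0, "views": 90000, "topic": "politics"},
    {"id": "C", "activity": 12.2, "views": 3300, "topic": "science"},
]

def distance(t, c):
    """Euclidean distance over roughly normalized numeric covariates;
    articles on a different topic are excluded outright."""
    if t["topic"] != c["topic"]:
        return math.inf
    return math.hypot((t["activity"] - c["activity"]) / 10.0,
                      (t["views"] - c["views"]) / 1000.0)

# The control article is the closest unprotected article with the same topic.
control = min(candidates, key=lambda c: distance(treated, c))
print(control["id"])  # A
```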

Our findings show that page protection dramatically but unpredictably affects Wikipedia editor engagement. Above is the kernel density estimate (KDE) of the difference in the number of editors before versus after page protection. We evaluated this metric across three time windows: seven, fourteen, and thirty days. Not only is this spread huge, but it also spans both negative and positive differences. In essence, we cannot predict whether page protection decreases or increases the number of people editing an article.
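The metric behind the figure, and the KDE smoothing it, can be sketched with synthetic data. Everything below is illustrative (randomly generated counts and a hand-rolled Gaussian KDE), not the paper’s data or pipeline:

```python
import math
import random

random.seed(0)

# Hypothetical editor counts per article in a fixed window (e.g., 14 days)
# before and after protection; real values would come from edit histories.
before = [random.randint(0, 20) for _ in range(300)]
after = [random.randint(0, 20) for _ in range(300)]
diffs = [a - b for a, b in zip(after, before)]  # the plotted metric

def kde(x, samples, bandwidth=2.0):
    """Gaussian kernel density estimate at point x."""
    n = len(samples)
    return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
               for s in samples) / (n * bandwidth * math.sqrt(2 * math.pi))

# The spread spans both signs: protection sometimes gains, sometimes
# loses editors, which is exactly what makes its effect unpredictable.
print(min(diffs) < 0 < max(diffs))
```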

Are heavy-handed moderation interventions necessary for a complex platform such as Wikipedia? How can we design these non-democratic means of control to maintain a participatory nature? Check out our paper for discussions on these questions or come to my talk on October 16, 2023 at 4:30pm!