What even is AI justice? Proposing a Theory of Justice at FAccT 24


In 2022, Politico reported that Crisis Text Line (CTL), a non-profit SMS suicide hotline, used one-on-one crisis conversations to train a for-profit customer service chatbot. In response, CTL maintained that they did not violate reasonable expectations: hotline users “consented” to a lengthy Terms of Service (TOS), which specified that CTL could use data for business purposes.

In many research circles, including my own, this uninformed consent procedure and irresponsible use of data is obviously egregious. However, in the eyes of the law (and of CTL and many others), CTL violated no reasonable user expectations. The consensus was that, while TOS consent procedures are imperfect, they are the cost of having free technology services. In my paper, I ask: are we still okay with paying this price?

The price here is justice: our ability to give everyone their due. In offline settings, injustice evokes images of human rights violations and massive inequities. In AI settings, the picture is less clear. As we all contribute our data, knowledge, and time to AI systems, what do we deserve as a matter of justice? To answer this question, we formulated a precise theory of justice that captures current tensions between users and tech companies in AI/ML settings.

To quote John Rawls, 

“A theory however elegant and economical must be rejected or revised if it is untrue; likewise laws and institutions no matter how efficient and well-arranged must be reformed or abolished if they are unjust.”

In other words, a theory of justice is an attempt to appropriately represent the state of the world, articulate societal values, and inform future change. When we have a true theory of justice, we do not need to rely on subjective moral intuitions. Rather, we can agree upon an ethical compass for building our laws and institutions.

Our paper proposes data agency theory (DAT). Data agency is one’s capacity to shape action around the data one creates. For example, individual privacy settings on Google increase data agency by allowing users to opt into (i.e., consent to) data sharing with third-party advertisers. DAT advances two premises. First, consent procedures outline data agency systematically and are therefore institutional. Second, following justice scholars such as Rawls and Young, this institutional way of outlining data agency is a matter of justice. In short, data agency is both a product of consent policies and a contributor to justice, so justice in a predictive system demands considering how institutional routines (i.e., consent procedures and terms of service) transform agency at a group level.

DAT is a data-centric theory of justice that translates directly into next steps for achieving ML/AI ethics goals. As ML/AI systems demand ever-larger datasets, ethicists in the field have made numerous calls for better data management throughout the pipeline, involving questions of consent, data storage, and responsible use of predictive outcomes. However, these calls have gone mostly unanswered; many social media sites still use dense Terms of Service agreements that have been criticized for over a decade. Here, we argue that this failure to increase users’ data agency is not just a moral imperfection, it is injustice.

By raising the stakes of problematic consent procedures, we hope to catalyze action. In the paper, we reimagine consent procedures in two salient ML/AI data contexts: (1) social media sites and (2) human subjects research projects. For example, we imagine affirmative consent on social media sites, sustained efforts by researchers to reaffirm consent, and the ability to withdraw one’s data from benchmark datasets. AI justice demands consent procedures that proactively solve systemic information and power gaps around one’s data. This paradigm shift is crucial to evaluating current consent procedures and generating better ones.
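
To make the contrast with TOS-style consent concrete, below is a minimal Python sketch of a consent record embodying these three ideas: affirmative opt-in, periodic reaffirmation, and withdrawal. Every name and the one-year reaffirmation cadence are hypothetical illustrations, not designs from the paper.

    from __future__ import annotations

    from dataclasses import dataclass, field
    from datetime import datetime, timedelta

    # Assumed cadence for re-consent; choosing the interval is itself a design question.
    REAFFIRM_EVERY = timedelta(days=365)

    @dataclass
    class ConsentRecord:
        """Hypothetical per-user consent state for one platform or study."""
        user_id: str
        permitted_uses: set[str] = field(default_factory=set)
        last_affirmed: datetime | None = None
        withdrawn: bool = False

        def affirm(self, uses: set[str]) -> None:
            """User explicitly opts in to specific uses (no pre-checked boxes)."""
            self.permitted_uses = set(uses)
            self.last_affirmed = datetime.now()
            self.withdrawn = False

        def withdraw(self) -> None:
            """User pulls their data; downstream datasets must honor this."""
            self.permitted_uses.clear()
            self.withdrawn = True

        def allows(self, use: str) -> bool:
            """Permit a use only if it was affirmed, recently, and not withdrawn."""
            if self.withdrawn or self.last_affirmed is None:
                return False
            if datetime.now() - self.last_affirmed > REAFFIRM_EVERY:
                return False  # stale consent must be reaffirmed, not assumed
            return use in self.permitted_uses

    record = ConsentRecord(user_id="u123")
    record.affirm({"crisis_support"})      # opted in to crisis support, nothing else
    print(record.allows("train_chatbot"))  # False: this use was never affirmed
    record.withdraw()                      # and consent can be revoked at any time

Under this sketch, a secondary use such as training a chatbot is denied by default; permission never falls out of a blanket TOS.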

What does a just world with AI look like? How can we evaluate the justice of our AI systems before they cause material harm? Check out our paper for discussions of these questions, or come to my talk at FAccT during the Towards Better Data Practices session at 3:45pm (UTC-3) on June 4th.

PhD Lessons from Running and Escaping Rooms


The PhD students at GroupLens have a variety of hobbies! From knitting to playing video games, we all have non-research activities that enrich our lives. In this article, we asked two PhD students, Leah Ajmani and Alexis Tarter, how their hobbies have helped them become more successful researchers. What do distance running and working at an escape room have to do with research? Read below to find out!

Leah and her dog, Yogi, finishing their first half-marathon

Lessons from Running

As a very unathletic kid, I didn’t pick up running until I was a PhD student. I’ve run numerous community races in the past two years, including a half marathon! Here’s what I’ve learned:

There’s no such thing as “junk miles”

Sometimes our runs suck. Similarly, sometimes our writing sucks, our research is going slow, or we have to miss a deadline. However, there is no such thing as junk mileage in research. All research, even the research that doesn’t end up in papers, builds our capacity to do research. In that sense, it is useful!

You have lots of different paces; the key is to switch between them

You may have heard “it’s a marathon, not a sprint” about your PhD. The advice here is to go slow and not burn out. Running has taught me to extend the metaphor one step further. I have a marathon pace, but I also have a sprinting pace. I even have a 5k and 10k pace for things in the middle! The key to not burning out is to use the right pace at the right moment.

Think about how you would run a marathon. You may have a “marathon pace,” a per-mile goal that adds up to a certain finish time. For example, my goal is to run a half-marathon in under 2 hrs and 15 min. In theory, I would need to run a 10-min mile 13.1 times (13.1 × 10 min ≈ 131 min, just under the goal). In practice, though, my first few miles would be >10 minutes so I can get into a rhythm. Then, each mile gets progressively faster so I can “ramp up.” The idea is not just to run slow. It’s to run slow at the beginning so that you can go ham and truly race in those last few miles. In a PhD, paper deadlines require you to have enough energy left in your tank to race those last few miles, so be judicious with your pacing.

Your mind will want to quit early and often

In running, we say, “Your mind will want to quit before your body does.” Obviously, if you’re injured or battling physical limitations, you should STOP RUNNING. But if the reason you want to stop running is because your mind is telling you to quit, it’s probably best to keep going.

In research, I use the “quit-three rule.” If I’m reading a paper, writing, or even running, I tell myself that I have to think, “I should quit doing this,” three distinct times before I’m allowed to give up on the task at hand. This rule lets me pivot away from things that simply are not going to happen right now while still building the resilience to finish the things I want to finish. It’s not perfect! Sometimes I’m phoning in the task, but it’s a good way to practice focus.

Alexis and friends finishing an escape room!

Lessons from Escape Rooms

I once worked at an escape room, and it turns out it was helpful for my PhD (and it was more entertaining than watching TV). 

Ask for help earlier than you think you need to.

Often, groups would come to complete an escape room and be overconfident about their abilities. Maybe they had completed plenty of rooms in the past. Maybe they really enjoyed solving puzzles. Maybe they just thought they were particularly smart. But more often than not, those groups would perform worse than others. Why? Because they didn’t ask for help early and burned precious time on simple problems. 

The same flawed thinking can occur in a PhD program. Rather than admitting to your advisor or peers that you are stuck, you may be tempted to battle against a problem all by yourself. Don’t be like those overconfident escape room groups! Asking for help and being vulnerable with others can help you tackle a problem and connect with those around you. 

Answers can come from unlikely places.

“Wait…I think I’m Janet!” said my colleague. It turns out a fantastic group had been calling the escape room employee “Janet,” after the all-knowing being from the show The Good Place. While, unfortunately, none of us can ask a not-a-girl, not-a-robot character for the answer to any question in the universe, we can find answers in the places we least expect.

The same is true during a PhD program. While courses and your advisors are key sources for support, engaging with experiences that bring you joy is also vital. Maybe the family member you are trying to describe your project to can help you find a way to frame your research question. Maybe the crafting event on campus introduces you to another student with whom you can collaborate. Maybe an intramural pickleball game clears your mind, and you discover the next direction for your dissertation topic. A PhD program is a time to explore not only intellectually but also personally.

Communication is key.

Escape rooms are all about effective communication, whether it is joining hands to close a circuit, yelling out numbers from around a corner, or describing what’s inside a hidden room. It is astounding how many times I’ve seen a teammate find the key to solving a puzzle and silently put it in their pocket. As one of my favorite characters, Stede Bonnet from Our Flag Means Death, states, you have to “talk it through, as a crew.”

And, as I am sure you’ve noticed the trend by now, the same applies to a PhD program! Unfortunately, some lab and campus cultures discourage meaningful connections and collaborations. In those situations, it is crucial to find people you can communicate with, such as a support office or friends and loved ones. In all situations, though, how we talk to ourselves and with others often determines how successful we can be.

Whether it’s running, escaping, or even knitting, inspiration is everywhere for being a successful researcher! Which hobbies have helped you as a PhD?

Reflecting on Consent at Scale


image by Freepik

In the era of internet research, everyone is a participant. Picture this…

A PhD student stood at the front of a crowded conference hall. They’d just presented their paper on social capital in distributed online communities. As the applause settled, an audience member scuttled to the microphone, eager to ask the first question.

A professor from University College: Thank you for the great talk. It was refreshing to attend a talk with such rigorous methods. You scraped data from so many different subreddits and made such a compelling argument for how these results will generalize to other online spaces. My question is less about the research and more about your experiences with data contributors. How did the various subreddit community members react when you talked to them about this exciting work?

What kind of question is this? the PhD student thinks to themself. It’s not feasible to get consent from every user. We got an IRB exemption, got approval from subreddit moderators, and followed all the API terms of use and regulations for researcher access. Do other researchers really ask for consent at scale? Did I get consent…?

You may be in a similar situation now! Using social media data is a common research method with massive potential for large-scale analyses, both quantitative and qualitative. However, it can be frustrating to simultaneously hold individual, affirmative consent as the gold standard and recognize its limitations as a viable option for many researchers. To that end, we’ve made a reading list about getting individual consent at scale, particularly in research settings. We hope this reading list serves as a provocation for discussion rather than a list of solutions to this problem.

Normative Papers

1. The “Ought-Is” Problem: An Implementation Science Framework for Translating Ethical Norms into Practice. Our resident ethicist (Leah Ajmani) loves this paper so much! It basically uses informed consent as a case to describe the larger translational effort needed to move from normative prescriptions to actual implementation.

2. Yes: Affirmative Consent as a Theoretical Framework for Understanding and Imagining Social Platforms. A contemporary classic in CHI, this paper does a great job of describing affirmative consent as the ideal and then using that ideal for explanatory and generative purposes. There is merit to having an ideal, even if it is not perfectly attainable!

HCML Papers

We’re obviously biased because she’s a GroupLenser, but Stevie Chancellor does a great job at describing consent at scale as an ethical tension rather than a “must-have.” It is something researchers need to navigate with justified reasoning.

1. A Taxonomy of Ethical Tensions in Inferring Mental Health States from Social Media

2. Toward Practices for Human-Centered Machine Learning

Design Papers

These papers are both critical of current consent design and do a great job of discussing alternatives, even if it is outside of a research context.

1. (Un)informed Consent: Studying GDPR Consent Notices in the Field

2. Limits of Individual Consent and Models of Distributed Consent in Online Social Networks

From grappling with moral nuance to designing better consent procedures, these readings can take our discussions of individual consent at scale from a theoretical ideal to an operationalizable goal. So, let’s embrace difficult discourse about how to move forward and continue to traverse the space between the ideal and the feasible. Comment or tweet which papers you would add to this list!

Wordy Writer Survival Guide: How to Make Academic Writing More Accessible


As GroupLensers received CHI reviews back, many of us were told our papers were “long,” “inaccessible,” and even “bloated.” These critiques are fair. Human-Computer Interaction (HCI) research should be written for a broad and interdisciplinary audience. However, inaccessible writing can be hard to fix, especially if it is your natural style. Here’s some advice from GroupLens’s very own Stevie Chancellor (computer science professor, PhD advisor, and blogger about everything writing-related).

Sentence Structure

  • Sentence Length: How long are your sentences, and how many comma-dependent clauses are in each paragraph? Long sentences are harder to parse. Some people say to split any sentence with more than 25 words. Eh. 30-35 words should be fine for academic writing, but longer is worse.
  • Commas, Commas, Commas: Comma-separated clauses are painful to follow. A comma is a half-stop that momentarily pauses a train of thought. While some commas are grammatically necessary (see the one that follows this parenthesis), too many chop your sentences into pieces and interrupt your reader’s comprehension of your idea.
  • Sentence Cadence: How are you varying the cadence of your writing? Do you use short sentences, then longer ones, and vary the structure and placement of comma clauses? Using ONLY long sentences gets repetitive and, therefore, more challenging to read.
  • Topic Sentence and Transition Clarity: Topic and “transition” sentences should be crystal clear in their simplicity. Interior sentences can be more elaborate/have more “meat.”

Word Choice

  • Simple Words are Better: Are you using the simplest words possible to describe what you mean? For example, do not write “utilize” as a synonym for “use.” Just say “use.”
  • Active vs. Passive Voice: Are you overusing the passive voice? Passive voice is occasionally correct, especially when you need to soften a claim (e.g., “Research has suggested that….”). But too much passive voice is hard to read.
  • Filler Words: Look for words that contribute nothing to the idea but make your sentence longer. Adverbs and fluffy adjectives are common culprits. Adverbs like “very”, “fairly”, and “clearly” add almost NO substance to writing but lengthen the sentence.
  • Weasel Words: Inspired by Matt Might, check your writing for “weasel words” that sap the clarity of your sentence. Do you need to say an experiment was “mostly successful, but had limitations”? Or can you say, “The experiment was successful in X and Y with less success in Z”?
  • Citations vs. Names: Be judicious with \citet{} in your writing. Invoking someone’s name is equivalent to inviting that person to a dinner party and forces the reader to pay attention to the “who’s who” of your writing. Who do you want to invite to your home? Remember, you’re in charge of maintaining conversation during the party and providing food for everyone, so be careful who you invite.

Pragmatic Decisions/Actions

  • Read Aloud: Read “dense” or “inaccessible” sections out loud. Say them with your mouth. Long, poorly structured paragraphs become obvious when read out loud.
  • Use a Friend or Colleague To Kill Your Darlings: Friends and colleagues with no emotional connection to the paper are great for removing self-indulgent yet non-essential writing. Ask a friend to read a section to go in and “kill your darlings.”
  • Use AI Tools Judiciously: Tools such as Grammarly Pro, Writefull, or the ChatGPT/Bard/LLM du jour can do first passes for wordiness and phrasing. For example, Grammarly Premium provides swaps for too-long phrases (and is free if you have a SIGCHI membership). LLMs can trim your writing by 10%. Just be cautious about the accuracy of the edits, and make sure they preserve your tone and argumentation.
  • Ctrl + F Is Your Friend: Recognize your writing “quirks” and ctrl + f to search for and cut them (see the sketch after this list for a scriptable version). Stevie’s writing quirks include using adverbs in initial drafts, meaning that searching for “very” and “ly” returns many words to cut.
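
To make that habit scriptable, here is a toy Python sketch that flags quirk words and overlong sentences (per the sentence-length tip above). The quirk list and the 35-word threshold are illustrative choices, not rules from this post.

    import re

    # Words you know you overuse; adjust this list to your own quirks.
    QUIRKS = ["very", "fairly", "clearly", "utilize", "in order to"]
    MAX_WORDS = 35  # roughly the upper bound suggested above

    def audit(text: str) -> None:
        """Print quirk-word counts and any sentences over the word limit."""
        for quirk in QUIRKS:
            hits = len(re.findall(rf"\b{re.escape(quirk)}\b", text, re.IGNORECASE))
            if hits:
                print(f"'{quirk}': {hits} occurrence(s) to consider cutting")
        # Naive sentence split; good enough for a first pass over prose.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            words = sentence.split()
            if len(words) > MAX_WORDS:
                print(f"{len(words)}-word sentence: {' '.join(words[:8])}...")

    draft = "We utilize very long sentences in order to sound clearly academic."
    audit(draft)  # in practice, read your .txt or .tex file into `draft`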

From managing sentence structure to choosing simple words, these tips can take your writing from “in the clouds” to a reader-friendly and enjoyable experience. Remember, the goal is not just brevity but clarity, ensuring that our work resonates with a broad and interdisciplinary audience. So, let’s embrace these tips, Ctrl + F our way through, and invite our readers to a well-organized and engaging intellectual dinner party. Cheers to more accessible and impactful HCI research!

Page Protection: The Blunt Instrument of Wikipedia


Wikipedia is a 22-year-old, wonky online encyclopedia that we’ve all used at some point. As of 2023, Wikipedia holds a dizzying amount of information in numerous languages. The English-language Wikipedia alone has over 6 million articles and 40,000 active editors. The allure of Wikipedia articles is that they are highly formatted and community-governed; while anyone can contribute to an article, a vast infrastructure of admins, experienced editors, and bots maintains the platform’s integrity. Wikipedia’s About page reads:

“Anyone can edit Wikipedia’s text, references, and images. What is written is more important than who writes it. The content must conform with Wikipedia’s policies, including being verifiable by published sources […] experienced editors watch and patrol bad edits.”

Our research aims to understand the tension between open participation and information quality that underlies Wikipedia’s moderation strategy. In other words, how does maintaining Wikipedia as a factual encyclopedia conflict with the value of free and open knowledge? Specifically, we look at page protection, an intervention where administrators can “lock” articles to prevent unregistered or inexperienced editors from contributing.

We used quasi-causal methods to explore the effects of page protection. Specifically, we created two datasets: (1) a “treatment set” of page-protected articles and (2) a “control set” of unprotected articles that were similar to a treated article in terms of article activity, visibility, and topic. We then ask: does page protection affect editor engagement consistently?
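
For intuition, here is a toy Python sketch of one way such matching can work: pair each protected article with its nearest unprotected neighbor on standardized pre-treatment features. The feature setup is illustrative; the paper describes our actual pipeline.

    import numpy as np

    def nearest_control(treated: np.ndarray, controls: np.ndarray) -> np.ndarray:
        """For each treated article's feature vector (activity, visibility,
        topic, ...), return the index of the closest control article."""
        # Standardize features so no single scale dominates the distance.
        pool = np.vstack([treated, controls])
        mu, sigma = pool.mean(axis=0), pool.std(axis=0) + 1e-9
        t = (treated - mu) / sigma
        c = (controls - mu) / sigma
        # Pairwise Euclidean distances between every treated/control pair.
        dists = np.linalg.norm(t[:, None, :] - c[None, :, :], axis=-1)
        return dists.argmin(axis=1)

    # Toy usage: 3 protected articles, 5 candidate controls, 4 features each.
    rng = np.random.default_rng(0)
    matches = nearest_control(rng.normal(size=(3, 4)), rng.normal(size=(5, 4)))
    print(matches)  # index of the matched control for each treated article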

Our findings show that page protection dramatically but unpredictably affects Wikipedia editor engagement. We examined the kernel density estimate (KDE) of the per-article difference between the number of editors before page protection and after, evaluated across three time windows: seven, fourteen, and thirty days. Not only is the spread of this difference huge, but it also spans negative and positive values. In essence, we cannot predict whether page protection decreases or increases the number of people editing an article.
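
To make this metric concrete, here is a minimal Python sketch using synthetic data. Our real analysis uses Wikipedia edit histories; nothing below reproduces the paper’s numbers.

    import numpy as np
    from scipy.stats import gaussian_kde

    def editor_diff(before, after):
        """Per-article difference in unique editor counts (after minus before)."""
        return np.array([len(a) - len(b) for b, a in zip(before, after)])

    # Synthetic stand-in for one window (e.g., 7 days): each set holds the IDs
    # of distinct editors active on an article during that window.
    rng = np.random.default_rng(0)
    before = [set(rng.integers(0, 100, size=rng.integers(1, 30))) for _ in range(200)]
    after = [set(rng.integers(0, 100, size=rng.integers(1, 30))) for _ in range(200)]

    diffs = editor_diff(before, after)
    kde = gaussian_kde(diffs)                    # smooth density over the differences
    xs = np.linspace(diffs.min(), diffs.max(), 101)
    print(xs[kde(xs).argmax()])                  # mode of the estimated density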

Are heavy-handed moderation interventions necessary for a complex platform such as Wikipedia? How can we design these non-democratic means of control while maintaining the platform’s participatory nature? Check out our paper for discussions of these questions, or come to my talk on October 16, 2023 at 4:30pm!