Siblings in the Digital Divide: Navigating Communication Challenges and Opportunities for Large Age Gap Relationships

Figure 1. Author and her brother

This photo captured a bittersweet moment in my life. It was taken on my first day at university, when I was brimming with excitement and anticipation for the journey ahead. I was 18 years old at the time, and my brother, who is 12 years younger than me, was only six. After that day, I started my own life at Shandong University in Jinan, a city about 300 miles from my hometown. Because we lived apart during the following years, my brother and I had to rely on video and audio calls to stay in touch.

But it was still not easy to maintain a close sibling relationship. As the years went by, the distance between us grew, and we missed out on so many precious moments that we could never get back. It wasn't only me: my parents also tried to keep my brother and me in touch, for example by handing him the phone. But somehow we never had much to talk about. Many times, after one or two minutes of small talk, my brother would hand the phone back to my parents. That was really frustrating. So many technologies and tools have made it easier for us to stay connected, so why couldn't we feel close to each other?

This experience led me to wonder: why are "large gap" sibling relationships particularly difficult to support? Before we dive into this question, I will cover some important background:

  • Why are sibling relationships critically important?
  • Why are sibling relationships different from other types of family connections?

Why are sibling relationships critically important?

There are several reasons why siblings are important. First, sibling relationships are an important aspect of child development. Although we tend to focus more on relationships with parents, prior work indicates that sibling relationships also significantly affect how children develop, particularly socially and emotionally. Second, sibling relationships last an extremely long time: almost all adults maintain contact with their siblings throughout their lives. Third, sibling relationships are pervasive. Most of us have brothers or sisters; in fact, a study estimated that 80 to 90% of individuals grow up with a sibling.

Why are sibling relationships different from other types of family connections?

Unlike parents or primary caregivers, who generally act as a secure base, siblings are thought to fulfill children's social needs and are more often sought out for fun and playful interactions than for support and comfort. Sibling relationships also tend to be more equal than relationships with family members of other generations. They differ from peer relationships, too: because siblings co-construct more experiences and are in more frequent contact, shared experience plays an important role in sibling learning and communication.

Figure 2. Why are large-gap sibling relationships particularly important and difficult to support?

OK, here comes our key question:

Why are "large gap" sibling relationships particularly difficult to support?

I know many older brothers and sisters face problems similar to mine. As we mature, life changes such as going to university give us a completely different social circle and schedule, so this distancing is not surprising. For children, meanwhile, audio and video calls are hard to engage with, which makes it difficult for them to maintain a long-distance relationship. So even though sibling attachment bonds remain important to both parties, adulthood typically brings less contact and proximity. The challenge, then, is connecting an older sibling who is an adult with a younger sibling who is still a child.

The good news is that we now have more options for connecting with remote family members. Many technologies have emerged in both industry and academia that offer at least a partial solution to the problems of long-distance families. I won't spend much time on these existing tools, since you may already be familiar with some of them. HCI research has also produced many designs aimed at connecting remote families.

Figure 3. Examples of commercial tools that help families connect
Figure 4. Examples of research projects that help families connect

However, despite growing interest in distant family communication, the sibling relationship has received little attention in the literature on designing for remote families. Even though some systems are designed for the whole family, including siblings, few prior works have examined how technology might influence sibling relationships, despite the distinctive features of those relationships described above. None of the prior work explicitly investigates the specific communication challenges in large-gap sibling relationships.

To understand the intricacies of communication between siblings separated by a significant age gap, I used a mixed-method approach: a two-week diary study with older siblings, followed by remote semi-structured interviews with both siblings and one parent. We analyzed the data using thematic analysis, which involved open coding and clustering the codes into themes. We recruited families with at least two siblings whose ages differ by more than five years. The younger sibling was between 6 and 14 years old, and the older sibling had lived apart from the family for more than half a year. The siblings also needed to have lived together at some point and to communicate regularly, directly or indirectly.

Figure 5. Methods used in our study

The results of this study revealed the unique needs and challenges faced by stakeholders involved in remote communication between large age gap siblings. Specifically, we found that the relationships between large age-gap siblings consist of older-to-younger companionship and care, with older siblings also taking on a pseudo-parental role. At the same time, there is a younger-to-older rivalry that can create tension between siblings and reduce the quality of family communication.

Our findings also highlighted the role of older siblings in initiating communication, engaging younger siblings, and providing technical support. Meanwhile, parents help to enrich siblings’ communication and provide logistical facilitation. However, there are challenges in managing conflicting values between parents and older siblings, promoting child-led conversations, and navigating technology obstructions.

To address these challenges, we identified three design opportunities for technology to better support the needs and practices of different stakeholders in remote sibling communication. First, technology can support co-present involvement for different stakeholders’ requirements and needs in remote settings. Second, it can scaffold child-led conversations under asymmetric relationship expectations. Lastly, technology can help negotiate value conflicts between older siblings and parents, which affect siblings’ communication and their relationships.

As we move further into the digital age, the importance of sibling relationships remains as critical as ever. However, as our research has shown, maintaining strong connections between large age-gap siblings can be challenging. By leveraging the power of technology and designing solutions that address the unique needs and practices of different stakeholders, we can bridge the gap between remote siblings and create more meaningful connections.

Find more information in our paper here (coming to CHI 2023!)

Cite this paper:

Qiao Jin, Ye Yuan, and Svetlana Yarosh. 2023. Socio-technical Opportunities in Long-Distance Communication Between the Siblings with a Large Age Difference. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI ’23). https://doi.org/10.1145/3544548.3580720

References

  • (Toet et al., 2021) Toet, Alexander, et al. “Augmented reality-based remote family visits in nursing homes.” ACM International Conference on Interactive Media Experiences. 2021.
  • (Shakeri and Neustaedter, 2021) Shakeri, Hanieh, and Carman Neustaedter. “Painting Portals: connecting homes through live paintings.” Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems. 2021.
  • (Inkpen et al., 2013) Inkpen, Kori, et al. “Experiences2Go: sharing kids’ activities outside the home with remote family members.” Proceedings of the 2013 conference on Computer supported cooperative work. 2013.
  • (Nunez et al., 2019) Nunez, Eleuda, et al. “Effect on Social Connectedness and Stress Levels by Using a Huggable Interface in Remote Communication.” 2019 8th International Conference on Affective Computing and Intelligent Interaction (ACII). IEEE, 2019.
  • (Jarusriboonchai et al., 2020) Jarusriboonchai, Pradthana, et al. “Always with Me: Exploring Wearable Displays as a Lightweight Intimate Communication Channel.”
  • (Judge et al., 2011) Judge, Tejinder K., et al. “Family portals: connecting families through a multifamily media space.” Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. 2011.

Stakeholders, Rationales, and Challenges of Virtual Reality in Education: How Will VR Enter Classrooms?

We are living in a thrilling age where commercial VR headsets are no longer a luxury but an affordable reality. The ability of virtual reality to transform education has been a hot topic in recent times, with a wealth of articles, studies, videos, applications, and books dedicated to the subject. The possibilities of VR as an educational tool have captured the imagination of many, with some claiming it will have a profound impact on how we learn and educate. However, this raises the question: if the prospect of learning in VR is so exhilarating, why isn’t it more prevalent (or even present!) in higher education? Who is putting the brakes on this exciting new learning tool? Are there hidden challenges beyond what we see in published research, and do stakeholders beyond instructors and students influence the decision to embrace VR in education?

Figure 1. Who influences technology adoption in higher education?

It’s time to delve deeper into the complex world of virtual reality in education and explore the untold stories behind its adoption. In large organizations and complex contexts such as universities, several types of stakeholders usually work together to guide technology adoption decisions. For example, prior work has identified groups of stakeholders in higher education (e.g., technology staff, financial staff, and administrators) who interact with each other to shape a university’s strategies and decisions. With this in mind, we pose three research questions:

  • Who are the stakeholders we need to consider for using VR in the classroom?
  • What is the rationale for VR use in higher education?
  • What challenges do major stakeholders face in using VR technology in educational activities?

Figure 2. Three research questions we posed in our study

To get a more holistic view for answering these research questions, we used a multi-method approach: semi-structured interviews followed by two participatory design workshops with university students and instructors. We then conducted another round of interviews with other major stakeholders identified in the workshops. Finally, we analyzed the interview and workshop data using a data-driven process.

Figure 3. Methods used in our study

Who are the stakeholders we need to consider for using VR in the classroom?

Through our first round of interviews, it became apparent that there are more people, beyond instructors and students, that we should consider as stakeholders when integrating VR in higher education. The university can be seen as an educational ecosystem, where instructors may be collaborating with other types of experts or services to facilitate their courses. Stakeholders identified by our participants under university systems include co-teaching instructors, TAs, teaching support staff, classroom designers, IT staff, and so on. There were also some stakeholders beyond the campus, including VR content creators/developers, funding providers, and industrial companies. 

We found that different stakeholders at higher education institutions have the power to accelerate the integration of VR technology into traditional classrooms. Most notably, institutional support can promote sustainability and efficiency over the long term in many areas, including but not limited to management, deployment, and content creation.

Figure 4. Stakeholders who may influence VR adoption

What is the rationale for VR use in higher education?

Our data revealed five reasons why people choose to use VR in higher education. 

  • Increasing Social Presence
  • Accessing Otherwise Inaccessible Learning Contexts
  • Understanding and Remembering Visual and Spatial Knowledge
  • Supporting Embodied Learning
  • Attracting Students through Novelty

I will focus on the first one: social presence. Our work points out the importance of the collaborative social experiences that VR can provide in students’ learning. Most participants identified the ability to create a realistic social environment that supports collaboration as a key benefit of VR. Some other benefits of VR, such as the engagement and interest brought by its novelty, will eventually fade; social presence, by contrast, is a long-lasting benefit because it derives from the nature of virtual reality. As one of our participants commented, “Virtual avatars and environment made it easy to get social cues, from facial expressions to body language, without worrying about privacy leaking like showing surroundings in the background on the video.”

What challenges do major stakeholders face in using VR technology in educational activities?

We also identified several challenges of using VR in higher education:

  • Course design investments. 
  • Financial consideration.
  • Learning curve. 
  • Technology management (e.g., storage, maintenance, distribution, and in-class management).
  • Health concerns. 

Optimistic predictions about introducing immersive VR into the classroom rest on the fact that the hardware is now much better and cheaper. Yet health issues are one of the most important challenges, and they are relevant to all disciplines. Motion sickness or cybersickness, eye strain, and headache were the most frequently mentioned health concerns in the interviews. As one participant noted, it is extremely important to create an inclusive class and make VR accessible to people with different conditions or capabilities.

Our findings demonstrate that no matter how excited people are about using immersive VR in the classroom now, in most situations instructors can include it only as a small optional experience because of fundamental barriers to equity. For example, if one student experienced severe sickness, most instructors in our study would choose to stop using VR. More importantly, the situation becomes more serious when these issues are not randomly distributed in the population. Take gender differences as an example: earlier studies have found that men tend to experience less cybersickness in VR than women. We can imagine how using VR could hurt gender equity, especially in fields that are already male-dominated, such as computer science.

Takeaways from this article

  • Collaboration experience is critical for educational VR
  • Ensuring that VR is accessible is the most important challenge to the adoption
  • It’s not just about instructors; it’s about the whole community

Find more information in our paper here

Cite this paper: 

Qiao Jin, Yu Liu, Svetlana Yarosh, Bo Han, and Feng Qian. 2022. How Will VR Enter University Classrooms? Multi-stakeholders Investigation of VR in Higher Education. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI ’22). Association for Computing Machinery, New York, NY, USA, Article 563, 1–17. https://doi.org/10.1145/3491102.3517542

Identifying Outreach Windows in Online Opioid Recovery

Note: Rudy Berry completed a summer Research Experiences for Undergraduates (REU) program at the University of Minnesota with Professor Stevie Chancellor in the summer of 2022. This blog post summarizes his project outcomes. Way to go, Rudy!

Summary: Identifying when relapse has occurred is a key factor in determining how to reach out to individuals with Opioid Use Disorder. Information such as the time elapsed since a previous relapse influences the type of resources and language that should be presented. In this project, I wrote a script that identifies the date of a relapse from a relapse disclosure post in an opioid addiction recovery community on Reddit. With this information, we were able to determine the amount of time that passed between an individual’s self-disclosed relapse and the time they reported it to the community. The ability to extract this kind of information from recovery posts may be a valuable tool for the future development of context-sensitive outreach systems.

Overview: Opioid Use Disorder (OUD), colloquially known as opioid addiction, is a highly stigmatized health issue that has fueled the growing opioid crisis in the United States for over two decades. The CDC reports that opioids were involved in about 75% of all U.S. drug overdose deaths in 2020, and opioids have been linked to over 500,000 deaths since 1999 (CDC, 2021). In response to this crisis and the difficulty of finding support, there has been growing engagement in online recovery forums for substance abuse. These communities give members an anonymous space to seek advice, share success stories, and vent frustrations. Members of online addiction recovery communities frequently share feelings of shame and guilt (Mudry et al., 2012), so the ability to detach oneself from a real-world identity is a major draw of these forums. The popular discussion website Reddit is home to a large online recovery community: r/opiatesrecovery.

In this project, our research goal was to identify the date that someone had relapsed in their OUD recovery journey. Identifying when relapse has occurred is key to aiding the recovery process, because appropriate advice depends on when someone relapsed. If an individual has relapsed very recently, it is important to direct them to resources that can provide more urgent forms of harm reduction in the moment. If a relapse occurred in the distant past, it may be more appropriate to provide resources focused on long-term sobriety tips or maintenance care. The existence of online recovery communities presents a unique opportunity in HCI for researchers to develop technology that could provide additional support and resources to individuals with OUD beyond what community members already provide.

Therefore, the primary goal of this project was to write a script that could identify the date of a relapse from the context of a relapse disclosure post. The project focused on two specific datasets: a set of posts and a set of comments, all gathered from r/opiatesrecovery on Reddit. The ability to extract this kind of contextual information from recovery posts would allow outreach systems to provide more context-sensitive resources and messaging to individuals in OUD recovery based on an estimated date of relapse. We also wanted to determine the average window size between the date of relapse and the post date across all relapse posts and comments on the subreddit.

What We Did: The first step was to identify posts in which an individual had disclosed a relapse. Working with another team member, we created a regular expression that matches phrases indicating relapse, such as “I relapsed” or “I just relapsed”. This was done in collaboration with another ongoing project in the lab that identifies people who disclose that they have relapsed. The regular expression allowed us to create reduced datasets of relapse posts and comments from a larger general dataset drawn from across the subreddit.
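The post does not publish the regular expression itself. The following is a minimal Python sketch of this filtering step, assuming a pattern that covers only the two phrasings quoted above; `RELAPSE_PATTERN`, `is_relapse_disclosure`, and `filter_relapse_texts` are hypothetical names, and a real pattern would need to handle many more variants (for example “I’ve relapsed”).

```python
import re

# Hypothetical pattern covering only "I relapsed" and "I just relapsed";
# it is not the project's actual regular expression.
RELAPSE_PATTERN = re.compile(r"\bI\s+(?:just\s+)?relapsed\b", re.IGNORECASE)


def is_relapse_disclosure(text: str) -> bool:
    """Return True if the text contains a first-person relapse phrase."""
    return RELAPSE_PATTERN.search(text) is not None


def filter_relapse_texts(texts: list[str]) -> list[str]:
    """Reduce a general set of posts or comments to relapse disclosures."""
    return [text for text in texts if is_relapse_disclosure(text)]


sample = [
    "I just relapsed after 40 days clean.",
    "Day 90 today, feeling great.",
    "Yesterday I relapsed and I feel awful.",
]
print(filter_relapse_texts(sample))  # keeps the first and third texts
```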

Once we had collected the relapse posts, the next step was to identify nearby temporal expressions describing when the relapse happened, such as “yesterday” or “a week ago”. To do so, we used the SUTime library, a tool from the Stanford CoreNLP pipeline. SUTime is a temporal tagging library that identifies temporal expressions in tokenized text. It tags four categories of temporal expressions: “Time”, “Duration”, “Set”, and “Interval”. For each temporal expression it identifies, SUTime returns the expression text, its type, a date resolved relative to a supplied reference value (or the system date), and the start and end positions of the expression in the text.
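SUTime is a Java library, and the sketch below does not call it. To illustrate what resolving an expression against a reference date means in this pipeline, here is a deliberately tiny, hypothetical resolver, `resolve_relative`, that uses the post date as the reference. It handles only “today”, “yesterday”, and “N units ago”, and it approximates a month as 30 days and a year as 365 days, which a real temporal tagger would not do.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical stand-in for a temporal tagger's normalization step; this is
# NOT SUTime's API. Months and years are approximated as 30 and 365 days.
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}
NUMBER_WORDS = {"a": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5}
AGO = re.compile(r"(\d+|a|one|two|three|four|five)\s+(day|week|month|year)s?\s+ago")


def resolve_relative(expression: str, reference: date) -> Optional[date]:
    """Resolve a relative expression against a reference (post) date."""
    text = expression.strip().lower()
    if text == "today":
        return reference
    if text == "yesterday":
        return reference - timedelta(days=1)
    match = AGO.fullmatch(text)
    if match is None:
        return None  # unsupported, e.g. a duration such as "for five months"
    count_token, unit = match.groups()
    count = int(count_token) if count_token.isdigit() else NUMBER_WORDS[count_token]
    return reference - timedelta(days=count * UNIT_DAYS[unit])


post_date = date(2022, 7, 15)  # illustrative post date, not from the dataset
for phrase in ["yesterday", "a week ago", "3 days ago", "for five months"]:
    print(phrase, "->", resolve_relative(phrase, post_date))
```

Note that a duration such as “for five months” does not resolve to a point in time here, which mirrors why durations are harder to interpret as relapse dates.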

For this project, we were particularly interested in expressions of type “Time”, since these allowed us to extract the most specific dates. However, we realized that a handful of posts in our dataset contained expressions of type “Duration”, such as “I relapsed for a week” or “I relapsed for 5 days”. These phrases typically appeared in longer posts with many details and much more context to consider. We took this into account in our validation process and included durations to establish the limitations of our system: we wanted to know whether a human reader could identify a relapse date from the context surrounding a duration. To analyze this, we took a sample of twenty posts in which relapse dates were identified and a sample in which none were identified, and ran the analysis both with and without durations. We then hand-annotated the text to identify false positives and false negatives.

The second part of our validation process involved evaluating the size of the character window around the relapse phrase (the regular-expression match) needed to capture the relevant time words. We picked three window sizes and measured accuracy over the entire post dataset for each. We wanted to know, for each character count, for how many posts our script accurately identified the day, week, or month of relapse.
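A minimal sketch of the character-window step, assuming the window extends `size` characters on each side of each regular-expression match and is clipped at the ends of the text (the post does not say whether its character counts are per side or in total). The pattern and the function name `relapse_windows` are hypothetical.

```python
import re

# Hypothetical relapse pattern; not the project's actual regular expression.
RELAPSE_PATTERN = re.compile(r"\bI\s+(?:just\s+)?relapsed\b", re.IGNORECASE)


def relapse_windows(text: str, size: int = 100) -> list[str]:
    """Return the text within `size` characters on each side of every match.

    Assumption: the window is measured per side and clipped at text boundaries.
    """
    windows = []
    for match in RELAPSE_PATTERN.finditer(text):
        start = max(0, match.start() - size)
        end = min(len(text), match.end() + size)
        windows.append(text[start:end])
    return windows


post = "Clean for 60 days, then yesterday I relapsed after a bad week at work."
for size in (100, 150, 200):  # the three window sizes compared in the post
    print(size, relapse_windows(post, size))
```

Only the text inside each window would then be passed to the temporal tagger, so a wider window can pull in more, and less relevant, time expressions.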

*The number of posts at each step in the identification process

Findings:

The first part of the validation process revealed that the time-tagging system was much more accurate when duration types were excluded. In a negative sample (posts with no relapse date identified) of twenty posts with durations included, there was only one post from which a human reader would be able to establish a relapse date; the system correctly determined that no relapse date was discernible in the other nineteen. When durations were excluded, the system correctly determined that no relapse date could be identified for all twenty posts in the negative sample.

In the positive sample (posts with relapse dates identified), including durations had a more dramatic effect. With durations included, there were eleven posts for which the system correctly determined that a relapse date could be identified from the text. However, in nine posts the system incorrectly treated the beginning of a duration as a possible relapse date. So for almost half the sample, the script identified a relapse date where a human reader would not be able to. This can be attributed to the fact that durations were typical of more complex posts. For instance, given “I got out of rehab then relapsed for five months”, the system would incorrectly identify the relapse date as five months before the post date; a human reader would have to analyze the entire post to approximate the relapse date more accurately. The positive sample without durations gave better results, with only five posts incorrectly labeled as posts from which a relapse date could be determined. Based on this outcome, we decided to work only with “Time” temporal types and exclude durations.

During the second part of our validation process, we tested windows of 100, 150, and 200 characters around the regular-expression match. The best performance came at 100 characters, with an accuracy of 73.4% on the entire post dataset, which we verified by reading each post and identifying the correct relapse date. The problem with wider windows was that they captured more temporal expressions, and our script is written to return the first expression it finds. In text like “I started my recovery journey a year ago and today I relapsed”, the relapse date would be incorrectly identified as a year ago. Conversely, in a phrase like “Starting all over again today after I started relapsing again last month”, the relapse date would be incorrectly identified as the post date, or “today”. Even a 100-character window fails in both of these cases, but such instances become more frequent beyond 100 characters. Further testing is needed to determine the best way for the script to choose among multiple time expressions.
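The selection rule described above (ignore durations, then return the first “Time” expression in the window) can be sketched as follows. The dictionary fields `type`, `text`, and `start`, and the type labels, are assumptions standing in for whatever the tagger actually returns; `pick_relapse_expression` is a hypothetical name. The example reproduces the “a year ago” failure case from the paragraph above.

```python
from typing import Optional

# Hypothetical tagger output: one dict per temporal expression. The keys
# ("type", "text", "start") and the type labels are assumptions for this sketch.
Expression = dict


def pick_relapse_expression(expressions: list[Expression]) -> Optional[Expression]:
    """Drop durations, then return the earliest 'Time' expression (or None)."""
    times = [e for e in expressions if e["type"] == "Time"]
    if not times:
        return None
    # "The first expression it finds" is modeled as the smallest start offset.
    return min(times, key=lambda e: e["start"])


text = "I started my recovery journey a year ago and today I relapsed"
tagged = [
    {"type": "Time", "text": "a year ago", "start": text.index("a year ago")},
    {"type": "Time", "text": "today", "start": text.index("today")},
]
print(pick_relapse_expression(tagged)["text"])  # prints: a year ago (wrong: the relapse was today)

text2 = "I got out of rehab then relapsed for five months"
durations_only = [{"type": "Duration", "text": "five months", "start": text2.index("five months")}]
print(pick_relapse_expression(durations_only))  # prints: None (durations are ignored)
```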

*This histogram shows the number of comments corresponding to each window size (days between relapse and disclosure) in the dataset. For instance, the first bar shows that there were over 200 cases in which relapse was disclosed to the subreddit within 0-10 days of its occurrence.
*This histogram shows the number of posts corresponding to each window size in the dataset.
*This histogram shows the portion of the comment histogram from 10 to 200 days.

The histogram data we collected reveal spikes in relapse disclosure within the first ten days after relapse, as well as at the one-month, two-month, one-year, and two-year marks. The post dataset had a mean window size of 64.6 days and a median of 7.0 days. The comment dataset had a mean window size of 177.8 days and a median of 30.0 days.
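The window size here is the number of days between the self-disclosed relapse date and the date of the post or comment. A minimal sketch of how such a summary could be computed is shown below; the function names are hypothetical and the dates are made up for illustration, not taken from the study's data.

```python
from datetime import date
from statistics import mean, median


def window_days(relapse_date: date, post_date: date) -> int:
    """Days between the self-disclosed relapse and the post or comment."""
    return (post_date - relapse_date).days


def summarize(pairs: list[tuple[date, date]]) -> tuple[float, float]:
    """Return (mean, median) window size in days over (relapse, post) pairs."""
    sizes = [window_days(relapse, post) for relapse, post in pairs]
    return mean(sizes), median(sizes)


# Made-up illustrative pairs, not data from the study.
example = [
    (date(2022, 6, 30), date(2022, 7, 1)),  # 1 day
    (date(2022, 6, 24), date(2022, 7, 1)),  # 7 days
    (date(2022, 6, 1), date(2022, 7, 1)),   # 30 days
    (date(2021, 7, 1), date(2022, 7, 1)),   # 365 days
]
print(summarize(example))  # (100.75, 18.5)
```

A mean far above the median, as in both datasets reported here, indicates a long right tail: most disclosures come soon after the relapse, while a minority describe relapses from months or years earlier.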

Overall, the script we created can extract information about relapse dates and could be replicated and improved for use in an outreach system. Such a system could use the window size in conjunction with other information, such as sentiment and prior relapse disclosures, to send an individual a message with context-sensitive resources and word choice.

One finding from the identifier that I found particularly interesting was how many people reached out to online communities to disclose a relapse so soon after it had occurred. This highlights a need for outreach systems to focus on supporting individuals in the immediate aftermath of a relapse. In the future, further modifications could address the contextual limitations posed by durations and multiple time expressions. Through this project, I learned a lot about the benefits of anonymity in online spaces; it was interesting to see people being open about their setbacks and experiences in real time. This work has made me more curious about the role that anonymous online communities play in de-stigmatizing OUD, as well as mental health conditions like anxiety and depression, and about the types of systems that can safely facilitate these communities.

References

  • (Mudry et al., 2012) https://journals.sagepub.com/doi/full/10.1177/1049732312468296
  • (CDC, 2021) https://www.cdc.gov/drugoverdose/epidemic/index.html#:~:text=The%20number%20of%20drug%20overdose,rates%20increased%20by%20over%206%25.

What Wikipedians Want (but Struggle) to Prioritize

The English version of Wikipedia contains over 6.5 million articles… but only 0.09% of them have received Wikipedia’s highest quality rating. In other words, there’s still a lot of work to be done.

But where to start?

A group of highly experienced Wikipedia editors tried to answer that question. Through extensive discussion and consensus-building, they manually compiled lists of Vital Articles (VA) that should be prioritized for improvement. We analyzed their discussions to identify the values they brought to the table in making those decisions. We found, among other things, a desire for Wikipedia to be “balanced”, including along gender lines. Wikipedia has long been criticized for its gender imbalance, so this was encouraging!

But how is this value reflected in the actual prioritization decisions in the lists of Vital Articles these editors developed?

Not so much.

Figure 4 from our paper shows what would happen if editors were to use Vital Articles to prioritize work on biographies: the proportion of highest-quality biographies about women would decrease, from 15.4% to 14.7%. By contrast, using pageviews (which indicate reader interest) to prioritize work would increase the proportion to 21.4%.

In short, if you want more gender balance, just prioritize what readers happen to read, not what a devoted group of editors painstakingly curated over several years with gender balance as one of their goals. So what gives? Are Wikipedians just pretending to care about gender balance?

Not quite.

As it happens, only 7.5% of VA’s participants self-identify as women. For reference, that figure was 12.9% on all of English Wikipedia at the time we collected our data. Prior work gives plenty of evidence to help explain why a heavily male-skewed group of editors might have failed to include enough articles about women despite good intentions. Some of the reasons are quite intuitive too; as one Wikipedian put it, “On one hand, I’m surprised [the Menstruation article] isn’t here, but then as one of the x-deficient 90% of editors, I wouldn’t have even thought to add it.”

The takeaway: when it comes to prioritizing content, skewed demographics might prevent the Wikipedia editor community from fully enacting its own values. However, this effect is not the same for all community values; we find that VA would actually be a great prioritization tool for increasing geographical parity on Wikipedia. As for why? We have some ideas…

But for more on that (and other cool findings from our work), you’ll have to check out our research paper on this topic, coming to CSCW 2022! You can find the arXiv preprint here.

How Child Welfare Workers Reduce Racial Disparities in Algorithmic Decisions

I sat in a gray cubicle, next to a social worker deciding whether to investigate a young couple reported for allegedly neglecting their one-year-old child. The social worker read a report aloud from their computer screen: “A family member called yesterday and said they went to the house two days ago at 5pm and it was filthy, sink full of dishes, food on the floor, mom and dad are using cocaine, and they left their son unsupervised in the middle of the day. Their medical and criminal records show they had problems with drugs in the past. But, when we sent someone out to check it out, the house was clean, mom was one-year sober and staying home full-time, and dad was working. But, dad said he was using again recently.” The social worker scrolled down past the report and clicked a button; a screen popped up with “Allegheny Family Screening Tool” at the top and a bright red, yellow, and green thermometer in the middle. “The algorithm says it’s high risk.” The social worker decided to investigate the family.

Image: Allegheny County Department of Human Services

Workers in Allegheny County’s Office of Children, Youth, and Families (CYF) have been making decisions about which families to investigate with the Allegheny Family Screening Tool (AFST), a machine learning algorithm that uses county data, including demographics, criminal records, public medical records, and past CYF reports, to try to predict which families will harm their children. These decisions are high-stakes: an unwarranted Child Protective Services (CPS) investigation can be intrusive and damaging to a family, as any parent of a trans child in Texas could tell you now. Investigations are also racially disparate: over half of all Black children in the U.S. are subjected to a CPS investigation, twice the proportion for white children. One big reason Allegheny County CYF started using the AFST in 2016 was to reduce racial biases. In our paper, How Child Welfare Workers Reduce Racial Disparities in Algorithmic Decisions, and its associated Extended Analysis, we find that the AFST gave more racially disparate recommendations than workers did. In numbers, if the AFST had fully automated the decision-making process from August 2016 to May 2018, 68% of Black children would have been investigated compared with only 50% of white children, an 18-percentage-point disparity. The process isn’t fully automated, though: the AFST gives workers a recommendation, and the workers make the final decision. Over the same period, workers (using the algorithm) decided to investigate 50% of Black children and 43% of white children, a smaller 7-percentage-point disparity.
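The disparities quoted above are simple differences between two investigation rates (68% minus 50%, and 50% minus 43%), i.e. gaps in percentage points. The short illustrative calculation below uses only the figures reported in this post; the rate ratio is an additional way of expressing the same numbers, not a metric the paper is claimed to use, and the function names are hypothetical.

```python
def gap_in_points(rate_a: float, rate_b: float) -> float:
    """Absolute difference between two investigation rates, in percentage points."""
    return rate_a - rate_b


def rate_ratio(rate_a: float, rate_b: float) -> float:
    """Relative difference: how many times higher rate_a is than rate_b."""
    return rate_a / rate_b


# Investigation rates (%) for Black vs. white children, August 2016 to May 2018.
algorithm_only = (68, 50)
workers_final = (50, 43)

for label, (black, white) in [("AFST alone", algorithm_only), ("workers", workers_final)]:
    print(label, gap_in_points(black, white), round(rate_ratio(black, white), 2))
```

Either way, the workers' final decisions narrow the gap: from 18 to 7 percentage points, or from roughly 1.36 to 1.16 times the white rate.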

This complicates the current narrative about racial bias and the AFST. A 2019 study found that the disparity between the proportions of Black and white children investigated by Allegheny County CYF fell from 9 percentage points before the AFST was introduced to 7 percentage points after. Based on this, CYF said that the AFST caused workers to make less racially disparate decisions. Following these early “successes,” CPS agencies across the U.S. have started using algorithms just like the AFST. But how can an algorithm that gives more disparate recommendations cause workers to make less disparate decisions?

Last July, my co-authors and I visited workers who use the AFST to ask them this question. We showed them the figure above and explained that the algorithm gave more disparate recommendations and that they had reduced those disparities in their final decisions. They weren’t surprised. Although the algorithm doesn’t use race as a variable, most workers believed it was racially biased because they thought it relies on variables that are correlated with race. Based on their everyday interactions with the algorithm, workers felt it often scored families too high when they had a lot of “system involvement,” e.g., past CYF reports, criminal records, or public medical history. One worker said, “if you’re poor and you’re on welfare, you’re gonna score higher than a comparable family who has private insurance.” Workers thought this was related to race because Black families often have more system involvement than white families.

The primary way workers thought they reduced the racial disparities in the AFST’s recommendations was by counteracting this pattern of over-scoring based on system involvement. A few workers we talked with said they made a conscious effort to reduce systemic racial disparities. Most, however, said reducing disparities was an unintentional side effect of making decisions holistically and contextually: workers often looked at parents’ records to piece together the situation, rather than treating those records as an automatic strike against the family. For example, in the report I mentioned at the top of this article, the worker looked at criminal and medical records only to see whether there was evidence that the parents abused drugs. The worker said, “somebody who was in prison 10 years ago has nothing to do with what’s going on today.” Whether they acted intentionally or not, workers were responsible for reducing the racial disparities in the AFST’s recommendations.

For a more in-depth discussion, please read our paper, How Child Welfare Workers Reduce Racial Disparities in Algorithmic Decisions, and our Extended Analysis. All numbers in this blog post are from the Extended Analysis. The original paper will be presented at CHI 2022. This work was co-authored with Hao-Fei Cheng, Anna Kawakami, Venkatesh Sivaraman, Yanghuidi Cheng, Diana Qing, Adam Perer, Kenneth Holstein, Steven Wu, and Haiyi Zhu. This work was funded by the National Science Foundation. Also see our concurrent work, Improving Human-AI Partnerships in Child Welfare: Understanding Worker Practices, Challenges, and Desires for Algorithmic Decision Support. We recognize all 48,071 children, and their families, whose data we analyzed in our paper and for whom this data reflects potentially consequential interactions with CYF.