Are Bots Ravaging Online Encyclopedias?
By abbynewcomb
Wikipedia is the online encyclopedia that anyone can edit. However, you probably didn’t know that “bots” (software tools) also edit Wikipedia! Human editors (“Wikipedians”) work together with bots to keep Wikipedia up to date. But bots’ edits can conflict with each other; some have even written about bot editing wars “raging on Wikipedia’s pages”! Is this true? If it were really happening, bot conflict would be a big deal: bots automatically enforce the encyclopedia’s rules, so identifying bot conflict can help Wikipedians refine editing processes. However, previous researchers have used overly simple approaches to quantify conflict, ignoring the context of specific bot edits. To understand what bots were actually doing, we conducted a qualitative analysis of the context in which bots make edits. We found no evidence of bot conflict, though we did find some malfunctioning bots.
We are Abby Newcomb (St. Olaf College) and Sokona Mangane (Bates College), participants in the University of Minnesota’s 2021 Computer Science REU. In this blog post, we’re going to talk about the editing patterns of four bots on Wikipedia: AvicBot, Cyberbot I, RonBot, and AnomieBOT. Using these bots as a case study, we will show that what may appear to be conflict is routine and expected when examined in context.
How do bots edit Wikipedia?
Bots are automated or semi-automated software agents programmed to carry out various tasks. According to the Wikipedia page Wikipedia:Bots, bots carry out tasks that are “repetitive and mundane” in order to maintain the encyclopedia. Bots adhere to the Wikipedia bot policy and are approved by human editors in the Bot Approvals Group before they are allowed to edit any Wikipedia pages. Most bots do not make edits on actual encyclopedia articles, but take care of housekeeping tasks necessary to keep the encyclopedia running.
Each time a user (human or bot) makes a change to a Wikipedia page, other users have the option to undo that change. An edit undoing or reversing the edits of another user, partially or completely, is called a revert. This type of interaction between users is interesting because the revert could indicate conflict: a disagreement over what’s included on a page. However, the original edit could have been a mistake, the original edit could be a temporary one meant to be reverted later, or Wikipedia practices could have changed and rendered the original edit unnecessary. Determining if a revert actually indicates conflict can be difficult.1
Bot-bot reverts (when a bot edit is reverted by another bot) are common on Wikipedia. Many routine bot processes require bots to revert each other’s edits.2 However, the research paper “Even Good Bots Fight” by Tsvetkova et al. treats reverts as a strong indicator of conflict. Their study concluded that the many cases of bots reverting each other indicate Wikipedia’s lack of control over its bots, and it led to media coverage of raging bot wars. Geiger and Halfaker’s replication study critiqued the assumption that bot-bot reverts necessarily indicate bot conflict. Through a mixed-methods approach, they argue that bot-bot reverts are primarily routine work, with the vast majority of bots acting as intended and in collaboration with each other. We build on their research by inspecting how four of the most prolific bots use reverts to interact with both bot and human editors. Studying these bots and their reverted edits, we likewise found that reverts didn’t indicate conflict, underscoring the importance of examining reverts in context.
To conduct our analysis, we examined edits made to English Wikipedia during the first three weeks of January 2019. We looked at a random sample of 10 edits for each bot, using our conclusions and other summary figures to choose samples for further analysis.3 In our study, we considered a revert to be any edit that causes the page to be an exact match of a previous version of the page within 24 hours. Thus, we only considered reverts that completely remove the content of the original edit. By this definition, multiple edits can be undone by a single revert; in fact, 71% of reverts undo multiple edits at once. We define a self-revert as a revert where the original edit and the reverting edit were made by the same user. In the next four sections, we’ll dive into four bots as case studies to understand whether their reverts indicate conflict.
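To make the revert definition concrete, here is a minimal sketch of how it can be operationalized. This is an illustration, not our actual analysis code: the `(timestamp, user, sha1)` revision tuples and the `find_reverts` function name are hypothetical. An edit counts as a revert when its page content hash exactly matches an earlier revision from within the last 24 hours, and as a self-revert when every undone edit was made by the reverting user:

```python
from datetime import datetime, timedelta

def find_reverts(revisions, window=timedelta(hours=24)):
    """Detect identity reverts in a page history.

    revisions: list of (timestamp, user, sha1) tuples in chronological order,
    where sha1 is a hash of the full page content after the edit.
    Returns (reverting_index, reverted_to_index, is_self_revert) tuples.
    """
    reverts = []
    for i, (ts, user, sha1) in enumerate(revisions):
        # Scan backwards for an earlier revision with identical content.
        # Start at i - 2 so at least one intervening edit is undone
        # (j == i - 1 would just be a null edit, not a revert).
        for j in range(i - 2, -1, -1):
            ts_j, user_j, sha1_j = revisions[j]
            if ts - ts_j > window:
                break  # outside the 24-hour window; stop scanning
            if sha1_j == sha1:
                # Every edit strictly between j and i was undone at once;
                # it's a self-revert if all of them were by the reverter.
                undone_users = {revisions[k][1] for k in range(j + 1, i)}
                reverts.append((i, j, undone_users == {user}))
                break
    return reverts
```

Note how this captures both properties from our definition: a single revert can undo several edits at once, and a bot like AvicBot restoring an earlier version of its own tracking list registers as a self-revert.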
AvicBot
One of the bots that we looked at extensively is AvicBot, which is among the top 5 self-reverters. AvicBot is run by the user Avicennasis and has been operating since 2011.4 The bot has not made an edit since June 26, 2020 and was officially marked inactive on April 20, 2021. AvicBot performed a total of 11 tasks, including maintaining interwiki links, fixing redirects, tagging certain pages, and maintaining lists of certain categories. Based on our analysis, AvicBot appears to be primarily a “listifying” bot because it maintains several tracking categories.5
AvicBot reverts itself while doing its routine listifying work. For example, when we look at this revision and its corresponding revert, the edit summary indicates that the revision is creating a list from a category of pages flagged for deletion. The original edit added another user to the page; 45 minutes later, after several intervening edits, the reverting revision removed 4 entries, including the one added by the original edit. As we can see here, several of AvicBot’s edits were undone by a single revert. The bot is self-reverting to periodically update the category it’s tracking!
As many of AvicBot’s tasks involve maintaining tracking categories, we can infer that most of AvicBot’s edits probably look similar to the ones we examined. All of AvicBot’s reverted edits are self-reverts, because it’s constantly updating several categories and reverting its own revisions. Thus, we concluded that the vast majority of AvicBot’s self-reverts are routine work and do not indicate conflict.
Cyberbot I
Cyberbot I has the highest percentage of reverted edits of the 4 bots, with 13% of its edits reverted. Cyberbot has been running since April of 20126 and remains active as of July 2021. The bot is maintained and operated by Wikipedia user Cyberpower678.7 Based on our data, Cyberbot I’s primary task appears to be updating various tables of statistics to keep Wikipedians informed of activity on the encyclopedia.8
97% of Cyberbot I’s reverted edits are self-reverts: why? To find out, we selected 20 self-reverted edits at random. 18 of those edits were on the Cyberpower678/Tally page, which Cyberbot I appears to use to keep track of the current number of votes in RfA and RfB discussions, though the purpose of the page is not clearly stated. These edits appear as reverts because the bot repeatedly deletes its own content from the page, then adds the same content again. The other 2 edits in the self-reverted sample were on the RfX Report page, where Cyberbot I likewise deletes its own content only to add it again seconds later in order to update the page. These edits are not problematic: the bot is doing its job and functioning as intended. Not all of Cyberbot I’s reverted edits are self-reverts: 26 edits were reverted by humans. The vast majority of these did not seem problematic and were often related to Cyberbot’s Sandbox cleaning task.9
Out of all Cyberbot I’s edits, we found 7 that are possibly problematic, either as malfunctions of the bot or disagreements over what the bot should be doing. The first problematic edit occurred when Cyberbot I updated a user’s admin stats but the edit count went down instead of up, which should be impossible. This edit appeared to go unnoticed for the 24 hours it was visible. Cyberbot I also added an Articles for Deletion template to an article whose discussion had already closed, which a human reverted an hour later. The bot also deleted all content on the Changing username/Simple page twice, and a human reverted it within about 2 hours each time. In a problematic sequence of 6 edits on the Template:RfX tally page, human users attempted to change the page and were reverted by the bot each time; they reverted Cyberbot I in turn until a human moved the page to circumvent the bot. Overall, these represent the only instances of potential conflict we identified involving Cyberbot I; the vast majority of its edits are productive contributions.
RonBot
RonBot was frequently reverted by humans in our sample, coming in 3rd place on the list of bots most reverted by humans, with 429 of its edits reverted. It was run by Ronhjones. The bot was recently deactivated and retired due to the passing of its operator. We wanted to understand why RonBot was reverted by humans so often.
When we looked at 20 random edits reverted by humans, we found that 90% of them appeared to be caused by a malfunction of the bot. It appears that RonBot was adding articles to the “American footballers with no declared position” maintenance category10. However, users noticed that most of these footballers already had a position category listed in their articles, so these edits were reverted. We can see from the figure below that RonBot made the most edits on January 7, 87% of which were reverted; 20% of those reverted edits were reverted by humans. Based on our qualitative analysis, it’s clear that RonBot was malfunctioning.
When a bot malfunctions, non-admin users can report it on this page. Users did report RonBot’s malfunction, requesting that it be blocked or deactivated temporarily until the issue was resolved. About 2-3 days later, Ronhjones appeared to be working on repairing the bot. When we looked at another random sample of RonBot’s edits, the bot was removing this category11 from multiple pages (a number of which had been added on January 7th or months before), fixing the edits made during its malfunction on January 8th, 16th, and 19th. As we can see on the graph, after January 7th these three dates have the highest number of total edits per day. These edits weren’t counted as self-reverts because RonBot reverted them more than 24 hours after the original edits, as one can see from the graph. In this special case, reverts were crucial to identifying a malfunctioning bot, and they indicated malfunction rather than conflict. We suggest that a clearer reporting mechanism may be useful for improving bot governance.
AnomieBOT
AnomieBOT12 ranked 3rd on the list of bots frequently reverted by humans, so for this bot we primarily looked at article edits reverted by a human in order to understand the cause of these reverts. AnomieBOT has been editing Wikipedia since 2008 and is still active as of July 2021, operated by the user Anomie.
In a sample of 20 edits in the article namespace that were reverted by humans, we came across examples of 3 distinct tasks, as defined by the bot’s edit summary: dating maintenance tags,13 rescuing orphaned references,14 and fixing reference errors.15 None of these tasks are controversial in any way, since they are all routine maintenance. So why are these edits being reverted?
We consider at least 75% of these reverts to be caused by human conflict or human error. The sequence of events often starts with a human making an edit that others don’t like, perhaps adding incorrect or poorly formatted information. The bot then does its job and tries to help this first human by fixing some of the reference errors or adding a date to a maintenance tag. Later, a second human editor comes along and sees the mess created by the first human, and chooses to revert that human’s edit along with AnomieBOT’s edit. The fault lies with the first human: the bot’s edit is irrelevant to the human’s decision to revert. Thus, AnomieBOT is frequently reverted along with other edits made by humans because a human made a controversial edit and AnomieBOT was just caught in the crossfire of a human disagreement.
In the article namespace, 93% of AnomieBOT’s edits reverted by humans were reverted at the same time as a human edit. Because of this statistic and our qualitative observations, we believe that AnomieBOT is frequently reverted not because of conflict between the bot and humans, but because of conflict between humans and other humans. AnomieBOT is not in conflict with any other users.
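The “caught in the crossfire” pattern can be checked mechanically under our identity-revert definition (an edit that restores an exact earlier version of the page). The helper below is a hypothetical sketch, not our actual analysis code; it assumes revisions are `(timestamp, user, content_hash)` tuples in page-history order, and tests whether a revert that undid a bot’s edit also undid at least one other user’s edit at the same time:

```python
def swept_up_with_humans(revisions, reverted_to_idx, revert_idx, bot_name):
    """True if the revert undid edits by bot_name AND by at least one other user.

    revisions: list of (timestamp, user, content_hash) tuples in history order.
    The revert at revert_idx restored the content of reverted_to_idx, so every
    edit strictly between those two indices was undone together.
    """
    undone_users = {revisions[k][1]
                    for k in range(reverted_to_idx + 1, revert_idx)}
    return bot_name in undone_users and len(undone_users) > 1
```

A check along these lines is what the 93% figure summarizes: almost every reverted AnomieBOT article edit was undone together with a human’s edit, which points at the human edit, not the bot’s, as the target of the revert.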
Conclusion
In this blog post, we showed that reverts generally don’t indicate bot conflict. AvicBot and Cyberbot I both reveal that routine operation can involve self-reverting. RonBot was malfunctioning, which most people wouldn’t consider to be conflict. AnomieBOT reveals that just because a bot is being reverted doesn’t mean it’s involved in conflict; it may just be getting in the way of two human editors’ conflicts! Our research suggests that people attempting to quantify bot conflict need to develop more sophisticated methods than just counting reverts.
This research would not have been possible without the help of our mentors: Professor Loren Terveen and soon-to-be-PhD Zachary Levonian in the GroupLens Lab at the University of Minnesota. This work was also presented at the UMN Virtual Poster Symposium. Code for this work is available on GitHub.
Footnotes
1. When reverting another user, most human editors will leave an edit summary to indicate why they made the change. An edit summary is a short explanation of an edit that is visible in the article’s edit history, shown in the top panel of the image below. Bots are also required by the bot policy to leave descriptive edit summaries, but their edit summaries are pre-programmed in their code. Thus, if a bot is malfunctioning, its edit summary may not match what it’s actually doing.
2. For example, DatBot reports users who break community guidelines. Meanwhile, HBC AIV helperbot5 checks if reported users have been blocked, and if they are, the bot removes the entry. Hence, reverting DatBot’s edits is a part of HBC AIV helperbot5’s job.
3. For more information on our sample, we have provided summary information here.
4. For more info, check out its timeline of Requests for Approval.
5. A tracking category is used to make and maintain lists of pages. For a more in-depth description of AvicBot’s tasks, check out its user page.
6. For more information on Cyberbot’s approvals, look at the Cyberbot Timeline of Requests for Approval.
7. Cyberpower678 also operates a second bot, Cyberbot II. It appears that all of the code and tasks of Cyberbot I belonged to other bots previously, based on the bot’s approvals and user page, though the operator has rewritten and maintained the code.
8. For example, Cyberbot I updates a separate Adminstats page for any user who requests these statistics. Other examples of tables maintained by the bot are the RfX Report page, which tracks any current discussions about requests for adminship or bureaucrat status, and the Requests for Unblock table, which keeps track of Wikipedians who would like to be unblocked from editing. In addition to maintaining these statistics pages, Cyberbot I clears various sandbox pages, which provide a space for users to experiment with editing tools without damaging Wikipedia articles; maintains several discussion pages, including Articles for Deletion and Changing username/Simple; and creates the current events page featured on the Wikipedia Main Page every day. Many more tasks are listed on Cyberbot I’s user page.
9. The Sandbox functions as a sort of whiteboard, where Wikipedians can test out their editing skills as they wish. Later, a bot will come wipe the whiteboard clean so that the next person arrives to a clean editing space. Some Wikipedians like to keep their content in the Sandbox for a while, though, so they will revert Cyberbot to restore their content.
10. A maintenance category is used so that Wikipedia contributors know that a given article needs some kind of maintenance. These categories are not visible on the article page, but must be included in the source code of each article. The “American footballers with no declared position” category is presumably used so that contributors will come to the article and add it to a position category, such as “Association football forwards.”
11. The “American footballers with no declared position” category.
12. This bot operates out of 5 different accounts, each with various privileges and edit spaces, but our analysis focused on the main AnomieBOT account.
13. Also called maintenance templates, these tags allow editors to leave messages for others about problems with a given article. The tags can be dated so that editors know how long they have been on the page. An old tag may be deleted if it becomes out of date or no longer relevant to the article. AnomieBOT adds dates to these maintenance tags so that humans know when they were added to the article.
14. The Wikitext language used to write Wikipedia articles requires the use of a reference template in order to cite information. References can be named so that they can be used multiple times in a document without having to copy the source information each time. An orphaned reference is a reference that has a name but no accompanying reference information in the article. AnomieBOT attempts to recover information about the reference from the page history and add it back to the article.
15. Similar to orphaned references, reference errors are often caused by issues with the reference template required by Wikitext. When a human makes a reference error, AnomieBOT recognizes it and attempts to fix it so that the article has fewer problems.