December | 2007 | GroupLens

How Does Wikipedia Deal with Saboteurs?

By riedl on December 31, 2007

There was an interesting article in Fast Company’s April 2007 issue about Jimmy Wales’s new search venture, Wikia. (If you haven’t heard: he’s hoping to do for search what Wikipedia does for content, by using armies of volunteers to sift through search results.)

One quote from the article stuck out to me: “The way Wikipedia deals with saboteurs is to change them, not to crush them,” says Tapscott. “They find something good about them and embrace it. This attitude is what you need to make this work.”

Is that really true? Does Wikipedia really turn saboteurs into converts who add value to the encyclopedia? Seeing the quote from Tapscott in print helped me realize that I’ve had an unconscious assumption that saboteurs are mostly persistent bad guys, who may be discouraged, but not improved. One challenge for the traditional post hoc data analysis of whether saboteurs are converting is that saboteurs are more likely than regular Wikipedia contributors to contribute anonymously, which makes it hard to link a saboteur’s early edits to anything the same person does later. How can this weakness in the data be overcome?

John

Genuinely Usable Java: Principles

By riedl on December 22, 2007

I’m proposing a new programming language, called Guava, for Genuinely Usable Java. Guava would be very like Java, but would be designed for usability by learners, not for safety in the hands of experts. This post is to suggest some ideas about the motivation for Guava, and to lay out some of the principles that would guide its development.

I’ve been talking to friends for many years about the problems with using Java as a language for teaching programming. Java is a difficult language that requires the programmer to keep many details in mind simultaneously in order to produce successful programs. I just finished teaching our CS2 course, which teaches Java and data structures. While the students loved learning a "real" programming language, we all found it frustrating to deal with the many aspects of Java that make it a very difficult language for beginners.

There are many languages that are much nicer as a first programming language. Scheme has the advantages of grace, elegance, and one of the best books ever written about computer science. Python has the advantage of simplicity, and real-world power. C has the "advantage" of being close to the machine. (I am among those who believe that if the fundamental power of computer science is abstraction, we ought to begin with some deep abstractions!) However, most programmers are likely to eventually end up writing programs in Java, and all three of these languages suffer as beginning languages for someone whose trajectory is toward Java. (It is appropriate to continue the argument about whether Java is the "right" language for most programs to be written in. Certainly Java is neither elegant enough nor extensible enough that we should believe it will survive longer than the usual ten or twenty years of dominance. But: we academics need to recognize that we don’t get to make that choice. Java is what our students will have to program in, at least for the first part of their careers.)

Scheme is awkward as a first language for eventual Java programmers because the functional approach in Scheme does not prepare students well for the imperative style of Java. We may all love the style, but taking a detour before starting down the imperative path is an ivory tower exercise. Python is much more similar to Java — but has several unusual syntax choices that get in the way of easily migrating to Java. Further, once students have learned Python, they’re going to be frustrated with the strictures of Java programming. Better would be a path that would let them write much of their program in a language at the same level as Python, while having access to full-featured Java for those parts of their program that benefit from the rigor.

Those of you who have explored Groovy are already thinking that you have the answer. Groovy goes a long way to solve these problems. It runs on the Java VM, and smoothly interacts with Java programs. Classes can be written in either Groovy or Java and interact smoothly with classes written in the other language. But, Groovy makes several bizarre syntax choices that will make the transition to Java more difficult for students — such as freedom from semicolons, except of course when they’re needed — and has put many of us off with frustrating run-time errors caused by unnecessarily complicated language semantics. Groovy may well eventually be the solution, but for now I find it a frustrating waypoint on the path to a truly usable dialect of Java.

Of course, there are many other languages available for the Java VM. One I particularly like is Scala, which has many elegant language features from modern language theory. But, Scala is not really a language that wants to provide a graceful transition to Java. Scala is more useful as a demonstration of how powerful and elegant features from OCaml could be brought to Java.

I propose a new dialect of Java, called Guava, for Genuinely Usable Java. (Yes, Guava isn’t really an acronym. Unless you think it’s kind of cool that it drops the J in Java, as a sign that usability often means removing features instead of adding them.) (There is already a language called Guava, discussed in a 2000 SIGPLAN paper, but I don’t see more recent articles on it, so I think we should take over the name.) (While we’re going crazy on parens: I’m not sure whether Guava will really qualify as a "dialect" of Java, since it will in many ways be a very different language. In particular, Guava will allow functions outside of a class, code outside of any function, and higher-order functions (functions as arguments to other functions), none of which Java allows. However, Guava will be in a very deep way Java-like: Guava language syntax will map directly to Java syntax so learners can easily move back and forth between the two languages.)
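For reference, here is a small sketch of the ceremony that plain Java asks of a beginner before a single line can run: a class to hold the code, a method to hold the statements, and an entry point with a fixed signature. The class name `Greeter` and the method `greeting` are made up for illustration.

```java
// Hypothetical class name; Java requires some class to hold the code.
class Greeter {
    // Even a one-line helper has to be a method of a class.
    static String greeting(String name) {
        return "Hello, " + name + "!";
    }

    // Statements cannot sit at the top level of a file; they go inside
    // a method, and the program starts at a method with this signature.
    public static void main(String[] args) {
        System.out.println(greeting("world"));
    }
}
```

A learner has to type `class`, `static`, `public`, `void`, and `String[] args` before any of those words can be explained, which is the kind of overhead that functions and code outside of a class would remove.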

Note that criticising the usability of existing languages for teaching beginners is not the same as criticising the languages directly. It may well be that Java has made the right trade-offs for expert programmers who are willing to devote years to developing mastery. It is possible that Guava will only ever be used by people learning to program, or by people writing very simple programs, such as are now written in scripting languages. Perhaps all important Guava programs will eventually be migrated to true Java. That’s okay! Guava is intended to be a simpler language usable for non-experts, with a direct path to deployment in full Java.

There are many careful decisions to be made on the path to Guava. These decisions will all be based on three core principles that form The Guava Manifesto:

1) Guava will be compatible with Java. Guava syntax will be like Java syntax. Guava will use exactly the Java object hierarchy, including Java Strings, Arrays, Lists, and Maps. Guava classes and Java classes will be completely interoperable.

2) Guava will support functional programming. Programmers will be able to pass functions as arguments without having to know how to create anonymous inner classes.

3) Guava will simplify the Java type system, transparently. Guava programs will be strongly typed. The difference between primitive and "boxed" types will be invisible to the programmer. Types that are developed in Guava will automatically be able to be compared and stored in maps with the expected semantics.
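To make principles 2 and 3 concrete, here is a sketch of what plain Java asks of the programmer today. This is ordinary Java, not Guava. The names `IntTransform`, `Point`, and `ManifestoSketch` are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical one-method interface standing in for "a function from int to int".
interface IntTransform {
    int apply(int x);
}

// Hypothetical value type that we want to use as a map key.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Without these two overrides, two Points with the same coordinates
    // would count as different keys in a HashMap.
    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point)) {
            return false;
        }
        Point p = (Point) other;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return 31 * x + y;
    }
}

class ManifestoSketch {
    // Applies f to every element and returns the results as a new list.
    static List<Integer> map(List<Integer> input, IntTransform f) {
        List<Integer> result = new ArrayList<Integer>();
        for (int value : input) {        // Integer is unboxed to int here
            result.add(f.apply(value));  // int is boxed back to Integer here
        }
        return result;
    }

    static List<Integer> doubled(List<Integer> input) {
        // Passing "a function" means writing an anonymous inner class.
        return map(input, new IntTransform() {
            public int apply(int x) {
                return 2 * x;
            }
        });
    }

    public static void main(String[] args) {
        List<Integer> numbers = new ArrayList<Integer>();
        numbers.add(1);
        numbers.add(2);
        numbers.add(3);
        System.out.println(doubled(numbers));

        Map<Point, String> labels = new HashMap<Point, String>();
        labels.put(new Point(1, 2), "start");
        // The lookup succeeds only because Point overrides equals and hashCode.
        System.out.println(labels.get(new Point(1, 2)));

        // Boxed values should be compared with equals; == compares references.
        Integer a = 1000;
        Integer b = 1000;
        System.out.println(a.equals(b));
    }
}
```

The anonymous inner class, the hand-written `equals` and `hashCode`, and the `equals`-versus-`==` distinction for boxed numbers are the details that principles 2 and 3 would take off the learner's plate.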

Of course, these principles don’t answer all, or even most, questions about how the language ought to work. For instance, should Guava provide operator syntax for commonly used operations, like List or Map operations? On the one hand, beginners would benefit from simpler syntax than Java provides. On the other hand, directly using the Java "everything looks like a method call" style would provide a smoother transition to Java.
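As a rough illustration of that trade-off, here is the method-call style in plain Java. The operator forms in the comments are purely hypothetical, written only to show what "simpler syntax" might mean; they are not a proposed design. The class name `MethodCallStyle` is made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MethodCallStyle {
    // Builds a small list and map using only Java's method-call style.
    static int demo() {
        List<String> names = new ArrayList<String>();
        names.add("Ann");                      // hypothetical operator form: names += "Ann"
        names.add("Bob");
        String first = names.get(0);           // hypothetical operator form: names[0]

        Map<String, Integer> ages = new HashMap<String, Integer>();
        ages.put(first, 30);                   // hypothetical operator form: ages[first] = 30
        ages.put(first, ages.get(first) + 1);  // hypothetical operator form: ages[first] += 1
        return ages.get(first);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The method calls are wordier, but every one of them is exactly what the student will later write in Java, which is the argument for keeping them.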

I’m not certain I’ve articulated the best set of principles yet. What do you think?

John

Surveillance Without Warrants

By riedl on December 20, 2007

The New York Times has an important article about the NSA’s efforts to have full access to the telecommunications infrastructure within the US. There are parts of this question that are difficult: should the NSA, with a warrant, be guaranteed the technical capability to wiretap anyone through the phone system switches? As a privacy nut, I’m skeptical even of this claim, but at least it seems a difficult question.

The easier question is: what if they want to do the wiretapping without a warrant, or with a warrant that is non-specific? To me, it should be clear that this ought to be illegal, and that everyone involved ought to be liable, from the government officials to the private companies that were complicit.

What do you think?
John

Underwhelmed by Google’s Knol

By reid on December 15, 2007

A few days ago, as explained by Udi Manber, Google announced a new service, called Knol, which seems to have approximately the same goal as Wikipedia: to create a more or less comprehensive repository of useful knowledge. Because Google is the super-juggernaut du jour, there is a lot of speculation that Knol will be a Wikipedia killer.

I disagree (not a unique point of view). Frankly, I don’t find the Knol idea all that interesting, and if it wasn’t Google proposing it, I don’t think anyone would have noticed. The basic difference from the Wikipedia collective-authorship approach is that articles are "owned" by a single person. Others may rate, suggest content, etc., but the owner is the sole arbiter of what the article contains.

Here’s why I think Knol is uninteresting in 2007:

No microcontributions. It’s impossible to make a tiny contribution (e.g. fixing a typo). Sure, you can suggest that the typo should be fixed. But there’s a lot of value in the immediate gratification: people like to see that the article is better right away due to their (tiny) efforts. In aggregate, microcontributions have lots of value in and of themselves, but they are also a good way to lead people to making macrocontributions.
No effort at consensus. It is left to the reader to make sense of the several competing articles on a particular topic. One of the huge benefits of Wikipedia’s approach is that this onerous task is more or less done for you.
Single point of failure in article maintenance. If an author loses interest in an article, it’s difficult or impossible for others to take over and continue work.

These problems are orthogonal to whether Google is able to successfully incentivize authors with money or recognition (things Wikipedia can’t do).

I do agree with Manber that many people who have knowledge often don’t share it because sharing is hard. But, I do not think the right way to make it easier is to introduce a new Knol-style service. Rather, I think adapting the Wikipedia approach to be friendlier is much more promising, for example by implementing a WYSIWYG editor and making policy less byzantine.

(Particularly welcome in the comments are links to interesting analyses of Knol.)

The Issue: Hand-Selected Blog Posts

By riedl on December 14, 2007

Read/WriteWeb has an article about The Issue, an online collection of blog posts with several very interesting features:

1) Human-edited.
2) Issue-focused.
3) Neutral Point of View.

Read/WriteWeb sees the story as one about digging into the long tail. I’m not convinced by that: this technique of having human editors dig up the story seems fundamentally limiting. I wonder, however, if there is a mashup possible for The Issue and Slashdot that would let the masses dig up potential stories, and use a wisdom of crowds approach to get properties (2) and (3) above.

John
