Groundhog Day, Usability Testing, and Creativity

There’s a lovely article about the movie Groundhog Day. The article touches on many issues in the movie, but ends with this wonderful quote:

A/B testing is like sandpaper. You can use it to smooth out details, but you can’t actually create anything with it.

This thought reminds me of Don Norman’s comment that one of the risks for the field of CHI is that we become so focused on analysis that we never actually create anything new.

John

On Critical Mass

I finally got around to carefully reading “A Theory of the Critical Mass…” by Oliver, Marwell, and Teixeira. Now I’m asking: what took me so long?

The article formalizes the notion of critical mass in collective action. It identifies two main independent variables that can influence the “probability, extent, and effectiveness of group actions in pursuit of collective goods”:

  • The form of the “production function” that relates “contributions of resources to the level of the collective good”. Two important categories of production functions are: (a) decelerating: the “first few units of resources contributed have the biggest effect on the collective good, and subsequent contributions progressively less”; (b) accelerating: “successive contributions generate progressively larger payoffs; therefore, each contribution makes the next one more likely.”
  • The “heterogeneity of interests and resources” in the population of potentially interested actors.
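
To make the two shapes concrete, here is a toy illustration (my own example functions, not from the paper): write P(r) for the level of the collective good produced by r units of contributed resources.

```latex
% Toy production functions (illustrative only, not from Oliver et al.).
% r = units of resources contributed; P(r) = level of the collective good.
P_{\text{dec}}(r) = \sqrt{r}, \qquad P'_{\text{dec}}(r) = \tfrac{1}{2\sqrt{r}}
% Decelerating: marginal returns shrink as r grows, so the first
% contributions have the biggest effect.
P_{\text{acc}}(r) = r^{2}, \qquad P'_{\text{acc}}(r) = 2r
% Accelerating: marginal returns grow with r, so each contribution
% raises the payoff of the next one.
```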

The authors then show that the problems and opportunities for collective action are very different for accelerating vs. decelerating production functions and for homogeneous vs. heterogeneous populations of actors. I’m not going to summarize the findings: the paper is a joy to read, so I mostly want to urge you to read it yourself.

However, there were a couple of ideas that I found particularly relevant to issues in open content systems that I care about, so I did want to mention them.

First, this work looks at critical mass in “public” goods, where all the value is created by a group of people. This is true for many open content systems: Wikipedia and OpenStreetMap are two good examples. However, this isn’t true of other systems, including our Cyclopath bicycle routing system. Cyclopath began with a nearly complete transportation map created from Mn/DOT data and with a good objective route-finding algorithm that did not require user input. While we have shown that user input improves route-finding significantly and that algorithms based on user input are better than purely objective algorithms, I think it’s fair to say that most of the value of the Cyclopath “good” already was present before any user contributions were made. It’s interesting to consider how the concepts of this paper can be applied to a system like Cyclopath.

Second, Oliver et al. show that with decelerating production functions, the optimal outcome would be achieved if the *least* interested people contribute first and the *most* interested people contribute later. This obviously isn’t the way it usually works. They point out that one way to make this happen is for the most interested parties to “hold back”; perhaps they can offer “matching contributions” to entice less interested parties to contribute early in the process. This might suggest new strategies, in the spirit of intelligent task routing, for eliciting participation in open content communities.

Third, many of the illustrative examples the authors give concern the different opportunities for collective action in “upper middle class” vs. “lower income” neighborhoods. I wonder: what’s the equivalent of an “upper middle class” open content system?

Fourth, the notion of “interest” presumed here is one of direct tangible personal benefit: if I give N dollars, I’m increasing the chances that I’ll receive M dollars (M >> N). However, we know that many contributors to open content systems (and many ‘volunteers’, too) contribute for other types of reasons, e.g., they “believe” in the public good, they are altruistic, or they want to build a reputation. For example, in Cyclopath, our most active editors don’t request many routes. For another example, other researchers have shown that there are many users in discussion forums who just answer questions and don’t ask any of their own.

Fifth, finally, and simply, I’d like to empirically measure the production function in various open content systems. I suspect that in many cases it is decelerating: i.e., early units of contribution are proportionally more valuable. I’d also like to measure this for individual users. Doing this calculation requires a way to measure the global quality of an open content system as well as the quality for a particular user. We can do both of these for Cyclopath. We can do the latter for MovieLens… not sure about the former.

Netbeans + Subversion + Windows XP

For my teaching this semester I’ve been using NetBeans, which has been wonderful. Overall, NetBeans has been an even better experience than Eclipse for teaching, though both have a steeper learning curve than I’d prefer.

I’ve enjoyed NetBeans’ built-in Subversion support. (This is not a differentiator with Eclipse, just a comment.) However, getting Subversion working reliably with NetBeans on a Windows box is a bit fiddly, and the online documentation makes it seem easier than it is. It’s easiest to break the setup into steps, and get each step working before moving on to the next. (Part of what makes the documentation a bit complicated is that there are many alternatives. I’m just going to describe one simple alternative that assumes you have a shell account on the Unix computer that contains your Subversion repository.) Here are the steps:

1. Get plink (from PuTTY) working on your box.  Plink will be used by the CollabNet Subversion client to tunnel svn+ssh connections.  First install the full PuTTY from the web site.  Then create an SSH key for PuTTY (generate one with ssh-keygen and convert it to .ppk format with PuTTYgen, or generate it directly in PuTTYgen), store it in a safe place on your Windows computer, and install the public key in the authorized_keys file on your Unix server.  Then test with:

./PLINK.EXE -v -l <username> -i c:/path/to/key/file/id_rsa_putty.ppk <remote-host>

The result should be an ssh session to your remote host.  (plink is not a good client to actually use for ssh; prefer PuTTY. But this is a simple test that it’s working.)  (I’m using forward slashes in the above because I run it in Cygwin shells.  You’ll need backslashes if you run it in the traditional Windows command console.)
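
If you don’t already have a key pair, the key-generation half of step 1 can be sketched like this (the file name id_rsa_demo and the paths are examples only; run this in a Cygwin shell):

```shell
# Generate an RSA key pair with an empty passphrase (-N "") into ./id_rsa_demo.
# The name and location are examples; keep real keys somewhere safe.
ssh-keygen -t rsa -N "" -f ./id_rsa_demo

# Append the public half to ~/.ssh/authorized_keys on the Unix server, e.g.:
#   cat id_rsa_demo.pub | ssh <username>@<remote-host> 'cat >> ~/.ssh/authorized_keys'

# plink cannot read OpenSSH private keys directly; load id_rsa_demo into
# PuTTYgen and use "Save private key" to produce the .ppk file plink needs.
ls id_rsa_demo id_rsa_demo.pub
```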

2. Install CollabNet’s Subversion Client.  They have a simple installer.

3. Look in your Application Data directory for the Subversion subdirectory.  (You may have to run the Subversion client once to cause this directory to be created.)  Edit the config file in that directory.  Look for the section called “[tunnels]”. In that section, after all the comments, add a line:

ssh = c:/Program Files/putty-0.60/plink.exe -v -l <username> -i c:/path/to/keyfile/id_rsa_putty.ppk

Here you use forward slashes, because the Subversion client will translate them.  Change the path to plink.exe to wherever you put plink. Adding this line to the config file tells the Subversion client what command to use for svn+ssh:// URLs.
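
Putting step 3 together, the relevant part of the config file would look roughly like this (the plink path and key path are the examples from above; adjust them for your machine):

```
[tunnels]
# Tell Subversion how to open svn+ssh:// connections:
ssh = c:/Program Files/putty-0.60/plink.exe -v -l <username> -i c:/path/to/keyfile/id_rsa_putty.ppk
```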

4. Test the subversion client from the command line with:

./svn ls svn+ssh://<remote-host>/path/to/remote/svn-repo

If this works, you have a working Subversion client on Windows, which is 80% of the battle!

5. In NetBeans go to Tools/Options/Miscellaneous/Versioning and set the Path to the SVN Client to:

C:\Program Files\CollabNet\Subversion Client

(or wherever you installed Subversion).

6. Right-click on a directory and you should be able to use the Subversion Update and Commit commands!

Occasionally, when things get tricky, the NetBeans client gets confused.  I just use the command-line client to do an svn update, and all is usually well after that.

One issue to watch out for: Subversion is very sensitive to version changes.  The working copy (the checked-out version) will be updated by the Subversion client to the format that version of the client expects.  So if you use both a NetBeans client and a command-line client, you should make sure they’re the same “point” version.  (E.g., they should both be 1.6.x, though the x can differ.)
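
A quick way to compare the two clients’ point versions (the version strings below are hypothetical; in practice get them from `svn --version --quiet` for the command-line client and from the NetBeans Versioning options for the other):

```shell
# Hypothetical version strings; substitute the real ones from your two clients.
cli_ver="1.6.12"   # e.g., from: svn --version --quiet
ide_ver="1.6.5"    # e.g., from the NetBeans Subversion options panel

# Strip the final ".x" component, keeping "major.minor".
cli_mm="${cli_ver%.*}"
ide_mm="${ide_ver%.*}"

if [ "$cli_mm" = "$ide_mm" ]; then
  echo "point versions match: safe to share a working copy"
else
  echo "point versions differ: working copy formats may conflict"
fi
```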

Good luck!

John

An Exciting Time for Cyclopath

One of the premier research platforms around here is Cyclopath, a geowiki and route-finding service for Twin Cities bicyclists.

Now, we’d been expecting Google’s announcement that they were getting into the bicycle routing business for some time. But that doesn’t mean yesterday was relaxed for us. 🙂

After sleeping on it (and speaking for myself), I think this development is actually either neutral or good. We’re in a different niche than Google: we’re focused on open content and community, not just maps, and we’re strongly local, with personal connections to the cycling community and local agencies. And on the plus side: almost all of the reactions from the community that I saw on the social web were very supportive of us, and I’ve never seen so much passion at Cyclopath Headquarters as I did yesterday!

We’ll continue to write and publish consistent with our excellent track record (e.g., of the 5 papers we’ve submitted to top-tier conferences, 4 have been accepted on the first try and 2 have been nominated for Best Paper).

Details on what Google’s announcement means for Cyclopath, from the user perspective, are here.

Lastly, and off-topic, please follow @grouplens and @cyclopath_hq on Twitter!

Datasets and availability

Occasionally, GroupLens receives requests for datasets that we possess. In many cases we are able to provide the data, as we have with the MovieLens rating datasets. One of the data collections we have is a 10% sample of Wikipedia page requests (essentially every 10th HTTP request), collected since April 2007. This data accumulates at a rate of about 5 GB/day, and we currently have around 4 TB of unprocessed compressed data, approximately 40 TB when uncompressed. While we sometimes get requests for this data, its sheer size makes it difficult for us to make it available for download.

Although we cannot make this data available for download, depending on your request and our availability, we may be able to collaborate with you by performing the analysis you need on our data.

Also, we are not the only ones who have Wikipedia page-view data. There are several other sources with data on page views. Here are some of these resources and the type of data they have available:

  • stats.grok.se – Provides data on per-page view counts by month.
  • dammit.lt/wikistats – Has files containing hourly per-page view count snapshots, with archives that currently go back to October 2009.
  • Wikipedia Page Traffic Statistics on AWS – Hourly traffic statistics for a 7 month window (October 2008 – April 2009) are available on Amazon Web Services. This data was assembled from files that were available from dammit.lt/wikistats at the time.