An Exciting Time for Cyclopath


One of the premier research platforms around here is Cyclopath, a geowiki and route-finding service for Twin Cities bicyclists.

Now, we’ve expected Google’s announcement that they were getting into the bicycle routing business for some time. But that doesn’t mean yesterday was relaxed for us. 🙂

After sleeping on it (and speaking for myself), I think this development is actually either neutral or good. We’re in a different niche than Google — we’re focused on open content and community, not just maps, and we’re strongly local, with personal connections to the cycling community and local agencies. And on the plus side: almost all of the reactions from the community I saw on the social web were very supportive of us, and I’ve never seen so much passion at Cyclopath Headquarters as I did yesterday!

We’ll continue to write and publish consistent with our excellent track record (e.g., of the 5 papers we’ve submitted to top-tier conferences, 4 have been accepted on the first try and 2 have been nominated for Best Paper).

Details on what Google’s announcement means for Cyclopath, from the user perspective, are here.

Lastly, and off-topic, please follow @grouplens and @cyclopath_hq on Twitter!

Datasets and availability


Occasionally, GroupLens receives requests for datasets that we possess. In many cases we are able to provide this data, as we have with the MovieLens rating datasets. One of the data collections we have is a 10% sample of Wikipedia page requests (essentially every 10th HTTP request), collected continuously since April 2007. This data accumulates at a rate of about 5 GB/day, and we currently have around 4 TB of unprocessed compressed data, which is approximately 40 TB uncompressed. While we sometimes get requests for this data, its sheer size makes it difficult for us to make it available for download.
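To make the “sheer size” point concrete, here is a back-of-envelope sketch (my own assumption: a sustained 100 Mbit/s download link, which is optimistic for most requesters):

```python
# Rough transfer-time estimate for the compressed dataset described above.
DATASET_BYTES = 4e12       # ~4 TB of compressed data (figure from the post)
LINK_BITS_PER_SEC = 100e6  # assumed sustained 100 Mbit/s link

seconds = DATASET_BYTES * 8 / LINK_BITS_PER_SEC
print(f"~{seconds / 86400:.1f} days of continuous transfer")  # ~3.7 days
```

And while that multi-day transfer runs, another ~5 GB of new data arrives each day, so the download never quite catches up.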

Although we cannot make this data available for download, depending on your request and our availability, we may be able to collaborate with you by performing the analysis you need on our data.

Also, we are not the only ones who have view data of Wikipedia. There are several other sources that have data on page views. Here are some of these resources and the type of data that they have available:

  • stats.grok.se – Provides data on per-page view counts by month.
  • dammit.lt/wikistats – Has files containing hourly per-page view count snapshots, with archives that currently go back to October 2009.
  • Wikipedia Page Traffic Statistics on AWS – Hourly traffic statistics for a 7 month window (October 2008 – April 2009) are available on Amazon Web Services. This data was assembled from files that were available from dammit.lt/wikistats at the time.
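If you do grab the hourly snapshot files from dammit.lt/wikistats, they are plain text with one page per line; to the best of my understanding the fields are project, page title, view count, and bytes served (the sample lines below are made up for illustration). A minimal tallying sketch:

```python
from collections import Counter

def top_pages(lines, project="en", n=3):
    """Tally per-page view counts from wikistats-style lines.

    Each line is assumed to look like: project page_title view_count bytes_served
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        proj, title, views, _bytes = parts
        if proj == project:
            counts[title] += int(views)
    return counts.most_common(n)

# Hypothetical sample data; real files hold one hour of traffic.
sample = [
    "en Main_Page 242332 4737756101",
    "en Special:Search 1204 9888221",
    "de Wikipedia 5231 83021442",
    "en Main_Page 10 190000",
]
print(top_pages(sample))
# [('Main_Page', 242342), ('Special:Search', 1204)]
```

For real use you would stream a gzipped hourly file line by line rather than load it into memory; one hour alone can hold millions of distinct titles.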

Getting Good Grades


I just read an article called “Why study time does not predict grade point average across college students” by Plant, Ericsson, Hill, and Asberg.  The article is an interesting look at past data on what predicts GPA, plus a small-scale (88-student) study at one university.  The authors are big fans of the “deliberate practice” model of learning, and focus on whether it translates into academic performance.  Some of the interesting findings (mostly from past studies):

  • studying without distraction predicts higher grades (no TV, no iPod, no study partners)
  • students who study without distraction study for *fewer* hours, but get *higher* grades
  • focused study is important. Just as many recreational tennis and golf players don’t improve even after 20 years of play, merely “reading” isn’t enough. Deep thinking, analysis, and putting ideas together correlate with better grades.
  • scheduling is important.  Planning ahead for getting school activities done, and studying at regularly scheduled times correlate with higher GPA.
  • going to class predicts higher GPA
  • working too many hours, and partying too many hours both predict lower GPA

Overall there weren’t a lot of big surprises, but I did find it interesting how important focused, uninterrupted study is.  In fact, the total amount of study time did NOT predict good grades.  A shorter amount of more focused study was more valuable.  (Students tended to have to go to the library to get the more focused study time.)

What works for you?

John

Don’t bore me


Doctors: do no harm.

Authors: keep the reader turning the page.

Speakers: keep the listener, uh, listening.

The title of this post and the third aphorism represent the sine qua non for a successful research talk (or any kind of public speech). Once the audience stops listening, you, the speaker, might just as well stop speaking.

I’ve been thinking about this ever since the CSCW conference last week. I saw quite a few talks on subjects I’m interested in, with good research, good content in the presentation, and good – i.e., fluent – delivery. I was engaged by the content in many cases and asked a lot of questions.

However, in reflecting on my experience, many of the talks began to seem, hmmmm…, monotonous. The speakers didn’t look animated. They didn’t use much of a dynamic range in their speaking: they weren’t loud sometimes and quiet others, fast sometimes and slow others. There weren’t too many jokes (shout out to Cliff and Reid, two speakers who did joke a bit). The slides too were pretty homogeneous: none that shouted “I’m important – notice me!”.

Again, the content was good – it wouldn’t have gotten in otherwise!

But speakers, lively up yourselves! It’ll keep your audiences’ ears open, so that your great content will get in. (And please: if you do a lively presentation with poor content or poor organization or poor slides, it’ll just seem … poor.)

Survey writing woes


I’ve been spending a lot of time lately thinking about survey writing. In 2008, I took a short three-day course with Jon Krosnick of Stanford University, which made me think about survey writing in a new way. In particular, I started realizing that the surveys I wrote were poorly written.

Since then, it seems like I keep finding poorly written surveys everywhere I turn. Here are some examples I’ve found recently in my everyday life:

The US Postal Service recently sent me a Postal Customer Questionnaire because they were thinking about closing my branch. “If you now receive Post Office box service, you will be able to transfer your remaining box rent credit to another post office, or you may be eligible to receive a partial refund. How would you feel about consolidating the Dinkytown station with other postal stations? Better, Just as Good, No Opinion, or Worse” I answered No Opinion. Then I crossed it out and marked Worse. Then I crossed that out, marked No Opinion again, and wrote a three-sentence explanation in the “Please explain” section. Why was this such a hard question to answer? Primarily because they never asked how I’d feel about the consolidation WITHOUT the refund. The question merges my opinion about the consolidation with my feelings about the refund. Personally, I was mad that they were consolidating, and I’d feel cheated if they didn’t refund my money, but the refund wouldn’t change my opinion at all. Nowhere on the survey did they ask me anything to this effect.

This second example isn’t exactly a survey, but it illustrates some of the same problems. I’m having problems with allergies and need to see an allergy specialist, so the clinic sent me my paperwork to fill out before my appointment. Leaving aside many of my other complaints (and there are many!), the first main page has a section entitled “Chief complaints of patient.” For each option you are supposed to check “Yes” or “No.” The options are Asthma, Rhinitis (Hay fever), Urticaria (Hives), Eczema, Sinusitis, Chronic recurrent bronchitis, Nasal polyps, Recurrent otitis media, Recurrent pneumonia, G.I. disturbances (colic, diarrhea, etc.), Insect sting reaction, Drug reactions, or blank lines. Now, I’m a pretty smart person. I’ve been in school for a grand total of twenty-one years, but I can’t tell you what many of those things are, and I can’t tell you which ones I should select. I have a runny nose and a cough. I’ve been diagnosed with something, but I forget what it is, and the diagnosis covered only the runny nose, not the cough. Why on earth is this questionnaire, which is obviously meant for the patient or a patient advocate, full of doctor jargon instead of patient jargon?

Now that I know better, I want to do my best to avoid writing bad survey questions, but at the same time, it’s incredibly difficult to write good ones. So what I’ve been doing is writing my same old, same old questions and then revising…and revising…and revising. Turning them into good questions isn’t easy, but I try. I also ask for a lot of feedback and am very self-critical. One proofreading pass doesn’t cut it for a survey, even one going out to only 10 people. A sloppy survey would reflect poorly on me, my advisor, my lab, and my university…so I do more work. Hopefully if you take one of my surveys, you’ll see the result of this work, and if not, I hope you’ll take a moment to let me know.