An Exciting Time for Cyclopath


One of the premier research platforms around here is Cyclopath, a geowiki and route-finding service for Twin Cities bicyclists.

Now, we’ve expected Google’s announcement that they were getting into the bicycle routing business for some time. But that doesn’t mean yesterday was relaxed for us. 🙂

After sleeping on it (and speaking for myself), I think this development is actually either neutral or good. We’re in a different niche than Google — we’re focused on open content and community, not just maps, and we’re strongly local, with personal connections to the cycling community and local agencies. And on the plus side: almost all of the reactions from the community I saw on the social web were very supportive of us, and I’ve never seen so much passion at Cyclopath Headquarters as I did yesterday!

We’ll continue to write and publish consistent with our excellent track record (e.g., of the 5 papers we’ve submitted to top-tier conferences, 4 have been accepted on the first try and 2 have been nominated for Best Paper).

Details on what Google’s announcement means for Cyclopath, from the user perspective, are here.

Lastly, and off-topic, please follow @grouplens and @cyclopath_hq on Twitter!

Datasets and availability


Occasionally, GroupLens receives requests for datasets that we possess. In many cases we are able to provide this data, as we have with the MovieLens rating datasets. One of the data collections we have is a 10% sample of Wikipedia page requests (essentially every 10th HTTP request), collected continuously since April 2007. This data accumulates at a rate of about 5 GB/day, and we currently have around 4 TB of unprocessed compressed data, which is approximately 40 TB uncompressed. While we sometimes get requests for this data, its sheer size makes it difficult for us to make it available for download.
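To make the “sheer size” point concrete, here is a back-of-envelope sketch (my own assumption: a sustained 100 Mbit/s download link, which is optimistic for most requesters):

```python
# Rough transfer-time estimate for the compressed dataset described above.
DATASET_BYTES = 4e12       # ~4 TB of compressed data (figure from the post)
LINK_BITS_PER_SEC = 100e6  # assumed sustained 100 Mbit/s link

seconds = DATASET_BYTES * 8 / LINK_BITS_PER_SEC
print(f"~{seconds / 86400:.1f} days of continuous transfer")  # ~3.7 days
```

And while that multi-day transfer runs, another ~5 GB of new data arrives each day, so the download never quite catches up.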

Although we cannot make this data available for download, depending on your request and our availability, we may be able to collaborate with you by performing the analysis you need on our data.

Also, we are not the only ones who have view data of Wikipedia. There are several other sources that have data on page views. Here are some of these resources and the type of data that they have available:

  • stats.grok.se – Provides data on per-page view counts by month.
  • dammit.lt/wikistats – Has files containing hourly per-page view count snapshots, with archives that currently go back to October 2009.
  • Wikipedia Page Traffic Statistics on AWS – Hourly traffic statistics for a 7 month window (October 2008 – April 2009) are available on Amazon Web Services. This data was assembled from files that were available from dammit.lt/wikistats at the time.
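If you do grab the hourly snapshot files from dammit.lt/wikistats, they are plain text with one page per line; to the best of my understanding the fields are project, page title, view count, and bytes served (the sample lines below are made up for illustration). A minimal tallying sketch:

```python
from collections import Counter

def top_pages(lines, project="en", n=3):
    """Tally per-page view counts from wikistats-style lines.

    Each line is assumed to look like: project page_title view_count bytes_served
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip malformed lines
        proj, title, views, _bytes = parts
        if proj == project:
            counts[title] += int(views)
    return counts.most_common(n)

# Hypothetical sample data; real files hold one hour of traffic.
sample = [
    "en Main_Page 242332 4737756101",
    "en Special:Search 1204 9888221",
    "de Wikipedia 5231 83021442",
    "en Main_Page 10 190000",
]
print(top_pages(sample))
# [('Main_Page', 242342), ('Special:Search', 1204)]
```

For real use you would stream a gzipped hourly file line by line rather than load it into memory; one hour alone can hold millions of distinct titles.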

Getting Good Grades


I just read an article called “Why study time does not predict grade point average across college students” by Plant, Ericsson, Hill, and Asberg.  The article is an interesting look at past data on what predicts GPA, plus a small-scale (88-student) study at one university.  The authors are big fans of the “deliberate practice” model of learning, and focus on whether it translates into academic performance.  Some of the interesting findings (mostly from past studies):

  • studying without distraction predicts higher grades (no TV, no iPod, no study partners)
  • students who study without distraction study for *fewer* hours, but get *higher* grades
  • focused study is important. Just as many recreational tennis and golf players don’t improve even after 20 years of play, merely “reading” isn’t enough. Deep thinking, analysis, and putting ideas together correlate with better grades.
  • scheduling is important.  Planning ahead for getting school activities done, and studying at regularly scheduled times correlate with higher GPA.
  • going to class predicts higher GPA
  • working too many hours, and partying too many hours both predict lower GPA

Overall there weren’t a lot of big surprises, but I did find it interesting how important focused, uninterrupted study is.  In fact, the total amount of study time did NOT predict good grades.  A shorter amount of more focused study was more valuable.  (Students tended to have to go to the library to get the more focused study time.)

What works for you?

John

Don’t bore me


Doctors: do no harm.

Authors: keep the reader turning the page.

Speakers: keep the listener, uh, listening.

The title of this post and the third aphorism represent the sine qua non for a successful research talk (or any kind of public speech). Once the audience stops listening, you, the speaker, might just as well stop speaking.

I’ve been thinking about this ever since the CSCW conference last week. I saw quite a few talks on subjects I’m interested in, with good research, good content in the presentation, and good – i.e., fluent – delivery. I was engaged by the content in many cases and asked a lot of questions.

However, in reflecting on my experience, many of the talks began to seem, hmmmm…, monotonous. The speakers didn’t look animated. They didn’t use much of a dynamic range in their speaking: they weren’t loud sometimes and quiet others, fast sometimes and slow others. There weren’t too many jokes (shout out to Cliff and Reid, two speakers who did joke a bit). The slides too were pretty homogeneous: none that shouted “I’m important – notice me!”.

Again, the content was good – it wouldn’t have gotten in otherwise!

But speakers, lively up yourselves! It’ll keep your audiences’ ears open, so that your great content will get in. (And please: if you do a lively presentation with poor content or poor organization or poor slides, it’ll just seem … poor.)

Survey writing woes


I’ve been spending a lot of time lately thinking about survey writing. In 2008, I took a short three-day course with Jon Krosnick of Stanford University, which made me think about survey writing in a new way. In particular, I started realizing that the surveys I wrote were poorly written.

Since then, it seems like I keep finding poorly written surveys everywhere I turn. Here are some examples I’ve found recently in my everyday life:

The US Postal Service recently sent me a Postal Customer Questionnaire because they were thinking about closing my branch. “If you now receive Post Office box service, you will be able to transfer your remaining box rent credit to another post office, or you may be eligible to receive a partial refund. How would you feel about consolidating the Dinkytown station with other postal stations? Better, Just as Good, No Opinion, or Worse” I answered No Opinion. Then I crossed it out and marked Worse. Then I crossed that out, marked No Opinion again, and wrote a three-sentence explanation in the “Please explain” section. Why was this such a hard question to answer? Primarily because they never asked how I’d feel about the consolidation WITHOUT the refund. The question merges my opinion about the consolidation with my feelings about the refund. Personally, I was mad that they were consolidating, and I’d feel cheated if they didn’t refund my money, but the refund wouldn’t change my opinion at all. Nowhere on the survey did they ask me anything to this effect.

This second example isn’t exactly a survey, but it illustrates some of the same problems. I’m having problems with allergies and need to see an allergy specialist, so the clinic sent me my paperwork to fill out before my appointment. Leaving aside many of my other complaints (and there are many!), the first main page has a section entitled “Chief complaints of patient.” For each option you are supposed to check “Yes” or “No.” The options are Asthma, Rhinitis (Hay fever), Urticaria (Hives), Eczema, Sinusitis, Chronic recurrent bronchitis, Nasal polyps, Recurrent otitis media, Recurrent pneumonia, G.I. disturbances (colic, diarrhea, etc.), Insect sting reaction, Drug reactions, or blank lines. Now, I’m a pretty smart person. I’ve been in school for a grand total of twenty-one years, but I can’t tell you what many of those things are, and I can’t tell you which ones I should select. I have a runny nose and a cough. I’ve been diagnosed with something, but I forget what it is, and the diagnosis covered only the runny nose, not the cough. Why on earth is this questionnaire, which is obviously meant for the patient or a patient advocate, full of doctor jargon instead of patient jargon?

Now that I know better, I want to do my best to avoid writing bad survey questions, but at the same time, it’s incredibly difficult to write good ones. So what I’ve been doing is writing my same old, same old questions and then revising…and revising…and revising. Turning them into good questions isn’t easy, but I try. I also ask for a lot of feedback and am very self-critical. One proofreading pass doesn’t cut it for a survey, even one going out to only 10 people. A sloppy survey would reflect poorly on me, my advisor, my lab, and my university…so I do more work. Hopefully if you take one of my surveys, you’ll see the result of this work, and if not, I hope you’ll take a moment to let me know.