Return to Basics in Computer Science

By on

Joe Konstan pointed me to this interesting debate around an essay by Dewar and Schonberg about the teaching of computer science.  Many of these essays are shallow: the authors have a particular style of programming that they think will solve all problems, and they advocate that all of us ought to spend more time teaching and learning their style.  Dewar and Schonberg’s presentation is much deeper, perhaps because of their experience both teaching and practicing computer science.

The basic three arguments in their article are:

1) Computer scientists should learn about many different language
paradigms (described in detail in the essay), because that gives them a
deep understanding of the approaches that might best solve any
particular problem.

2) Computer scientists should learn more about formal models and how they can be used in software construction.  In particular, computer scientists should learn more of the appropriate math.

3) Computer scientists should learn the discipline from the ground up.  The early use of a very high-level language such as Python — or even Java — means that students develop a too high-level understanding of programming, without the details that are sometimes crucial.

I think the authors are absolutely correct on (1), and they do a good job of presenting their arguments.  I’ll say no more on this issue, since their article does such a good job.

I think the issue of formal models is a trickier one.  I certainly agree with their assertion that it would be valuable for all computer scientists to understand how formal models can be part of the solution for achieving very high reliability.  On the other hand, most computer scientists will never work on such systems.  How much of their time should be spent studying such systems?  My bias would be to create a path for those students who find such work interesting, but to require only the basics for all students.

Finally, I think the authors are dead-wrong on the idea of teaching students to program from the hardware up.  I understand the temptation: that’s how we learned, and we’re all awfully good at what we do, so it must be the best way to learn.  But, this argument misses the most important skill for a computer scientist: effective abstraction.  The current approach of beginning with high-level languages starts students on the path to understanding the really deep issues of our discipline, rather than spending this precious formative time on problems only a few of them will face in practice.  The authors argue that high-level languages are much easier to outsource than lower-level thinking.  They have it exactly backwards: the most challenging problems in our discipline today — and the most difficult to outsource — are mapping from user needs to concrete requirements that are implementable.  A student who knows what a high-level language can do, along with the power of its attendant libraries, is much better prepared for this sort of work than a student who has spent years learning about machine architecture and machine language.

I agree with the authors that Java is the wrong language for the first course, but for nearly the opposite reason.  Java is a difficult language to learn because it requires that so many details be understood before interesting program can be built.  In particular, Java suffers with opaque syntax for simple things — like the basic list, and dictionary data structures — and the lack of a lambda expression to make higher-order programming accessible.  Beginning students would be much better off with a language like Python, which gives them the tools to explore both modern imperative programming and functional programming.

John

Powered by ScribeFire.

Diebold and New Hampshire

By on

Lots of folk online wondering about whether Diebold messed up the election results in New Hampshire.  For the record, I think that’s unlikely — but the key issue is that the use of closed source voting machines creates a dangerous lack of transparency in one of the key processes of democracy.  We should get rid of these machines until we have a more reliable way to guarantee they’re doing what they promise to do.

Meanwhile, this page is one of the more sober analyses of what happened in New Hampshire.  Bottom line: though hand counted results show Obama doing much better than machine-counted results, demographics seems a more likely cause of the difference than cheating.

John

Powered by ScribeFire.

Philosophy of Science if Physics is a Simulation

By on

There is an unusually fun discussion on Slashdot about a not very good paper that suggests that physicists should explore the question of whether the universe is a simulation by a sufficiently advanced civilization.  The argument is an old one: the basic premise is that if a civilization ever advances far enough that they are able to simulate a universe as complicated as ours, they’ll probably simulate it zillions of times, which implies there are more simulations than universes, which implies we’re likely in a simulation.  The argument from probability seems silly to me: starting with a premise that we have no way to analyze probabilistically (“ever advances far enough …”) and ending with a probabilistic argument is an awkward path, but …

The basic question is interesting from a philosophy of science perspective.  If we are in a simulation, can we ever know?  If so, can we affect the simulation?  There are good reasons to believe we could not learn if we’re in a simulation, because all of our a posteriori knowledge would be of the simulated universe.  How could we design an experiment that would measure something the is by definition outside of the things we can measure?  The paper cited on Slashdot suggests that we might seek information theoretic limitations of our current universe.  I’m not sure a discovery one way or the other would convince anyone: if there are limitations on the information processing ability of our universe, that might just be the physics of our universe, rather than evidence of an underlying computer a lot like our current computers.  If there are no such limits (that we can find), they might just be very hard to find — or the universe might be running on a very different type of computer, with different limits than those we are used to.

In any case, we’re probably best off continuing to work away at figuring out what the rules of our universe are.  If it is a simulation, it’s a pretty good one, and we probably can’t prevent the cosmic Ctrl-Alt-Del anyway.

Of course, we might be able to induce a Ctrl-Alt-Del if we want to.  Perhaps there’s a bug in the simulation software that we could exploit to cause the blue screen of death on a grand scale.  (Some people have suggested that nuclear weapons are such a bug, on a planetary scale.)

I’m reminded of an old story about an early disk drive.  The drive was large, nearly as tall as a person, and much wider.  The story is that the disk read/write assembly was massive enough that if the software “seeked” from side to side of the drive rhythmically, the drive could be made to tip slightly from side to side, and eventually to “walk” across the floor.  Perhaps a goal of physics should be to see if we can tip a cosmic disk drive over, to see what would happen.

John

Powered by ScribeFire.

How Does Wikipedia Deal with Saboteurs?

By on

There was an interesting article in Fast Company’s April 2007 about Jimmy Wales new search venture, Wikia.  (If you haven’t heard: he’s hoping to do for search what Wikipedia does for content, by using armies of volunteers to sift through search results.) 

One quote from the article stuck out to me: “The way Wikipedia deals with saboteurs is to change them, not to
crush them,” says Tapscott. “They find something good about them and
embrace it. This attitude is what you need to make this work.” 
Is that really true?  Does Wikipedia really convert saboteurs into converts, who add value to the encyclopedia?  Seeing the quote from Tapscott in print helped me realize that I’ve had an unconscious assumption that saboteurs are mostly persistent bad guys, who may be discouraged, but not improved.  One challenge to the traditional post hoc data analysis to try to figure out whether saboteurs are converting is that saboteurs are more likely than regular wikipedia contributors to contribute anonymously.  How can this weakness in the data be overcome?

John

Powered by ScribeFire.

Genuinely Usable Java: Principles

By on

I’m proposing a new programming language, called Guava, for Genuinely Usable Java.  Guava would be very like Java, but would be designed for usability by learners, not for safety in the hands of experts.  This post is to suggest some ideas about the motivation for Guava, and to lay out some of the principles that would guide its development. 

I’ve been talking to friends for many years about the problems with using Java as a language for teaching programming. Java is a difficult language, that requires the programmer to keep in mind many details simultaneously in order to produce successful programs.  I just finished teaching our CS2 course, which teaches Java and data structures.  While the students loved learning a "real" programming language, we all found it frustrating dealing with the many aspects of Java that make it a very difficult language for beginners.

There are many languages that are much nicer as a first programming language. Scheme has the advantages of grace, elegance, and one of the best books ever written about computer science. Python has the advantage of simplicity, and real-world power. C has the "advantage" of being close to the machine. (I am among those who believe that if the fundamental power of computer science is abstraction, we ought to begin with some deep abstractions!) However, most programmers are likely to eventually end up writing programs in Java, and all three of these languages suffer as beginning languages for someone whose trajectory is toward Java. (It is appropriate to continue the argument about whether Java is the "right" language for most programs to be written in. Certainly Java is neither elegant enough nor extensible enough that we should believe it will survive longer than the usual ten or twenty years of dominance. But: we academics need to recognize that we don’t get to make that choice. Java is what our students will have to program in, at least for the first part of their careers.)

Scheme is awkward as a first language for eventual Java programmers because the functional approach in Scheme does not prepare students well for the imperative style of Java. We may all love the style, but taking a detour before starting down the imperative path is an ivory tower exercise. Python is much more similar to Java — but has several unusual syntax choices that get in the way of easily migrating to Java. Further, once students have learned Python, they’re going to be frustrated with the strictures of Java programming. Better would be a path that would let them write much of their program in a language at the same level as Python, while having access to full-featured Java for those parts of their program that benefit from the rigor.

Those of you who have explored Groovy are already thinking that you have the answer. Groovy goes a long way to solve these problems. It runs on the Java VM, and smoothly interacts with Java programs. Classes can be written in either Groovy or Java and interact smoothly with classes written in the other language. But, Groovy makes several bizarre syntax choices that will make the transition to Java more difficult for students — such as freedom from semicolons, except of course when they’re needed — and has put many of us off with frustrating run-time errors caused by unnecessarily complicated language semantics. Groovy may well eventually be the solution, but for now I find it a frustrating waypoint on the path to a truly usable dialect of Java.

Of course, there are many other languages available for the Java VM. One I particularly like is Scala, which has many elegant language features from modern language theory. But, Scala is not really a language that wants to provide a graceful transition to Java. Scala is more useful as a demonstration of how powerful and elegant features from Ocaml could be brought to Java.

I propose a new dialect of Java, called Guava, for Genuinely Usable Java. (Yes, Guava isn’t really an acronym. Unless you think it’s kind of cool that it drops the J in Java, as a sign that usability often means removing features instead of adding them. (There is already a language called Guava, discussed in a 2000 SigPLAN paper, but I don’t see more recent articles on it, so I think we should take over the name.) (While we’re going crazy on parens: I’m not sure whether Guava will really qualify as a "dialect" of Java, since it will in many ways be a very different language.  In particular, Guava will allow functions outside of a class, code outside of any function, and higher-order functions (functions as arguments to other functions), none of which Java allows.  However, Guava will be in a very deep way Java-like: Guava language syntax will map directly to Java syntax so learnes can easily move back and forth between the two languages.)

Note that criticising the usability of existing languages for teaching beginners is not the same as criticising the languages directly.  It may well be that Java has made the right trade-offs for expert programmers who are willing to devote years to developing mastery.  It is possible that Guava will only ever be used by people learning to program, or by people writing very simple programs, such as are now written in scripting languages.  Perhaps all important Guava programs will eventually be migrated to true Java.  That’s okay!  Guava is intended to be a simpler language usable for non-experts, with a direct path to deployment in full Java. 

There are many careful decisions to be made on the path to Guava.  These decisions will all be based on three core principles that form The Guava Manifesto:

1) Guava will be compatible with Java.  Guava syntax will be like Java syntax.  Guava will use exactly the Java object hierarchy, including Java Strings, Arrays, Lists, and Maps.  Guava classes and Java classes will be completely interoperable.

2) Guava will support functional programming.  Programmers will be able to pass functions as arguments without having to know how to create anonymous inner classes.

3) Guava will simplify the Java type system, transparently. Guava programs will be strongly typed.  The difference between primitive and "boxed" types will be invisible to the programmer.  Types that are developed in Guava will automatically be able to be compared and stored in maps with the expected semantics.  

Of course, these principles don’t answer all — or even most — questions about how the language ought to work.  For instance, should Groovy provide operator syntax for commonly used operations, like List or Map operations?  On the one hand, beginners would benefit from simpler syntax than Java provides.  On the other hand, directly using the Java "everything looks like a method call" style would provide a smoother transition to Java. 

I’m not certain I’ve articulated the best set of principles yet.  What do you think?

John