Posts Tagged “C/C++”

I’ve been dredging up my C++ for a class recently, and I’m struck by just how weird it feels now that I spend most of my time with Ruby.

I was all proud of myself for remembering how to write a copy constructor. Then I ran into a situation like this:

MyClass a = foo;
MyClass c;
c = foo;

The first line was fine; the last segfaulted. What the heck?

I had hit upon the subtle difference between assignment at construction time and normal assignment. The former calls the copy constructor, the latter calls operator=.

MyClass a = foo;  // calls copy constructor MyClass(foo)
MyClass c;
c = foo;  // calls operator=

I had neglected to provide an operator= for MyClass, so the compiler invented one for me. Since MyClass contained pointers to other structures, that naturally led to problems pretty quickly.

Had I not known to look up the specific behavior of operator=, and then implement a correct one for MyClass, I would have been really confused. This sort of subtlety is what makes me think of C++ as a “hard” language and Ruby as an “easy” language.

To be sure, Ruby has its subtle quirks too, but they occur less frequently and usually around “advanced” topics like metaprogramming. In C++, even a fundamental operation like assignment can have strange, unpredictable properties.

Comments 1 Comment »

Perhaps I was premature worrying about how slow Ruby is. John Wiseman was benchmarking Montezuma, his Common Lisp port of Ferret/Lucene, and found out in the process that Ferret is 10 times faster than Java Lucene! As he says, Ferret gets help from about 65,000 lines of C code.

I’ve heard this before, perhaps not often enough to make a generalization, but at least enough to identify a trend: if you want performance from Ruby code, rewrite it in C. (The same is sometimes said of Python, or really any interpreted language.) The basic approach seems to be to extract the most performance-critical parts of your dynamic, interpreted language program and rewrite them in a static, compiled language, thus retaining most of the benefits of both.

It’s an interesting contrast to what I see as the Common Lisp approach to optimization, which is to keep everything in Lisp but add compiler declarations in hopes of speeding it up. Trouble is, unless you’re an expert on the inner workings of your compiler (or can read the disassembled code) it’s hard to know exactly what effects a particular declaration will have.

Eventually, I think manual optimization will become unnecessary. Experimental compilers like Stalin have been shown to produce faster machine code than hand-coded C. Stalin compiles a subset of Scheme down to a subset of C, making heavy use of type-inferencing and static analysis. If it can be done with Scheme, surely it can be done with Python, Ruby, or any other dynamic language.

Comments No Comments »

This is just an random idea that popped into my head. Tell me if I’m crazy.

Is there enough distinction between literals in code and values generated at runtime? In other words, what should be the difference between this:

x = "Hello, World!"

and this (syntax made up):

x = new String("Hello, World!")

In general, I understand literal values in code to be immutable, or at least it’s undefined what happens if you modify them. So should two identical literals be pointers to the same block of memory? Does this C code:

char x[] = "Hello!";
char y[] = "Hello!";
if (x == y) printf("True");
else printf("False");

print True? (I tried it, and the answer is no, it doesn’t.)

This gets more confusing when you can have object literals as in JavaScript:

myFriend = {
  name : "Bob",
  age : 32,
  friends : ["Jack", "Jill", "Joe"],
  talk : function(){alert("Hi, I'm "+myFriend.name)}
};

Is myFriend mutable? As it turns out, in JavaScript, the answer is yes. For an interpreted language embedded in web pages that makes sense. But what about a compiled language — is efficiency lost when values could be compiled in but aren’t because the compiler doesn’t know the value never changes? I suppose that’s the reason for all those annoying const declarations in C++, but is that a complete solution?

So my question is this: would it be worthwhile to have specific syntax for compile-time contants? C has preprocessor constants, but they’re text-based and don’t carry type information.

Comments No Comments »

Or, How the Lisp-n Shall Inherit the Earth

Humans like to name things. Like ourselves, Homo sapiens, Latin for “Primate that has taken leave of its senses.”

Then there are engineers. Engineers like to name things too. Like SCSI, pronounced “scuzzy.” Or WYSIWYG, pronounced “wizzy-wig.” Or TTY, pronounced (I couldn’t believe this at first) “titty.”

Then there are programmers, who are singularly uncreative when it comes to naming things. With Apache’s Gregor Samsa class as a brilliant exception, programmers tend to give their variables and functions dull, predictable names like string, number, or do_not_use_this_variable_ever.

Of course, in most programming languages a given name can refer to one thing and one thing only for ever and ever until your program crashes and dumps core Amen. To get around this sad limitation, programming boldly took advantage of the eighth-century’s greatest typographical innovation, lower case letters. Thus we have the Dada-esque beauty of case-sensitive code:

Widget* WIDGET = new Widget.widget();

As languages got more complex, adding first-class functions and first-class types, there were more and more things that needed to be called by the same name. Few programming languages picked up the idea of meaning based on context. So hierarchical namespaces were born:

java.awt.event.ActionEvent myEvent = new java.awt.event.ActionEvent();

My tendinitis starts acting up at the mere sight of that.

Common Lisp to the rescue! As many have pointed out, CL is not just a Lisp-2 (functions and variables in separate namespaces), but more of a Lisp-n. It has many overlapping namespaces — functions, variables, type specifiers, block names, tagbody tags. With a few macros and a hash table, one can easily add a new namespace to the language. For example, testing frameworks like FiveAM and LispUnit put test names in a separate namespace. A programmer using these libraries doesn’t have to think about the extra namespace. Just type symbols and Lisp will do the right thing.

I like to think of this approach as parallel namespaces. Everything is kept neatly separated, without endless qualifiers attached to every symbol. Macros determine how symbols will be evaluated — or not — in a block of code.

This isn’t perfect. For one thing, there’s no canonical way to add a new namespace to a language. Some macros require names to be quoted symbols, some not. Some use keywords. Sometimes an unquoted symbol will pollute the namespace of the current package; sometimes it won’t.

The solution, to my mind, would be something that lets us do this:


(define-namespace foo)
(defmacro-with-namespaces my-macro ((arg1 foo))
. . . )

Then my-macro would always treat its first argument as a symbol which is not evaluated, internally binding arg1 to whatever value the given symbol holds in the foo namespace. I’m not going to attempt to implement such a beast just now, but I’m sure an experienced macro-writer (i.e. someone else) could dash it off in half an hour.

Comments 3 Comments »

Ah, the loop, so fundamental to programming it’s hard to imagine a single program without one. After all, what’s the use of calculating just one thing? Usually you have a big pile of things you want to calculate, which is why you need a computer in the first place.

I think one of the quickest ways to get a feel for a language is to study its looping constructs. I make no pretense that this is a complete or even an accurate list, but these are some of the general iteration patterns I’ve noticed in different languages.

Counter Variable Loop

for ( int i = 0; i <= 10; ++i ) {
  do stuff with list[i];
}

The old C classic, with deeper roots, I believe, in FORTRAN or PASCAL or both. Successive values of an integer counter are used to retrieve input from an array. Very efficient for small data sets, but requires the entire input to be stored as an array in memory. Also requires the looping code to know about the structure of the input, so not very adaptable. But surprisingly resiliant: Perl and Java support the same syntax.

For-Each Loop

foreach item in list
   do stuff with item
done

Probably the most popular syntactic looping construct, and easy to see why: it’s very easy to read and understand. The for-each loop shows up in Perl, Python, Visual Basic, and a host of other languages. Because it’s usually built in to the language syntax, it can rarely be extended to non-standard container types.

Container Method Loop

list.each do |item|
   do stuff with item
end

In purely object-oriented languages like Smalltalk and Ruby (shown above), looping constructs can be implemented as methods of container classes. This has the great advantages that new looping constructs can be added and standard loops can be implemented for new container types. Since the code “inside” the loop is just an anonymous function that takes a single item as its argument, it doesn’t need to know anything about the structure or type of the container.

List Comprehensions

[ item.do_stuff() for item in list ]

Although Python (syntax above) has gotten a lot of press, both good and bad, for its adoption of list comprehensions, they’ve been around a lot longer. I believe they were originally developed to describe lists in a way that looks more like mathematics. For simple patterns list comprehensions are easy to understand, but I don’t yet grok their full significance. They can be nested and combined to produce complex looping patterns that would be awkward to write with C-style iteration.

Half-Nelson Functional Loop

(map  (lambda (item)  do stuff with item)  list)

(Update 26 July 2006: Replaced the idiotic example (map (lambda (item) (process item)) list) with the above. I’m talking about block structures in code here, using the body of the lambda like the body of a for loop in the other languages. Obviously, if your loop body was just a single function, you would just use (map #'process list). Same changes apply to the next example below.)

Functional languages, including most dialects of Lisp, usually have a map operator that takes a function and a list and applies that function successively to each element of that list. I call this the Half-Nelson Functional Loop because it’s not, to my mind, the ultimate of functional behavior. For that, we turn to…

Full-Nelson Functional Loop

((lift (lambda (item)  do stuff with item))  list)

This looks pretty much like the previous example. But here, instead of map, we have lift, which takes only one argument, a function, and returns a new function that applies that function to every element of a list. That new function is then applied (here, in a Scheme-like syntax) to the list argument. I learned about lift from this blog article.

Comments 6 Comments »