Archive for the “Ruby” Category


Here’s a question that’s been bugging me for a while: what’s the best way to store information that is a mixture of highly- and loosely-structured data? For example, a collection of documents like Project Posner. Certain attributes of each document like the title, date, and citation fit easily into a normalized relational database model. But the body can only be described with some kind of markup.

I could just use HTML, except for one problem: my documents have to handle footnotes, for which HTML does not provide a tag. (As an aside, footnotes are a pain whether you’re doing web design or typesetting.)

On Project Posner, I compromised: everything is stored in a MySQL database, and the documents table has a “body” column that contains my own made-up XML syntax.

I could, in theory, normalize everything, even individual paragraphs. But that would be a nightmare to code and deadly slow. I could also store everything as XML documents. But then I’d have to reinvent all the facilities that MySQL (and ActiveRecord) provide, like transaction handling, auto-incrementing IDs, and so on.

For another project, I’m trying to create a pseudo-database that stores everything as XML files and uses Ferret for searching. I was going to use Ferret for full-text search anyway, so my original thought was to save overhead by not bothering with MySQL indexes. It works, but looking over it I realize that most of the data could be normalized to fit into the standard relational model. I’d still need a blob of XML data somewhere, but it could be in the database as easily as a file. What have I really gained, besides an impressively large and complex pile of code?


I used to be a big fan of Perl. It was the first programming language I really liked. I felt like it didn’t get in my way. CPAN was and still is the best collection of open-source libraries ever assembled.

Then I got into Ruby, and was very happy with the way it cleaned up Perl’s syntax but still didn’t get in my way. There aren’t nearly as many Gems as CPAN modules, but it’s a solid foundation.

Recently I went back to Perl for a small project. Aside from the predictable mistakes (I kept trying to put colons before my hash keys to make them like Ruby symbols), I realized how nice Ruby really is. Perl is definitely powerful, but once you get beyond the text munging it was designed for, it can get irritating, especially the ass-backwards way it does OOP. And I found myself missing Ruby’s code blocks. Perl has code blocks too, but understanding them takes a whole book.

I also realized why Ruby is unusual among semi-mainstream programming languages: unlike Perl, Python, JavaScript, or VB, it is built around a small set of core concepts (objects, methods, and code blocks). They’re used everywhere, even for control structures and fancy metaprogramming. This is good, because it reduces the number of concepts one has to think about to use the language and, paradoxically, makes it easier to add new features. Lisp started the same way, with its own core concepts: S-expressions and macros.
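As an illustration of blocks standing in for control structures, here is a minimal sketch. `retry_up_to` is a made-up name, not part of Ruby or Rails; the point is that the looping logic lives in an ordinary method that calls `yield`, so anyone can write a new control structure this way:

```ruby
# Hypothetical example: a home-made control structure. `retry_up_to` is an
# ordinary method, not new syntax; it runs the block and re-runs it after an
# exception until it succeeds or the attempts are used up.
def retry_up_to(attempts)
  tries = 0
  begin
    tries += 1
    yield tries # hand the attempt number to the caller's block
  rescue StandardError
    retry if tries < attempts # re-run the begin section
    raise # out of attempts: re-raise the last error
  end
end

result = retry_up_to(3) do |attempt|
  raise "not yet" if attempt < 3
  "succeeded on attempt #{attempt}"
end
puts result # prints: succeeded on attempt 3
```

To the caller, `retry_up_to(3) do ... end` reads like a built-in loop, even though it is just a method call with a block.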


I only started with Rails and Ruby in the past couple of months, but I’ve already benefited from them, so here’s my entry in the “How has Ruby on Rails made you a better programmer?” contest.

 

1. I finally get Model-View-Controller

I’ve seen MVC before, once long ago in the Microsoft Foundation Classes for C++, later for the web in Perl’s Maypole and Catalyst frameworks. But I never quite saw the point. Sure, it’s a nice idea, and I tried to separate my data from my views, but to make everything work I had to tie the models and views so tightly together that they could never be separated. The “controller” didn’t seem to have any role to play. It was just a shadowy background figure, perhaps a GUI engine or a web server, not something that I as a humble application programmer would ever implement. Code examples often omitted it entirely. Maybe those were bad examples, but they were what I had at the time.

I wrote my first real controller classes in Rails, where they suddenly made sense. The direct mapping of a URL to a method makes it obvious how the controller, not the view, is the real public interface to my whole application. It defines areas of operation and what actions the user is allowed to carry out in each area. So I think about what URLs I want to handle, and that tells me what my action methods should be. This leads me to think about my application in terms of its API, about what some other programmer accessing the application via HTTP might need.
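The URL-to-method mapping is simple enough to sketch in plain Ruby. This is a toy, assuming invented names (`DocumentsController`, `CONTROLLERS`, `dispatch`) and a fixed /controller/action/argument URL layout; it is not how Rails’ router is implemented:

```ruby
# Toy dispatcher, purely illustrative: "/documents/show/42" becomes
# DocumentsController#show("42"). Rails' real router is far more capable.
class DocumentsController
  def index
    "listing all documents"
  end

  def show(id)
    "showing document #{id}"
  end
end

# Only controllers listed here are reachable from a URL.
CONTROLLERS = { "documents" => DocumentsController }.freeze

def dispatch(path)
  controller_name, action, *args = path.split("/").reject(&:empty?)
  controller_class = CONTROLLERS.fetch(controller_name) do
    raise ArgumentError, "no controller for #{path}"
  end
  action ||= "index"
  # Only public methods defined on the controller class itself count as
  # actions, so a URL cannot reach inherited methods like instance_eval.
  unless controller_class.public_instance_methods(false).include?(action.to_sym)
    raise ArgumentError, "no action #{action} in #{controller_class}"
  end
  controller_class.new.public_send(action, *args)
end

puts dispatch("/documents/show/42") # prints: showing document 42
puts dispatch("/documents")         # prints: listing all documents
```

The list of public methods on the controller is, in effect, the application’s public API, which is the point made above.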

2. I finally get Object-Oriented Programming

No, I’m not kidding. Rails taught me Object-Oriented Programming. I first learned OOP in C++, not the greatest introduction. After spending one whole summer on a string-handling class, I hated it. I knew there was more out there: I played around with CLOS and read about Smalltalk.

But it was with Ruby that I said, “Oh, this is how OOP is supposed to work!” Being able to add new methods to built-in classes like Integer or String made me feel for the first time like OOP was helping me rather than getting in the way. Rails’ heavy use of this technique really opened my eyes to the possibility of thinking about numbers not as dumb literals but instead as intelligent entities that can tell me things, like the date 5.days.ago.
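Reopening a built-in class takes only a few lines. The sketch below is a deliberately simplified imitation of Rails’ `5.days.ago`, not ActiveSupport’s code: in Rails, `5.days` returns an `ActiveSupport::Duration`, whereas here it is just a plain count of seconds:

```ruby
# Simplified imitation of Rails' 5.days.ago, made by reopening the built-in
# Integer class. ActiveSupport's real methods return Duration objects and
# handle calendar details; this version only counts seconds.
class Integer
  # Treat the receiver as a number of days and convert it to seconds.
  def days
    self * 86_400
  end

  # Treat the receiver as seconds and count back from a given time.
  def ago(from = Time.now)
    from - self
  end
end

now = Time.now
puts 5.days                           # prints: 432000
puts 5.days.ago(now) == now - 432_000 # prints: true
```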

3. I write code for maximum legibility

I have read “source code is for people, not computers” often enough, but I didn’t follow the advice. I tried to think of the best way to represent a problem abstractly, in the domain of the computer, rather than the best way to represent it syntactically, for a human reader. Even worse, in the spirit of protecting my own carpals, I abbreviated everything. As a result, my code was unreadable even to me an hour after I wrote it. The problem is, I was thinking about the abstraction behind the code rather than the surface of what it actually said. If I couldn’t remember what I was thinking at the time I wrote a piece of code, it would be gibberish when I went back to it. Rails has shown me good examples of “writing on the surface.” Even simple practices like giving plural names to plural variables (arrays, tables) and using plain English names go a long way toward bringing my abstract thinking closer to the surface of what I write.
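A small, hypothetical before-and-after of the same method shows the difference. The names and the `line_items` data are invented for illustration (`Array#sum` with a block needs Ruby 2.4 or later):

```ruby
# Hypothetical before-and-after: both methods compute the same total.

# Before: terse names that only make sense to whoever just typed them.
def tot(a)
  s = 0
  a.each { |x| s += x[:price] * x[:quantity] }
  s
end

# After: a plural name for the collection, plain English for everything else.
def order_total(line_items)
  line_items.sum { |line_item| line_item[:price] * line_item[:quantity] }
end

line_items = [{ price: 3, quantity: 2 }, { price: 5, quantity: 1 }]
puts tot(line_items)         # prints: 11
puts order_total(line_items) # prints: 11
```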

My new goal is to write for maximum legibility, to write code that any competent programmer could read through once and immediately understand. I need to rewrite things a lot to achieve that, and I’m sure I fall short of the goal, but the extra effort is worth it for just being able to read my own code.

4. I learned how to deal with a database properly

Migrations were truly a revelation. They would have saved me a lot of headaches on past jobs, and now I would never attempt anything database-related without them.

5. My good habits are encouraged

One thing Rails didn’t have to sell me on is automated tests — I was already sold. But having them built into the framework is great validation of that belief. Now I feel guilty when I don’t have enough tests instead of guilty for spending valuable time writing them.

I’ve also learned the value of conventions, both having and following them. Rails’ conventions, particularly for naming, are flexible enough that I don’t feel a perverse desire to be different. So my code actually looks and behaves a lot like the documented examples!

Conclusion

I still have a lot to learn, both about programming and about Ruby on Rails, but I’m learning new things that make me a better programmer without taking the fun out of it. I’m enjoying programming again, the way I did when I wrote my first real GUI, my first Perl script, or even my first BASIC program on a Timex Sinclair 1000.


I like Lisp’s prefix syntax. It’s consistent, has natural structure, and makes code-manipulation macros possible. But it’s not always the easiest to read or write. For example, I often want to apply several successive transformations to the same chunk of text. In Perl, I could use the default variable $_ and then just write a bunch of regular expressions:

$_ = $string;   # load the text into the default variable
s/this/that/g;
s/old/new/g;
s/foo/bar/g;

Very succinct, if a tad cryptic. The equivalent in Common Lisp, using the CL-PPCRE regular expression library, is much worse:

(regex-replace-all "foo"
                   (regex-replace-all "old"
                                      (regex-replace-all "this" string "that")
                                      "new")
                   "bar")

CL-PPCRE’s regex-replace-all function puts the original string between the regex and the replacement string in its argument list, which makes the syntax awkward. I usually avoid writing nested expressions like the one above and instead factor each replacement out into a separate function:

(defun replace-foo (string)
  (regex-replace-all "foo" string "bar"))

(defun replace-old ...)

(defun replace-this ...)

(replace-foo (replace-old (replace-this string)))

But who wants to define three extra functions just for one expression?

Now I’m exploring Ruby, and I was pleased to find how easy it is to write this:

string.gsub('this','that').gsub('old','new').gsub('foo','bar')

This is succinct and reads easily from left to right. This sort of procedure is where the classic object.method(arguments) syntax really shines. At least for me, it makes sense because it’s how I tend to think about a problem: “Take this object, do this to it, then do something else to it, then give me back the result.”
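When the substitutions multiply, the same left-to-right idea can be driven by data instead of a longer chain. A minimal sketch; `REPLACEMENTS` and `apply_replacements` are hypothetical names, and the substitutions are applied in list order, just like the chained version:

```ruby
# Hypothetical helper: apply [pattern, replacement] pairs in order, the same
# left-to-right idea as the chained gsub calls, but driven by a data table.
REPLACEMENTS = [
  ["this", "that"],
  ["old", "new"],
  ["foo", "bar"]
].freeze

def apply_replacements(text, replacements = REPLACEMENTS)
  # reduce threads the text through each substitution in turn
  replacements.reduce(text) do |current, (pattern, replacement)|
    current.gsub(pattern, replacement)
  end
end

puts apply_replacements("this old foo") # prints: that new bar
```

Because the pairs are applied in order, a later pattern sees the output of an earlier one, exactly as in the chained `gsub` calls.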

The trouble I have with prefix syntax is that it feels backwards. To read Lisp code, even my own, I have to dig through the parentheses to find the innermost expression, then work my way back out again. Of course, that’s basically what a Lisp interpreter or compiler does.

I like to think it would be possible to combine the flexibility of Lisp’s S-expressions with the left-to-right readability of object.method, but I don’t know what that would look like. I have little experience with Forth-style postfix syntax, but it seems even less readable. Still, I think this goes to show that syntax does matter.
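For what it’s worth, later Ruby versions (2.6 and up) narrow the gap a little: `Proc#>>` composes functions left to right, and `Object#then` pipes a value through a function. A small sketch, with three lambdas that mirror the hypothetical Lisp helpers from earlier in this post (they are illustrative, not library functions):

```ruby
# Ruby 2.6+ only. The Lisp call (replace-foo (replace-old (replace-this string)))
# has to be read inside-out; composition lets the same steps read in order.
replace_this = ->(text) { text.gsub("this", "that") }
replace_old  = ->(text) { text.gsub("old", "new") }
replace_foo  = ->(text) { text.gsub("foo", "bar") }

# f >> g builds a new function that calls f first, then g on the result.
pipeline = replace_this >> replace_old >> replace_foo
puts pipeline.call("this old foo") # prints: that new bar

# Object#then gives the same left-to-right reading for a single value.
puts "this old foo".then(&replace_this).then(&replace_old).then(&replace_foo)
```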


Well, a new year, and (finally) a new post. In the past two weeks I have undertaken a complete rewrite of Project Posner from Common Lisp to Ruby on Rails. Now, before the Lispniks descend upon me with their sharp parenthetical barbs, allow me to explain. The Common Lisp version was never anything more than a cheap hack: a few hundred lines of code that crawl through a few tens of megabytes of plain-text documents and spit out about the same amount of HTML. It’s completely off-line, static, Web 0.5 stuff. For a search engine I used ht://Dig, whose last release was in 2004. All that being said, Common Lisp was a great language for doing it, and definitely made the process easier and more fun than it would have been in any other language.

But I want to move on with a more sophisticated search, more dynamic features (highlighting search terms and personal search histories to name just two), and, of course, AJAX! I could do all that in Common Lisp. Several people have successfully done so. But many hundreds more have done so with Ruby. With Rails, half the work is already done for me. I don’t have to think about how to connect to a database or even how to name my files. Someone else has already done that work. When I had a problem with Rails dropping MySQL connections on Ubuntu, Google delivered a one-command solution on the first try. Compare that with the endless speculation and one-upmanship that might accompany such a query on comp.lang.lisp.

So any distaste I may have for Ruby’s syntax is completely overcome by my delight at Rails’ helpfulness. I’d still rather be working in Lisp, but Ruby is good enough, and Rails is better. It is not the path of least resistance — would that be PHP? — but it is the path of least work. As someone wrote, if Lisp’s audience had been harried sysadmins rather than AI researchers, it’d rule the world by now.


Amazon has a beta up of an interesting little app called UnSpun. It’s a way to create and vote on “best of” lists for any subject. It’s a little like Reddit, but less news-oriented. Ruby currently leads Best Programming Language by a 7-to-1 margin, not surprising given that the site’s built on Rails. I’m glad to see that Lisp made it to number 6, but why is it right below APL??


Perhaps I was premature in worrying about how slow Ruby is. John Wiseman was benchmarking Montezuma, his Common Lisp port of Ferret/Lucene, and found out in the process that Ferret was 10 times faster than Java Lucene on his benchmark! As he says, Ferret gets help from about 65,000 lines of C code.

I’ve heard this before, perhaps not often enough to make a generalization, but at least enough to identify a trend: if you want performance from Ruby code, rewrite it in C. (The same is sometimes said of Python, or really any interpreted language.) The basic approach seems to be to extract the most performance-critical parts of your dynamic, interpreted language program and rewrite them in a static, compiled language, thus retaining most of the benefits of both.

It’s an interesting contrast to what I see as the Common Lisp approach to optimization, which is to keep everything in Lisp but add compiler declarations in hopes of speeding it up. Trouble is, unless you’re an expert on the inner workings of your compiler (or can read the disassembled code) it’s hard to know exactly what effects a particular declaration will have.

Eventually, I think manual optimization will become unnecessary. Experimental compilers like Stalin have been reported to produce faster machine code than hand-written C on some benchmarks. Stalin compiles a subset of Scheme down to a subset of C, making heavy use of type inference and static analysis. If it can be done with Scheme, surely it can be done with Python, Ruby, or any other dynamic language.


I continue to sweat (see previous entry) over the question of language choice for future iterations of Project Posner (and some as-yet-unnamed similar projects). Ruby on Rails is the obvious mainstream choice, mainstream at least compared to Lisp. But a part of me really wants to do it in Common Lisp, just to prove I can.

One concern I do have is speed. Ruby is pooh-poohed for being slow, which, it’s true, is not really fair to a 1.x-version scripting language, but the Programming Language Shootout does support the accusation. I tried comparing Ruby and SBCL on the Shootout. As I expected, SBCL is up to several hundred times faster than Ruby, but I did not expect Ruby to use between one-half and one-fifth as much memory.

Maybe Ruby’s data structures are very close to their C analogs, lacking the extra padding that Lisp needs for type identification? But no, Ruby is dynamically typed, too, so surely it needs just as many tag bits. Ah, I know: the test must be counting the large size of the SBCL runtime (over 20MB, I recall reading somewhere) compared to Ruby’s (less than 2MB). For a short-running algorithmic test, that fixed overhead would easily dominate the results.

I wonder, though: over longer run times, which language would use less memory for actual data storage? I suspect that carefully optimized Lisp arrays would win, but Ruby’s arrays, the standard way to represent lists in Ruby, might fit in less space than a linked list structure, the standard way to represent lists in Lisp.
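A back-of-the-envelope estimate, under loud assumptions: a machine word of w bytes, n elements that are immediate values (small integers) so only the container itself counts, and no allowance for garbage-collector headers. A cons cell holds two words (car and cdr); a Ruby Array holds one word per element plus a fixed header h and s spare slots reserved for growth:

```latex
M_{\text{list}} \approx 2wn
\qquad
M_{\text{array}} \approx w\,(n + s) + h

% Example: w = 8 bytes (64-bit), n = 1000
M_{\text{list}} \approx 16\,000\ \text{bytes}
\qquad
M_{\text{array}} \approx 8\,000\ \text{bytes} + 8s + h
```

So for large n the array needs roughly half the space of the linked list. A specialized Lisp vector, such as one of element type (unsigned-byte 8) in implementations that specialize it (SBCL does), can pack elements into less than a word each, which fits the guess that carefully optimized Lisp arrays would come out ahead.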


Perl was the first programming language I really liked, the first language that made programming fun.

Perl has three basic types: “scalars” for atomic values, arrays for ordered sets, and hash tables for unordered sets. (Yes, there are others, but those are the popular ones.) I quickly discovered that these three types can be combined to produce most any data structure you might need. Need an ordered list of records? Use an array of hashes. Need a tree of named elements with attributes (e.g. XML)? Use nested arrays with hashes in them.

These basic types can also be conveniently mapped to external data. A CSV file can be represented as an array of hashes. A database table can be an array of arrays, an array of hashes, or a hash of hashes, whichever you prefer.
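In Ruby, which borrows the same building blocks, those combinations look like this. A naive sketch: the CSV parsing splits on commas and ignores quoting, and the `[name, attributes, *children]` tree layout is a made-up convention rather than any standard:

```ruby
# Naive sketch: CSV text as an array of hashes, one hash per data row.
# Splitting on commas ignores quoted fields; Ruby's CSV library handles those.
csv_text = "id,title\n1,First document\n2,Second document\n"

header, *rows = csv_text.lines.map { |line| line.chomp.split(",") }
records = rows.map { |row| header.zip(row).to_h }
puts records.first["title"] # prints: First document

# An XML-like tree from nested arrays and hashes, using a made-up layout:
# [element_name, attributes_hash, *children], where children are strings
# or further elements.
document = [
  "document", { "id" => "1" },
  ["title", {}, "First document"],
  ["body", {},
   ["p", {}, "A paragraph with a footnote."],
   ["footnote", { "n" => "1" }, "The footnote text."]]
]

# Walk the tree depth-first and collect the element names.
def element_names(node)
  name, _attributes, *children = node
  [name] + children.grep(Array).flat_map { |child| element_names(child) }
end

p element_names(document) # prints: ["document", "title", "body", "p", "footnote"]
```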

Python and Ruby both followed in Perl’s footsteps here. (Python calls them “lists” and “dictionaries.”) Lisp, predating all these new-fangled “scripting” languages, includes lists, arrays, hash tables, plus a whole raft of other built-in types. This is one of those areas that makes Common Lisp difficult for the beginner to grasp. When should you use a plist, an alist, or a hash table? When should you use arrays and when should you use lists? The answers to these questions delve into details of how the various structures are implemented. The only obvious criterion for choosing is speed at handling a given data set, something a beginning programmer doesn’t want to worry about when designing a new piece of code.

At least one hacker has implemented generic get/set functions for all of Common Lisp’s data types, but to my knowledge no one has implemented abstract ordered/unordered set types that don’t care about their implementation. CL-Containers is a good foundation, but it further complicates the issue by adding a bunch of new data types.

What I want is a general-purpose “collection” class whose instances can be declared ordered or unordered, numerically indexed or key-indexed. Something like this:


(define-collection my-set
  :ordered t
  :index string)

Then, based on the data that I feed into that class and the operations I perform on it, the compiler decides what sort of data structure to use for maximum efficiency. Or, if that’s too much magic to ask, at least let me change the underlying implementation without affecting any of the code that uses the collection:

(implement-collection my-set
  (array :resizable
         :elements (cons string object)))
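The second, less magical wish can at least be approximated in Ruby. A minimal sketch, assuming invented class names (`Collection`, `HashStore`, `PairListStore`): callers use `[]`, `[]=` and the Enumerable methods, and the storage behind them is chosen when the collection is built. Nothing here picks a backend automatically; that part of the wish would need far more machinery.

```ruby
# Hypothetical sketch of a swappable collection. Collection, HashStore and
# PairListStore are invented names; none of this is an existing library.

# Backend 1: Ruby's Hash, which keeps insertion order (Ruby 1.9 and later).
class HashStore
  def initialize
    @data = {}
  end

  def []=(key, value)
    @data[key] = value
  end

  def [](key)
    @data[key]
  end

  def each(&block)
    @data.each(&block)
  end
end

# Backend 2: an ordered array of [key, value] pairs, much like a Lisp alist.
class PairListStore
  def initialize
    @pairs = []
  end

  def []=(key, value)
    pair = @pairs.assoc(key) # first pair whose key matches, or nil
    if pair
      pair[1] = value
    else
      @pairs << [key, value]
    end
  end

  def [](key)
    pair = @pairs.assoc(key)
    pair && pair[1]
  end

  def each(&block)
    @pairs.each(&block)
  end
end

# Callers only ever talk to Collection, so the backend can change freely.
class Collection
  include Enumerable

  def initialize(store: HashStore.new)
    @store = store
  end

  def []=(key, value)
    @store[key] = value
  end

  def [](key)
    @store[key]
  end

  def each(&block)
    return enum_for(:each) unless block
    @store.each(&block)
    self
  end
end

[HashStore.new, PairListStore.new].each do |store|
  titles = Collection.new(store: store)
  titles["b"] = 2
  titles["a"] = 1
  titles["b"] = 3 # updating a key keeps its original position
  p titles.to_a   # same ordered pairs with either backend
end
```

The calling code in the last loop is identical for both backends, which is the property the `implement-collection` wish asks for.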


Ah, the loop, so fundamental to programming it’s hard to imagine a single program without one. After all, what’s the use of calculating just one thing? Usually you have a big pile of things you want to calculate, which is why you need a computer in the first place.

I think one of the quickest ways to get a feel for a language is to study its looping constructs. I make no pretense that this is a complete or even an accurate list, but these are some of the general iteration patterns I’ve noticed in different languages.

Counter Variable Loop

for ( int i = 0; i < 10; ++i ) {
  /* do stuff with list[i] */
}

The old C classic, with deeper roots, I believe, in FORTRAN or Pascal or both. Successive values of an integer counter are used to retrieve input from an array. Very efficient for small data sets, but requires the entire input to be stored as an array in memory. Also requires the looping code to know about the structure of the input, so not very adaptable. But surprisingly resilient: Perl and Java support the same syntax.

For-Each Loop

foreach item in list
   do stuff with item
done

Probably the most popular syntactic looping construct, and easy to see why: it’s very easy to read and understand. The for-each loop shows up in Perl, Python, Visual Basic, and a host of other languages. Because it’s built into the language syntax, it can only be extended to non-standard container types if the language provides a hook for that, such as Python’s iterator protocol, and not every language does.

Container Method Loop

list.each do |item|
   do stuff with item
end

In purely object-oriented languages like Smalltalk and Ruby (shown above), looping constructs can be implemented as methods of container classes. This has the great advantages that new looping constructs can be added and standard loops can be implemented for new container types. Since the code “inside” the loop is just an anonymous function that takes a single item as its argument, it doesn’t need to know anything about the structure or type of the container.
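To make the container-method idea concrete, here is a sketch with a made-up `Shelf` class. Defining `each` and mixing in Ruby’s `Enumerable` module supplies `map`, `select`, `sort` and friends, and Ruby’s own `for` loop works too, because it calls `each` under the hood:

```ruby
# Hypothetical container class. Defining `each` and including Enumerable is
# enough to get map, select, sort, and the rest for free.
class Shelf
  include Enumerable

  def initialize(*documents)
    @documents = documents
  end

  # Yield each document in turn; without a block, return an Enumerator.
  def each
    return enum_for(:each) unless block_given?
    @documents.each { |document| yield document }
    self
  end
end

shelf = Shelf.new("Footnotes", "Appendix", "Body")
p shelf.sort                                      # prints: ["Appendix", "Body", "Footnotes"]
p shelf.select { |title| title.start_with?("B") } # prints: ["Body"]

# Ruby's own for loop works too, because `for` is implemented on top of `each`.
for title in shelf
  puts title.upcase
end
```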

List Comprehensions

[ item.do_stuff() for item in list ]

Although Python (syntax above) has gotten a lot of press, both good and bad, for its adoption of list comprehensions, they’ve been around a lot longer. I believe they were originally developed to describe lists in a way that looks more like mathematics. For simple patterns list comprehensions are easy to understand, but I don’t yet grok their full significance. They can be nested and combined to produce complex looping patterns that would be awkward to write with C-style iteration.
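To illustrate the nesting, here is the classic Pythagorean-triples comprehension (shown as a Python comment) next to one Ruby rendering of it. Ruby has no comprehension syntax, so this sketch uses `flat_map` and `select`, which is one idiom among several:

```ruby
# Python:
#   [(a, b, c) for c in range(1, 21) for b in range(1, c)
#              for a in range(1, b) if a*a + b*b == c*c]
# Each nested `for` becomes a flat_map, and the `if` becomes a select.
triples = (1..20).flat_map do |c|
  (1...c).flat_map do |b|
    (1...b).select { |a| a * a + b * b == c * c }.map { |a| [a, b, c] }
  end
end

p triples
# prints: [[3, 4, 5], [6, 8, 10], [5, 12, 13], [9, 12, 15], [8, 15, 17], [12, 16, 20]]
```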

Half-Nelson Functional Loop

(map  (lambda (item)  do stuff with item)  list)

(Update 26 July 2006: Replaced the idiotic example (map (lambda (item) (process item)) list) with the above. I’m talking about block structures in code here, using the body of the lambda like the body of a for loop in the other languages. Obviously, if your loop body was just a single function, you would just use (map #'process list). Same changes apply to the next example below.)

Functional languages, including most dialects of Lisp, usually have a map operator that takes a function and a list and applies that function successively to each element of that list. I call this the Half-Nelson Functional Loop because it’s not, to my mind, the ultimate in functional behavior. For that, we turn to…

Full-Nelson Functional Loop

((lift (lambda (item)  do stuff with item))  list)

This looks pretty much like the previous example. But here, instead of map, we have lift, which takes only one argument, a function, and returns a new function that applies that function to every element of a list. That new function is then applied (here, in a Scheme-like syntax) to the list argument. I learned about lift from this blog article.
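For readers who want to try lift without Scheme, here is a rough Ruby analogue using lambdas. Ruby has no built-in `lift`; the name is borrowed from the paragraph above, and the function being lifted is a made-up example:

```ruby
# lift takes a function on one item and returns a function on a whole list.
lift = ->(function) { ->(list) { list.map(&function) } }

shout = ->(item) { item.upcase + "!" }
shout_all = lift.(shout) # a new function, not yet applied to anything

p shout_all.(["map", "lift"]) # prints: ["MAP!", "LIFT!"]
```

As in the Scheme version, applying `lift` yields a function, and applying that function to the list is a separate step.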
