Posts Tagged “Perl”
Posted by: Stuart in Programming, tags: Perl, Ruby
An interesting tidbit: can your programming language parse a < b < c? Perl can’t. Ruby can, but raises the error “undefined method `<’ for false:FalseClass.” Interestingly, Python accepts it, and even gives the correct result. Something clever must be going on in the parser to make that work.
Update October 17: Although Lisp can’t parse the expression directly, it does correctly handle the equivalent S-expression (< a b c).
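The Python behavior is documented as comparison chaining: a < b < c is treated like (a < b) and (b < c), with b evaluated only once. Here is a small sketch that uses the standard ast module to look at how the expression is parsed. The variable values are arbitrary and chosen only for illustration.

```python
import ast

# Parse the expression without evaluating it.
tree = ast.parse("a < b < c", mode="eval")
node = tree.body

# The whole chain is a single Compare node holding two operators,
# not a nested (a < b) < c.
is_single_compare = isinstance(node, ast.Compare)
op_names = [type(op).__name__ for op in node.ops]

a, b, c = 1, 2, 3
chained = a < b < c
spelled_out = (a < b) and (b < c)

# Grouping by hand gives the reading Ruby rejects: a boolean compared
# with a number. Python allows it because bool is a subclass of int.
grouped = (3 < 2) < 1

print(is_single_compare, op_names)
print(chained, spelled_out, grouped)
```

The last line is the interesting one: 3 < 2 < 1 is false as a chain, but grouping it as (3 < 2) < 1 compares False with 1 and comes out true.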
2 Comments »
Posted by: Stuart in Programming, tags: Perl, Ruby
I used to be a big fan of Perl. It was the first programming language I really liked. I felt like it didn’t get in my way. CPAN was and still is the best collection of open-source libraries ever assembled.
Then I got into Ruby, and was very happy with the way it cleaned up Perl’s syntax but still didn’t get in my way. There aren’t nearly as many Gems as CPAN modules, but it’s a solid foundation.
Recently I went back to Perl for a small project. Aside from the predictable mistakes — I kept trying to put colons before my hash keys to make them like Ruby symbols — I realized how nice Ruby really is. Perl is definitely powerful, but once you get beyond the text munging it was designed for it can get irritating, especially the ass-backwards way it does OOP. And I found myself missing Ruby’s code blocks. Perl has code blocks too, but understanding them takes a whole book.
I also realized why Ruby is fairly unique among semi-mainstream programming languages: unlike Perl, Python, JavaScript, or VB, it is based around a small set of core concepts: objects, methods, and code blocks. They’re used everywhere, even for control structures and fancy meta-programming. This is good, because it reduces the number of concepts one has to think about to use the language and, paradoxically, makes it easier to add new features. Lisp started the same way with its core concepts: S-expressions and macros.
2 Comments »
I like Lisp’s prefix syntax. It’s consistent, has natural structure, and makes code-manipulation macros possible. But it’s not always the easiest to read or write. For example, I often want to apply several successive transformations to the same chunk of text. In Perl, I could use the default variable $_ and then just write a bunch of regular expressions:
s/this/that/g;
s/old/new/g;
s/foo/bar/g;
Very succinct, but a tad cryptic. But the equivalent in Common Lisp, using the CL-PPCRE regular expression library, is much worse:
(regex-replace-all "foo"
                   (regex-replace-all "old"
                                      (regex-replace-all "this" string "that")
                                      "new")
                   "bar")
CL-PPCRE’s regex-replace-all function puts the original string in between the regex and replacement string in its argument list, which makes the syntax awkward. I usually avoid writing nested expressions like the one above and instead factor each replacement out into a separate function:
(defun replace-foo (string)
  (regex-replace-all "foo" string "bar"))
(defun replace-old ...)
(defun replace-this ...)
(replace-foo (replace-old (replace-this string)))
But who wants to define three extra functions just for one expression?
Now I’m exploring Ruby, and was pleased to find how easy it is to write this:
string.gsub('this','that').gsub('old','new').gsub('foo','bar')
This is succinct and reads easily from left to right. This sort of procedure is where the classic object.method(arguments) syntax really shines. At least for me, it makes sense because it’s how I tend to think about a problem: “Take this object, do this to it, then do something else to it, then give me back the result.”
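For comparison, the same left-to-right pipeline can be sketched in Python. Note that str.replace does literal substitution rather than regular-expression matching, which is enough for these fixed strings. The sample text and the replacements list are made up for the example.

```python
from functools import reduce

text = "this old foo"

# Method chaining reads left to right, as in the Ruby version.
chained = text.replace("this", "that").replace("old", "new").replace("foo", "bar")

# The same pipeline driven by data: each (old, new) pair is applied in order,
# feeding the result of one replacement into the next.
replacements = [("this", "that"), ("old", "new"), ("foo", "bar")]
folded = reduce(lambda s, pair: s.replace(*pair), replacements, text)

print(chained)
print(folded)
```

The reduce form is closer in spirit to the nested Lisp expression, but the list of pairs keeps the order of operations readable from top to bottom.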
The trouble I have with prefix syntax is that it feels backwards. To read Lisp code, even my own, I have to dig through the parentheses to find the innermost expression, then work my way back out again. Of course, that’s basically what a Lisp interpreter or compiler does.
I like to think it would be possible to combine the flexibility of Lisp’s S-expressions with the left-to-right readability of object.method, but I don’t know what that would look like. I have little experience with Forth-style postfix syntax, but it seems even less readable. But I think this just goes to show that syntax does matter.
5 Comments »
Posted by: Stuart in Programming, tags: Lisp, Perl
Hello, Lisp world! This is my first released Common Lisp code. Perl in Lisp is a Common Lisp interface to the Perl 5 API. It allows you to run a Perl interpreter embedded inside Lisp and evaluate Perl code. It does not require any C wrapper code — the API definitions are done with CFFI and the rest is pure ANSI Common Lisp.
In response to the obvious question, “Why on Earth would you want to do such a thing?” my best answer is “Why not?” I thought it would be fun. It ended up being more difficult than I expected — the Perl API is not for the faint of heart, nor for those unwilling to dig through source code. But it does work.
This was also an experiment to see if I could follow two “best practices” of software development — literate programming and unit testing — at the same time. It wasn’t always easy, and it tripled the amount of work I had to do, but the end result was definitely worth it. Thanks to the literate source, I understand what all of the code does. Thanks to the unit tests, I know that it works.
This is a beta release. It can evaluate strings of Perl code, call Perl functions, and convert between Lisp and Perl types. Callbacks from Perl to Lisp are not yet supported. Some Perl modules may not work, particularly if they depend on external C libraries.
See the project page for implementation compatibility notes, download links, and documentation.
Potentially, it could be very useful. CPAN has over ten thousand modules for doing all sorts of obscure things. Say you want to output an Excel spreadsheet from your CL application. Just use Spreadsheet::WriteExcel.
Jeremy Smith started a similar project for embedding Python: PythOnLisp.
2 Comments »
Perl was the first programming language I really liked, the first language that made programming fun.
Perl has three basic types: “scalars” for atomic values, arrays for ordered sets, and hash tables for unordered sets. (Yes, there are others, but those are the popular ones.) I quickly discovered that these three types can be combined to produce most any data structure you might need. Need an ordered list of records? Use an array of hashes. Need a tree of named elements with attributes (e.g. XML)? Use nested arrays with hashes in them.
These basic types can also be conveniently mapped to external data. A CSV file can be represented as an array of hashes. A database table can be an array of arrays, an array of hashes, or a hash of hashes, whichever you prefer.
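The CSV case can be sketched in Python, which inherited the same three building blocks. The sample data below is invented; a real program would read from a file instead of a string.

```python
import csv
import io

# Hypothetical CSV data standing in for a file on disk.
raw = "name,language\nLarry,Perl\nMatz,Ruby\n"

# DictReader yields one dict per row, keyed by the header line,
# so the whole file becomes a list of dicts ("array of hashes").
records = list(csv.DictReader(io.StringIO(raw)))

# Re-index the same rows as a dict of dicts ("hash of hashes") keyed by name.
by_name = {row["name"]: row for row in records}

print(records[1]["language"])
print(by_name["Larry"]["language"])
```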
Python and Ruby both followed in Perl’s footsteps here. (Python calls them “lists” and “dictionaries.”) Lisp, predating all these new-fangled “scripting” languages, includes lists, arrays, hash tables, plus a whole raft of other built-in types. This is one of those areas that makes Common Lisp difficult for the beginner to grasp. When should you use a plist, an alist, or a hash table? When should you use arrays and when should you use lists? The answers to these questions delve into details of how the various structures are implemented. The only obvious criterion for choosing is speed at handling a given data set, something a beginning programmer doesn’t want to worry about when designing a new piece of code.
At least one hacker has implemented generic get/set functions for all of Common Lisp’s data types, but to my knowledge no one has implemented abstract ordered/unordered set types that don’t care about their implementation. CL-Containers is a good foundation, but it further complicates the issue by adding a bunch of new data types.
What I want is a general-purpose “collection” class, of which instances can be declared ordered or unordered, numerically-indexed or key-indexed. Something like this:
(define-collection my-set
  :ordered t
  :index string)
Then, based on the data that I feed in to that class and the operations I perform on it, the compiler decides what sort of data structure to use for maximum efficiency. Or, if that’s too much magic to ask, at least let me change the underlying implementation without affecting any of the code that uses the collection:
(implement-collection my-set
  (array :resizable
         :elements (cons string object)))
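The weaker of the two wishes, swapping the implementation without touching calling code, can be sketched in Python. Everything here (Collection, ListBackend, DictBackend, fill) is a hypothetical name invented for the illustration, and the choice of backend is made by hand, not by a compiler.

```python
class ListBackend:
    """Keeps (key, value) pairs in a list; lookups scan the list."""

    def __init__(self):
        self._pairs = []

    def put(self, key, value):
        for i, (existing, _) in enumerate(self._pairs):
            if existing == key:
                self._pairs[i] = (key, value)
                return
        self._pairs.append((key, value))

    def get(self, key):
        for existing, value in self._pairs:
            if existing == key:
                return value
        raise KeyError(key)

    def keys(self):
        return [key for key, _ in self._pairs]


class DictBackend:
    """Keeps items in a dict; lookups are hashed, order is insertion order."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        self._items[key] = value

    def get(self, key):
        return self._items[key]

    def keys(self):
        return list(self._items)


class Collection:
    """An ordered, key-indexed collection; callers never see the backend."""

    def __init__(self, backend):
        self._backend = backend

    def put(self, key, value):
        self._backend.put(key, value)

    def get(self, key):
        return self._backend.get(key)

    def keys(self):
        return self._backend.keys()


def fill(collection):
    """Calling code: identical whichever backend was chosen."""
    collection.put("b", 2)
    collection.put("a", 1)
    collection.put("b", 3)
    return collection.keys(), collection.get("b")


print(fill(Collection(ListBackend())))
print(fill(Collection(DictBackend())))
```

Both calls produce the same result. Only the line that constructs the collection would change if the data outgrew one representation.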
4 Comments »
Ah, the loop, so fundamental to programming it’s hard to imagine a single program without one. After all, what’s the use of calculating just one thing? Usually you have a big pile of things you want to calculate, which is why you need a computer in the first place.
I think one of the quickest ways to get a feel for a language is to study its looping constructs. I make no pretense that this is a complete or even an accurate list, but these are some of the general iteration patterns I’ve noticed in different languages.
Counter Variable Loop
for ( int i = 0; i <= 10; ++i ) {
    do stuff with list[i];
}
The old C classic, with deeper roots, I believe, in FORTRAN or PASCAL or both. Successive values of an integer counter are used to retrieve input from an array. Very efficient for small data sets, but requires the entire input to be stored as an array in memory. Also requires the looping code to know about the structure of the input, so not very adaptable. But surprisingly resilient: Perl and Java support the same syntax.
For-Each Loop
foreach item in list
    do stuff with item
done
Probably the most popular syntactic looping construct, and easy to see why: it’s very easy to read and understand. The for-each loop shows up in Perl, Python, Visual Basic, and a host of other languages. Because it’s usually built in to the language syntax, it can rarely be extended to non-standard container types.
Container Method Loop
list.each do |item|
    do stuff with item
end
In purely object-oriented languages like Smalltalk and Ruby (shown above), looping constructs can be implemented as methods of container classes. This has the great advantages that new looping constructs can be added and standard loops can be implemented for new container types. Since the code “inside” the loop is just an anonymous function that takes a single item as its argument, it doesn’t need to know anything about the structure or type of the container.
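To make the two advantages concrete, here is a rough Python sketch of the same idea. Countdown is a hypothetical container invented for the example. It is not a list at all, yet it gets a standard loop (each) and a new looping construct (each_with_index) purely as methods.

```python
class Countdown:
    """A container that stores only a start value and produces start..1."""

    def __init__(self, start):
        self.start = start

    def each(self, body):
        # The loop is a method; `body` is the code "inside" the loop.
        n = self.start
        while n > 0:
            body(n)
            n -= 1

    def each_with_index(self, body):
        # A new looping construct built on each(), so it needs no
        # knowledge of how the container stores its items.
        index = 0

        def step(item):
            nonlocal index
            body(item, index)
            index += 1

        self.each(step)


seen = []
Countdown(3).each(seen.append)

indexed = []
Countdown(3).each_with_index(lambda item, i: indexed.append((item, i)))

print(seen)
print(indexed)
```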
List Comprehensions
[ item.do_stuff() for item in list ]
Although Python (syntax above) has gotten a lot of press, both good and bad, for its adoption of list comprehensions, they’ve been around a lot longer. I believe they were originally developed to describe lists in a way that looks more like mathematics. For simple patterns list comprehensions are easy to understand, but I don’t yet grok their full significance. They can be nested and combined to produce complex looping patterns that would be awkward to write with C-style iteration.
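A few small Python comprehensions, from simple to nested, with the counter-style equivalent of the last one for comparison. The word list and grid size are arbitrary.

```python
words = ["perl", "ruby", "lisp"]

# Simple pattern: transform every item.
shouted = [w.upper() for w in words]

# With a filter, reading much like set-builder notation:
# { w | w in words, "l" in w }
with_l = [w for w in words if "l" in w]

# Nested: every (row, column) pair below the diagonal of a 3x3 grid.
below_diagonal = [(r, c) for r in range(3) for c in range(3) if c < r]

# The same nested pattern written as explicit loops.
spelled_out = []
for r in range(3):
    for c in range(3):
        if c < r:
            spelled_out.append((r, c))

print(shouted)
print(with_l)
print(below_diagonal)
```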
Half-Nelson Functional Loop
(map (lambda (item) do stuff with item) list)
(Update 26 July 2006: Replaced the idiotic example (map (lambda (item) (process item)) list) with the above. I’m talking about block structures in code here, using the body of the lambda like the body of a for loop in the other languages. Obviously, if your loop body was just a single function, you would just use (map #'process list). Same changes apply to the next example below.)
Functional languages, including most dialects of Lisp, usually have a map operator that takes a function and a list and applies that function successively to each element of that list. I call this the Half-Nelson Functional Loop because it’s not, to my mind, the ultimate of functional behavior. For that, we turn to…
Full-Nelson Functional Loop
((lift (lambda (item) do stuff with item)) list)
This looks pretty much like the previous example. But here, instead of map, we have lift, which takes only one argument, a function, and returns a new function that applies that function to every element of a list. That new function is then applied (here, in a Scheme-like syntax) to the list argument. I learned about lift from this blog article.
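The difference between the two is easy to sketch in Python. The lift function below is written from the description above, not taken from any library, and double is a placeholder for a real loop body.

```python
def lift(function):
    """Return a new function that applies `function` to each element of a list."""
    def lifted(items):
        return [function(item) for item in items]
    return lifted


def double(x):
    return x * 2


numbers = [1, 2, 3]

# Half-Nelson: map takes the function and the list together.
mapped = list(map(double, numbers))

# Full-Nelson: lift takes only the function; the list comes later.
double_all = lift(double)
lifted = double_all(numbers)

# Because the result is an ordinary function, it can be lifted again
# to work on a list of lists.
nested = lift(double_all)([[1, 2], [3]])

print(mapped, lifted, nested)
```

The payoff of lift is that the looping function exists as a value before any list does, so it can be named, passed around, and composed.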
6 Comments »
Despite all of the AJAX/Web 2.0 hype, the fact remains that most web pages are mostly static. The most efficient way to serve static pages is unquestionably to store them as static files on a file-based web server such as Apache. I add new pages to this site once every few days at most, but I’m still using a framework (WordPress) that requires the server to execute dozens of lines of PHP code and make several database calls for every page request. This seems like a tremendous waste, even though it makes it very easy for me, the maintainer, to add new content whenever I want to.
In the past, I built this site and others by writing programs (usually shell scripts calling an XSLT processor on a series of stylesheets) to generate static HTML from content stored in XML source files. Unfortunately, this method made it difficult or impossible to update only the content that had changed, so I ended up regenerating the entire site every time I updated one page. For a small site this was not a problem, but as the site grew larger it was cumbersome.
On a commercial site I experimented with a variation on this process: I still stored content in XML source files, but did not generate any HTML until it was requested. If an HTTP request came in for a file that did not exist on the server, an .htaccess directive would call a Perl script that generated the requested page and then saved it as a file at the original requested URL. Then on the next request for that URL, Apache would simply serve that file. To update a page, all I had to do was modify the source file and delete the “cached” HTML file.
This caching technique proved very reliable, and meant I did not have to worry much about the efficiency of my HTML-generation code. I could simulate “dynamic” pages that changed on a schedule by setting up a cron job that would delete the cached HTML file on a regular basis. By adding some directory prefixes and URL rewriting, I was even able to simulate a kind of session tracking without cookies or hidden form fields.
So getting back to my blog, why can’t it use the same technique? Store the content in a database, yes, but never render anything more than once. (In programming, this would be called memoization.) If one change would affect many pages, simply flag those pages as out-of-date and regenerate them when they are requested.
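The render-once idea can be sketched in a few lines of Python. This is only an illustration of the technique, not the Perl script or .htaccess setup described above: render, serve, and invalidate are hypothetical names, and a temporary directory stands in for the web server's document root.

```python
import os
import tempfile

render_count = 0


def render(name):
    """Stand-in for expensive page generation."""
    global render_count
    render_count += 1
    return f"<html><body>{name}, render #{render_count}</body></html>"


def serve(cache_dir, name):
    """Return the cached file if it exists; otherwise render it and save it."""
    path = os.path.join(cache_dir, name + ".html")
    if not os.path.exists(path):
        with open(path, "w", encoding="utf-8") as f:
            f.write(render(name))
    with open(path, encoding="utf-8") as f:
        return f.read()


def invalidate(cache_dir, name):
    """Flag a page as out of date by deleting its cached file."""
    path = os.path.join(cache_dir, name + ".html")
    if os.path.exists(path):
        os.remove(path)


with tempfile.TemporaryDirectory() as cache_dir:
    first = serve(cache_dir, "about")
    second = serve(cache_dir, "about")  # served from the file, no render
    invalidate(cache_dir, "about")
    third = serve(cache_dir, "about")   # regenerated on the next request

print(render_count)
```

A change that affects many pages only needs to call invalidate for each of them. Nothing is regenerated until someone actually asks for it.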
On a related note, someone on comp.lang.lisp suggested that Kenny Tilton’s Cells dataflow extension to Common Lisp might be useful for web applications. I hope to have some time to explore something along these lines using Cells as a front-end to output static HTML files.
1 Comment »
Posted by: Stuart in Programming, tags: Perl
A common complaint about Object-Oriented Programming (OOP) is that classes can make simple data hard to deal with: “I don’t want a DirectoryList object, I just want a list of strings.” I would say this is not a flaw inherent in OOP but rather in the way it is commonly used. “Encapsulation” and “data hiding” encourage creation of abstract interfaces completely divorced from their underlying implementation, even when that underlying implementation is in fact very simple. The problem is that abstract interfaces usually only provide a tiny subset of the operations one might potentially want to perform on the data hiding behind them.
What we have, essentially, are interfaces that abstract “down” to increasing levels of specificity, farther and farther away from the data themselves. Wouldn’t it be better if we could abstract “up,” choosing the representation of the data based on what we want to do with it? For example, take a very common, very general case: a collection of data elements with similar structure. There are only a few basic operations we are likely to perform on such a collection:
- Add an item
- Remove an item
- Modify an item
- Sort the items based on some criteria
- Find one or more items that satisfy certain criteria
- Perform aggregate calculations on the items (max, min, etc.)
This is the genius of SQL: it provides the first three with simple commands, INSERT, DELETE, and UPDATE; and the last three with the multipurpose SELECT. An SQL query looks the same regardless of how big the data set is or what algorithm is used to index it. Unfortunately, this idea has never quite made it into general purpose programming languages.
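The six operations map onto SQL roughly as follows. This sketch uses Python's standard sqlite3 module with an in-memory database; the items table and its contents are invented for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (name TEXT, price INTEGER)")

# Add items
db.executemany("INSERT INTO items VALUES (?, ?)",
               [("apple", 3), ("pear", 5), ("fig", 9)])
# Modify an item
db.execute("UPDATE items SET price = 4 WHERE name = 'apple'")
# Remove an item
db.execute("DELETE FROM items WHERE name = 'pear'")

# Sort, find, and aggregate are all SELECT
sorted_names = [row[0] for row in
                db.execute("SELECT name FROM items ORDER BY price DESC")]
cheap = [row[0] for row in
         db.execute("SELECT name FROM items WHERE price < 5")]
highest = db.execute("SELECT MAX(price) FROM items").fetchone()[0]

db.close()
print(sorted_names, cheap, highest)
```

None of the queries would change if the table held three rows or three million, or if an index were added later.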
Ultimately, it shouldn’t matter whether data are stored in an array in local memory or on a cluster of relational database servers located across the internet. The things we want to do with the data remain the same. A compiler should be able to translate abstract operations on collections into the actual work needed to perform them in a specific implementation. Ideally, we could just start feeding data into the system and let the compiler choose the most efficient implementation, switching over automatically when the collection becomes too big for a particular method. This is similar to the way Common Lisp (like some other languages) automatically switches to arbitrary-precision integers when a calculation grows too large for fixed-width integers.
For an example of this, I turn, strangely enough, to one of the hairier aspects of Perl: tied variables. Basically, a tied variable looks and behaves exactly like one of Perl’s built-in data types (scalar, array, or hash table) when you use it, but it is actually an object that calls methods in response to the operations performed on it. For example, the Tie::File module represents a file as an array of strings, one per line. Delete an item from the array, and you remove a line from the file. Same with insertion, modification, and sorting. Presto! A complete random-access file library that doesn’t require the user to learn a new interface. The implementation does not load the whole file into memory, so it is efficient even for very large files. Iterating over lines in a file used to require a special syntax unique to files; with Tie::File it can be done with standard Perl control structures that iterate over lists.
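Python has no tie, but a class that implements the standard sequence protocol gives a similar effect. The sketch below is an analogy only, not a port of Tie::File: FileLines is a hypothetical class, and unlike the module described above it naively rereads (and rewrites) the whole file on every operation.

```python
import os
import tempfile
from collections.abc import MutableSequence


class FileLines(MutableSequence):
    """A list-like view of a text file, one element per line."""

    def __init__(self, path):
        self.path = path

    def _read(self):
        with open(self.path, encoding="utf-8") as f:
            return f.read().splitlines()

    def _write(self, lines):
        with open(self.path, "w", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in lines)

    def __len__(self):
        return len(self._read())

    def __getitem__(self, index):
        return self._read()[index]

    def __setitem__(self, index, value):
        lines = self._read()
        lines[index] = value
        self._write(lines)

    def __delitem__(self, index):
        lines = self._read()
        del lines[index]
        self._write(lines)

    def insert(self, index, value):
        lines = self._read()
        lines.insert(index, value)
        self._write(lines)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "notes.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("one\ntwo\nthree\n")

    lines = FileLines(path)
    del lines[1]                 # removes the line "two" from the file
    lines.append("four")         # append() is supplied by MutableSequence
    lines[0] = lines[0].upper()  # modifies the first line in place

    as_list = list(lines)        # ordinary iteration works too
    with open(path, encoding="utf-8") as f:
        on_disk = f.read()

print(as_list)
```

The calling code uses nothing but ordinary list operations, which is the point: the file-handling interface is the list interface.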
The even more ambitious Data::All module attempts the same thing in a more general way: tying any external data format (CSV, RDBMS, Excel spreadsheets…) to standard Perl data structures. It’s still alpha code, but the idea is brilliant.
In short, I would like object-oriented interfaces that abstract data towards its simplest possible representation rather than the most specific representation. I do not want objects that limit what I can do with the data they contain.
1 Comment »
Posted by: Stuart in Programming, tags: Perl
Steve Oualline wrote a nifty little Perl program to graph regular expressions. And Oliver Steele wrote an even niftier OpenLaszlo app to show how regular expressions work.
Together, they make the best (unintended) argument I’ve seen for visual programming languages. As Oualline writes in his article, “Humans can process images far faster and better than any computer. We don’t do so well when it comes to text.” His demonstration with a 359-character regular expression for validating email addresses proves the point.
Programmers need better tools to help them visualize code. Architects and engineers have been using highly sophisticated CAD tools for years. Why don’t programmers have the same tools?
No Comments »