Posts Tagged “World Wide Web”
Well, a new year, and (finally) a new post. In the past two weeks I have undertaken a complete rewrite of Project Posner from Common Lisp to Ruby on Rails. Now, before the Lispniks descend upon me with their sharp parenthetical barbs, allow me to explain. The Common Lisp version was never anything more than a cheap hack: a few hundred lines of code that crawls through a few tens of megabytes of plain-text documents and spits out about the same amount of HTML. It’s completely off-line, static, Web 0.5 stuff. For a search engine I used ht://Dig, whose last release was in 2004. All that being said, Common Lisp was a great language for doing it, and definitely made the process easier and more fun than it would have been in any other language.
But I want to move on with a more sophisticated search, more dynamic features (highlighting search terms and personal search histories to name just two), and, of course, AJAX! I could do all that in Common Lisp. Several people have successfully done so. But many hundreds more have done so with Ruby. With Rails, half the work is already done for me. I don’t have to think about how to connect to a database or even how to name my files. Someone else has already done that work. When I had a problem with Rails dropping MySQL connections on Ubuntu, Google delivered a one-command solution on the first try. Compare that with the endless speculation and one-upsmanship that might accompany such a query on comp.lang.lisp.
So any distaste I may have for Ruby’s syntax is completely overcome by my delight at Rails’ helpfulness. I’d still rather be working in Lisp, but Ruby is good enough, and Rails is better. It is not the path of least resistance — would that be PHP? — but it is the path of least work. As someone wrote, if Lisp’s audience had been harried sysadmins rather than AI researchers, it’d rule the world by now.
No Comments »
I just started playing with this, and already I love it: Zotero. It’s like a bookmark manager crossed with a note-taking program crossed with BibTeX.
Zotero is an extension that runs inside Firefox 2.0 — click the icon, and it captures a complete bibliographic record of the page you’re looking at, and saves a copy. This is vital when you need to cite web pages that may not be permanent.
Even better, it has scrapers (they call them “translators”) for a bunch of online databases, the kind you get in university libraries. Say you’re reading the online PDF version of an article that appeared in a journal. Click on Zotero, and it saves the PDF then stores both the URL and the journal name, volume, number, page, author, title, etc. Then click another button and it spits out a bibliography in APA, MLA, or Chicago form.
It works for offline resources too. Look up a book in a card catalog, and click to record the bibliography. Add your own notes and links to files on your hard drive. Of course, it has a search function, with tagging promised in a future release. It’s open-source.
Unlike the cool-but-geeky note-taking programs (like desktop Wikis), Zotero is designed for scholarly work, and it has some big-name research institutions behind it. Here’s hoping it continues to grow.
No Comments »
The first draft of Project Posner was written in Common Lisp. I thought it would be fun to see how Common Lisp fared as a language for doing heavy text processing with a web front end. It worked well, and I’m convinced it made the process easier than it would have been with any other language. But everything I’ve done with it up till now is off-line. I used Lisp to statically generate the site on my desk, then uploaded the HTML pages to the server. Search is handled by ht://Dig, an old-school CGI app written in C.
I’d love to continue to develop Project Posner in Lisp, especially since I’m currently the only programmer working on it. But to add any more features I need server-side programming. I find myself wondering … do I dare try to use Lisp? First off, I’d have to get a new web host, probably a virtual server, since no shared-host server offers Lisp pre-installed. That would cost more. Then I’d have to set up and maintain the OS on the server, which I’d frankly rather not be bothered to do. Then there are the multiple headaches of getting Apache, mod_lisp, PostgreSQL, and a CL implementation all running and talking to one another. Then, and only then, can I start work on the application itself. And then I don’t have much pre-written code to draw on. Sure, HTML and JavaScript generation is in the bag, but there aren’t any drop-in libraries for forums, guestbooks, user authentication, or any of that good stuff.
I could probably write that stuff myself in Lisp. But could I do it faster and better than the hundreds of other people who have already done it in Perl/PHP/Python/Ruby? I don’t think so. I’m not that good.
So there it is. Web application development is an evolving problem, but by and large a solved one. And it wasn’t solved in Lisp. When Paul Graham was creating Viaweb, no one else was even thinking of web applications, so he had to create his own tools. But the biggest recent poster child for Lisp on the Web, Reddit, gave up and switched to Python (to much gnashing of teeth in the Lisp world). It has nothing to do with the language itself. Lisp is still great. It’s all about the tools, the libraries, the “borrowability” of other people’s code.
So I’ll continue using Lisp for off-line stuff, private projects and such. But for building Project Posner version 2.0, I’ll probably look elsewhere.
3 Comments »
Been too busy with work and class to post much, but here’s a link for all the IANALs out there: Project Posner. It’s an on-line database collecting the case opinions of Richard A. Posner, judge on the 7th Circuit Court of Appeals. This was the brainchild of law professor and former Posner clerk Tim Wu. I wrote all the code to parse and format the cases, in 100% Common Lisp! Specifically, about 800 lines of code that spits out 36 MB of static HTML in about 5 minutes — whee! Currently having some problems with Google’s free web search; maybe they’ll crawl the site now that I’ve linked to it. Or maybe I’ll break down and implement my own search function. In any case, take a look, comments welcome.
1 Comment »
Posted by: Stuart in Uncategorized, tags: User Interfaces, World Wide Web
I just returned from a short vacation with a little business mixed in. On the third day of my trip, I realized I needed to check my email. My hotel had free in-room Ethernet connections, but I hadn’t thought to bring my laptop with me. No problem, I thought, since the hotel also had one of those TVs with a wireless keyboard for web browsing. So I turned on the television, punched buttons for Internet access, and accepted the $9.95 charge for 24 hours’ use — a little steep, I thought, but I was only going to use it once.
To my dismay, the service was barely usable. The Web browser in the television seemed to be about Netscape 3.0 level, unable to render new-fangled sites like my ISP’s web mail. Half the text was hidden off the left side of the screen, and there was no horizontal scroll bar. To make matters worse, it operated at dial-up speeds. As I watched the “progress” bar creep along the bottom of the screen, I reflected on why WebTV never caught on. I think it’s because no one ever bothered to get it right before other technologies — small, cheap laptops with WiFi — took over.
No Comments »
Quiet A.I. I think this will be the way A.I. ultimately sneaks into everyday life. It’s already happening on the web. But this response on kuro5hin is a fair warning. Choose carefully what you feed your digital “children”!
No Comments »
Posted by: Stuart in Uncategorized, tags: User Interfaces, World Wide Web
The just-released Dabble DB is, to my mind, one of the most innovative pieces of software since the spreadsheet. It’s a relational database that can figure out your data relations for you. It’s a spreadsheet that can run structured queries on your data. It’s an on-line calendar with RSS feeds. It’s a web form processor that understands dates like “next Tuesday.” It’s a platform for JavaScript development. It’s so cool.
No Comments »
Or, What I Have In Common With Craig Silverstein.
I’ve been enjoying John Battelle’s The Search, a history of the search engine business from Archie to Google. He quotes Google’s first employee, Craig Silverstein, as saying, “I would like to see the search engines become like the computers in Star Trek. You talk to them and they understand what you’re asking.”
This is exactly what I’ve often said I want, except that I would extend the concept beyond search engines to computers in general. This leads me to a Grand Prediction On The Future Of Computing. I call it “Do Engines.” No, seriously, stay with me here. Google has gotten pretty close to the Star Trek computer when it comes to one specific task, namely, searching for information. The Google search box lets you tell the computer “find this” and it gets what you want.
The next stage must be the ability to say “do this” and have the computer know what you want, as in “email my résumé to Craig Silverstein.” Natural-language processing has so far had little success at this. (Battelle cites GNP Development, a product that added natural-language spreadsheet formulas to Lotus 1-2-3, but it didn’t catch on.) I believe that now, with the advent of web-based applications for desktop tasks such as word processing and spreadsheets, the “do engine” can become a reality. Instead of digging through menus and dialog boxes to find a command or setting to achieve the effect you want, you just type in what you want and click “do.”
This can leverage the vastness of the web much like open-source software. If I want to accomplish a specific task with my computer, I might search on Freshmeat, CPAN, or the Common Lisp Directory to find a piece of code that does what I want. If a web-based application has a programmable API with that kind of user community, the same advantages come to everyone who uses it. Therefore, the successful web-based applications will be the ones that make it possible for users to extend them beyond what the original designers imagined. Google Maps mashups are a perfect example of this, but the trend at the moment seems to be simply porting traditional desktop applications to JavaScript, e.g. Google Spreadsheets. By leveraging the input of millions of users, a web application can “know” how to do common tasks the same way a search engine “knows” how to find things.
No Comments »
Despite all of the AJAX/Web 2.0 hype, the fact remains that most web pages are mostly static. The most efficient way to serve static pages is unquestionably to store them as static files on a file-based web server such as Apache. I add new pages to this site once every few days at most, but I’m still using a framework (WordPress) that requires the server to execute dozens of lines of PHP code and make several database calls for every page request. This seems like a tremendous waste, even though it makes it very easy for me, the maintainer, to add new content whenever I want to.
In the past, I built this site and others by writing programs (usually shell scripts calling an XSLT processor on a series of stylesheets) to generate static HTML from content stored in XML source files. Unfortunately, this method made it difficult or impossible to update only the content that had changed, so I ended up regenerating the entire site every time I updated one page. For a small site this was not a problem, but as the site grew larger it was cumbersome.
On a commercial site I experimented with a variation on this process: I still stored content in XML source files, but did not generate any HTML until it was requested. If an HTTP request came in for a file that did not exist on the server, an .htaccess directive would call a Perl script that generated the requested page and then saved it as a file at the original requested URL. Then on the next request for that URL, Apache would simply serve that file. To update a page, all I had to do was modify the source file and delete the “cached” HTML file.
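The generate-on-miss idea might look roughly like the sketch below. It is written in Python rather than the original Perl, and the names serve_page, render_page, and invalidate are made up for illustration, as is the assumption that each URL maps to a same-named .xml source file. In the real setup the web server's rewrite rule, not application code, decided whether the cached file existed.

```python
from pathlib import Path


def render_page(source_file: Path) -> str:
    """Hypothetical renderer: wraps the source text in minimal HTML."""
    body = source_file.read_text(encoding="utf-8")
    return f"<html><body>{body}</body></html>"


def serve_page(url_path: str, source_dir: Path, cache_dir: Path) -> str:
    """Return the HTML for url_path, generating it only when no cached file exists."""
    cached = cache_dir / url_path
    if cached.exists():
        # Cache hit: a real web server would serve this file directly.
        return cached.read_text(encoding="utf-8")
    # Cache miss: build the page from its source (assumed naming: same stem, .xml).
    source = source_dir / Path(url_path).with_suffix(".xml")
    html = render_page(source)
    cached.parent.mkdir(parents=True, exist_ok=True)
    cached.write_text(html, encoding="utf-8")  # save at the requested path
    return html


def invalidate(url_path: str, cache_dir: Path) -> None:
    """Delete the cached HTML so the next request regenerates it."""
    (cache_dir / url_path).unlink(missing_ok=True)
```

Editing a source file alone changes nothing that visitors see; the page is rebuilt only after invalidate removes the cached copy.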
This caching technique proved very reliable, and meant I did not have to worry much about the efficiency of my HTML-generation code. I could simulate “dynamic” pages that changed on a schedule by setting up a cron job that would delete the cached HTML file on a regular basis. By adding some directory prefixes and URL rewriting, I was even able to simulate a kind of session tracking without cookies or hidden form fields.
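A scheduled expiry job of this kind could be as small as the following Python sketch. The function name expire_cached_pages, the .html glob, and the age threshold are assumptions for illustration; the original cron job may simply have deleted specific files.

```python
import time
from pathlib import Path
from typing import List, Optional


def expire_cached_pages(cache_dir: Path, max_age_seconds: float,
                        now: Optional[float] = None) -> List[Path]:
    """Delete cached .html files older than max_age_seconds; return what was removed."""
    now = time.time() if now is None else now
    removed = []
    # Materialize the listing first so deletions do not disturb the directory walk.
    for path in list(cache_dir.rglob("*.html")):
        if path.is_file() and now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed
```

Run from cron at whatever interval suits the page, this leaves regeneration to the next request, so the schedule controls freshness without the job knowing how to render anything.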
So getting back to my blog, why can’t it use the same technique? Store the content in a database, yes, but never render anything more than once. (In programming, this would be called memoization.) If one change would affect many pages, simply flag those pages as out-of-date and regenerate them when they are requested.
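As a rough sketch of that flag-and-regenerate idea, here is an in-memory version in Python. The PageCache class and its method names are hypothetical, and a real blog would keep the rendered pages on disk and work out which pages a change affects from its own data model.

```python
from typing import Callable, Dict, Iterable, Set


class PageCache:
    """Memoizes rendered pages and re-renders only those flagged as stale."""

    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render            # function that builds a page's HTML
        self._pages: Dict[str, str] = {}
        self._stale: Set[str] = set()
        self.render_count = 0            # exposed only to show how often we render

    def get(self, page_id: str) -> str:
        """Return the page, rendering it only if it is missing or flagged stale."""
        if page_id not in self._pages or page_id in self._stale:
            self._pages[page_id] = self._render(page_id)
            self._stale.discard(page_id)
            self.render_count += 1
        return self._pages[page_id]

    def mark_stale(self, page_ids: Iterable[str]) -> None:
        """Flag pages as out of date without regenerating them yet."""
        self._stale.update(page_ids)
```

Marking a page stale costs almost nothing, so a change that touches the index, the archive, and a dozen tag pages only triggers rendering for the ones somebody actually requests.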
On a related note, someone on comp.lang.lisp suggested that Kenny Tilton’s Cells dataflow extension to Common Lisp might be useful for web applications. I hope to have some time to explore something along these lines using Cells as a front-end to output static HTML files.
1 Comment »