The Only Data Structures You’ll Ever Need

Perl was the first programming language I really liked, the first language that made programming fun.

Perl has three basic types: “scalars” for atomic values, arrays for ordered sets, and hash tables for unordered sets. (Yes, there are others, but those are the popular ones.) I quickly discovered that these three types can be combined to produce most any data structure you might need. Need an ordered list of records? Use an array of hashes. Need a tree of named elements with attributes (e.g. XML)? Use nested arrays with hashes in them.

These basic types can also be conveniently mapped to external data. A CSV file can be represented as an array of hashes. A database table can be an array of arrays, an array of hashes, or a hash of hashes, whichever you prefer.

Python and Ruby both followed in Perl’s footsteps here. (Python calls them “lists” and “dictionaries.”) Lisp, predating all these new-fangled “scripting” languages, includes lists, arrays, hash tables, plus a whole raft of other built-in types. This is one of those areas that makes Common Lisp difficult for the beginner to grasp. When should you use a plist, an alist, or a hash table? When should you use arrays and when should you use lists? The answers to these questions delve into details of how the various structures are implemented. The only obvious criteria for choice is speed at handling a given data set, something a beginning programmer doesn’t want to worry about when designing a new piece of code.

At least one hacker has implemented generic get/set functions for all of Common Lisp’s data types, but to my knowledge no one has implemented abstract ordered/unordered set types that don’t care about their implementation. CL-Containers is a good foundation, but it further complicates the issue by adding a bunch of new data types.

What I want is a general-purpose “collection” class, of which instances can be declared ordered or unordered, numerically-indexed or key-indexed. Something like this:


(define-collection my-set
  :ordered t
  :index string)

Then, based on the data that I feed in to that class and the operations I perform on it, the compiler decides what sort of data structure to use for maximum efficiency. Or, if that’s too much magic to ask, at least let me change the underlying implementation without affecting any of the code that uses the collection:

(implement-collection my-set
  (array :resizable
         :elements (cons string object)))

4 Replies to “The Only Data Structures You’ll Ever Need”

  1. > Perl has three basic types: “scalars” for atomic values, arrays for ordered sets, and hash tables for unordered sets. (Yes, there are others, but those are the popular ones.) I quickly discovered that these three types can be combined to produce most any data structure you might need.

    If all you have is a hammer, everything looks like a nail.
    I agree that the data types you mention are very useful and convenient, but I think building all data structures only with them is not a good idea. For more complex requirements, it often results in unnecessarily complicated, nested data structures. If do you go that route, though, there’s no need for arrays: there are several languages doing fine with only hashes and scalars.

    Using generic data structures for everything also makes it harder to enforce particular constraints (e.g. on the type of data they should hold). Judicious use of specialized data structures (by creating new structs/classes) also makes code clearer to read and separates the interface from the underlying implementation. Incidentally, Perl also allows by using tied data structures (hashes/arrays), though that feature isn’t used very often (due to the way it is implemented).

  2. Mork said: I agree that the data types you mention are very useful and convenient, but I think building all data structures only with them is not a good idea.

    You’re right; it’s not necessarily a good idea to implement all your data structures with the same three broadly-defined types. What interests me about Perl is that you can represent almost any structure as a combination of those simple types. Tied data structures are a great example, and they make some things really easy.

Comments are closed.