All Your Base

Let’s have the database models strut down the runway:

Relational (SQL):

  • Data consist of rows in tables.
  • Each row has multiple columns.
  • Each column has a fixed type.
  • Queries use filters and joins.
  • Fixed schema is defined separately from the data.
  • User-defined indexes improve query performance.
  • Robust transaction/data-integrity support.

Graph (RDF):

  • Data consist of nodes and labeled links between them.
  • Nodes may have type information.
  • Optional schema is defined in the same model (e.g. RDFS, OWL).
  • Queries use graph-based constraints and filters.
  • Indexes are not optimized for specific datasets.
  • Limited transaction support.

Tree (XML, filesystem):

  • Data are identified by a name and a position within a hierarchy.
  • Queries use hierarchies and filters.
  • Data may have type information.
  • Optional schema may be defined in the same model (XML-Schema) or outside it (DTD).
  • Transaction support varies by implementation.

Hash (BDB, memcached):

  • Data are untyped binary blobs.
  • Data are identified by unique keys in a flat namespace.
  • Only key-based lookups are possible.
  • Limited transaction support.

Inverted Index (Lucene/Solr):

  • Data are text documents.
  • Queries are free-form text with some special operators.
  • Text queries are fast.
  • Query results can be scored/ranked.
  • Joins are inefficient; denormalized data are preferred.
  • Limited transaction support.

Document-Oriented (CouchDB):

  • Data are identified by unique keys in a flat namespace.
  • Data have named fields.
  • Optional schema defined by application code.
  • Denormalized data are preferred.
  • Limited type information.

Column-Oriented (Hbase, BigTable):

  • Data consist of rows in tables.
  • Each row has multiple columns, rows are assumed to be sparse.
  • Data from a single column are stored contiguously to improve performance.
  • Designed for distributed processing environments such as Hadoop and MapReduce.