All Your Base

Posted on March 21, 2008 by Stuart Sierra

Let’s have the database models strut down the runway:

Relational (SQL):

Data consist of rows in tables.
Each row has multiple columns.
Each column has a fixed type.
Queries use filters and joins.
Fixed schema is defined separately from the data.
User-defined indexes improve query performance.
Robust transaction/data-integrity support.

Graph (RDF):

Data consist of nodes and labeled links between them.
Nodes may have type information.
Optional schema is defined in the same model (e.g. RDFS, OWL).
Queries use graph-based constraints and filters.
Indexes are not optimized for specific datasets.
Limited transaction support.

Tree (XML, filesystem):

Data are identified by a name and a position within a hierarchy.
Queries use hierarchies and filters.
Data may have type information.
Optional schema may be defined in the same model (XML-Schema) or outside it (DTD).
Transaction support varies by implementation.

Hash (BDB, memcached):

Data are untyped binary blobs.
Data are identified by unique keys in a flat namespace.
Only key-based lookups are possible.
Limited transaction support.

Inverted Index (Lucene/Solr):

Data are text documents.
Queries are free-form text with some special operators.
Text queries are fast.
Query results can be scored/ranked.
Joins are inefficient; denormalized data are preferred.
Limited transaction support.

Document-Oriented (CouchDB):

Data are identified by unique keys in a flat namespace.
Data have named fields.
Optional schema defined by application code.
Denormalized data are preferred.
Limited type information.

Column-Oriented (Hbase, BigTable):

Data consist of rows in tables.
Each row has multiple columns, rows are assumed to be sparse.
Data from a single column are stored contiguously to improve performance.
Designed for distributed processing environments such as Hadoop and MapReduce.