Apache CouchDB: Technical Overview

Sam Rose's picture

The CouchDB file layout and commitment system features all Atomic Consistent Isolated Durable (ACID) properties. On disk, CouchDB never overwrites committed data or associated structures, ensuring the database file is always in a consistent state. This is a “crash-only” design: the CouchDB server does not go through a shutdown process; it is simply terminated.

Document updates (add, edit, delete) are serialized, except for binary blobs which are written concurrently. Database readers are never locked out and never have to wait on writers or other readers. Any number of clients can be reading documents without being locked out or interrupted by concurrent updates, even on the same document. CouchDB read operations use a Multi-Version Concurrency Control (MVCC) model where each client sees a consistent snapshot of the database from the beginning to the end of the read operation.
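The reader/writer behaviour described above can be illustrated with a toy model. This is only a sketch, not CouchDB's implementation: the SnapshotStore class and its method names are made up for illustration, and it copies a whole dictionary on each update where CouchDB appends to a file. What it shows is that writers are serialized by a lock while readers take no lock and keep seeing the version they started with.

```python
import threading
from types import MappingProxyType


class SnapshotStore:
    """Toy MVCC store (hypothetical): writers publish new immutable
    versions; readers keep whichever version they started with."""

    def __init__(self):
        self._write_lock = threading.Lock()    # serializes updates only
        self._current = MappingProxyType({})   # latest committed version

    def snapshot(self):
        # Readers take no lock: they just grab a reference to the
        # current committed version, which is never mutated afterwards.
        return self._current

    def update(self, doc_id, body):
        with self._write_lock:
            new_version = dict(self._current)  # copy, never overwrite
            new_version[doc_id] = body
            # Publishing is a single reference swap.
            self._current = MappingProxyType(new_version)


store = SnapshotStore()
store.update("doc1", {"title": "first"})

view = store.snapshot()                    # a reader starts here
store.update("doc1", {"title": "second"})  # a concurrent update lands

print(view["doc1"]["title"])               # first
print(store.snapshot()["doc1"]["title"])   # second
```

The reader holding `view` is never blocked and never sees a half-applied update; a new reader simply starts from the newer version.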

Documents are indexed in b-trees by their name (DocID) and a Sequence ID. Each update to a database instance generates a new sequential number. Sequence IDs are used later for incrementally finding changes in a database. These b-tree indexes are updated simultaneously when documents are saved or deleted. The index updates always occur at the end of the file (append-only updates).
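A rough sketch of the two indexes follows, with plain dictionaries standing in for the b-trees. The ToyDatabase class and its method names are hypothetical, and keeping only the latest sequence entry per document is an assumption of this sketch. It shows how a sequence index makes "what changed since N?" a cheap query.

```python
class ToyDatabase:
    """Hypothetical in-memory stand-in for the by-ID and by-sequence indexes."""

    def __init__(self):
        self.update_seq = 0
        self.by_id = {}    # DocID -> (sequence number, document body)
        self.by_seq = {}   # sequence number -> DocID

    def save(self, doc_id, body):
        # Every update gets the next sequential number.
        self.update_seq += 1
        previous = self.by_id.get(doc_id)
        if previous is not None:
            # Assumption: only the latest sequence entry per document is kept.
            del self.by_seq[previous[0]]
        # Both indexes are updated together.
        self.by_id[doc_id] = (self.update_seq, body)
        self.by_seq[self.update_seq] = doc_id
        return self.update_seq

    def changes_since(self, since=0):
        # Incremental change discovery: walk the sequence index past `since`.
        return [(seq, self.by_seq[seq]) for seq in sorted(self.by_seq) if seq > since]


db = ToyDatabase()
db.save("a", {"v": 1})   # sequence 1
db.save("b", {"v": 1})   # sequence 2
db.save("a", {"v": 2})   # sequence 3, replaces the entry at sequence 1

print(db.changes_since(0))  # [(2, 'b'), (3, 'a')]
print(db.changes_since(2))  # [(3, 'a')]
```

A consumer that remembers the last sequence number it processed can ask only for what came after it, which is the incremental lookup the paragraph above describes.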

Documents have the advantage of data being already conveniently packaged for storage rather than split out across numerous tables and rows as in most database systems. When documents are committed to disk, the document fields and metadata are packed into buffers, sequentially one document after another (helpful later for efficient building of views).

When CouchDB documents are updated, all data and associated indexes are flushed to disk and the transactional commit always leaves the database in a completely consistent state. Commits occur in two steps:

1. All document data and associated index updates are synchronously flushed to disk.
2. The updated database header is written in two consecutive, identical chunks to make up the first 4k of the file, and then synchronously flushed to disk.
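The two commit steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not CouchDB's actual file format: the header here holds only the offset of the last committed byte, the CRC32 check used to tell a good header copy from a torn one is an assumption of the sketch, and all function names are hypothetical.

```python
import os
import struct
import tempfile
import zlib

HEADER_COPY = 2048               # two identical copies fill the first 4k
HEADER_AREA = 2 * HEADER_COPY


def _pack_header(data_end):
    # Hypothetical header layout: CRC32 + offset of the last committed byte.
    payload = struct.pack(">Q", data_end)
    crc = struct.pack(">I", zlib.crc32(payload))
    return (crc + payload).ljust(HEADER_COPY, b"\0")


def read_header(f):
    # Use the first header copy whose checksum matches.
    f.seek(0)
    area = f.read(HEADER_AREA)
    for offset in (0, HEADER_COPY):
        chunk = area[offset:offset + HEADER_COPY]
        crc, payload = chunk[:4], chunk[4:12]
        if len(payload) == 8 and struct.pack(">I", zlib.crc32(payload)) == crc:
            return struct.unpack(">Q", payload)[0]
    raise ValueError("no valid header copy found")


def create(path):
    with open(path, "wb") as f:
        f.write(_pack_header(HEADER_AREA) * 2)
        f.flush()
        os.fsync(f.fileno())


def commit(path, record):
    with open(path, "r+b") as f:
        end = read_header(f)
        # Step 1: append the data after the committed region and flush it.
        f.truncate(end)              # drop any uncommitted tail from a crash
        f.seek(end)
        f.write(record)
        f.flush()
        os.fsync(f.fileno())
        # Step 2: write the header as two identical copies and flush again.
        f.seek(0)
        f.write(_pack_header(end + len(record)) * 2)
        f.flush()
        os.fsync(f.fileno())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.couch")
    create(path)
    commit(path, b"doc-one;")
    commit(path, b"doc-two;")
    with open(path, "r+b") as f:
        f.write(b"\xff" * 16)        # simulate a torn write of the first copy
        committed_end = read_header(f)
        f.seek(HEADER_AREA)
        committed = f.read(committed_end - HEADER_AREA)

print(committed)  # b'doc-one;doc-two;'
```

If the process dies during step 1, the header still points at the old end of data and the partial write is ignored. If it dies during step 2, at least one of the two header copies should still be intact, which is the point of writing the header twice.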

Comments

"Document updates (add, edit, delete) are serialized, except for binary blobs which are written concurrently. Database readers are never locked out and never have to wait on writers or other readers. Any number of clients can be reading documents without being locked out or interrupted by concurrent updates, even on the same document. CouchDB read operations use a Multi-Version Concurrency Control (MVCC) model where each client sees a consistent snapshot of the database from the beginning to the end of the read operation.

Documents are indexed in b-trees by their name (DocID) and a Sequence ID. Each update to a database instance generates a new sequential number. Sequence IDs are used later for incrementally finding changes in a database. These b-tree indexes are updated simultaneously when documents are saved or deleted. The index updates always occur at the end of the file (append-only updates)."

Sam writes: this potentially gives us a consistent, distributed source of both unique IDs and sequential changes. The unique IDs could line up with our own simple Knowledge Commons object ID server/service.

This part is also especially important:

"Documents have the advantage of data being already conveniently packaged for storage rather than split out across numerous tables and rows as in most database systems. When documents are committed to disk, the document fields and metadata are packed into buffers, sequentially one document after another (helpful later for efficient building of views)."

This seems to suggest that CouchDB would work best as a convenient way to package data for final delivery in a form that fits the "Document" model. So, a document object could be built from an ID, content, and multiple metadata sources, and stored in CouchDB as a CouchDB "Document" object. This could be built programmatically, depending on the needs of archive users.
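As a rough illustration of that idea, a document could be assembled along these lines. Only the "_id" field is CouchDB's own convention for the document ID; the build_document function, the other field names, the sample ID, and the metadata source names are all invented for this sketch.

```python
import json


def build_document(object_id, content, metadata_sources):
    """Hypothetical helper: package an ID, content, and several
    metadata sources into one JSON-ready document."""
    doc = {"_id": object_id, "content": content, "metadata": {}}
    for source_name, fields in metadata_sources.items():
        # Keep each source under its own key so its origin stays visible.
        doc["metadata"][source_name] = dict(fields)
    return doc


doc = build_document(
    "kc-0001",                                   # made-up object ID
    "Body text of the archived object",
    {
        "descriptive": {"title": "Example object"},
        "flows": {"collected_by": "harvester"},  # made-up source name
    },
)
print(json.dumps(doc, indent=2, sort_keys=True))
```

The resulting JSON is the kind of self-contained package that could then be stored as a single CouchDB document.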

To me, this makes CouchDB a more likely candidate to be a utility of convenience: a storage system that can work with the spec that we created.

So, I can imagine a distributed data structure where the unique ID for an object lives in a network-connected database and metadata about the object lives in multiple databases, with FLOWS component applications that do the job of programmatically collecting and writing metadata. And, when needed, special combinations of these can be stored for convenience in CouchDB. It could turn out that all of these needs could be met by multiple Couch databases (which would be awesome, as Couch has provision for distributing databases). This could negate the need for using Postgres/MySQL and other SQL-based databases for Knowledge Commons and FLOWS database network storage (although applications like iRODS, Drupal, etc. would still need their local SQL database to operate, but only for local functionality).
