Map/reduce queries

Map/reduce queries, also known as the query() API, are one of the most powerful features in PouchDB. However, they can be quite tricky to use, so this guide is designed to dispel some of the mysteries around them.

The first thing to understand is that you don't need map/reduce queries if you merely want to look up documents by _id or sort them by _id. The allDocs() API already does this, using an efficient built-in index (see "bulk operations" for details).

The second thing to know is that map/reduce is also unnecessary if you want to sort documents by their update time – this is exactly what the changes feed does! Again, this is a built-in index that you get for free.

Finally, it's important to understand that Mango queries are much easier to use than map/reduce queries, and they can usually satisfy 99% of use cases. The point of map/reduce is to provide an extremely advanced API for building secondary indexes, suitable for those with specific querying needs.
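
For comparison, here is roughly what a lookup by name might look like with Mango. This is a sketch that assumes the pouchdb-find plugin has been loaded, so that createIndex() and find() are available on the database; findByName is a hypothetical helper name, not part of PouchDB.

```javascript
// Hypothetical helper: look up documents by name with Mango instead of map/reduce.
// Assumes `db` is a PouchDB instance with the pouchdb-find plugin loaded.
function findByName(db, name) {
  // createIndex() is safe to call repeatedly; an existing identical index is reused
  return db.createIndex({
    index: { fields: ['name'] }
  }).then(function () {
    return db.find({ selector: { name: name } });
  }).then(function (result) {
    return result.docs; // find() resolves with an object containing a docs array
  });
}
```

Calling findByName(db, 'foo') would resolve with the matching documents themselves, with no map function involved.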

So now that you've read the fine print, let's talk about how map/reduce queries actually work!

The PouchDB query() API (which corresponds to the _view API in CouchDB) has two modes: temporary queries and persistent queries.

Temporary queries

Temporary queries are very slow, and we only recommend them for quick debugging during development. To use a temporary query, you simply pass in a map function:

db.query(function (doc, emit) {
  emit(doc.name);
}, {key: 'foo'}).then(function (result) {
  // found docs with name === 'foo'
}).catch(function (err) {
  // handle any errors
});

In the above example, the result object will contain stubs of documents where the name attribute is equal to 'foo': each row has an id, a key, and a value, but not the document itself. To include the full document in each row of results, use the include_docs option.

The emit pattern is part of the standard CouchDB map/reduce API. What the function basically says is, "for each document, emit doc.name as a key."
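
To make the emit pattern concrete, here is a simplified in-memory model of what a temporary query does. It is not how PouchDB is implemented; runTemporaryQuery is a hypothetical function, and keys are assumed to be plain strings.

```javascript
// Simplified model of a temporary query. This is NOT PouchDB's implementation.
function runTemporaryQuery(docs, mapFun, options) {
  var rows = [];
  docs.forEach(function (doc) {
    // every emit() call adds one row that points back at the current document
    function emit(key, value) {
      rows.push({ id: doc._id, key: key, value: value === undefined ? null : value });
    }
    mapFun(doc, emit);
  });
  // an index keeps its rows ordered by key, so sort before filtering
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  if (options && 'key' in options) {
    rows = rows.filter(function (row) { return row.key === options.key; });
  }
  return { rows: rows };
}

var docs = [
  { _id: 'doc1', name: 'foo' },
  { _id: 'doc2', name: 'bar' },
  { _id: 'doc3', name: 'foo' }
];
var result = runTemporaryQuery(docs, function (doc, emit) {
  emit(doc.name);
}, { key: 'foo' });
console.log(result.rows.map(function (row) { return row.id; })); // [ 'doc1', 'doc3' ]
```

Because every document has to be run through the map function on each call, this model also shows why temporary queries are slow.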

Persistent queries

Persistent queries are much faster, and are the intended way to use the query() API in your production apps. To use persistent queries, there are two steps.

First, you create a design document, which describes the map function you would like to use:

// document that tells PouchDB/CouchDB
// to build up an index on doc.name
var ddoc = {
  _id: '_design/my_index',
  views: {
    by_name: {
      map: function (doc) { emit(doc.name); }.toString()
    }
  }
};
// save it
db.put(ddoc).then(function () {
  // success!
}).catch(function (err) {
  // some error (maybe a 409, because it already exists?)
});

The .toString() at the end of the map function is necessary because JSON cannot contain functions, so the map function has to be stored in the design document as a string.

The emit function will be available in scope when the map function is run, so don't pass it in as a parameter.
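
If the design document may already exist, the 409 mentioned in the comment above can be handled by fetching the current revision and saving again. The following is a minimal sketch, assuming db is a PouchDB instance; saveDesignDoc is a hypothetical helper name.

```javascript
// Hypothetical helper: create the design document, or overwrite it if it already exists.
// Assumes `db` is a PouchDB instance.
function saveDesignDoc(db, ddoc) {
  return db.put(ddoc).catch(function (err) {
    if (err.status !== 409) {
      throw err; // not a conflict, so let the caller deal with it
    }
    // 409 means a document with this _id already exists:
    // look up its current _rev and save again with that revision
    return db.get(ddoc._id).then(function (existing) {
      return db.put(Object.assign({}, ddoc, { _rev: existing._rev }));
    });
  });
}
```

With this in place, saveDesignDoc(db, ddoc) can be called on every app start without special-casing the first run.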

Then you query it, using the name of the design document (without the '_design/' prefix) followed by the name of the view:

db.query('my_index/by_name').then(function (res) {
  // got the query results
}).catch(function (err) {
  // some error
});

Note that the first query will be quite slow, because the index isn't built until you query it. To get around this, you can run an empty query to kick off the build ahead of time:

db.query('my_index/by_name', {
  limit: 0 // don't return any results
}).then(function (res) {
  // index was built!
}).catch(function (err) {
  // some error
});

After this, your queries will be much faster.

CouchDB builds indexes in exactly the same way as PouchDB. So you may want to familiarize yourself with the "stale" option in order to get the best possible performance for your app.
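
As a sketch of how the "stale" option might be used: passing stale: 'update_after' returns whatever is in the index right now and then updates it afterwards, while stale: 'ok' returns the current index contents without triggering an update. The helper name queryWithoutWaiting is hypothetical, and db is assumed to be a PouchDB instance with the view already saved.

```javascript
// Hypothetical helper: answer from the index as it currently stands,
// then let the index catch up afterwards.
// Assumes `db` is a PouchDB instance and the view already exists.
function queryWithoutWaiting(db, viewName, options) {
  // copy the caller's options so they are not modified
  var opts = Object.assign({}, options, { stale: 'update_after' });
  return db.query(viewName, opts);
}
```

The trade-off is that results may not include the most recent writes, so this fits screens where slightly out-of-date data is acceptable.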

That was a fairly whirlwind tour of the query() API, so let's get into more detail about how to write your map/reduce functions.

Indexes in SQL databases

Quick refresher on how indexes work: in relational databases like MySQL and PostgreSQL, you can usually query whatever field you want:

SELECT * FROM pokemon WHERE name = 'Pikachu';

But if you don't want your performance to be terrible, you first add an index:

CREATE INDEX myIndex ON pokemon (name);

The job of the index is to ensure the field is stored in a B-tree within the database, so your queries run in O(log(n)) time instead of O(n) time.
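
The difference between O(n) and O(log(n)) can be illustrated with a small sketch. Here a sorted array with binary search stands in for the B-tree; a real B-tree is organized differently, but the lookup cost grows in a similar way. The function names are made up for this example.

```javascript
// Without an index: check every entry until the target is found.
function linearScan(names, target) {
  var steps = 0;
  for (var i = 0; i < names.length; i++) {
    steps++;
    if (names[i] === target) {
      return { index: i, steps: steps };
    }
  }
  return { index: -1, steps: steps };
}

// With a sorted index: halve the search range on every step.
function binarySearch(sortedNames, target) {
  var lo = 0;
  var hi = sortedNames.length - 1;
  var steps = 0;
  while (lo <= hi) {
    var mid = (lo + hi) >> 1;
    steps++;
    if (sortedNames[mid] === target) {
      return { index: mid, steps: steps };
    }
    if (sortedNames[mid] < target) {
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  return { index: -1, steps: steps };
}

// 1024 zero-padded names, so string order matches numeric order
var names = [];
for (var i = 0; i < 1024; i++) {
  names.push('name' + String(i).padStart(4, '0'));
}
var target = names[names.length - 1];
console.log(linearScan(names, target).steps);   // 1024
console.log(binarySearch(names, target).steps); // 11
```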

Indexes in NoSQL databases

All of the above is also true in document stores like CouchDB and MongoDB, but conceptually it's a little different. By default, documents are assumed to be schemaless blobs with one primary key (called _id in both Mongo and Couch), and any other keys need to be specified separately. The concepts are largely the same; it's mostly just the vocabulary that's different.

In CouchDB, queries are called map/reduce functions. This is because, like most NoSQL databases, CouchDB is designed to scale well across multiple computers, and to perform efficient query operations in parallel. Basically, the idea is that you divide your query into a map function and a reduce function, each of which may be executed in parallel in a multi-node cluster.
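
The split into map and reduce can be sketched with plain arrays standing in for the nodes of a cluster. This is a conceptual model only, not how CouchDB or PouchDB distribute work, and the names mapShard and sumValues are hypothetical.

```javascript
// Run the map function over one shard's documents and collect the emitted rows.
function mapShard(docs, mapFun) {
  var rows = [];
  docs.forEach(function (doc) {
    mapFun(doc, function emit(key, value) {
      rows.push({ key: key, value: value });
    });
  });
  return rows;
}

// A reduce function that can also combine its own earlier outputs.
function sumValues(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

// Two shards, as if the documents lived on two different machines
var shards = [
  [{ _id: 'pikachu', type: 'pokemon' }, { _id: 'bulbasaur', type: 'pokemon' }],
  [{ _id: 'squirtle', type: 'pokemon' }]
];

// Each shard can be mapped and reduced independently of the others
var partials = shards.map(function (docs) {
  var rows = mapShard(docs, function (doc, emit) { emit(doc.type, 1); });
  return sumValues(rows.map(function (row) { return row.value; }));
});

// A final pass combines the per-shard results
var total = sumValues(partials);
console.log(partials, total); // [ 2, 1 ] 3
```

The important property is that the reduce step gives the same answer whether it sees all the values at once or the partial results from each shard.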

Map functions

It may sound daunting at first, but in the simplest (and most common) case, you only need the map function. A basic map function might look like this:

function myMapFunction(doc) {
  emit(doc.name);
}

This is functionally equivalent to the SQL index given above. What it essentially says is: "for each document in the database, emit its name as a key."

And since it's just JavaScript, you're allowed to get as fancy as you want here:

function myMapFunction(doc) {
  if (doc.type === 'pokemon') {
    if (doc.name === 'Pikachu') {
      emit('Pika pi!');
    } else {
      emit(doc.name);
    }
  }
}

Then you can query it:

// find pokemon with name === 'Pika pi!'
db.query(myMapFunction, {
  key          : 'Pika pi!',
  include_docs : true
}).then(function (result) {
  // handle result
}).catch(function (err) {
  // handle errors
});

// find the first 5 pokemon whose name starts with 'P'
db.query(myMapFunction, {
  startkey     : 'P',
  endkey       : 'P\ufff0',
  limit        : 5,
  include_docs : true
}).then(function (result) {
  // handle result
}).catch(function (err) {
  // handle errors
});

The pagination options for query() (startkey, endkey, key, keys, skip, limit, and descending) are exactly the same as with allDocs(). For a guide to pagination, read the Bulk operations guide or Pagination strategies with PouchDB.
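
To see why the 'P' to 'P\ufff0' range finds names starting with 'P', here is a simplified model of how the range options select rows from an index that is already sorted by key. It is not PouchDB's implementation: selectRows is a hypothetical function, only some of the options are modeled, and keys are compared with plain JavaScript string comparison rather than CouchDB's collation rules.

```javascript
// Simplified model of key, startkey, endkey, skip and limit over sorted rows.
function selectRows(sortedRows, options) {
  var rows = sortedRows.filter(function (row) {
    if ('key' in options) {
      return row.key === options.key;
    }
    if ('startkey' in options && row.key < options.startkey) {
      return false;
    }
    if ('endkey' in options && row.key > options.endkey) {
      return false;
    }
    return true;
  });
  var skip = options.skip || 0;
  var end = 'limit' in options ? skip + options.limit : undefined;
  return rows.slice(skip, end);
}

var sortedRows = [
  { id: 'doc1', key: 'Bulbasaur' },
  { id: 'doc2', key: 'Pidgey' },
  { id: 'doc3', key: 'Pikachu' },
  { id: 'doc4', key: 'Psyduck' },
  { id: 'doc5', key: 'Squirtle' }
];

// '\ufff0' sorts after ordinary characters, so the range covers every key starting with 'P'
var firstTwo = selectRows(sortedRows, { startkey: 'P', endkey: 'P\ufff0', limit: 2 });
console.log(firstTwo.map(function (row) { return row.key; })); // [ 'Pidgey', 'Pikachu' ]
```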

Reduce functions

As for reduce functions, there are a few handy built-ins that do aggregate operations ('_sum', '_count', and '_stats'), and you can typically steer clear of trying to write your own:

// emit the first letter of each pokemon's name
var myMapReduceFun = {
  map: function (doc) {
    // skip documents that have no name
    if (doc.name) {
      emit(doc.name.charAt(0));
    }
  },
  reduce: '_count'
};
// count the pokemon whose names start with 'P'
db.query(myMapReduceFun, {
  key: 'P', reduce: true, group: true
}).then(function (result) {
  // handle result
}).catch(function (err) {
  // handle errors
});
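
To give a feel for what the built-ins compute, here are simplified stand-ins for '_count', '_sum', and '_stats', together with a grouping step that mimics group: true. These are illustrative models and not the actual built-in implementations; all function names and the numeric values are made up for this example.

```javascript
// Simplified stand-ins for the built-in reduce functions.
function builtinCount(values) {
  return values.length;
}

function builtinSum(values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

function builtinStats(values) {
  return {
    sum: builtinSum(values),
    count: values.length,
    min: Math.min.apply(null, values),
    max: Math.max.apply(null, values),
    sumsqr: builtinSum(values.map(function (v) { return v * v; }))
  };
}

// Mimic group: true by reducing the values of each distinct key separately.
function groupReduce(rows, reduceFun) {
  var groups = {};
  rows.forEach(function (row) {
    (groups[row.key] = groups[row.key] || []).push(row.value);
  });
  return Object.keys(groups).sort().map(function (key) {
    return { key: key, value: reduceFun(groups[key]) };
  });
}

// Rows as a map function might emit them: first letter as key, a number as value
var rows = [
  { key: 'P', value: 6 },
  { key: 'P', value: 2 },
  { key: 'B', value: 7 }
];
console.log(groupReduce(rows, builtinCount));
// [ { key: 'B', value: 1 }, { key: 'P', value: 2 } ]
console.log(builtinStats([6, 2]));
// { sum: 8, count: 2, min: 2, max: 6, sumsqr: 40 }
```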

If you're adventurous, though, you should check out the CouchDB documentation or the PouchDB documentation for details on reduce functions.

The map/reduce API is complex, and it can be computationally expensive because it requires building up an entirely new index. Therefore, it's good to know some tricks for avoiding the map/reduce API when you don't need it:

  1. If you can use allDocs() or changes() instead of the query() API, do it!
  2. If your query is simple enough that you can use find(), use that instead.
  3. Read the 12 tips for better code with PouchDB, especially the tip to "use and abuse your doc _ids."
  4. If your data is highly relational, try the relational-pouch plugin, which follows this advice, and only uses _id and allDocs() under the hood.
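
As an illustration of "use and abuse your doc _ids", the following sketch fetches every document whose _id starts with a given prefix using only allDocs(). It assumes db is a PouchDB instance and that documents were saved with _ids such as 'pokemon_pikachu'; both the _id scheme and the helper name allDocsWithPrefix are hypothetical.

```javascript
// Hypothetical helper: fetch every document whose _id starts with a prefix.
// Assumes `db` is a PouchDB instance and a prefix-based _id scheme.
function allDocsWithPrefix(db, prefix) {
  return db.allDocs({
    startkey: prefix,
    endkey: prefix + '\ufff0', // covers every _id that begins with the prefix
    include_docs: true
  }).then(function (result) {
    return result.rows.map(function (row) { return row.doc; });
  });
}
```

Calling allDocsWithPrefix(db, 'pokemon_') would then resolve with all of the pokemon documents, using the built-in _id index rather than a secondary one.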

Now that we've learned how to map reduce, map reuse, and map recycle, let's move on to destroy() and compact().