<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[CodeHard]]></title><description><![CDATA[A blog about coding hard. Or hardly coding.]]></description><link>http://CodeHardBlog.azurewebsites.net/</link><generator>Ghost v0.4.0</generator><lastBuildDate>Wed, 08 Apr 2026 11:13:58 GMT</lastBuildDate><atom:link href="http://CodeHardBlog.azurewebsites.net/rss/" rel="self" type="application/rss+xml"/><author><![CDATA[David Ostrovsky]]></author><ttl>60</ttl><item><title><![CDATA[Setting up Couchbase on Azure for people in a hurry]]></title><description><![CDATA[<p>We're collaborating with the open source guys over at Microsoft on some content and events. One of the questions that came up was: what's the easiest way to set up Couchbase on Azure? Now, that sounds rather trivial - just look at the <a href='https://developer.couchbase.com/documentation/server/4.6/install/install-intro.html' >Couchbase docs</a> and follow the instructions. But combine that with Azure and suddenly there's a dozen ways to go about the whole thing.</p>

<p>So here are my top 3 ways of setting up Couchbase on Microsoft Azure:</p>

<ul>
<li>Azure Marketplace - use the prepared image</li>
<li>Docker with Azure Container Services - just <code>docker run</code></li>
<li>Just VM it - spin up your own VMs and install Couchbase</li>
</ul>

<p>Let's go with the first choice, because it's by far the easiest.</p>

<p>In the Azure Portal, click the friendly + icon and search for Couchbase. You'll be presented with 3 choices; pick one of the Couchbase Enterprise versions. Note that picking Silver Support will result in additional charges for the Couchbase license, while Bring Your Own License (BYOL) means exactly what it says on the tin. If you're just going to test out Couchbase, don't worry about it and go with BYOL. </p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2017/May/marketplace1.png'  alt="" /></p>

<p>Set up your admin credentials, resource group and location. Note that the credentials are used both for the VMs themselves and for Couchbase Server, so if you intend to use them for anything other than testing, you really should change at least the Couchbase credentials so they're not the same as the machines'. If you're just playing around with Couchbase, I'd recommend creating a new resource group, so it's easier to clean up afterwards. </p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2017/May/marketplace2.png'  alt="" /></p>

<p>Pick the type and number of nodes. If you want to test features like replication, high availability, failover, etc., you should go with at least 3 nodes. If you want to get an idea about performance, pick a node that supports SSD drives and at least 4 CPU cores. <br />
<img src='http://codehardblog.azurewebsites.net/content/images/2017/May/marketplace3.png'  alt="" /></p>

<p>Click through the rest of the prompts, agree to sell your soul and first born child to Microsoft and Couchbase respectively as part of the EULA, and just wait for the VMs to come up. </p>

<p>Eventually, the deployment will finish and you'll be able to connect to your shiny new Couchbase cluster.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2017/May/marketplace4.png'  alt="" /></p>

<p>Click the deployment notification, or the resource group from your dashboard to see all the resources within that group. Pick any of the VMs - they're all part of the same Couchbase cluster, so you can use any one of them to manage the entire cluster - click the VM and find the public IP address in the VM overview.</p>

<p>Browse to <code>http://&lt;the public IP address&gt;:8091</code> to access the Couchbase UI and use the credentials you set up earlier to log in. The newly created cluster has no buckets defined, so let's create some. Click the Settings tab, then click the Sample Buckets sub-tab, check all 3 of the sample buckets in the list and finally click the Create button on the bottom. Wait a minute for the sample data to be loaded.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2017/May/marketplace5.png'  alt="" /></p>

<p>Congratulations, your Couchbase Cluster is now fully operational. You can head over to the Query tab to try out some <a href='https://developer.couchbase.com/documentation/server/4.6/getting-started/try-a-query.html' >N1QL queries</a> or something. </p>

<p>Remember that you can clean everything up by deleting the resource group you created in Azure.</p>]]></description><link>http://CodeHardBlog.azurewebsites.net/setting-up-couchbase-on-azure-for-people-in-a-hurry/</link><guid isPermaLink="false">6ee1be39-fd1e-41f5-95f6-39ac9e000bec</guid><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[NoSQL]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Microsoft Azure]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Tue, 16 May 2017 19:37:17 GMT</pubDate></item><item><title><![CDATA[Couchbase transport plugin for Elasticsearch 5.0-alpha2]]></title><description><![CDATA[<p><strong>TL;DR: Get the preview release of the plugin for ES 5.0-alpha2 <a href='https://github.com/couchbaselabs/elasticsearch-transport-couchbase/releases/tag/2.5.0.0-alpha2' >here</a>.</strong></p>

<p>While I was working on updating the Couchbase transport plugin for Elasticsearch 2.x, Elastic went and released an alpha version of 5.0. There are a lot of changes in various APIs, breaks from backwards compatibility, etc., but very little of it actually affected the transport plugin. The main things that required changes were: </p>

<ul>
<li>Plugins now always have their own class loader; there is no longer an isolated=false option.</li>
<li>Because of the previous point and the overhauled security mechanism from ES 2.2, the plugin needed to request some additional runtime permissions during installation.</li>
<li>The settings mechanism got revamped and now requires everything (ES and plugins) to declare settings programmatically in advance. Elasticsearch validates all the settings in the elasticsearch.yml file and refuses to start if it finds any settings that haven't been declared by some component. So you can't just add whatever you like to elasticsearch.yml anymore.</li>
</ul>

<p>While I was at it, I refactored some ugly settings related code in the plugin and fixed some minor coding issues. Finally, because we tried to keep the plugin versions in tune with Elasticsearch, while also having our own minor versions, the versioning scheme for the plugin got way too silly. So starting with the next plugin release after this, we're going to follow Elastic's lead and align the plugin versions to Elasticsearch, while reserving the build ({major.minor.revision.<strong>build</strong>}) field for ourselves.</p>

<p>In the meantime, you can get the plugin <a href='https://github.com/couchbaselabs/elasticsearch-transport-couchbase/releases/tag/2.5.0.0-alpha2' >here</a>. Because the plugin utility got renamed in ES 5.0, use the following command from the Elasticsearch root directory to install:</p>

<pre><code>bin/elasticsearch-plugin install -b https://github.com/couchbaselabs/elasticsearch-transport-couchbase/releases/download/2.5.0.0-alpha2/elasticsearch-transport-couchbase-2.5.0.0-alpha2.zip
</code></pre>]]></description><link>http://CodeHardBlog.azurewebsites.net/couchbase-transport-plugin-for-elasticsearch-5-0-alpha2/</link><guid isPermaLink="false">312b5a56-3c9e-47c1-b357-4f11227fe0bd</guid><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[Elasticsearch]]></category><category><![CDATA[Elastic]]></category><category><![CDATA[Transport]]></category><category><![CDATA[Plugin]]></category><category><![CDATA[Real-time Analyics]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Fri, 27 May 2016 08:19:11 GMT</pubDate></item><item><title><![CDATA[Couchbase Transport Plugin for Elasticsearch 2.x]]></title><description><![CDATA[<p><strong>TL;DR: Get it here: <a href='https://github.com/couchbaselabs/elasticsearch-transport-couchbase/releases' >https://github.com/couchbaselabs/elasticsearch-transport-couchbase/releases</a></strong></p>

<p>While the folks at Elasticsearch are busy working on version 5.0, I've now released updated versions of the Couchbase transport plugin for versions 2.x. Elasticsearch 2.0 brought a lot of changes to the security model, then more changes in 2.1, and then finally some major improvements in plugin security in 2.2. All that added up to quite a delay in pushing out the proper versions, especially since Elasticsearch plugins are version-specific now. This means that for every minor version of ES, I need to build a minor version of the plugin.</p>

<p>We've updated the way the plugin versioning works, so it now tracks the ES minor releases. Plugin versions are now in the format "<code>2.&lt;es_version&gt;</code>" to make it easier to figure out which one you should install. </p>

<p>If you're running <strong>ES 2.2 and higher</strong>, installing the plugin is as simple as it was before 2.0 - all you need to do is run:</p>

<pre><code>bin/plugin install -b https://github.com/couchbaselabs/elasticsearch-transport-couchbase/releases/download/2.&lt;ES_VERSION&gt;/elasticsearch-transport-couchbase-2.&lt;ES_VERSION&gt;.zip
</code></pre>

<p>Where <code>&lt;ES_VERSION&gt;</code> is, obviously, the relevant Elasticsearch version. The -b flag tells the plugin installer to automatically accept all prompts. If you leave it out, the installer will ask you to grant the plugin extra security permissions, which you should say YES to.</p>

<p>If you're running ES 2.0 or 2.1, first of all you should consider upgrading. But if you're determined to install the plugin on it, you'll need to do a little extra work. In order to get the plugin to load, it is necessary to edit the system's default <code>java.policy</code> file, which is located in the <code>%JAVA_HOME%/jre/lib/security</code> directory. You can either edit this file directly, or use the <code>policytool</code> utility, which can be found in the <code>%JAVA_HOME%/bin</code> directory. Note that editing the policy file requires root permissions. <br />
If you're editing the policy file directly, add the following to the end of the file:</p>

<pre><code>grant codeBase "file:/&lt;path to transport-couchbase plugin install directory&gt;/*" {
  permission javax.security.auth.AuthPermission "modifyPrincipals";
  permission javax.security.auth.AuthPermission "modifyPrivateCredentials";
  permission javax.security.auth.AuthPermission "setReadOnly";
  permission java.lang.RuntimePermission "setContextClassLoader";
  permission java.net.SocketPermission "*", "listen,resolve";
  permission java.lang.reflect.ReflectPermission "suppressAccessChecks";
};
</code></pre>

<p>Replace <code>file:/&lt;path&gt;/*</code> with the directory where you installed the plugin. For example, if you installed Elasticsearch as a deb/rpm package on Linux, this would be <code>file:/usr/share/elasticsearch/plugins/transport-couchbase/*</code> - note the trailing <code>*</code>, which means that the policy will apply to all files in that directory. </p>

<p>If you prefer to use the GUI applet, run <code>policytool</code> and specify the location of the policy file with the <code>-file</code> parameter, for example:</p>

<pre><code>sudo $JAVA_HOME/bin/policytool -file $JAVA_HOME/jre/lib/security/java.policy 
</code></pre>

<p>When the applet window appears, verify that the Policy File textbox shows the name of the policy file you're going to edit. To add the required security settings, click the "Add Policy Entry" button. In the new "Policy Entry" window that appears, fill in the CodeBase textbox with the URL path to the plugin's installation directory, as explained above. <br />
Next, add each of the following permissions by clicking on the "Add Permission" button for each, then selecting the specified options:</p>

<pre><code>1.)  permission javax.security.auth.AuthPermission "modifyPrincipals";

     From the "Permission:" dropdown select:

        AuthPermission

     From the "Target Name:" dropdown select:

        modifyPrincipals

     Click "Ok"

2.)  permission javax.security.auth.AuthPermission "modifyPrivateCredentials";

     From the "Permission:" dropdown select:

        AuthPermission

     From the "Target Name:" dropdown select:

        modifyPrivateCredentials

     Click "Ok"

3.)  permission javax.security.auth.AuthPermission "setReadOnly";

     From the "Permission:" dropdown select:

        AuthPermission

     From the "Target Name:" dropdown select:

        setReadOnly

     Click "Ok"

4.)  permission java.lang.RuntimePermission "setContextClassLoader";

     From the "Permission:" dropdown select:

        RuntimePermission

     From the "Target Name:" dropdown select:

        setContextClassLoader

     Click "Ok"

5.)  permission java.net.SocketPermission "*", "listen,resolve";

     From the "Permission:" dropdown select:

        SocketPermission

     In the "Target Name:" textbox type:

        *

     In the "Actions:" textbox type:

        listen,resolve

     Click "Ok"

6.)  permission java.lang.reflect.ReflectPermission "suppressAccessChecks";

     From the "Permission:" dropdown select:

        ReflectPermission

     From the "Target Name:" dropdown select:

        suppressAccessChecks

     Click "Ok"
</code></pre>

<p>At the end of the process, your policy tool should look similar to this:</p>

<p><img src='https://raw.githubusercontent.com/couchbaselabs/elasticsearch-transport-couchbase/master/imgs/policytool-result.png'  alt="Policy Tool" /></p>

<p>Click "Done" to close the "Policy Entry" window and then select "Save" from the File menu of the "Policy Tool" window. Close the <code>policytool</code> utility when you're finished.</p>

<p>That's it. If you find any bugs, please file them in the project's <a href='https://github.com/couchbaselabs/elasticsearch-transport-couchbase/issues' >issue tracker on GitHub</a>.</p>]]></description><link>http://CodeHardBlog.azurewebsites.net/couchbase-transport-plugin-for-elasticsearch-2-x/</link><guid isPermaLink="false">b659aff7-051b-43d6-a776-6e0d832b1f52</guid><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Tue, 12 Apr 2016 21:44:02 GMT</pubDate></item><item><title><![CDATA[Workaround for N1QL * queries in the .NET SDK]]></title><description><![CDATA[<p><strong>TL;DR:</strong> <code>SELECT myBucket.* FROM myBucket</code></p>

<p>A change in the way N1QL returns results for star queries in Couchbase Server 4.1 seems to have broken how the .NET SDK deserializes the returned results. It causes the <code>Query&lt;&gt;</code> method to return the correct number of results, but with all properties at their default value. The reason is that a query like <code>SELECT * FROM default</code> now produces the following JSON result:</p>

<pre><code>[
  {
    "default": {
      "prop1": "...",
      "prop2": "..."
    }
  },
  {
    "default": {
      "prop1": "...",
      "prop2": "..."
    }
  }
]
</code></pre>

<p>As you can see, each document is returned as a property under the name of the bucket, whereas the .NET SDK implementation expects the results like they used to be in the earlier versions of N1QL, as an array of JSON document bodies:</p>

<pre><code>[
  {
    "prop1": "...",
    "prop2": "...",
  },
  {
    "prop1": "...",
    "prop2": "...",
  }
]
</code></pre>

<p>Luckily, we can easily fix this by selecting the content of the <code>default</code> property instead of <code>*</code> in our query, like so: <code>SELECT default.* FROM default</code>, which returns the results in the format the .NET SDK expects. </p>
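<p>If changing the query isn't an option, you can also unwrap the nested rows on the client side after deserializing them. Here's a minimal sketch of the idea in Python (the function name is made up and the row shape matches the 4.1 output shown above - this isn't part of any Couchbase SDK):</p>

```python
def unwrap_star_rows(rows, bucket_name):
    # Each star-query row looks like {"bucket-name": {...document...}};
    # pull the document body out from under the bucket-name property.
    return [row[bucket_name] for row in rows if bucket_name in row]

# The JSON shape produced by SELECT * FROM default in Couchbase Server 4.1:
nested = [
    {"default": {"prop1": "a", "prop2": "b"}},
    {"default": {"prop1": "c", "prop2": "d"}},
]

flat = unwrap_star_rows(nested, "default")
# flat is now a plain list of document bodies, the shape the SDK expects
```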

<p>Try it out and see that the <code>Query&lt;&gt;</code> method now returns objects with all the values correctly deserialized.</p>]]></description><link>http://CodeHardBlog.azurewebsites.net/workaround-for-n1ql-queries-in-the-net-sdk/</link><guid isPermaLink="false">cbbe57ad-16e0-422f-bf2b-dcb22c70908a</guid><category><![CDATA[Couchbase Server]]></category><category><![CDATA[N1QL]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Thu, 07 Jan 2016 07:58:10 GMT</pubDate></item><item><title><![CDATA[Using Couchbase from Delphi]]></title><description><![CDATA[<p><strong>TL;DR: Expose the Couchbase .NET SDK through COM interop by wrapping it in a simple facade class.</strong></p>

<p>I got an unusual request today: a prospective Couchbase customer wants to use the database from a Delphi application. Turns out Delphi is still a thing! Who knew?</p>

<p>As it happens, it's actually quite simple to use the Couchbase .NET SDK from Delphi by wrapping it in a slim facade class and exposing said facade through COM interop. </p>

<p>We start by creating a regular .NET class library project and adding a NuGet reference to the Couchbase SDK. Next, we add the interface we're going to expose through COM to the class library:</p>

<pre><code>[ComVisible(true)]
[Guid("CC843972-BF02-4453-9C39-6B36E2E1CDFA")]
public interface ICouchbaseFacade
{
    object Get(string key);
    void Upsert(string key, object value);
}
</code></pre>

<p>And the class that implements said interface:</p>

<pre><code>[ComVisible(true)]
[ClassInterface(ClassInterfaceType.None)]
[Guid("961E66E7-8DF5-4366-AAF6-F84037D26B05")]
public class CouchbaseFacade : ICouchbaseFacade
{
    // ...
}
</code></pre>

<p>Since this is just a POC, we're not going to do anything fancy in the facade class. Just open a connection to the 'default' bucket on localhost in the constructor, and implement a couple of simple methods to demonstrate CRUD operations. </p>

<pre><code>    private IBucket _bucket;
    private Cluster _cluster;

    public CouchbaseFacade()
    {
        _cluster = new Cluster();
        _bucket = _cluster.OpenBucket("default");
    }        

    public object Get(string key)
    {
        return _bucket.Get&lt;object&gt;(key).Value;
    }

    public void Upsert(string key, object value)
    {
        _bucket.Upsert(key, value);
    }
</code></pre>

<p>Next, let's build the project and register the assembly for COM interop. Either check the "Register for COM interop" box in the Build tab of the project properties, or run <code>regasm /codebase /tlb &lt;path to DLL&gt;</code> from the command-line. Either option requires administrator privileges to work.</p>

<p>With that done, it's time to use our new SDK wrapper from Delphi. Create a new Delphi project and then click "Import Component" under the Component menu. Select "Import Type Library" and find the .NET assembly we exported as COM earlier. Import the library, which will add the <code>ICouchbaseFacade</code> interface and the <code>CoCouchbaseFacade</code> class to your Delphi project. All that's left is to create an instance of the class and try out the <code>Upsert</code> and <code>Get</code> methods we added earlier:</p>

<pre><code>uses
   DelphiCouchbaseInterop_TLB;

var
  Couchbase: ICouchbaseFacade;

// ...

Couchbase := CoCouchbaseFacade.Create;
Couchbase.Upsert('_test', 'hello world!');
ShowMessage('Result: ' + Couchbase.Get('_test'));
</code></pre>

<p>If all goes well, this will create a new document with the key "_test" and the value "hello world!".</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2015/Nov/delphi_interop_2.png'  alt="" /></p>

<p>It will then retrieve the document and show the result in a message box.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2015/Nov/delphi_interop_1.png'  alt="" /></p>

<p>You can get the complete POC code here: <a href='https://github.com/Branor/DelphiCouchbaseInterop' >https://github.com/Branor/DelphiCouchbaseInterop</a></p>]]></description><link>http://CodeHardBlog.azurewebsites.net/using-couchbase-from-delphi/</link><guid isPermaLink="false">4a3c8460-d348-412b-b783-861af8c905f5</guid><category><![CDATA[.NET]]></category><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[Delphi]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Fri, 20 Nov 2015 22:29:13 GMT</pubDate></item><item><title><![CDATA[Validating Couchbase Referential Integrity with N1QL]]></title><description><![CDATA[<p><strong>TL;DR: Join the bucket to itself with an outer left join on the foreign (reference) ID, where the right side of the join is missing.</strong></p>

<p>Most NoSQL databases do not have built-in support for referential integrity and Couchbase Server is no exception. The best you can do is check which documents have broken references so you can fix them manually or programmatically. In Couchbase it's possible to write a MapReduce view that will find all the documents that have a broken reference to another document, but the view definition is pretty complex and coupled to a specific document field. If you want to check the integrity of another field, write another view. </p>

<p>With the advent of N1QL, we now have a much better way to test referential integrity with a relatively straightforward query. You still can't enforce integrity, of course, but at least you can easily check if it's broken and do something to fix it retroactively.</p>

<p>Let's assume we have two document types: <code>&lt;source_type&gt;</code> and another type that the source document references. We'll also assume that the referenced document ID is stored in the <code>ref_id</code> field, formatted as "<code>&lt;prefix&gt;&lt;id&gt;&lt;suffix&gt;</code>" - you can remove the prefix/suffix if it's not relevant, of course. The query is as follows:</p>

<pre><code>SELECT META(b1).id AS source_doc, 
       b1.ref_id AS missing_target_doc
FROM &lt;bucket&gt; b1 
LEFT OUTER JOIN &lt;bucket&gt; b2 
ON KEYS '&lt;prefix&gt;' || b1.ref_id || '&lt;suffix&gt;'
WHERE b1.type = '&lt;source_type&gt;' AND b2 IS MISSING;
</code></pre>

<p>As you can see above, we're joining <code>&lt;bucket&gt;</code> to itself with a left outer join, where the left side (<code>b1</code>) is the source document itself and the right side (<code>b2</code>) is the document it references. We're interested in any results from the left side of the join that don't have a value on the right side (<code>MISSING</code>) - those are the documents with a broken reference. In this case, as the names imply, <code>source_doc</code> is the ID of the source document, and <code>missing_target_doc</code> is the ID of the missing target document. </p>
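<p>Since the query above is really a template, it's easy to generate concrete versions of it for different buckets and types. A quick Python sketch (the helper name and example values are hypothetical, not part of any SDK):</p>

```python
def broken_refs_query(bucket, source_type, prefix="", suffix=""):
    # Fill in the placeholders of the broken-reference query template.
    return (
        "SELECT META(b1).id AS source_doc, b1.ref_id AS missing_target_doc "
        "FROM `{b}` b1 LEFT OUTER JOIN `{b}` b2 "
        "ON KEYS '{p}' || b1.ref_id || '{s}' "
        "WHERE b1.type = '{t}' AND b2 IS MISSING;"
    ).format(b=bucket, p=prefix, s=suffix, t=source_type)

query = broken_refs_query("mybucket", "order", prefix="customer::")
```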

<p>Note 1: The <code>||</code> operator denotes string concatenation in N1QL.</p>

<p>Note 2: This works in <a href='http://www.couchbase.com/nosql-databases/downloads#PreRelease' >N1QL DP4</a>; I didn't test it with DP3 because I'm lazy and you really should switch to DP4 anyway. :)</p>]]></description><link>http://CodeHardBlog.azurewebsites.net/validating-couchbase-referential-integrity-with-n1ql/</link><guid isPermaLink="false">2ea0a62f-1243-422f-8607-52d6515cfd4d</guid><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[N1QL]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Wed, 21 Jan 2015 19:33:03 GMT</pubDate></item><item><title><![CDATA[Running N1QL as a service with Upstart]]></title><description><![CDATA[<p><strong>TL;DR: Copy <a href='http://codehardblog.azurewebsites.net/content/images/files/n1ql.conf' >this</a> script to <code>/etc/init</code> and run <code>start n1ql</code> from the command line to start N1QL as a service.</strong></p>

<p>The 3rd developer preview of N1QL, the new Couchbase query language, has been out for a while now and I've seen several clients developing code for N1QL already. The one complaint they all had was that it doesn't come with a convenient installer that would set it up on their dev/CI/production environments. (Well, actually, clients always complain about everything, but this is what we're solving today.) Obviously, in the production release it will be part of Couchbase Server itself, but for now we have to run it as a separate service.</p>

<p>Setting up N1QL as a service on Linux is actually quite simple, since pretty much every distro these days uses Upstart as its init daemon.</p>

<p><em>Note that the new 2.0 generation of Couchbase SDKs assumes that the cbq (N1QL) service is running on every Couchbase node in the cluster, so you'll need to repeat the following process on every node.</em></p>

<p>First, download and extract <a href='https://s3.amazonaws.com/query-dp3/couchbase-query_dev_preview3_x86_64_linux.tar.gz' >N1QL DP3</a> somewhere on your system, for example into <code>/opt/n1ql</code>.</p>

<p>Next, create a file named <code>n1ql.conf</code> in your <code>/etc/init</code> folder with the following content:</p>

<pre><code>#!upstart
description "N1QL Service"
author      "David Ostrovsky"

start on stopped rc RUNLEVEL=[2345]
respawn

script
    export HOME="/root"

    echo $$ &gt; /var/run/n1ql.pid
    exec /opt/n1ql/cbq-engine -couchbase http://localhost:8091 &gt;&gt; /var/log/n1ql.sys.log 2&gt;&amp;1
end script

pre-start script
    # Date format same as (new Date()).toISOString() for consistency
    echo "[`date -u +%Y-%m-%dT%T.%3NZ`] (sys) Starting" &gt;&gt; /var/log/n1ql.sys.log
end script

pre-stop script
    rm /var/run/n1ql.pid
    echo "[`date -u +%Y-%m-%dT%T.%3NZ`] (sys) Stopping" &gt;&gt; /var/log/n1ql.sys.log
end script

post-stop exec sleep 5
</code></pre>

<p>We want to start the N1QL service only after Couchbase Server starts. However, because Couchbase still uses the old SysV init script, we can't configure our Upstart service to start and stop N1QL depending on the status of the Couchbase service itself. The best we can do is start N1QL after all the SysV scripts have finished executing, which is what <code>start on stopped rc RUNLEVEL=[2345]</code> does. We also tell it to <code>respawn</code> if the process exits with a non-zero error code, because even if the Couchbase init script has finished, the server might not be accepting connections just yet, so Upstart will keep trying until cbq-engine connects. Remember to replace the path to cbq-engine in the script with the correct location on your system.</p>

<p>You can now start the N1QL service from the command line:</p>

<pre><code>start n1ql
</code></pre>

<p>Unsurprisingly, to stop the service run:</p>

<pre><code>stop n1ql
</code></pre>

<p>To check its status:</p>

<pre><code>status n1ql
</code></pre>

<p>It will also persist through reboots, so give that a try (extra points for doing it on the production system!). If you run into any issues, whether on start or at runtime, check the N1QL service log, which is set to <code>/var/log/n1ql.sys.log</code> in the example script.</p>

<p>With the service running, you can use N1QL from your application code through the SDK, or from the command line shell that comes with the developer preview:</p>

<pre><code>/opt/n1ql/cbq -engine=http://localhost:8093
</code></pre>

<p><strong>Important note</strong>: If you're using Couchbase Server 3.0.x, before you can run N1QL queries, you have to create the primary index for every bucket you intend to query. To do so, either run the following command in the cbq shell:</p>

<pre><code>CREATE PRIMARY INDEX ON &lt;bucket&gt;
</code></pre>

<p>Or use the HTTP API (the query has to be URL encoded):</p>

<pre><code>curl http://localhost:8093/query?q=CREATE%20PRIMARY%20INDEX%20ON%20&lt;bucket&gt;
</code></pre>
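<p>If you're generating the request in code, the standard library can do the encoding for you; for example, in Python (the bucket name below is just a placeholder):</p>

```python
from urllib.parse import quote

statement = "CREATE PRIMARY INDEX ON beer-sample"
url = "http://localhost:8093/query?q=" + quote(statement)
# quote() percent-encodes the spaces, producing the same kind of URL as
# the curl example above: ...?q=CREATE%20PRIMARY%20INDEX%20ON%20beer-sample
```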

<p><strong>That's it. Go forth and query all the things!</strong></p>

<p>PS - Thanks to <a href='http://howtonode.org/deploying-node-upstart-monit' >Tim Smart</a> for the pre and post start log file idea, which I shamelessly "borrowed". Read the rest of his blog to learn how to use <a href='http://mmonit.com/monit' >Monit</a> to monitor the service and restart it if the process stops responding for any reason.</p>]]></description><link>http://CodeHardBlog.azurewebsites.net/running-n1ql-as-a-service-with-upstart/</link><guid isPermaLink="false">30cb8133-99f0-4ffa-9ca8-68073a8b86f4</guid><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[N1QL]]></category><category><![CDATA[Linux]]></category><category><![CDATA[Upstart]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Thu, 08 Jan 2015 15:10:10 GMT</pubDate></item><item><title><![CDATA[Encoding UUID-Based Keys to Save Memory]]></title><description><![CDATA[<p><strong>TL;DR: The most compact printable ASCII encoding of a binary value is Ascii85 (aka. Base85). Encoding a 16-byte UUID as Base85 takes only 20 bytes, as opposed to 36 for the standard string representation.</strong></p>

<p>Using UUIDs to generate unique keys for Couchbase documents is a popular choice. You can generate UUIDs on the client side without <a href='http://codehardblog.azurewebsites.net/generating-sequentials-ids-with-couchbase/' >referring to some central component</a> to provide unique values, they are widely supported and have a fairly familiar human-readable representation. However, UUID-based keys suffer from one important drawback - the standard string encoding is 36 bytes long: <code>d8b594f4-bb26-49f8-bdc4-7d68aba9dc54</code>. Since Couchbase keeps all document keys in memory, this can add up to a significant overhead for databases with a lot of documents. For example, a Couchbase bucket with one billion items would need <code>36 * 10^9 / 2^30 = 33.5 GB</code> of RAM just to hold all the keys. If your documents are small, it's possible that you'll actually use more memory for keys than the documents themselves.</p>

<p><em>(Note: Couchbase Server 3.0, which is coming out soon, includes an optional feature to eject keys+metadata from memory along with their documents. This makes the key overhead a lot less of an issue. However, this feature negatively impacts the performance of some operations, so there are cases where you would not want to use it.)</em></p>

<p>Of course, we don't have to use the standard string UUID encoding. There are 95 printable ASCII characters, not including whitespace, so the most compact text encoding for a binary value is <a href='http://en.wikipedia.org/wiki/Ascii85' >Ascii85</a> (aka. Base85), which encodes a 16-byte UUID as a 20-byte string, or only 25% more. Using the better known Base64 encoding takes 22 (or 24 with padding) bytes to represent a UUID, that is 33% more.</p>
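<p>You can verify the size difference with a few lines of Python. Note that the standard library's <code>base64.b85encode</code> uses a different 85-character alphabet than classic Ascii85, but the encoded length is identical:</p>

```python
import base64
import uuid

u = uuid.uuid4()
raw = u.bytes                          # the 16 raw bytes of the UUID

print(len(str(u)))                     # 36 - standard hyphenated string
print(len(base64.b85encode(raw)))      # 20 - Base85: 5 chars per 4 bytes
print(len(base64.b64encode(raw)))      # 24 - Base64 with padding (22 without)
```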

<p>Because I'm lazy, I've used <a href='http://blog.codinghorror.com/equipping-our-ascii-armor/' >Jeff Atwood's C# implementation</a> of Ascii85 to write a small demo application that inserts one million items with UUID-based keys into my Couchbase bucket, first using plain ToString() and then using Ascii85. You can download the solution <a href='https://onedrive.live.com/download?resid=CA9096F609A8DAC9!1319&amp;authkey=!AJoux0RrCYWlqUg&amp;ithint=file%2czip' >here</a>.</p>

<p>One million items with UUID.toString() as keys:</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/documents.png'  alt="" /></p>

<pre><code>&gt; cbstats localhost:11210 all | grep meta_data
 ep_meta_data_memory:                92000000
</code></pre>

<p>Makes perfect sense: 92 bytes per item is 56 bytes of metadata plus a 36-byte key, which is exactly how much a plain string UUID representation with hyphens takes.</p>

<p>One million items with Base85 encoded UUIDs as keys:</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/documents_encoded.png'  alt="" /></p>

<p>We do lose some readability with the Ascii85 encoded UUIDs. Hopefully you won't have to memorize them or anything.</p>

<pre><code>&gt; cbstats localhost:11210 all | grep meta_data
 ep_meta_data_memory:                76000000
</code></pre>

<p>Just as expected: 76 bytes per item, 56 bytes of metadata plus a 20-byte key.</p>

<p>As you can see, saving 16 bytes per key in a bucket with one million documents is no big deal, just 15 MB of RAM saved. However, saving 16 bytes per key in a bucket with one billion documents would be 15 GB of RAM saved, or rather used to cache document data for faster retrieval instead of storing long keys. Much better.</p>
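<p>The encoding itself is a one-liner in most languages. For instance, instead of Atwood's C# code, Python's standard library ships an Ascii85 implementation - a quick sketch of the size difference:</p>

```python
import base64
import uuid

u = uuid.uuid4()

plain = str(u)                   # canonical hyphenated string form
a85 = base64.a85encode(u.bytes)  # Ascii85: 16 bytes -> 20 characters
b64 = base64.b64encode(u.bytes)  # Base64: 16 bytes -> 24 characters (with padding)

print(len(plain), len(a85), len(b64))  # 36 20 24
```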

<p>There are <a href='http://en.wikipedia.org/wiki/Ascii85#External_links'>implementations of Base85</a> for several common languages. Alternatively, if you don't mind giving up 2 bytes per key, most modern frameworks come with a built-in implementation of Base64. Obviously, nothing comes for free - we pay for the memory saving with additional client-side CPU cycles to encode the keys at creation. Base64 is also somewhat faster than Base85; not that it matters a whole lot, since it's only used once per key.</p>]]></description><link>http://CodeHardBlog.azurewebsites.net/encoding-uuid-based-keys-to-save-memory/</link><guid isPermaLink="false">9dc024f7-d6d7-4b62-8f4f-83c5f2a12440</guid><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[Development]]></category><category><![CDATA[Tips]]></category><category><![CDATA[Base85]]></category><category><![CDATA[Ascii85]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Mon, 11 Aug 2014 12:59:21 GMT</pubDate></item><item><title><![CDATA[Real-Time Data Analytics with Couchbase and ElasticSearch]]></title><description><![CDATA[<p><strong>TL;DR: Replicate documents from Couchbase to ElasticSearch through XDCR and display real-time analytics in Kibana.</strong></p>

<p>For the less impatient readers, I'd like to tell you about an interesting use-case for getting (near) real-time data analytics with a combination of Couchbase Server, ElasticSearch and Kibana. </p>

<p>This works particularly well if you already have an existing Couchbase database that you're using for other things, and would like to add some BI or real-time monitoring on the data you're storing. For those who aren't familiar with Couchbase's Cross Datacenter Replication (XDCR) feature: this is a mechanism that lets you replicate data between two Couchbase clusters. The cool thing is that you can use a plugin for ElasticSearch which implements the XDCR protocol to use the same mechanism for replicating data between Couchbase and an ElasticSearch cluster. Unlike the ElasticSearch river service, which usually works by pulling data from the source periodically, XDCR pushes data from Couchbase into ElasticSearch, which means the replication is much closer to real-time.</p>

<p><em>Sidenote: The <a href='http://www.couchbase.com/beta' >Couchbase 3.0 beta</a> that came out yesterday has a new mechanism for streaming document changes to XDCR from memory, rather than from disk. This should further reduce the time between a document being changed in Couchbase and the change being reflected in ElasticSearch.</em></p>

<p>Anyway, let's get to the meat of the post! The main idea is to use Kibana for aggregating and showing different statistics of the data that's stored in Couchbase. So the first thing we need is to set up the XDCR replication from a Couchbase bucket to an ElasticSearch index. Normally, when using ElasticSearch as a search front-end for Couchbase, we don't store any of the document fields as the source - we have the source documents readily available directly from Couchbase. However, in this case we'll need the values for any fields we intend to analyse in Kibana, so we'll have to configure a mapping that stores the source values of those fields.</p>

<p>For our example, we'll store documents representing users in a Couchbase bucket named, unsurprisingly, "users". We'll use XDCR to replicate this bucket into an ElasticSearch index named "cb_users". The user document will have some data fields, including name, age, last GPS location, and last update date. We'll load some initial data into the bucket, and then run a simple script that randomly updates the users' locations and dates. We'll then use this data to build a Kibana dashboard that tracks statistics about user activity. For simplicity, I'm going to assume everything is installed and running locally, so replace any references to "localhost" as appropriate in your case.</p>
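<p>As a concrete (made-up) example, a user document might look something like this - note that the location is stored as a plain [longitude, latitude] array rather than a nested object:</p>

```json
{
  "name": "Jane Doe",
  "age": 31,
  "location": [34.78, 32.08],
  "update_date": "2014-08-01T05:58:01Z"
}
```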

<p>To use XDCR, you will need to install the <a href='http://www.couchbase.com/couchbase-server/connectors/elasticsearch' >transport-couchbase</a> ElasticSearch plugin, which is what allows Couchbase to push documents to ElasticSearch. There is a great guide for installing and configuring the plugin in the Couchbase <a href='http://docs.couchbase.com/couchbase-elastic-search/#installing-the-plug-ins'>online documentation</a>. The one thing you must do differently from the guide is edit the couchbase template before applying it. The default template is set to match * (that is, all indices) and doesn't store the source document at all. If you're going to use your ElasticSearch cluster for other things besides indexing data from Couchbase, the overly inclusive couchbase template will mess that up.</p>

<p>In our case, we'll change the default <a href='http://codehardblog.azurewebsites.net/content/images/files/template.json' >template</a> to store everything in the source and map the fields of the couchbaseDocument type to their respective data types in advance. You could also leave out the couchbaseDocument type mapping and let ElasticSearch figure it out dynamically. The reason I'm hard-coding it in this case is that ElasticSearch doesn't automatically map the location array to a geo_point. Note that I'm being lazy and storing <strong>doc.*</strong> in the source. In a real application, you should save the processing and storage by only storing the fields you'll actually need. After editing the template, apply it to ElasticSearch:</p>
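<p>To make the mapping concrete, here's a rough sketch of what the edited template might contain for an ElasticSearch 1.x index template - this is purely illustrative, not the exact contents of the linked file, with field mappings matching our user document:</p>

```json
{
  "template": "cb_users",
  "mappings": {
    "couchbaseDocument": {
      "_source": { "includes": ["doc.*"] },
      "properties": {
        "doc": {
          "properties": {
            "name":        { "type": "string" },
            "age":         { "type": "integer" },
            "location":    { "type": "geo_point" },
            "update_date": { "type": "date" }
          }
        }
      }
    }
  }
}
```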

<pre><code>&gt; curl -XPUT http://localhost:9200/_template/couchbase -d @template.json
</code></pre>

<p>And then create the cb_users index in ElasticSearch:</p>

<pre><code>&gt; curl -XPUT localhost:9200/cb_users
</code></pre>

<p>Next, let's create the users bucket in Couchbase: <br />
<img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/users-1.png'  alt="The users bucket" /></p>

<p>And the XDCR cluster reference to our ElasticSearch cluster: <br />
<img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/reference.png'  alt="ElasticSearch cluster reference" /></p>

<p>And finally the XDCR replication from the users bucket to the cb_users index: <br />
<img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/replication.png'  alt="Couchbase to ElasticSearch replication" /></p>

<p>Now let's check that the replication is working by creating some documents in Couchbase and searching for them in ElasticSearch. I'm going to use a couple of quick Node.js scripts to create some users in Couchbase and then search for any users located within 10km of the specified coordinates. Running first <a href='http://codehardblog.azurewebsites.net/content/images/files/geoquery_setup.js' >geoquery_setup.js</a> and then <a href='http://codehardblog.azurewebsites.net/content/images/files/geoquery_test.js' >geoquery_test.js</a> should print out some results.</p>

<p>Great, we finally have all the data in place, so now it's time to create a Kibana dashboard full of awesome pie charts and real-time graphs! I'm going to leave the Kibana installation up to you, because there are just too many ways (and operating systems) to cover. Just get the latest version and do whatever you need to do in order to run Kibana as a local website. If you're a seasoned Kibana user, just open a blank dashboard and get cracking. Otherwise, you might want to start with a pre-made dashboard that gives you more guidance: open the following URL in a browser <a href='http://localhost/index.html#/dashboard/file/noted.json'>http://localhost/index.html#/dashboard/file/noted.json</a> and spend some time reading the instructions.</p>

<p>The first thing we'll do is point our dashboard to the cb_users index and couchbaseDocument type. Open the dashboard configuration (cog icon in the top right), go to the Index tab,  change the default index from "_all" to "cb_users/couchbaseDocument" and check the preload fields box. Then go to the Timepicker tab and change the time field to "doc.update_date". Next we'll create a pie chart of users by age. In the first row click the green plus (Add Panel) button in the floating menu on the left. Select "terms" from the panel type list, give it a name, set "doc.age" as the field name, and select "pie" from the style list. You can uncheck the missing and other boxes, so the pie chart doesn't show documents without an age field. Save the panel and admire the results. (This might be a good time to delete the pre-defined text panels, if you're done reading them.)</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/users_by_age_setup.png'  alt="Users by age panel setup" /></p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/users_by_age_panel.png'  alt="Users by age panel" /></p>

<p>Let's spice our dashboard up with a map: add another panel to the first row and select "Bettermap" as the panel type. Enter "doc.location" as the coordinate field, and "doc.name" as the tooltip field. Note that Bettermap needs the location field to be a GeoJSON array in the format [longitude, latitude], which is why we stored it that way in the user document, instead of the more readable GeoJSON object with nested lat and long fields.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/user_locations_setup.png'  alt="User locations panel setup" /></p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/user_locations_panel.png'  alt="User locations panel" /></p>

<p><em>Sidenote: "Bettermap" is a bit of a misnomer. A more appropriate name might have been "it's-the-only-choice-you've-got-so-suck-it-map". But I digress...</em></p>

<p>No dashboard is complete without a real-time histogram or two, so let's add one of those next. Add a new panel to the second row, set the type to "histogram" and the time field to "doc.update_date". Change the span to the maximum value allowed, to make the histogram span the entire screen width. After you save the new histogram panel, you might want to go to the row settings and reduce its height to a more manageable 150 pixels or so.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/users_histogram_setup.png'  alt="User histogram panel setup" /></p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/users_histogram_panel.png'  alt="User histogram panel" /></p>

<p>Lastly, let's add a table to display the most recently updated users. Add a third panel to the top row, select the "table" type, set the sort and local time fields to "doc.update_date", add "doc.name" and "doc.update_date" to the columns, and set paging to 7 results per page and only 1 page, to show just the top 7 results. After you save the panel, minimize the fields list on the left, to leave just the user names and update dates visible.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/users_latest_setup.png'  alt="Latest user panel setup" /></p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/users_latest_panel.png'  alt="Latest user panel" /></p>

<p>Alright! All that's left is to actually change some users and see whether the dashboard works as expected. Run the <a href='http://codehardblog.azurewebsites.net/content/images/files/geoquery_realtime.js' >geoquery_realtime.js</a> script or write your own code for changing user locations and update dates over time. In the Kibana dashboard, set the time filter to auto-refresh every 5 seconds and to display data for the last 5 minutes. Finally, save the dashboard so you can come back to it later by clicking on the save menu in the top right and either clicking the save icon - which will store the dashboard in ElasticSearch - or selecting "export schema" under the advanced menu - which will export the dashboard as a JSON file that you can import later. <a href='http://codehardblog.azurewebsites.net/content/images/files/Kibana-Geomap-Dashboard.json' >Here's mine</a>.</p>

<p>Congratulations, you are now the proud owner of a nifty real-time analytics system!</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Aug/user_tracker_dashboard.png'  alt="User tracker dashboard" /></p>]]></description><link>http://CodeHardBlog.azurewebsites.net/real-time-data-analytics-with-couchbase-and-elasticsearch/</link><guid isPermaLink="false">6836d1bc-36bf-47f0-adba-9d56da89f0a5</guid><category><![CDATA[Couchbase Server]]></category><category><![CDATA[BI]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Fri, 01 Aug 2014 05:58:01 GMT</pubDate></item><item><title><![CDATA[Couchbase - NoSQL for you! (Slides and demos from SDP 2014)]]></title><description><![CDATA[<p>I gave a talk about Couchbase Server at the <a href='http://seladeveloperpractice.com/' >Sela Developer Practice</a> conference last week. Here are the <a href='http://www.slideshare.net/SirKetchup/couchbase-nosql-for-you-sdp-2014' >slides</a> and <a href='http://codehardblog.azurewebsites.net/content/images/files/SDP-demos.zip' >demos</a>; you can also flip through the slide deck below. It's a 50-minute introduction to Couchbase, basic development and a small preview of the N1QL query language.</p>

<p><iframe src='http://www.slideshare.net/slideshow/embed_code/36651223?rel=0'  width="597" height="486" frameborder="0" marginwidth="0" marginheight="0" scrolling="no" style="border:1px solid #CCC; border-width:1px 1px 0; margin-bottom:5px; max-width: 100%;" allowfullscreen> </iframe> <div style="margin-bottom:5px"> <strong> <a href='https://www.slideshare.net/SirKetchup/couchbase-nosql-for-you-sdp-2014'  title="Couchbase - NoSQL for you! (SDP 2014)" target="_blank">Couchbase - NoSQL for you! (SDP 2014)</a> </strong> from <strong><a href='http://www.slideshare.net/SirKetchup'  target="_blank">SirKetchup</a></strong> </div></p>]]></description><link>http://CodeHardBlog.azurewebsites.net/couchbase-nosql-for-you-slides-and-demos-from-sdp-2014/</link><guid isPermaLink="false">3933093d-527f-4913-b087-8342aa626b3f</guid><category><![CDATA[Couchbase Server]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Sat, 05 Jul 2014 10:48:11 GMT</pubDate></item><item><title><![CDATA[Generating Sequentials IDs with Couchbase]]></title><description><![CDATA[<p>Reliably generating sequential IDs (i.e. numbers) is useful for a multitude of applications. For example, if you want to deliver messages asynchronously - say through a queue - and then reconstruct the message order on the receiving side. <br />
<img src='http://codehardblog.azurewebsites.net/content/images/2014/Apr/incr0.png'  alt="" />
If you can be sure that the IDs 1, 2 and 3 were allocated in that order, it doesn't matter in what order they go through the queue, you can always sort them in the true order on the other side.</p>
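<p>A minimal Python sketch of the idea - the message format here is made up purely for illustration:</p>

```python
import random

# messages are tagged with sequential IDs at send time
messages = [{"id": i, "body": "msg %d" % i} for i in range(1, 6)]

random.shuffle(messages)  # the queue may deliver them in any order

# the receiver reconstructs the true order by sorting on the ID
received = sorted(messages, key=lambda m: m["id"])
print([m["id"] for m in received])  # [1, 2, 3, 4, 5]
```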

<p>Conventional wisdom tells us that using an external component, especially a database, to generate application sequences is a bad idea. If you think about it, it makes sense right away - it's very hard to generate true sequential IDs in a distributed manner, and if it's not distributed then there will certainly be a hard upper limit to the number of items you can generate per time unit. In the context of Couchbase Server, we can use the <code>increment</code> operation to create a counter. Increment is atomic, so we can call it repeatedly from any number of threads or processes and be assured that it will return sequential values (assuming we actually pass a positive increment to the method.) Of course, if we use a global counter to generate our sequence, it will be a single document, meaning it will reside on a single node. Regardless of how many nodes we actually have in the cluster, the processing load of incrementing the counter and saving it to disk will fall on that node alone. I wanted to find out just how bad this bottleneck would be: can I use a single counter for generating sequences globally in a distributed application with reasonable performance?</p>

<p>Test setup in Azure:</p>

<ul>
<li>Single-node Couchbase cluster on an A4 VM (8 cores, 14GB RAM, data on a RAID0 volume on top of 4 disks within the same affinity group.)</li>
<li>Two application servers on similar A4 VMs:
<ul><li>Python script calling <code>increment("counter", 1, 1)</code> in a tight loop from multiple threads.</li></ul></li>
<li>Only one document in the database - the counter itself, and no other load.</li>
</ul>
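<p>The load script itself is just a tight loop around increment. The sketch below stubs the Couchbase call with a lock-protected in-process counter (the real script used the Python SDK's increment operation against the cluster), but it shows the shape of the test and why atomicity matters - every thread still gets unique, sequential values:</p>

```python
import threading

class FakeCounter(object):
    """Stand-in for Couchbase's atomic increment operation."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def incr(self, amount=1):
        with self._lock:
            self._value += amount
            return self._value

counter = FakeCounter()
ids = []
ids_lock = threading.Lock()

def worker(n):
    # tight loop, same as the real load script
    for _ in range(n):
        i = counter.incr()
        with ids_lock:
            ids.append(i)

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()

# 8 threads x 1000 increments: no gaps, no duplicates
assert sorted(ids) == list(range(1, 8001))
```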

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Apr/incr1.png'  alt="" /></p>

<p>The first application server, at 100% CPU, managed about 25k increments per second. Bringing the second one online increased that to 40k. </p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Apr/incr2-1.png'  alt="" /></p>

<p>The database server steadied out at 40% CPU and essentially no disk activity. As you can see below, while the disk write queue fill rate was high, the actual number of disk writes was absurdly low, because Couchbase de-duplicates multiple updates to the same key before writing them to disk.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Apr/incr3-2.png'  alt="" /></p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Apr/incr4-1.png'  alt="" /></p>

<p>One thing you'll note is that the average age of the item(s) in the disk write queue is high and still rising. It's possible that Couchbase counts the age from the moment the counter item was first updated, since it never actually leaves the queue due to being constantly re-added.</p>

<p>In conclusion, we can see that even if we use one global counter for all our  sequential IDs, we can still generate a respectable number of new items without overwhelming the node. Of course, if you need to generate sequential numbers at a higher rate, you shouldn't use a single global counter, as it will become a bottleneck at some point. One possibility is to use multiple counters, one per entity type, or application domain, or some other differentiator - to spread the load better. </p>]]></description><link>http://CodeHardBlog.azurewebsites.net/generating-sequentials-ids-with-couchbase/</link><guid isPermaLink="false">0c830279-8c14-45e9-a0e8-3c35b809120a</guid><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[Development]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Mon, 28 Apr 2014 21:09:15 GMT</pubDate></item><item><title><![CDATA[Couchbase on Azure - Does Read/Write Caching Help?]]></title><description><![CDATA[<p><strong>TL;DR: Do not enable host caching on Couchbase data disks.</strong></p>

<p>When creating a new disk in Windows Azure, you can choose whether to enable host caching or not.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Mar/disk.png'  alt="" /></p>

<p>It's generally assumed that for database workloads, it's best to run with caching off, but we couldn't find any actual benchmarks that showed the difference. Furthermore, Couchbase doesn't exactly behave like SQL Server, as far as disk activity is concerned. So, we set out to see what, if any, effect the Windows Azure host caching has on Couchbase Server.</p>

<p>To make the test as simple as possible we used two identical large Ubuntu 12.04 VMs, each within its own dedicated storage account, with a single 20GB data disk attached. One VM had a data disk with host cache set to NONE, the other set to READ/WRITE. We mounted the data disks with <code>defaults,noatime</code> in fstab.</p>

<pre><code>azureuser@cbcache:~$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1        29G  1.4G   26G   6% /
udev            1.7G  8.0K  1.7G   1% /dev
tmpfs           689M  248K  689M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            1.7G     0  1.7G   0% /run/shm
/dev/sdb1       133G  188M  126G   1% /mnt
/dev/sdc1        20G  172M   19G   1% /data
</code></pre>

<p>We installed Couchbase Server 2.5 EE on both VMs, with the data path set to the disk we want to test, and the index path set to the OS drive, since we weren't going to use any views anyway.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Mar/setup.png'  alt="" /></p>

<p><strong>Write Test:</strong></p>

<p>For the write test, we first used cbworkloadgen to create 2 million 4k JSON documents to set up the initial data set, then we used cbworkloadgen from two sessions simultaneously to repeatedly go over the data and update each document. With 2GB of RAM allocated for the bucket, 22% of the documents were resident in memory.</p>

<pre><code>cbworkloadgen -t 10 -r 1 -s 4096 -j -l -i 2000000
</code></pre>

<p>That's 10 threads, 100% writes, 0% reads, 4096-byte JSON documents in an infinite loop, with keys from 0 to 2000000, times two sessions. We let this run for an hour.</p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Mar/sets___cache_vs_nocache.png'  alt="" /></p>

<p>The disk with the host cache enabled is on the left, without is on the right. As you can see, the write queue on the disk with cache enabled drained about 10% slower: 18.2k vs. 20.6k on average over an hour. At the end of one hour, the average age of the write queue was about 5 seconds for the cached disk and 2.5 seconds for the non-cached one.</p>

<p>(As a side note, the initial data loading also finished about 10% faster on the disk without caching.)</p>

<p>Just to be sure, we repeated the test with a python script inserting documents from 10 different processes, instead of cbworkloadgen, and got the same results - 10% difference in favour of the disk without cache.</p>

<p><strong>Sequential Read Test:</strong></p>

<p>For the sequential read test, we used a simple Python script to get documents one by one. We staggered the initial offset of each read process, so that all the reads were from disk, rather than from the working set cached in RAM.</p>
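<p>The staggering itself is trivial - each reader starts at a different offset into the key range and wraps around, so no two readers hit the same (possibly cached) documents at the same time. A small sketch of the key scheduling (the actual test script isn't reproduced here):</p>

```python
def staggered_keys(total, workers, worker_index):
    """Yield document indices for one reader, starting at its own offset."""
    start = (total // workers) * worker_index
    for i in range(total):
        yield (start + i) % total

# two of four readers over 2M documents start 500k apart
assert next(staggered_keys(2000000, 4, 0)) == 0
assert next(staggered_keys(2000000, 4, 1)) == 500000
```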

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Mar/gets___cache_vs_nocache.png'  alt="" />
Disk with cache enabled on top, without on the bottom.</p>

<p>As we intended, we had 100% cache miss ratio, meaning we were reading all documents from disk. Again, you can see a clear difference in favour of the disk without host caching, an average of 4.8k vs. 4.2k reads per second over one hour.</p>

<p><strong>Random Read Test, 20% working set:</strong></p>

<p>As expected, as soon as the working set was 100% cached in RAM, there was no difference between the two VMs, because they were not reading from disk at all. The VM without host caching was about 10% faster to reach that point.</p>

<p><strong>Mixed Read/Write Test, 20% working set:</strong></p>

<p>Same as the read test, and the write queue drained 10% faster on the disk without host cache.</p>

<p>We tried several other combinations of reading and writing, but we did not find a single instance where enabling host cache gave better results. The only scenario where we found host caching beneficial was restarting the Couchbase process - the VM with caching enabled finished server warmup faster.</p>]]></description><link>http://CodeHardBlog.azurewebsites.net/couchbase-on-azure-does-readwrite-caching-help-2/</link><guid isPermaLink="false">7504d78c-f19e-462b-b252-958c5d4e141f</guid><category><![CDATA[Couchbase Server]]></category><category><![CDATA[Window Azure]]></category><category><![CDATA[Couchbase]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Sun, 16 Mar 2014 17:43:10 GMT</pubDate></item><item><title><![CDATA[Static IPs on Windows Azure - the Right Way]]></title><description><![CDATA[<p><strong>TL;DR: Use the <code>Set-AzureStaticVNetIP</code> cmdlet to reserve static IPs in a VNet.</strong></p>

<p>Yesterday I saw a <a href='http://blog.couchbase.com/step-step-production-deployment-couchbase-windows-azure-virtual-machines' >blog post</a> about deploying Couchbase on Windows Azure. It was good in everything related to creating VMs and actually installing Couchbase, but it had one flaw: it relied on the order in which the VMs were started to assign IPs in the virtual network. That is a terrible idea in production. </p>

<p>By default, the Windows Azure Virtual Network assigns IPs within a subnet in the order they're requested, starting from <code>*.*.*.4</code> - so for example, if your subnet is configured for <code>10.0.0.0/23</code>, your machines will have the IP addresses <code>10.0.0.4</code>, <code>10.0.0.5</code>, and so on. If you shutdown (deallocate) the first VM (that had the IP <code>10.0.0.4</code>) and start another one, then the new VM will get <code>10.0.0.4</code> because it's the first free IP in the subnet range.</p>

<p>As you can see, relying on the default way that the VNet assigns addresses is very unreliable in practice. Furthermore, if you don't need all the machines at all times, there is no good way to shut down some of the VMs to save money, because they will most likely come back with a different IP later. In particular, when deploying something like Hadoop on Azure it makes sense to only have the cluster running part of the time, and deallocate the VMs the rest of the time to avoid paying for idle uptime. Having IPs reshuffle because you accidentally used a different startup order, or because Microsoft decided to change the default way IPs are allocated, will make your cluster unusable.</p>

<p>The common way of dealing with this issue until now was to deploy a DNS server inside its own subnet, so that it will always get the first IP. This way you have a (sort of) static IP for your DNS. Then you can configure proper domain name resolution for the rest of the virtual network, and use FQDNs (because Couchbase can't handle short hostnames due to a quirk in how Erlang works.) VMs in the domain automatically register their names with the dynamic DNS when they start up, so IP changes are not a problem. This pattern is pretty well documented and there are plenty of blogs that cover the exact steps.</p>

<p>However, the problem is that this way also relies on the specific implementation of how Azure VNets hand out IPs. If Microsoft suddenly change that, your DNS can end up with a different IP address, and there goes your name resolution for the cluster. Besides, in a proper production deployment, you pretty much have to have two DNS servers for durability. Even if you use small VM instances, the costs add up.</p>

<p>There is now a new option. In the latest <a href='https://github.com/WindowsAzure/azure-sdk-tools' >Windows Azure PowerShell SDK tools</a> there are 4 new cmdlets that let you reserve static IPs for VMs in a virtual network. The cross-platform tools haven't been updated yet, and there's no such option in the management portal. So for now, you're limited to using PowerShell, or figuring out the exact HTTP request the cmdlets send (using an HTTP capture tool, like Fiddler) and sending your own custom request.</p>

<ul>
<li><code>Set-AzureStaticVNetIP</code> - Reserves a static IP for a VM.</li>
<li><code>Get-AzureStaticVNetIP</code> - Retrieves the reserved static IP of a VM.</li>
<li><code>Remove-AzureStaticVNetIP</code> - Removes the reserved static IP.</li>
<li><code>Test-AzureStaticVNetIP</code> - Checks whether the specified IP is available for a VM. Returns a list of alternative suggestions if the IP is unavailable.</li>
</ul>

<p>Using these cmdlets is pretty simple. To reserve a static IP for an existing VM:</p>

<pre><code>&gt; $vm = Get-AzureVM -ServiceName CouchbaseSVC -Name CouchbaseVM1
&gt; Set-AzureStaticVNetIP -VM $vm -IPAddress 10.0.0.4 | Update-AzureVM
</code></pre>

<p>Replace the service name, VM name, and IP address parameters with your own, of course. The other three cmdlets are pretty simple to use as well, just play around and give them a try.</p>

<p>Take a look at <a href='http://blogs.msdn.com/b/niallsblog/archive/2014/02/23/allocating-static-ip-addresses-within-a-vnet.aspx' >Niall Moran's blog post</a> about using these cmdlets to set up a new VM, including a sample PowerShell script.</p>

<p><strong>Update</strong>: Important note - do <strong>NOT</strong> mix VMs with static and dynamic (DHCP) IPs within the same subnet. If a machine with a reserved static IP is shutdown (deallocated), its IP can be allocated by DHCP to another VM. Put the two types of VMs in different subnets to avoid conflicts.</p>]]></description><link>http://CodeHardBlog.azurewebsites.net/static-ips-on-windows-azure-the-right-way/</link><guid isPermaLink="false">059000d6-9283-44ae-b8c2-25379a892b1c</guid><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[NoSQL]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Static IP]]></category><category><![CDATA[Windows Azure]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Fri, 07 Mar 2014 10:06:08 GMT</pubDate></item><item><title><![CDATA[Lessons in Couchbase: The Danger of Underprovisioning]]></title><description><![CDATA[<p>It's story time! Recently, I consulted for a client on upgrading from their current big data infrastructure centered around Azure SQL and Azure Queues, to a more robust and scalable architecture, with Couchbase Server and HDInsight. As part of the Couchbase POC, we ran load tests on various configurations of Couchbase clusters on Azure.</p>

<p>The particular test I'm writing about involved loading a dataset of 100,000,000 documents into the database, and then testing how it handles 10k updates per second.</p>

<p>The test parameters were as follows:</p>

<ul>
<li>100M documents</li>
<li>2.5k average JSON document size</li>
<li>Working set of 10% (10M items)</li>
<li>GUID-based key (38 bytes)</li>
<li>1 replica configured for the bucket</li>
<li>Azure worker role instances inserting data in a tight loop</li>
<li>4 x A5 Azure VM instances (CPU: 2 x 1.66Ghz, RAM: 14GB) running Windows Server 2012 R2</li>
<li>Data files on a striped 4 x Azure Blob 30GB volume (120GB total), which gives a theoretical throughput of ~2000 IOPS</li>
<li>Index files on the VM's temporary drive (D:)</li>
</ul>

<p>In our case, using Azure blobs as storage, let's assume 30% headroom, and we're using the default 85% high water mark. Plugging these into the cluster sizing guidelines from the Couchbase documentation gives us:</p>

<pre><code>Metadata = 100M * (56 + 38) * 2 = 17.5GB
Total Data = 100M * 2.5K * 2 = 476.8GB
Working Set = 476.8 * 10% = 47.7GB
Total RAM required = (Metadata + Working Set) * &lt;storage headroom&gt; / &lt;high water mark&gt;  
Total RAM = (17.5 + 47.7) * 1.3 / 0.85 = 99.7GB
</code></pre>

<p>Using 14GB VMs, and leaving about 1GB for the OS, that's 13GB for Couchbase, meaning we need 99.7 / 13 = 7.67 ~= 8 nodes. We went with 4 to save costs and to see what would happen.</p>
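<p>The same arithmetic as a quick Python sanity check, with the constants taken straight from the sizing formula above:</p>

```python
import math

docs = 100e6
metadata = docs * (56 + 38) * 2   # metadata + key bytes, x2 for the replica
data = docs * 2.5 * 1024 * 2      # 2.5K documents, x2 for the replica
working_set = data * 0.10         # 10% working set

gib = 1024.0 ** 3
total_ram = (metadata / gib + working_set / gib) * 1.3 / 0.85
nodes = math.ceil(total_ram / 13)  # 13GB usable per 14GB VM

print(round(total_ram, 1), nodes)  # 99.7 8
```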

<p>We created the 4-node cluster, and began provisioning worker roles to load the initial data. The first worker gave us an average of 4.5k store operations per second. Adding a second worker brought us to around 8k, and adding a third got us to just under 12k store ops per second.</p>

<p>With just one loader the disk write queues drained faster than they filled, with two loaders it was about even, with three the write queues kept growing continuously. It was pretty clear where the disk throughput cap was, so we brought it back down to 2 workers loading the data and left them alone for a while. The loading ran fine for the first 79M documents. Near the end of that range, we noticed that by then all the nodes were heavily prioritizing active data writes (which go into the disk write queue) over replication (which goes into the replication TAP queue) - the write queue had about 1.5M items in it and was keeping steady, but the replication TAP queue grew to over 3.5M items and kept increasing. Note that at this point, we had around 14GB of RAM taken up by the keys+metadata, and the rest by documents. Around 8% of the active documents were resident (cached), and less than 3% of the replicas.</p>

<p>That 79M items point is important, because when the RAM usage passed the default 85% High Water Mark, Couchbase started trying to evict documents from memory to make room. However, it never managed to bring RAM usage back down below the High Water Mark; the cluster stayed above 85% RAM use. At around 80M items loaded, the data loader worker roles began getting "Temporary failure" errors en masse. The "Temporary OOM" error counter in the bucket monitor spiked to about the same number as the Ops per second counter. Clearly the nodes were rejecting store operations until the cluster could evict enough items from memory. At the same time, the backoff message counter also spiked to about 10k per second, and the replication TAP queue stopped draining, because the nodes were all sending "backoff" messages to each other. CPU usage went from around 60% per node to a steady 100%. </p>

<p>At this point we stopped the worker role instances that were loading the data, but the cluster never recovered. Memory use never went down below the High Water Mark, the disk write queue stopped draining, and the TAP queue stayed at 4M items plus 1.5M waiting on disk.</p>

<p>To fix the problem we first tried adding another A5 instance node to the cluster, which failed. In hindsight, it's obvious why - rebalancing uses a TAP queue to transfer data between nodes. We monitored the bucket carefully, and the replication TAP queue never moved even a single item; the rebalance stayed at 0% completion. Deleting items to free up room failed as well. Leaving the cluster overnight with no new incoming data, in the hopes that it would manage to either free up memory through working set eviction or drain the disk write queue, failed as well. The cluster remained in exactly the same condition through the night. At that point we considered either shutting down the entire cluster (accepting the loss of any unpersisted data), or using the cbbackup/cbrestore tools to back up the data, then killing the cluster and building a new one. </p>

<p>We never finished the data loading stage of the test, so the 10k updates per second test wasn't even on the menu.</p>

<p>Important lessons learned:</p>

<ul>
<li>It is possible to bring a Couchbase cluster into an irreversible failure state. I'm not saying this as criticism, but rather to reinforce the next point, which is:</li>
<li>Skimping on resources (RAM, disk IOPS) when provisioning a Couchbase cluster is very bad.</li>
<li>The RAM sizing guidelines in the Couchbase documentation are phrased as suggestions. They're not. They are holy commandments.</li>
<li>Overprovision your cluster. Then, if you see that it's handling the load well and has plenty of headroom, you can scale it down gradually.</li>
<li>Monitor. Monitor. Monitor! I can't emphasize this point enough. With proper monitoring, you get plenty of warning of the impending failure. You can see the disk queue growing. You can see that nodes are prioritizing active writes over replication. You can see that the TAP queues are growing fast. You can see when nodes start sending each other backoff messages. And finally, you can see that nodes begin returning temporary OOM errors.</li>
<li>Without monitoring, your first warning will be your application getting a spectacular number of "temporary failure" errors.</li>
<li>Always - always! - perform full-scale load testing with your real dataset before trying anything in production. Then double, or triple, the load and see how long it takes the cluster to begin to fail. Notice all the warning signs, and under what conditions the cluster fails, then draw appropriate conclusions about your future provisioning needs.</li>
<li>And finally, <strong>do not</strong> underprovision your Couchbase cluster!</li>
</ul>]]></description><link>http://CodeHardBlog.azurewebsites.net/lessons-in-couchbase-the-danger-of-underprovisioning/</link><guid isPermaLink="false">fe088e14-514b-4c6f-a6f5-5b65815a0013</guid><category><![CDATA[Couchbase]]></category><category><![CDATA[Couchbase Server]]></category><category><![CDATA[NoSQL]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[David Ostrovsky]]></dc:creator><pubDate>Thu, 06 Mar 2014 19:07:02 GMT</pubDate></item><item><title><![CDATA[Exporting data from Couchbase Server]]></title><description><![CDATA[<p>First post about Couchbase Server - gotta start somewhere, so I'll briefly talk about a use case that came up recently.</p>

<p>Sometimes, you want to export your data from Couchbase. It might be because you want to have a backup in another system, or because you want to use the data for another process. Whatever the reason, you need the data in a format that's accessible without having a live Couchbase cluster up and without going through the SDKs.</p>

<p>There's actually a very simple way to do this, which relies on the fact that Couchbase backup utilities store data in SQLite database file format. You can find all the utilities in the Couchbase bin directory. <br />
On Windows: \&lt;install dir&gt;\Couchbase\Server\bin <br />
On Linux: /opt/couchbase/bin</p>

<p>Step 1: Make a backup of your data with either cbbackup or cbtransfer. If you have a live cluster:</p>

<pre><code>&gt; cbbackup http://&lt;host&gt;:8091 &lt;backup_folder&gt; -u &lt;user&gt; -p &lt;pass&gt; -b &lt;bucket&gt;
</code></pre>

<p>If your Couchbase server is down, you can specify the data directory instead of the host URL, using the couchstore-files protocol:</p>

<pre><code>&gt; cbbackup couchstore-files:///opt/couchbase/var/lib/couchbase/data  &lt;backup_folder&gt; -u &lt;user&gt; -p &lt;pass&gt; -b &lt;bucket&gt;
</code></pre>

<p>Note that the latter will only back up the data of that particular node. Also, if you haven't specified another location for the data files, you'll need admin/root permissions to access it.</p>

<p>Step 2: Go to the <code>&lt;backup_folder&gt;</code> directory,  inside you'll find a folder named <code>bucket-&lt;bucket&gt;</code>, and inside it a subfolder per node in the cluster. In each of the node subfolders, there are *.cbb files, which hold the backup data for that bucket/node.</p>

<p>Despite the *.cbb extension, these are plain SQLite files, which you can open with any SQLite browser utility, or the appropriate SQLite driver for your language/platform.</p>
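<p>For example, here's a minimal Python sketch that opens a *.cbb file with the built-in sqlite3 module and dumps a few rows from each table (the path and function name are mine, and since the internal schema isn't documented here, the table names are discovered via <code>sqlite_master</code> rather than hard-coded):</p>

```python
import sqlite3

def dump_cbb(path, limit=5):
    """List the tables in a *.cbb SQLite file and print a few rows from each.

    The backup schema isn't documented, so we discover table names through
    sqlite_master instead of assuming them.
    """
    conn = sqlite3.connect(f"file:{path}?mode=ro", uri=True)  # read-only
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'")]
    for table in tables:
        for row in conn.execute(f'SELECT * FROM "{table}" LIMIT ?', (limit,)):
            print(table, row)
    conn.close()
    return tables

# Hypothetical path produced by cbbackup:
# dump_cbb("backup_folder/bucket-default/node-10.0.0.4/data-0000.cbb")
```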

<p>You can grab a free, open-source, SQLite editor here: <a href='http://sourceforge.net/projects/sqlitebrowser/' >http://sourceforge.net/projects/sqlitebrowser/</a></p>

<p><img src='http://codehardblog.azurewebsites.net/content/images/2014/Feb/couchbase_backup.png'  alt="Couchbase backup data in SQLite format" /></p>

<p>And there you go - all your data, available for use outside of Couchbase.</p>