You will at a minimum need the following:

  • Java 6 or Java 7 (Java 7 is recommended)
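One way to confirm the Java requirement is to parse the output of java -version. The sketch below assumes the legacy version format (for example 1.7.0_80) used by Java 6 and Java 7; the function names are illustrative and are not part of Blur.

```shell
# Hypothetical helpers (not part of Blur) for checking the Java requirement.

# Print the major version for a version string, e.g. 1.7.0_80 gives 7.
java_major_version() {
  case "$1" in
    1.*) printf '%s\n' "$1" | cut -d. -f2 ;;  # legacy scheme: 1.<major>.x
    *)   printf '%s\n' "$1" | cut -d. -f1 ;;  # newer scheme: <major>.x
  esac
}

# Print the version string reported by the "java" found on the PATH.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}
```

Running java_major_version "$(installed_java_version)" should print 6 or 7 on a supported installation.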

Set up passphraseless SSH

These instructions are taken from the Hadoop Quick Start Guide.

Now check that you can ssh to the localhost without a passphrase:

ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
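The check above can also be scripted so that it never stops to prompt. This is a minimal sketch; the function name is hypothetical, and it relies on the standard OpenSSH BatchMode and ConnectTimeout options.

```shell
# Hypothetical helper: succeeds only if ssh can log in to the given host
# without prompting for a password or passphrase.
can_ssh_without_passphrase() {
  # BatchMode=yes makes ssh fail instead of prompting for credentials.
  ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true >/dev/null 2>&1
}
```

Calling can_ssh_without_passphrase localhost returns a zero exit status when key-based login works, so it can guard the key generation commands in a setup script.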

Heads Up!

You will also need to know the location of your JAVA_HOME directory.

Download Source and Binary Artifacts

Both the source and binary artifacts are provided via mirrors here:

  • Apache Blur 0.2.3 Source
  • Apache Blur 0.2.3 Hadoop1 Binary
  • Apache Blur 0.2.3 Hadoop2 Binary

If building from source, the distribution needs to be compiled before use.

Clone master

git clone https://git-wip-us.apache.org/repos/asf/incubator-blur.git

Hadoop 1

Build the artifacts for Hadoop 1 (to run the tests, remove "-DskipTests"):

cd incubator-blur/
mvn install -DskipTests -Dhadoop1

The binary artifact is located at distribution/target/apache-blur-0.2.3-incubating-hadoop1-bin.tar.gz.

Hadoop 2

Heads Up!

While all the tests pass on Hadoop 2, Blur has not been tested at scale on Hadoop 2, and the bin/blur-config.sh script will likely require modification to include the correct Hadoop 2 libraries.

Build the artifacts for Hadoop 2 (to run the tests, remove "-DskipTests"):

cd incubator-blur/
mvn install -DskipTests -Dhadoop2

The binary artifact is located at distribution/target/apache-blur-0.2.3-incubating-hadoop2-bin.tar.gz.
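The two build variants differ only in the profile flag and the artifact name, so they can be wrapped in one small script. This is a sketch, not part of the Blur distribution; the function names are hypothetical, and build_blur is meant to be run from the incubator-blur checkout.

```shell
# Hypothetical helpers wrapping the build steps described above.

# Print the path of the binary artifact for a profile (hadoop1 or hadoop2).
blur_artifact_path() {
  printf 'distribution/target/apache-blur-0.2.3-incubating-%s-bin.tar.gz\n' "$1"
}

# Build Blur for the given profile without running the tests, then
# report whether the expected artifact exists.
build_blur() {
  profile=$1
  mvn install -DskipTests "-D${profile}" || return 1
  artifact=$(blur_artifact_path "$profile")
  if [ -f "$artifact" ]; then
    echo "Built $artifact"
  else
    echo "Expected artifact not found: $artifact" >&2
    return 1
  fi
}
```

For example, build_blur hadoop1 runs the Hadoop 1 build and then checks for the tarball under distribution/target.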

Once a distribution is available, follow these steps to install it.

Extract the contents of the distribution

tar -xzvf apache-blur-*-bin.tar.gz

While it is not required, it is a good idea to set BLUR_HOME in your environment variables.

For bash edit .bash_profile and add:

export BLUR_HOME=<directory where Blur was extracted>
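The extraction and the BLUR_HOME value can be tied together so that the path never has to be typed by hand. The sketch below assumes the tarball contains a single top-level directory; both function names are hypothetical.

```shell
# Hypothetical helper: print the top-level directory inside a tarball
# (assumes the archive has a single top-level directory).
tarball_top_dir() {
  tar -tzf "$1" | awk -F/ 'NR == 1 { print $1 }'
}

# Extract the distribution into a target directory and print the
# resulting BLUR_HOME candidate.
extract_blur() {
  tarball=$1
  target=$2
  tar -xzf "$tarball" -C "$target" || return 1
  printf '%s/%s\n' "$target" "$(tarball_top_dir "$tarball")"
}
```

The printed path is the value to use on the export BLUR_HOME line in .bash_profile.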

At a minimum, a few things need to be configured before starting Apache Blur.

Edit $BLUR_HOME/conf/blur-env.sh and set JAVA_HOME:

export JAVA_HOME=<Java Home Directory>

Caution

If this variable is not set, then the script will attempt to locate JAVA_HOME by using the location of the "java" command.
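To see which directory a lookup based on the "java" command would likely produce, something like the following can be used. This is only an illustration of the idea, not the logic of Blur's own script; the function names are hypothetical, and guess_java_home assumes a readlink that supports -f (GNU coreutils).

```shell
# Hypothetical helper: given the resolved path of a java binary,
# print the directory two levels up (strips the trailing /bin/java).
java_home_from_binary() {
  dirname "$(dirname "$1")"
}

# Resolve the "java" on the PATH through any symlinks, then derive
# a JAVA_HOME candidate from it.
guess_java_home() {
  java_bin=$(command -v java) || return 1
  java_home_from_binary "$(readlink -f "$java_bin")"
}
```

On some installations the resolved binary lives under a jre subdirectory, so the result may point at the JRE rather than the JDK; setting JAVA_HOME explicitly in blur-env.sh avoids that ambiguity.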

Starting Apache Blur is a simple one-command step.

To start Apache Blur run the following command:

$BLUR_HOME/bin/start-all.sh

This will start a single Controller server and a single Shard server on your localhost.

You should see:

blur@blurvm:~$ apache-blur-0.2.3-incubating/bin/start-all.sh 
localhost: ZooKeeper starting as process 6650.
localhost: Shard [0] starting as process 6783.
localhost: Controller [0] starting as process 6933.

To stop Apache Blur, run the stop command. You should see:

blur@blurvm:~$ apache-blur-0.2.3-incubating/bin/stop-all.sh
localhost: Stopping Controller [0] server with pid [6933].
localhost: Stopping Shard [0] server with pid [6783].
localhost: Stopping ZooKeeper with pid [6650].

If you run the start command a second time while the servers should already be running and you see it starting the servers again, then there is likely some issue with startup. Look in the $BLUR_HOME/logs directory for log and out files.
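When investigating a startup problem, the most recently written files in the logs directory are a reasonable place to begin. The helper below is a sketch with a hypothetical name; it makes no assumption about how Blur names its log files.

```shell
# Hypothetical helper: list the N most recently modified files in a
# log directory, newest first.
recent_logs() {
  ls -t "$1" | awk -v n="$2" 'NR <= n'
}
```

For example, recent_logs "$BLUR_HOME/logs" 5 prints the five newest file names, which can then be inspected with tail.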

Once the servers have been started, you can use the shell to interact with Blur.

The shell command can be found in the bin directory.

Auto-detect the controller servers from the $BLUR_HOME/conf/controllers file:

$BLUR_HOME/bin/blur shell

You can also explicitly call out the controller servers.

$BLUR_HOME/bin/blur shell controller1:40010,controller2:40010
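If the explicit form is needed, the host:port list can be generated rather than typed. The sketch below assumes the controllers file lists one hostname per line and that lines starting with # are comments; both are assumptions, as is the function name. The port 40010 is taken from the example above.

```shell
# Hypothetical helper: turn a file of controller hostnames (assumed to be
# one per line) into a comma-separated host:port list for "blur shell".
controller_list() {
  awk -v port="$2" '
    NF && $1 !~ /^#/ {            # skip blank lines and comment lines
      printf "%s%s:%s", sep, $1, port
      sep = ","
    }
    END { print "" }
  ' "$1"
}
```

It could then be used as $BLUR_HOME/bin/blur shell "$(controller_list "$BLUR_HOME/conf/controllers" 40010)".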

Once in the shell, tables can be created, enabled, disabled, and removed. Type help to get a list of the commands.

The examples below create a table. Storing the contents of the table in a local directory such as /data/testtable will only work if you are running Blur as a single instance. Normally, if you are running a Hadoop cluster, the location will be an HDFS URI, for example hdfs://host:port/blur/tables/testTableName.

Create Table

blur> #Creates a table called testtable in the hdfs directory of /data/testtable with 11 shards
blur> create -t testtable -c 11 -l hdfs://namenode/data/testtable

Note

The local directory can be used; however, the integrity of the data may be compromised.

blur> #Creates a table called testtable in the local directory of /data/testtable with 11 shards
blur> create -t testtable -c 11 -l file:///data/testtable

Mutate

blur> #Adds a row to testtable
blur> mutate testtable rowid1 recordid1 fam0 col1:value1
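For more than a handful of rows, the mutate lines can be generated from a simple data file and then entered in the Blur shell. This is a sketch with a hypothetical function name; it assumes comma-separated input of the form rowid,recordid,family,column,value with no commas or spaces inside the fields, and it only produces the single-column form of the command shown above.

```shell
# Hypothetical helper: read "rowid,recordid,family,column,value" lines from
# standard input and print one mutate command per line for the given table.
mutate_commands() {
  awk -F, -v table="$1" 'NF == 5 {
    printf "mutate %s %s %s %s %s:%s\n", table, $1, $2, $3, $4, $5
  }'
}
```

Lines that do not have exactly five fields are skipped rather than turned into malformed commands.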

Query

blur> #Runs a query on testtable
blur> query testtable fam0.col1:value1
 - Results Summary -
    total : 1
    time  : 7.874 ms
-----------------------------------------------------------------------------------------------------
      hit : 0
    score : 1.4142135381698608
       id : rowid1
 recordId : recordid1
   family : fam0
     col1 : value1
-----------------------------------------------------------------------------------------------------
 - Results Summary -
    total : 1
    time  : 7.874 ms

Enable Highlighting

blur> #Turns highlighting on
blur> highlight
highlight of query command is now on

Query with Highlights

blur> #Runs a query on testtable with highlighting on, notice <<<value1>>> is highlighted
blur> query testtable fam0.col1:value1
 - Results Summary -
    total : 1
    time  : 13.395 ms
-----------------------------------------------------------------------------------------------------
      hit : 0
    score : 1.4142135381698608
       id : rowid1
 recordId : recordid1
   family : fam0
     col1 : <<<value1>>>
-----------------------------------------------------------------------------------------------------
 - Results Summary -
    total : 1
    time  : 13.395 ms
blur>