The following examples use the Thrift API directly. You will need the following libraries at a minimum:

  • blur-thrift-*.jar
  • blur-util-*.jar
  • slf4j-api-1.6.1.jar
  • slf4j-log4j12-1.6.1.jar
  • commons-logging-1.1.1.jar
  • log4j-1.2.15.jar

Note

Other versions of these libraries could work, but these are the versions that Blur currently uses.

Getting A Client Example


Connection String

The connection string can be parsed or constructed through the "Connection" object. If you use the parsed form, there are several options. At a minimum you must provide a hostname and port:
host1:40010
You can list multiple hosts:
host1:40010,host2:40010
You can add a SOCKS proxy server for each host:
host1:40010/proxyhost1:6001
You can also set a socket timeout, given in milliseconds; here 90 seconds (the default is 60 seconds):
host1:40010/proxyhost1:6001#90000
Multiple hosts with a different timeout:
host1:40010,host2:40010,host3:40010#90000
Here are all the options together:
host1:40010/proxyhost1:6001,host2:40010/proxyhost1:6001#90000
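For instance, a full connection string can be handed to the Connection object and then used to obtain a client, just as in Client Example 2 below; the hosts, proxy, and timeout values here are illustrative:
// A sketch of parsing a full connection string.
Connection connection = new Connection("host1:40010/proxyhost1:6001,host2:40010/proxyhost1:6001#90000");
Iface client = BlurClient.getClient(connection);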

Thrift Client

Client Example 1:
Iface client = BlurClient.getClient("controller1:40010,controller2:40010");
Client Example 2:
Connection connection = new Connection("controller1:40010");
Iface client = BlurClient.getClient(connection);
Client Example 3:
BlurClientManager.execute("controller1:40010,controller2:40010", new BlurCommand<T>() {
  @Override
  public T call(Client client) throws BlurException, TException {
	// your code here...
  }
});
Client Example 4:
List<Connection> connections = BlurClientManager.getConnections("controller1:40010,controller2:40010");
BlurClientManager.execute(connections, new BlurCommand<T>() {
  @Override
  public T call(Client client) throws BlurException, TException {
    // your code here...
    return null;
  }
});
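To make the generic parameter concrete, here is a sketch of a typed command that retrieves the list of table names; it assumes the Thrift client's tableList method:
// A typed BlurCommand that returns the table list.
List<String> tables = BlurClientManager.execute("controller1:40010,controller2:40010",
    new BlurCommand<List<String>>() {
      @Override
      public List<String> call(Client client) throws BlurException, TException {
        return client.tableList();
      }
    });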

Query Example

This is a simple example of how to run a query via the Thrift API and get back search results. By default the first 10 results are returned, with only the row ids of the results.

Iface client = BlurClient.getClient("controller1:40010,controller2:40010");

Query query = new Query();
query.setQuery("+docs.body:\"Hadoop is awesome\"");

BlurQuery blurQuery = new BlurQuery();
blurQuery.setQuery(query);

BlurResults results = client.query("table1", blurQuery);
System.out.println("Total Results: " + results.totalResults);
for (BlurResult result : results.getResults()) {
  System.out.println(result);
}
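If you need more than the default 10 results, paging can be controlled through the start and fetch fields on the BlurQuery struct; a small sketch continuing the example above:
// Fetch the second page of 25 results; start and fetch are fields
// on the BlurQuery Thrift struct.
blurQuery.setStart(25);
blurQuery.setFetch(25);
BlurResults page = client.query("table1", blurQuery);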

Query Example with Data

This is an example of how to run a query via the Thrift API and get back search results with data. All the columns in the "fam0" family, plus the "col1" and "col2" columns of the "fam1" family, are returned for each Record in the Row.

Iface client = BlurClient.getClient("controller1:40010,controller2:40010");

Query query = new Query();
query.setQuery("+docs.body:\"Hadoop is awesome\"");

Selector selector = new Selector();

// This will fetch all the columns in family "fam0".
selector.addToColumnFamiliesToFetch("fam0");

// This will fetch the "col1", "col2" columns in family "fam1".
Set<String> cols = new HashSet<String>();
cols.add("col1");
cols.add("col2");
selector.putToColumnsToFetch("fam1", cols);

BlurQuery blurQuery = new BlurQuery();
blurQuery.setQuery(query);
blurQuery.setSelector(selector);

BlurResults results = client.query("table1", blurQuery);
System.out.println("Total Results: " + results.totalResults);
for (BlurResult result : results.getResults()) {
  System.out.println(result);
}

Fetch Data

This is an example of how to fetch data via the Thrift API. All the records of the Row "rowid1" are returned. If the row is not found, the Row will be null.

Iface client = BlurClient.getClient("controller1:40010,controller2:40010");

Selector selector = new Selector();
selector.setRowId("rowid1");

FetchResult fetchRow = client.fetchRow("table1", selector);
FetchRowResult rowResult = fetchRow.getRowResult();
Row row = rowResult.getRow();
for (Record record : row.getRecords()) {
  System.out.println(record);
}
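A Selector can also target a single Record rather than a whole Row. This sketch assumes the recordId and recordOnly fields on Selector, and reads the record from the FetchRecordResult side of the FetchResult:
// Fetch only the record "recordid1" from row "rowid1".
Selector recordSelector = new Selector();
recordSelector.setRowId("rowid1");
recordSelector.setRecordId("recordid1");
recordSelector.setRecordOnly(true);

FetchResult fetchRecord = client.fetchRow("table1", recordSelector);
FetchRecordResult recordResult = fetchRecord.getRecordResult();
System.out.println(recordResult.getRecord());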

Mutate Example

This is an example of how to perform a mutate on a table and either add or replace an existing Row.

Iface client = BlurClient.getClient("controller1:40010,controller2:40010");

Record record1 = new Record();
record1.setRecordId("recordid1");
record1.setFamily("fam0");
record1.addToColumns(new Column("col0", "val0"));
record1.addToColumns(new Column("col1", "val1"));
    
Record record2 = new Record();
record2.setRecordId("recordid2");
record2.setFamily("fam1");
record2.addToColumns(new Column("col4", "val4"));
record2.addToColumns(new Column("col5", "val5"));
    
List<RecordMutation> recordMutations = new ArrayList<RecordMutation>();
    
recordMutations.add(new RecordMutation(RecordMutationType.REPLACE_ENTIRE_RECORD, record1));
recordMutations.add(new RecordMutation(RecordMutationType.REPLACE_ENTIRE_RECORD, record2));

// This will replace the existing Row of "rowid1" (if one exists) in table "table1". It will
// write the mutation to the write ahead log (WAL) and it will not block waiting for the
// mutation to become visible.
RowMutation mutation = new RowMutation("table1", "rowid1", true, RowMutationType.REPLACE_ROW,
                                       recordMutations, false);
    
client.mutate(mutation);
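Deleting a Row uses the same mutate call; the following sketch assumes the DELETE_ROW mutation type and that no record mutations are needed for a delete:
// This will delete the entire Row "rowid1" from table "table1",
// writing to the WAL and not blocking for visibility.
RowMutation delete = new RowMutation("table1", "rowid1", true, RowMutationType.DELETE_ROW,
                                     new ArrayList<RecordMutation>(), false);
client.mutate(delete);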

Shortened Mutate Example

This is the same example as above but shortened with a helper class.

import static org.apache.blur.thrift.util.BlurThriftHelper.*;

Iface client = BlurClient.getClient("controller1:40010,controller2:40010");

// This will replace the existing Row of "rowid1" (if one exists) in table "table1". It will
// write the mutation to the write ahead log (WAL) and it will not block waiting for the
// mutation to become visible.
RowMutation mutation = newRowMutation("table1", "rowid1",
    newRecordMutation("fam0", "recordid1", newColumn("col0", "val0"), newColumn("col1", "val1")),
    newRecordMutation("fam1", "recordid2", newColumn("col4", "val4"), newColumn("col5", "val5")));

client.mutate(mutation);

Shell

The shell can be invoked by running:

$BLUR_HOME/bin/blur shell
Any shell command can also be invoked as a CLI command by running:
$BLUR_HOME/bin/blur <command>
# For example to get help
$BLUR_HOME/bin/blur help
The following rules are used when interacting with the shell:
  • Arguments are denoted by "< >".
  • Optional arguments are denoted by "[ ]".
  • Options are denoted by "-".
  • Multiple options / arguments are denoted by "*".

Table Commands

create

Description: Create the named table. Run -h for full argument list.

create -t <tablename> -c <shardcount>

enable

Description: Enable the named table.

enable <tablename>

disable

Description: Disable the named table.

disable <tablename>

remove

Description: Remove the named table.

remove <tablename>

truncate

Description: Truncate the named table.

truncate <tablename>

describe

Description: Describe the named table.

describe <tablename>

list

Description: List tables.

list 

schema

Description: Schema of the named table.

schema <tablename>

stats

Description: Print stats for the named table.

stats <tablename>

layout

Description: List the server layout for a table.

layout <tablename>

parse

Description: Parse a query and return string representation.

parse <tablename> <query>

definecolumn

Description: Defines a new column in the named table.

definecolumn <table name> <family> <column name> <type> [-s <sub column name>] [-F] [-p name value]*

Data Commands

query

Description: Query the named table.

query <tablename> <query>

get

Description: Display the specified row.

get <tablename> <rowid>

mutate

Description: Mutate the specified row.

mutate <tablename> <rowid> <recordid> <columnfamily> <columnname>:<value>*

delete

Description: Delete the specified row.

delete <tablename> <rowid>

highlight

Description: Toggle highlight of query output on/off.

highlight 

selector

Description: Manage the default selector.

selector reset | add <family> [<columnName>*]

Cluster Commands

controllers

Description: List controllers.

controllers 

shards

Description: List shards.

shards <clustername>

clusterlist

Description: List the clusters.

clusterlist 

cluster

Description: Set the cluster in use.

cluster <clustername>

safemodewait

Description: Wait for safe mode to exit.

safemodewait [<clustername>]

top

Description: Top for watching shard clusters.

top [<cluster>]

Shell Commands

help

Description: Display help.

help 

debug

Description: Toggle debugging on/off.

debug 

timed

Description: Toggle timing of commands on/off.

timed 

quit

Description: Exit the shell.

quit 

reset

Description: Resets the terminal window.

reset 

Map Reduce

Here is an example of the typical usage of the BlurOutputFormat. The Blur table has to be created before the MapReduce job is started. The setupJob method configures the following:

  • Sets the reducer class to DefaultBlurReducer
  • Sets the number of reducers equal to the number of shards in the table
  • Sets the output key class to the standard Text writable from the Hadoop library
  • Sets the output value class to the BlurMutate writable from the Blur library
  • Sets the output format to BlurOutputFormat
  • Sets the TableDescriptor in the Configuration
  • Sets the output path to the TableDescriptor.getTableUri() value
  • Configures the job to use the BlurOutputCommitter class to commit or roll back the MapReduce job

Example Usage

Iface client = BlurClient.getClient("controller1:40010");

TableDescriptor tableDescriptor = client.describe(tableName);

Job job = new Job(jobConf, "blur index");
job.setJarByClass(BlurOutputFormatTest.class);
job.setMapperClass(CsvBlurMapper.class);
job.setInputFormatClass(TextInputFormat.class);

FileInputFormat.addInputPath(job, new Path(input));
CsvBlurMapper.addColumns(job, "cf1", "col");

BlurOutputFormat.setupJob(job, tableDescriptor);
BlurOutputFormat.setIndexLocally(job, true);
BlurOutputFormat.setOptimizeInFlight(job, true);

job.waitForCompletion(true);

Options

  • BlurOutputFormat.setIndexLocally(Job,boolean)
    • Enabled by default, this will enable local indexing on the machine where the task is running. When the RecordWriter closes, the index is copied to the remote destination in HDFS.
  • BlurOutputFormat.setMaxDocumentBufferSize(Job,int)
    • Sets the maximum number of documents that the buffer will hold in memory before overflowing to disk. By default this is 1000, which will probably be very low for most systems.
  • BlurOutputFormat.setOptimizeInFlight(Job,boolean)
    • Enabled by default, this will optimize the index while copying from the local index to the remote destination in HDFS. Used in conjunction with setIndexLocally.
  • BlurOutputFormat.setReducerMultiplier(Job,int)
    • This will multiply the number of reducers for this job. For example, if the table has 256 shards the normal number of reducers is 256; however, if the reducer multiplier is set to 4 then the number of reducers will be 1024 and each shard will get 4 new segments instead of the normal 1.
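Tying the options above together, a job might set all four in one place. The values below are illustrative, not recommendations:
// Illustrative settings only; sensible values depend on the cluster.
BlurOutputFormat.setIndexLocally(job, true);          // index on the local task node first
BlurOutputFormat.setMaxDocumentBufferSize(job, 5000); // buffer 5000 documents before spilling
BlurOutputFormat.setOptimizeInFlight(job, true);      // optimize while copying to HDFS
BlurOutputFormat.setReducerMultiplier(job, 4);        // 4 reducers (and segments) per shard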

CSV Loader

The CSV Loader program can be invoked by running:

$BLUR_HOME/bin/blur csvloader

Caution

The machine that executes this command needs to have Hadoop installed and configured locally; otherwise the scripts will not work correctly.
usage: csvloader
The "csvloader" command is used to load delimited into a Blur table.
The required options are "-c", "-t", "-d". The standard format for the contents of a file
is:"rowid,recordid,family,col1,col2,...". However there are several options, such as the rowid and
recordid can be generated based on the data in the record via the "-A" and "-a" options. The family
can assigned based on the path via the "-I" option. The column name order can be mapped via the "-d"
option. Also you can set the input format to either sequence files vie the "-S" option or leave the
default text files.
 -A                     No Row Ids - Automatically generate row ids for each record based on an
                         MD5 hash of the data within the record.
 -a                     No Record Ids - Automatically generate record ids for each record based on
                         an MD5 hash of the data within the record.
 -b <size>              The maximum number of Lucene documents to buffer in the reducer for a single
                        row before spilling over to disk. (default 1000)
 -c <controller*>       * Thrift controller connection string. (host1:40010 host2:40010 ...)
 -C <minimum maximum>   Enables a combine file input to help deal with many small files as the
                        input. Provide the minimum and maximum size per mapper.  For a minimum of
                        1GB and a maximum of 2.5GB: (1000000000 2500000000)
 -d <family column*>    * Define the mapping of fields in the CSV file to column names. (family col1
                        col2 col3 ...)
 -I <family path*>      The directory to index with a family name, the family name is assumed to NOT
                        be present in the file contents. (family hdfs://namenode/input/in1)
 -i <path*>             The directory to index, the family name is assumed to BE present in the file
                        contents. (hdfs://namenode/input/in1)
 -l                     Disable the use of local storage on the server that is running the
                         reducing task; normally the index is built locally and then copied to the
                         Blur table once complete. (enabled by default)
 -o                     Disable optimizing the indexes during the copy; this has very little
                         overhead. (enabled by default)
 -p <codec>             Sets the compression codec for the map compress output setting.
                        (SNAPPY,GZIP,BZIP,DEFAULT, or classname)
 -r <multiplier>        The reducer multiplier allows for an increase in the number of reducers per
                        shard in the given table.  For example if the table has 128 shards and the
                        reducer multiplier is 4 the total number of reducers will be 512, 4 reducers
                        per shard. (default 1)
 -s <delimiter>         The file delimiter to be used. (default value ',')  NOTE: For special
                         characters like the default Hadoop separator of ASCII value 1, you can use
                        standard java escaping (\u0001)
 -S                     The input files are sequence files.
 -t <tablename>         * Blur table name.
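
Putting the required options together, an invocation might look like the following; the controller, table name, column mapping, and input path are illustrative:
$BLUR_HOME/bin/blur csvloader -c controller1:40010 -t table1 \
    -d fam0 col1 col2 -i hdfs://namenode/input/in1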

JDBC

The JDBC driver is very experimental and is currently read-only. It has a very basic SQL-ish language that should allow for most Blur queries. Basic SQL syntax will work, for example:

select * from testtable where fam1.col1 = 'val1'
You may also use Lucene syntax by wrapping the Lucene query in a "query()" function:
select * from testtable where query(fam1.col1:val?)
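Programmatic use follows standard JDBC. In this sketch the driver class name and JDBC URL format are assumptions for illustration only; check the driver's documentation for the real values. Only the java.sql calls are standard:
// Hypothetical driver class name and URL format; assumptions, not the
// documented API.
Class.forName("org.apache.blur.jdbc.BlurJdbcDriver"); // assumed class name
java.sql.Connection conn =
    DriverManager.getConnection("jdbc:blur://controller1:40010"); // assumed URL format
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("select * from testtable where fam1.col1 = 'val1'");
while (rs.next()) {
  System.out.println(rs.getString(1));
}
rs.close();
stmt.close();
conn.close();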
[Screenshot: the JDBC driver running in the SQuirreL SQL client]