Enumerations

Enumeration: ErrorType

The error type of a BlurException.
UNKNOWN: Unknown error.
QUERY_CANCEL: Query has been canceled.
QUERY_TIMEOUT: Query has timed out.
BACK_PRESSURE: Server has run out of memory and is trying to prevent a failure.
REQUEST_TIMEOUT: The TCP connection has timed out.

UNKNOWN = 0
QUERY_CANCEL = 1
QUERY_TIMEOUT = 2
BACK_PRESSURE = 3
REQUEST_TIMEOUT = 4

Enumeration: ScoreType

The scoring type used during a SuperQuery to score multi Record hits within a ColumnFamily.
SUPER: During a multi Record match, a calculation of the best match Record plus how often it occurs within the match Row produces the score that is used in the scoring of the SuperQuery.
AGGREGATE: During a multi Record match, the aggregate score of all the Records within a ColumnFamily is used in the scoring of the SuperQuery.
BEST: During a multi Record match, the best score of all the Records within a ColumnFamily is used in the scoring of the SuperQuery.
CONSTANT: A constant score of 1 is used in the scoring of the SuperQuery.

SUPER = 0
AGGREGATE = 1
BEST = 2
CONSTANT = 3
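
As a rough illustration of the score types whose arithmetic is stated above, the following Python sketch combines per-Record scores for one ColumnFamily. It assumes that "aggregate" means a plain sum; SUPER is left unimplemented because its exact formula is not given here. The helper name and the list-of-floats input are illustrative and not part of the Blur API.

```python
from enum import IntEnum


class ScoreType(IntEnum):
    """Mirrors the documented enum values."""
    SUPER = 0
    AGGREGATE = 1
    BEST = 2
    CONSTANT = 3


def combine_record_scores(score_type, record_scores):
    """Combine per-Record scores (hypothetical helper, not a Blur call)."""
    if score_type == ScoreType.AGGREGATE:
        return sum(record_scores)  # assumption: aggregate means sum
    if score_type == ScoreType.BEST:
        return max(record_scores)
    if score_type == ScoreType.CONSTANT:
        return 1.0
    raise NotImplementedError("the SUPER formula is not specified here")
```

For example, `combine_record_scores(ScoreType.BEST, [0.5, 1.5])` picks the highest Record score, while CONSTANT ignores the Record scores entirely.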

Enumeration: QueryState

The state of a query.
RUNNING: Query is running.
INTERRUPTED: Query has been interrupted.
COMPLETE: Query is complete.

RUNNING = 0
INTERRUPTED = 1
COMPLETE = 2
BACK_PRESSURE_INTERRUPTED = 3

Enumeration: Status

NOT_FOUND: Blur status UUID is not found.
FOUND: Blur status UUID is present.

NOT_FOUND = 0
FOUND = 1

Enumeration: RowMutationType

Specifies the type of Row mutation that should occur during a mutation of a given Row.
DELETE_ROW: Indicates that the entire Row is to be deleted. No changes are made if the specified row does not exist.
REPLACE_ROW: Indicates that the entire Row is to be deleted, and then a new Row with the same id is to be added. If the specified row does not exist, the new row will still be created.
UPDATE_ROW: Indicates that mutations of the underlying Records will be processed individually. Mutation will result in a BlurException if the specified row does not exist.

DELETE_ROW = 0
REPLACE_ROW = 1
UPDATE_ROW = 2

Enumeration: RecordMutationType

Specifies the type of Record mutation that should occur during a mutation of a given Record.
DELETE_ENTIRE_RECORD: Indicates the Record with the given recordId in the given Row is to be deleted. If the target record does not exist, then no changes are made.
REPLACE_ENTIRE_RECORD: Indicates the Record with the given recordId in the given Row is to be deleted, and a new Record with the same id is to be added. If the specified record does not exist, the new record is still added.
REPLACE_COLUMNS: Replace the columns that are specified in the Record mutation. If the target record does not exist, then this mutation will result in a BlurException.
APPEND_COLUMN_VALUES: Append the columns in the Record mutation to the Record that could already exist. If the target record does not exist, then this mutation will result in a BlurException.

DELETE_ENTIRE_RECORD = 0
REPLACE_ENTIRE_RECORD = 1
REPLACE_COLUMNS = 2
APPEND_COLUMN_VALUES = 3
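
The missing-row and missing-record rules above can be sketched against a plain in-memory dictionary. This is only a model of the documented behavior, not how Blur applies mutations: the function names, the tuple layout of a record mutation, and the `BlurException` stand-in class are all hypothetical. It also assumes REPLACE_COLUMNS drops only the columns whose names appear in the mutation.

```python
class BlurException(Exception):
    """Stand-in for the Thrift-generated exception."""


def mutate_record(row, record_id, mutation_type, columns=()):
    """Apply one record mutation; row maps record id -> list of (name, value)."""
    if mutation_type == "DELETE_ENTIRE_RECORD":
        row.pop(record_id, None)  # missing record: no change
    elif mutation_type == "REPLACE_ENTIRE_RECORD":
        row[record_id] = list(columns)  # added even if it did not exist
    elif mutation_type in ("REPLACE_COLUMNS", "APPEND_COLUMN_VALUES"):
        if record_id not in row:
            raise BlurException("record does not exist: " + record_id)
        if mutation_type == "REPLACE_COLUMNS":
            replaced = {name for name, _ in columns}
            # Assumption: only columns named in the mutation are dropped.
            row[record_id] = [c for c in row[record_id] if c[0] not in replaced]
        row[record_id].extend(columns)
    else:
        raise ValueError("unknown record mutation type: " + mutation_type)


def mutate_row(table, row_id, mutation_type, record_mutations=()):
    """Apply one row mutation; table maps row id -> row."""
    if mutation_type == "DELETE_ROW":
        table.pop(row_id, None)  # missing row: no change
        return
    if mutation_type == "REPLACE_ROW":
        table[row_id] = {}  # old row dropped, new row created either way
    elif mutation_type == "UPDATE_ROW":
        if row_id not in table:
            raise BlurException("row does not exist: " + row_id)
    else:
        raise ValueError("unknown row mutation type: " + mutation_type)
    for record_id, record_type, columns in record_mutations:
        mutate_record(table[row_id], record_id, record_type, columns)
```

Each record mutation is passed as a `(record_id, mutation_type, columns)` tuple, so an UPDATE_ROW call processes its Records one at a time, as the description of UPDATE_ROW suggests.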

Enumeration: ShardState

The shard state; see the shardServerLayoutState method in the Blur service for details.
OPENING: The shard is opening.
OPEN: The shard is open.
OPENING_ERROR: An error occurred during the opening of the shard.
CLOSING: The shard is in the process of closing.
CLOSED: The shard is closed.
CLOSING_ERROR: An error occurred during the closing of the shard.

OPENING = 0
OPEN = 1
OPENING_ERROR = 2
CLOSING = 3
CLOSED = 4
CLOSING_ERROR = 5


Data structures

Exception: BlurException

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | message | string | The message in the exception. | default | |
| 2 | stackTraceStr | string | The original stack trace (if any). | default | |
| 3 | errorType | ErrorType | | default | |

BlurException that carries a message plus the original stack trace (if any).

Struct: Column

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | name | string | The name of the column. | default | |
| 2 | value | string | The value to be indexed and stored. | default | |

Column is the lowest storage element in Blur; it stores a single name and value pair.

Struct: Record

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | recordId | string | Record id uniquely identifies a record within a single row. | default | |
| 2 | family | string | The family in which this record resides. | default | |
| 3 | columns | list<Column> | A list of columns; multiple columns with the same name are allowed. | default | |

Records contain a list of columns; multiple columns with the same name are allowed.

Struct: Row

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | id | string | The row id. | default | |
| 2 | records | list<Record> | The list of records within the row. If paging is used, this list will only reflect the paged records from the selector. | default | |
| 3 | recordCount | i32 | The total record count for the row. If paging is used in a selector to page through records of a row, this count will reflect the entire row. | default | |

Rows contain a list of records.

Struct: Query

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | query | string | A Lucene syntax based query. | default | |
| 2 | rowQuery | bool | If the Row query is on, the query will be performed against all the Records (joining records in some cases) and the result will be Rows (groupings of Records). | default | 1 |
| 3 | scoreType | ScoreType | The scoring type; see the documentation on ScoreType for an explanation of each score type. | default | UNKNOWN |
| 4 | rowFilter | string | The Row filter (normal Lucene syntax) is a filter performed after the join to filter out entire Rows from the results. This field is ignored when rowQuery is false. | default | |
| 5 | recordFilter | string | The Record filter (normal Lucene syntax) is a filter performed before the join to filter out Records from the results. | default | |

The Query object holds the query string (normal Lucene syntax), filters and type of scoring (used when super query is on).

Struct: HighlightOptions

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | query | Query | The original query is required if used in the Blur.fetchRow call. If highlightOptions is used in a call to Blur.query, then the Query passed into the call via the BlurQuery will be used if this query is null. This means that if you use highlighting from the query call, you can leave this attribute null and it will default to the normal behavior. | default | |
| 2 | preTag | string | The pre tag is the tag that marks the beginning of the highlighting. | default | "<<<" |
| 3 | postTag | string | The post tag is the tag that marks the end of the highlighting. | default | ">>>" |

The HighlightOptions controls how the data is fetched and returned.
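
To make the role of preTag and postTag concrete, here is a small Python sketch that wraps each occurrence of a term in the default tags. It is a plain string substitution written for illustration, not the Lucene highlighter that Blur uses, and the function name and case-insensitive matching are assumptions.

```python
import re


def highlight(text, term, pre_tag="<<<", post_tag=">>>"):
    """Wrap each case-insensitive occurrence of term in the given tags."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    return pattern.sub(lambda m: pre_tag + m.group(0) + post_tag, text)
```

Passing different `pre_tag` and `post_tag` values (for example HTML tags) changes only the markers, not which text is matched.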

Struct: Selector

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | recordOnly | bool | Fetch the Record only, not the entire Row. | default | |
| 2 | locationId | string | WARNING: This is an internal only attribute and is not intended for use by clients. The location id of the Record or Row to be fetched. | default | |
| 3 | rowId | string | The row id of the Row to be fetched, not to be used with location id. | default | |
| 4 | recordId | string | The record id of the Record to be fetched, not to be used with location id. However, the row id needs to be provided to locate the correct Row with the requested Record. | default | |
| 5 | columnFamiliesToFetch | list<string> | The column families to fetch. If null, fetch all. If empty, fetch none. | default | |
| 6 | columnsToFetch | map<string, set<string>> | The columns in the families to fetch. If null, fetch all. If empty, fetch none. | default | |
| 8 | startRecord | i32 | Only valid for Row fetches; the record in the row at which to start fetching. If the row contains 1000 records and you want the first 100, then this value is 0. If you want records 300-400, then this value would be 300. If startRecord is beyond the end of the row, the row will be null in the FetchResult. Used in conjunction with maxRecordsToFetch. | default | 0 |
| 9 | maxRecordsToFetch | i32 | Only valid for Row fetches; the number of records to fetch. If the row contains 1000 records and you want the first 100, then this value is 100. If you want records 300-400, then this value would be 100. Used in conjunction with startRecord. By default this will fetch the first 1000 records of the row. | default | 1000 |
| 10 | highlightOptions | HighlightOptions | The HighlightOptions object controls how the data is highlighted. If null, no highlighting will occur. | default | |

Selector carries the request for information to be retrieved from the stored columns.
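
The startRecord and maxRecordsToFetch arithmetic can be sketched with a list slice. The function below is a hypothetical model of the documented paging rule, with a Python list standing in for the records of one Row; it is not a Blur client call.

```python
def page_records(records, start_record=0, max_records_to_fetch=1000):
    """Return the requested page of records, or None when start is past the end."""
    if start_record >= len(records):
        return None  # models the row being null in the FetchResult
    return records[start_record:start_record + max_records_to_fetch]
```

With a Row of 1000 records, `page_records(records, 300, 100)` yields records 300 through 399, and a `start_record` of 1000 or more yields `None`.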

Struct: FetchRowResult

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | row | Row | The row fetched. | default | |

FetchRowResult contains the row result from a fetch.

Struct: FetchRecordResult

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | rowid | string | The row id of the record being fetched. | default | |
| 2 | record | Record | The record fetched. | default | |

FetchRecordResult contains the rowid of the record and the record result from a fetch.

Struct: FetchResult

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | exists | bool | True if the result exists, false if it doesn't. | default | |
| 2 | deleted | bool | If the row was marked as deleted. | default | |
| 3 | table | string | The table the fetch result came from. | default | |
| 4 | rowResult | FetchRowResult | The row result if a row was selected from the Selector. | default | |
| 5 | recordResult | FetchRecordResult | The record result if a record was selected from the Selector. | default | |

FetchResult contains the row or record fetch result, depending on whether the Selector was going to fetch the entire row or a single record.

Struct: Facet

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | queryStr | string | The facet query. | default | |
| 2 | minimumNumberOfBlurResults | i64 | The minimum number of results before no longer processing the facet. This is a good way to decrease the strain on the system while using many facets. For example, if you set this attribute to 1000, then the shard server will stop processing the facet at the 1000 mark. However, because this is processed at the shard server level, the controller will likely return more than the minimum because it sums the answers from the shard servers. | default | 9223372036854775807 |

Blur facet.
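
The following Python sketch shows why a controller can report more than minimumNumberOfBlurResults for a facet. It assumes each shard server stops counting exactly at the minimum and that the controller simply sums the shard answers, as the field description states; the function names and the per-shard hit counts are illustrative.

```python
def shard_facet_count(matching_hits, minimum):
    """Facet count reported by one shard server, capped at the minimum."""
    return min(matching_hits, minimum)


def controller_facet_count(hits_per_shard, minimum):
    """Controller total: the sum of the capped counts from every shard server."""
    return sum(shard_facet_count(hits, minimum) for hits in hits_per_shard)
```

With three shard servers holding 1500, 200 and 3000 matching hits and a minimum of 1000, the shard answers are 1000, 200 and 1000, so the controller reports 2200.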

Struct: BlurQuery

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | query | Query | The query information. | default | |
| 3 | facets | list<Facet> | A list of Facets to execute with the given query. | default | |
| 4 | selector | Selector | Selector is used to fetch data in the search results; if null, only location ids will be fetched. | default | |
| 6 | useCacheIfPresent | bool | Enabled by default to use a cached result if the query matches a previously run query within the configured amount of time. | default | 1 |
| 7 | start | i64 | The starting result position, 0 by default. | default | 0 |
| 8 | fetch | i32 | The number of fetched results, 10 by default. | default | 10 |
| 9 | minimumNumberOfResults | i64 | The minimum number of results to find before returning. | default | 9223372036854775807 |
| 10 | maxQueryTime | i64 | The maximum amount of time the query should execute before timing out. | default | 9223372036854775807 |
| 11 | uuid | string | Sets the uuid of this query. This is normally set by the client so that the status of a running query can be found or the query can be canceled. | default | |
| 12 | userContext | string | Sets a user context, only used for logging at this point. | default | |
| 13 | cacheResult | bool | Enabled by default to cache this result. False would not cache the result. | default | 1 |
| 14 | startTime | i64 | Sets the start time; if 0, the controller sets the time. | default | 0 |

The Blur Query object that contains the query that needs to be executed along with the query options.

Struct: BlurResult

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | locationId | string | WARNING: This is an internal only attribute and is not intended for use by clients. | default | |
| 2 | score | double | The score for the hit in the query. | default | |
| 3 | fetchResult | FetchResult | The fetched result, if any. | default | |

The BlurResult carries the score, the location id and the fetched result (if any) from each query.

Struct: BlurResults

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | totalResults | i64 | The total number of hits in the query. | default | 0 |
| 2 | shardInfo | map<string, i64> | Hit counts from each shard in the table. | default | |
| 3 | results | list<BlurResult> | The query results. | default | |
| 4 | facetCounts | list<i64> | The faceted count. | default | |
| 5 | exceptions | list<BlurException> | Not currently used; a future feature could allow for partial results with errors. | default | |
| 6 | query | BlurQuery | The original query. | default | |

BlurResults holds all information resulting from a query.

Struct: RecordMutation

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | recordMutationType | RecordMutationType | Define how to mutate the given Record. | default | UNKNOWN |
| 2 | record | Record | The Record to mutate. | default | |

The RecordMutation defines how the given Record is to be mutated.

Struct: RowMutation

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | table | string | The table that the row mutation is to act upon. | default | |
| 2 | rowId | string | The row id that the row mutation is to act upon. | default | |
| 3 | wal | bool | Write ahead log. By default all updates are written to a write ahead log before the update is applied. That way, if a failure occurs before the index is committed, the WAL can be replayed to recover any data that could have been lost. | default | 1 |
| 4 | rowMutationType | RowMutationType | The RowMutationType to define how to mutate the given Row. | default | UNKNOWN |
| 5 | recordMutations | list<RecordMutation> | The RecordMutations, if any, for this Row. | default | |
| 6 | waitToBeVisible | bool | On mutate, waits for the mutation to be visible to queries and fetch requests. | default | 0 |

The RowMutation defines how the given Row is to be mutated.

Struct: CpuTime

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | cpuTime | i64 | The total cpu time for the query on the given shard. | default | |
| 2 | realTime | i64 | The real time of the query execution for a given shard. | default | |

Holds the cpu time for a query executing on a single shard in a table.

Struct: BlurQueryStatus

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | query | BlurQuery | The original query. | default | |
| 2 | cpuTimes | map<string, CpuTime> | A map of shard names to CpuTime, one for each shard in the table. | default | |
| 3 | completeShards | i32 | The number of completed shards. The shard server will respond with how many are complete on that server, while the controller will aggregate all the shard server completed totals together. | default | |
| 4 | totalShards | i32 | The total number of shards that the query is executing against. The shard server will respond with how many shards are being queried on that server, while the controller will aggregate all the shard server totals together. | default | |
| 5 | state | QueryState | The state of the query, e.g. RUNNING, INTERRUPTED, COMPLETE. | default | |
| 6 | uuid | string | The uuid of the query. | default | |
| 7 | status | Status | The status of the query: NOT_FOUND if the uuid is not found, else FOUND. | default | |

The BlurQueryStatus object holds the status of BlurQueries: the state of the query (QueryState), the number of shards the query is executing against, the number of shards that are complete, etc.
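
The controller-side aggregation of completeShards and totalShards described above amounts to two sums. The sketch below models it in Python with a small dataclass standing in for the two counters of a BlurQueryStatus; the class and function names are hypothetical and the remaining status fields are left out.

```python
from dataclasses import dataclass


@dataclass
class ShardProgress:
    """The two shard counters of a query status (illustrative subset)."""
    complete_shards: int
    total_shards: int


def aggregate_progress(server_statuses):
    """Sum the per-shard-server counters, as a controller would report them."""
    complete = sum(s.complete_shards for s in server_statuses)
    total = sum(s.total_shards for s in server_statuses)
    return ShardProgress(complete, total)
```

Two shard servers reporting 2 of 4 and 3 of 3 shards complete would be aggregated to 5 of 7.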

Struct: TableStats

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | tableName | string | The table name. | default | |
| 2 | bytes | i64 | The size in bytes. | default | |
| 3 | recordCount | i64 | The record count. | default | |
| 4 | rowCount | i64 | The row count. | default | |

TableStats holds the statistics for a given table.

Struct: ColumnDefinition

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | family | string | Required. The family that this column exists within. | default | |
| 2 | columnName | string | Required. The column name. | default | |
| 3 | subColumnName | string | If this column definition is for a sub column, then provide the sub column name. Otherwise leave this field null. | default | |
| 4 | fieldLessIndexed | bool | If this column should be searchable without having to specify the name of the column in the query. NOTE: This will index the column as a full text field in a default field, which means it will be indexed twice. | default | |
| 5 | fieldType | string | The field type for the column. The built-in types are: text (full text indexing), string (indexed string literal), int (converted to an integer and indexed numerically), long (converted to a long and indexed numerically), float (converted to a float and indexed numerically), double (converted to a double and indexed numerically), stored (not indexed, only stored). | default | |
| 6 | properties | map<string, string> | For any custom field types, you can pass in configuration properties. | default | |

The ColumnDefinition defines how a given Column should be interpreted (indexed/stored).

Struct: Schema

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | table | string | The table name. | default | |
| 2 | families | map<string, map<string, ColumnDefinition>> | Families and the column definitions within them. | default | |

The current schema of the table.

Struct: TableDescriptor

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | enabled | bool | Is the table enabled or not; enabled by default. | default | 1 |
| 3 | shardCount | i32 | The number of shards within the given table. | default | 1 |
| 4 | tableUri | string | The location where the table should be stored. This can be "file:///" for a local instance of Blur or "hdfs://" for a distributed installation of Blur. | default | |
| 7 | cluster | string | The cluster where this table should be created. | default | "default" |
| 8 | name | string | The table name. | default | |
| 9 | similarityClass | string | Sets the similarity class in Lucene. | default | |
| 10 | blockCaching | bool | Whether block cache should be enabled or disabled for this table. | default | 1 |
| 11 | blockCachingFileTypes | set<string> | The file extensions that you would like to allow block cache to cache. If null (default), everything is cached. | default | |
| 12 | readOnly | bool | If a table is set to be read-only, mutates through Thrift are NOT allowed. However, updates through MapReduce are allowed, and in fact they are only allowed if the table is in readOnly mode. | default | 0 |
| 13 | preCacheCols | list<string> | This list sets which column families and columns to prefetch into block cache on shard open. | default | |
| 14 | tableProperties | map<string, string> | The table properties that can modify the default behavior of the table. TODO: Document all options. | default | |
| 15 | strictTypes | bool | Whether strict types are enabled or not (default). If they are enabled, no column can be added without first having its type defined. | default | 0 |
| 16 | defaultMissingFieldType | string | If strict is not enabled, the default field type. | default | "text" |
| 17 | defaultMissingFieldLessIndexing | bool | If strict is not enabled, defines whether or not field less indexing is enabled on the newly created fields. | default | 1 |
| 18 | defaultMissingFieldProps | map<string, string> | If strict is not enabled, defines the properties to be used in the new field creation. | default | |

The table descriptor defines the base structure of the table as well as properties needed for setup.
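
How strictTypes interacts with defaultMissingFieldType can be sketched as a small lookup. The Python function below is a hypothetical model of the rule stated in the table (reject undefined columns when strict, otherwise fall back to the default type); the function name, the dictionary of defined types, and the choice of ValueError are assumptions, not Blur behavior.

```python
def resolve_field_type(family, column, defined_types,
                       strict_types=False, default_missing_field_type="text"):
    """Return the field type for family.column under the strict-types rule."""
    key = (family, column)
    if key in defined_types:
        return defined_types[key]
    if strict_types:
        # Strict mode: a column needs a definition before it can be added.
        raise ValueError("no type defined for %s.%s" % key)
    return default_missing_field_type
```

With the documented defaults (strictTypes off, defaultMissingFieldType "text"), an undefined column resolves to "text".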

Struct: Metric

| Key | Field | Type | Description | Requiredness | Default value |
| 1 | name | string | Metric name. | default | |
| 2 | strMap | map<string, string> | Map of string values emitted by the Metric. | default | |
| 3 | longMap | map<string, i64> | Map of long values emitted by the Metric. | default | |
| 4 | doubleMap | map<string, double> | Map of double values emitted by the Metric. | default | |

The Metric will hold all the information for a given Metric.


Services

Service: Blur

The Blur service API. This API is the same for both controller servers and shard servers. Each of the methods is documented.

void createTable(TableDescriptor tableDescriptor)
throws BlurException
Creates a table with the given TableDescriptor.

Parameters

| Name | Description |
| tableDescriptor | the TableDescriptor. |

void enableTable(string table)
throws BlurException
Enables the given table, blocking until all shards are online.

Parameters

| Name | Description |
| table | the table name. |

void disableTable(string table)
throws BlurException
Disables the given table, blocking until all shards are offline.

Parameters

| Name | Description |
| table | the table name. |

void removeTable(string table,
bool deleteIndexFiles)
throws BlurException
Removes the given table, with an option to delete the underlying index storage as well.

Parameters

| Name | Description |
| table | the table name. |
| deleteIndexFiles | true to remove the index storage, false to preserve it. |

bool addColumnDefinition(string table,
ColumnDefinition columnDefinition)
throws BlurException
Attempts to add a column definition to the given table. Returns true if successfully defined, false if not.

Parameters

| Name | Description |
| table | the name of the table. |
| columnDefinition | the ColumnDefinition. |

list<string> tableList()
throws BlurException
Returns a list of the names of all tables across all shard clusters.

list<string> tableListByCluster(string cluster)
throws BlurException
Returns a list of the names of all tables within the given shard cluster.

Parameters

| Name | Description |
| cluster | the cluster name. |

TableDescriptor describe(string table)
throws BlurException
Returns the TableDescriptor for the given table.

Parameters

| Name | Description |
| table | the table name. |

Schema schema(string table)
throws BlurException
Gets the schema for a given table. Returns the Schema.

Parameters

| Name | Description |
| table | the table name. |

string parseQuery(string table,
Query query)
throws BlurException
Parses the given query and returns the string representation of the parsed query.

Parameters

| Name | Description |
| table | the table name. |
| query | the query to parse. |

TableStats tableStats(string table)
throws BlurException
Gets the table stats for the given table. Returns the TableStats.

Parameters

| Name | Description |
| table | the table name. |

void optimize(string table,
i32 numberOfSegmentsPerShard)
throws BlurException
Performs a forced optimize on the index in the given table.

Parameters

| Name | Description |
| table | the name of the table. |
| numberOfSegmentsPerShard | the maximum number of segments per shard index after the operation is completed. |

void createSnapshot(string table,
string name)
throws BlurException
Creates a snapshot for the table with the given name.

void removeSnapshot(string table,
string name)
throws BlurException
Removes a previous snapshot (identified by name) of the table.

map<string, list<string>> listSnapshots(string table)
throws BlurException
Returns a map where the key is the shard and the value is the list of snapshots within that shard.

BlurResults query(string table,
BlurQuery blurQuery)
throws BlurException
Executes a query against the given table and returns the BlurResults. If this method is executed against a controller, the results will contain the aggregated results from all the shards. If this method is executed against a shard server, the results will only contain aggregated results from the shards of the given table that are being served on the shard server, if any.

Parameters

| Name | Description |
| table | the table name. |
| blurQuery | the query to execute. |

FetchResult fetchRow(string table,
Selector selector)
throws BlurException
Fetches a Row or a Record in the given table with the given Selector. Returns the FetchResult.

Parameters

| Name | Description |
| table | the table name. |
| selector | the Selector to use to fetch the Row or Record. |

void mutate(RowMutation mutation)
throws BlurException
Mutates a Row given the RowMutation that is provided.

Parameters

| Name | Description |
| mutation | the RowMutation. |

void mutateBatch(list<RowMutation> mutations)
throws BlurException
Mutates a group of Rows given the list of RowMutations that are provided. Note: This is not an atomic operation.

Parameters

| Name | Description |
| mutations | the batch of RowMutations. |

void cancelQuery(string table,
string uuid)
throws BlurException
Cancels a query that is executing against the given table with the given uuid. Note that after the cancel call it may take some time before the query actually stops executing.

Parameters

| Name | Description |
| table | the table name. |
| uuid | the uuid of the query. |

list<string> queryStatusIdList(string table)
throws BlurException
Returns a list of the uuids of queries that have recently been executed for the given table.

Parameters

| Name | Description |
| table | the table name. |

BlurQueryStatus queryStatusById(string table,
string uuid)
throws BlurException
Returns the BlurQueryStatus for the given table and query uuid.

Parameters

| Name | Description |
| table | the table name. |
| uuid | the uuid of the query. |

list<string> terms(string table,
string columnFamily,
string columnName,
string startWith,
i16 size)
throws BlurException
Gets the terms list from the index for the given table, family and column, using the startWith value to page through the results. This method only makes sense to use with string and text field types. Returns the list of terms for the given column.

Parameters

| Name | Description |
| table | the table name. |
| columnFamily | the column family. If the terms requested are for a system field like "rowid", "recordid", "family", etc., then columnFamily can be null. |
| columnName | the column name. |
| startWith | the term to start with, assuming that you are paging through the term list. |
| size | the number to fetch at once. |
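
Paging with startWith can be sketched as a loop that restarts each request from the last term seen. The Python below runs against a stand-in function, not the real Blur client; it assumes terms come back in sorted order and that startWith is inclusive, neither of which is stated above. The loop drops any term it has already collected, so it also terminates if startWith turns out to be exclusive. Both function names are hypothetical.

```python
import bisect


def fake_terms(sorted_terms, start_with, size):
    """Stand-in for a terms call: terms >= start_with, at most size of them."""
    i = bisect.bisect_left(sorted_terms, start_with)
    return sorted_terms[i:i + size]


def all_terms(fetch_page, size=3):
    """Collect every term by paging; fetch_page(start_with, size) returns one page."""
    result, start = [], ""
    while True:
        page = fetch_page(start, size)
        # Skip terms already collected (the first term of an inclusive page).
        new = [t for t in page if not result or t > result[-1]]
        if not new:
            return result
        result.extend(new)
        start = result[-1]
```

Under the inclusive assumption, `size` should be at least 2, since a page of one term would only ever return the term the loop restarted from.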

i64 recordFrequency(string table,
string columnFamily,
string columnName,
string value)
throws BlurException
Gets the record frequency for the provided table, family, column and value. Returns the count for the entire table.

Parameters

| Name | Description |
| table | the table name. |
| columnFamily | the column family. If the frequency requested is for a system field like "rowid", "recordid", "family", etc., then columnFamily can be null. |
| columnName | the column name. |
| value | the value. |

list<string> shardClusterList()
throws BlurException
Returns a list of all the shard clusters.

list<string> shardServerList(string cluster)
throws BlurException
Returns a list of all the shard servers within the given cluster.

Parameters

| Name | Description |
| cluster | the cluster name. |

list<string> controllerServerList()
throws BlurException
Returns a list of all the controller servers.

map<string, string> shardServerLayout(string table)
throws BlurException
Returns a map of the layout of the given table, where the key is the shard name and the value is the shard server.

This method will return the "correct" layout for the given shard, or the "correct" layout of the cluster if called on a controller.

The meaning of correct:
Given the current state of the shard cluster with failures taken into account, the correct layout is what the layout should be. In other words, it is what the shard server should be serving. Calling the shard server layout method with the NORMAL option will block until the shard server's layout matches the correct layout, meaning it will block until indexes that should be open are open and ready for queries. However, indexes are lazily closed, so if a table is being disabled the call will return immediately with an empty map, but the indexes may not be closed yet.

Returns a map of shards in a table to the shard servers.

Parameters

| Name | Description |
| table | the table name. |

map<string, map<string, ShardState>> shardServerLayoutState(string table)
throws BlurException
Returns a map of the layout of the given table, where the key is the shard name and the value is a map of shard servers to the state of the shard.

This method will return immediately with what shards are currently open in the shard server. So if a shard is being moved to another server and is being closed by this server, it WILL be returned in the map. The shardServerLayout method would not return the shard given the same situation.

Parameters

| Name | Description |
| table | the table name. |

bool isInSafeMode(string cluster)
throws BlurException
Checks to see if the given cluster is in safe mode. Returns a boolean.

Parameters

| Name | Description |
| cluster | the name of the cluster. |

map<string, string> configuration()
throws BlurException
Fetches the Blur configuration. Returns a map of property name to value.

map<string, Metric> metrics(set<string> metrics)
throws BlurException
Fetches the Blur metrics by name. If the metrics parameter is null, all the Metrics are returned. Returns a map of metric name to Metric.

Parameters

| Name | Description |
| metrics | the names of the metrics to return. If null, all are returned. |