Blur unlocks your organization's big data and safely puts data in the hands of your analysts with its unique feature set:
- Fast data ingestion
- Hierarchical data storage
- Record-level access control
- Paged results
- Quick search
- Boolean search logic
- Fuzzy searches
- Wildcard searches
- Term statistics
- Term lists
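The search features above surface through Lucene-style query strings. A few illustrative examples (the `family.column` field naming reflects Blur's Lucene underpinnings, but treat the exact field names here as placeholders):

```
fam0.col0:value1 AND fam1.col1:value2    boolean logic
fam0.lastname:johnson~                   fuzzy match (small edit distance)
fam0.lastname:john*                      wildcard match
```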
Blur leverages multiple open source projects, including Hadoop, HDFS, Lucene, Thrift, and ZooKeeper, to create an environment where structured data can be transformed into a sharded index that runs on a distributed (cloud) computing environment.
Using map/reduce jobs on a Hadoop data processing cluster, you can create custom data models and index them into Blur shards stored in HDFS. Once the shards are in HDFS, the Blur table is enabled and Blur automatically distributes the index shards across the search cluster.
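The indexing flow can be sketched abstractly: each record's row id is hashed to pick a shard, so all records for a row always land in the same index shard. The sketch below is conceptual Python, not the Blur MapReduce API, and the hash-partitioning scheme is an assumption consistent with how sharded indexes are typically built.

```python
import hashlib

def shard_for_row(row_id: str, num_shards: int) -> int:
    """Deterministically map a row id to one of num_shards index shards."""
    digest = hashlib.md5(row_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def partition_records(records, num_shards):
    """Group records into per-shard buckets, mimicking the reduce phase
    that writes one Lucene index per shard into HDFS."""
    shards = {i: [] for i in range(num_shards)}
    for row_id, record in records:
        shards[shard_for_row(row_id, num_shards)].append((row_id, record))
    return shards

records = [("row1", {"name": "ada"}),
           ("row2", {"name": "grace"}),
           ("row1", {"age": 36})]  # same row id as the first record
shards = partition_records(records, 4)
# Records sharing a row id always land in the same shard bucket.
```

Because the mapping is deterministic, re-running the job or adding data later routes updates for an existing row to the shard that already holds it.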
Through Blur configuration, you define where in the search cluster the Blur shard servers and controller servers live. Once configured, Blur determines which shard servers are online and automatically distributes all of the index shards across the active shard servers. If a shard server dies, the remaining shard servers compensate, identifying where the additional load must be shifted. Running multiple controller servers both increases search throughput and provides redundancy at the controller layer.
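The failover behavior described above can be sketched as a simple rebalancing step: whenever the set of live shard servers changes, every shard is reassigned round-robin across the survivors. This is an illustrative sketch, not Blur's actual layout algorithm.

```python
def assign_shards(shard_ids, live_servers):
    """Spread shards evenly (round-robin) over the currently live servers."""
    layout = {server: [] for server in live_servers}
    for i, shard in enumerate(sorted(shard_ids)):
        layout[live_servers[i % len(live_servers)]].append(shard)
    return layout

shards = [f"shard-{i}" for i in range(6)]
before = assign_shards(shards, ["server-a", "server-b", "server-c"])
# 2 shards per server while all three servers are up

# server-b dies; the remaining servers absorb its shards
after = assign_shards(shards, ["server-a", "server-c"])
# 3 shards per server, no shard left unserved
```

Because the shards themselves live in HDFS, reassignment is a matter of a surviving server opening the index, not copying it.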
Blur controllers coordinate all searches of the shard servers. They expose a Thrift API that developers use to search and retrieve data from Blur.
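A controller's role in a search can be sketched as scatter-gather: fan the query out to every shard server, then merge the per-shard hit lists by score into one result page. The Thrift wire details are omitted here; this Python sketch only illustrates the merge step, and the function names and in-memory "shard index" shape are hypothetical.

```python
import heapq

def search_shard(shard_index, query):
    """Stand-in for a shard server's Thrift search call (hypothetical)."""
    return [(score, doc) for score, doc in shard_index if query in doc]

def controller_search(shard_indexes, query, top_n=3):
    """Scatter the query to all shards, gather and merge hits by score."""
    all_hits = []
    for shard in shard_indexes:
        all_hits.extend(search_shard(shard, query))
    return heapq.nlargest(top_n, all_hits)  # highest-scoring hits first

shard_a = [(0.9, "hadoop overview"), (0.4, "hadoop intro")]
shard_b = [(0.7, "hadoop guide"), (0.2, "zookeeper notes")]
results = controller_search([shard_a, shard_b], "hadoop")
# → [(0.9, 'hadoop overview'), (0.7, 'hadoop guide'), (0.4, 'hadoop intro')]
```

Merging only the top hits from each shard is what makes paged results cheap: the controller never has to materialize every match.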