This project has retired. For details please refer to its Attic page.

Stanbol Commons Solr

Solr is used by several Apache Stanbol components. The Apache Stanbol Solr Commons artifacts provide a set of utilities that ease the use of Solr within OSGi, allow the initialization and management of Solr indexes, and publish Solr's RESTful interface on the OSGi HttpService.

Although these utilities were implemented with the requirements of Apache Stanbol in mind, they do not depend on any Stanbol components that are not themselves part of "stanbol.commons".

Solr OSGi Bundle

The "org.apache.stanbol.commons.solr.core" bundle currently includes all dependencies required by Solr and exports both the client and the server API. For details, see the pom file of the "solr.core" artifact.

Please also note the exclusion list: some libraries that Stanbol does not currently use directly are explicitly excluded. Using features that depend on them within a "solrconfig.xml" or "schema.xml" will result in a "ClassNotFoundException" or "NoClassDefFoundError".
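
One way to catch such problems early is to test whether a class named in the Solr configuration can be loaded at all. The following is a minimal sketch using only the standard Java API. The class name ClassVisibilityCheck and the method isLoadable are hypothetical, and within OSGi the result depends on the class loader that is passed in (the sketch assumes the caller supplies a loader that sees the same classes as the Solr bundle).

```java
final class ClassVisibilityCheck {

    private ClassVisibilityCheck() {}

    /**
     * Returns true if the named class can be loaded through the given
     * class loader. The class is located but not initialized.
     */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError includes NoClassDefFoundError, which is raised
            // when a dependency of the requested class is missing
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ClassVisibilityCheck.class.getClassLoader();
        for (String name : args) {
            String state = isLoadable(name, loader) ? "available" : "missing";
            System.out.println(name + " -> " + state);
        }
    }
}
```

Running the check for every class referenced in a configuration before deploying it gives a readable report instead of a failure during core initialization.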

If you require an additional library that is currently not included, please let us know on the stanbol-dev mailing list.

Solr Server Components

This section describes how to manage and access the server-side CoreContainer and SolrCore components of Solr.

Accessing CoreContainers and SolrCores

All CoreContainers and SolrCores initialized by the Stanbol Solr framework are registered with the OSGi Service Registry. This means that other bundles can obtain them as follows:

CoreContainer defaultSolrServer;
ServiceReference ref = bundleContext.getServiceReference(
    CoreContainer.class.getName());
if (ref != null) {
    defaultSolrServer = (CoreContainer) bundleContext.getService(ref);
} else {
    defaultSolrServer = null; //no SolrServer available
}

It is also possible to track service registration and unregistration events by using the OSGi ServiceTracker utility.

The code snippet above always returns the Solr server (CoreContainer) with the highest priority, that is, the highest value of the "service.ranking" property. However, the OSGi Service Registry also allows services to be obtained and tracked by using filters. To specify such filters it is important to know which metadata are provided when services are registered with the OSGi Service Registry.
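
As an illustration of such a filter, the sketch below builds an OSGi (LDAP-style) filter string that selects a CoreContainer by its registered name. It uses only the standard Java API. The property key "org.apache.solr.core.CoreContainer.name" is the one named in the RESTful API part of this documentation. The class and method names are hypothetical, and real code should take the key from the SolrConstants class instead of hard-coding it.

```java
final class SolrFilterSketch {

    // Key as named in this documentation; SolrConstants defines it as a constant
    static final String SERVER_NAME_KEY = "org.apache.solr.core.CoreContainer.name";

    private SolrFilterSketch() {}

    /** Escapes the characters that have a special meaning in filter values. */
    static String escape(String value) {
        StringBuilder sb = new StringBuilder(value.length());
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '*' || c == '(' || c == ')') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Builds a filter matching CoreContainer services registered under the given name. */
    static String serverFilter(String serverName) {
        return String.format("(&(objectClass=%s)(%s=%s))",
                "org.apache.solr.core.CoreContainer", SERVER_NAME_KEY, escape(serverName));
    }
}
```

The resulting string can be passed as the second argument of bundleContext.getServiceReferences(null, filter). Because the filter already constrains the object class, the first argument can be null.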

Metadata for CoreContainer:

Metadata for SolrCores:

In addition the following metadata of the CoreContainer for this SolrCore are also available

The keys mentioned above for the metadata of registered CoreContainers and SolrCores are defined as public constants in the SolrConstants class.

ReferencedSolrServer

This component initializes a Solr server running within the same JVM as Stanbol, based on indexes provided by a directory on the local file system. It does not support management capabilities, but it initializes a Solr CoreContainer based on the data in the file system and registers it (including all SolrCores) with the OSGi Service Registry as described above.

The ReferencedSolrServer uses the ManagedServiceFactory pattern. This means that instances are created by passing configurations to the OSGi ConfigurationAdmin service. Practically this means that:

Configurations need to include the following properties (see also section "Metadata for CoreContainer" for details about such properties)

NOTE: Keep in mind that if the RESTful API of the SolrServer is published, users might use the Admin Request handler to manipulate the Solr configuration. In such cases the metadata provided by the ServiceReferences for the CoreContainer and SolrCores might get out of sync with the actual configuration of the server.

ManagedSolrServer

This component manages a multi-core Solr server. It provides an API to create, update and remove SolrCores. In addition, cores can be activated and deactivated.

Creating ManagedServerInstances

The ManagedSolrServer uses the ManagedServiceFactory pattern. This means that instances are created by passing configurations to the OSGi ConfigurationAdmin service. Practically this means that:

Configurations need to include the following properties (see also section "Metadata for CoreContainer" for details about such properties). Although the properties are the same as for the ReferencedSolrServer, their semantics differ in some aspects.

NOTE: Keep in mind that if the RESTful API of the SolrServer is published, users might use the Admin Request handler to manipulate the Solr configuration. In such cases the metadata provided by the ServiceReferences for the CoreContainer and SolrCores might get out of sync with the actual configuration of the server.

Managing Solr Indexes

This section describes how to manage (create, update, remove, activate, deactivate) indexes on a ManagedSolrServer.

Managed indexes do not correspond 1:1 to the SolrCores registered on the CoreContainer. However, every SolrCore on the CoreContainer maps 1:1 to a managed index on the ManagedSolrServer.

A managed index can be in one of the following states (defined by the ManagedIndexState enumeration):

Indexes can be managed not only by calls to the API of the ManagedSolrServer. The "org.apache.stanbol.commons.solr.install" bundle also supports installing and uninstalling indexes by using the Apache Sling OSGi installer framework. This allows indexes to be installed by providing Solr-Index-Archives or Solr-Index-Archive-References to any available Provider. By default Apache Stanbol includes Providers for the Launchers and Bundles. However, the Sling Installer Framework also includes Providers for directories on the file system and for JCR repositories.

Solr-Index-Archives use the following name pattern:

{name}.solrindex[.zip|.gz|.bz2]

Solr-Index-Archive-References are normal Java properties files and use the following name pattern:

{name}.solrindex.ref
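
The two name patterns can be recognized with regular expressions. The following sketch uses only the standard Java API. The class and method names are assumptions made for illustration and do not reflect how the installer bundle is implemented. The compression suffix is treated as optional, following the square brackets in the pattern above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SolrIndexFileNames {

    // {name}.solrindex followed by an optional .zip, .gz or .bz2 suffix
    private static final Pattern ARCHIVE =
            Pattern.compile("(.+)\\.solrindex(?:\\.(zip|gz|bz2))?");

    // {name}.solrindex.ref
    private static final Pattern REFERENCE =
            Pattern.compile("(.+)\\.solrindex\\.ref");

    private SolrIndexFileNames() {}

    /** Returns the index name if the file name denotes a Solr-Index-Archive. */
    static Optional<String> archiveIndexName(String fileName) {
        Matcher m = ARCHIVE.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Returns the index name if the file name denotes a Solr-Index-Archive-Reference. */
    static Optional<String> referenceIndexName(String fileName) {
        Matcher m = REFERENCE.matcher(fileName);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

For example, archiveIndexName("mydata.solrindex.zip") and referenceIndexName("mydata.solrindex.ref") both yield "mydata", while a reference file name is not accepted as an archive.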

Solr-Index-Archive-References use the following keys (see also org.apache.stanbol.commons.solr.managed.ManagedIndexConstants):

Other interesting Notes

Solr Client Components

This section describes how to use Solr servers and indexes referenced and managed by the "org.apache.stanbol.commons.solr" framework. In principle there are two possibilities: (1) to access Solr indexes directly via the SolrServer Java API and (2) to publish locally managed indexes on the OSGi HttpService and then use such indexes via the Solr RESTful API.

The Stanbol Solr framework does not provide utilities for accessing remote Solr servers, because this is already easily possible by using SolrJ.

Java API

This section describes how to look up and access a Solr server initialized by the "org.apache.stanbol.commons.solr" framework. The client-side Java API of Solr is defined by the SolrServer abstract class. The implementation used for accessing a SolrCore running in the same JVM is the EmbeddedSolrServer.

All Solr servers (CoreContainer) and Solr indexes (SolrCore) initialized by the ReferencedSolrServer and/or ManagedSolrServer are registered with the OSGi service registry. More information about this can be found in the first part of the "Solr Server Components" section of this documentation.

OSGi already provides APIs and utilities to look up and track registered services. The following examples show how to look up SolrServers registered as OSGi services.

IndexReference

The IndexReference is a Java class that manages a reference to an Index. It defines a constructor that takes a serverName and coreName. In addition there is a static parse(String ref) method that takes

The IndexMetadata class also defines a getter to get the IndexReference.

The IndexReference also provides getters for the filters used to look up or track the referenced CoreContainer or SolrCore in the OSGi service registry. The returned filters include the constraint for the registered interface (OBJECTCLASS). Therefore, when using these filters one can pass null for the class parameter.

To lookup the CoreContainer of the referenced index:

bundleContext.getServiceReferences(null, indexReference.getServerFilter());

To lookup the SolrCore for the referenced index:

bundleContext.getServiceReferences(null, indexReference.getIndexFilter());

Lookup Solr Indexes

This example shows how to look up the default CoreContainer and create a SolrServer for the core "mydata".

ComponentContext context; // typically passed to the activate method
BundleContext bc = context.getBundleContext();
ServiceReference coreContainerRef =
    bc.getServiceReference(CoreContainer.class.getName());
CoreContainer coreContainer = (CoreContainer) bc.getService(coreContainerRef);
SolrServer server = new EmbeddedSolrServer(coreContainer, "mydata");

There might be cases where several CoreContainers are available and "mydata" is not available on the default one, where "default" refers to the one with the highest "service.ranking" value. In this case we need to know a property that can be used to filter for the right CoreContainer. The following example assumes that the index is on a CoreContainer registered with the name "myserver".

ComponentContext context; // typically passed to the activate method
BundleContext bc = context.getBundleContext();

// Now let's use the IndexReference to create the filter
IndexReference indexRef = new IndexReference("myserver", "mydata");
ServiceReference[] coreContainerRefs = bc.getServiceReferences(
    null, indexRef.getServerFilter());

// TODO: check that coreContainerRefs != null AND not empty!
// Now we have all References to CoreContainers with the name "myserver"
// Yes one can register several for the same name (e.g. to have fallbacks)
// let's get the one with the highest service.ranking
Arrays.sort(coreContainerRefs, ServiceReferenceRankingComparator.INSTANCE);

// Create the SolrServer (same as above)
CoreContainer coreContainer = (CoreContainer) bc.getService(coreContainerRefs[0]);
SolrServer server = new EmbeddedSolrServer(coreContainer, indexRef.getIndex());

In cases where one only knows the name of the SolrCore (and not the CoreContainer) the initialization looks like this.

ComponentContext context; // typically passed to the activate method
BundleContext bc = context.getBundleContext();
String nameFilter = String.format("(%s=%s)", SolrConstants.PROPERTY_CORE_NAME, "mydata");
ServiceReference[] solrCoreRefs = bc.getServiceReferences(
    SolrCore.class.getName(), nameFilter);

// TODO: check that solrCoreRefs != null AND not empty!
// Now we have all references to SolrCores registered with the name "mydata"
// let's get the one with the highest service.ranking
Arrays.sort(solrCoreRefs, ServiceReferenceRankingComparator.INSTANCE);

// Now get the SolrCore and create the SolrServer
SolrCore core = (SolrCore) bc.getService(solrCoreRefs[0]);

// core.getCoreDescriptor() might be null if SolrCore is not
// registered with a CoreContainer
SolrServer server = new EmbeddedSolrServer(
    core.getCoreDescriptor().getCoreContainer(), "mydata");

Tracking Solr Indexes

The examples above do a lookup at a single point in time. However, because OSGi is a dynamic environment where services can come and go at any time, in most cases users will rather want to track services. For this purpose OSGi provides the ServiceTracker utility.

To ease the tracking of SolrServers, the "org.apache.stanbol.commons.solr.core" bundle provides the RegisteredSolrServerTracker. The following examples show how to create a managed Solr index and then track the SolrServer.

First, during the activation we need to check whether "mydata" has already been created and create it if not. Then we can start tracking the index:

BundleContext bc;
// The ManagedSolrServer instance can be looked up manually using a service
// reference or using declarative services / SCR injection
IndexMetadata metadata = managedServer.getIndexMetadata("mydata");
if (metadata == null) {
    // No index with that name:
    // Asynchronously init the index as soon as the solrindex archive is available
    metadata = managedServer.createSolrIndex("mydata", "mydata.solrindex.zip", null);
}
RegisteredSolrServerTracker indexTracker =
    new RegisteredSolrServerTracker(bc, metadata.getIndexReference());

// Do not forget to close the tracker while deactivating
indexTracker.open();

Now, every time we need the SolrServer we can retrieve it from the indexTracker:

private SolrServer getServer() {
    SolrServer server = indexTracker.getService();
    if(server == null) {
        // Report the missing server
        throw new IllegalStateException("Server 'mydata' not active");
    } else {
        return server;
    }
}

The RegisteredSolrServerTracker takes "service.ranking" into account. If several services match the passed IndexReference, its getter methods always return the one with the highest "service.ranking". Where arrays are returned, they are sorted accordingly.
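
The general OSGi rule for ordering service references is that a higher "service.ranking" wins and that, among equal rankings, the service with the lower "service.id" (the one registered first) wins. The sketch below reproduces that ordering with plain Java objects so that it can be run without an OSGi framework. The Ref class and its fields are hypothetical stand-ins for a ServiceReference and are not part of Stanbol. Whether the tracker breaks ties in exactly this way is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class RankingOrderSketch {

    /** Hypothetical stand-in for the ranking-related properties of a ServiceReference. */
    static final class Ref {
        final long serviceId;
        final int ranking;

        Ref(long serviceId, int ranking) {
            this.serviceId = serviceId;
            this.ranking = ranking;
        }
    }

    // Highest service.ranking first; ties are broken by the lowest service.id
    static final Comparator<Ref> BEST_FIRST =
            Comparator.comparingInt((Ref r) -> r.ranking).reversed()
                      .thenComparingLong(r -> r.serviceId);

    private RankingOrderSketch() {}

    /** Returns a sorted copy; the element at index 0 is the preferred service. */
    static List<Ref> sortBestFirst(List<Ref> refs) {
        List<Ref> sorted = new ArrayList<>(refs);
        sorted.sort(BEST_FIRST);
        return sorted;
    }
}
```

With this ordering, a fallback server can be registered under the same name with a lower ranking and is only picked when the preferred one is unavailable.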

RESTful API

The following describes how to publish the RESTful API of CoreContainers registered as OSGi services on the OSGi HttpService. The functionality described in this section is provided by the "org.apache.stanbol.commons.solr.web" artifact.

SolrServerPublishingComponent

This is an OSGi component that starts immediately and does not require a configuration. Its main purpose is to track all CoreContainers with the property "org.apache.solr.core.CoreContainer.publishREST=true". For all such CoreContainers it publishes the RESTful API under the URL

http://{host}:{port}/solr/{server-name}

If two CoreContainers with the same {server-name} (the value of the "org.apache.solr.core.CoreContainer.name" property) are registered, the one with the highest "service.ranking" is published.

The root-prefix ("/solr" by default) can be configured by setting the "org.apache.stanbol.commons.solr.web.dispatchfilter.prefix" property.
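
To illustrate the URL layout, the sketch below assembles the base URL of a published CoreContainer from its parts. It uses only the standard Java API. The class and method names are hypothetical, and the normalization of the prefix (a single leading slash, no trailing slash) is an assumption made for this example, not documented behavior of the component.

```java
import java.net.URI;

final class SolrRestUrlSketch {

    private SolrRestUrlSketch() {}

    /** Builds http://{host}:{port}{prefix}/{server-name}. */
    static URI serverUrl(String host, int port, String prefix, String serverName) {
        String p = prefix.trim();
        if (!p.startsWith("/")) {
            p = "/" + p;
        }
        // Drop trailing slashes so that the prefix and the name join cleanly
        while (p.length() > 1 && p.endsWith("/")) {
            p = p.substring(0, p.length() - 1);
        }
        if (p.equals("/")) {
            p = ""; // an empty prefix publishes directly under the root
        }
        return URI.create("http://" + host + ":" + port + p + "/" + serverName);
    }
}
```

For the default prefix, serverUrl("localhost", 8080, "/solr", "myserver") yields http://localhost:8080/solr/myserver.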

SolrDispatchFilterComponent

This component provides the same functionality as the SolrServerPublishingComponent, but can be configured specifically for a CoreContainer. It is intended to be used if one wants to publish the RESTful API of a specific CoreContainer under a specific location. To prevent the SolrServerPublishingComponent from also publishing the same CoreContainer, users need to set its "org.apache.solr.core.CoreContainer.publishREST" property to false.

This component is configured by two properties

If two CoreContainers with the same {server-name} (the value of the "org.apache.solr.core.CoreContainer.name" property) are registered, the one with the highest "service.ranking" is published.