This project has retired. For details please refer to its Attic page.
Apache Stanbol - Stanbol Enhancer RESTful Services

Stanbol Enhancer RESTful Services

The RESTful service endpoint provided by the Stanbol Enhancer is a stateless interface that lets the caller submit content and receive the resulting enhancements as RDF in a single request, without storing anything on the server side. More advanced options also allow callers to pass pre-existing metadata, to pass and request alternate content versions, and to request additional metadata created by the Enhancer or by specific Enhancement Engines.

The RESTful interface described below is provided on several endpoints

Basic Enhancement Service

This section describes how to pass content to the Stanbol Enhancer for analysis. Results are sent back as a serialized RDF graph.

The content to analyze should be sent in a POST request with the mime-type specified in the Content-type header. The response will hold the RDF enhancement serialized in the format specified in the Accept header:

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
    --data "The Stanbol enhancer can detect famous cities such as Paris \
            and people such as Bob Marley." \
    http://localhost:8080/enhancer

The list of mime-types accepted as input depends on the deployed engines. By default, most Enhancement Engines can only process plain text content. However, Enhancement Engines such as Metaxa can create 'text/plain' versions of the submitted content. This also makes it possible to enhance content with mime-types such as HTML, PDF and MS Office documents (see the Metaxa documentation for details).

Stanbol Enhancer is able to serialize the response in the following RDF formats:

application/json (JSON-LD)
application/rdf+xml (RDF/XML)
application/rdf+json (RDF/JSON)
text/turtle (Turtle)
text/rdf+nt (N-TRIPLES)
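
The curl request shown earlier in this section can also be issued from Java. The following is a minimal sketch, assuming Java 11 or later (for the java.net.http client) and an Enhancer running at the local endpoint used in the curl example. The class name BasicEnhanceRequest and its build method are hypothetical and not part of Stanbol.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Stanbol.
class BasicEnhanceRequest {

    // Builds (but does not send) a POST request that submits plain text
    // and asks for the enhancements in the given RDF serialization.
    static HttpRequest build(String endpoint, String text, String rdfFormat) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Accept", rdfFormat)
                .header("Content-Type", "text/plain; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: an Enhancer is listening on this endpoint.
        HttpRequest request = build(
                "http://localhost:8080/enhancer",
                "The Stanbol enhancer can detect famous cities such as Paris "
                        + "and people such as Bob Marley.",
                "text/turtle");
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Any of the mime-types listed above can be used as the third argument to select a different serialization.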

Additional Parameters

The following example shows how to send an enhancement request that uses a custom content item URI and asks for the execution metadata to be included in the response. In addition, this request is directed to an Enhancement Chain with the name "dbpedia-keyword".

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
    --data "The Stanbol enhancer can detect famous cities such as Paris \
            and people such as Bob Marley." \
    "http://localhost:8080/enhancer/chain/dbpedia-keyword?uri=urn:fise-example-content-item&executionmetadata=true"

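When such URLs are built programmatically, the query parameter values should be URL-encoded. The following is a minimal sketch that reconstructs the URL of the curl example above. The class EnhancerUrls and its method chainUrl are hypothetical names, and the sketch assumes that the chain name only contains characters that are safe in a URL path.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Stanbol.
class EnhancerUrls {

    // Appends "/chain/{chain}" and the URL-encoded query parameters to the base URL.
    // Assumption: the chain name needs no escaping.
    static String chainUrl(String base, String chain, Map<String, String> params) {
        StringBuilder url = new StringBuilder(base).append("/chain/").append(chain);
        char separator = '?';
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append(separator)
               .append(URLEncoder.encode(param.getKey(), StandardCharsets.UTF_8))
               .append('=')
               .append(URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8));
            separator = '&';
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("uri", "urn:fise-example-content-item");
        params.put("executionmetadata", "true");
        System.out.println(chainUrl("http://localhost:8080/enhancer", "dbpedia-keyword", params));
    }
}
```

Note that the ':' in the content item URI is percent-encoded as %3A by URLEncoder.
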
Request Properties Support

since 0.12.1

Request Properties allow request-specific Enhancement Properties to be passed as additional query parameters of enhancement requests.

The following shows a curl request that passes three enhancement properties:

curl -X POST -H "Accept: text/turtle" -H "Content-type: text/plain" \
    --data "The Eiffel Tower is located in Paris." \
    "http://localhost:8080/enhancer?enhancer.max-suggestions=5&dbpedia-fst:enhancer.min-confidence=0.85&dbpedia-dereference:enhancer.engines.dereference.languages=de&dbpedia-dereference:enhancer.engines.dereference.languages=es"

The above request passes both request-scoped and engine-scoped Enhancement Properties. First, the minimum confidence value for suggested entities is set to 0.85 for the dbpedia-fst engine. Second, the dbpedia-dereference engine is configured to dereference German and Spanish labels in addition to labels in the language detected for the submitted text. Finally, the maximum number of suggestions is set to 5 as a request-scoped property, which means that it is passed to all engines executed in the context of the request.
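
The query string of such a request can be assembled with a small helper. The following is a minimal sketch that mirrors the parameter names of the curl example: engine-scoped properties are prefixed with the engine name and a colon, and a parameter may be repeated to pass several values. The class RequestProperties is hypothetical; the sketch only URL-encodes the values and assumes that engine and property names contain only characters that are allowed in a query string.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Stanbol.
class RequestProperties {

    private final List<String> pairs = new ArrayList<>();

    // A property without engine prefix applies to all engines of the request.
    RequestProperties requestScoped(String property, String value) {
        return add(property, value);
    }

    // A property prefixed with "{engine}:" applies to that engine only.
    RequestProperties engineScoped(String engine, String property, String value) {
        return add(engine + ":" + property, value);
    }

    private RequestProperties add(String name, String value) {
        // Assumption: names need no escaping, so only the value is URL-encoded.
        pairs.add(name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8));
        return this;
    }

    // Joins all name=value pairs in the order they were added.
    String toQuery() {
        return String.join("&", pairs);
    }

    public static void main(String[] args) {
        String query = new RequestProperties()
                .requestScoped("enhancer.max-suggestions", "5")
                .engineScoped("dbpedia-fst", "enhancer.min-confidence", "0.85")
                .engineScoped("dbpedia-dereference", "enhancer.engines.dereference.languages", "de")
                .engineScoped("dbpedia-dereference", "enhancer.engines.dereference.languages", "es")
                .toQuery();
        System.out.println("http://localhost:8080/enhancer?" + query);
    }
}
```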

Request properties use the following encoding:

Enhancer Configuration

The Stanbol Enhancer supports several RESTful services for inspecting its configuration. These services allow callers to retrieve the currently active Enhancement Engines and Enhancement Chains.

Example response as 'application/rdf+xml' serialization of the default configuration of the Stanbol Enhancer.

The request

curl -v -X GET -H "Accept: application/rdf+xml" "http://localhost:8080/enhancer/ep"

returns the following results

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://stanbol.apache.org/ontology/enhancer/enhancer#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" > 
  <rdf:Description rdf:about="http://localhost:8080/enhancer/engine/langid">
    <rdfs:label>langid</rdfs:label>
    <rdf:type rdf:resource="http://stanbol.apache.org/ontology/enhancer/enhancer#EnhancementEngine"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:8080/enhancer">
    <rdf:type rdf:resource="http://stanbol.apache.org/ontology/enhancer/enhancer#Enhancer"/>
    <j.0:hasEngine rdf:resource="http://localhost:8080/enhancer/engine/dbpediaLinking"/>
    <j.0:hasEngine rdf:resource="http://localhost:8080/enhancer/engine/langid"/>
    <j.0:hasEngine rdf:resource="http://localhost:8080/enhancer/engine/entityhubLinking"/>
    <j.0:hasEngine rdf:resource="http://localhost:8080/enhancer/engine/tika"/>
    <j.0:hasEngine rdf:resource="http://localhost:8080/enhancer/engine/metaxa"/>
    <j.0:hasEngine rdf:resource="http://localhost:8080/enhancer/engine/ner"/>
    <j.0:hasChain rdf:resource="http://localhost:8080/enhancer/chain/default"/>
    <j.0:hasDefaultChain rdf:resource="http://localhost:8080/enhancer/chain/default"/>
    <j.0:hasChain rdf:resource="http://localhost:8080/enhancer/chain/language"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:8080/enhancer/chain/language">
    <rdfs:label>language</rdfs:label>
    <rdf:type rdf:resource="http://stanbol.apache.org/ontology/enhancer/enhancer#EnhancementChain"/>
  </rdf:Description>
  <rdf:Description rdf:about="http://localhost:8080/enhancer/engine/ner">
    <rdf:type rdf:resource="http://stanbol.apache.org/ontology/enhancer/enhancer#EnhancementEngine"/>
    <rdfs:label>ner</rdfs:label>
  </rdf:Description>
[...]
</rdf:RDF>
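
A client that only needs the URIs of the active engines or chains can read them from such a response. The following is a minimal sketch that uses the XML parser of the JDK and the namespaces shown in the response above. It relies on the properties being serialized as elements with an rdf:resource attribute, as in this example. Other valid RDF/XML layouts of the same graph would not be handled, so a real RDF parser is the more robust choice. The class EnhancerConfigReader is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Stanbol.
class EnhancerConfigReader {

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String ENHANCER_NS = "http://stanbol.apache.org/ontology/enhancer/enhancer#";

    // Returns the rdf:resource values of all elements with the given local name
    // (e.g. "hasEngine" or "hasChain") in the enhancer namespace.
    static List<String> resources(String rdfXml, String property) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rdfXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagNameNS(ENHANCER_NS, property);
            List<String> uris = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                uris.add(((Element) nodes.item(i)).getAttributeNS(RDF_NS, "resource"));
            }
            return uris;
        } catch (Exception e) {
            throw new IllegalArgumentException("Unable to parse RDF/XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened version of the response shown above.
        String sample = "<rdf:RDF xmlns:rdf=\"" + RDF_NS + "\" xmlns:j.0=\"" + ENHANCER_NS + "\">"
                + "<rdf:Description rdf:about=\"http://localhost:8080/enhancer\">"
                + "<j.0:hasEngine rdf:resource=\"http://localhost:8080/enhancer/engine/langid\"/>"
                + "<j.0:hasEngine rdf:resource=\"http://localhost:8080/enhancer/engine/ner\"/>"
                + "<j.0:hasChain rdf:resource=\"http://localhost:8080/enhancer/chain/default\"/>"
                + "</rdf:Description></rdf:RDF>";
        System.out.println(resources(sample, "hasEngine"));
        System.out.println(resources(sample, "hasChain"));
    }
}
```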

Executionplan of Enhancement Chains

The ExecutionPlan can also be requested by sending a GET request with a supported RDF serialization as 'Accept' header to

Multi-part ContentItem support

The multipart ContentItem extensions to the basic RESTful services are provided by the Stanbol Enhancer. They were introduced (by STANBOL-481) to allow advanced usage scenarios. Users will want to use these extensions if they need to:

QueryParameters

The following QueryParameters are defined by the multi-part content item extension:

Passing multiple ContentParts

Requests to the Stanbol Enhancer with the Content-Type: multipart/form-data are considered to contain a ContentItem serialized as MultiPart MIME. The exact specification of the MultiPart MIME format for ContentItems is provided by the documentation of the ContentItem.

The combination of multipart/form-data encoded requests with the QueryParameters described above allows the MultiPart MIME format for ContentItems to be used for both request and response.

Using the multi-part content item RESTful API extensions

The following examples show typical usage scenarios of the multi-part content item RESTful API. Note that for better readability the values of the query parameters are not URL-encoded.

Example 1: Return metadata and content

The first example shows how users can request both the metadata and transcoded versions of the submitted content. This is done by using "outputContent=*/*" in combination with "omitParsed=true".

curl -v -X POST -H "Accept: multipart/form-data" \
    -H "Content-type: text/html; charset=UTF-8"  \
    --data "<html><body><p>The Stanbol enhancer can detect famous cities \
            such as Paris and people such as Bob Marley.</p></body></html>" \
    "http://localhost:8080/enhancer?outputContent=*/*&omitParsed=true&rdfFormat=application/rdf+xml"

This will result in a response with the header "Content-Type: multipart/form-data; charset=UTF-8; boundary=contentItem" that contains the metadata as well as the plain text version of the submitted HTML document as content.

--contentItem
Content-Disposition: form-data; name="metadata"; filename="urn:content-item-sha1-76e44d4b51c626bbed38ce88370be88702de9341"
Content-Type: application/rdf+xml; charset=UTF-8;
Content-Transfer-Encoding: 8bit

<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
[..the metadata formatted as RDF+XML..]
</rdf:RDF>

--contentItem
Content-Disposition: form-data; name="content"
Content-Type: multipart/alternate; boundary=contentParts; charset=UTF-8
Content-Transfer-Encoding: 8bit

--contentParts
Content-Disposition: form-data; name="urn:metaxa:plain-text:2daba9dc-21f6-7ea1-70dd-a2b0d5c6cd08"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.
--contentParts--

--contentItem--

See also the formal specification of the MultiPart MIME format for ContentItems.

Example 2: Directly return the plain text version of parsed content

By using 'omitMetadata=true' together with "Accept: {requested-content-type}", the multi-part content API allows callers to directly request the transcoded version of the content in the format {requested-content-type}.

curl -v -X POST -H "Accept: text/plain" \
    -H "Content-type: text/html; charset=UTF-8" \
    --data "<html><body><p>The Stanbol enhancer can detect famous cities \
            such as Paris and people such as Bob Marley.</p></body></html>" \
    "http://localhost:8080/enhancer?omitMetadata=true"

The response will use Content-Type: text/plain and contain the string

The Stanbol enhancer can detect famous cities such as Paris and people such as Bob Marley.

To make this work, the requested Enhancement Chain needs to include an engine (e.g. Metaxa) that supports transcoding the submitted content. If no content of the requested type is available, the request is answered with "404 NOT FOUND".

Note also that, because responses to such requests omit the metadata, it is recommended to configure and use a chain that does no further processing on the transcoded content.

Example 3: Pass multiple content versions

This example uses the "httpmime" module of Apache HttpComponents to create the Multipart MIME content sent to the Stanbol Enhancer.

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpmime</artifactId>
    <version>4.1.2</version>
</dependency>

The created Multipart MIME content MUST follow the specifications as defined by the MultiPart MIME format for ContentItems.

InputStream wordIn; //The MS Word version of the Content
InputStream plainIn; //The plain text version of the Content
HttpClient httpClient; //The client used to execute the request
Charset UTF8 = Charset.forName("UTF-8"); //the charset used for the multipart containers

//create the multipart/form-data container for the ContentItem
//MultipartEntity also implements HttpEntity
MultipartEntity contentItem = new MultipartEntity(null, null ,UTF8);
//The multipart/alternate container for the contents
HttpMultipart content = new HttpMultipart("alternate", UTF8 ,"contentParts");

//now add the container for the content to the content item container
contentItem.addPart(
    "content", //the name MUST BE "content"!
    new MultipartContentBody(content));

//now add the MS word content at the first location
//this will make it the "original" content
content.addBodyPart(new FormBodyPart(
    "http://www.example.com/example.docx", //the id of the content part
    new InputStreamBody(
        wordIn, 
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document", 
        "example.docx")));

//now add the alternate plain text version
content.addBodyPart(new FormBodyPart(
    "http://www.example.com/example.docx", //the id of the content part
    new StringBody( //use a StringBody to avoid binary encoding for text
        IOUtils.toString(plainIn, "UTF-8"), //apache commons IO utility
        "text/plain",
        Charset.forName("UTF-8"))));

//now we are ready to create and execute the POST request to the
//Stanbol Enhancer
HttpPost request = new HttpPost("http://localhost:8080/enhancer");
request.setEntity(contentItem);
request.setHeader("Accept","application/rdf+xml");
HttpResponse response = httpClient.execute(request);

Note that for such requests Metaxa will still try to extract metadata from the submitted MS Word document, but all other engines will use the plain text version passed with the request for processing.

Example 4: Pass existing free text annotations

This example shows how the multi-part content item API can be used to pass already existing tags for a content item to the Stanbol Enhancer. For this example it is important to understand that the passed metadata needs to conform to the Stanbol Enhancement Structure. Because of that, this example consists of two main steps:

  1. Convert user tags to TextAnnotations
  2. Send existing Metadata along with the Content to the Stanbol Enhancer

Also note that the code snippets use utilities provided by the "org.apache.stanbol.enhancer.servicesapi" module. Apache Clerezza is used as RDF framework. Both dependencies are easily replaceable.

First, let's have a look at the required information:

MGraph graph; //the RDF graph to store the metadata
UriRef ciUri; //the URI for the contentItem
String tag; // user provided tag
UriRef tagType; //the type of the Tag

Regarding the tag type: Stanbol natively supports the following types

The processing of passed tags that use other types or no type depends on the enhancement engines used and their configuration. The configuration of the Named Entity Tagging Engine is especially important in that respect.

Resource user; //the user that has created the tag (optional)
//in case of a name just use a literal
user = new PlainLiteralImpl("Rudolf Huber");
//in case users have assigned URIs
user = new UriRef("http://my.cms.org/users/rudolf.huber");

Now we can convert the User Tags to TextAnnotations

//first create a URI for the text annotation. Here we use a random URN
//If you can create a meaningful URI this would be better!
UriRef ta = new UriRef("urn:user-annotation:"+EnhancementEngineHelper.randomUUID());
//add the 'rdf:type's
graph.add(new TripleImpl(ta, RDF.type, TechnicalClasses.ENHANCER_TEXTANNOTATION));
graph.add(new TripleImpl(ta, RDF.type, TechnicalClasses.ENHANCER_ENHANCEMENT));

//this TextAnnotation is about the ContentItem
graph.add(new TripleImpl(ta, Properties.ENHANCER_EXTRACTED_FROM, ciUri));
//if the Tag uses a type add it
if(tagType != null){
    graph.add(new TripleImpl(ta, Properties.DC_TYPE, tagType));
}
//add the value of the tag
graph.add(new TripleImpl(ta, Properties.ENHANCER_SELECTED_TEXT, new PlainLiteralImpl(tag)));
//add the user
if(user != null){
    graph.add(new TripleImpl(ta, Properties.DC_CREATOR,user));
}

Now the 'graph' contains a valid TextAnnotation for the given user tag. This should be done for all tags of the current content.

In the next step we need to serialize the RDF data. Again we use Clerezza as API here, but any RDF framework provides similar functionality.

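Charset UTF8 = Charset.forName("UTF-8");
//obtain the Clerezza Serializer
Serializer serializer = Serializer.getInstance();
ByteArrayOutputStream out = new ByteArrayOutputStream();
//this tells the Serializer to create "application/rdf+xml"
//'graph' is the MGraph holding the TextAnnotations created above
serializer.serialize(out, graph, SupportedFormat.RDF_XML);
String rdfContent = new String(out.toByteArray(),UTF8);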

Now we need to create the MultiPart MIME content item containing the metadata and the content

String content; //the content we want to send to the Stanbol Enhancer

//the container for the ContentItem
MultipartEntity contentItem = new MultipartEntity(null, null ,UTF8);

//The Metadata MUST BE the first element
contentItem.addPart(
    "metadata", //the name MUST BE "metadata" 
    new StringBody(rdfContent,SupportedFormat.RDF_XML,UTF8){
        @Override
        public String getFilename() { //The filename MUST BE the
            return ciUri.getUnicodeString(); //uri of the ContentItem
        }
    });

Note that because the StringBody class provided by the "httpmime" framework does not set a filename, we need to override this method and return the URI of the content item. This is essential, because the URI of the ContentItem must be the same as the URI (variable 'ciUri') used when creating the TextAnnotations for the user tags.

For the following code snippet, note that we can directly add the content to the content item container. A 'multipart/alternate' container is only required if multiple alternate content versions need to be sent (as shown in 'Example 3').

//Add the content as second mime part
contentItem.addPart(
    "content", //the name MUST BE "content"
    new StringBody(content,"text/plain",UTF8));

//now we are ready to create and execute the POST request to the
//Stanbol Enhancer
HttpPost request = new HttpPost("http://localhost:8080/enhancer");
request.setEntity(contentItem);
request.setHeader("Accept", SupportedFormat.RDF_XML);
HttpResponse response = httpClient.execute(request);

The response of the Enhancer will now contain entity suggestions for the free text user tags.
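
For readers who do not use "httpmime", the structure of such a request body can be illustrated with plain string handling. The following is a minimal sketch, not a replacement for the formal specification of the MultiPart MIME format for ContentItems: it writes the "metadata" part first, uses the URI of the content item as its filename and adds the plain text as the "content" part. The class ContentItemMultipart is hypothetical, the boundary "contentItem" is taken from the response shown in Example 1, and the sketch assumes that the URI contains no double quotes and that neither payload contains the boundary line.

```java
// Hypothetical helper, not part of Stanbol or httpmime.
class ContentItemMultipart {

    // Assumption: same boundary as in the response of Example 1.
    static final String BOUNDARY = "contentItem";

    // Value for the Content-Type header of the request.
    static final String CONTENT_TYPE = "multipart/form-data; charset=UTF-8; boundary=" + BOUNDARY;

    // Builds the request body with the metadata as first and the content as second part.
    static String build(String contentItemUri, String rdfXml, String plainText) {
        String crlf = "\r\n";
        StringBuilder body = new StringBuilder();
        // first part: the metadata, the filename is the URI of the content item
        body.append("--").append(BOUNDARY).append(crlf)
            .append("Content-Disposition: form-data; name=\"metadata\"; filename=\"")
            .append(contentItemUri).append('"').append(crlf)
            .append("Content-Type: application/rdf+xml; charset=UTF-8").append(crlf)
            .append(crlf)
            .append(rdfXml).append(crlf);
        // second part: the content itself
        body.append("--").append(BOUNDARY).append(crlf)
            .append("Content-Disposition: form-data; name=\"content\"").append(crlf)
            .append("Content-Type: text/plain; charset=UTF-8").append(crlf)
            .append(crlf)
            .append(plainText).append(crlf);
        // closing delimiter
        body.append("--").append(BOUNDARY).append("--").append(crlf);
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.println("Content-Type: " + CONTENT_TYPE);
        System.out.println();
        System.out.print(build("urn:example:content-item",
                "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"/>",
                "The Stanbol enhancer can detect famous cities such as Paris."));
    }
}
```

The body returned by build would be sent in a POST request together with the Content-Type header value shown in CONTENT_TYPE.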