UIMA Sandbox

The UIMA sandbox is a workspace that is open to all UIMA committers and developers who would like to contribute code and join the UIMA developer community. The sandbox is designed to host analysis components and tooling around UIMA. All the components are free to use and licensed under the Apache Software License.

A list of proposed analysis components and tooling for UIMA is available at the UIMA wiki and can be discussed there.

You can access the UIMA sandbox in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/.

The list below shows the currently available components of the UIMA sandbox.


UIMA sandbox components


Whitespace Tokenizer Annotator

The Whitespace Tokenizer annotator is a UIMA annotator implementation that tokenizes text documents using simple whitespace segmentation. During tokenization, the annotator creates token and sentence annotations. The Java source of the annotator can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/WhitespaceTokenizer.
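
As a rough illustration of whitespace segmentation, the following standalone Java sketch splits a text at whitespace and records begin and end offsets for each token, similar to the way UIMA annotations refer to spans of the document text. It does not use the UIMA API and is not the sandbox implementation: the class name WhitespaceTokenizerSketch and the Span type are hypothetical, and sentence detection is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone sketch; not the sandbox annotator and not the UIMA API.
class WhitespaceTokenizerSketch {

    /** A token span given as begin (inclusive) and end (exclusive) offsets into the text. */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        String coveredText(String text) {
            return text.substring(begin, end);
        }
    }

    /** Splits the text at whitespace and returns one span per token. */
    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        int start = -1; // -1 means we are currently not inside a token
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    tokens.add(new Span(start, i)); // whitespace closes the open token
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first non-whitespace character opens a token
            }
        }
        if (start >= 0) {
            tokens.add(new Span(start, text.length())); // token running to the end of the text
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "UIMA  analyzes unstructured\ttext";
        for (Span s : tokenize(text)) {
            System.out.println(s.begin + "-" + s.end + " " + s.coveredText(text));
        }
    }
}
```

Keeping offsets rather than copied strings means that later processing steps can attach further information to the same region of the original text.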


Snowball Annotator

The Snowball annotator is a UIMA annotator component that wraps the Snowball stemming algorithm. The annotator iterates over the token annotations available in the CAS and, for each token, creates a feature containing the stem. The stemming algorithm is available for several languages. For details about Snowball, please see http://snowball.tartarus.org/. The Java source of the annotator can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/SnowballAnnotator.

Note: the implementation of the Snowball stemming algorithm used here is licensed under the BSD license.


Regular Expression Annotator

The Regular Expression Annotator (RegexAnnotator) is an Apache UIMA analysis engine that detects entities such as email addresses, URLs, phone numbers, zip codes, or any other entity that can be described with regular expressions and concepts. For each detected entity, the annotator can either create a new annotation or update an existing annotation with feature values. The Java source of the annotator can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/RegularExpressionAnnotator.
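
To illustrate the general idea of regular-expression-based entity detection, the sketch below uses only java.util.regex to find email-like strings and report their offsets. It is not the RegexAnnotator and does not use its concept configuration; the class name RegexEntitySketch is hypothetical, and the email pattern is a deliberately simplified assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone sketch; not the RegexAnnotator and not its configuration format.
class RegexEntitySketch {

    // Simplified email pattern for illustration only; real addresses can be more varied.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Returns one {begin, end} offset pair for every match of the email pattern. */
    static List<int[]> findEmails(String text) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            spans.add(new int[] { m.start(), m.end() });
        }
        return spans;
    }

    public static void main(String[] args) {
        String text = "Contact dev@uima.example.org or admin@example.com.";
        for (int[] span : findEmails(text)) {
            System.out.println(span[0] + "-" + span[1] + " " + text.substring(span[0], span[1]));
        }
    }
}
```

In an annotator, each offset pair would become the begin and end of a new annotation, or would be used to locate an existing annotation whose features should be updated.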


PEAR Packaging ANT Task

The PEAR packaging ANT task component is a project to create UIMA PEAR packages automatically during a component build using a custom Apache ANT task. With this task, users can build their components from source and then package them automatically as a UIMA PEAR package. The Java source of the PEAR packaging task can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/PearPackagingAntTask.


CAS Editor

The CAS Editor is an annotation tool that supports manual and automatic annotation of CAS files. The CAS Editor can visualize and edit all feature structures, and annotations can be viewed and edited directly on the text. Currently, only text-based CASes are supported. The Java source of the CAS Editor can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/CasEditor.


PEAR Packaging Maven Plugin

The PEAR packaging Maven plugin component is a project to create UIMA PEAR packages automatically during a component build using a custom Maven plugin. With this plugin, users can build their components from source and then package them automatically as a UIMA PEAR package. The Java source of the PEAR packaging Maven plugin can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/PearPackagingMavenPlugin.


Dictionary Annotator

The Dictionary Annotator is an Apache UIMA analysis engine that creates annotations based on word lists that are compiled into simple dictionaries. The input annotation type on which the dictionary lookup is executed and the output annotation type of the created annotations can each be specified individually. The Java source of the annotator can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/DictionaryAnnotator.
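
The basic idea of a word-list lookup can be sketched in plain Java as follows. This is only an assumption about the general technique, not the Dictionary Annotator's actual dictionary format or matching rules: the class name DictionaryLookupSketch is hypothetical, the "compiled" dictionary is simply a hash set, and matching is assumed to be case-insensitive on single tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical standalone sketch; not the Dictionary Annotator or its dictionary format.
class DictionaryLookupSketch {

    /** "Compiles" a word list into a set of lower-cased entries for fast lookup. */
    static Set<String> compile(List<String> words) {
        Set<String> dictionary = new HashSet<>();
        for (String word : words) {
            dictionary.add(word.toLowerCase(Locale.ROOT));
        }
        return dictionary;
    }

    /** Returns the indexes of all input tokens that occur in the dictionary. */
    static List<Integer> lookup(Set<String> dictionary, List<String> tokens) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (dictionary.contains(tokens.get(i).toLowerCase(Locale.ROOT))) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Set<String> cities = compile(Arrays.asList("Berlin", "Paris"));
        List<String> tokens = Arrays.asList("Flights", "from", "berlin", "to", "PARIS");
        for (int index : lookup(cities, tokens)) {
            System.out.println(index + " " + tokens.get(index));
        }
    }
}
```

Here the token list plays the role of the input annotation type, and each reported index marks a token for which an output annotation would be created.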


Feature Structure Variables

The Feature Structure variables project allows you to create named feature structure instances. It further allows you to refer to individual feature structures or annotations across annotators, without creating a special index. The Java source of the project can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/FsVariables.


Tagger Annotator

The Tagger Annotator component implements a Hidden Markov Model (HMM) tagger. The tagger assumes that the CAS already contains sentence and token annotations. It iterates over each sentence and its tokens to accumulate a list of words, and then invokes the tagger on this list. The HMM tagger uses the Viterbi algorithm to calculate the most probable tag sequence. For each Token, it sets the posTag field to the part-of-speech tag. Model training happens outside of UIMA; the tagger only reads statistical information from a model file, which is passed to it together with some further parameters through a properties file. The Java source of the annotator can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/Tagger.
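
The Viterbi step can be illustrated with a small, self-contained Java sketch. It is a generic textbook formulation in log space, not the sandbox tagger's code: the class name ViterbiSketch, the three tags, and all probabilities in the toy model are made-up assumptions, and a real model would be loaded from a trained model file.

```java
import java.util.Arrays;

// Hypothetical standalone sketch of Viterbi decoding; not the sandbox tagger.
class ViterbiSketch {

    /**
     * Returns the most probable state (tag) sequence for the observations.
     * start[s]   = P(first tag is s)
     * trans[p][s] = P(tag s | previous tag p)
     * emit[s][o]  = P(word o | tag s)
     */
    static int[] decode(double[] start, double[][] trans, double[][] emit, int[] obs) {
        int n = obs.length;
        int k = start.length;
        if (n == 0) {
            return new int[0];
        }
        double[][] score = new double[n][k]; // best log probability of a path ending in state s at step t
        int[][] back = new int[n][k];        // predecessor state on that best path

        for (int s = 0; s < k; s++) {
            score[0][s] = Math.log(start[s]) + Math.log(emit[s][obs[0]]);
        }
        for (int t = 1; t < n; t++) {
            for (int s = 0; s < k; s++) {
                double best = Double.NEGATIVE_INFINITY;
                int arg = 0;
                for (int p = 0; p < k; p++) {
                    double candidate = score[t - 1][p] + Math.log(trans[p][s]);
                    if (candidate > best) {
                        best = candidate;
                        arg = p;
                    }
                }
                score[t][s] = best + Math.log(emit[s][obs[t]]);
                back[t][s] = arg;
            }
        }

        // Pick the best final state, then follow the back pointers to the start.
        int last = 0;
        for (int s = 1; s < k; s++) {
            if (score[n - 1][s] > score[n - 1][last]) {
                last = s;
            }
        }
        int[] path = new int[n];
        path[n - 1] = last;
        for (int t = n - 1; t > 0; t--) {
            path[t - 1] = back[t][path[t]];
        }
        return path;
    }

    public static void main(String[] args) {
        // Made-up toy model. Tags: 0 = DET, 1 = NOUN, 2 = VERB. Words: 0 = the, 1 = dog, 2 = barks.
        double[] start = { 0.8, 0.1, 0.1 };
        double[][] trans = { { 0.05, 0.9, 0.05 }, { 0.1, 0.2, 0.7 }, { 0.5, 0.3, 0.2 } };
        double[][] emit = { { 0.9, 0.05, 0.05 }, { 0.05, 0.7, 0.25 }, { 0.05, 0.15, 0.8 } };
        System.out.println(Arrays.toString(decode(start, trans, emit, new int[] { 0, 1, 2 })));
    }
}
```

Working with log probabilities avoids numeric underflow on longer sentences, since many small probabilities are added rather than multiplied.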


BSF Annotator

The Bean Scripting Framework (BSF) Annotator is an Apache UIMA analysis engine that provides a link between the UIMA framework and the scripting languages that are supported by Apache BSF (http://jakarta.apache.org/bsf). The current implementation comes with examples in Beanshell (http://www.beanshell.org) and Rhino Javascript (http://www.mozilla.org/rhino). Simple tests have also been conducted successfully with Jython (http://jython.sourceforge.net/Project/index.html) and JRuby (http://jruby.codehaus.org). The annotator takes the source file containing the script as a parameter. The script is expected to implement the initialize and process functions of the analysis engine. A scripting language can be handy for quick prototyping, pre- and post-processing, CAS cleaning tasks, or type system conversion and adaptation. The Java source of the annotator can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/BSFAnnotator.


Simple Server (UIMA REST Service)

The UIMA Simple Server makes results of UIMA processing available in a simple, XML-based format. The intended use of the Simple Server is to provide UIMA analysis as a REST service. The Simple Server is implemented as a Java Servlet and can be deployed into any Servlet container (such as Apache Tomcat or Jetty). For details, see the user documentation of the Simple Server.

The Java source of the Simple Server can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/SimpleServer.


OpenCalais Annotator

The OpenCalais Annotator component wraps the OpenCalais web service and makes the OpenCalais analysis results available in UIMA. OpenCalais can detect a large variety of entities, facts, and events, for example Persons, Companies, Acquisitions, and Mergers. For details about the OpenCalais analytics and the license to use the service, please refer to the OpenCalais website. The Java source of the annotator can be accessed in the SVN repository at http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/OpenCalaisAnnotator.




Copyright © 2006-2008, The Apache Software Foundation