UIMA Sandbox
|
The UIMA sandbox is a workspace that is open to all UIMA committers and developers who would like to
contribute code and join the UIMA developer community. The sandbox is designed to host analysis components
and tooling around UIMA. All the components are free to use and licensed under the
Apache Software License.
A list of proposed analysis components and tooling for UIMA is available at the
UIMA wiki and can be discussed there.
You can access the UIMA sandbox in the SVN repository at
http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/.
The list below shows the currently available components of the UIMA sandbox.
|
|
|
UIMA sandbox components
|
|
Snowball Annotator
|
The Snowball annotator is a UIMA annotator component that wraps the Snowball stemming algorithm. The annotator
iterates over the token annotations available in the CAS and, for each token, sets a feature
containing the stem.
The stemming algorithm is available for several languages. For details about Snowball, please see
http://snowball.tartarus.org/.
The Java source of the annotator can be accessed in the SVN repository at
http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/SnowballAnnotator.
Note: the Snowball stemming implementation used by the annotator is licensed under the BSD license.
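The per-token stemming loop described above can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the annotator's actual code: the Token class, the Stemmer interface, the whitespace tokenizer, and the toy suffix stripper are all hypothetical stand-ins. In particular, ToySuffixStemmer is NOT the Snowball algorithm; a real setup would delegate to a Snowball stemmer and read tokens from the CAS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class StemAnnotatorSketch {

    /** Hypothetical stand-in for a token annotation that carries a "stem" feature. */
    static final class Token {
        final int begin;
        final int end;
        String stem; // filled in by annotateStems

        Token(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /** Hypothetical stemmer interface; the real annotator wraps Snowball instead. */
    interface Stemmer {
        String stem(String word);
    }

    /** Toy suffix stripper used only to keep the sketch runnable. NOT Snowball. */
    static final class ToySuffixStemmer implements Stemmer {
        private static final String[] SUFFIXES = {"ing", "ed", "s"};

        @Override
        public String stem(String word) {
            String lower = word.toLowerCase(Locale.ROOT);
            for (String suffix : SUFFIXES) {
                // Strip the first matching suffix, but keep at least three characters.
                if (lower.endsWith(suffix) && lower.length() - suffix.length() >= 3) {
                    return lower.substring(0, lower.length() - suffix.length());
                }
            }
            return lower;
        }
    }

    /** Whitespace tokenizer; in UIMA the tokens would already be in the CAS. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(start, i));
            }
        }
        return tokens;
    }

    /** Iterates over the tokens and stores the stem of the covered text on each one. */
    static void annotateStems(String text, List<Token> tokens, Stemmer stemmer) {
        for (Token token : tokens) {
            token.stem = stemmer.stem(text.substring(token.begin, token.end));
        }
    }

    public static void main(String[] args) {
        String text = "Annotators wrapped stemming algorithms";
        List<Token> tokens = tokenize(text);
        annotateStems(text, tokens, new ToySuffixStemmer());
        for (Token token : tokens) {
            System.out.println(text.substring(token.begin, token.end) + " -> " + token.stem);
        }
    }
}
```

Keeping the stemmer behind a small interface means the loop stays the same when the toy implementation is replaced by a language-specific stemmer.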
|
|
|
CAS Editor
|
The CAS Editor is an annotation tool that supports manual and automatic
annotation of CAS files. The CAS Editor can visualize and edit
all feature structures; annotations can be viewed and edited directly
on the text. Currently, only text-based CASes are supported.
The Java source of the CAS Editor can be accessed in the SVN repository at
http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/CasEditor.
|
|
|
Tagger Annotator
|
The Tagger Annotator component implements a Hidden Markov Model (HMM) tagger. The tagger assumes that
sentences and tokens have already been annotated in the CAS with sentence and token annotations.
It then iterates over the sentences and, within each sentence, over the tokens to accumulate a list of words, and invokes the
tagger on this list. The HMM tagger employs the Viterbi algorithm to calculate the most probable tag sequence.
For each token, it updates the posTag field with the part-of-speech tag.
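The Viterbi step can be illustrated with a small, self-contained sketch. It is not the Tagger component's code: the Model class, the three-tag tag set, all probabilities, and the uniform fallback for unknown words are assumptions made up for this example. The sketch works in log space to avoid numeric underflow on longer sentences.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ViterbiSketch {

    /** Hypothetical container for the HMM parameters a model file would supply. */
    static final class Model {
        final String[] tags;
        final double[] start;                 // P(tag at sentence start)
        final double[][] trans;               // trans[p][j] = P(tag j | previous tag p)
        final Map<String, double[]> emit;     // emit.get(word)[j] = P(word | tag j)

        Model(String[] tags, double[] start, double[][] trans, Map<String, double[]> emit) {
            this.tags = tags;
            this.start = start;
            this.trans = trans;
            this.emit = emit;
        }

        /** Assumption: unknown words are equally likely under every tag. */
        double emission(String word, int tag) {
            double[] p = emit.get(word);
            return p == null ? 1.0 / tags.length : p[tag];
        }
    }

    /** Toy model with invented probabilities, for illustration only. */
    static Model toyModel() {
        Map<String, double[]> emit = new HashMap<>();
        emit.put("the", new double[] {0.7, 0.0001, 0.0001});
        emit.put("dog", new double[] {0.0001, 0.4, 0.05});
        emit.put("barks", new double[] {0.0001, 0.1, 0.5});
        return new Model(
            new String[] {"DET", "NOUN", "VERB"},
            new double[] {0.6, 0.3, 0.1},
            new double[][] {{0.05, 0.9, 0.05}, {0.1, 0.3, 0.6}, {0.5, 0.3, 0.2}},
            emit);
    }

    /** Returns the most probable tag sequence for the given words. */
    static String[] decode(List<String> words, Model m) {
        int n = words.size();
        int k = m.tags.length;
        if (n == 0) {
            return new String[0];
        }
        double[][] score = new double[n][k]; // best log probability ending in tag j at word i
        int[][] back = new int[n][k];        // previous tag on that best path
        for (int j = 0; j < k; j++) {
            score[0][j] = Math.log(m.start[j]) + Math.log(m.emission(words.get(0), j));
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                int best = 0;
                double bestScore = Double.NEGATIVE_INFINITY;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p] + Math.log(m.trans[p][j]);
                    if (s > bestScore) {
                        bestScore = s;
                        best = p;
                    }
                }
                score[i][j] = bestScore + Math.log(m.emission(words.get(i), j));
                back[i][j] = best;
            }
        }
        // Pick the best final tag, then follow the back pointers to the start.
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (score[n - 1][j] > score[n - 1][last]) {
                last = j;
            }
        }
        String[] result = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            result[i] = m.tags[last];
            last = back[i][last];
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "dog", "barks");
        System.out.println(Arrays.toString(decode(words, toyModel())));
    }
}
```

In an annotator, the returned tags would then be written back to the corresponding tokens one by one, which matches the posTag update described above.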
Model training happens outside of UIMA; the tagger only reads statistical information from
a model file, which is passed to the tagger, along with some further parameters, through a properties file.
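As a rough illustration of configuration through a properties file, the sketch below reads a model file location and one extra parameter with java.util.Properties. The key names tagger.model.file and tagger.ngram.size, the default value, and the TaggerConfig class are hypothetical and chosen only for this example; the actual keys are defined by the Tagger component itself.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class TaggerConfigSketch {

    /** Hypothetical holder for the values read from the properties file. */
    static final class TaggerConfig {
        final String modelFile;
        final int nGramSize;

        TaggerConfig(String modelFile, int nGramSize) {
            this.modelFile = modelFile;
            this.nGramSize = nGramSize;
        }
    }

    /** Parses the configuration; both key names are assumptions for this sketch. */
    static TaggerConfig parse(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String modelFile = props.getProperty("tagger.model.file");
        if (modelFile == null || modelFile.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required key: tagger.model.file");
        }
        // Optional parameter with an assumed default of 2.
        int nGramSize = Integer.parseInt(props.getProperty("tagger.ngram.size", "2").trim());
        return new TaggerConfig(modelFile.trim(), nGramSize);
    }

    public static void main(String[] args) {
        // Inline text stands in for a properties file on disk; the path is made up.
        String text = "tagger.model.file = models/example.dat\n"
                    + "tagger.ngram.size = 3\n";
        TaggerConfig config = parse(new StringReader(text));
        System.out.println(config.modelFile + " (n-gram size " + config.nGramSize + ")");
    }
}
```

Failing fast on a missing model file entry keeps configuration mistakes visible at startup rather than during tagging.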
The Java source of the annotator can be accessed in the SVN repository at
http://svn.apache.org/repos/asf/incubator/uima/sandbox/trunk/Tagger.