UIMA Components

Here are some links to Apache UIMA™ components available from other places. If you find other links that you think should be listed here, please drop us a line on our dev or user mailing lists. Similarly, if any of the content is out of date or inappropriate, please let us know.

Apache cTAKES

Apache OpenNLP

Behemoth

BioNLP Wrappers

ClearTK

DKPro Core

DKPro Text Classification

JCoRe NLP Toolsuite

National Centre for Text Mining (NaCTeM)

u-compare

uimaFIT (legacy)

UIMA French portal

Apache cTAKES

The cTAKES project (clinical Text Analysis and Knowledge Extraction System) is an open-source natural language processing system for extracting information from the clinical free text of electronic medical records. cTAKES uses the Unstructured Information Management Architecture (UIMA) framework and the OpenNLP natural language processing toolkit. Its components are trained specifically for the English-language clinical domain, and it creates rich linguistic and semantic annotations that can be used by clinical decision support systems and clinical research.

Apache OpenNLP

The OpenNLP Project provides the official UIMA integration for the OpenNLP Sentence Detector, Tokenizer, POS Tagger, Name Finder, Document Categorizer, Chunker and Parser. The opennlp.uima distribution includes a sample PEAR which can easily be tested with the CAS Visual Debugger.

Behemoth

Behemoth is an Apache-licensed open-source platform for large-scale document processing that allows UIMA applications to be deployed within Hadoop.

BioNLP Wrappers

The Center for Computational Pharmacology at the University of Colorado has wrapped a number of popular bioinformatics annotators as UIMA components, and added some supporting code and components. More information is available at https://bionlp-uima.sourceforge.net.

ClearTK

ClearTK is a framework for developing machine learning and natural language processing components within Apache UIMA.

DKPro Core

DKPro Core is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework. The provided components wrap a constantly growing set of state-of-the-art NLP tools and also include several original components. Together they cover a wide range of tasks, including tokenization/segmentation, compound splitting, stemming, part-of-speech tagging, lemmatization, constituency parsing, dependency parsing, named entity recognition, coreference resolution, language identification, spelling correction, grammar checking, and support for reading and writing various file and corpus formats.

DKPro Core relies heavily on uimaFIT. DKPro Core is meant to be used with Apache Maven. The main components are hosted on Maven Central while distributable models are available from the public Maven repository at Ubiquitous Knowledge Processing (UKP) Lab, TU Darmstadt.

DKPro Text Classification

DKPro TC is a UIMA-based text classification framework built on top of DKPro Core, DKPro Lab and the Weka Machine Learning Toolkit. It is intended to facilitate supervised machine learning experiments with any kind of textual data.

JCoRe NLP Toolsuite

The JCoRe NLP Toolsuite consists of a collection of NLP components, some of which are provided as freely available UIMA components. It also includes a type system you can use as a basis for your own NLP applications.

National Centre for Text Mining (NaCTeM)

The National Centre for Text Mining (NaCTeM) at the University of Manchester hosts a repository of UIMA components.

u-compare

u-compare (https://nactem.ac.uk/ucompare) is a web-based integrated platform for sharing and comparing UIMA components and tools, including visualizers and utilities. It includes a large set of type-system-compatible UIMA components and is maintained by the National Centre for Text Mining (NaCTeM) at the University of Manchester.

An older version of this site, available at https://u-compare.org, was run mainly by the Tsujii Laboratory at the University of Tokyo. Because the Tsujii lab was closed in 2011, that site is no longer being updated.

uimaFIT (legacy)

uimaFIT (legacy) is a library that provides factories, injection, and testing utilities for UIMA. It provides a convenient Java API for using UIMA in Java projects without having to deal with XML descriptors.

Apache uimaFIT™ supersedes this project.

UIMA French portal

uima-fr.org is a French-language web portal about UIMA. It aims to develop a French-speaking UIMA community by providing services such as a discussion list, a feed aggregator and a repository of UIMA NLP resources (some of them dedicated to processing French). It is supported by the LINA Lab at the University of Nantes.