This page tracks the project status, incubator-wise.
The Tika project graduated on 2008-10-28
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
| item | type | reference |
|---|---|---|
| Website | www | http://incubator.apache.org/tika/ |
| Mailing list | dev | tika-dev @ incubator.apache.org |
| | commits | tika-commits @ incubator.apache.org |
| Moderators | jukka | Jukka Zitting |
| Bug tracking | jira | https://issues.apache.org/jira/browse/TIKA |
| Source code | svn | https://svn.apache.org/repos/asf/incubator/tika/ |
| Sponsor | Apache Lucene PMC | |
| Mentors | cutting | Doug Cutting |
| | bdelacretaz | Bertrand Delacretaz |
| | jukka | Jukka Zitting |
| Committers | ridabenjelloun | Rida Benjelloun |
| | mharwood | Mark Harwood |
| | mattmann | Chris A. Mattmann |
| | siren | Sami Siren |
| | jukka | Jukka Zitting |
| | kbennett | Keith Bennett |
| | niallp | Niall Pemberton |
| | dmeikle | Dave Meikle |
Apache Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
Development
Issues before graduation
Tika (http://incubator.apache.org/tika) is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
Development
Issues before graduation
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
There have been a number of positive developments in Tika during the last few months. Traffic on the Tika mailing list has increased significantly (typically two or three questions and one or two commits every day or two), and there have been many recent inquiries from external projects wanting to collaborate with Tika (including Aperture, PDFBox, and a developer of a JSON library currently hosted on Google Code). In addition, Tika's architecture has recently become a topic of active discussion (see the Development section below).
We recently elected Keith Bennett as a new committer to Tika. Keith has contributed many of the recent patches committed to Tika and has taken part in discussions about the architecture and future direction of the project.
Jukka Zitting will present Tika in the "Fast Feather" track at ApacheCon US. The rest of the community is helping to prepare the content of the presentation. The abstract follows:
Tika is a new content analysis framework born from the desire to factor out commonality from the Apache Nutch search engine framework. Tika provides a MIME type detection framework, an extensible parsing framework, and a metadata environment for content analysis. Though Tika is still in its nascent stages, the project has recently taken shape and is nearing a stable 0.1 release. In this talk, we'll describe the core APIs of Tika and discuss its use in several distinct domains, including search engines, scientific data dissemination, and an industrial setting.
Development
There has been a flurry of JIRA issue and code activity (http://issues.apache.org/jira/browse/TIKA): 47 issues are currently in JIRA, with 32 resolved, 14 closed, and 2 open major/minor issues in progress.
Tika's Parser interface, one of its key components, has just undergone a major overhaul led by Jukka Zitting, and Chris Mattmann has recently contributed a MimeType system to Tika (with help from fellow Apache Nutch committer Jerome Charron). We also cleaned up and refactored large parts of the remaining code (removing references to LiusLite and using the Tika name wherever possible) in preparation for an upcoming 0.1 release.
Chris Mattmann has led an effort to remove the existing MimeType detection system from Apache Nutch (http://lucene.apache.org/nutch/) and replace it with Tika's improved MimeType detection system. A patch is currently available in JIRA (http://issues.apache.org/jira/browse/NUTCH-562), and, barring objections, Nutch will rely on Tika for its MimeType detection.
Committers Bertrand Delacretaz, Sami Siren, and Rida Benjelloun have also been active recently, committing patches and improvements wherever needed.
Issues before graduation
No changes since our last report: the Tika project is still at an early stage of incubation. We need to continue bringing in the initial codebases and are targeting an initial incubating release (0.1), probably within the next month. We also need to work on growing the community and figuring out how best to interact with external parser projects.
Tika is a toolkit for detecting and extracting metadata and structured text content from various document formats using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
Development
Issues before graduation
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
Community
The Tika mailing lists have been relatively quiet lately, probably because, with little code in place, we don't yet have many concrete issues to discuss.
Development
We saw the first piece of Tika code when Chris A. Mattmann ported the Nutch metadata framework to Tika. Rida Benjelloun has created a version of the Lius codebase to be included in Tika, and the code is currently in the issue tracker.
Issues before graduation
The Tika project is still at an early stage of incubation. We need to continue bringing in the initial codebases and will probably target an initial incubating release later this year. We also need to work on growing the community and figuring out how best to interact with external parser projects.
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.
Incubating since: March 22nd, 2007.
Community
We had a good project bootstrap meeting as a part of the text analysis BOF at the ApacheCon EU in Amsterdam. The resulting ideas were summarized on the project mailing list, and the first design threads have started.
Development
We've started discussing the design of the Tika toolkit. It seems likely that we will select one of the existing codebases listed in the project proposal as the basis of an early 0.1 release and then refactor the code into a more generic toolkit. The Tika svn tree is still empty, but I expect the first code commits to arrive before the next report.
Infrastructure
All the initial infrastructure is now in place. There is still some activity on the temporary Tika wiki on the Google Project hosting service, so we may end up requesting a Tika wiki to be set up on the ASF infrastructure.
Issues before graduation
The Tika project is still at an early stage of incubation. The most important tasks before graduation are to develop and release the Tika codebase and to grow a diverse and sustainable project community.
Tika is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries. Tika entered incubation on March 22nd, 2007.
The Tika project has just started. The basic infrastructure (mailing lists, subversion, issue tracker, web site) is mostly in place; the only thing still missing is one committer account. We expect to get started with the actual design and code work during the next few weeks.
This is the first phase of incubation, needed to start the project at Apache.
Item assignment is shown by the Apache id. Completed tasks are shown by the completion date (YYYY-MM-dd).
date | item |
---|---|
2007-03-07 | Make sure that the requested project name does not already exist and check www.nameprotect.com to be sure that the name is not already trademarked for an existing software product. |
NA | If the request comes from an existing Apache project to adopt an external package, then ask the Apache project for the SVN module and mail address names. |
NA | If the request comes from outside Apache to enter an existing Apache project, then post a message to that project for them to decide on acceptance. |
NA | If the request comes from anywhere to become a stand-alone PMC, then assess the fit with the ASF, and create the lists and modules under the incubator address/module names if accepted. |
date | item |
---|---|
2007-03-08 | Identify all the Mentors for the incubation, by asking all that can be Mentors. |
2007-03-08 | Subscribe all Mentors on the pmc and general lists. |
2007-03-08 | Give all Mentors access to the incubator SVN repository. (to be done by PMC chair) |
2007-03-31 | Tell Mentors to track progress in the file 'incubator/projects/tika.html' |
date | item |
---|---|
2008-10-17 | Check and make sure that the papers that transfer rights to the ASF have been received. It is only necessary to transfer rights for the package, the core code, and any new code produced by the project. |
2008-10-17 | Check and make sure that the files that have been donated have been updated to reflect the new ASF copyright. |
date | item |
---|---|
2008-10-17 | Check and make sure that, for all code included with the distribution that is not under the Apache license, we have the right to combine it with Apache-licensed code and redistribute it. |
2008-10-17 | Check and make sure that all source code distributed by the project is covered by one or more of the following approved licenses: Apache, BSD, Artistic, MIT/X, MIT/W3C, MPL 1.1, or something with essentially the same terms. |
date | item |
---|---|
2007-03-07 | Check that all active committers have submitted a contributors agreement. |
2007-03-31 | Add all active committers to the STATUS file. |
2008-10-17 | Ask root for the creation of committers' accounts on people.apache.org. |
date | item |
---|---|
2007-03-31 | Ask infrastructure to create source repository modules and grant the committers karma. |
2007-03-31 | Ask infrastructure to set up and archive mailing lists. |
2007-03-31 | Decide about and then ask infrastructure to set up an issue tracking system (Bugzilla, Scarab, Jira). |
2007-05-09 | Migrate the project to our infrastructure. |
See the issue tracker.
These action items have to be checked for during the whole incubation process.
These items are not to be signed as done during incubation, as they may change during incubation. They are to be looked into and described in the status reports and completed in the request for incubation signoff.
Add project specific tasks here.
Things to check for before voting the project out.