List: incubator-cvs
Subject: [Incubator Wiki] Update of "cTAKESProposal" by PeiChen
From: Apache Wiki <wikidiffs () apache ! org>
Date: 2012-05-30 20:20:11
Message-ID: 20120530202011.79536.37613 () eos ! apache ! org
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Incubator Wiki" for change notification.
The "cTAKESProposal" page has been changed by PeiChen:
http://wiki.apache.org/incubator/cTAKESProposal
New page:
= cTAKES Proposal =
The following is a proposal for a new top-level project within the ASF.
== Abstract ==
cTAKES (clinical Text Analysis and Knowledge Extraction System) is a natural language processing tool for information extraction from the clinical free text of electronic medical records.
== Proposal ==
cTAKES (clinical Text Analysis and Knowledge Extraction System)
== Background ==
cTAKES comprises a collection of components and tooling written in Java and trained specifically for the clinical domain. It creates rich linguistic and semantic annotations that can be used by clinical decision support systems and clinical research. Development of cTAKES was started in 2006 by a team of physicians, computer scientists, and software engineers at the Mayo Clinic, led by Dr. Guergana Savova and Dr. Christopher Chute. cTAKES is released as open source under the Apache License, Version 2.0. The system was deployed at Mayo, where it is currently an integral part of the clinical data management infrastructure and has processed in excess of 80 million clinical notes. The core development team is now split between Mayo Clinic and Children's Hospital Boston, following Dr. Savova's move to Children's Hospital Boston in early 2010. Additional collaborations with external groups at the University of Colorado, Brandeis University, the University of Pittsburgh, and the University of California at San Diego continue to extend the capabilities of cTAKES into areas such as temporal reasoning, clinical question answering, and coreference resolution for the clinical domain. In 2010, cTAKES was adopted by the I2B2 program and is a central component of SHARP Area 4.
The current cTAKES components include:
* Sentence boundary detector
* Rule-based tokenizer to separate punctuation from words
* Normalizer
* Context dependent tokenizer
* Part-of-speech tagger
* Phrasal chunker
* Dictionary lookup annotator and normalization to an ontology
* Context annotator
* Negation detector
* Dependency parser
* Constituency parser
* Semantic Role Labeler
* Coreference resolver
* Module for the identification of patient smoking status
* Drug mention annotator
== Rationale ==
We believe there is a clear gap between the cutting-edge technologies developed in research labs and those used in clinical practice. We believe that moving cTAKES development to the Apache development community will lead to faster innovation, better integration with other open source software, and broader adoption of cTAKES within clinical institutions, and will thereby help improve our healthcare system. We also believe that having cTAKES at Apache will encourage the development of a basic set of open source components that will jumpstart the efforts of clinical NLP developers.
== Initial Goals ==
The initial goals of the proposed project are:
* Bring the community together at the ASF and make the development process transparent for them
* Write user documentation for all major components
* Set up automated builds and continuous integration
* Automate regression tests
* Produce an Incubating release
== Current Status ==
=== Meritocracy ===
Some of the initial committers are familiar with Apache's idea of meritocracy; others are not. We will get everybody on the same level as part of the incubation process.
=== Community ===
cTAKES already has a considerable user base, both in industry and academia.
=== Core Developers ===
See the initial committer list.
=== Alignment ===
cTAKES has tie-ins with several existing Apache projects. We have been building our components using the UIMA framework. We are also reusing existing Apache projects such as Lucene, Solr, and Maven. We expect these collaborations to strengthen further after our move to Apache, and we plan to experiment with other projects that originated under the Lucene umbrella, such as Hadoop and Mahout. Another obvious connection exists to OpenNLP.
== Known Risks ==
=== Orphaned products ===
The project has been around for a number of years already; it has a well-established user community and a diverse set of committers.
=== Inexperience with Open Source ===
cTAKES has been an open source project for many years. Many of the developers are already familiar with both open source in general and the ASF in particular.
=== Homogeneous Developers ===
The current group of developers is very diverse, spread across the globe and across multiple institutions.
=== Reliance on Salaried Developers ===
Most of the developers are not paid to work specifically on cTAKES, so there is little reliance on salaried developers.
=== Relationships with Other Apache Products ===
NLP is often used in search and other algorithms that work with unstructured data, so cTAKES is likely to be useful to the Lucene and Solr communities. It also aligns nicely with Mahout, UIMA, and OpenNLP.
=== An Excessive Fascination with the Apache Brand ===
We think the project aligns nicely with the ASF's goal of disseminating source code to the public free of charge. Clinical NLP has long been the subject of cutting-edge research, but it often lacks community and shared knowledge. We believe that by bringing cTAKES to the ASF, the Apache brand will help deliver clinical NLP capabilities to a much larger audience. Likewise, a cutting-edge project like cTAKES can further the ASF brand by providing users with tried and true, as well as new, natural language processing capabilities.
== Documentation ==
* https://wiki.nci.nih.gov/display/VKC/cTAKES+2.0
* http://en.wikipedia.org/wiki/CTAKES
== Initial Source ==
The source code is maintained in SVN on SourceForge:
cTAKES: http://sourceforge.net/projects/ohnlp/
== Source and Intellectual Property Submission Plan ==
The cTAKES source code is already open source under the AL 2.0.
== External Dependencies ==
|| '''Library''' ||<style="text-align: center;"> '''License''' ||<style="text-align: center;"> '''Description''' ||
|| JWNL ||<style="text-align: center;"> BSD ||<style="text-align: center;"> Java Wordnet Library ||
|| JUnit ||<style="text-align: center;"> CPL ||<style="text-align: center;"> Unit Testing Framework ||
|| UIMA ||<style="text-align: center;"> AL 2.0 ||<style="text-align: center;"> Unstructured Information Management Architecture ||
== Cryptography ==
cTAKES neither provides nor uses any cryptography.
== Required Resources ==
=== Mailing lists ===
* ctakes-dev
* ctakes-private
* ctakes-user
* ctakes-commits
=== Subversion Directory ===
https://svn.apache.org/repos/asf/incubator/ctakes
=== Issue Tracking ===
Jira: CTAKES
=== Other Resources ===
== Initial Committers ==
|| '''Name''' ||<style="text-align: center;"> '''Email''' ||<style="text-align: center;"> '''CLA''' ||
|| Thilo Goetz ||<style="text-align: center;"> twgoetz@apache.org ||<style="text-align: center;"> yes ||
|| Grant Ingersoll ||<style="text-align: center;"> gsingers@apache.org ||<style="text-align: center;"> yes ||
|| Jörn Kottmann ||<style="text-align: center;"> joern@apache.org ||<style="text-align: center;"> yes ||
|| Thomas Morton ||<style="text-align: center;"> tsmorton@gmail.com ||<style="text-align: center;"> no ||
|| William Silva ||<style="text-align: center;"> william.colen@gmail.com ||<style="text-align: center;"> yes ||
|| Jason Baldridge ||<style="text-align: center;"> jasonbaldridge@gmail.com ||<style="text-align: center;"> yes ||
|| James Kosin ||<style="text-align: center;"> james.kosin@gmail.com ||<style="text-align: center;"> yes ||
== Affiliations ==
== Sponsors ==
=== Champion ===
Jörn Kottmann
=== Nominated Mentors ===
Marshall Schor
Benson Margulies
Jörn Kottmann
Grant Ingersoll
=== Sponsoring Entity ===
The Apache Incubator
---------------------------------------------------------------------
To unsubscribe, e-mail: cvs-unsubscribe@incubator.apache.org
For additional commands, e-mail: cvs-help@incubator.apache.org