coansys.ceon.pl
Content Analysis System
Our goal is to create an open source framework for mining very large collections of scientific publications. The Content Analysis System (CoAnSys) will handle tens of millions of bibliographic records on a modest Hadoop cluster. We employ state-of-the-art machine learning techniques for document deduplication, author name disambiguation, citation matching, keyword extraction, and document analysis (similarity, classification). All of this is implemented with Apache Hadoop and Oozie. CoAnSys uses HBase. Will be ...
http://coansys.ceon.pl/