Semi-automatic and Collaborative Retrieval of Information Based on Ontologies
The SCRIBO project's goal is to develop algorithms and collaborative tools for extracting knowledge from unstructured documents and images. Its distinguishing features are the combination of semantic and statistical approaches to natural language processing (NLP) and its focus on the collaborative dimension of the knowledge extraction process. SCRIBO is contributing to the definition of international NLP standards at the ISO and W3C levels. SCRIBO provides an integrated tool chain for extracting light ontologies from a corpus of documents, for acquiring knowledge by leveraging ontologies, and for extracting structures from digitized documents. The tool chain will be available as an integrated modular environment that builds on standard APIs and facilitates advanced collaboration through graphical widgets for annotating semantically evolving texts and images. SCRIBO outcomes are being released under a commercially friendly open-source license.

The SCRIBO project provided two major advancements.

– On the one hand, research-oriented partners have provided and improved their algorithms
for automated ontology extraction and semi-automated conversion of text
and images into exploitable knowledge;
– on the other hand, technology partners have had the opportunity to experiment with
these technologies and to integrate them into their products in order to provide a
richer experience for their users.

The outcome is a set of tools and products that have been used effectively to improve
information management processes and that are available to others as open-source
software.

Number of scientific articles published: 11

– “Extracting and Visualizing Quotations from News Wires”, LTC 2009
– “Milena: Write Generic Morphological Algorithms Once, Run on Many Kinds of Images”, ISMM 2009
– “Word Sense Induction From Multiple Semantic Spaces”, RANLP 2009
– “A Deep Ontology for Named Entities”, ISA 2011

Number of patents filed: 0

Number of product innovations: 6

– EPITA/LRDE Dematerialisation module for Milena
– Nuxeo Automated Document Categorization and Semantic Entities extensions
– Mandriva – Smart Desktop
– Proxem Ubiq E-Reputation extension
– SCRIBO Semantic Workbench
– XWiki Annotation and Semantic analysis framework

Number of service innovations: 0

Number of projected jobs created: 6

Number of jobs maintained: 0

Number of related companies created: 0