In our project we develop and test tools and approaches for the acquisition and goal-driven processing of knowledge from corporate data. Through the analysis of the semantic relationships between existing, explicitly named entities in the documents, new, more complex implicit entities can be learned and identified.
We propose a knowledge-based approach for the recognition of complex, composed, or implicit entities. We first preprocess the document corpus with existing named entity recognition approaches in order to identify simple, explicitly named entities. In the following steps, the recognized entities are treated as a set, and the relationships between them are analyzed. These relationships are extracted from an existing knowledge base as well as from the document corpus. From the relationships between the recognized entities, together with their relations to other resources in the knowledge base, we can infer the probability that a complex entity exists.
One of the particularities of our approach is the ability to recognize abstract entities that are composed of several simple entities, such as events. For example, one could improve a diagnosis by recognizing and annotating a symptom description in medical texts. The relations between the individual recognized simple entities, such as "headaches", "fever" and "rash", in combination with background knowledge about symptoms as well as patient records, let us narrow down the possible diseases that trigger these symptoms. Another example is the description of events on the stock market through the recognition of entities that belong to, or can be grouped into, certain patterns, such as the high-volume sale of stocks and the related loss in stock value.
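The symptom example can be illustrated with a minimal sketch. It is not the project's implementation: the toy knowledge base, the dictionary lookup that stands in for a real named entity recognizer, the function names, and the overlap score used in place of an inferred probability are all assumptions made for illustration.

```python
from typing import Dict, Set

# Hypothetical toy knowledge base: complex entity -> related simple entities.
# A real system would query an existing knowledge base instead.
KNOWLEDGE_BASE: Dict[str, Set[str]] = {
    "measles": {"fever", "rash", "cough"},
    "migraine": {"headaches", "nausea"},
    "influenza": {"fever", "headaches", "cough"},
}


def recognize_simple_entities(text: str, vocabulary: Set[str]) -> Set[str]:
    """Stand-in for an NER step: look up lowercase tokens in a vocabulary."""
    tokens = {tok.strip(".,;:!?\"'()").lower() for tok in text.split()}
    return tokens & vocabulary


def score_complex_entities(
    found: Set[str], kb: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Score each complex entity by the share of its related simple
    entities that were recognized (a crude proxy for a probability)."""
    return {name: len(found & related) / len(related) for name, related in kb.items()}


if __name__ == "__main__":
    vocabulary = set().union(*KNOWLEDGE_BASE.values())
    text = "The patient reports headaches, fever and a rash."
    found = recognize_simple_entities(text, vocabulary)
    scores = score_complex_entities(found, KNOWLEDGE_BASE)
    # Print candidate complex entities, best supported first.
    for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {score:.2f}")
```

The overlap score treats every relation as equally informative. Weighting relations by how specific they are to a complex entity, or adding relations extracted from the corpus itself, would be closer to the approach described above.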
Research Associate: Alexandru Todor