Fiction and fantasy are archetypes of long-tail domains that lack comprehensive methods for automated language processing and knowledge extraction. We present ENTYFI, the first methodology for typing entities in fictional texts from books, fan communities, or amateur writers.
ENTYFI builds on 205 automatically induced high-quality type systems for popular fictional domains, and exploits the overlap and reuse of these fictional domains for fine-grained typing in previously unseen texts.
ENTYFI comprises five steps: type system induction, domain relatedness ranking, mention detection, mention typing, and type consolidation.
The recall-oriented typing module combines a supervised neural model, unsupervised Hearst-style and dependency patterns, and knowledge base lookups. The precision-oriented consolidation stage uses co-occurrence statistics to remove noisy candidates and to identify the most relevant types. Extensive experiments on previously unseen fictional texts demonstrate the quality of ENTYFI.
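As a rough illustration of how recall-oriented typing and precision-oriented consolidation can interact, the following minimal sketch extracts candidate types with a single Hearst-style pattern ("T such as A, B and C") and then keeps only candidates that co-occur with another candidate in a given table of type-pair counts. This is not the ENTYFI implementation: the function names `hearst_candidates` and `consolidate`, the regular expression, the `min_support` threshold, and the toy co-occurrence table are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Assumed, simplified Hearst-style pattern: "<type> such as <Mention>, <Mention> and <Mention>".
# Mentions are approximated as single capitalized tokens.
SUCH_AS = re.compile(r"(\w+) such as ((?:[A-Z]\w+(?:, | and )?)+)")


def hearst_candidates(text):
    """Return {mention: set of candidate type labels} found by the pattern."""
    candidates = defaultdict(set)
    for match in SUCH_AS.finditer(text):
        type_label = match.group(1).lower()
        for mention in re.findall(r"[A-Z]\w+", match.group(2)):
            candidates[mention].add(type_label)
    return dict(candidates)


def consolidate(types, cooccurrence, min_support=1):
    """Keep candidate types that co-occur with at least one other candidate.

    `cooccurrence` maps frozenset({type_a, type_b}) to a count; in this sketch
    it is a hypothetical table that would be derived from the type systems.
    """
    kept = []
    for t in sorted(types):
        support = sum(
            cooccurrence.get(frozenset((t, other)), 0)
            for other in types
            if other != t
        )
        if support >= min_support:
            kept.append(t)
    return kept


# Toy usage: pattern output is merged with a (hypothetical) noisy candidate.
text = "The council feared wizards such as Gandalf and Saruman."
found = hearst_candidates(text)
noisy = found["Gandalf"] | {"characters", "locations"}
table = {frozenset(("wizards", "characters")): 12}
print(consolidate(noisy, table))
```

In this toy run, "locations" never co-occurs with the other candidates and is therefore dropped, while "wizards" and "characters" support each other and are kept. A real system would merge candidates from several sources before consolidation and would use a more robust mention detector than capitalized tokens.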
Code and data will be made available soon.
ENTYFI: Entity Typing in Fictional Texts
Cuong Xuan Chu, Simon Razniewski, Gerhard Weikum
In Proc. WSDM 2020