Japanese Wikification Corpora

Last Update: June 14, 2017.

(UNDER CONSTRUCTION)

Joint work with Assoc. Prof. Shinsuke Mori of the Academic Center for Computing and Media Studies, Kyoto University.

Annotation for wikification.
An example of wikification. (a) Mappings from mentions to entities. (b) Wikitext as in Wikipedia. (c) Our annotation.
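To make the relation between (a) and (b) concrete, the following is a minimal Python sketch that recovers mention-to-entity pairs from wikitext internal links of the form [[Entity]] or [[Entity|mention]]. It illustrates standard wikitext link syntax only and is not the annotation format of this corpus. The function name extract_links and the sample sentence are made up for the example, and section anchors, templates and nested links are deliberately not handled.

```python
import re

# Matches [[Target]] and [[Target|surface text]].
# Simplification (assumption): no nested brackets, no special handling of
# "#section" anchors, namespaces, or templates.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def extract_links(wikitext):
    """Return a list of (mention, entity) pairs for each internal link."""
    pairs = []
    for match in WIKILINK.finditer(wikitext):
        entity = match.group(1).strip()
        # Without a pipe, the surface form is the page title itself.
        mention = match.group(2) if match.group(2) is not None else entity
        pairs.append((mention, entity))
    return pairs


# Illustrative sentence, not taken from the corpus.
sample = "[[京都大学|京大]]は[[京都市]]にある。"
pairs = extract_links(sample)
for mention, entity in pairs:
    print(mention, "->", entity)
```

Each pair gives the surface string as it appears in the text and the title of the Wikipedia page it links to.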

Features

Download

Annotated Corpora

COMING SOON.

A subset of BCCWJ (OW and OY) will be downloadable from NINJAL.

A tweet corpus will be available upon request.

Wikipedia Snapshot

We use an XML dump of Japanese Wikipedia dated 12 May 2015. For the sake of reproducibility, we keep to this single snapshot. Since it is no longer available from Wikimedia, we provide our copy here.
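For readers who want to work with the snapshot directly, here is a minimal Python sketch that streams page titles and wikitext out of a MediaWiki XML export without loading the whole file into memory. It assumes the usual export layout (page elements containing a title and a revision with a text element). The function name iter_pages and the tiny inline sample are hypothetical, and the namespace URI in the sample is only illustrative, since the code strips namespaces rather than relying on a particular version.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a MediaWiki XML export.

    `source` is a filename or a binary file object. If a page had several
    revisions, only the last one seen would be kept (assumption: the dump
    holds current revisions only).
    """
    title, text = None, None
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, None
            elem.clear()  # release the finished page's subtree


# Tiny stand-in for a real dump, for illustration only.
SAMPLE = (
    '<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">'
    "<page><title>京都大学</title>"
    "<revision><text>[[京都市]]にある大学。</text></revision></page>"
    "<page><title>京都市</title>"
    "<revision><text></text></revision></page>"
    "</mediawiki>"
).encode("utf-8")

pages = list(iter_pages(io.BytesIO(SAMPLE)))
for title, text in pages:
    print(title, len(text))
```

For a bzip2-compressed dump, the same function can be given a file object opened with bz2.open(path, "rb") from the standard library, where path is wherever the downloaded snapshot was saved.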

Reference(s)

See also