JAWJAW: Java Wrapper for Japanese WordNet

Last modified: 2013-03-20

Introduction

JAWJAW (JAva Wrapper for JApanese Wordnet) is a Java API for the Japanese WordNet (wn-ja) database, which also contains Princeton's English WordNet v3.0. It offers access to lexical knowledge about a given word, such as hypernyms, hyponyms, definitions and translations (English <--> Japanese).

The API hides the details of the wn-ja DB schema from the programmer. We provide both a simple API for general Java programmers and a more fine-grained API for Natural Language Processing (NLP) application developers.

Simple API

Just call methods in the façade class. You can see the list of available methods here.

Sample code:

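import java.util.Set;
// JAWJAW and POS come from the JAWJAW library; see the Javadoc for their packages.

public class SimpleDemo {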
	private static void run( String word, POS pos ) {
		// Accessing Japanese WordNet from the façade class called JAWJAW
		Set<String> hypernyms = JAWJAW.findHypernyms(word, pos);
		Set<String> hyponyms = JAWJAW.findHyponyms(word, pos);
		Set<String> consequents = JAWJAW.findEntailments(word, pos);
		Set<String> translations = JAWJAW.findTranslations(word, pos);
		Set<String> definitions = JAWJAW.findDefinitions(word, pos);
		// Show the results. (Note: results for all senses of a polysemous word are merged here.)
		System.out.println( "hypernyms of "+word+" : \t"+ hypernyms );
		System.out.println( "hyponyms of "+word+" : \t"+ hyponyms );
		System.out.println( word+" entails : \t\t"+ consequents );
		System.out.println( "translations of "+word+" : \t"+ translations );
		System.out.println( "definitions of "+word+" : \t"+ definitions );		
	}
	public static void main(String[] args) {
		// Showing a demo for "買収"(verb) which means to acquire
		SimpleDemo.run( "買収", POS.v );
	}
}

Output:

hypernyms of 買収 : 	[支辨, 払出す, 会計, 払いだす, 払い出す, 出費, 支出, 出金, 支払う, 払いこむ, 支弁, 精算, 得る, 手に入れる, 入手, 買い上げる, 買いつける, 買求める, 召す, 買取り, 買収, 買いあげる, 買いもとめる, 買上げる, 買い入れる, 買込む, 購う, 購買, 買い付ける, 購入, 購求, 買う, 買いとる, 買入れる, 買いいれる, 買い求める, 買い取り, 買い取る, 買い受ける, 買い出し, 買いこむ, 買取, 買取る, 買い込む]
hyponyms of 買収 : 	[買いうける, 買い上げる, 譲りうける, 買収, 買受ける, 買い取る, 譲受ける, 買取る]
買収 entails : 		[支辨, 払出す, 会計, 払いだす, 払い出す, 出費, 支出, 出金, 支払う, 払いこむ, 支弁, 精算, 採択, 択む, 選む, 選考, 選分, 選び取る, より取る, 選りわける, 選取, 選定, 選りすぐる, 択る, チョイス, 択ぶ, 選りどる, 選る, 選取る, 選り分ける, 選り抜く, より分ける, セレクト, 選抜, 選する, 精選, 選り取る, 選び出す, 選抜く, より抜く, 簡抜, 選択, 選り出す, 選分ける, 選りぬく, 選出す, より出す, 選りだす, 選ぶ]
translations of 買収 : 	[corrupt, buy, bribe, grease_one's_palms, purchase, buy_out, take_over, buy_up]
definitions of 買収 : 	[make illegal payments to in exchange for favors or influence; "This judge can be bought", obtain by purchase; acquire by means of a financial transaction; "The family purchased a new car"; "The conglomerate acquired a new company"; "She buys for the big department store", take over ownership of; of corporations and companies]
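
The note about polysemy in the sample code explains why the output above mixes "bribe" and "purchase" readings of 買収. The following is a minimal sketch of that effect, not JAWJAW's actual implementation: the class, the method names and the grouping of translations by sense are all hypothetical, and the data is hard-coded rather than read from wn-ja. It assumes Java 7 or later.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: none of these names belong to JAWJAW.
final class MixedSensesSketch {

    // Hypothetical per-sense translations; the grouping is assumed for illustration.
    static Map<String, List<String>> translationsBySense() {
        Map<String, List<String>> bySense = new LinkedHashMap<>();
        bySense.put("bribe", Arrays.asList("corrupt", "buy", "bribe"));
        bySense.put("purchase", Arrays.asList("buy", "purchase"));
        bySense.put("take over", Arrays.asList("buy_out", "take_over", "buy_up"));
        return bySense;
    }

    // Union over all senses: the sense labels are lost and duplicates collapse.
    static Set<String> mergeAllSenses(Map<String, List<String>> bySense) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> translations : bySense.values()) {
            merged.addAll(translations);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> bySense = translationsBySense();
        System.out.println("per sense: " + bySense);
        System.out.println("merged:    " + mergeAllSenses(bySense));
    }
}
```

If you need to keep senses apart, the API for NLP application developers described next exposes the individual senses and synsets.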

API for NLP Application Developers

With this API, you can get the raw content of the DB through DAOs (Data Access Objects).

Data model:

Here's the domain model diagram generated from the Japanese WordNet DB schema. The API provides each data class and its DAO. The domain attributes "pos", "link" and "lang" are implemented as Enum classes.

Available concept relationships:
Here's a summary of concept relationship "links" stored in the synlink table. (As of wn-ja v0.9)

link	description	count
also	See also	2692
syns	Synonyms	0
hype	Hypernyms	89089
inst	Instances	8577
hypo	Hyponyms	89089
hasi	Has Instance	8577
mero	Meronyms	0
mmem	Meronyms --- Member	12293
msub	Meronyms --- Substance	979
mprt	Meronyms --- Part	9097
holo	Holonyms	0
hmem	Holonyms --- Member	12293
hsub	Holonyms --- Substance	797
hprt	Holonyms --- Part	9097
attr	Attributes	1278
sim	Similar to	21386
enta	Entails	408
caus	Causes	220
dmnc	Domain --- Category	6643
dmnu	Domain --- Usage	967
dmnr	Domain --- Region	1345
dmtc	In Domain --- Category	6643
dmtu	In Domain --- Usage	967
dmtr	In Domain --- Region	1345
ants	Antonyms	0
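
As an illustration of how link codes like these can be modelled as a Java enum, here is a small sketch covering a few rows of the table. It is not the enum shipped with JAWJAW: the name LinkSketch and the methods fromCode and inverse are hypothetical, and the pairing of inverse relations (hype/hypo, mprt/hprt) is an assumption based on the descriptions and the identical counts above. It assumes Java 7 or later.

```java
// Hypothetical enum; the enum in JAWJAW may differ in name, members and methods.
enum LinkSketch {
    hype("Hypernyms"),
    hypo("Hyponyms"),
    mprt("Meronyms --- Part"),
    hprt("Holonyms --- Part"),
    enta("Entails"),
    caus("Causes");

    private final String description;

    LinkSketch(String description) {
        this.description = description;
    }

    String description() {
        return description;
    }

    // Returns the constant for a link code, or null if the code is unknown.
    static LinkSketch fromCode(String code) {
        for (LinkSketch link : values()) {
            if (link.name().equals(code)) {
                return link;
            }
        }
        return null;
    }

    // Assumed inverse relation; null where the table lists no counterpart.
    LinkSketch inverse() {
        switch (this) {
            case hype: return hypo;
            case hypo: return hype;
            case mprt: return hprt;
            case hprt: return mprt;
            default:   return null;
        }
    }

    public static void main(String[] args) {
        for (LinkSketch link : values()) {
            System.out.println(link + "\t" + link.description() + "\tinverse=" + link.inverse());
        }
    }
}
```

Using the four-letter codes as constant names keeps the enum aligned with the values stored in the DB, at the cost of breaking the usual upper-case naming convention for enum constants.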

Total numbers of concepts and words:

49,190 concepts (called synsets in WordNet)
85,966 words
156,684 word definitions (pairs of word and synset)

Sample code:

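import java.util.List;
// The data classes, DAOs and enums come from the JAWJAW library; see the Javadoc for their packages.

public class AdvancedDemo {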
	private static void run( String word, POS pos ) {
		// Access the Japanese WordNet DB and process the raw data
		List<Word> words = WordDAO.findWordsByLemmaAndPos(word, pos);
		List<Sense> senses = SenseDAO.findSensesByWordid( words.get(0).getWordid() );
		String synsetId = senses.get(0).getSynset();
		Synset synset = SynsetDAO.findSynsetBySynset( synsetId );
		SynsetDef synsetDef = SynsetDefDAO.findSynsetDefBySynsetAndLang(synsetId, Lang.eng);
		List<Synlink> synlinks = SynlinkDAO.findSynlinksBySynset( synsetId );
		// Show the first record retrieved at each step
		System.out.println( words.get(0) );
		System.out.println( senses.get(0) );
		System.out.println( synset );
		System.out.println( synsetDef );
		System.out.println( synlinks.get(0) );
	}
	public static void main(String[] args) {
		// Showing a demo for "自然言語処理"(noun) which means NLP
		AdvancedDemo.run( "自然言語処理", POS.n ); 
	}
}

Output:

Javadoc

Refer to this page.

Download

Download the latest version from here. (License: Apache License, Version 2.0)

How to use

Download the DB from the Japanese WordNet website and put it under the src/main/resources directory with the file name "wnjpn.db" (not "wnjpn-0.9.db"), i.e. "src/main/resources/wnjpn.db". JAWJAW works on JDK 5 or later. To compile the code and fetch the required libraries (i.e. sqlite-jdbc-3.7.2.jar, junit-4.7.jar), we recommend using Maven 2. With the provided pom.xml file, you can compile and resolve dependencies with "mvn compile" and sanity-check the code with "mvn test".
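
One way to confirm that the DB file is in place is to query it directly over JDBC, bypassing JAWJAW. The sketch below is not part of JAWJAW; the class and method names are hypothetical. It assumes Java 7 or later, the sqlite-jdbc driver on the classpath, and a synlink table with a column named "link" (the column name is an assumption). If the assumptions hold, the counts it prints should correspond to the link table above.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a direct JDBC check, independent of the JAWJAW API.
final class SynlinkCountSketch {

    // Builds a plain file-based sqlite-jdbc URL for the given path.
    static String jdbcUrl(String dbPath) {
        return "jdbc:sqlite:" + dbPath;
    }

    // Counts rows per link type; the table and column names are assumptions.
    static Map<String, Integer> countLinks(String url) throws SQLException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String sql = "SELECT link, COUNT(*) FROM synlink GROUP BY link ORDER BY link";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            while (rs.next()) {
                counts.put(rs.getString(1), rs.getInt(2));
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String url = jdbcUrl("src/main/resources/wnjpn.db");
        try {
            System.out.println(countLinks(url));
        } catch (SQLException e) {
            // Reached when the driver is missing, the file is absent or the schema differs.
            System.err.println("Could not query " + url + ": " + e.getMessage());
        }
    }
}
```

Older driver versions may need an explicit Class.forName("org.sqlite.JDBC") before the connection is opened.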

Version history

1.0.2 (2013-03-19) - Very fast initialization even when WordNet DB is in jar (0-1 sec), by using "jdbc:sqlite::resource". Compatible with m2e (m2eclipse deprecated).
1.0.0 (2011-10-16) - Released at Project Hosting on Google Code
2009-03-23 - initial release

Future work

~~Metrics for semantic similarity/distance between two synsets~~ Released WS4J (WordNet Similarity for Java)
Command line interface
Web interface

Contact

Hideki Shima at Carnegie Mellon University
Email: hideki at cs.cmu.edu