GiNZA NLP Library

An Open Source Japanese NLP Library based on Universal Dependencies

License

GiNZA NLP Library and GiNZA Japanese Universal Dependencies Models are distributed under the MIT License. You must agree to and comply with the MIT License to use them.

spaCy

spaCy is the core framework underlying GiNZA. See the spaCy LICENSE PAGE.

Sudachi and SudachiPy

SudachiPy provides highly accurate tokenization and POS tagging. See the Sudachi LICENSE PAGE and the SudachiPy LICENSE PAGE.

Runtime Environment

This project is developed with Python 3.7 and the pip that ships with it.

The footprint of this project is about 250MB: the Sudachi dictionary is 200MB, and the word embeddings derived from the entire Japanese Wikipedia are 50MB.

(See also the Development Environment section at the bottom.)

Runtime set up

1. Install GiNZA NLP Library with Japanese Universal Dependencies Model

Run the following line:

pip install "https://github.com/megagonlabs/ginza/releases/download/v1.0.1/ja_ginza_nopn-1.0.1.tgz"

or download the pip install archive from the release page and specify it as below:

pip install ja_ginza_nopn-1.0.1.tgz
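Before or after installing, it can help to confirm that the interpreter and spaCy are visible from the Python you intend to use. The following is a minimal sketch, not part of GiNZA: the helper names `python_is_recent_enough` and `module_is_available` are hypothetical, and treating Python 3.7 as a minimum is an assumption (this README only says the project is developed with 3.7).

```python
import importlib.util
import sys


def python_is_recent_enough(minimum=(3, 7)):
    """Return True when the running interpreter is at least `minimum`.

    Assumption: 3.7 is treated as a lower bound; the README only states
    that development uses Python 3.7.
    """
    return sys.version_info[:2] >= tuple(minimum)


def module_is_available(name):
    """Return True when `name` can be located on the import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A dotted name raises ModuleNotFoundError when its parent package is missing.
        return False


if __name__ == "__main__":
    print("Python >= 3.7:", python_is_recent_enough())
    print("spaCy available:", module_is_available("spacy"))
```

If either check prints `False`, the `pip install` command was probably run with a different interpreter than the one executing this script.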

2. Test

Run the following line, type some Japanese text, and press Enter. The parsed results are printed in CoNLL format.

python -m spacy.lang.ja_ginza.cli

Coding example

The following code prints the dependency parsing results, with 'EOS' marking each sentence boundary.

import spacy
nlp = spacy.load('ja_ginza')
doc = nlp('依存構造解析の実験を行っています。')
for sent in doc.sents:
    for token in sent:
        print(token.i, token.orth_, token.lemma_, token.pos_, token.dep_, token.head.i)
    print('EOS')
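The loop above prints document-level token indices. As a rough sketch of how the same attributes could be turned into CoNLL-U-style rows (sentence-local, 1-based IDs, with 0 as the head of the root), consider the helper below. `sentence_to_rows` and `make_stub_sentence` are hypothetical names, the rows cover only a subset of the CoNLL-U columns, and this is not claimed to match the output of GiNZA's own command line tool. The stub tokens are hand-written stand-ins so the helper can be tried without the model installed.

```python
from types import SimpleNamespace


def sentence_to_rows(sent):
    """Build tab-separated rows (ID, FORM, LEMMA, UPOS, HEAD, DEPREL) for one sentence."""
    tokens = list(sent)
    if not tokens:
        return []
    offset = tokens[0].i  # document-level index of the first token in the sentence
    rows = []
    for token in tokens:
        # spaCy marks the root by making the token its own head; CoNLL-U writes that as 0.
        head = 0 if token.head.i == token.i else token.head.i - offset + 1
        rows.append("\t".join([
            str(token.i - offset + 1), token.orth_, token.lemma_,
            token.pos_, str(head), token.dep_,
        ]))
    return rows


def make_stub_sentence():
    """Hand-written stand-in tokens (not real parser output) for trying the helper."""
    obj = SimpleNamespace(i=3, orth_="実験", lemma_="実験", pos_="NOUN", dep_="obj")
    root = SimpleNamespace(i=4, orth_="行う", lemma_="行う", pos_="VERB", dep_="ROOT")
    obj.head = root
    root.head = root  # the root token is its own head
    return [obj, root]


if __name__ == "__main__":
    print("\n".join(sentence_to_rows(make_stub_sentence())))
```

With the model installed, the same helper should accept each `sent` from `doc.sents`, since it only reads `i`, `orth_`, `lemma_`, `pos_`, `dep_`, and `head`.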

APIs

Please see the spaCy API documents.

Releases

version 1.0

ja_ginza_nopn-1.0.1 (2019-04-02)

Added the new Japanese era 'reiwa' to system_core.dic.

ja_ginza_nopn-1.0.0 (2019-04-01)

First release version

Development Environment

Development set up

1. Clone from GitHub

git clone --recursive 'https://github.com/megagonlabs/ginza.git'

2. Run ./setup.sh

For a normal environment:

./setup.sh

For a GPU environment (cuda92):

./setup_cuda92.sh

Training

Prepare nopn_embedding/, nopn/, and kwdlc/ in your project directory, then run the command below. (A description of the training environment is in preparation and coming soon.)

shell/build.sh nopn 1.0.1

You can speed up the training and analysis processes by adding the -g option if GPUs are available.

After a while, you will find the pip-installable archive at:

target/ja_ginza_nopn-1.0.1.tgz
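The archive name appears to follow from the two arguments passed to shell/build.sh. The sketch below composes the expected path and the matching install command. It assumes the pattern `ja_ginza_<model>-<version>.tgz`, which is inferred from the single example in this README and not documented, and the helper names `expected_archive` and `pip_install_command` are hypothetical.

```python
from pathlib import PurePosixPath


def expected_archive(model_name, version, target_dir="target"):
    """Compose the archive path, assuming the ja_ginza_<model>-<version>.tgz pattern."""
    return PurePosixPath(target_dir) / f"ja_ginza_{model_name}-{version}.tgz"


def pip_install_command(archive):
    """Return the pip command line that installs the given archive."""
    return f"pip install {archive}"


if __name__ == "__main__":
    archive = expected_archive("nopn", "1.0.1")
    print(archive)
    print(pip_install_command(archive))
```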
