
Bruggen Blog: Graphs are everywhere - also in Religious Texts - part 2 - import...

source link: https://blog.bruggen.com/2022/07/graphs-are-everywhere-also-in-religious.html

Graphs are everywhere - also in Religious Texts - part 2 - import the Hadith narrators into Neo4j

The source data that we found in part 1 is in .csv format, which means that it is basically tabular.

Luckily, we nowadays have some fantastic tools to import these files without writing any code at all, using the all-new Neo4j Data Importer. After drawing a few nodes and relationships, I was able to do the basic import.

The import was super quick: it returned after a few seconds.

I am of course sharing the Data Importer config (model and data) as a zip file as well.

As usual, there is still a bit of messiness in the data, so I had to do some wrangling to get a better/richer model.

First, we would want to split the two parents of a Scholar into different fields:

:auto MATCH (s:Scholar) 
CALL {
    WITH s
    SET s.parent1 = trim(split(s.parents,"/")[0]) 
    SET s.parent2 =  trim(split(s.parents,"/")[1])
} IN TRANSACTIONS of 1000 ROWS;

Then we remove the brackets and introduce a comma as the separator between name and ID:

:auto MATCH (s:Scholar) 
CALL {
    WITH s
    SET s.parent1 = replace(s.parent1," [",",")
    SET s.parent1 = replace(s.parent1,"]","")
    SET s.parent2 = replace(s.parent2," [",",")
    SET s.parent2 = replace(s.parent2,"]","")
} IN TRANSACTIONS of 1000 ROWS;

And then we extract the IDs:

:auto MATCH (s:Scholar) 
CALL {
    WITH s
    SET s.parent1_id = trim(split(s.parent1,",")[1])
    SET s.parent1 = trim(split(s.parent1,",")[0])
    SET s.parent2_id = trim(split(s.parent2,",")[1])    
    SET s.parent2 = trim(split(s.parent2,",")[0])
} IN TRANSACTIONS of 1000 ROWS;
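
To make the three steps above more tangible, here is a minimal sketch that runs the same split/replace/trim chain on a literal string instead of on the graph. The input value is made up for illustration: I am assuming that the s.parents strings follow a "Name [id] / Name [id]" pattern, which is what the queries above imply.

```cypher
// Hypothetical input value; the real s.parents strings may differ in detail.
WITH "Father Name [11] / Mother Name [22]" AS parents
// Step 1: split the two parents on "/" and trim the whitespace
WITH trim(split(parents, "/")[0]) AS p1,
     trim(split(parents, "/")[1]) AS p2
// Step 2: turn "Name [id]" into "Name,id"
WITH replace(replace(p1, " [", ","), "]", "") AS p1,
     replace(replace(p2, " [", ","), "]", "") AS p2
// Step 3: split name and id on the comma
RETURN trim(split(p1, ",")[0]) AS parent1,
       trim(split(p1, ",")[1]) AS parent1_id,
       trim(split(p2, ",")[0]) AS parent2,
       trim(split(p2, ",")[1]) AS parent2_id;
```

For this sample value the query should return "Father Name", "11", "Mother Name" and "22". Two things to keep in mind: the extracted IDs are strings, so a later lookup will only find a parent if scholar_indx is stored as a string as well, and a name that itself contains a comma would be cut off in step 3.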

This then allows us to create relationships between Scholars that have other Scholars as parents:

MATCH (s:Scholar)
WHERE s.parent1_id IS NOT NULL
WITH s
MATCH (parent:Scholar)
WHERE parent.scholar_indx = s.parent1_id
MERGE (s)-[:CHILD_OF]->(parent);

MATCH (s:Scholar)
WHERE s.parent2_id IS NOT NULL
WITH s
MATCH (parent:Scholar)
WHERE parent.scholar_indx = s.parent2_id
MERGE (s)-[:CHILD_OF]->(parent);
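
As a variation, the two statements above could probably be folded into a single pass by unwinding both parent IDs. This is only a sketch and not what was run for this post. It assumes the parent1_id and parent2_id properties created earlier.

```cypher
// Handle both parent IDs in one statement
MATCH (s:Scholar)
UNWIND [s.parent1_id, s.parent2_id] AS parent_id
// Skip scholars (or slots) without a parent ID
WITH s, parent_id
WHERE parent_id IS NOT NULL
MATCH (parent:Scholar {scholar_indx: parent_id})
MERGE (s)-[:CHILD_OF]->(parent);
```

Because of the MERGE, running this after the two original statements should not create duplicate CHILD_OF relationships.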

The next step is to create the marriage relationships between Scholars. To do that, we first have to split the s.spouse property into a list and store it as s.listofspouses:

:auto MATCH (s:Scholar)
CALL {
    WITH s
    SET s.listofspouses = split(replace(s.spouse," ",""),",")
} IN TRANSACTIONS OF 1000 ROWS;

Next, we UNWIND s.listofspouses, extract the scholar_indx value between the square brackets of each entry, and use it to match the spouse node and create the [:MARRIED_TO] relationships:

MATCH (s:Scholar)
UNWIND s.listofspouses as scholarspouse
WITH s, replace(split(scholarspouse,"[")[1],"]","") as scholarspouse_id
WHERE scholarspouse_id IS NOT NULL
MATCH (scholarspousenode:Scholar {scholar_indx: scholarspouse_id})
MERGE (s)-[:MARRIED_TO]->(scholarspousenode);
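
Here is a small sketch of what the spouse parsing does, again on a made-up literal rather than on real data. I am assuming that s.spouse holds a comma-separated list of "Name [id]" entries, as the two queries above suggest.

```cypher
// Hypothetical input value for illustration only
WITH "Spouse One [101], Spouse Two [102]" AS spouse
// Same expression as used for s.listofspouses: strip spaces, split on ","
WITH split(replace(spouse, " ", ""), ",") AS listofspouses
UNWIND listofspouses AS scholarspouse
// Keep what sits between "[" and "]"
RETURN scholarspouse,
       replace(split(scholarspouse, "[")[1], "]", "") AS scholarspouse_id;
```

For this sample the query should return two rows, with the IDs "101" and "102". Note that the MERGE above is directed, so if both spouses list each other, the graph ends up with one MARRIED_TO relationship in each direction. Querying with an undirected pattern such as (a)-[:MARRIED_TO]-(b) sidesteps that.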

And then finally, we can create the teacher/student relationships between Scholars:

// link each scholar to its students
MATCH (s:Scholar)
UNWIND s.students_inds as student
MATCH (st:Scholar {scholar_indx: student})
MERGE (st)-[:STUDENT_OF]->(s);

// link each scholar to its teachers; this runs as a separate statement
// so that scholars without any matching student are not dropped
MATCH (s:Scholar)
UNWIND s.teachers_inds as teacher
MATCH (tea:Scholar {scholar_indx: teacher})
MERGE (tea)-[:TEACHER_OF]->(s);
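
After creating all of these relationships, a quick sanity check can be useful. The following is a sketch that simply counts the relationships between Scholar nodes per type. It makes no assumptions beyond the label and relationship types used above.

```cypher
// Count relationships between scholars, grouped by relationship type
MATCH (:Scholar)-[r]->(:Scholar)
RETURN type(r) AS relationship, count(r) AS total
ORDER BY total DESC;
```

If one of the types is missing from the result or has a suspiciously low count, the most likely culprit is a mismatch between the extracted ID values and the type or format of scholar_indx.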

After all of these manipulations, we can look at some really interesting subgraphs.
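
A query along the following lines is one way to pull up such a subgraph in the Neo4j Browser. It is a sketch, and the hop count and LIMIT are arbitrary choices on my side that you will want to tune.

```cypher
// Family and teaching connections up to two hops away from any scholar
MATCH path = (s:Scholar)-[:CHILD_OF|MARRIED_TO|STUDENT_OF*1..2]-(other:Scholar)
RETURN path
LIMIT 50;
```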

Note: there is some additional data in the dataset (and included in the (:Scholar) nodes), like areas of interest and tags. For the purpose of this exercise, which focuses on the Narrator networks and the chains of narration for each Hadith, this data is not as interesting, and therefore we are not splitting it off into separate nodes and relationships. It would be trivial to do so, but it is unnecessary at this point.

In the next blogpost, we will go and import the actual Hadiths that are being narrated into our graph.

Looking forward already!

PS: as always, all the code/queries are available on GitHub!

PPS: you can find all the parts of this blog series via the source link above.

