Bruggen Blog: Graphs are everywhere - also in Religious Texts - part 2 - import...
source link: https://blog.bruggen.com/2022/07/graphs-are-everywhere-also-in-religious.html
Graphs are everywhere - also in Religious Texts - part 2 - import the Hadith narrators into Neo4j
The source data that we found in part 1 is in .csv format, which means that it basically looks tabular:
Luckily, we nowadays have some fantastic tools to import these files without writing any code at all, such as the all-new Neo4j Data Importer. After drawing a few nodes and relationships, I was able to do the basic import:
It was super quick, and returned after a few seconds:
I am of course sharing the Data Importer config (model and data) as a zip file as well.
As usual, there is still a bit of messiness in the data, so I had to do some wrangling to get a better/richer model.
First, we would want to split the two parents of a Scholar into different fields:
:auto MATCH (s:Scholar)
CALL {
WITH s
SET s.parent1 = trim(split(s.parents,"/")[0])
SET s.parent2 = trim(split(s.parents,"/")[1])
} IN TRANSACTIONS of 1000 ROWS;
// remove the brackets, introduce a comma between name and id
:auto MATCH (s:Scholar)
CALL {
WITH s
SET s.parent1 = replace(s.parent1," [",",")
SET s.parent1 = replace(s.parent1,"]","")
SET s.parent2 = replace(s.parent2," [",",")
SET s.parent2 = replace(s.parent2,"]","")
} IN TRANSACTIONS of 1000 ROWS;
// extract the IDs (each id is set before its name, because the name
// assignment overwrites the combined "name,id" value)
:auto MATCH (s:Scholar)
CALL {
WITH s
SET s.parent1_id = trim(split(s.parent1,",")[1])
SET s.parent1 = trim(split(s.parent1,",")[0])
SET s.parent2_id = trim(split(s.parent2,",")[1])
SET s.parent2 = trim(split(s.parent2,",")[0])
} IN TRANSACTIONS of 1000 ROWS;
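To make the three clean-up steps above easier to follow, here is a minimal Python sketch that applies the same string operations to a single value. The sample value and its `Name [id] / Name [id]` layout are assumptions inferred from the Cypher, not taken from the dataset, and `split_parents` is a hypothetical helper name.

```python
def split_parents(parents):
    """Mirror the three Cypher steps on one `parents` string.

    Assumes a layout like "Name [id] / Name [id]"; a missing parent or id
    becomes None, much like an out-of-range list index in Cypher.
    """
    result = {}
    parts = parents.split("/") if parents is not None else []
    for n in (1, 2):
        raw = parts[n - 1].strip() if len(parts) >= n else None
        name, parent_id = None, None
        if raw is not None:
            # " [" becomes a comma and "]" is dropped: "Name [12]" -> "Name,12"
            cleaned = raw.replace(" [", ",").replace("]", "")
            fields = cleaned.split(",")
            name = fields[0].strip()
            parent_id = fields[1].strip() if len(fields) > 1 else None
        result[f"parent{n}"] = name
        result[f"parent{n}_id"] = parent_id
    return result


# Hypothetical sample value, for illustration only
example = split_parents("Father Name [12] / Mother Name [34]")
```

Note that the extracted ids are strings, which matters for the matching step that follows.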
This then allows us to create relationships between Scholars that have other Scholars as parents:
MATCH (s:Scholar)
WHERE s.parent1_id IS NOT NULL
WITH s
MATCH (parent:Scholar)
WHERE parent.scholar_indx = s.parent1_id
MERGE (s)-[:CHILD_OF]->(parent);
MATCH (s:Scholar)
WHERE s.parent2_id IS NOT NULL
WITH s
MATCH (parent:Scholar)
WHERE parent.scholar_indx = s.parent2_id
MERGE (s)-[:CHILD_OF]->(parent);
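One thing worth checking at this point: the parent ids come out of string splitting, so they are strings, and the equality test only finds a parent if `scholar_indx` is stored as a string too. The Python sketch below illustrates the lookup with hypothetical records. It is not the import code itself, and the assumption that `scholar_indx` was imported as a string is mine.

```python
def child_of_edges(scholars):
    """Return (child, parent) id pairs, mimicking the two CHILD_OF queries.

    `scholars` is a list of dicts; every id is kept as a string, since ids
    produced by string splitting only equal keys of the same type.
    """
    by_id = {s["scholar_indx"]: s for s in scholars}
    edges = set()  # a set, so repeated pairs collapse like MERGE does
    for s in scholars:
        for key in ("parent1_id", "parent2_id"):
            parent_id = s.get(key)
            # Skip missing ids and ids that point at no known scholar
            if parent_id is not None and parent_id in by_id:
                edges.add((s["scholar_indx"], parent_id))
    return edges


# Hypothetical records, for illustration only
sample = [
    {"scholar_indx": "1", "parent1_id": "2", "parent2_id": "3"},
    {"scholar_indx": "2", "parent1_id": None, "parent2_id": None},
    {"scholar_indx": "3", "parent1_id": "99", "parent2_id": None},
]
edges = child_of_edges(sample)
```

Parents that are not themselves scholars in the dataset (id "99" in the sample) simply produce no relationship, which is also what the `MATCH` does.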
The next step is to create the marriage relationships between Scholars. To do that, we first have to split the s.spouse property and store the result as s.listofspouses:
:auto MATCH (s:Scholar)
CALL {
WITH s
SET s.listofspouses = split(replace(s.spouse," ",""),",")
} IN TRANSACTIONS OF 1000 ROWS;
Next, we UNWIND the s.listofspouses and get a list of scholar_indx properties that we can match and use to create the [:MARRIED_TO] relationships.
MATCH (s:Scholar)
UNWIND s.listofspouses as scholarspouse
WITH s, replace(split(scholarspouse,"[")[1],"]","") as scholarspouse_id
WHERE scholarspouse_id IS NOT NULL
MATCH (scholarspousenode:Scholar {scholar_indx: scholarspouse_id})
MERGE (s)-[:MARRIED_TO]->(scholarspousenode);
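The spouse parsing can be sketched in Python in the same way. This assumes the `spouse` property holds comma-separated entries of the form `Name [id]`, which is what the Cypher above implies; the sample value and the `spouse_ids` helper are hypothetical.

```python
def spouse_ids(spouse):
    """Extract the bracketed ids from a comma-separated `spouse` string."""
    if spouse is None:
        return []
    ids = []
    # Same order as the Cypher: drop spaces, split on commas, then read
    # whatever sits between "[" and "]" in each entry.
    for entry in spouse.replace(" ", "").split(","):
        pieces = entry.split("[")
        if len(pieces) > 1:  # entries without "[" carry no id and are skipped
            ids.append(pieces[1].replace("]", ""))
    return ids


# Hypothetical sample value, for illustration only
example_ids = spouse_ids("First Spouse [101], Second Spouse [102], No Id Here")
```

Entries without a bracketed id are skipped here, just as the `WHERE scholarspouse_id IS NOT NULL` filter drops them in the query.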
And then finally, we can create the teacher/student relationships between Scholars:
// Link students to their teachers
MATCH (s:Scholar)
UNWIND s.students_inds as student
MATCH (st:Scholar {scholar_indx: student})
MERGE (st)-[:STUDENT_OF]->(s);
// Link teachers in a separate statement: in a single chained query, scholars
// without any matched students would be dropped before their teachers are processed
MATCH (s:Scholar)
UNWIND s.teachers_inds as teacher
MATCH (tea:Scholar {scholar_indx: teacher})
MERGE (tea)-[:TEACHER_OF]->(s);
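As a rough illustration of the teacher/student linking, here is a Python sketch that builds both sets of pairs from hypothetical records. It assumes `students_inds` and `teachers_inds` are lists of ids (or missing), and it handles each list on its own, so a scholar without students still gets teacher links.

```python
def teaching_edges(scholars):
    """Build STUDENT_OF and TEACHER_OF pairs from per-scholar id lists.

    Assumes `students_inds` and `teachers_inds` are lists of ids (or None).
    Pairs are (start, end) in the direction of the relationship.
    """
    known = {s["scholar_indx"] for s in scholars}
    student_of, teacher_of = set(), set()
    for s in scholars:
        for student in s.get("students_inds") or []:
            if student in known:  # ignore ids with no matching scholar
                student_of.add((student, s["scholar_indx"]))
        for teacher in s.get("teachers_inds") or []:
            if teacher in known:
                teacher_of.add((teacher, s["scholar_indx"]))
    return student_of, teacher_of


# Hypothetical records, for illustration only
sample = [
    {"scholar_indx": 1, "students_inds": [2], "teachers_inds": None},
    {"scholar_indx": 2, "students_inds": None, "teachers_inds": [1]},
]
student_of, teacher_of = teaching_edges(sample)
```

In the sample, scholar 2 has no students of its own but still ends up with a teacher link from scholar 1.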
After having done all of these manipulations, we can actually look at some really interesting subgraphs:
Note: the dataset contains some additional data (also included in the (:Scholar) nodes), such as areas of interest and tags. For the purpose of this exercise, which focuses on the Narrator networks and the chains of narration for each Hadith, this is less interesting, so we are not splitting that information off into separate nodes and relationships. It would be trivial to do so, but it is unnecessary at this point.
In the next blogpost, we will go and import the actual Hadiths that are being narrated into our graph.
Looking forward already!
PS: as always, all the code/queries are available on GitHub!
PPS: you can find all the parts of this blog post series at the following links