Graph Convolutional Network Model with a Strongly-typed Functional Language

Published on May 17, 2021

Faisal Waris, PhD

Data Scientist on contract in telecom

My present job requires me to work with network, or graph-structured, data formats. Graphical data are not readily amenable to 'shallow' machine learning methods, which are generally optimized for tabular data.

Fortunately, recent developments in deep learning have opened the door to effective graph data processing. A seminal work is the Graph Convolutional Network (GCN) model by Thomas Kipf - Semi-Supervised Classification with Graph Convolutional Networks.
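
At its core, GCN is built around a simple layer-wise propagation rule: each layer mixes a node's features with those of its neighbours through a normalized adjacency matrix and then applies a learned linear map and a nonlinearity. With adjacency matrix $A$, self-loops added via $\hat{A} = A + I$, and $\hat{D}$ the diagonal degree matrix of $\hat{A}$, layer $l$ computes

\[
H^{(l+1)} = \sigma\left(\hat{D}^{-1/2}\,\hat{A}\,\hat{D}^{-1/2}\,H^{(l)}\,W^{(l)}\right)
\]

where $H^{(l)}$ holds the node features at layer $l$ (with $H^{(0)}$ the input features), $W^{(l)}$ is the layer's weight matrix, and $\sigma$ is a nonlinearity such as ReLU.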

Since then, there has been a veritable Cambrian explosion of graph processing approaches. For a broader understanding, the reader is referred to the survey paper by Wu et al. - A Comprehensive Survey of Graph Neural Networks. An accessible video introduction to the topic by Michael Bronstein is here. A preprint of the book "Deep Learning on Graphs" (Ma & Tang) is available on the web.

Many domains have inherent graphical structures: e.g., social networks, molecular networks, telecommunication networks, map data, and web data. Additionally, many seemingly non-graphical problems can be posed as graphical problems. (The well-known Ant Colony Optimization algorithm exploits such a correspondence to solve stochastic optimization problems.) Consequently, graph-based models are yielding state-of-the-art results for many graph-structured problems, e.g. Pinterest's GCN-based recommender system.

Starting with GCN

While it may be possible to apply an off-the-shelf deep graphical model to a problem at hand, more often than not no such model will be a perfect fit. Practitioners should understand the nuts and bolts of deep graphical models to be able to customize them for their own needs.

With this in mind I took apart the PyTorch version of GCN, which is of course written in Python, and re-built it from the ground up in a different language - F#. This was quite an adventure, but the result is that I now have a much deeper understanding of GCN and should be able to adapt it to my own needs more easily. The code is available here (https://github.com/fwaris/gcn).
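
To give a flavour of the port, below is a minimal sketch of a single GCN layer in F# on top of TorchSharp. It is illustrative only and not the repository code: the real project wraps its modules with TorchSharp.Fun combinators, and the names used here (gcnLayer, adjNorm) as well as the exact TorchSharp calls are my assumptions for this sketch.

```fsharp
// Minimal, illustrative GCN layer - not the code from the linked repository.
// Assumes TorchSharp exposes nn.Linear, Tensor.matmul and nn.functional.relu;
// gcnLayer and adjNorm are hypothetical names used only for this sketch.
open TorchSharp
open type TorchSharp.torch

// One GCN layer: H' = relu( Anorm * (H * W) )
let gcnLayer (inFeatures: int64) (outFeatures: int64) =
    let linear = nn.Linear(inFeatures, outFeatures)   // learnable weights W (plus bias)
    fun (adjNorm: Tensor) (h: Tensor) ->
        let support = linear.forward(h)               // H * W
        let mixed   = adjNorm.matmul(support)         // neighbourhood aggregation: Anorm * (H * W)
        nn.functional.relu(mixed)
```

Stacking two such layers - the second without the ReLU and followed by a (log-)softmax over the class scores - essentially gives the two-layer architecture used for the 'cora' node-classification experiments.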

Surprisingly, the more difficult part was not the model itself but the manipulation of sparse matrices to prep the input data. Here are some observations from this exercise:

Code Size: Python=205 vs. F#=203 lines

  • By the lines-of-code (LoC) measure (a very rough one!), both are about the same
  • I did not target LoC as a metric when porting the code over from Python
  • Methodology: I used the LineCount add-in for VS Code to count the lines in ‘.py’ and ‘.fs’ files in both projects, respectively. From the F# project I excluded the lines for TorchSharp.Fun.fs file which is generic code and will eventually be its own package; it is not specific to GCN
  • Please use the tagged 1.0 version for comparison as the code is likely to change
  • This is quite the endorsement for F# because F# is a statically type-checked language whereas Python is a dynamically typed one. In general, statically typed languages are more verbose but safer, as most of the code is validated earlier - even before it runs. F# is expressive enough to keep the code size small while still providing static type checks.

Model Train Time (sec): Python=2.6 vs. F#=2.3 (average of 3 runs)

  • In terms of speed, both are about the same for model training (on the ‘cora’ dataset)
  • All tests conducted using Nvidia GTX 1080 GPU with CUDA 10.2

Data Load Time (sec): Python=15 vs. F#=2 (average of 3 runs)

  • Here there is a huge difference: F# is roughly an order of magnitude faster than Python.
  • In general, compiled languages are much faster than 'interpreted' languages such as Python in terms of execution speed. Python is especially slow.
  • The task here is to read the raw text data files and convert them into sparse matrices, which are then converted into the sparse and dense tensors of libtorch (the underlying C++ deep learning library); a sketch of this step follows below.
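
As an illustration of that prep step, here is a rough F# sketch that builds the symmetrically normalized adjacency matrix in COO form and packs it into a libtorch sparse tensor. It is not the repository code; the helper name normalizedAdjacency is hypothetical, and I am assuming TorchSharp exposes torch.tensor and torch.sparse_coo_tensor along the lines of their PyTorch counterparts.

```fsharp
// Illustrative data-prep sketch - not the repository code.
// Assumes edges are already parsed into (src, dst) index pairs and that
// TorchSharp provides torch.tensor and torch.sparse_coo_tensor as in PyTorch.
open TorchSharp
open type TorchSharp.torch

/// Build D^-1/2 (A + I) D^-1/2 as a sparse COO tensor.
let normalizedAdjacency (numNodes: int) (edges: (int * int)[]) =
    // make the graph undirected and add self-loops: Â = A + I
    let allEdges =
        [| yield! edges
           yield! edges |> Array.map (fun (i, j) -> j, i)
           yield! Array.init numNodes (fun i -> i, i) |]
        |> Array.distinct
    // node degrees of Â
    let degree = Array.zeroCreate<float32> numNodes
    allEdges |> Array.iter (fun (i, _) -> degree.[i] <- degree.[i] + 1.0f)
    // COO triplets of the normalized matrix: value(i,j) = 1 / sqrt(d_i * d_j)
    let rows   = allEdges |> Array.map (fun (i, _) -> int64 i)
    let cols   = allEdges |> Array.map (fun (_, j) -> int64 j)
    let values = allEdges |> Array.map (fun (i, j) -> 1.0f / sqrt (degree.[i] * degree.[j]))
    // pack indices (2 x nnz) and values into a libtorch sparse tensor
    let indices = torch.tensor(Array.append rows cols).reshape(2L, int64 allEdges.Length)
    torch.sparse_coo_tensor(indices, torch.tensor(values), [| int64 numNodes; int64 numNodes |])
```

In the Python version much of this bookkeeping is delegated to scipy.sparse; here it is explicit, but as the load-time numbers above show, the compiled code pays no penalty for it.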

Summary

Many problems have native graphical representations. Many others can be posed as graphical problems. Graphical data are not easily handled by traditional 'shallow' methods. Thanks to deep graphical models, graph-based problems can now be tackled effectively. This area is likely to see continued strong growth in the future.

While Python is the dominant language for deep learning today, there are better alternatives that may gain wider acceptance (especially as pipeline code sizes start to become large). One such alternative is F#, which achieved the same result as Python but with type safety and faster execution, in about the same amount of code.

