
reviews.llvm.org became a read-only archive

 8 months ago
source link: https://maskray.me/blog/2023-12-30-reviews.llvm.org-became-read-only-archive

For approximately 10 years, reviews.llvm.org served as the code review site for the LLVM project, running on a Phabricator instance. The site hosts many invaluable code review discussions. Now that LLVM has moved to GitHub pull requests, the existing Phabricator instance needs a read-only archive.

The goal is to eliminate the SQL engine. Phabricator operates on a complex database schema. To minimize time investment, the most feasible approach appears to be downloading the static HTML pages and applying a lightweight scraping process.

Raphaël Gomès developed phab-archive to serve a read-only archive for Mercurial's Phabricator instance. I have modified the code to suit reviews.llvm.org.

At this point, the only remaining requirement is for someone with domain access to redirect reviews.llvm.org to the archive website. Then we can obtain an HTTPS certificate.

The file hierarchy is quite straightforward. archive/unprocessed/diffs contains raw HTML pages while templates/diffs contains scraped HTML pages alongside patch files.

% tree archive/unprocessed/diffs | head -n 12
archive/unprocessed/diffs
├── 1
│   ├── D1-4.html
│   ├── D1-5.html
│   └── D1.html
├── 10
│   ├── D10-33.html
│   └── D10.html
├── 100
│   ├── D100000-335683.html
│   ├── D100000-335688.html
│   ├── D100000-335689.html
% tree templates/diffs/ | head -n 20
templates/diffs/
├── 1
│   ├── D1-4.diff
│   ├── D1-4.html
│   ├── D1-5.diff
│   ├── D1-5.html
│   ├── D1.diff
│   └── D1.html
├── 10
│   ├── D10-33.diff
│   ├── D10-33.html
│   ├── D10.diff
│   └── D10.html
├── 100
│   ├── D100000-335683.diff
│   ├── D100000-335683.html
│   ├── D100000-335688.diff
│   ├── D100000-335688.html
│   ├── D100000-335689.diff
│   ├── D100000-335689.html
% cat templates/diffs/1/D1-4.diff
Index: include/llvm/ADT/StringMap.h
===================================================================
--- include/llvm/ADT/StringMap.h
+++ include/llvm/ADT/StringMap.h
@@ -34,7 +34,7 @@
 public:
   template <typename InitTy>
   static void Initialize(StringMapEntry<ValueTy> &T, InitTy InitVal) {
-    T.second = InitVal;
+    T.test= InitVal;
   }
 };
% du -sh archive/unprocessed/
270G archive/unprocessed/
% du -sh templates/diffs
282G templates/diffs
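
The layout shown in the listings above can be summarized in a short Python sketch. It is an illustration only, not the phab-archive code: the helper names `bucket` and `processed_paths` are hypothetical, and the "first three digits" bucketing rule is inferred from the directory listings (D1 in `1`, D10 in `10`, D100000 in `100`).

```python
from pathlib import PurePosixPath

# Directory names taken from the listings above.
RAW_ROOT = PurePosixPath("archive/unprocessed/diffs")
OUT_ROOT = PurePosixPath("templates/diffs")


def bucket(revision: int) -> str:
    """Directory for a revision: the first (up to) three digits of its number."""
    return str(revision)[:3]


def processed_paths(raw_page: str) -> tuple[str, str]:
    """Map a raw HTML page to the scraped HTML page and patch file beside it."""
    rel = PurePosixPath(raw_page).relative_to(RAW_ROOT)
    html = OUT_ROOT / rel
    return str(html), str(html.with_suffix(".diff"))
```

For example, `processed_paths("archive/unprocessed/diffs/1/D1-4.html")` yields the `templates/diffs/1/D1-4.html` and `templates/diffs/1/D1-4.diff` pair seen in the second listing.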

Nginx

I aim to use only Nginx to serve URIs such as the following:

/D2 => /diffs/2/D2.html
/D2?id=&download=true => /diffs/2/D2.diff
/D2?id=10 => /diffs/2/D2-10.html
/D2?id=10&download=true => /diffs/2/D2-10.diff

/D123?id=5 => /diffs/123/D123-5.html
/D1234?id=5 => /diffs/123/D1234-5.html

/rL$svn_rev => https://github.com/llvm/llvm-project/commit/$git_commit
/rG$git_commit => https://github.com/llvm/llvm-project/commit/$git_commit
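
The `/D…` mappings above can be modeled as a small Python function, which is handy for checking the rules before writing them as server configuration. This is a sketch under assumptions: `resolve` is a hypothetical name, it treats an empty or `0` value of `download` as "not set", and it does not check whether the target file exists.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlsplit

_REVISION = re.compile(r"^/D(.+)$")


def resolve(url: str) -> Optional[str]:
    """Return the static file path for a /D<revision> URL, or None otherwise."""
    parts = urlsplit(url)
    match = _REVISION.match(parts.path)
    if match is None:
        return None
    rev = match.group(1)
    query = parse_qs(parts.query, keep_blank_values=True)
    download = query.get("download", [""])[0]
    diff_id = query.get("id", [""])[0]
    # A non-empty, non-"0" download parameter selects the patch file.
    ext = ".diff" if download not in ("", "0") else ".html"
    # Only an all-digit id selects a specific diff of the revision.
    suffix = f"-{diff_id}" if re.fullmatch(r"[0-9]+", diff_id) else ""
    # Files are bucketed by the first (up to) three characters of the revision.
    return f"/diffs/{rev[:3]}/D{rev}{suffix}{ext}"
```

Calling `resolve("/D1234?id=5")` returns `/diffs/123/D1234-5.html`, matching the table above.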

We just need URL mapping and some Nginx location directives.

map_hash_max_size 400000;
map_hash_bucket_size 128;
map $request_uri $svn_rev {
  ~^/rL([0-9]+) $1;
}
map $svn_rev $git_commit {
  include /var/www/phab-archive/svn_url_rewrite.conf;
}

server {
  listen 80 default_server;
  listen [::]:80 default_server;

  if ($git_commit) {
    return 301 https://github.com/llvm/llvm-project/commit/$git_commit;
  }

  root /var/www/phab-archive/www;
  server_name _;

  types {
    text/html html;
    text/plain diff;
  }

  location ~ "^/D(?<diff>.{1,3})$" {
    set $ext ".html";
    if ($arg_download) { set $ext ".diff"; }
    if ($arg_id ~ ^(\d+)$) { rewrite ^ /diffs/$diff/D$diff-$arg_id$ext? last; }
    try_files /diffs/$diff/D$diff$ext =404;
  }
  location ~ ^/D(?<dir>...)(?<tail>.+) {
    set $ext ".html";
    if ($arg_download) { set $ext ".diff"; }
    if ($arg_id ~ ^(\d+)$) { rewrite ^ /diffs/$dir/D$dir$tail-$arg_id$ext? last; }
    try_files /diffs/$dir/D$dir$tail$ext =404;
  }
}
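
The post does not say how `svn_url_rewrite.conf` is produced. One possible way to generate its `key value;` lines is sketched below in Python. Two things here are assumptions: that each migrated commit message carries an `llvm-svn: <revision>` trailer, and the helper name `map_lines`. How the (hash, message) pairs are read from the repository is left out.

```python
import re
from typing import Iterable, List, Tuple

# Assumed trailer format in migrated commit messages: "llvm-svn: 12345".
_SVN_TRAILER = re.compile(r"^llvm-svn: ([0-9]+)$", re.MULTILINE)


def map_lines(commits: Iterable[Tuple[str, str]]) -> List[str]:
    """Turn (git_hash, commit_message) pairs into Nginx map entries."""
    lines = []
    for commit_hash, message in commits:
        match = _SVN_TRAILER.search(message)
        if match:
            # Nginx map syntax: "<source value> <resulting value>;"
            lines.append(f"{match.group(1)} {commit_hash};")
    return lines
```

Commits without the trailer produce no entry, so `$git_commit` stays empty for unknown revisions and the `if ($git_commit)` redirect is skipped.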
