Automatically detecting broken links in a project with GitHub Actions

 2 years ago
source link: https://wiki.eryajf.net/pages/c78b38/#%E6%95%88%E6%9E%9C

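# Preface

I maintain the open-source project Thanks-Mirror, which collects good mirrors for package managers, system images, and commonly used software. As the project has matured, it has accumulated 1091 links to date. Over time, some mirrors in China may stop being maintained, so automatically detecting links that have gone dead became something worth addressing.

This article introduces a handy little Action. It scans the links in a repository, sends a request to each one, flags the links that fail according to custom rules, and then files those links in an issue.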
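To make the scan, request, and filter flow concrete, here is a minimal Python sketch of the idea. It is not how lychee is implemented (lychee is a separate tool with far more thorough parsing); the names `extract_links` and `find_broken_links` are hypothetical, and the HTTP request is passed in as a `fetch_status` callable so the logic can be tried without network access. The default accepted status codes mirror the `accept` list used in the configuration file later in this article.

```python
import re
from typing import Callable, Iterable

# Simplified URL matcher, an assumption for this sketch only.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]\"']+")


def extract_links(text: str) -> list[str]:
    """Return the unique http(s) URLs found in text, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in URL_PATTERN.finditer(text):
        # Drop punctuation that commonly trails a URL in prose.
        seen.setdefault(match.group(0).rstrip(".,;:"), None)
    return list(seen)


def find_broken_links(
    text: str,
    fetch_status: Callable[[str], int],
    accept: Iterable[int] = (200, 204, 301, 429, 403),
    exclude: Iterable[str] = (),
) -> list[tuple[str, int]]:
    """Return (url, status) for each non-excluded link whose status is not accepted."""
    accepted = set(accept)
    excluded = [re.compile(pattern) for pattern in exclude]
    broken = []
    for url in extract_links(text):
        if any(pattern.search(url) for pattern in excluded):
            continue  # excluded links are skipped without making a request
        status = fetch_status(url)
        if status not in accepted:
            broken.append((url, status))
    return broken
```

In real use, `fetch_status` would wrap an HTTP client of your choice; the list returned by `find_broken_links` corresponds to what ends up in the report that becomes the issue body.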

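# Configuration

Action used: lycheeverse/lychee-action

Getting started is simple: reading the official documentation is basically enough. However, the approach described there is not very flexible. The action relies on the open-source project lychee to do the checking, and this article builds on lychee's configuration file to get richer behavior.

First, add the Actions workflow file, e.g. .github/workflows/links-check.yml: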

name: 🔗 Check links
on:
  repository_dispatch:
  push:
    branches:
      - main
  workflow_dispatch:
  schedule:
    # GitHub Actions evaluates cron schedules in UTC
    - cron: "00 18 * * *"
jobs:
  linkChecker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Link Checker
        id: lychee
        # The version tag was garbled in the source; replace <version> with the release you use
        uses: lycheeverse/lychee-action@<version>
        env:
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}
        with:
          # Check README.md using the custom lychee configuration file
          args: --config ./.github/config/lychee.toml README.md
          # Write the report as Markdown
          format: markdown
          # Path of the report file
          output: ./lychee/out.md
      - name: Create Issue From File
        if: steps.lychee.outputs.exit_code != 0
        uses: peter-evans/create-issue-from-file@v3
        with:
          title: 🔗 Link check report
          content-filepath: ./lychee/out.md
          labels: report, automated issue

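A brief description of this workflow: it runs on every push to main and on a daily schedule at 18:00 (GitHub evaluates cron schedules in UTC), and it can also be triggered manually. It checks all links in README.md using the configuration file ./.github/config/lychee.toml and writes the result to ./lychee/out.md in Markdown format. If every check passes, nothing further happens; if any check fails, an issue is created automatically.

The workflow above references .github/config/lychee.toml. Here is the configuration file I use: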

#############################  Display  #############################

# Verbose program output
verbose = true

# Show progress
progress = true

# Path to summary output file.
# output = "report.md"

#############################  Cache  ###############################

# Enable link caching. This can be helpful to avoid checking the same links on
# multiple runs.
cache = true

#############################  Runtime  #############################

# Number of threads to utilize.
# Defaults to number of cores available to the system if omitted.
threads = 6

# Maximum number of allowed redirects [default: 10]
max_redirects = 10

# Maximum number of concurrent network requests [default: 128]
max_concurrency = 30

#############################  Requests  ############################

# User agent to send with each request
user_agent = "curl/7.83.1"

# Website timeout from connect to response finished
timeout = 10

# Minimum wait time in seconds between retries of failed requests.
retry_wait_time = 2

# Comma-separated list of accepted status codes for valid links.
# Omit to accept all response types.
#accept = "text/html"

# Proceed for server connections considered insecure (invalid TLS)
insecure = true

# Comma-separated list of accepted status codes for valid links.
# Don't work as of yet until https://github.com/lycheeverse/lychee/issues/644
# is resolved
accept = [200,204,301,429,403]

# Only test links with the given scheme (e.g. https)
# Omit to check links with any scheme
#scheme = "https"

# Request method
method = "get"

# Custom request headers
headers = []

#############################  Exclusions  ##########################

# Exclude URLs from checking (supports regex)

# balena base images account for ~1400 request to GitHub, they are
# omitted to avoid being rate limited.
# See https://docs.github.com/en/rest/overview/resources-in-the-rest-api#rate-limiting
# The openvpn link is omitted as trying to auto check it results in
# a 503, even when it is available.
# The meta-balena link is included in parameterized scripts and as
# a result will always produce a failing link.
# The myorg/myapp link is a dummy address used in an example contract so is omitted.
# The balena/resin API urls will not respond to unauthenticated requests
# The gstatic and googleapis links go 404 and are excluded ever since we started checking HTML
# balenaCLI linux binary URLs always error out since they are generated on run time only
# File URLs are excluded as they aren't checked properly and error out
exclude = [
    "developer.aliyun.com/*",
    "mirrors.ustc.edu.cn/*",
    "eryajf.net/*",
    "rsproxy.cn/*",
    "https://mirrors.cloud.tencent.com/go/",
    "http://maven.aliyun.com/nexus/content/groups/public/",
    "https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/brew.git",
    "https://mirrors.tuna.tsinghua.edu.cn/git/homebrew/homebrew-core.git",
]

# Exclude URLs contained in a file from checking
exclude_file = []

include = []

include_verbatim = true

# Exclude all private IPs from checking
# Equivalent to setting `exclude_private`, `exclude_link_local`, and `exclude_loopback` to true
exclude_all_private = true

# # Exclude private IP address ranges from checking
# exclude_private = false

# # Exclude link-local IP address range from checking
# exclude_link_local = false

# # Exclude loopback IP address range and localhost from checking
# exclude_loopback = false

# Exclude all mail addresses from checking
exclude_mail = true

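Most of this is generic. The two settings you will most likely need to adjust are accept and exclude. When I first ran the check, every developer.aliyun.com link failed with a network error from GitHub Actions. My guess is that Alibaba restricts access from outside, which is understandable, so I added the whole domain to the exclusion list.

In short, you need to filter and analyze the check results yourself, then adjust the settings according to what each configuration option means.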
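One way to approach that analysis is to group the failing URLs by hostname, so that a domain failing across the board (as developer.aliyun.com did here) stands out as a candidate for the exclude list. The following is a minimal sketch under stated assumptions: the function name `failures_by_host` is hypothetical, and it expects a plain list of failed URLs that you have already collected from the report, since no particular report format is assumed.

```python
from collections import Counter
from urllib.parse import urlsplit


def failures_by_host(failed_urls: list[str]) -> list[tuple[str, int]]:
    """Count failed URLs per hostname, most frequent first."""
    counts = Counter(urlsplit(url).hostname or "" for url in failed_urls)
    return counts.most_common()
```

A hostname near the top with many failures is worth checking by hand: if the links open fine in a browser, the site is probably blocking the runner, and excluding the domain is more practical than treating each link as broken.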

# Automatic PR checks

The workflow above does not check PRs. You can add another workflow dedicated to checking the links submitted in a PR:

$ cat link-check-pr.yml

name: Links (Fail Fast)
on:
  pull_request:
jobs:
  linkChecker:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Link Checker
        # The version tag was garbled in the source; replace <version> with the release you use
        uses: lycheeverse/lychee-action@<version>
        with:
          # Check README.md using the custom lychee configuration file
          args: --config ./.github/config/lychee.toml README.md
          # Write the report as Markdown
          format: markdown
          # Path of the report file
          output: ./lychee/out.md
          # Fail action on broken links
          fail: true
        env:
          GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}}

This way, if a PR contains a broken link, the check fails, which catches potentially bad links before they are merged into the project.

# Result

After the checks pass, the result looks like this:

6553b783d2d157ca.png
