2

Fake GitHub Star 的生意

 1 year ago
source link: https://blog.gslin.org/archives/2023/03/20/11106/fake-github-star-%e7%9a%84%e7%94%9f%e6%84%8f/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Fake GitHub Star 的生意

昨天在 Hacker News 首頁上看到「Tracking the Fake GitHub Star Black Market (dagster.io)」這篇,原文在「Tracking the Fake GitHub Star Black Market with Dagster, dbt and BigQuery」這邊。

作者群想要偵測 GitHub 上面 fake star 的行為,所以就跑去找黑市買,然後找到了兩家,Baddhi Shop (1000 個 $64) 與 GitHub24 (每個 €0.85,大約是 $0.91),價錢差異很大,「品質」差異也很大:貴的 star 在一個月後還是存在,而便宜的看起來有一些有被 GitHub 偵測到而清除掉:

A month later, all 100 GitHub24 stars still stood, but only three-quarters of the fake Baddhi Shop stars remained. We suspect the rest were purged by GitHub’s integrity teams.

接下來就是想要系統化分析,切入點是 GH Archive 這個服務,可以直接下載 GitHub 全站上的 public evnets 資料:

GH Archive is a project to record the public GitHub timeline, archive it, and make it easily accessible for further analysis.

想要偵測兩種不同的 fake account,第一種是 obvious fake account,定義成這樣:

  • Created in 2022 or later
  • Followers <=1
  • Following <= 1
  • Public gists == 0
  • Public repos <=4
  • Email, hireable, bio, blog, and twitter username are empty
  • Star date == account creation date == account updated date

從定義就可以看出來完全就是灌水帳號,開出來就沒在動的。從 screenshot 可以看出這種帳號長的都一樣:

5Stue3r.gif

另外一種則是透過演算法去分析,這邊拿 unsupervised clustering 類的演算法分析出來的結果,可以看到抓到很多:

LJEYNjS.png

最近 NN 類的 machine learning 演算法太多,看到這些傳統的 machine learning 演算法還是覺得頗新鮮的...

Related

用 GitHub + Netlify + Cloudflare 管理靜態網站...

最近 GitHub Pages 支援 HTTPS (透過 Let's Encrypt,參考「GitHub 透過 Let's Encrypt 提供自訂網域的 HTTPS 服務」這篇),但測了一下不是我想要的效果,就找了一下網路上的資源,結果有找到還算可以的方案... 先把網站放在 GitHub 上。(不需要設定 GitHub Pages) 然後用 Netlify 變成網站並且開啟 HTTPS。(可以選擇使用系統內提供的 Let's Encrypt,會透過 http-01 認證。如果因為 DNS 還沒生效的話也沒關係,可以之後再開。) 然後用 Cloudflare 管理 DNS 的部份 (主要是因為他的 root domain 可以設 CNAME,一般會提到的 ALIAS 就是指這個)。 這樣整個靜態服務都不用自己管理,而且有蠻多 header 可以設定,其中與 GitHub Pages 最主要的差異是 Netlify 支援 301/302…

May 7, 2018

In "CDN"

GitHub 的 API Token 換格式

GitHub 前幾天宣佈更換 API token 的格式:「Authentication token format updates are generally available」,在今年三月初的時候有先公告要換:「Authentication token format updates」。 另外昨天也解釋了換成這樣的優點:「Behind GitHub’s new authentication token formats」。 首先是 token 的字元集合變大了: The character set changed from [a-f0-9] to [A-Za-z0-9_] 另外是增加了 prefix 直接指出是什麼種類的 token: The format now includes a prefix for each token type: ghp_ for Personal Access Tokens…

April 6, 2021

In "API"

GitHub 上的軟體授權分佈

雖然 GitHub 有提供 license 相關的 API 可以查,但因為準確度不高 (只要稍微改到,GitHub 就無法偵測到正確的 license),所以有人決定用 machine learning 的方式另外分析:「Detecting licenses in code with Go and ML」。當然這邊是分析公開的部份: 最大包的是 MIT License,次之是 Apache-2.0 (問號那群先不管),再來是 GPL 家族的各版本。沒有太特別的意外發生...

May 5, 2018

In "Computer"

a611ee8db44c8d03a20edf0bf5a71d80?s=49&d=identicon&r=gAuthor Gea-Suan LinPosted on March 20, 2023Categories Computer, Murmuring, Network, Programming, Service, SpamTags ai, black, dark, detection, fake, github, learning, machine, market, spam, star

Leave a Reply

Your email address will not be published. Required fields are marked *

Comment *

Name *

Email *

Website

Notify me of follow-up comments by email.

Notify me of new posts by email.

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Learn More)

Post navigation


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK