0

提議 OpenJDK 的程式碼 UTF-8 化

 1 year ago
source link: https://blog.gslin.org/archives/2023/03/02/11083/%e6%8f%90%e8%ad%b0-openjdk-%e7%9a%84%e7%a8%8b%e5%bc%8f%e7%a2%bc-utf-8-%e5%8c%96/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

提議 OpenJDK 的程式碼 UTF-8 化

Hacker News 上看到提議把 OpenJDK 的程式碼 UTF-8 化:「Make JDK source code UTF-8 (openjdk.org)」,原始的 jira ticket 在「Make JDK source code UTF-8」這邊。

主要是現在的 source code 沒有統一標準:

Currently, the source code in the JDK is in an ill-defined encoding. There is no official declaration of the encoding used. It is "mostly ASCII", but the relatively few non-ASCII characters used are not well-defined. In many cases, it is latin-1, but I am pretty certain other encodings are used for e.g. Asian translations.

而 mailing list 上看起來還是有想要維持 ASCII 的討論:「Making the source code utf-8」,不過他提出來的理由我覺得不太行,在 console 跑 vi 或是 emacs 我覺得不太是個好理由...

Related

非常經典的 UTF-8...

在 Hacker News 文摘上看到「UTF-8 – “The most elegant hack”」這篇。除了維基百科上的資料以外,Rob Pike 與其他人在 2003 年寫的 mail 也是相當重要的資料。 Ken Thompson 與 Rob Pike 兩位發展出來的 UTF-8 被譽為最優雅的 hack 真的一點都不為過。Unicode 1.0 在 1991 年 10 月公佈。之後就陸陸續續有表示的格式出來... 相容於 ASCII 0-127 的 UTF-1 在 1992 年被提出來,但 parsing performance 並不好。 1992 年 7 月,Dave Prosser 提出 FSS-UTF,很類似後來的 UTF-8…

October 1, 2013

In "Computer"

Branchless UTF-8 解碼器

看到「A Branchless UTF-8 Decoder」這篇,先來回憶一下「非常經典的 UTF-8...」這篇,以及裡面提到的 encoding: 因為當初在設計 UTF-8 時就有考慮到,所以 decoding 很容易用 DFA 解決,也就是寫成一堆 if-then-else 的條件。但現代 CPU 因為 out-of-order execution 以及 pipeline 的設計,遇到 random branch 會有很高的效能損失,所以作者就想要試著寫看看 branchless 的版本。 成效其實還好,尤其是 Clang 上說不定在誤差內: With GCC 6.3.0 on an i7-6700, my decoder is about 20% faster than the DFA decoder in the benchmark. With…

October 9, 2017

In "Computer"

Elasticsearch 1.2.0

由於 Elasticsearch 的想法與實做比起 Solr 吸引人,可以看到愈來愈多團體換過去... 而前幾天 Elasticsearch 的官方放出 1.2.0 與 1.1.2 的消息:「elasticsearch 1.2.0 and 1.1.2 released」。 1.2.0 最大的改變是強制使用 Java 7 了,也就是不能在 Ubuntu 12.04 下安裝 default-jre 了,變成要裝 openjdk-7-jre。(要注意,官方建議的是 Oracle 官方的 JDK,而非 OpenJDK) 如果是 Ubuntu 14.04 就沒這個問題。(因為 default-jre 會裝 Java 7) 另外一個大改變是,之前產生安全問題的 dynamic scripting 預設關掉了,也就是 CVE-2014-3120。 目前我的進度只到看完 mapping,但還沒實際開始塞資料進去玩...

May 25, 2014

In "Computer"

a611ee8db44c8d03a20edf0bf5a71d80?s=49&d=identicon&r=gAuthor Gea-Suan LinPosted on March 2, 2023Categories Computer, Murmuring, ProgrammingTags ascii, code, encoding, jdk, openjdk, source, unicode, utf-8, utf8

Leave a Reply

Your email address will not be published. Required fields are marked *

Comment *

Name *

Email *

Website

Notify me of follow-up comments by email.

Notify me of new posts by email.

To respond on your own website, enter the URL of your response which should contain a link to this post's permalink URL. Your response will then appear (possibly after moderation) on this page. Want to update or remove your response? Update or delete your post and re-enter your post's URL again. (Learn More)

Post navigation


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK