[Golang] Determine Encoding of HTML Document
source link: https://siongui.github.io/2018/10/26/determine-encoding-of-html-document-in-go/
October 26, 2018

Given a URL, determine the encoding of the HTML document in Go using the golang.org/x/net/html and golang.org/x/text packages. I came across the code snippet in [1], so I extracted and reorganized the content to make it search-engine friendly.
Install the packages first:
$ go get -u golang.org/x/text
$ go get -u golang.org/x/net/html
The following code shows how to determine the encoding of an HTML document given the URL:
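url.go:

package guess

import (
	"bufio"
	"fmt"
	"io"
	"net/http"

	"golang.org/x/net/html/charset"
	"golang.org/x/text/encoding"
)

// UrlEncoding downloads url and guesses the character encoding of the HTML it returns.
func UrlEncoding(url string) (name string, certain bool, err error) {
	resp, err := http.Get(url)
	if err != nil {
		return
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		err = fmt.Errorf("response status code: %d", resp.StatusCode)
		return
	}

	_, name, certain, err = DetermineEncodingFromReader(resp.Body)
	return
}

// DetermineEncodingFromReader inspects up to the first 1024 bytes of r
// (the amount charset.DetermineEncoding examines) and guesses their encoding.
func DetermineEncodingFromReader(r io.Reader) (e encoding.Encoding, name string, certain bool, err error) {
	buf, err := bufio.NewReader(r).Peek(1024)
	// Peek returns io.EOF together with the available bytes when the input
	// is shorter than 1024 bytes; a short page is not an error here.
	if err != nil && err != io.EOF {
		return
	}
	err = nil
	e, name, certain = charset.DetermineEncoding(buf, "")
	return
}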
Usage of the above code:
url_test.go:

package guess

import (
	"testing"
)

// This test fetches live pages, so it needs network access and assumes
// the sites still declare the same encodings.
func TestUrlEncoding(t *testing.T) {
	name, _, err := UrlEncoding("http://shenfang.com.tw/")
	if err != nil {
		t.Error(err)
		return
	}
	if name != "big5" {
		t.Error("bad guess!")
		return
	}

	name, _, err = UrlEncoding("https://siongui.github.io/")
	if err != nil {
		t.Error(err)
		return
	}
	if name != "utf-8" {
		t.Error("bad guess!")
		return
	}
}
To convert non-UTF-8 HTML to UTF-8, see [3].
Tested on: Ubuntu 18.04, Go 1.11.1
References: