
Idiosyncratic Ruby: US-ASCII-8BIT

 3 years ago
source link: https://idiosyncratic-ruby.com/56-us-ascii-8bit.html

Why does Ruby have two ASCII encodings?

Encoding.name_list.grep(/ASCII/)
# => ["ASCII-8BIT", "US-ASCII"]

Which one is the normal one you should use for ASCII?

Aliases

Encoding      Aliases
ASCII-8BIT    BINARY
US-ASCII      ASCII, ANSI_X3.4-1968, 646
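The alias relationships can also be checked from Ruby itself. The following is a minimal sketch using only core Encoding methods; the local variable names are chosen here for illustration and do not come from the article.

```ruby
# Encoding.aliases maps each alias name to its canonical encoding name.
Encoding.aliases["BINARY"]  # "ASCII-8BIT"
Encoding.aliases["ASCII"]   # "US-ASCII"

# Encoding.find resolves an alias to the very same Encoding object.
binary_via_alias = Encoding.find("BINARY")
ascii_via_alias  = Encoding.find("646")

binary_via_alias.equal?(Encoding::ASCII_8BIT)  # true
ascii_via_alias.equal?(Encoding::US_ASCII)     # true

# Encoding#names lists the canonical name together with its aliases.
binary_names = Encoding::ASCII_8BIT.names
ascii_names  = Encoding::US_ASCII.names
```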

So US-ASCII is aliased to ASCII, but then what is ASCII-8BIT for? The RDoc for Encoding offers some help:

Encoding::ASCII_8BIT is a special encoding that is usually
used for a byte string, not a character string. But as the name insists,
its characters in the range of ASCII are considered as ASCII characters.
This is useful when you use ASCII-8BIT characters with other ASCII
compatible characters.

So basically, it is not a real character encoding: it represents an arbitrary stream of bytes, each with a value between 0 and 255. Use it for raw byte streams, or when you want to make clear that you do not know a string's encoding.
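To make the "byte string" idea and the RDoc remark about ASCII-compatible characters concrete, here is a small sketch. It assumes nothing beyond core String methods; the variable names are illustrative only.

```ruby
# Byte-oriented APIs such as Array#pack return ASCII-8BIT strings.
packed = [0, 127, 128, 255].pack("C*")
packed.encoding          # Encoding::ASCII_8BIT
packed.valid_encoding?   # true, every byte value is acceptable

# String#b returns a copy tagged as ASCII-8BIT; the bytes stay the same.
binary_ascii = "abc".b
binary_high  = "\xFF".b
utf8_text    = "\u00E9"  # "é", always a UTF-8 string

# A binary string holding only ASCII bytes mixes with other
# ASCII-compatible strings ...
combined = binary_ascii + utf8_text
combined.encoding        # Encoding::UTF_8

# ... but bytes above 127 carry no character meaning, so mixing them
# with non-ASCII text raises an error.
mix_error = nil
begin
  binary_high + utf8_text
rescue Encoding::CompatibilityError => e
  mix_error = e
end
```

The second case is the one that usually surprises people: the same `+` call succeeds or fails depending on whether the binary string happens to contain a byte outside the ASCII range.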

The ASCII charset only takes 7 bits, so in strict ASCII the 8th bit must never be set, and the allowed byte values range from 0 to 127. This is what the US-ASCII encoding is all about: it is used when dealing with ASCII-encoded strings. Think: "ASCII-7BIT"

A simple example illustrating the difference:

out_of_ascii_range = 128.chr # => "\x80"
out_of_ascii_range.force_encoding("US-ASCII").valid_encoding?   # => false
out_of_ascii_range.force_encoding("ASCII-8BIT").valid_encoding? # => true
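The 7-bit rule behind that result can be spelled out by hand. The sketch below uses a hypothetical helper, seven_bit?, which is not part of Ruby or of the article, and compares it with the built-in String#ascii_only?.

```ruby
# Hypothetical helper: true if every byte fits into 7 bits (0..127).
def seven_bit?(string)
  string.bytes.all? { |byte| byte < 128 }
end

plain  = String.new("hello", encoding: Encoding::US_ASCII)
binary = "hello\x80".b

seven_bit?(plain)    # true
seven_bit?(binary)   # false

# String#ascii_only? performs the same kind of check.
plain.ascii_only?    # true
binary.ascii_only?   # false

# Only the binary tag accepts the byte 0x80 as valid content.
binary.valid_encoding?                               # true
binary.dup.force_encoding("US-ASCII").valid_encoding? # false
```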
