1

Reject octal zeros in IPv4 addresses by Smittyvb · Pull Request #86984 · rust-la...

 2 years ago
source link: https://github.com/rust-lang/rust/pull/86984
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Copy link

Contributor

Smittyvb commented on Jul 8

This fixes #86964 by rejecting octal zeros in IP addresses, such that 192.168.00.00000000 is rejected with a parse error, since having leading zeros in front of another zero indicates it is a zero written in octal notation, which is not allowed in the strict mode specified by RFC 6943 3.1.1. Octal rejection was implemented in #83652, but due to the way it was implemented octal zeros were still allowed.

Copy link

Collaborator

rust-highfive commented on Jul 8

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

Copy link

Member

cuviper commented on Jul 8

I understand we're currently a bit "loose" here, but is there any practical difference? It's a numeric 0 either way you look at it. So with your change, we would stop accepting octal zeros just for the sake of strictness?

Copy link

zacknewman commented on Jul 8

edited

The documentation for std::net::Ipv4Addr pretty clearly states that the octets are in "decimal notation"; thus I personally don't even understand why the octet 001 is rejected since that is a perfectly reasonable base-10 representation of the integer 1. While that octet can be interpreted in any numeral system (including base-8), the octet 127 can also be interpreted in any numeral system starting from base-8. To me, this is inconsistent and very much confusing. It's like the parser says "127 can be in base-8, base-9, base-10, etc.; but we will default to interpreting it as base-10"; but for some reason, the parser doesn't apply that same logic to the octet 001.

The documentation also clearly cites IETF RFC 6943, but Section 3.1.1. in that RFC defines the "strict" form—the form that forbids other numeral systems (e.g., base-8)—as 4 octets in decimal notation separated by a "." such that each octet is between one and three digits. The current behavior doesn't conform to any obvious standard.

Apparently other languages default to interpreting any integer with padded 0s as base-8, but that is not true in most areas of Rust except here (e.g., assert_eq!("010".parse::<u32>().unwrap(), 10u32);).

To me if for some reason we want to forbid 0-padded octets, then it should be applied consistently for all integers not all integers except 0.

Copy link

Member

cuviper commented on Jul 9

AIUI, it was defined in POSIX that a leading 0 in an IPv4 address should indicate octal. Rust was always parsing as decimal, but this could lead to bugs, even security issues, if you have system that combines Rust and POSIX ideas of a given input address. You might validate "010" in one mode and then use it in the other with a different interpretation. The Rust parser was changed to reject octal to avoid this confusion altogether.

You could debate whether "0" alone really has a leading zero to make it octal rather than decimal, but it usually doesn't matter. Longer strings of zeros are more plausibly octal, I guess, but it still seems pedantic to reject it.

Copy link

zacknewman commented on Jul 9

edited

Hm, maybe this is yet another area where my math background is biting me in the butt because it's rather foreign to me to treat leading 0s as "more likely" to be base-8 than any other base since all numeral systems are based on the same fundamental algorithm. Seems overly restrictive to assume leading 0s means base-8 (with the exception of 0 apparently) too since clearly something like 008 is not base-8 due to the 8 not being a symbol in the base-8 numeral system; and since the documentation states each octet is in base-10, the problem of "how to interpret each octet" is already answered: base-10. I'm clearly in the minority with this point-of-view though, so I'll drop it.

Copy link

Member

cuviper commented on Jul 9

Octal 0... goes back to C, at least: https://en.cppreference.com/w/c/language/integer_constant

  • decimal-constant is a non-zero decimal digit (1, 2, 3, 4, 5, 6, 7, 8, 9), followed by zero or more decimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)
  • octal-constant is the digit zero (0) followed by zero or more octal digits (0, 1, 2, 3, 4, 5, 6, 7)

(which does mean a lone 0 is technically octal as well in C, not that it matters)

Copy link

Member

RalfJung commented on Jul 9

edited

The documentation for std::net::Ipv4Addr pretty clearly states that the octets are in "decimal notation"; thus I personally don't even understand why the octet 001 is rejected since that is a perfectly reasonable base-10 representation of the integer 1.

The docs should probably be clarified: the octets are in decimal notation, and leading 0s are disallowed (except, as usual, for a plain 0).

This almost matches current behavior, except for 0000. So with this PR, those docs would be correct. Without this PR, the docs should say: the octets are in decimal notation, and leading 0s are disallowed, except for the number 0 itself which may have an arbitrary amount of leading 0s.

(I don't have an opinion between these two options, but this PR does not update the docs so IMO it is incomplete either way.)

Seems overly restrictive to assume leading 0s means base-8

I'm not disagreeing, but this is what other parts of the ecosystem do. It is much better for Rust to reject 011 in IP addresses than for both Rust and POSIX to accept it and interpret it differently -- would you agree to that? Rust does not exist in a vacuum, we have to take into account the surrounding reality here, and reality often uses a leading 0 to indicate octal notation (and yes this makes no mathematical sense; whoever decided that decades ago didn't ask mathematicians it seems, but without a time machine no amount of good arguments is going to change this). We could probably stretch the limits here and reject 011 but accept 008 (since POSIX could not accept the latter), but that seems even more surprising and weird than just rejecting leading 0s outright.

The docs possibly should say that this is why leading 0s are rejected.

Copy link

zacknewman commented on Jul 9

edited

The docs should probably be clarified: the octets are in decimal notation, and leading 0s are disallowed (except, as usual, for a plain 0).

That is perfectly OK to me. While I would prefer something like 001.001.001.001 to be parseable because that is technically valid base-10 (at least mathematically)—I know my Nintendo Switch requires exactly 3 base-10 digits when typing in the IP—I realize that POSIX compatibility is very important.

I'm not disagreeing, but this is what other parts of the ecosystem do. It is much better for Rust to reject 011 in IP addresses than for both Rust and POSIX to accept it and interpret it differently -- would you agree to that? Rust does not exist in a vacuum, we have to take into account the surrounding reality here, and reality often uses a leading 0 to indicate octal notation (and yes this makes no mathematical sense; whoever decided that decades ago didn't ask mathematicians it seems, but without a time machine no amount of good arguments is going to change this). We could probably stretch the limits here and reject 011 but accept 008 (since POSIX could not accept the latter), but that seems even more surprising and weird than just rejecting leading 0s outright.

I absolutely agree. I was not aware POSIX stated that leading 0s meant base-8 until @cuviper pointed that out (i.e., my "complaint" would have been purely about the documentation and nothing else had I known this). I admit that I am probably in the minority when it comes to my background as a Rustacean. I've never written a line of assembly, C, C++, or kernel code in my life—something I'm not proud of, but c'est la vie—so it very well could be the case that "decimal notation" obviously means "mathematical base-10 with the added restriction that leading 0s are not allowed with the exception of 0 itself which is allowed to have any non-negative integer amount of leading 0s" to the majority of other Rustaceans.

Also want to add that what's considered "pedantry" to one may not be "pedantry" to another. While my 30+ years on this planet has shown me that my brain is wired very differently than most, I don't think this is an example of my brain thinking that differently. Case in point, I have a .CSV file of IPv4 addresses. Each row contains exactly 16 UTF-8 code units—with the last Unicode scalar value being a newline (i.e., U+000A). Based on the documentation, I wrote code that used this parsing algorithm; but of course, I was dismayed when I saw it fail. Ideally when one reads the documentation, they should have a good sense of the invariants and what's supposed to work and not. Obviously it didn't take long for me to change my implementation, but it would have been nice if I knew something like 001.001.001.001 would lead to a parsing failure a priori. Honestly even if two examples were added to the documentation that highlighted something like 001.001.001.001 leading to a parsing failure and 0000000.0.0000.00 leading to a parsing success would be enough for someone like me to deduce what "decimal" notation means in this context.

Copy link

Member

RalfJung commented on Jul 9

edited

so it very well could be the case that "decimal notation" obviously means "mathematical base-10 with the added restriction that leading 0s are not allowed with the exception of 0 itself which is allowed to have any non-negative integer amount of leading 0s" to the majority of other Rustaceans.

I don't think it does. :) Leading 0s being code for "base 8" is a weird C thing that Rust did not do for its own integer literals, but that doesn't help us here. And either way the docs should be understandable for people with lots of different backgrounds, so if it is confusing for some, we should fix that! It's near impossible for one person (whoever writes the docs) to foresee all the ways it can be misleading, which is why documentation feedback and bugreports are extremely valuable. :)

So, we are in full agreement that the docs should document this more clearly. Usually, lading 0s are allowed, as you pointed out in your bugreport.

Copy link

Member

RalfJung commented on Jul 9

While I would prefer something like 001.001.001.001 to be parseable because that is technically valid base-10 (at least mathematically)—I know my Nintendo Switch requires exactly 3 base-10 digits when typing in the IP—I realize that POSIX compatibility is very important.

What I could imagine being useful (but I am not a libs person) is a parsing function that explicitly is not POSIX compatible, and that does interpret 010.010.010.010 entirely in decimal. But that should not be the default parsing function. I am not sure if the Rust stdlib is the right place for this, but it might be worth suggesting.

I am honestly quite shocked that POSIX would interpret this as 8.8.8.8, and I did not know that this is the case... but my ping agrees:

$ ping 010.010.010.010
PING 010.010.010.010 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_seq=1 ttl=118 time=11.5 ms
64 bytes from 8.8.8.8: icmp_seq=2 ttl=118 time=10.4 ms

Copy link

Contributor

Author

Smittyvb commented on Jul 9

edited

I've updated the documentation for Ipv4Addr to be more clear as what octal/hex literals are, and added some examples of which ones are rejected.

Copy link

zacknewman commented on Jul 9

edited

Now this probably is a pedantic request—thus something I am not that passionate about and OK with not doing—but an example like the parsing of 0000.0.0.0 leading to a successful result despite having leading 0s is informative as that result does not obviously follow from the documentation (also has the property that more than 3 digits are allowed despite going against the "strict" form mentioned in RFC 6943 Section 3.1.1.).

Once I know how to do "pull requests", I won't ask others to do such a thing.

Copy link

Contributor

marmeladema commented on Jul 11

Does this require libs team signoff or an fcp or something? This is another breaking change after the one that rejected the octal notation. I've encountered the regression at work and will probably encounter this one too when its available in stable.

That being said, I think the PR does make things more consistent and coherent.

Copy link

Member

Mark-Simulacrum commented on Jul 11

r? @joshtriplett for libs-api(?) examination of this issue. I think it probably merits an FCP, at least...

Copy link

Member

joshtriplett commented 17 days ago

I think we should go ahead with this change.

@rfcbot merge

Copy link

rfcbot commented 17 days ago

edited by m-ou-se

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Copy link

Member

BurntSushi commented 16 days ago

edited

I was personally somewhat on the side of just interpreting octets as decimal, even with a leading zero, POSIX be damned. But one thing that convinced me otherwise was the fact that, well, those leading zeroes are indeed interpreted as indicating octal base in many other places. So if our routine accepts the same IP but gives a different result than others, that could indeed be quite surprising. So I think it's the right decision to fail here. (Or alternatively, just implement octal support.)

(I am not aware of the history of rejecting octal in this routine, so I may just be re-treading old ground, but wanted to mention it since there was some discussion above.)

Copy link

rfcbot commented 15 days ago

bellThis is now entering its final comment period, as per the review above. bell

Copy link

rfcbot commented 5 days ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

Copy link

Member

m-ou-se commented 8 hours ago

@bors r+

Copy link

Contributor

bors commented 8 hours ago

pushpin Commit 403d269 has been approved by m-ou-se


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK