Github Stabilize `impl From<[(K, V); N]> for HashMap` (and friends) by bst...
source link: https://github.com/rust-lang/rust/pull/84111
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
In addition to allowing HashMap to participate in Into/From conversion, this adds the long-requested ability to use constructor-like syntax for initializing a HashMap:
let map = HashMap::from([ (1, 2), (3, 4), (5, 6) ]);
This addition is highly motivated by existing precedence, e.g. it is already possible to similarly construct a Vec from a fixed-size array:
let vec = Vec::from([1, 2, 3]);
...and it is already possible to collect a Vec of tuples into a HashMap (and vice-versa):
let vec = Vec::from([(1, 2)]); let map: HashMap<_, _> = vec.into_iter().collect(); let vec: Vec<(_, _)> = map.into_iter().collect();
...and of course it is likewise possible to collect a fixed-size array of tuples into a HashMap (but not vice-versa just yet):
let arr = [(1, 2)]; let map: HashMap<_, _> = std::array::IntoIter::new(arr).collect();
Therefore this addition seems like a no-brainer.
As for any impl, this would be insta-stable.
(rust-highfive has picked a reviewer for you, use r? to override)
Note that some HashMap impls are defined for all S: BuildHasher
, others for only S = RandomState
. The latter seemed most appropriate here (to match HashMap::new
and HashMap::with_capacity
), but if that seems too restrictive then let me know.
Note that some HashMap impls are defined for all
S: BuildHasher
, others for onlyS = RandomState
. The latter seemed most appropriate here (to matchHashMap::new
andHashMap::with_capacity
), but if that seems too restrictive then let me know.
If I remember the interaction between impls and stability, it would be forward compatible with a later generalization to all S: BuildHasher, correct?
You would also need to change the signature itself so that it could accept a hasher: HashMap::from(([(1, 2)], hasher))
. The ergonomic hit here seems so severe that it seems hardly worth it. However, I mention it on the off-chance that somebody out there really, really likes to provide their own hasher (though I suggest that a new inherent fn like HashMap::from_with_hasher
might be more appropriate if they just want the constructor syntax).
Also, note that this matches the semantics of vec.into_iter().collect()
in that duplicate keys in the input array are not any kind of error, and will result in the value of the last of the duplicate keys in the array being present in the map.
This also reinforces the RandomState
choice, since collect
doesn't worry about giving users the ability to specify a hasher.
Speaking of collect
, I just took a look at the implementation of FromIterator
and it uses map.extend
rather than iterating manually; is that preferred? It occurs to me that I could also just use collect
here. The for
loop seems like it might be the most straightforward to optimize, but let me know if I'm mistaken; I'm not good enough at assembly to compare the outputs of each approach.
is what happens on duplicate keys the expected behaviour?
this adds the long-requested ability to use constructor-like syntax for initializing a HashMap:
Perhaps in Rust such constructor syntax should give a compile-time error if two keys are equal inside the literal.
@fbstj, the semantics here match what happens when you collect a Vec of tuples into a HashMap: elements are inserted in iteration order, so any duplicate keys will result in the value of the last duplicate key in the input.
@leonardo-m , if Rust ever had first-class HashMap literal syntax then that could be useful, but this particular API is about converting an array into a HashMap, and precedence says that duplicate keys are not considered an error. I imagine there could be a clippy lint if this syntax becomes a popular way of constructing HashMaps.
r? @joshtriplett for T-libs FCP
I don't think determining whether we use for/extend/collect matters too much for this PR, and they all look about equivalent assembly wise (as might be expected/hoped).
Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:
Concerns:
- consistency resolved by #84111 (comment)
- documentation-fixes (#84111 (comment))
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
For consistency, shouldn't we also implement similar From
impls for BTreeMap
, BTreeSet
, HashSet
, VecDeque
, BinaryHeap
, etc? Impls already exist for Vec
and Box<[T]>
.
@rfcbot concern consistency
I absolutely think we should. Do you feel we need to put in all of those impls simultaneously, blocking this one on the rest?
We might as well, I don't think it makes sense to only add this impl for some collections but not others.
Agree we should add impls to all of our collection types, and I think we may as well just do it all at the same time. Should be a pretty quick copy/paste job.
/// ```
fn from(arr: [(K, V); N]) -> Self {
let mut map = HashMap::with_capacity(N);
for (k, v) in crate::array::IntoIter::new(arr) {
sfackler 18 days ago
Member
I'd just delegate to collect here, which is where all of our specialization-based optimizations tend to live.
bstrie 18 days ago
Author
Contributor
Sure thing. Some cursory examination of the assembly revealed that collect was doing "something" a bit differently than the for
or extend
implementations, but I couldn't tell if it was either better or worse.
For consistency, shouldn't we also implement similar From impls for BTreeMap, BTreeSet, HashSet, VecDeque, BinaryHeap, etc? Impls already exist for Vec and Box<[T]>.
Sure, I'll add that to this PR.
You would also need to change the signature itself so that it could accept a hasher:
HashMap::from(([(1, 2)], hasher))
.
You could implicitly default the hasher, just as FromIterator
uses S: BuildHasher + Default
.
Pondering some options here:
With #84147 in FCP merge, we'll soon have [T; N]: IntoIterator
.
And adding FromIterator
to the prelude has the "seems reasonable" from libs in rust-lang/rfcs#3090 (comment).
It seems to me that the combination of those two would mean that this would work:
let map = HashMap::from_iterator([ (1, 2), (3, 4), (5, 6) ]);
And ditto for BinaryHeap::from_iterator([1, 5, 9, 3, 7])
and such.
At which point it's not so obvious to me that offering it with from
as well is necessarily worth it.
I think that "collect
intuition" instead of "from
intuition" might be a better fit here.
Relatedly, how often is it expected that this would be used with array literals vs existing array values?
Between all of this I'd personally be tempted to wait on this change for a few releases to see how the other things affect stuff instead. Or experiment with a macro -- that can be unstable
-- for a bit first.
EDIT: Looks like @cuviper said basically this earlier (#84111 (comment)), but nobody responded to it.
From
has the advantage that it's already in the prelude, but the disadvantage that every collection (std and external) will have to implement it explicitly for arrays.
But once FromIterator
is in the prelude and we normalize C::from_iter(array)
, then most collections will already have implemented FromIterator
to support this pattern.
@scottmcm and @cuviper , sadly from_iter
isn't going to quite work in precisely the way that you expect it to. Consider what happens when you adapt the example to compile with a type that already implements IntoIterator
:
use std::collections::HashMap; use std::iter::FromIterator; fn main() { let map = HashMap::from_iter(vec![ (1, 2), (3, 4), (5, 6) ]); }
Output:
error[E0282]: type annotations needed for `HashMap<i32, i32, S>`
--> src/main.rs:5:15
|
5 | let map = HashMap::from_iter(vec![
| --- ^^^^^^^^^^^^^^^^^^ cannot infer type for type parameter `S` declared on the struct `HashMap`
| |
| consider giving `map` the explicit type `HashMap<i32, i32, S>`, where the type parameter `S` is specified
The fix requires the following:
let map: HashMap<_, _> = HashMap::from_iter(vec![ (1, 2), (3, 4), (5, 6) ]);
Frustratingly, Rust doesn't even require actually using a type hole _
for the S
parameter; it's clearly capable of inferring a type when S
isn't given, but not when S
isn't isn't isn't given.
The reason that this wasn't a problem for the FromIterator
impl is because collect
already always requires an explicit type somewhere, in order to infer a HashMap
in the first place.
This is why both HashMap::new
and HashMap::with_capacity
aren't generic over S
. If they were generic over S
, then a simple let x = HashMap::new();
(a construction that the documentation and Rust in general is extremely proud of, the very poster child of contrast versus C++'s type inference) would be impossible.
Thus this PR follows the lead of HashMap::{new, with_capacity}
and is not generic over the hasher. If Rust's type inference is fixed, or if default type parameters ever work in impl position (and additionally are allowed to compose with const generics), then this could be backward-compatibly generalized. This is intended to serve as a convenient constructor, after all, and the point is defeated if it isn't actually convenient. And even if from_iter
didn't require a type annotation, and even if FromIterator
were in the prelude, I would still prefer HashMap::from
, both for brevity and for conceptual consistency; arrays, not iterators, are Rust's fundamental homogenous collection type. And as the fundamental homogenous collection type, I find it natural that all the homogenous collections in the stdlib should be convertible from it.
@rfcbot concern documentation-fixes
This comment has been hidden.
changed the title
Stabilize impl From<[(K, V); N]> for HashMap
Stabilize impl From<[(K, V); N]> for HashMap
(and friends)
Frustratingly, Rust doesn't even require actually using a type hole _ for the S parameter; it's clearly capable of inferring a type when S isn't given, but not when S isn't isn't isn't given.
Slight point of clarification: there's actually no inference happening here, just syntactic default-type-parameter insertion. That is, when you type HashMap
and provide only two type parameters, the third isn't inferred, it's automatically inserted. This may seem like a subtle distinction, but it means that HashMap<K, V>
is exactly equivalent to writing HashMap<K, V, RandomState>
today, rather than falling back to RandomState
if the S
parameter cannot be inferred from context.
@cramertj Yes, since then I've asked in #lang on Zulip and this appears to be intended behavior, though I still find it surprising. I would expect Rust to first attempt type inference before falling back to the defaults (which would allow HashMap<K, V>
to be trivially desugared to HashMap<K, V, _>
); some ancient issues indicate that this was vaguely planned (with talk of generalizing HashMap::{new, with_capacity}
), but was never pursued. Might be useful to have a new tracking issue or RFC for that behavior.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK