12

Github Stabilize `impl From<[(K, V); N]> for HashMap` (and friends) by bst...

 3 years ago
source link: https://github.com/rust-lang/rust/pull/84111
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Copy link

Contributor

bstrie commented 19 days ago

In addition to allowing HashMap to participate in Into/From conversion, this adds the long-requested ability to use constructor-like syntax for initializing a HashMap:

let map = HashMap::from([
    (1, 2),
    (3, 4),
    (5, 6)
]);

This addition is highly motivated by existing precedence, e.g. it is already possible to similarly construct a Vec from a fixed-size array:

let vec = Vec::from([1, 2, 3]);

...and it is already possible to collect a Vec of tuples into a HashMap (and vice-versa):

let vec = Vec::from([(1, 2)]);
let map: HashMap<_, _> = vec.into_iter().collect();
let vec: Vec<(_, _)> = map.into_iter().collect();

...and of course it is likewise possible to collect a fixed-size array of tuples into a HashMap (but not vice-versa just yet):

let arr = [(1, 2)];
let map: HashMap<_, _> = std::array::IntoIter::new(arr).collect();

Therefore this addition seems like a no-brainer.

As for any impl, this would be insta-stable.

Copy link

Collaborator

rust-highfive commented 19 days ago

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

Copy link

Contributor

Author

bstrie commented 18 days ago

edited

Note that some HashMap impls are defined for all S: BuildHasher, others for only S = RandomState. The latter seemed most appropriate here (to match HashMap::new and HashMap::with_capacity), but if that seems too restrictive then let me know.

Copy link

Contributor

workingjubilee commented 18 days ago

edited

Note that some HashMap impls are defined for all S: BuildHasher, others for only S = RandomState. The latter seemed most appropriate here (to match HashMap::new and HashMap::with_capacity), but if that seems too restrictive then let me know.

If I remember the interaction between impls and stability, it would be forward compatible with a later generalization to all S: BuildHasher, correct?

Copy link

Contributor

Author

bstrie commented 18 days ago

You would also need to change the signature itself so that it could accept a hasher: HashMap::from(([(1, 2)], hasher)). The ergonomic hit here seems so severe that it seems hardly worth it. However, I mention it on the off-chance that somebody out there really, really likes to provide their own hasher (though I suggest that a new inherent fn like HashMap::from_with_hasher might be more appropriate if they just want the constructor syntax).

Copy link

Contributor

Author

bstrie commented 18 days ago

edited

Also, note that this matches the semantics of vec.into_iter().collect() in that duplicate keys in the input array are not any kind of error, and will result in the value of the last of the duplicate keys in the array being present in the map.

This also reinforces the RandomState choice, since collect doesn't worry about giving users the ability to specify a hasher.

Copy link

Contributor

Author

bstrie commented 18 days ago

edited

Speaking of collect, I just took a look at the implementation of FromIterator and it uses map.extend rather than iterating manually; is that preferred? It occurs to me that I could also just use collect here. The for loop seems like it might be the most straightforward to optimize, but let me know if I'm mistaken; I'm not good enough at assembly to compare the outputs of each approach.

Copy link

Contributor

fbstj commented 18 days ago

edited

is what happens on duplicate keys the expected behaviour?

Copy link

leonardo-m commented 18 days ago

edited

this adds the long-requested ability to use constructor-like syntax for initializing a HashMap:

Perhaps in Rust such constructor syntax should give a compile-time error if two keys are equal inside the literal.

Copy link

Contributor

Author

bstrie commented 18 days ago

edited

@fbstj, the semantics here match what happens when you collect a Vec of tuples into a HashMap: elements are inserted in iteration order, so any duplicate keys will result in the value of the last duplicate key in the input.

@leonardo-m , if Rust ever had first-class HashMap literal syntax then that could be useful, but this particular API is about converting an array into a HashMap, and precedence says that duplicate keys are not considered an error. I imagine there could be a clippy lint if this syntax becomes a popular way of constructing HashMaps.

r? @joshtriplett for T-libs FCP

I don't think determining whether we use for/extend/collect matters too much for this PR, and they all look about equivalent assembly wise (as might be expected/hoped).

Copy link

rfcbot commented 18 days ago

edited

Team member @joshtriplett has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Copy link

Contributor

Amanieu commented 18 days ago

For consistency, shouldn't we also implement similar From impls for BTreeMap, BTreeSet, HashSet, VecDeque, BinaryHeap, etc? Impls already exist for Vec and Box<[T]>.

@rfcbot concern consistency

Copy link

Member

joshtriplett commented 18 days ago

I absolutely think we should. Do you feel we need to put in all of those impls simultaneously, blocking this one on the rest?

Copy link

Contributor

Amanieu commented 18 days ago

We might as well, I don't think it makes sense to only add this impl for some collections but not others.

Copy link

Member

sfackler commented 18 days ago

Agree we should add impls to all of our collection types, and I think we may as well just do it all at the same time. Should be a pretty quick copy/paste job.

/// ```

fn from(arr: [(K, V); N]) -> Self {

let mut map = HashMap::with_capacity(N);

for (k, v) in crate::array::IntoIter::new(arr) {

sfackler 18 days ago

Member

I'd just delegate to collect here, which is where all of our specialization-based optimizations tend to live.

bstrie 18 days ago

Author

Contributor

Sure thing. Some cursory examination of the assembly revealed that collect was doing "something" a bit differently than the for or extend implementations, but I couldn't tell if it was either better or worse.

Copy link

Contributor

Author

bstrie commented 18 days ago

For consistency, shouldn't we also implement similar From impls for BTreeMap, BTreeSet, HashSet, VecDeque, BinaryHeap, etc? Impls already exist for Vec and Box<[T]>.

Sure, I'll add that to this PR.

Copy link

Member

cuviper commented 17 days ago

You would also need to change the signature itself so that it could accept a hasher: HashMap::from(([(1, 2)], hasher)).

You could implicitly default the hasher, just as FromIterator uses S: BuildHasher + Default.

Copy link

Member

cuviper commented 17 days ago

Or if we end up adding FromIterator to the prelude, and we figure out IntoIterator for arrays (#65819 / #84147), then HashMap::from_iter([...]) will just work.

Copy link

Member

scottmcm commented 15 days ago

edited

Pondering some options here:

With #84147 in FCP merge, we'll soon have [T; N]: IntoIterator.

And adding FromIterator to the prelude has the "seems reasonable" from libs in rust-lang/rfcs#3090 (comment).

It seems to me that the combination of those two would mean that this would work:

let map = HashMap::from_iterator([
    (1, 2),
    (3, 4),
    (5, 6)
]);

And ditto for BinaryHeap::from_iterator([1, 5, 9, 3, 7]) and such.

At which point it's not so obvious to me that offering it with from as well is necessarily worth it.

I think that "collect intuition" instead of "from intuition" might be a better fit here.

Relatedly, how often is it expected that this would be used with array literals vs existing array values?

Between all of this I'd personally be tempted to wait on this change for a few releases to see how the other things affect stuff instead. Or experiment with a macro -- that can be unstable -- for a bit first.

EDIT: Looks like @cuviper said basically this earlier (#84111 (comment)), but nobody responded to it.

Copy link

Member

cuviper commented 15 days ago

edited

From has the advantage that it's already in the prelude, but the disadvantage that every collection (std and external) will have to implement it explicitly for arrays.

But once FromIterator is in the prelude and we normalize C::from_iter(array), then most collections will already have implemented FromIterator to support this pattern.

Copy link

Contributor

Author

bstrie commented 13 days ago

@scottmcm and @cuviper , sadly from_iter isn't going to quite work in precisely the way that you expect it to. Consider what happens when you adapt the example to compile with a type that already implements IntoIterator:

use std::collections::HashMap;
use std::iter::FromIterator;

fn main() {
    let map = HashMap::from_iter(vec![
        (1, 2),
        (3, 4),
        (5, 6)
    ]);
}

Output:

error[E0282]: type annotations needed for `HashMap<i32, i32, S>`
 --> src/main.rs:5:15
  |
5 |     let map = HashMap::from_iter(vec![
  |         ---   ^^^^^^^^^^^^^^^^^^ cannot infer type for type parameter `S` declared on the struct `HashMap`
  |         |
  |         consider giving `map` the explicit type `HashMap<i32, i32, S>`, where the type parameter `S` is specified

The fix requires the following:

    let map: HashMap<_, _> = HashMap::from_iter(vec![
        (1, 2),
        (3, 4),
        (5, 6)
    ]);

Frustratingly, Rust doesn't even require actually using a type hole _ for the S parameter; it's clearly capable of inferring a type when S isn't given, but not when S isn't isn't isn't given.

The reason that this wasn't a problem for the FromIterator impl is because collect already always requires an explicit type somewhere, in order to infer a HashMap in the first place.

This is why both HashMap::new and HashMap::with_capacity aren't generic over S. If they were generic over S, then a simple let x = HashMap::new(); (a construction that the documentation and Rust in general is extremely proud of, the very poster child of contrast versus C++'s type inference) would be impossible.

Thus this PR follows the lead of HashMap::{new, with_capacity} and is not generic over the hasher. If Rust's type inference is fixed, or if default type parameters ever work in impl position (and additionally are allowed to compose with const generics), then this could be backward-compatibly generalized. This is intended to serve as a convenient constructor, after all, and the point is defeated if it isn't actually convenient. And even if from_iter didn't require a type annotation, and even if FromIterator were in the prelude, I would still prefer HashMap::from, both for brevity and for conceptual consistency; arrays, not iterators, are Rust's fundamental homogenous collection type. And as the fundamental homogenous collection type, I find it natural that all the homogenous collections in the stdlib should be convertible from it.

Copy link

Member

joshtriplett commented 9 days ago

@rfcbot concern documentation-fixes

This comment has been hidden.

bstrie

changed the title Stabilize impl From<[(K, V); N]> for HashMap

Stabilize impl From<[(K, V); N]> for HashMap (and friends)

7 days ago

Copy link

Member

cramertj commented 7 days ago

@bstrie

Frustratingly, Rust doesn't even require actually using a type hole _ for the S parameter; it's clearly capable of inferring a type when S isn't given, but not when S isn't isn't isn't given.

Slight point of clarification: there's actually no inference happening here, just syntactic default-type-parameter insertion. That is, when you type HashMap and provide only two type parameters, the third isn't inferred, it's automatically inserted. This may seem like a subtle distinction, but it means that HashMap<K, V> is exactly equivalent to writing HashMap<K, V, RandomState> today, rather than falling back to RandomState if the S parameter cannot be inferred from context.

Copy link

Contributor

Author

bstrie commented 5 days ago

edited

@cramertj Yes, since then I've asked in #lang on Zulip and this appears to be intended behavior, though I still find it surprising. I would expect Rust to first attempt type inference before falling back to the defaults (which would allow HashMap<K, V> to be trivially desugared to HashMap<K, V, _>); some ancient issues indicate that this was vaguely planned (with talk of generalizing HashMap::{new, with_capacity}), but was never pursued. Might be useful to have a new tracking issue or RFC for that behavior.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK