Github Tracking Issue for `partition_point` · Issue #73831 · rust-lang/rust · Gi...
source link: https://github.com/rust-lang/rust/issues/73831
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
Feature gate: #![feature(partition_point)]
This is a tracking issue for [T]::partition_point
.
See also rust-lang/rfcs#2184
Public API
impl<T> [T] { pub fn partition_point<P>(&self, mut pred: P) -> usize where P: FnMut(&T) -> bool; }
Steps / History
- Implementation: #73577
- Final commenting period (FCP): #73831 (comment)
- Stabilization PR: #81012
-
Update documentation of
binary_search
(and/orbinary_search_by
) to point out the existence of this function too.
Unresolved Questions
-
Is there a better name?
partition_point
matches C++, but might be hard to recognize as a binary search.
Contributor
timvermeulen commented on Jul 13, 2020
In order to deduplicate this code with binary_search_by
's implementation, I think we can simply implement partition_point
in terms of binary_search_by
like so:
use crate::cmp::Ordering::{Less, Greater}; impl<T> [T] { pub fn partition_point<P>(&self, mut pred: P) -> usize where P: FnMut(&T) -> bool, { self.binary_search_by(|x| if pred(x) { Less } else { Greater }) .unwrap_or_else(|i| i) } }
Contributor
Author
VillSnow commented on Jul 13, 2020
I prefer my partition_point
code than current binary_search_by
code.
I found current binary_search_by
has a problem that it usually calls one extra comparison.
I tried
- previous
binary_search_by
logic (=logic1), - current
binary_search_by
logic (=logic2), and partition_point
logic (=logic3).
In addition, I also tried to
- remove jump from
partition_point
(logic4).
I'm not sure succeeded to remove, or possibly already done from current code.
https://github.com/VillSnow/compare-binary-searches/blob/master/src/main.rs
You can find logic1, 3, 4 need 3 comparisons to find from 0..7, while logic2 needs 4 calls.
I think this causes significant cost more than optimization trick if users can pass their own predictions.
Here is a typical result of bench. Find 0 from 0..65536 and find 0 from 7 but inserted sleep(1ms) in comparison.
Logic2 is much slower than others in case of custom predication.
logic [i32; 65536] [Slow; 7] logic1 30ns 5.4ms logic2 9ns 7.2ms logic3 10ns 5.6ms logic4 10ns 5.6ms
Contributor
timvermeulen commented on Jul 13, 2020
I found current
binary_search_by
has a problem that it usually calls one extra comparison.
I believe this is intended, it is not optimized for doing the minimum number of comparisons. Make sure to run the benchmarks in https://github.com/rust-lang/rust/blob/master/src/libcore/benches/slice.rs before drawing any conclusions.
And regardless of what the ideal implementation of binary_search_by
looks like, partition_point
should probably still be implemented in terms of it in the way I outlined above
Contributor
Author
VillSnow commented on Jul 14, 2020
I found current
binary_search_by
has a problem that it usually calls one extra comparison.I believe this is intended, it is not optimized for doing the minimum number of comparisons. Make sure to run the benchmarks in https://github.com/rust-lang/rust/blob/master/src/libcore/benches/slice.rs before drawing any conclusions.
AFAIK, this bench is written when logic1 is replaced into logic2, not after logic3. The bench tries to find random element from usize slice, not user type which needs large cost to compare.
If it does not optimize for minimum number of comparisons, itself is a problem, I think.
If user need 1ns faster binary search, they can write by themselves or use 3rd party crate, as well as HashMap
.
And regardless of what the ideal implementation of
binary_search_by
looks like,partition_point
should probably still be implemented in terms of it in the way I outlined above
Is it means partition_point
should call binary_search_by
instead of binary_search_by
calls partition_point
?
I agree because binary_search_by has more information and the cost would be small.
I noticed in passing that this is using FnMut
. That seems strange to me, since there's no guarantee which items will be looked at or in which order by this API, so actually mutating in the closure would seem an anti-pattern. Should it maybe use Fn
instead? (People could always use interior mutability if they really really really want to update things in the closure, but this would discourage it.)
Though I guess sort_by
takes FnMut
...
Contributor
Author
VillSnow commented on Aug 27, 2020
@scottmcm memoization or exterior (web/DB) API access using mut handle?
If the team think all right, I'm glad.
I still concern about code duplication, and I'm watching this PR #74024.
However, how predicate is called is not documented, so even if the algorithm was changed, it never change the stable branch.
I have no objection to merge this.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
Member
m-ou-se commented 29 days ago
Moved from #81012:
@dtolnay's comment (#81012 (comment)):
This method finds the start of the range of elements where pred=false, assuming all elements for which pred=true come before all elements for which pred=false, via binary search.
impl<T> [T] { pub fn partition_point<P>(&self, pred: P) -> usize where P: FnMut(&T) -> bool; }This is a special case of binary_search_by, but reasonably common and not all that clear expressed using binary_search_by.
slice.partition_point(pred) // equivalent to slice.binary_search_by(|x| if pred(x) { Less } else { Greater }).unwrap_err()The polarity choice for whether false or true elements go first is consistent with the stable Iterator::partition (https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.partition), with STL's std::partition_point in C++ (https://en.cppreference.com/w/cpp/algorithm/partition_point), and with the unstable Iterator::is_partitioned (#62544, https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.is_partitioned).
@rfcbot fcp merge
rfcbot commented 29 days ago •
Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:
Concerns:
- name resolved by #73831 (comment)
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
Member
m-ou-se commented 29 days ago
@rfcbot concern name
I was completly unaware we even had this function, and even suggested a few times we might want to add something like this. I guess because the name doesn't remind me of a binary search. This at least means that we should update the documentation of binary_search
to point out the existence of this function, and probably add a doc(alias)
to this function too. But we might want to consider another name that more people would recognize as this type of binary search.
Member
BurntSushi commented 29 days ago
I second @m-ou-se's naming concern. My first reaction was, "what the heck is a partition point?"
I think my preference would be to have "binary search" or at least "search" in the name somewhere. A bit verbose, but maybe binary_search_by_pred
? Not sure.
Contributor
Author
VillSnow commented 28 days ago
The first reason why this is partition_point
is that it borrows C++'s name.
Otherwise, this example shows what it gets is the partition point.
#![feature(partition_point)] use std::iter::Iterator; fn main() { let mut s = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4]; let a = itertools::partition(&mut s, |x| *x > 5); for (i, x) in s.iter().enumerate() { print!("{}{}", if i == a { "|" } else { " " }, x); } // 8 7 8 8 8|2 1 1 2 2 4 let b = s.partition_point(|x| *x > 5); assert_eq!(a, b); // pass }
If the method named binary_search_by_pred
, I expect that it returns any index where the predicate returns true
, while this methods returns first index where returns false
.
partition_point: The idea of partition is uncommon.
binary_search_by: In most of cases very annoying to write predicate.
binary_search_by_pred, and flip predicate: Causes confusion for C++ programmers
Contributor
KodrAus commented 23 days ago
If I remember correctly, we had a long bikeshed over the name and ended up following C++ as a tie-breaker.
partition_point: The idea of partition is uncommon.
FWIW, we've had Iterator::partition
since 1.0.
Barring something obviously better (which doesn't seem forthcoming), using the C++ name seems reasonable. A documentation update to crosslink this from binary_search
definitely sounds good, though.
(Hopefully this method can be resolved without a broader binary search conversation, but if not then I think it should include some sort of plan for equal_range
and such, cc rust-lang/rfcs#2184)
Member
matklad commented 23 days ago
Hopefully this method can be resolved without a broader binary search conversation, but if not then I think it should include some sort of plan for equal_range and such
FWIW, my secret plan is to just submit a pr implementing equal_range
after this is stabilized. Unlike lower/upper bound, it doesn't depend on exclusive/inclusive open question, and hopefully can be shipped without to much deliberation :)
Member
m-ou-se commented 22 days ago
Sounds like this is the best name we can come up with.
So, let's make sure the documentation of the binary_search
functions point to partition_point
as an alternative, and maybe add a doc(alias)
for it as well.
If anybody feels like updating the documentation here: PRs welcome. ^^
Let's kick off the FCP for stabilizing this:
@rfcbot resolve name
rfcbot commented 22 days ago
This is now entering its final comment period, as per the review above.
Contributor
Author
VillSnow commented 21 days ago
I don't know much about doc(alias)
, but does it show this?
It looks like to make a flow from other names. For example, some languages has reduce
, so type 'reduce' at the search box, then hit Iterator::fold
.
However, Rust already has binary_search
for who want to find any item by value. It seems a bit different usage to suggest similar methods.
rfcbot commented 12 days ago
The final comment period, with a disposition to merge, as per the review above, is now complete.
As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.
The RFC will be merged soon.
Member
matklad commented 6 days ago
Contributor
Author
VillSnow commented 5 days ago
Thanks for much assistance.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK