9

Github Tracking Issue for `partition_point` · Issue #73831 · rust-lang/rust · Gi...

 3 years ago
source link: https://github.com/rust-lang/rust/issues/73831
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Contributor

VillSnow commented on Jun 28, 2020

edited by matklad

Feature gate: #![feature(partition_point)]

This is a tracking issue for [T]::partition_point.

See also rust-lang/rfcs#2184

Public API

impl<T> [T] {
    pub fn partition_point<P>(&self, mut pred: P) -> usize
    where
        P: FnMut(&T) -> bool;
}

Steps / History

  • Implementation: #73577
  • Final commenting period (FCP): #73831 (comment)
  • Stabilization PR: #81012
  • Update documentation of binary_search (and/or binary_search_by) to point out the existence of this function too.

Unresolved Questions

  • Is there a better name? partition_point matches C++, but might be hard to recognize as a binary search.

Contributor

timvermeulen commented on Jul 13, 2020

In order to deduplicate this code with binary_search_by's implementation, I think we can simply implement partition_point in terms of binary_search_by like so:

use crate::cmp::Ordering::{Less, Greater};

impl<T> [T] {
    pub fn partition_point<P>(&self, mut pred: P) -> usize
    where
        P: FnMut(&T) -> bool,
    {
        self.binary_search_by(|x| if pred(x) { Less } else { Greater })
            .unwrap_or_else(|i| i)
    }
}

Contributor

Author

VillSnow commented on Jul 13, 2020

@timvermeulen

I prefer my partition_point code than current binary_search_by code.
I found current binary_search_by has a problem that it usually calls one extra comparison.

I tried

  • previous binary_search_by logic (=logic1),
  • current binary_search_by logic (=logic2), and
  • partition_point logic (=logic3).

In addition, I also tried to

  • remove jump from partition_point (logic4).

I'm not sure succeeded to remove, or possibly already done from current code.

https://github.com/VillSnow/compare-binary-searches/blob/master/src/main.rs

You can find logic1, 3, 4 need 3 comparisons to find from 0..7, while logic2 needs 4 calls.
I think this causes significant cost more than optimization trick if users can pass their own predictions.

Here is a typical result of bench. Find 0 from 0..65536 and find 0 from 7 but inserted sleep(1ms) in comparison.
Logic2 is much slower than others in case of custom predication.

logic [i32; 65536] [Slow; 7] logic1 30ns 5.4ms logic2 9ns 7.2ms logic3 10ns 5.6ms logic4 10ns 5.6ms

Contributor

timvermeulen commented on Jul 13, 2020

I found current binary_search_by has a problem that it usually calls one extra comparison.

I believe this is intended, it is not optimized for doing the minimum number of comparisons. Make sure to run the benchmarks in https://github.com/rust-lang/rust/blob/master/src/libcore/benches/slice.rs before drawing any conclusions.

And regardless of what the ideal implementation of binary_search_by looks like, partition_point should probably still be implemented in terms of it in the way I outlined above slightly_smiling_face

Contributor

Author

VillSnow commented on Jul 14, 2020

@timvermeulen

I found current binary_search_by has a problem that it usually calls one extra comparison.

I believe this is intended, it is not optimized for doing the minimum number of comparisons. Make sure to run the benchmarks in https://github.com/rust-lang/rust/blob/master/src/libcore/benches/slice.rs before drawing any conclusions.

AFAIK, this bench is written when logic1 is replaced into logic2, not after logic3. The bench tries to find random element from usize slice, not user type which needs large cost to compare.
If it does not optimize for minimum number of comparisons, itself is a problem, I think.
If user need 1ns faster binary search, they can write by themselves or use 3rd party crate, as well as HashMap.

And regardless of what the ideal implementation of binary_search_by looks like, partition_point should probably still be implemented in terms of it in the way I outlined above slightly_smiling_face

Is it means partition_point should call binary_search_by instead of binary_search_by calls partition_point?
I agree because binary_search_by has more information and the cost would be small.

Member

scottmcm commented on Aug 27, 2020

edited

I noticed in passing that this is using FnMut. That seems strange to me, since there's no guarantee which items will be looked at or in which order by this API, so actually mutating in the closure would seem an anti-pattern. Should it maybe use Fn instead? (People could always use interior mutability if they really really really want to update things in the closure, but this would discourage it.)

Though I guess sort_by takes FnMut...

Contributor

Author

VillSnow commented on Aug 27, 2020

@scottmcm memoization or exterior (web/DB) API access using mut handle?

Member

matklad commented on Jan 10

@VillSnow this has been baking in the nightly for a while, I think this is ready for stabilization. Would like to send a stabilization PR?

Contributor

Author

VillSnow commented on Jan 11

If the team think all right, I'm glad.

I still concern about code duplication, and I'm watching this PR #74024.
However, how predicate is called is not documented, so even if the algorithm was changed, it never change the stable branch.

I have no objection to merge this.

Member

matklad commented on Jan 11

If the team think all right, I'm glad.

Submitting a stabilization PR is a way to figure that out! I am not on the libs team, but this seems stabilization-worthy to me.

I wouldn't worry about impl details -- they can be adjusted after stabilization.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

Member

m-ou-se commented 29 days ago

Moved from #81012:

@dtolnay's comment (#81012 (comment)):

@rust-lang/libs

This method finds the start of the range of elements where pred=false, assuming all elements for which pred=true come before all elements for which pred=false, via binary search.

impl<T> [T] {
   pub fn partition_point<P>(&self, pred: P) -> usize
    where
        P: FnMut(&T) -> bool;
}

This is a special case of binary_search_by, but reasonably common and not all that clear expressed using binary_search_by.

slice.partition_point(pred)

// equivalent to
slice.binary_search_by(|x| if pred(x) { Less } else { Greater }).unwrap_err()

The polarity choice for whether false or true elements go first is consistent with the stable Iterator::partition (https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.partition), with STL's std::partition_point in C++ (https://en.cppreference.com/w/cpp/algorithm/partition_point), and with the unstable Iterator::is_partitioned (#62544, https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.is_partitioned).

@rfcbot fcp merge

rfcbot commented 29 days ago

edited

Team member @m-ou-se has proposed to merge this. The next step is review by the rest of the tagged team members:

Concerns:

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Member

m-ou-se commented 29 days ago

@rfcbot concern name

I was completly unaware we even had this function, and even suggested a few times we might want to add something like this. I guess because the name doesn't remind me of a binary search. This at least means that we should update the documentation of binary_search to point out the existence of this function, and probably add a doc(alias) to this function too. But we might want to consider another name that more people would recognize as this type of binary search.

Member

BurntSushi commented 29 days ago

I second @m-ou-se's naming concern. My first reaction was, "what the heck is a partition point?"

I think my preference would be to have "binary search" or at least "search" in the name somewhere. A bit verbose, but maybe binary_search_by_pred? Not sure.

Contributor

Author

VillSnow commented 28 days ago

The first reason why this is partition_point is that it borrows C++'s name.

Otherwise, this example shows what it gets is the partition point.

#![feature(partition_point)]
use std::iter::Iterator;
fn main() {
    let mut s = [2, 7, 1, 8, 2, 8, 1, 8, 2, 8, 4];
    let a = itertools::partition(&mut s, |x| *x > 5);

    for (i, x) in s.iter().enumerate() {
        print!("{}{}", if i == a { "|" } else { " " }, x);
    }
    //  8 7 8 8 8|2 1 1 2 2 4

    let b = s.partition_point(|x| *x > 5);
    assert_eq!(a, b); // pass
}

If the method named binary_search_by_pred, I expect that it returns any index where the predicate returns true, while this methods returns first index where returns false.

partition_point: The idea of partition is uncommon.
binary_search_by: In most of cases very annoying to write predicate.
binary_search_by_pred, and flip predicate: Causes confusion for C++ programmers

Contributor

KodrAus commented 23 days ago

If I remember correctly, we had a long bikeshed over the name and ended up following C++ as a tie-breaker.

Member

scottmcm commented 23 days ago

edited

partition_point: The idea of partition is uncommon.

FWIW, we've had Iterator::partition since 1.0.

Barring something obviously better (which doesn't seem forthcoming), using the C++ name seems reasonable. A documentation update to crosslink this from binary_search definitely sounds good, though.

(Hopefully this method can be resolved without a broader binary search conversation, but if not then I think it should include some sort of plan for equal_range and such, cc rust-lang/rfcs#2184)

Member

matklad commented 23 days ago

Hopefully this method can be resolved without a broader binary search conversation, but if not then I think it should include some sort of plan for equal_range and such

FWIW, my secret plan is to just submit a pr implementing equal_range after this is stabilized. Unlike lower/upper bound, it doesn't depend on exclusive/inclusive open question, and hopefully can be shipped without to much deliberation :)

Member

m-ou-se commented 22 days ago

Sounds like this is the best name we can come up with.

So, let's make sure the documentation of the binary_search functions point to partition_point as an alternative, and maybe add a doc(alias) for it as well.

If anybody feels like updating the documentation here: PRs welcome. ^^

Let's kick off the FCP for stabilizing this:

@rfcbot resolve name

rfcbot commented 22 days ago

bellThis is now entering its final comment period, as per the review above. bell

Contributor

Author

VillSnow commented 21 days ago

I don't know much about doc(alias), but does it show this?

It looks like to make a flow from other names. For example, some languages has reduce, so type 'reduce' at the search box, then hit Iterator::fold.
However, Rust already has binary_search for who want to find any item by value. It seems a bit different usage to suggest similar methods.

rfcbot commented 12 days ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

Member

matklad commented 6 days ago

Closed by #81012

Thanks @VillSnow for bringing this all the way from an idea to the stabilization!

Contributor

Author

VillSnow commented 5 days ago

Thanks for much assistance.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK