1

SQL Server: a clustering issue that annoys me

 3 years ago
source link: https://www.codesd.com/item/sql-server-a-clustering-issue-that-annoys-me.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

SQL Server: a clustering issue that annoys me

advertisements

I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:

Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribute their salaries on a monthly basis.

Month   Employee  PercentOfTotal
--------------------------------
1       Alice     25%
1       Barbara   65%
1       Claire    10%

2       Alice     25%
2       Barbara   50%
2       Claire    25%

3       Alice     25%
3       Barbara   65%
3       Claire    10%

As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.

What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.

I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)

ID      Employee  PercentOfTotal
--------------------------------
X       Alice     25%
X       Barbara   65%
X       Claire    10%

Y       Alice     25%
Y       Barbara   50%
Y       Claire    25%

Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.

WITH temp_ids (Month)
AS
(
  SELECT DISTINCT MIN(Month)
    FROM employees_paid
  GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid EMP
         JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal

Thanks y'all! -Ricky


I'm assuming performance won't be great (cause of the subquery)

SELECT * FROM employees_paid where Month not in (
     SELECT
          a.Month
     FROM
          employees_paid a
          INNER JOIN employees_paid b ON
               (a.employee = B.employee AND
               a.PercentOfTotal = b.PercentOfTotal AND
               a.Month > b.Month)
     GROUP BY
          a.Month,
          b.Month
     HAVING
          Count(*) = (SELECT COUNT(*) FROM employees_paid c
               where c.Month = a.Month)
     )

  1. The inner SELECT does a self join to identify matching employee and percentage combinations (except those for the same month). The > in the JOIN ensures that only one set of matches is taken i.e. if a Month1 entry = Month3 entry, we get only the Month3-Month1 entry combination instead of Month1-Month3, Month3-Month1 and Month3-Month3.
  2. We then GROUP by COUNT of matched entries for each month-month combination
  3. Then the HAVING excludes months that don't have as many matches as there are month entries
  4. The outer SELECT gets all entries except the ones returned by the inner query (the ones with full set matches)

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK