SQL Server: a clustering issue that annoys me

advertisements

I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:

Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribute their salaries on a monthly basis.

Month   Employee  PercentOfTotal
--------------------------------
1       Alice     25%
1       Barbara   65%
1       Claire    10%

2       Alice     25%
2       Barbara   50%
2       Claire    25%

3       Alice     25%
3       Barbara   65%
3       Claire    10%

As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.

What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.

I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)

ID      Employee  PercentOfTotal
--------------------------------
X       Alice     25%
X       Barbara   65%
X       Claire    10%

Y       Alice     25%
Y       Barbara   50%
Y       Claire    25%

Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.

WITH temp_ids (Month)
AS
(
  SELECT DISTINCT MIN(Month)
    FROM employees_paid
  GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
  FROM employees_paid EMP
         JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal

Thanks y'all! -Ricky

I'm assuming performance won't be great (cause of the subquery)

SELECT * FROM employees_paid where Month not in (
     SELECT
          a.Month
     FROM
          employees_paid a
          INNER JOIN employees_paid b ON
               (a.employee = B.employee AND
               a.PercentOfTotal = b.PercentOfTotal AND
               a.Month > b.Month)
     GROUP BY
          a.Month,
          b.Month
     HAVING
          Count(*) = (SELECT COUNT(*) FROM employees_paid c
               where c.Month = a.Month)
     )

The inner SELECT does a self join to identify matching employee and percentage combinations (except those for the same month). The > in the JOIN ensures that only one set of matches is taken i.e. if a Month1 entry = Month3 entry, we get only the Month3-Month1 entry combination instead of Month1-Month3, Month3-Month1 and Month3-Month3.
We then GROUP by COUNT of matched entries for each month-month combination
Then the HAVING excludes months that don't have as many matches as there are month entries
The outer SELECT gets all entries except the ones returned by the inner query (the ones with full set matches)

SQL Server: a clustering issue that annoys me

SQL Server: a clustering issue that annoys me

Recommend

C ++ Confused about Threads

腾讯云 Serverless 建站方案全新升级！

SAP Next Talent Springs Into the No-Ops Dojo

CockroachDB: The Resilient Geo-Distributed SQL Database

We Will Do Anything to Get You to Work for Us Except Pay You Enough

宝马即将发布“最绿色”新能源车！100%再利用可再生材料制成…

将AR眼镜嵌进眼球！成本高达7个亿，让你闭眼就能看片...

《2021抖音电商商家自播白皮书》重磅发布，“八大能力模型”激发自播经营新增长 - 公关...

Real Time Systems in the Real World with AG Grid on Web Rush #149

数码up女朋友最值得买的新·“驯龙高手”【荣耀Magic3 Pro】

About Joyk