SQL Server: a clustering issue that annoys me
source link: https://www.codesd.com/item/sql-server-a-clustering-issue-that-annoys-me.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
SQL Server: a clustering issue that annoys me
I've been working with SQL Server for the better part of a decade, and this grouping (or partitioning, or ranking...I'm not sure what the answer is!) one has me stumped. Feels like it should be an easy one, too. I'll generalize my problem:
Let's say I have 3 employees (don't worry about them quitting or anything...there's always 3), and I keep up with how I distribute their salaries on a monthly basis.
Month Employee PercentOfTotal
--------------------------------
1 Alice 25%
1 Barbara 65%
1 Claire 10%
2 Alice 25%
2 Barbara 50%
2 Claire 25%
3 Alice 25%
3 Barbara 65%
3 Claire 10%
As you can see, I've paid them the same percent in Months 1 and 3, but in Month 2, I've given Alice the same 25%, but Barbara got 50% and Claire got 25%.
What I want to know is all the distinct distributions I've ever given. In this case there would be two -- one for months 1 and 3, and one for month 2.
I'd expect the results to look something like this (NOTE: the ID, or sequencer, or whatever, doesn't matter)
ID Employee PercentOfTotal
--------------------------------
X Alice 25%
X Barbara 65%
X Claire 10%
Y Alice 25%
Y Barbara 50%
Y Claire 25%
Seems easy, right? I'm stumped! Anyone have an elegant solution? I just put together this solution while writing this question, which seems to work, but I'm wondering if there's a better way. Or maybe a different way from which I'll learn something.
WITH temp_ids (Month)
AS
(
SELECT DISTINCT MIN(Month)
FROM employees_paid
GROUP BY PercentOfTotal
)
SELECT EMP.Month, EMP.Employee, EMP.PercentOfTotal
FROM employees_paid EMP
JOIN temp_ids IDS ON EMP.Month = IDS.Month
GROUP BY EMP.Month, EMP.Employee, EMP.PercentOfTotal
Thanks y'all! -Ricky
I'm assuming performance won't be great (cause of the subquery)
SELECT * FROM employees_paid where Month not in (
SELECT
a.Month
FROM
employees_paid a
INNER JOIN employees_paid b ON
(a.employee = B.employee AND
a.PercentOfTotal = b.PercentOfTotal AND
a.Month > b.Month)
GROUP BY
a.Month,
b.Month
HAVING
Count(*) = (SELECT COUNT(*) FROM employees_paid c
where c.Month = a.Month)
)
- The inner SELECT does a self join to identify matching employee and percentage combinations (except those for the same month). The > in the JOIN ensures that only one set of matches is taken i.e. if a Month1 entry = Month3 entry, we get only the Month3-Month1 entry combination instead of Month1-Month3, Month3-Month1 and Month3-Month3.
- We then GROUP by COUNT of matched entries for each month-month combination
- Then the HAVING excludes months that don't have as many matches as there are month entries
- The outer SELECT gets all entries except the ones returned by the inner query (the ones with full set matches)
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK