Finding duplicate rows in a table
Hey all, how many times have you been data profiling and had the question, what data has been duplicated data in a table ?
It's pretty easy to find out quickly using the HAVING clause with the GROUP BY statement.
Below is an example query that you can use to do this, just replace the table and columns to suit your data requirements.
-- This query will return the business key values that are duplicated -- across the Member table in the sample fictitious Insurance database, email me if you require the db
select count(member_biz_key) as BizKeyCount -- Select the column that is to be tested for duplicates ,member_biz_key -- and show the column(s) required to view ,gender
dbo.Member -- Specify your table of interest
member_biz_key -- Ensure you group by the selected columns,
count(member_biz_key) > 1 -- Use the having filter where the count is greater than 1
The below shot is the query result, in all there are 225 duplicated keys out of 10,663 rows, these duplicates are going to mess with your data integrity!