Saturday, March 22, 2008

Reports - Duplicate Files

A Documentum folder is not like your grandmother's network share. No, no - here you can have duplicate filenames in a single folder. That's sometimes handy, because occasionally you have two pieces of content that you want to store separately under the same name. But if your users are making mistakes and checking in documents as new documents with duplicate names, you're in trouble.

This was the trickiest report to come up with, but the second shortest. That's because you can use just one statement - that's right, a single select statement - to return all the data you need. Operators are standing by... no, that's not right. My blog is not a gimmicky way to make money. It's something I'm writing down in case I ever go crazy and someone needs to fake being me for a while. Here goes:
select i_folder_id, object_name, a_content_type, count(*)
from dm_document
where folder ('/Cabinet', descend)
group by i_folder_id, object_name, a_content_type
having count(*) > 1
Wha...???
Because I'm not a SQL guy (not even close), it took me about 20 hours just to construct this command. But it's genius (if I do say so myself). I'll unpack it a bit:
i_folder_id is the object id of the folder containing the document (there's no telling which folder that actually is without another DQL statement - thanks, DQL), a_content_type is the attribute holding the file format (referred to in my last post), and count(*) counts how many rows end up in each group - which is the part that catches the duplicates.
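Speaking of that extra DQL statement: if you want to turn an i_folder_id into a readable path, dm_folder's r_folder_path attribute has it. Here's a rough sketch of the lookup from PowerShell. Fair warning: Invoke-Dql is a stand-in for whatever wrapper you use to run DQL from your scripts (it's not a built-in cmdlet), and the object id is just a made-up example.

# Resolve an i_folder_id to a folder path.
# Invoke-Dql is a hypothetical wrapper around your DQL connection
# (DFC, dmcl, whatever you have) - NOT a real PowerShell cmdlet.
$folderId = '0b0000018000abcd'    # placeholder object id

$dql = "select r_folder_path from dm_folder " +
       "where r_object_id = '$folderId'"

# r_folder_path is a repeating attribute, so a folder linked into
# more than one place can return more than one path.
Invoke-Dql -Query $dql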

As I understand it, the group by near the end gathers up every row that shares the same folder, filename, and file format into one group, and count(*) tallies how many documents landed in each of those groups. Then, the having bit at the end says, "Just show me the groups with more than one entry" - that is, the documents that are duplicated within their own folders, with the same file type and filename. If you run this statement and return a PowerShell object, you should have no problem creating a report in Excel.
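To make the "PowerShell object to Excel" part concrete, here's a sketch of the whole run. Same caveat as above: Invoke-Dql stands in for your own DQL wrapper, but Export-Csv is the genuine cmdlet, and Excel opens the resulting file directly.

# Run the duplicates query and dump the results to a CSV for Excel.
# Invoke-Dql is the same hypothetical DQL wrapper as above.
$dql = "select i_folder_id, object_name, a_content_type, count(*) " +
       "from dm_document where folder ('/Cabinet', descend) " +
       "group by i_folder_id, object_name, a_content_type " +
       "having count(*) > 1"

Invoke-Dql -Query $dql |
    Export-Csv -Path 'duplicate_files.csv' -NoTypeInformation

The -NoTypeInformation switch just keeps PowerShell from writing a type header as the first line of the file, which would confuse Excel.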
