Saturday, March 22, 2008

Reports-Naming Consistency

One of the cool things I can do with custom objects is create reports based on, well, whatever my mind can come up with. The powers that be came up with four reports that we might want to run on a regular basis: Naming Consistency, Subfolder Creation, Duplicate Files, and Document Volume. The first three are not only metrics we can use to see how well people are following instructions (in case you were wondering, they were only doing OK), but also tools users can use to "fix" their own problems. The Document Volume report is mostly for management, to see what volume of documents people are pushing.

The most difficult of these is Naming Consistency. We have an extensive naming convention document that must be followed if we are to push documents to websites (this is only until workflow is up and running and naming our mission-critical documents for us). I had to learn .NET regular expressions (which are kind of complicated) to compare document names against our standards, figure out why a name failed when it did, and put that on a report in Excel.

Here is an example of a regular expression I used for matching correspondence:
$CorrMatch = [regex] "^[a-z0-9]+\-((20[0-9]{2})|(19[0-9]{2,2}))\-[0-1][0-9]\-[0-3][0-9]\-[\s \S]*$"
If you can follow that without having seen regular expressions before, you're a better man (or woman) than I. Basically, $CorrMatch is an object of type [regex] (System.Text.RegularExpressions.Regex). The caret anchors the match to the beginning of the filename, and the dollar sign to the end. Brackets enclose sets of characters to match against, and curly braces give the number of times to repeat the preceding set. Parentheses group parts of the pattern, and the pipe between the two year groups means OR, so either a 20xx or a 19xx year is accepted. A backslash comes before a symbol I actually want to search for in the name (here, the hyphens). The plus sign means one or more matches, the asterisk zero or more, and \s and \S match whitespace and non-whitespace characters, so [\s \S]* matches anything at all. Phew.
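For anyone who wants to poke at the pattern outside PowerShell, here is a minimal Python sketch of the same check. It assumes Python's re module treats these particular constructs (anchors, character classes, quantifiers, groups, alternation) the same way .NET does, which holds for this pattern; the function name is_valid_correspondence is made up for illustration.

```python
import re

# Same correspondence pattern as $CorrMatch, compiled with Python's re module.
# The alternation (20xx|19xx) is what lets either century through.
CORR_MATCH = re.compile(
    r"^[a-z0-9]+\-((20[0-9]{2})|(19[0-9]{2,2}))\-[0-1][0-9]\-[0-3][0-9]\-[\s \S]*$"
)


def is_valid_correspondence(name: str) -> bool:
    """Return True if the whole object name fits the correspondence convention."""
    return CORR_MATCH.match(name) is not None


for name in ["230-2008-04-03-Complaint", "230-08-04-03-Complaint", "complaint-2008"]:
    print(name, is_valid_correspondence(name))
# 230-2008-04-03-Complaint True
# 230-08-04-03-Complaint False
# complaint-2008 False
```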

With this set up, a match for a filename becomes pretty simple:
if ($DctmObject.object_name -match $CorrMatch) {DoStuff}
Here is an example of a filename that fits this regular expression:
230-2008-04-03-Complaint
Documentum doesn't use extensions, as the file format is stored as metadata on the object.

The way I ran this report before was rather brutish. I would go into each of a person's folders, look at the documents in each of the folders I wanted to match against, and compare their names against the regular expressions I had set up. If a name failed, I would use more regular expressions to figure out why, and put the result into an Excel spreadsheet. One of the problems with this is that people inherit other people's mistakes.
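The "figure out why it failed" step can be sketched by growing the pattern one piece at a time and reporting the first piece that stops matching. This is a hypothetical approach in Python (STEPS, why_it_failed, and the reason strings are invented for illustration), not necessarily how the PowerShell script splits the checks up.

```python
import re
from typing import Optional

# Hypothetical pieces of the correspondence convention, in order.
# Each piece is appended to the previous ones, so the first cumulative
# pattern that fails to match points at the part of the name that is wrong.
STEPS = [
    (r"^[a-z0-9]+\-", "account prefix is missing or not lowercase letters/digits"),
    (r"(20[0-9]{2}|19[0-9]{2})\-", "year is not 19xx or 20xx followed by '-'"),
    (r"[0-1][0-9]\-", "month is not two digits (first digit 0 or 1)"),
    (r"[0-3][0-9]", "day is not two digits (first digit 0-3)"),
    (r"\-", "missing '-' between the date and the description"),
]


def why_it_failed(name: str) -> Optional[str]:
    """Return the reason for the first failing piece, or None if the name is valid."""
    pattern = ""
    for piece, reason in STEPS:
        pattern += piece
        if not re.match(pattern, name):
            return reason
    # Everything after the final '-' is free text ([\s \S]*), so the name passes.
    return None


for name in ["230-2008-04-03-Complaint", "230-2008-4-03-Complaint", "Complaint"]:
    print(name, "->", why_it_failed(name))
```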

The better way I found was to load all of the misnamed files into a single collection, build a list of the unique names of the people who own them (sorting first, since Get-Unique only removes adjacent duplicates), and finally create a worksheet for each person and load their misnamed files into it. This has the advantage of doing all of the Documentum work first, then the matching, and only then invoking Excel to put the data in.
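Here is a rough Python sketch of the collect-then-group idea, assuming each misnamed object has already been reduced to a hypothetical (owner, object_name) pair. The names misnamed and group_by_owner are illustrative only; each key of the result would become one worksheet.

```python
# Hypothetical input: (owner, object_name) pairs for names that already
# failed the convention. In practice these would come out of Documentum first.
misnamed = [
    ("smith", "230-08-04-03-Complaint"),
    ("jones", "Letter to client"),
    ("smith", "231-2008-4-01-Memo"),
]


def group_by_owner(records):
    """Return a dict mapping each owner to their misnamed files, owners sorted."""
    # A set removes duplicates regardless of order; sorting gives a stable sheet order.
    owners = sorted({owner for owner, _ in records})
    grouped = {owner: [] for owner in owners}
    for owner, name in records:
        grouped[owner].append(name)
    return grouped


# Each owner would get their own worksheet.
for owner, names in group_by_owner(misnamed).items():
    print(owner, names)
```

Doing the grouping in memory first keeps the slow part (talking to Excel) to one pass per person.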

One modification I had to make to get this to work properly was to extend the Folder object into a custom Folder type. We use the Desktop Client (there was no way we were going to get our employees to adapt to the Webtop Client), so I created a custom type with account and management information on it, then set up a new tab in Documentum to display that information. This has two primary benefits:
1. No matter where an account folder goes, I can always search for it without knowing where it is
2. People in different offices can filter out the buildings that don't pertain to them
A couple of other benefits will come later, when I customize InputAccel to use this new information (it will cut down on the amount of verification IA needs to do).

I think a faster way to do this would be to save the data as CSV, which PowerShell can do natively. This has the advantage of bypassing Excel: controlling Excel through its COM object is slow (but very cool, nonetheless). I will post later on how this works.
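On the PowerShell side, Export-Csv is the cmdlet for this. As a Python sketch of what the output would look like, assuming hypothetical Owner, Name, and Reason columns (the rows and the to_csv helper are invented for illustration):

```python
import csv
import io

# Hypothetical report rows: one per misnamed object.
rows = [
    {"Owner": "smith", "Name": "230-08-04-03-Complaint", "Reason": "year is not 19xx or 20xx"},
    {"Owner": "jones", "Name": "Letter to client", "Reason": "account prefix is missing"},
]


def to_csv(records, fieldnames=("Owner", "Name", "Reason")) -> str:
    """Serialize report rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    # DictWriter quotes any field containing a comma, so names stay intact.
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv(rows))
```

To write a real file instead of a string, open it with `newline=""` (as the csv module documentation recommends) and pass that file to `csv.DictWriter`; Excel can still open the result if someone wants it there.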

I will post a little later on the other three reports. The cool parts of this report were the regex and the custom folder type; the rest was just puzzle work.
