Documentum and Powershell

Thursday, September 4, 2014

Updated Script

Apparently, people still care about Powershell, and people still care about Documentum.

Consider this my final post on Powershell and Documentum, as I have moved on in my professional life from both technologies.

I was able to find the final version of the script that I created so many years ago. I removed the particulars that are unique to our organization, and removed comments, but it should still work (assuming EMC hasn't changed the way idql32.exe works).

To use this code, put it in a file called, say, "DctmHelpers.ps1". Then, where you want to use it, put at the top of the script, ". DctmHelpers.ps1" making sure to put the full path if it is different from the current path. Then, call it like this:

$AllTheObjects = g-do "select * from dm_document"

Here is the script, in all its glory:

function Get-DocumentumObject {
function GetContentType ($a_content_type){
switch ($a_content_type)
    {
    "crtext"
      {
      $extension = ".txt"
      $format = "Plain text"
      }
    "excel5book"
      {
      $extension = ".xls"
      $format = "Excel spreadsheet"
      }
    "excel8book"
      {
      $extension = ".xls"
      $format = "Excel spreadsheet"
      }
    "excel2sheet"
      {
      $extension = ".xls"
      $format = "Excel spreadsheet"
      }
    "excel"
      {
      $extension = ".xls"
      $format = "Excel spreadsheet"
      }
    "msw8"
      {
      $extension = ".doc"
      $format = "Word document"
      }
    "msw6"
      {
      $extension = ".doc"
      $format = "Word document"
      }
    "doc"
      {
      $extension = ".doc"
      $format = "Word document"
      }
    "msw"
      {
      $extension = ".doc"
      $format = "Word document"
      }
    "msw8template"
      {
      $format = "Word template"
      $extension = ".dot"
      }
    "pdf"
      {
      $format = "Portable Document Format (PDF)"
      $extension = ".pdf"
      }
    "html"
      {
      $extension = ".htm"
      $format = "HyperText Markup Language (HTML)"
      }
    "jpeg"
      {
      $format = "Joint Photographic Experts Group (JPEG)"
      $extension = ".jpg"
      }
    "tiff"
      {
      $format = "Tagged Image File Format (TIFF)"
      $extension = ".tif"
      }
    "png"
      {
      $extension = ".png"
      $format = "Portable Network Graphics (PNG)"
      }
    "bmp"
      {
      $extension = ".bmp"
      $format = "Bitmap"
      }
    "gif"
      {
      $extension = ".gif"
      $format = "Graphics Interchange Format (GIF)"
      }
    "rtf"
      {
      $extension = ".rtf"
      $format = "Rich Text Format (RTF)"
      }
    "wp6"
      {
      $extension = ".wp6"
      $format = "WordPerfect6"
      }
    "pro"
      {
      $extension = ".pro"
      $format = "WordPerfect 6 (PRO)"
      }
    "ppt8"
      {
      $extension = ".ppt"
      $format = "PowerPoint"
      }
    "msg"
      {
      $extension = ".msg"
      $format = "Outlook message"
      }
    "entourage_msg"
      {
      $extension = ".msg"
      $format = "Entourage message"
      }
    "zip"
      {
      $extension = ".zip"
      $format = "Compressed ZIP file"
      }
    "123w"
      {
      $extension = ".123"
      $format = "Lotus 123"
      }
    default
      {
      $extension = ""
      $format = "Unknown"
      }
    }
write-output $extension
write-output $format
}

    $dqlrun = $args[0]

$ContentBoo = $false
if ($dqlrun.contains("a_content_type"))
    {
    $ContentBoo = $true
    }
$SelectStatement = @()
$SelectStatement += "$dqlrun"
$SelectStatement += "go"
$presult = [path to idql32.exe] [repository name, credentials, etc.] $selectstatement | -U -P -n | where-object {$_ -notlike "*row affected*"} | where-object {$_ -notlike "*rows affected*"}
if ($presult -eq $null)
    {
    write-warning "There was an error - you did not enter a valid DQL statement"
    write-output $null
    break
    }

if ($presult.length -eq 2)
    {
    write-warning "No entries matching that statement"
    write-output $null
    break
    }

$result = @()
$split = @($presult[1].split(" ",[StringSplitOptions]::RemoveEmptyEntries))

$headers = @()
$numbers = @()
$start = 0
for ($i=0; $i -lt $split.length)
    {
    $length = $split[$i].length
    $column = $presult[0].substring($start,$length)
    $column = $column.trimend()
    $headers += $column
    $temp = @($start,$length)
    $numbers += $temp
    $start = $start + $length + 2
    $i++
    }

for ($a=2; $a -lt $presult.length)
    {
    $listitem = new-object psobject
    $f = 0
    for ($i=0; $i -lt $headers.length)
      {
      $value = $presult[$a].substring($numbers[$f],$numbers[$f+1])
      $value = $value.trimend()
      $value = $value.replace("'","''")
      $listitem | add-member -membertype noteproperty -name $headers[$i] -value $value
      $i++
      $f = $f+2
      }
    if ($contentboo)
      {
      $contenttype = $listitem.a_content_type
      $ContentInfo = GetContentType $contenttype
      $listitem | add-member -membertype noteproperty -name "Extension" -value $ContentInfo[0]
      $listitem | add-member -membertype noteproperty -name "FormatInfo" -value $ContentInfo[1]
      }
    $result += $listitem
    $a++
    }

write-output $result
}

new-alias -name g-do -value get-documentumobject

Friday, October 3, 2008

"Include" in Powershell

Powershell isn't like a fully-fledged programming language - it's still designed to be a shell and a little more accessible to the average user. That doesn't mean that you can't do many of the things you are able to do in Java or C#.

A handy trick I found out is including other PowerShell files with functions in them as part of a library. All it takes is a period (.) followed by a space ( ) and the full path to the ps1 file. Any functions you put in there will automatically be included in your new PowerShell script. I wrote a function that takes a String (a SELECT statement) and returns a PSObject representing the statement I just gave it.

This can be handy in Documentum so that you don't need to copy large amounts of text into your scripts: you just need to include the library. I have written them for sending messages, running SQL queries, and updating Documentum.

Recently, I learned how to create a .NET link to Documentum in PowerShell. PowerShell isn't handling the errors very well yet, but I may be able to get it some error trapping that means it doesn't crash when it tries to load a program. That will have to wait for another post, however, as I'm on my way home for the weekend.

Monday, March 24, 2008

The Script

$dqlrun = $args[0]
$SelectStatement = @()
$SelectStatement += "$dqlrun"
$SelectStatement += "go"
$presult = $selectstatement [path to idql32.exe] [repository name, credentials, etc.] where-object {$_ -notlike "*row affected*"} where-object {$_ -notlike "*rows affected*"}
$result = @()
$split = @($presult[1].split(" ",[StringSplitOptions]::RemoveEmptyEntries))
$headers = @()
$numbers = @()
$start = 0
for ($i=0; $i -lt $split.length)
{
$length = $split[$i].length
$column = $presult[0].substring($start,$length)
$column = $column.trimend()
$headers += $column
$temp = @($start,$length)
$numbers += $temp
$start = $start + $length + 2
$i++
}
for ($a=2; $a -lt $presult.length)
{
$listitem = new-object psobject
$f = 0
for ($i=0; $i -lt $headers.length)
{
$value = $presult[$a].substring($numbers[$f],$numbers[$f+1])
$value = $value.trimend()
$listitem add-member -membertype noteproperty -name $headers[$i] -value $value
$i++
$f = $f+2
}
$result += $listitem
$a++
}
write-output $result

Output from this script should be a PowerShell object with the results of a Documentum Select statement that you pass it.

Saturday, March 22, 2008

Reports-Duplicate Files

A document in a Documentum folder is not like your grandmother's network share. No, no, you can have duplicate files in a folder. That's sometimes handy, because sometimes you have two bits of data that you want to store separately. Well, but if your users are making mistakes and checking in documents as new documents with duplicate names, well, you're in trouble.

This was the trickiest report to come up with, but the second shortest. That's because you can use just one statement - that's right, just one select statement - to return all the data you need. Operators are standing by... no, that's not right. My blog is not a gimmicky way to make money. It's something I'm writing down in case I ever go crazy and someone needs to fake being me for a while. Here goes:
select i_folder_id, object_name, a_content_type, count(*) from dm_document where folder ('/Cabinet', descend) group by i_folder_id, object_name, a_content_type having count(*) > 1
Wha...???
Because I'm not a SQL guy (not even close), this took about 20 hours just to construct and complete this command. But it's genius (if I do say so myself). I'll unpack it a bit:
i_folder_id is the object id of the folder containing the document (there's no telling what folder this is without another DQL statement - thanks, DQL), a_content_type is the metadata holding the file type (referred to in my last post), and count(*) - well, I don't really understand what it means. But, it has something to do with duplicates.

As I understand it, count(*) compares all of the items and attributes you select, and sees if any of them are the same. The group by near the end does this, too, somehow. Then, the having bit at the end says, "Just show me the entries that have duplicates within their own folders, and are of the same file type and filename." If you enter this statement and return a PowerShell object, you should have no problem creating a report in Excel.

Reports-Naming Consistency

One of the cool things I am able to do with custom objects is create reports based on, well, whatever my mind can come up with. The powers that be came up with 4 reports that we might want to run on a regular basis: Naming Consistency, Subfolder Creation, Duplicate Files, and Document Volume. The first three are not only metrics we can use to see how well people are doing following instructions (in case you were wondering, they were only doing OK). However, they are also things that users can utilize in "fixing" their problems. The Document Volume report is mostly just for management, to see what kind of volume of documents people are pushing.

The most difficult of these is the Naming Consistency. We have an extensive naming convention document that must be followed if we are to push documents to websites (this is just until workflow is up and running naming our mission-critical documents). I had to learn how to use .NET regular expressions (which are kind of complicated) to compare the names of documents to our standards, then if it failed, see why it failed, and put that on a report in Excel.

Here is an example of a regular expression I used for matching correspondence:
$CorrMatch = [regex] "^[a-z0-9]+\-((20[0-9]{2})(19[0-9]{2,2}))\-[0-1][0-9]\-[0-3][0-9]\-[\s \S]*$"
If you can follow that without having seen regular expressions, you're a better man (or woman) than I. Basically, $CorrMatch is an object of the type, "regex." The caret signifies the beginning of a filename, the dollar sign the end. The brackets enclose sets of characters to match against, the curly braces signify the number of times to repeat a given set. The pipes indicate OR, the backslashes come before a symbol I actually want to search for in the name. The plus sign indicates one or more matches, the asterisk zero or more matches, and the /S and /s are types of characters to search for. Phew.

With this set up, a match for a filename becomes pretty simple:
if ($DctmObject.object_name -match $CorrMatch) {DoStuff}
An example of something that would fit this regular expression is this filename:
230-2008-04-03-Complaint
Documentum doesn't use extensions, as the file format is stored as metadata on the object.

The way I ran this report before was rather brutish. I would go into each of a person's folders, look at each of the folders I wanted to match against, and compare them against the regular expressions I set up. If it failed, I would use more regular expressions to figure out why it failed, and put it into an Excel spreadsheet. One of the problems with this is that people inherit other people's mistakes.

The better way I found for doing this was to load all of the misnamed files into an object, then create a list of unique names of people using Get-Unique, and finally create worksheets for each person and load their misnamed files. This has the advantage of doing all of the Documentum work first, then the matching, and only then invoking Excel to put the data in.

One modification I had to make to get this to work properly was to extend the Folder object into a custom Folder type. We use the Desktop Client (because there was no way we were going to get our employees to adapt to the Webtop Client), so I created a custom type with account and management information on it, then set up a new tab in Documentum to display this new information. This has two primary benefits:
1. No matter where an account folder goes, I can always search them without knowing where they are
2. People in different offices can filter out the buildings that don't pertain to them
A couple other benefits will come in later, when I customize InputAccel to use this new information (it will cut down on the amount of verification that IA needs to use).

I think a faster way to do this would be to save the data as CSV, which PowerShell can do natively. This has the advantage of going around Excel. Controlling Excel with the COM object is slow (but very cool, nonetheless). I will post later on how this works.

I will post a little later on the other three reports. The cool thing about this report was the regex and the custom folder type - the rest was just puzzle work.

The Object

One of the difficult things about object-oriented programming for a non-object-oriented programmer is the whole object thing. What is an object? I knew a little just from having looked at this kind of stuff, but I didn't really know what was going on. This was to be my first foray from procedural programming into OO. I knew if I misstepped, it could be deadly (well, not really, but at least not very fun).

My first trick was to pass object within PowerShell, rather than writing to a text file and reading that text file. Reducing disk I/O's would improve the speed of the script and decrease the number of errors if I were to, say, run four scripts at the same time. This is pretty easily accomplished using an array of strings for DQL commands:

$DQLStatement = @()

$DQLStatement += "select * from dm_user"

$DQLStatement += "go"

$DQLStatement idql32.exe

That doesn't use any disk I/O (except for loading idql32.exe from the disk, which I haven't learned to get around yet). But it doesn't get us any closer to the ideal of using objects from Documentum, which is one of the advantages of Documentum in the first place. So, I wrote what I consider to be one of my crowning achievements at the company: Get-DocumentumObject.ps1. This script returns the result from idql32.exe, looks at the headers and lengths, and returns an object based on what it receives. It is much more elegant than what I was doing before. The first one I wrote returned the data to an XML file, but now it returns it as an object right in PowerShell, using the custom PSObject type. I will post it in a later post.

One of the main advantages of this is it only requires one line of code to return a Documentum object from a select statement:
$DctmObject = "select * from dm_user" y:\scripts\Get-DocumentumObject.ps1
At this point, $DctmObject would be a PowerShell object with all the information from the select statement.

A caveat here about this method: it is not perfect at all. If the return from the select statement is wider than a certain length, for some reason PowerShell wraps the text to the next line, causing the object to break. I may someday find a fix for this, but most of the time I don't need to use 40 columns, just a few.

Saturday, January 19, 2008

Dctm-PoSh Integration - Chapter 1

December 17, 2006 - I get a new job at a company in IT for the first time in my life. My official position? "That Documentum Guy." What is Documentum, you might ask? Only one of the most sophisticated pieces of software out there for document management. It is produced by EMC Corp. My job? Make document management happen at my company. And how much Documentum experience had I had? Well, I'd never heard of Documentum. It was going to be a challenge.

Any sysadmin is going to want to be able to effect change on huge amounts of files, programmatically. And how much programming experience had I had? Very little. Batch file programming was my specialty. I remember when I was about 12 making a batch file menu, that would boot the computer with certain options for certain environments that I needed. So, I started out writing Documentum scripts in batch files. Not very scalable, and certainly not very easy.

That's when I discovered PowerShell (PoSH), Microsoft's new command line environment built on the .NET framework. The main difference between PoSH and batch files is that PoSH is object-oriented, not procedural. The main advantage when dealing with scripting Documentum, however, was the ability to write much more complicated logic, such as FOR statements, FOREACH statements, and embedding such statements within each other. I called the project of getting PowerShell to talk to Documentum Big Thunder Mountain, after one of my favourite rides at Disneyland. I wrote the first PoSH script in July of 2007. It wasn't very advanced - all it did was take a look at people's default folder, see if there was a folder in there marked Private, and create one if it didn't exist. I had to teach myself some stuff about parsing text files, and stage one was complete. I was now able to write rudimentary files for connecting to a docbase.

Details of the First Script

I realized early on that I would be writing more than one Big Thunder Mountain script, so I developed three lines of code that would run SELECT statements in Documentum:
echo "select $sourcecolumn from ""dm_folder"" where folder ('/Cabinet/Home Folders')" set-content -path c:\test\run.txt
echo "go" add-content -path c:\test\run.txt
cmd /c c:\test\1.bat c:\test\folders.txt
[1.bat was a batch file I wrote for running idql32.exe, as I couldn't figure out how to pass the correct arguments from PowerShell to the program. More on that later]

PowerShell requires that all variables begin with a dollar sign, so $sourcecolumn was the name of the column I was going to select from the database. In this case, I was looking for object_name.

The next few lines parsed the text file output, and put them into a text file, then loaded them into an object:
findstr /v "$sourcecolumn ---------------" c:\test\folders.txt > c:\test\temp.txt
findstr /v "affected" c:\test\temp.txt > c:\test\folders.txt
remove-item c:\test\temp.txt
$folders = get-content c:\test\folders.txt

At this point I was still using findstr.exe from the old Command Prompt days. Those days, however, were soon to end.

The next few lines included a couple FOREACH statements, designed to go through each folder and see if there was a folder called Private. If there wasn't, it wrote the name of the person to a text file (one text file per person - a horrible way to do administration). If there was, it did not include the person.
foreach ($folder in $folders){$folder = $folder.TrimEnd()
echo "select $sourcecolumn from ""dm_folder"" where folder ('/Cabinet/Home Folders/$folder')" set-content -path c:\test\run.txt
echo "go" add-content -path
c:\test\run.txtcmd /c c:\test\1.bat "c:\test\$folder.usr"
$userfolder = get-content "c:\test\$folder.usr"
foreach ($item in $userfolder){$item = $item.TrimEnd()if ($item -eq "Private")
{remove-item "c:\test\$folder.usr"}}}
get-childitem c:\test\*.usr set-content -path c:\test\noprivate.txt
remove-item c:\test\*.usr

Wow, I am so embarrassed by this script now. I realize now how little I know both about DQL statements and PoSH. As we'll see in the next installment, I learned some tricks.