Background
I inherited responsibility for an archaic build and release framework when my predecessor moved on to lead an application development team. It was heavily customized, mostly through PowerShell scripts invoked from command scripts.
The build and release tasks were all scripted in an enormous PowerShell script. My priority was to make this script understandable. It had a large number of functions followed by extensive control logic ending with a large switch based on its task parameter.
A single script was a reasonable solution to the problem of deploying changes to developers and build servers, but maintaining it was a horror. I would much rather use Windows Explorer as an index of the functions and Notepad++ to view them.
I was very new to PowerShell, and identifying the functions and their bodies struck me as beyond my ability. My natural tendency is to write a parser, and my colleague, a regex wizard, would have reached for regular expressions. I was not happy with either approach.
My research led me to Microsoft’s PowerShell parser: System.Management.Automation.Language.Parser. Getting it to parse a PowerShell script is straightforward:
[System.Management.Automation.Language.Token[]]$tokens=$null
[System.Management.Automation.Language.ParseError[]]$errors=$null
[System.Management.Automation.Language.ScriptBlockAst]$script=$null
$script=[System.Management.Automation.Language.Parser]::ParseFile($path, [ref]$tokens, [ref]$errors)
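The parser returns a tree even when the source has syntax errors, reporting the problems through the errors array instead. A minimal sketch of checking that array follows; the sample source and the Get-Broken function are made up for illustration, and ParseInput is used in place of ParseFile only so that the example needs no file on disk.

```powershell
# Hypothetical sample source with a deliberate syntax error (missing closing brace).
$source = @'
function Get-Broken {
    Write-Output 'oops'
'@

$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$tokens, [ref]$errors)

# Each ParseError carries an id, a message and the extent where the problem was found.
foreach ($e in $errors) {
    '{0} (line {1}): {2}' -f $e.ErrorId, $e.Extent.StartLineNumber, $e.Message
}
```

An empty errors array is a reasonable precondition before trusting anything read from the tree.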
I spent some time poking around the syntax tree and discovered that the function definitions are all top-level statements. My first approach was to walk the statement list, and this bit of PowerShell:
$script.EndBlock.Statements | ? { $_.Name -eq 'Out-FormattedXml' } | % { $_.Extent }
yields:
File : C:\Dev\GTMBilling\artifact\Process_Dependencies.ps1
StartScriptPosition : System.Management.Automation.Language.InternalScriptPosition
EndScriptPosition : System.Management.Automation.Language.InternalScriptPosition
StartLineNumber : 7416
StartColumnNumber : 1
EndLineNumber : 7423
EndColumnNumber : 2
Text : Function Out-FormattedXml {
param (
[xml]$Xml,
[string]$FilePath
)
$Xml = $Xml.OuterXML.Replace(' xmlns=""','') # I don't know where this comes from
Format-XMLIndent $Xml -Indent 2 | Out-File $FilePath -Encoding utf8
}
StartOffset : 200999
EndOffset : 201232
So to get the functions I just need the Name property of each function definition and the Text property of its Extent.
I packaged this into the first of two scripts. It extracts the functions from a named script; the second puts them back, using a template of the script stripped of its functions.
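The extraction step might look roughly like the sketch below. It is not the script described above: the sample source, the function names and the output folder are all assumptions, and it simply writes each top-level function definition to a file named after the function.

```powershell
# Hypothetical sample source standing in for the real build script.
$source = @'
function Get-Alpha { 'alpha' }
function Get-Beta { Get-Alpha }
Get-Beta
'@

$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$tokens, [ref]$errors)

# Assumed output location; any writable folder would do.
$outDir = Join-Path ([System.IO.Path]::GetTempPath()) 'extracted-functions'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Keep only the top-level statements that are function definitions.
$functions = @($ast.EndBlock.Statements |
    Where-Object { $_ -is [System.Management.Automation.Language.FunctionDefinitionAst] })

# One file per function, holding the exact source text of its definition.
foreach ($f in $functions) {
    $file = Join-Path $outDir "$($f.Name).ps1"
    Set-Content -Path $file -Value $f.Extent.Text -Encoding UTF8
}
```

Testing the statement type rather than just the Name property skips the trailing Get-Beta call, which is a top-level statement but not a function definition.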
Later
I asked my predecessor to stand in for me when I was on leave, and he asked me to document the new version of the script. Specifically, he asked me for the dependencies between the functions. There are 278 of them, and it was not a job I wanted to do.
The AST of the script was the perfect tool to get this information. What I needed was a hash table of the functions, keyed on their names. Each record had two lists added, References and ReferencedBy, holding links to other records. Links from the script itself were also useful, so I made it a list of Executables instead of Functions.
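One possible shape for such a table is sketched below. The helper New-Executable, the '<script>' key for the script itself and the sample function names are all hypothetical, and the ::new() syntax assumes PowerShell 5 or later.

```powershell
# Hash table of executables keyed on name.
$executables = @{}

# Hypothetical helper: builds one record with two empty link lists.
function New-Executable([string]$Name) {
    [pscustomobject]@{
        Name         = $Name
        References   = [System.Collections.Generic.List[object]]::new()
        ReferencedBy = [System.Collections.Generic.List[object]]::new()
    }
}

# '<script>' stands for the script's own top-level code; the other names are made up.
foreach ($name in '<script>', 'Get-Alpha', 'Get-Beta') {
    $executables[$name] = New-Executable $name
}

# Recording that Get-Beta calls Get-Alpha sets a link in both directions.
$caller = $executables['Get-Beta']
$callee = $executables['Get-Alpha']
$caller.References.Add($callee)
$callee.ReferencedBy.Add($caller)
```

Storing the records themselves in the lists, rather than just names, lets a report follow a chain of calls without going back to the hash table.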
Finding the references meant searching the body of each function for commands that invoke another function. I followed my natural inclination and walked through the syntax tree by hand, but the program quickly became unmanageable.
The developers of the parser had followed the Visitor pattern on the AST. The calls are easy to locate by using FindAll on each statement in the function definition to look for commands and checking whether each one is one of the functions. With the caller and callee in hand, the References and ReferencedBy links are set.
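The search can be sketched as follows. The sample source and function names are made up, and for brevity the sketch calls FindAll once on each function body rather than on each statement; it stops at producing caller and callee pairs.

```powershell
# Hypothetical sample source with a few calls between functions.
$source = @'
function Get-Alpha { 'alpha' }
function Get-Beta { Get-Alpha | Write-Output }
function Get-Gamma { Get-Beta; Get-Alpha }
'@

$tokens = $null
$errors = $null
$ast = [System.Management.Automation.Language.Parser]::ParseInput($source, [ref]$tokens, [ref]$errors)

$functions = @($ast.EndBlock.Statements |
    Where-Object { $_ -is [System.Management.Automation.Language.FunctionDefinitionAst] })
$known = @($functions | ForEach-Object { $_.Name })

$calls = foreach ($f in $functions) {
    # The second argument ($true) makes FindAll descend into nested script blocks too.
    $commands = $f.Body.FindAll(
        { param($node) $node -is [System.Management.Automation.Language.CommandAst] }, $true)
    foreach ($c in $commands) {
        $callee = $c.GetCommandName()   # $null when the command name is not a constant
        if ($callee -and $known -contains $callee) {
            [pscustomobject]@{ Caller = $f.Name; Callee = $callee }
        }
    }
}
$calls
```

Commands such as Write-Output are found as well but dropped, because their names are not among the script's own functions. A call made through a variable or an expression has no constant name and would be missed by this approach.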
What next?
My inexperience with PowerShell made me look for classes. It was an educational experience worth sharing.
I am revisiting this project to also show calls between modules.