XML validation in Notes

Ian Tree  10 May 2006 14:55:41

Validating XML in the Notes Client


The increased use of XML in all kinds of application contexts dictates that the Notes Developer needs to be able to handle XML in the Notes Client. This article deals with some of the common problems (and their solutions) that can be encountered when trying to validate XML in the Notes Client.

The code presented in this article may be freely used at no charge; use of the code is covered by the terms of the GNU Lesser General Public License.
  • Background - what is validation and why do it?
  • Capturing the Input
  • Formatting the Input
  • Parsing the XML
  • Processing the Error Log
  • Putting It All Together
  • Summing Up

Background - what is validation and why do it?


An XML document can be in one of two validation states: "well-formed" or "valid". All "valid" XML documents are by definition also "well-formed". A "well-formed" XML document is one that conforms to all of the syntactic and structural rules of the XML language specification. A "valid" XML document is one that is "well-formed" and also conforms to all of the lexical and content rules of its Document Type Definition (DTD). All XML parsers, both SAX and DOM, generate error conditions when they are asked to parse an XML document that is not "well-formed", but only "validating" parsers generate error conditions when presented with an XML document that is not "valid", and then only when asked to do so.

In this article we are looking at working with XML in the Notes Client, so we will focus only on the behaviour of the LotusScript XML parsers. The two LotusScript XML parsers are both "validating" parsers, i.e. they can both validate the conformance of an XML document against its DTD. The DOM parser is provided by an instance of the NotesDOMParser class and the SAX parser by an instance of the NotesSAXParser class. The validation behaviour of both parsers is controlled by a property called "InputValidationOption"; the property is an integer and can be set to one of the following three values.

VALIDATE_NEVER (0) - Do NOT perform any validation against the DTD.
VALIDATE_ALWAYS (1) - Always validate the document against the DTD.
VALIDATE_AUTO (2) - This is the default setting and performs validation only when a DTD is explicitly specified in the XML Document.

The DTD specifies the relationships between the different elements in an XML document and the attributes that can be applied to each element, including default values and required items. For a good tutorial on the validation capabilities of XML DOM parsers, visit W3Schools. The DTD can provide an extremely comprehensive specification of the "validation envelope" for an XML document. If a detailed DTD is provided for an XML document, and the document is validated when it is created, stored or captured, then the quantity of validation and defensive code needed in any application component that consumes the XML can be greatly reduced. It is also a good design strategy (the Fail Early strategy) to validate any input as thoroughly as possible at the time it is entered, rather than waiting until it is used (possibly in the background) and rejecting it at that point.
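
The difference between "well-formed" and "valid" can be seen outside Notes with a few lines of Python. This is only a sketch: Python's standard-library parser is non-validating, so it stands in for a parser with validation switched off, and the function name is_well_formed and the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses; the DTD content rules are NOT checked."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A DTD that only allows a <to> element inside <note> (illustrative sample)
DTD = "<!DOCTYPE note [<!ELEMENT note (to)><!ELEMENT to (#PCDATA)>]>"

# Not well-formed: <to> is never closed, so every parser rejects it
broken = DTD + "<note><to>Ann</note>"

# Well-formed but not valid: <from> is not allowed by the DTD,
# which only a validating parser would report
invalid = DTD + "<note><from>Bob</from></note>"

print(is_well_formed(broken))   # False
print(is_well_formed(invalid))  # True
```

The second document is exactly the kind of input that slips through unless validation against the DTD is requested.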

Capturing the Input


A Rich Text field is, of course, the natural input vehicle for an XML Document; the formatting capabilities, especially hanging indents, are a natural requirement for the efficient entry of XML. However, Rich Text and the way that Rich Text Fields interact with the UI present some unique problems in terms of validation. As it is possible to specify a NotesRichTextItem as an input source to a NotesDOMParser, it would at first sight appear to be a trivial exercise to construct a QuerySave event on a form that validated the XML in a Rich Text Field and inhibited the Save if the XML was invalid. The problem is that the content of the Rich Text Field is not transferred to the Rich Text Item in the backend document until the Save processing is done. The simple validation method identified above would therefore perform the not very useful task of validating the XML that was already saved in the document while ignoring the input that was provided in the UI.
The text content of the Rich Text Field that is being edited can be captured by using the NotesUIDocument.FieldGetText() method. Using this method allows us to capture the content as a single String. There is a limitation on the size of the String that is returned by the method but for most applications this is not a real limitation.

Formatting the Input


Having got the XML input in the form of a string, it would be possible to pass it straight to an XML parser for validation. However, if there were any problems with the XML, the parser would treat the input as being on a single line and report the location of the error as line "1", column "x", where x could be a pretty big number. This would not be much use to anyone trying to correct the problem in the nicely formatted Rich Text. It is therefore necessary to do some pre-processing on the XML before submitting it to the parser. People tend to format XML in a conventional manner, and we can use the rules of convention to re-format the XML String into a Stream that is close to the form in which the user entered it. The rules are quite simple.
  • Each start of an element begins on a new line
  • An end element occurs on the same line as the previous start element
  • There can only be one end element on a line

The sample of XML below follows the rules of convention as stated above.

<?xml version="1.0" encoding="UTF-8"?>
<!-- Default Statistics Definitions for the Domino/SIMON set  -->
<!DOCTYPE simonstats SYSTEM "simonstats.dtd">
<simonstats>
   <platform name="Domino">
       <statsgroup name="Core">
           <statistic name="NRPCSessions" nativename="NET.Port.Sessions.Established.Incoming" type="quantity" cumulative="yes"></statistic>
           <statistic name="HTTPSessions" nativename="Http.Accept.ConnectionsAcceted" type="quantity" cumulative="yes"></statistic>
           <statistic name="NRPCTransactions" nativename="Server.Trans.Total" type="quantity" cumulative="yes"></statistic>
           <statistic name="HTTPRequests" nativename="Http.Worker.Total.RequestsProcessed" type="quantity" cumulative="yes"></statistic>
       </statsgroup>
   </platform>
</simonstats>


The following code provides a LotusScript function that implements the re-formatting based on the rules of convention. The function takes the string as a parameter and returns a stream containing the formatted XML.

Function FormatStringAsStream(strXMLIn As String) As NotesStream
   Dim sessCurrent As New NotesSession
   Dim nstrXMLOut As NotesStream
   Dim iIndex As Integer
   Dim iStart As Integer
   Dim bTerminal As Boolean

   On Error Goto FmtError
   Set nstrXMLOut = sessCurrent.CreateStream()
   If Trim(strXMLIn) = "" Then
       Set FormatStringAsStream = Nothing
       Exit Function
   End If
   iStart = 1
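   iIndex = Instr(iStart, strXMLIn, "<")
   While iIndex > 0
       If Len(strXMLIn) - iIndex > 1 Then
           bTerminal = False
           '  Determine if the current element is a terminal (an end element)
           If Mid$(strXMLIn, iIndex + 1, 1) = "/" Then
               bTerminal = True
           End If
           Dim iNewIndex As Integer
           '  Find the ">" that closes the current tag
           iNewIndex = Instr(iIndex, strXMLIn, ">")
           '  Ignore Malformed XML
           iIndex = iNewIndex
           If iIndex > 0 And Not bTerminal Then
               '  Not self-closing: keep a directly following end element on the same line
               If Mid$(strXMLIn, iIndex - 1, 1) <> "/" Then
                   iNewIndex = Instr(iIndex, strXMLIn, "<")
                   If (iNewIndex > 0) Then
                       If Len(strXMLIn) > iNewIndex Then
                           If Mid$(strXMLIn, iNewIndex + 1, 1) = "/" Then
                               iNewIndex = Instr(iNewIndex, strXMLIn, ">")
                               If iNewIndex > 0 Then
                                   iIndex = iNewIndex
                               End If
                           End If
                       End If
                   End If
               End If
           End If
       End If
       '  Write the segment as one line, then search for the next element (assumed behaviour)
       If iIndex > 0 Then
           Call nstrXMLOut.WriteText(Mid$(strXMLIn, iStart, iIndex - iStart + 1), EOL_CRLF)
           iStart = iIndex + 1
           iIndex = Instr(iStart, strXMLIn, "<")
       End If
   Wend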
   Set FormatStringAsStream = nstrXMLOut
NormalExit:
   Exit Function
FmtError:
   Set FormatStringAsStream = Nothing
   Resume NormalExit
End Function
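
The effect of the rules of convention is easy to check outside Notes. The following Python sketch mirrors the logic of FormatStringAsStream but returns a list of lines rather than a NotesStream. The name format_xml_lines is hypothetical, and unlike the LotusScript function it trims the whitespace around each line.

```python
def format_xml_lines(xml_text):
    """Split single-line XML into lines using the rules of convention."""
    lines = []
    start = 0
    index = xml_text.find("<", start)
    while index != -1:
        # An end element ("</name>") always finishes the current line
        terminal = xml_text.startswith("</", index)
        end = xml_text.find(">", index)
        if end == -1:
            break  # malformed: no closing bracket, stop quietly
        if not terminal and xml_text[end - 1] != "/":
            # Start element that is not self-closing: if the next tag is an
            # end element, keep it on the same line as this start element
            nxt = xml_text.find("<", end)
            if nxt != -1 and xml_text.startswith("</", nxt):
                close = xml_text.find(">", nxt)
                if close != -1:
                    end = close
        lines.append(xml_text[start:end + 1].strip())
        start = end + 1
        index = xml_text.find("<", start)
    return lines


sample = '<a><b x="1">t</b><c/></a>'
for line in format_xml_lines(sample):
    print(line)
```

For the sample string this prints four lines: the start of a, the complete b element, the self-closing c element and the end of a. An error reported by the parser then carries a line number that a user can relate back to the Rich Text.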


Parsing the XML


The natural choice of parser to use for validating the XML would be the DOM parser. With the DOM parser there is no need to provide any additional coding, whereas with the SAX parser a minimal subset of the SAX events would need to have event handlers coded. The DOM parser also provides a nice XML log containing all of the errors once it has completed parsing. Unfortunately the natural choice turns out to be the wrong one, for a rather strange reason that has to do with the implementation of parsers in LotusScript. In the Java implementation of parsers in Notes the URI reference for the System ID of the Document Type is interpreted as being a Page in the current database, whereas in the LotusScript implementation it is treated as being a file in a directory relative to the Notes executable directory (what? ed.). When a Document Type specification is added to an XML document, one of the attributes tells XML processors where they can find the Document Type Definition (DTD). In the example below, the Notes/Domino Java implementation of the parsers expects the "simonstats.dtd" SYSTEM attribute to be the name of a page in the current database (what it actually does is change the reference to a fully formed Notes URL of the form notes:///__dbRepId/simonstats.dtd).

       <!DOCTYPE simonstats SYSTEM "simonstats.dtd">


The LotusScript implementation treats the same definition in a different way: it treats the specification as an operating system file name in a directory relative to the current Notes executable directory. Any DXL that is generated uses a Document Type specification of the following form.

       <!DOCTYPE document SYSTEM 'xmlschemas/domino_6_5_5.dtd'>


And indeed, if you look in the Notes executable directory you will find a sub-directory called "xmlschemas" and a file called "domino_6_5_5.dtd". The LotusScript implementation allows you to specify a fully qualified file name but does not support any other forms, such as a Notes URL. This is not ideal; the last thing that we want to do in a Notes application is to have to worry about deploying DTD files to the file system (ahhh now I see, thanks. ed.).
This is where the SAX parser comes to the rescue. It does not handle the Document Type attributes any differently from the DOM parser (hardly surprising since the DOM parser is built on top of the SAX parser), but it does support the SAX_ResolveEntity event, which is specifically supplied to identify, locate and return resources that are needed by the XML processing. The following code shows the implementation of a class that validates an XML stream using the SAX parser. It follows the same rules as the Java implementation for locating the DTD and provides a convenient log, as is done with the DOM parser.

'
'   CLASS:  XMLValidationHelper
'
'   This class contains functions for preparing XML streams for validation
'   Author  Ian Tree - HMNL
'   Version 1.2.1/01
'

Class XMLValidationHelper
       Private strClassVersion As String
       Private strLog As String
       Private iErrorCount As Integer

'
'  Constructor
'

   Sub new()
       strClassVersion = "1.2.1/01"
   End Sub

'
'  ValidateXML
'
'  This method will validate the XML Stream passed using the DTD (if it can be located)
'     it will create a DOMParserLog and throw an error if any errors are detected.
'

   Function ValidateXML(nstrXMLIn As NotesStream)
       Dim sessCurrent As New NotesSession
       Dim nspValidate As NotesSAXParser

       Me.strLog = "<?xml version=""1.0"" encoding=""UTF-8""?><DOMParserLog>"
       iErrorCount = 0
       On Error Goto InternalError
       If nstrXMLIn Is Nothing Then
           Me.iErrorCount = Me.iErrorCount + 1
           Me.strLog = Me.strLog + "<fatalerror>No XML Stream was passed to the Validator.</fatalerror>"
       Else
           If nstrXMLIn.Bytes = 0 Then
               Me.iErrorCount = Me.iErrorCount + 1
               Me.strLog = Me.strLog + "<fatalerror>The XML Stream passed to the Validator was empty.</fatalerror>"
           Else
              '  Construct the SAX Parser to do the validation
               Set nspValidate = sessCurrent.CreateSAXParser(nstrXMLIn)
               '   Set the Parser options
               nspValidate.ExitOnFirstFatalError = True
               nspValidate.InputValidationOption = VALIDATE_ALWAYS
               '  Setup handlers for the events that we want to process
               On Event SAX_Error From nspValidate Call HandleSAX_Error
               On Event SAX_FatalError From nspValidate Call HandleSAX_FatalError
               On Event SAX_Warning From nspValidate Call HandleSAX_Warning
               On Event SAX_ResolveEntity From nspValidate Call HandleSAX_ResolveEntity
               Call nspValidate.Parse()
           End If
       End If
NormalExit:
       Me.strLog = Me.strLog + "</DOMParserLog>"
       On Error Goto 0
       If Me.iErrorCount > 0 Then
           Error 4602         '  Simulate DOM Parser Error
       End If
       Exit Function
InternalError:
       If Err() = 4603 And Me.iErrorCount > 0 Then
           Resume NormalExit
       End If
       Me.iErrorCount = Me.iErrorCount + 1
       Me.strLog = Me.strLog + "<fatalerror line=""0"">LotusScript Error in ValidateXML (" + Str(Err()) + ") " + Error$() + " at " + Str(Erl()) + " .</fatalerror>"
       Resume NormalExit
   End Function

'  Handler routines for SAX events raised by the ValidateXML function

   '  Handle SAX_Error
   Sub HandleSAX_Error(Source As NotesSAXParser, nsxCurrent As NotesSAXException)
       Me.iErrorCount = Me.iErrorCount + 1
       Me.strLog = Me.strLog + "<error line=""" + Str(nsxCurrent.Row) + """>" + nsxCurrent.Message + "</error>"
   End Sub

'  Handle SAX_FatalError
   Sub HandleSAX_FatalError(Source As NotesSAXParser, nsxCurrent As NotesSAXException)
       Me.iErrorCount = Me.iErrorCount + 1
       Me.strLog = Me.strLog + "<fatalerror line=""" + Str(nsxCurrent.Row) + """>" + nsxCurrent.Message + "</fatalerror>"
   End Sub

'  Handle SAX_Warning
   Sub HandleSAX_Warning(Source As NotesSAXParser, nsxCurrent As NotesSAXException)
       Me.iErrorCount = Me.iErrorCount + 1
       Me.strLog = Me.strLog + "<warning line=""" + Str(nsxCurrent.Row) + """>" + nsxCurrent.Message + "</warning>"
   End Sub

'  Handle SAX_ResolveEntity
   Function HandleSAX_ResolveEntity(Source As NotesSAXParser, Byval strPubID As String, Byval strSysID As String) As Variant
       Dim sessCurrent As New NotesSession
       Dim dbCurrent As NotesDatabase
       Dim ncDesign As NotesNoteCollection
       Dim strNoteID As String
       Dim docPage As NotesDocument
       Dim vName As Variant
       Dim rtiContent As NotesRichTextItem
       Dim strEntityValue As String

       '  Try and locate a page in the current database with the same name as the System ID
       If Trim$(strSysID) = "" Then
           '  Let the default SAX Entity resolver try
           HandleSAX_ResolveEntity = 0
           Exit Function
       End If
       Set dbCurrent = sessCurrent.CurrentDatabase
       Set ncDesign = dbCurrent.CreateNoteCollection(False)
       ncDesign.SelectPages = True
       Call ncDesign.BuildCollection()
       If ncDesign.Count = 0 Then
           '  Let the default SAX Entity resolver try
           HandleSAX_ResolveEntity = 0
           Exit Function
       End If
       '  Loop through the pages trying to locate one with the requested name
       strNoteID = ncDesign.GetFirstNoteId
       While strNoteID <> ""
           Set docPage = dbCurrent.GetDocumentByID(strNoteID)
           vName = docPage.GetItemValue("$TITLE")
           If Trim(Lcase(vName(0))) = Trim(Lcase(strSysID)) Then
               strNoteID = ""
           Else
               Set docPage = Nothing
               strNoteID = ncDesign.GetNextNoteId(strNoteID)
           End If
       Wend
       '   If we managed to find the page then use the content as the entity
       If Not docPage Is Nothing Then
           Set rtiContent = docPage.GetFirstItem("$Body")
           If Not rtiContent Is Nothing Then
               strEntityValue = rtiContent.text
               HandleSAX_ResolveEntity = strEntityValue
           Else
               HandleSAX_ResolveEntity = 0
           End If
       Else
           '  Let the default SAX Entity resolver try
           HandleSAX_ResolveEntity = 0
       End If
   End Function

'  Getters:
   Property Get ClassVersion As String
       ClassVersion = strClassVersion
   End Property
   Property Get PLog As String
       PLog = strLog
   End Property

End Class
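
The lookup rule used by HandleSAX_ResolveEntity can be stated very compactly. The Python sketch below is an illustration only: the function resolve_entity and the pages dictionary (a stand-in for the pages in the current database) are hypothetical. It shows the same decisions: match the System ID against the page titles ignoring case and surrounding spaces, return the page content on a match, and otherwise return nothing so that the default resolver can try.

```python
def resolve_entity(system_id, pages):
    """Return the DTD text for system_id, or None to defer to the default resolver.

    pages maps a page title to its text content (hypothetical stand-in for
    the design notes selected by the NotesNoteCollection).
    """
    wanted = system_id.strip().lower()
    if not wanted:
        return None  # no System ID supplied, nothing to look up
    for title, body in pages.items():
        if title.strip().lower() == wanted:
            return body
    return None  # no page with that name


# Illustrative data: one page holding a DTD
pages = {"SimonStats.dtd": "<!ELEMENT simonstats (platform+)>"}
print(resolve_entity("simonstats.dtd", pages))
print(resolve_entity("other.dtd", pages))
```

The first call prints the DTD text and the second prints None, which corresponds to the handler returning 0 in the LotusScript class.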


Processing the Error Log


The following "utility" functions are provided to process the content of a DOM Parser Error Log. The functions provide a "style sheet" implementation that formats the complete Log into a string suitable for display in a Messagebox.

'
'  FormatParserLogAsString
'
'  This method will format the contents of a Parser Log as a string
'

   Function FormatParserLogAsString(strPLog As String) As String
       Dim sessCurrent As New NotesSession
       Dim strResult As String
       Dim nstrXMLLog As NotesStream
       Dim nstrXSLT As NotesStream
       Dim nstrOut As NotesStream
       Dim xsltTX As NotesXSLTransformer

       strResult = "Error converting Parser Log."
       On Error Goto FmtError
       '  Create a Stream and load the Parser Log Content to it
       Set nstrXMLLog = FormatStringAsStream(strPLog)
       If Not nstrXMLLog Is Nothing Then
           nstrXMLLog.Position = 0
           '  Get a stream containing the XSLT for transforming a Parser Log
           Set nstrXSLT = GetParserLogXSLT()
           '  Create the stream to contain the output
           Set nstrOut = sessCurrent.CreateStream()
           '   Create the XSLTransformer and perform the transformation
           Set xsltTX = sessCurrent.CreateXSLTransformer(nstrXMLLog, nstrXSLT, nstrOut)
           Call xsltTX.Process()
           '   Convert the output stream to a string
           If nstrOut.Bytes = 0 Then
               strResult = "Empty conversion for this log."
           Else
               nstrOut.Position = 0
               strResult = ""
               Dim strLine As String
               While Not nstrOut.IsEOS
                   strLine = nstrOut.ReadText(STMREAD_LINE, EOL_ANY)
                   strResult = strResult + Left$(strLine, Len(strLine) - 2) + Chr(10)
               Wend
           End If
       End If
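       '  Only close the streams that were actually created
       If Not nstrXMLLog Is Nothing Then Call nstrXMLLog.Close()
       If Not nstrXSLT Is Nothing Then Call nstrXSLT.Close()
       If Not nstrOut Is Nothing Then Call nstrOut.Close()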
       FormatParserLogAsString = strResult
NormalExit:
       Exit Function
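FmtError:
       If xsltTX Is Nothing Then
           '  The error occurred before the transformer was created
           FormatParserLogAsString = strResult
       Else
           xsltTX.LogComment = "Error: (" + Str(Err()) + ")" + Error$() + " at " + Str(Erl()) + " while converting Parser Log."
           FormatParserLogAsString = xsltTX.Log
       End If
       Resume NormalExit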
   End Function

'  GetParserLogXSLT
'
'  This method will return a stream containing the XSLT for formatting a Parser Log
'

   Function GetParserLogXSLT() As NotesStream
       Dim sessCurrent As New NotesSession
       Dim nstrXSLT As NotesStream
       Set nstrXSLT = sessCurrent.CreateStream()
       Call nstrXSLT.WriteText("<?xml version=""1.0"" encoding=""UTF-8""?>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:transform version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:output method=""text"" />", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:template match=""/"">", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>Error(s) have been detected while validating your XML.&#13;&#10;</xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:for-each select=""DOMParserLog/fatalerror"">", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>Fatal Error: at Line </xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:value-of select=""@line""/>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>: </xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:value-of select=""."" />", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>&#13;&#10;</xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("</xsl:for-each>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:for-each select=""DOMParserLog/error"">", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>Error: at Line </xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:value-of select=""@line""/>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>: </xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:value-of select=""."" />", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>&#13;&#10;</xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("</xsl:for-each>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:for-each select=""DOMParserLog/warning"">", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>Warning: at Line </xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:value-of select=""@line""/>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>: </xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:value-of select=""."" />", EOL_CRLF)
       Call nstrXSLT.WriteText("<xsl:text>&#13;&#10;</xsl:text>", EOL_CRLF)
       Call nstrXSLT.WriteText("</xsl:for-each>", EOL_CRLF)
       Call nstrXSLT.WriteText("</xsl:template>", EOL_CRLF)
       Call nstrXSLT.WriteText("</xsl:transform>", EOL_CRLF)
       nstrXSLT.Position = 0
       Set GetParserLogXSLT = nstrXSLT
   End Function
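
To see what text the style sheet is meant to produce from a Parser Log, the same formatting can be imitated outside Notes without XSLT. The Python sketch below is an approximation for illustration: format_parser_log is a hypothetical name, the sample log is invented, and lines are joined with a plain newline rather than CR/LF.

```python
import xml.etree.ElementTree as ET


def format_parser_log(log_xml):
    """Render a DOMParserLog document as message text, grouped by severity."""
    # Encode first so that a leading XML declaration is always accepted
    root = ET.fromstring(log_xml.encode("utf-8"))
    lines = ["Error(s) have been detected while validating your XML."]
    # Same order as the style sheet: fatal errors, then errors, then warnings
    labels = (("fatalerror", "Fatal Error"), ("error", "Error"), ("warning", "Warning"))
    for tag, label in labels:
        for entry in root.findall(tag):
            line = entry.get("line", "")  # the line attribute may be absent
            lines.append(f"{label}: at Line {line}: {entry.text or ''}")
    return "\n".join(lines)


# Invented sample log in the shape built by XMLValidationHelper
sample_log = (
    '<?xml version="1.0" encoding="UTF-8"?><DOMParserLog>'
    '<error line="12">Unknown attribute</error>'
    '<fatalerror line="3">Expected end of tag</fatalerror>'
    '</DOMParserLog>'
)
print(format_parser_log(sample_log))
```

Note that the entries are grouped by severity rather than listed in the order in which the parser reported them, which is also how the for-each blocks in the style sheet behave.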


Putting It All Together


The following QuerySave event handler from a form shows all of the techniques described above integrated into a complete implementation. In this implementation, the functions described above that relate to the formatting of XML documents are all implemented in a single XMLFormatHelper class.

Sub Querysave(Source As Notesuidocument, Continue As Variant)
   '  QuerySave event - validate the XML for the Statistics Definitions
   Dim sessCurrent As New NotesSession
   Dim nstrXML As NotesStream
   Dim nstrVHOut As NotesStream
   Dim strUIVal As String
   Dim xfhCurrent As XMLFormatHelper
   Dim xvhCurrent As XMLValidationHelper
   On Error Goto ParseError
   '  Capture the UI field text and convert it to a formatted stream
   strUIVal = Source.FieldGetText("StatsDefs")
   Set xfhCurrent = New XMLFormatHelper()
   Set nstrXML = xfhCurrent.FormatStringAsStream(strUIVal)
   If nstrXML Is Nothing Then
       Messagebox "Unable to format the XML."
       Continue = False
       Exit Sub
   End If
   '  Create a XMLValidationHelper and validate the XML
   Set xvhCurrent = New XMLValidationHelper
   nstrXML.Position = 0
   Call xvhCurrent.ValidateXML(nstrXML)
   Continue = True
NormalExit:
       Exit Sub
ParseError:
   If Not xvhCurrent Is Nothing Then
       Messagebox xfhCurrent.FormatParserLogAsString(xvhCurrent.PLog), MB_OK, "XML Parser Error"
   Else
       Messagebox "Unknown error in QuerySave processing."
   End If
   Continue = False
   Resume NormalExit
End Sub


Summing Up


By adding a few well-crafted support routines to your Notes client based applications it is possible to get great additional development value from the use of XML without a lot of engineering cost. Any LotusScript developer could maintain the QuerySave code presented above without needing detailed knowledge of XML and its implementation in LotusScript.
