CFLib.org – Common Function Library Project

balanceTags(string)

Last updated November 22, 2012

author

Chris Herdt

Version: 1 | Requires: CF6 | Library: StrLib

Description:
Taking an XHTML excerpt based on character length can frequently produce broken tags and broken elements. This function removes trailing broken tags and adds end tags so that the excerpt is balanced.

Return Values:
Returns a string with any unbalanced opening tags re-balanced with matching closing tags

Example:

<cfset string = '<div><p>Here is some<br/><em>well-balanced</em><br /> <strong>XHTML</strong>'>
<cfoutput>#balanceTags(string)#</cfoutput>
<cfoutput>#balanceTags(Left(string,50))#</cfoutput>
<cfoutput>#balanceTags(Left(string,65))#</cfoutput>
<p>Additional text.</p>

Parameters:

Name Description Required
string A string containing a fragment of XHTML Yes

Full UDF Source:

<!---
 Given unbalanced or truncated XHTML, returns balanced XHTML
 v1.0 by Chris Herdt
 
 @param string      A string containing a fragment of XHTML (Required)
 @return Returns a string with any unbalanced opening tags re-balanced with matching closing tags 
 @author Chris Herdt (cherdt@gmail.com) 
 @version 1.0, November 22, 2012 
--->
<cffunction name="balanceTags" access="public" output="false" returntype="string" hint="Takes (possibly truncated) XHTML content and balances the tags">
    <cfargument name="string" type="string" required="true" hint="The HTML string to balance">
    <cfset var returnString = arguments.string>
    <cfset var tagStack = arrayNew(1)>
    <cfset var tag = "">
    <cfset var tagString = "">
    <cfset var word = "">
    <cfset var startPosition = 1>

    <!--- First, find any broken tags (as opposed to broken elements) --->
    <!--- Loop backwards through text one char at a time --->
    <!--- If you find a < before a > then there is a broken tag --->
    <cfset startPosition = len(returnString)>
    <cfloop condition="startPosition GT 0">
        <cfif find("<",returnString,startPosition) GT 0>
            <!--- This represents a broken tag. Truncate the string at this position --->
            <cfset returnString = left(returnString,startPosition-1)>
            <cfbreak>
        <cfelseif find(">",returnString,startPosition) GT 0>
            <!--- Found the end of the tag (properly ended), so we can break the loop --->
            <cfbreak>
        <cfelse>
            <!--- Check the next char to the left --->
            <cfset startPosition = startPosition-1>
        </cfif>
    </cfloop>

    <!--- Reset start position to begin at the first char --->
    <cfset startPosition = 1>

    <!--- Next, find any broken elements (i.e. start tags without end tags --->
    <cfloop condition="startPosition GT 0 AND startPosition LT len(returnString)">
        <cfset startPosition = find("<",returnString,startPosition)>
        <cfset tag = REFind("</?[A-Za-z]+[^>]*>",returnString,startPosition,true)>
        <cfif tag.len[1] GT 0>
            <cfset tagString = mid(returnString,tag.pos[1],tag.len[1])>
            <!--- Check for self-closing tag --->
            <cfif REFind("\s*/\s*>",tagString) GT 0>
                <!--- Do nothing --->
            <cfelse>
                <cfset word = REFind("/?[A-Za-z]+",tagString,1,true)>
                <cfif word.len[1] GT 0>
                    <!--- Is this a start tag or an end tag? --->
                    <cfif NOT find("/",mid(tagString,word.pos[1],word.len[1]))>
                        <!--- Start tag: Push the word (start tag) onto the stack --->
                        <cfset arrayAppend(tagStack,mid(tagString,word.pos[1],word.len[1]))>
                    <cfelse>
                        <!--- End tag: Pop word --->
                        <cfset arrayDeleteAt(tagStack,arrayLen(tagStack))>
                        <!--- TODO:
                        What if the tags are not only unbalanced, but not properly nested?
                        E.g. "<div><p>Here is <strong>some</p><p>text</strong></p></div>"
                        We could try to insert the end tag before the closing of the parent tag.
                        Although then we would have a stray end tag.
                        --->
                    </cfif>
                </cfif>
            </cfif>
            <!--- Ignore everything that was part of the current tag --->
            <cfset startPosition = tag.pos[1]+tag.len[1]>
            <!--- Advance to next tag --->
            <cfset startPosition = find("<",returnString,startPosition)>
        </cfif>
    </cfloop>

    <!--- While the stack is not empty, pop items from the stack --->
    <cfloop condition="arrayLen(tagStack) GT 0">
        <!--- Add the popped word (top of stack) as an end tag --->
        <cfset returnString = returnString & "</" & tagStack[ArrayLen(tagStack)] & ">">
        <cfset arrayDeleteAt(tagStack,arrayLen(tagStack))>
    </cfloop>

    <cfreturn returnString>
</cffunction>

Search CFLib.org


Latest Additions

Raymond Camden added
QueryDeleteRows
November 04, 2017

Leigh added
nullPad
May 11, 2016

Raymond Camden added
stripHTML
May 10, 2016

Kevin Cotton added
date2ExcelDate
May 05, 2016

Raymond Camden added
CapFirst
April 25, 2016

Created by Raymond Camden / Design by Justin Johnson