Remove or Clean High / Extended ASCII Characters in ColdFusion for XML Safeness


The other week I was reading Ben’s Kinky Solution for removing high characters in ColdFusion strings. Unfortunately, ColdFusion’s own tags do not handle the conversion of high ASCII characters very well. With this in mind, and inspired by the conversation in Ben’s post, I decided to create a simple solution that does not depend on external Java objects.

This function scans through your string, finds any characters with a numeric value higher than 127 and converts each one to a hexadecimal numeric character reference, which is then usable in any XML or XHTML document.
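As a rough illustration of that conversion for a single character, here is a small sketch. It is not part of the component itself, and the variable names are made up for the example.

```cfml
<cfscript>
    // Chr(8217) is the right single quotation mark (U+2019), a character that
    // Word often substitutes for a plain apostrophe.
    highChar = Chr(8217);
    code = Asc(highChar);            // 8217, which is above 127
    hexValue = FormatBaseN(code, 16); // 8217 in base 16 is 2019
    // The doubled pound sign is how CFML writes a literal pound sign in a string.
    ref = "&##x" & hexValue & ";";
</cfscript>
```

After this runs, ref holds an ampersand, a pound sign, an x, the digits 2019 and a semicolon, which is the numeric character reference an XML parser reads back as the original quotation mark.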

The example code can be found below or downloaded from GitHub: https://github.com/bengarrett/devtidbits/tree/master/post_42

<!----- xml-high-safe.cfc 1.0 by Ben Garrett : 11 March 2008 :
Released under the Creative Commons BSD License (http://creativecommons.org/licenses/BSD/) ----->
<cfcomponent output="no">
    <cffunction name="XMLHighSafe" access="public" returntype="string" output="no" hint="Scans through a string, finds any characters that have an ASCII numeric value greater than 127 and converts each one to a hexadecimal numeric character reference">
        <cfargument name="text" type="string" required="yes">
        <cfscript>
            var i = 0;
            var tmp = '';
            while(ReFind('[^\x00-\x7F]',text,i,false))
            {
                i = ReFind('[^\x00-\x7F]',text,i,false); // discover high chr and save its numeric string position.
                tmp = '&##x#FormatBaseN(Asc(Mid(text,i,1)),16)#;'; // obtain the high chr and convert it to a hex numeric chr.
                text = Insert(tmp,text,i); // insert the new hex numeric chr into the string.
                text = RemoveChars(text,i,1); // delete the redundant high chr from string.
                i = i+Len(tmp); // adjust the loop scan for the new chr placement, then continue the loop.
            }
            return text;
        </cfscript>
    </cffunction>
</cfcomponent>
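
Calling the function from a template might look like the sketch below. It assumes the component has been saved as XMLHighSafe.cfc in the same directory as the calling template. That filename is a hypothetical rename used here only to keep the component path simple; the original file is xml-high-safe.cfc.

```cfml
<cfscript>
    // Assumption: the component above is saved as XMLHighSafe.cfc next to this template.
    cleaner = CreateObject("component", "XMLHighSafe");

    // Build a test string with two Word-style characters:
    // Chr(8217) is a curly apostrophe (U+2019), Chr(8211) is an en dash (U+2013).
    raw = "It" & Chr(8217) & "s 9" & Chr(8211) & "5";

    // Each high character is replaced by its hexadecimal numeric reference;
    // the plain ASCII characters are left untouched.
    safe = cleaner.XMLHighSafe(raw);
</cfscript>
```

Here the curly apostrophe should come back as the reference for hex 2019 and the en dash as the reference for hex 2013, with the rest of the string unchanged.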

6 thoughts on “Remove or Clean High / Extended ASCII Characters in ColdFusion for XML Safeness”

  1. This is a useful piece of code, but it is inefficient. You’re performing a ReFind() in the while() statement and then immediately performing it again. That’s wasteful. I tested this against a large string (3+ megabytes, JSON) and it took 40 seconds. Revising it to the code listed below brings it down to 26 seconds. Not great, but better. I’m still working on something faster. ;)


    var i = 0;
    var tmp = '';
    i = ReFind('[^\x00-\x7F]',text,i,false);
    while(i) {
        tmp = '&##x#FormatBaseN(Asc(Mid(text,i,1)),16)#;';
        text = Insert(tmp,text,i);
        text = RemoveChars(text,i,1);
        i = i+Len(tmp);
        i = ReFind('[^\x00-\x7F]',text,i,false);
    }
    return text;

    • Thanks for that, Joel. I guess when I was working towards a solution I wasn’t really too concerned about optimisation, but now that you bring it up, your revised code certainly looks preferable.
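
One further direction, offered only as an untested sketch: both versions above rebuild the whole string with Insert() and RemoveChars() for every high character found. An alternative is to collect the pieces in an array and join them once at the end. The function name XMLHighSafeSinglePass is hypothetical, and this has not been timed against the 3 MB case, so whether it is actually faster would need measuring.

```cfml
<cfscript>
// Hypothetical helper name; not part of the original component.
function XMLHighSafeSinglePass(text) {
    var parts = ArrayNew(1); // collected output fragments
    var start = 1;           // first character not yet copied into parts
    var total = Len(text);
    var pos = 0;
    if (total) pos = ReFind('[^\x00-\x7F]', text, 1, false);
    while (pos) {
        // copy the run of plain ASCII characters that precedes the high chr
        if (pos GT start) ArrayAppend(parts, Mid(text, start, pos - start));
        // append the hex numeric reference for the high chr
        ArrayAppend(parts, '&##x' & FormatBaseN(Asc(Mid(text, pos, 1)), 16) & ';');
        start = pos + 1;
        // only search again if there is text left to scan
        if (start LTE total) pos = ReFind('[^\x00-\x7F]', text, start, false);
        else pos = 0;
    }
    // copy whatever plain text follows the last high chr
    if (start LTE total) ArrayAppend(parts, Mid(text, start, total - start + 1));
    // join the fragments with an empty delimiter
    return ArrayToList(parts, '');
}
</cfscript>
```

The regular expression and the reference format are the same as in the original function; only the way the output string is assembled differs.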

  2. Very useful code – thank you. I have a number of contributors who like to format their text in Microsoft Word, then paste it into web pages and blogs. This cleaned it right up.
