CFLib.org – Common Function Library Project

solrClean(input)

Last updated October 2, 2012

Author: Sami Hoda

Version: 2 | Requires: CF9 | Library: UtilityLib

Description:
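Like VerityClean, massages text input to make it Solr compatible: it replaces commas with " or ", replaces Lucene query-syntax special characters with spaces, and collapses runs of whitespace. NOTE: requires the uCaseWordsForSolr UDF.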
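Because solrClean() depends on a second UDF, it may help to see roughly what that helper is expected to do. The sketch below is a hypothetical stand-in, not the CFLib uCaseWordsForSolr source: the function name uCaseWordsForSolrSketch and the keyword list (AND, OR, NOT) are assumptions made for illustration, based only on the "and=AND, etc - lcase rest" comment in the source further down.

```cfml
<cfscript>
// Hypothetical stand-in, NOT the real CFLib uCaseWordsForSolr UDF.
// Assumption: the keywords of interest are AND, OR and NOT.
string function uCaseWordsForSolrSketch(required string input){
	var keywords = "AND,OR,NOT";
	// Lower-case everything first, then split on spaces
	var words = listToArray(lCase(arguments.input), " ");
	var i = 0;

	for (i = 1; i <= arrayLen(words); i++){
		// Upper-case only the words that appear in the keyword list
		if (listFindNoCase(keywords, words[i])){
			words[i] = uCase(words[i]);
		}
	}
	return arrayToList(words, " ");
}

writeOutput(uCaseWordsForSolrSketch("Cats and Dogs")); // prints: cats AND dogs
</cfscript>
```

For real use, install the actual uCaseWordsForSolr UDF from the library rather than this sketch.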

Return Values:
Returns a string.

Example:

<cfset cleanSolrSearchText = solrClean(userSearchText) />
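A slightly fuller usage sketch, assuming the cleaned text is passed to cfsearch. The collection name "articles" and the form field userSearchText are hypothetical and only there to make the example complete.

```cfml
<!--- "articles" is a hypothetical collection name; the form field name is an assumption --->
<cfparam name="form.userSearchText" default="">
<cfset cleanSolrSearchText = solrClean(form.userSearchText) />

<!--- Skip the search when nothing is left after cleaning --->
<cfif len(cleanSolrSearchText)>
	<cfsearch collection="articles" criteria="#cleanSolrSearchText#" name="searchResults" maxrows="25">
	<cfoutput query="searchResults">
		#htmlEditFormat(searchResults.title)# (#searchResults.score#)<br>
	</cfoutput>
</cfif>
```

Checking the length of the cleaned string first avoids running a search with empty criteria when the input consisted only of special characters.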

Parameters:

Name    Description             Required
input   String to run against   Yes

Full UDF Source:

/**
 * Like VerityClean, massages text input to make it Solr compatible.
 * v1.0 by Sami Hoda
 * v2.0 by Daria Norris to deal with wildcard characters used as the first letter of the search
 * v2.1 by Paul Alkema - updated list of characters to escape
 * v2.2 by Adam Cameron - Merge Paul's & Daria's versions of the function, improve some regexes, fix logic error with input argument (was both required and had a default), converted wholly to script
 * 
 * @param input 	 String to run against (Required)
 * @return Returns a string. 
 * @author Sami Hoda (sami@bytestopshere.com) 
 * @version 2.2, October 2, 2012 
 */
string function solrClean(required string input){
	var cleanText = trim(arguments.input);
	// List of bad characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
	// http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Escaping Special Characters
	var reBadChars = "\+|-|&&|\|\||!|\(|\)|{|}|\[|\]|\^|\""|\~|\*|\?|\:|\\";
	
	// Replace comma with OR
	cleanText = replace(cleanText, "," , " or " , "all");

	// Strip bad characters
	cleanText = reReplace(cleanText, reBadChars, " ", "all");

	// Clean up sequences of space characters
	cleanText = reReplace(cleanText, "\s+", " ", "all");

	// Clean up wildcard characters as first characters
	cleanText = reReplace(cleanText, "(^[\*\?]{1,})", "");

	// Upper-case Solr keywords (and => AND, etc.) and lower-case the rest; Solr treats mixed-case keywords as case-sensitive
	cleanText = uCaseWordsForSolr(cleanText);
	return trim(cleanText);
}
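
To make the order of operations concrete, here is a small trace of the first three steps on a sample string. It is only an illustration: the variable names are made up, the regex is a subset of reBadChars that is sufficient for this sample, and uCaseWordsForSolr is not called.

```cfml
<cfscript>
// Illustrative trace only; sample and step1..step3 are hypothetical names.
sample = " +coldfusion, (solr) ";

// Step 1: trim, then turn commas into " or "
step1 = replace(trim(sample), ",", " or ", "all");
// step1 is "+coldfusion or  (solr)" (two spaces after "or")

// Step 2: replace special characters with spaces (subset of reBadChars)
step2 = reReplace(step1, "\+|\(|\)", " ", "all");
// step2 is " coldfusion or   solr "

// Step 3: collapse runs of whitespace into a single space
step3 = reReplace(step2, "\s+", " ", "all");
// step3 is " coldfusion or solr "

writeOutput("[" & trim(step3) & "]"); // prints [coldfusion or solr]
</cfscript>
```

In the full function, uCaseWordsForSolr then runs on the collapsed string before the final trim, which is where "or" would be expected to become "OR".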