Translating HTML characters

When writing CGI scripts for web pages, the form data presented to your script arrives URL-encoded. The browser translates special characters so they can be included in the URL, and the web server (such as Apache) passes the result to your script in the QUERY_STRING variable. Spaces are converted to “+”, variable assignments are separated with “&”, and most characters other than letters and digits are converted to a two-digit hex code, using % as a flag. For example, + is %2B, & is %26, and J would be %4A. Translating these back to their original strings can be a challenge, so here is a function that takes any string (typically $QUERY_STRING from a web page form) and translates it back to the original characters. You would call it with some inline code to extract each variable:

function ConvertHTML {
  # Provide an HTML query string with %hex and +
  # characters and echo back the original characters

  # translate + to space
  OLD="$(printf '%s\n' "$*" | tr '+' ' ')"
  LEN=${#OLD}
  NEW=""

  # translate %## to a single character
  # Perl handles the hex to ASCII conversion
  CNT=1
  while [ $CNT -le $LEN ]
  do
    CHR="$(printf '%s\n' "$OLD" | cut -c $CNT)"
    if [ "$CHR" = "%" ]
    then
      # the next two characters are the hex code
      CNT=$(( CNT+1 ))
      HEX="$(printf '%s\n' "$OLD" | cut -c $CNT-$((CNT+1)))"
      CHRHEX="$(echo $HEX |
        perl -nle 'print join "", map { chr hex $_ } split " ";')"
      NEW="$NEW$CHRHEX"
      CNT=$(( CNT+1 ))
    else
      NEW="$NEW$CHR"
    fi
    CNT=$(( CNT+1 ))
  done

  echo "$NEW"
  return
}

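## Process $QUERY_STRING (from a web form referral) into
## variable assignments

set -f   # keep the unquoted $(...) below from expanding * or ? as file names

for PARM in $(echo "$QUERY_STRING" | tr "&" "\n" | tr ";" "\n")
do
  VAR="$(ConvertHTML "$PARM")"
  VARNAME="$(echo "$VAR" | cut -f 1 -d =)"
  VALUE="$(echo "$VAR" | cut -f 2- -d =)"

  # skip anything that is not a valid shell variable name
  case "$VARNAME" in
    ""|[0-9]*|*[!A-Za-z0-9_]*) continue ;;
  esac

  # \$VALUE is expanded by eval itself, so the form data is assigned
  # as plain text and never executed as shell code
  eval "$VARNAME=\$VALUE"

  HTMLSTRING="$(echo "$PARM" | cut -f 2- -d =)"
  eval "${VARNAME}_HTML=\$HTMLSTRING"
done

set +f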

The above loop assigns two variables for each name in the QUERY_STRING: the name itself, holding the decoded value, and the same name with “_HTML” appended, holding the original encoded string. That way, both the translated value and the raw string from the URL are available. For example, suppose an HTML form is submitted to a CGI script (using the GET method), and the QUERY_STRING looks like:

USERNAME=abc&COMPANY=%23company+name%23

The variable names are USERNAME and COMPANY. The & separates each variable assignment, + is for spaces and %23 is the # character. This is the result of the above code:

USERNAME = “abc”, USERNAME_HTML = “abc”

COMPANY = “#company name#”, COMPANY_HTML = “%23company+name%23”
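The loop above starts several processes per character (echo, cut and perl) and relies on eval. If the script can assume bash 3.1 or later, a minimal sketch of the same idea can be done entirely in the shell. The names url_decode and parse_query are hypothetical, not part of the original script. This version decodes the name and the value separately and assigns variables with printf -v instead of eval. Names must start with a letter, so a form field cannot overwrite the function's underscore-prefixed locals. A real script should still check names against a list of expected fields, because names such as PATH would pass this test.

```shell
#!/usr/bin/env bash
# Sketch only: url_decode and parse_query are hypothetical helper names.
# Requires bash 3.1+ (printf -v, [[ =~ ]], +=).

# Decode one form-encoded string: "+" -> space, "%XX" -> the byte 0xXX.
url_decode() {
  local s="${1//+/ }" out="" hex ch
  while [[ $s == *%* ]]; do
    out+="${s%%[%]*}"              # copy the text before the next "%"
    s="${s#*[%]}"                  # keep what follows that "%"
    hex="${s:0:2}"
    if [[ $hex =~ ^[0-9A-Fa-f]{2}$ ]]; then
      printf -v ch '%b' "\\x$hex"  # two hex digits -> one character
      out+="$ch"
      s="${s:2}"
    else
      out+="%"                     # malformed escape: keep the "%" as-is
    fi
  done
  printf '%s\n' "$out$s"
}

# For each NAME=VALUE pair, set NAME (decoded) and NAME_HTML (still encoded).
parse_query() {
  local _pair _name _raw
  local -a _pairs
  IFS='&;' read -r -a _pairs <<< "$1"   # split on "&" or ";", no globbing
  for _pair in "${_pairs[@]}"; do
    [[ $_pair == *=* ]] || continue      # skip pairs without "="
    _name="$(url_decode "${_pair%%=*}")"
    _raw="${_pair#*=}"
    # accept only identifiers that start with a letter
    [[ $_name =~ ^[A-Za-z][A-Za-z0-9_]*$ ]] || continue
    printf -v "$_name" '%s' "$(url_decode "$_raw")"
    printf -v "${_name}_HTML" '%s' "$_raw"
  done
}

# Under a web server QUERY_STRING is already set; it may be empty here.
parse_query "${QUERY_STRING:-}"
```

With the example above, parse_query 'USERNAME=abc&COMPANY=%23company+name%23' leaves COMPANY set to “#company name#” and COMPANY_HTML set to “%23company+name%23”, matching the result of the original loop.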


